## Diffusion maps and coarse-graining: A unified framework for dimensionality reduction, graph partitioning and data set parameterization (2006)

Venue: IEEE Transactions on Pattern Analysis and Machine Intelligence

Citations: 95 (5 self)

### BibTeX

```bibtex
@ARTICLE{Lafon06diffusionmaps,
  author  = {Stéphane Lafon and Ann B. Lee},
  title   = {Diffusion maps and coarse-graining: A unified framework for dimensionality reduction, graph partitioning and data set parameterization},
  journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year    = {2006},
  volume  = {28},
  pages   = {1393--1403}
}
```

### Abstract

We provide evidence that non-linear dimensionality reduction, clustering and data set parameterization can be solved within one and the same framework. The main idea is to define a system of coordinates with an explicit metric that reflects the connectivity of a given data set and that is robust to noise. Our construction, which is based on a Markov random walk on the data, offers a general scheme of simultaneously reorganizing and subsampling graphs and arbitrarily shaped data sets in high dimensions using intrinsic geometry. We show that clustering in embedding spaces is equivalent to compressing operators. The objective of data partitioning and clustering is to coarse-grain the random walk on the data while at the same time preserving a diffusion operator for the intrinsic geometry or connectivity of the data set up to some accuracy. We show that the quantization distortion in diffusion space bounds the error of compression of the operator, thus giving a rigorous justification for k-means clustering in diffusion space and a precise measure of the performance of general clustering algorithms.

### Citations

8980 | Statistical Learning Theory
- Vapnik
- 1998
Citation Context: ...By doing so, we lay down a solid foundation for subsequent data analysis. All the geometry of the data set is captured in a diffusion kernel. However, unlike SVM and so-called “kernel methods” [18], [19], [20], we are working with the embedding coordinates explicitly. Our method is completely data driven: Both the data representation and the kernel are computed directly on the data. The notion of a d...

2590 | Normalized cuts and image segmentation
- Shi, Malik
- 2000
Citation Context: ...spectral properties (eigenvectors and eigenvalues) of a pairwise similarity matrix for geometric data analysis. These methods can roughly be divided into two main categories: spectral graph cuts [1], [2], [3] and eigenmaps [4], [5], [6], [7]. The two methodologies were originally developed for different types of applications: segmentation and partitioning of graphs versus locality-preserving embeddin...

2284 | A tutorial on support vector machines for pattern recognition
- Burges
- 1998
Citation Context: ...(TABLE II: Examples of diffusion centers and words in a cluster.) All the geometry of the data set is captured in a diffusion kernel. However, unlike SVM and so-called “kernel methods” [18], [19], [20], we are working with the embedding coordinates explicitly. Our method is completely data driven: both the data representation and the kernel are computed directly on the data. The notion of a distanc...

1688 | A Global Geometric Framework for Nonlinear Dimensionality
- Tenenbaum, Silva, et al.

Citation Context: ...(TABLE I, a simplified table of different methods for clustering and non-linear dimensionality reduction:)

| Method | Data set parameterization? | Explicit metric in embedding space? |
|---|---|---|
| Spectral graphs [1], [2], [3] | not directly addressed | no |
| Eigenmaps [4], [5], [6], [7] | yes | no |
| Isomap [13] | yes | yes |
| Markov random walk learning [11] | no | yes |
| Diffusion maps | yes | yes |

...close if they are connect...

1614 | Nonlinear Dimensionality Reduction by locally linear embedding
- Roweis, Saul
Citation Context: ...genvectors and eigenvalues) of a pairwise similarity matrix for geometric data analysis. These methods can roughly be divided into two main categories: spectral graph cuts [1], [2], [3] and eigenmaps [4], [5], [6], [7]. The two methodologies were originally developed for different types of applications: segmentation and partitioning of graphs versus locality-preserving embeddings of data sets, respec...

1048 | Nonlinear component analysis as a kernel eigenvalue problem
- Schölkopf, Smola, et al.
- 1998
Citation Context: ...All the geometry of the data set is captured in a diffusion kernel. However, unlike SVM and so-called “kernel methods” [18], [19], [20], we are working with the embedding coordinates explicitly. Our method is completely data driven: both the data representation and the kernel are computed directly on the data. The notion ...

988 | Spectral Graph Theory
- Chung
- 1997

Citation Context: ...e diffusion framework. Suppose that X = {x1, ..., xn} is a data set of points, and assume that these points form the nodes of a weighted graph with weight function w(x, y). In the graph-theoretic approach [8] to data partitioning, one seeks to divide the set of vertices into disjoint sets, where by some measure, the similarity among the vertices in a set is high and the similarity across different sets is...

842 | Least squares quantization in pcm
- Lloyd
- 1982
Citation Context: ...to solving the problem of quantizing the diffusion space with k codewords based on the mass distribution of the sample set Ψt(Ω). This optimization issue is often addressed via the k-means algorithm [16], which guarantees convergence towards a local minimum: 1) Step 0: initialize the partition {S_i^(0)}_{1 ≤ i ≤ k} at random in the diffusion space; 2) For p > 0, update the partition according to S_i^(p) whe...
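The Lloyd iteration sketched in the snippet above (initialize a random partition in diffusion space, then alternately assign points to the nearest center and recompute the centers) can be written in a few lines of numpy. This is an illustrative sketch operating on precomputed diffusion coordinates, not the authors' implementation:

```python
import numpy as np

def kmeans_diffusion(coords, k, n_iter=100, seed=0):
    """Lloyd's k-means on precomputed diffusion coordinates Psi_t(x).
    'coords' is the n x p matrix of embedded points (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # Step 0: pick k initial centers at random in diffusion space
    centers = coords[rng.choice(len(coords), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest center; Euclidean distance in
        # diffusion space is the diffusion distance on the original data
        d2 = ((coords[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(axis=1)
        # Update each center to the centroid of its cell
        new = np.array([coords[labels == j].mean(axis=0)
                        if (labels == j).any() else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):       # converged to a local minimum
            break
        centers = new
    return labels, centers
```

Because the assignment step minimizes diffusion distance, the resulting cells quantize the diffusion space with k codewords, which is exactly the coarse-graining objective the paper bounds.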

734 | Laplacian Eigenmaps for Dimensionality Reduction and
- Belkin
Citation Context: ...and eigenvalues) of a pairwise similarity matrix for geometric data analysis. These methods can roughly be divided into two main categories: spectral graph cuts [1], [2], [3] and eigenmaps [4], [5], [6], [7]. The two methodologies were originally developed for different types of applications: segmentation and partitioning of graphs versus locality-preserving embeddings of data sets, respectively. Be...

319 | Segmentation using eigenvectors: A unifying view
- Weiss
- 1999
Citation Context: ...the spectral properties (eigenvectors and eigenvalues) of a pairwise similarity matrix for geometric data analysis. These methods can roughly be divided into two main categories: spectral graph cuts [1], [2], [3] and eigenmaps [4], [5], [6], [7]. The two methodologies were originally developed for different types of applications: segmentation and partitioning of graphs versus locality-preserving emb...

206 | Partially labeled classification with markov random walks
- Szummer, Jaakkola
- 2001

Citation Context: ...s, unlike the geodesic distance, or the shortest path of a graph, very robust to noise. Similar distance measures have previously been suggested in clustering and data classification, see for example [11]. However, the use of such probabilistic distance measures in data parameterization is completely new. This paper unifies various ideas in eigenmaps, spectral cuts and Markov random walk learning (see...

166 | A Random Walks View of Spectral Segmentation
- Meila, Shi
- 2001
Citation Context: ...ral properties (eigenvectors and eigenvalues) of a pairwise similarity matrix for geometric data analysis. These methods can roughly be divided into two main categories: spectral graph cuts [1], [2], [3] and eigenmaps [4], [5], [6], [7]. The two methodologies were originally developed for different types of applications: segmentation and partitioning of graphs versus locality-preserving embeddings of...

156 | Diffusion maps
- Coifman, Lafon
- 2006

Citation Context: ...↦ f(x) = (f1(x), ..., fp(x)) that preserves the local neighborhood information. Several algorithms, such as LLE [4], Laplacian eigenmaps [6], Hessian eigenmaps [7], LTSA [5] and diffusion maps [9], [10], all aim at minimizing distortions of the form Q(f) = Σ_i Q_i(f), where Q_i(f) is a symmetric, positive semi-definite quadratic form that measures local variations of f around x_i. The p-dimensiona...

149 | Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps
- Coifman, Lafon, et al.
Citation Context: ...(x) = (f1(x), ..., fp(x)) that preserves the local neighborhood information. Several algorithms, such as LLE [4], Laplacian eigenmaps [6], Hessian eigenmaps [7], LTSA [5] and diffusion maps [9], [10], all aim at minimizing distortions of the form Q(f) = Σ_i Q_i(f), where Q_i(f) is a symmetric, positive semi-definite quadratic form that measures local variations of f around x_i. The p-dimensional embe...

136 | Principal manifolds and nonlinear dimension reduction via local tangent space alignment
- Zhang, Zha
- 2002
Citation Context: ...ctors and eigenvalues) of a pairwise similarity matrix for geometric data analysis. These methods can roughly be divided into two main categories: spectral graph cuts [1], [2], [3] and eigenmaps [4], [5], [6], [7]. The two methodologies were originally developed for different types of applications: segmentation and partitioning of graphs versus locality-preserving embeddings of data sets, respectivel...

122 | Kernel K-means: spectral clustering and normalized cuts. Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, 551–556 (2004
- Dhillon, Guan, et al.
- 2004

Citation Context: ...rigorous justification for k-means clustering in diffusion space. The latter method is, by construction, useful when dealing with data in high dimensions, and can (as in any kernel k-means algorithm [12]) be applied to arbitrarily shaped clusters and abstract graphs. The organization of ...

99 | Hessian eigenmaps: new locally linear embedding techniques for high-dimensional data
- Donoho, Grimes
Citation Context: ...eigenvalues) of a pairwise similarity matrix for geometric data analysis. These methods can roughly be divided into two main categories: spectral graph cuts [1], [2], [3] and eigenmaps [4], [5], [6], [7]. The two methodologies were originally developed for different types of applications: segmentation and partitioning of graphs versus locality-preserving embeddings of data sets, respectively. Below w...

88 | Spectral Graph Theory. Number 92
- Chung
- 1994

Citation Context: ...fusion framework. Suppose that X = {x1, ..., xn} is a data set of points, and assume that these points form the nodes of a weighted graph with weight function w(x, y). In the graph-theoretic approach [8] to data partitioning, one seeks to divide the set of vertices into disjoint sets, where by some measure, the similarity among the vertices in a set is high, and the similarity across different sets i...

88 | Nonlinear Dimensionality Reduction by Locally Linear Embedding
- Roweis, Saul
- 2000

Citation Context: ...genvectors and eigenvalues) of a pairwise similarity matrix for geometric data analysis. These methods can roughly be divided into two main categories: spectral graph cuts [1], [2], [3] and eigenmaps [4], [5], [6], [7]. The two methodologies were originally developed for different types of applications: segmentation and partitioning of graphs versus locality-preserving embeddings of data sets, respec...

73 | Diffusion wavelets
- Coifman, Maggioni
- 2006
Citation Context: ...o to take powers of the transition matrix. The time parameter t sets the scale of the analysis. Note also that by using different values of t, we are able to perform a multiscale analysis of the data [22], [23]. ...

60 | Diffusion maps, spectral clustering and reaction coordinates of dynamical systems
- Nadler, Lafon, et al.
Citation Context: ...istances to the spectral properties of the random walk — and thereby, as we will see in the next section, connect Markov random walk learning on graphs with data parameterization via eigenmaps. As in [14], we will define the “diffusion distance” Dt between x and z as the weighted L2 distance D_t^2(x, z) = ‖p_t(x, ·) − p_t(z, ·)‖²_ω = Σ_y (p_t(x, y) − p_t(z, y))² / φ0(y), where the “weights” 1/φ0(y) are given by the inverse of the stationary density. ...
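The weighted L2 definition quoted above can be checked numerically against its spectral form, D_t^2(x, z) = Σ_j λ_j^{2t} (ψ_j(x) − ψ_j(z))², where λ_j, ψ_j are the eigenvalues and right eigenvectors of P. A small self-contained check on a random weighted graph (variable names are ours; ψ is normalized so that Σ_y φ0(y) ψ_j(y)² = 1):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((6, 6))
W = (A + A.T) / 2                        # symmetric weights w(x, y)
d = W.sum(axis=1)
P = W / d[:, None]                       # Markov matrix p(x, y)
phi0 = d / d.sum()                       # stationary distribution phi_0
t = 3
Pt = np.linalg.matrix_power(P, t)        # t-step transitions p_t(x, y)

# Direct definition: weighted L2 distance between transition profiles
x, z = 0, 1
Dt2_direct = ((Pt[x] - Pt[z]) ** 2 / phi0).sum()

# Spectral identity: D_t^2(x, z) = sum_j lam_j^{2t} (psi_j(x) - psi_j(z))^2
S = W / np.sqrt(np.outer(d, d))          # symmetric conjugate of P
lam, V = np.linalg.eigh(S)
psi = V / np.sqrt(phi0)[:, None]         # scaled right eigenvectors of P
Dt2_spectral = (lam ** (2 * t) * (psi[x] - psi[z]) ** 2).sum()

assert np.isclose(Dt2_direct, Dt2_spectral)
```

The identity is what makes truncating the embedding to a few top eigenvectors principled: terms with small λ_j^{2t} contribute negligibly to the distance.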

48 | The Nature of Statistical Learning Theory, 2nd Edition
- Vapnik
- 1997

Citation Context: ...All the geometry of the data set is captured in a diffusion kernel. However, unlike SVM and so-called “kernel methods” [18], [19], [20], we are working with the embedding coordinates explicitly. Our method is completely data driven: both the data representation and the kernel are computed directly on the data. The notion of a d...

18 | Diffusion wavelets. Applied and Computational Harmonic Analysis
- Coifman, Maggioni
- 2006

Citation Context: ...also to take powers of the transition matrix. The time parameter t sets the scale of the analysis. Note also that by using different values of t, we are able to perform a multiscale analysis of the data [22], [23]. Our other contribution is a novel scheme for simultaneous dimensionality reduction, parameterization, and subsampling of data sets. We show that clustering in embedding spaces is equivalent to...

15 | A novel way of computing similarities between nodes of a graph, with application to collaborative filtering and subspace projection of the graph nodes
- Saerens, Fouss
- 2006

Citation Context: ...g and dimensionality reduction. In addition, the diffusion framework makes it possible to directly connect grouping in embedding spaces to spectral graph clustering and data analysis by Markov chains [21], [11]. In a sense, we are extending Meila and Shi’s work [3] from lumpable Markov chains and piece-wise constant eigenvectors to the general case of arbitrary Markov chains and arbitrary eigenvectors...

4 | Iterative Denoising for Cross-Corpus Discovery
- Priebe, Marchette, et al.
- 2004

Citation Context: ...(Fig. 4: Distribution of the geodesic (top) and diffusion (bottom) distances. Each distribution was rescaled in order to have a mean equal to 1.) ...categories (see [17]). Our goal is to cluster words based on their distribution over the documents. From the database, we extract the 20 most common words in each document, which corresponds to 3218 unique words total. O...