Results 1 - 10 of 96
Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions
In ICML, 2003
Cited by 325 (13 self)
An approach to semi-supervised learning is proposed that is based on a Gaussian random field model. Labeled and unlabeled data are represented as vertices in a weighted graph, with edge weights encoding the similarity between instances. The learning ...
Consistency of spectral clustering
2004
Cited by 170 (11 self)
Consistency is a key property of statistical algorithms when the data is drawn from some underlying probability distribution. Surprisingly, despite decades of work, little is known about the consistency of most clustering algorithms. In this paper we investigate the consistency of a popular family of spectral clustering algorithms, which cluster the data with the help of eigenvectors of graph Laplacian matrices. We show that one of the two major classes of spectral clustering (normalized clustering) converges under some very general conditions, while the other (unnormalized) is only consistent under strong additional assumptions, which, as we demonstrate, are not always satisfied in real data. We conclude that our analysis provides strong evidence for the superiority of normalized spectral clustering in practical applications. We believe that the methods used in our analysis will provide a basis for future exploration of Laplacian-based methods in a statistical setting.
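To make the two classes named in this abstract concrete, here is a minimal sketch of a two-way split using the unnormalized Laplacian L = D - W and the symmetrically normalized Laplacian D^{-1/2} L D^{-1/2}. The function names, the toy weight matrix, and the sign-of-second-eigenvector rule are assumptions made for illustration; the abstract does not specify the exact algorithms that the paper analyzes.

```python
import numpy as np

def laplacians(W):
    """Return (unnormalized, symmetric-normalized) graph Laplacians of W."""
    d = W.sum(axis=1)                       # vertex degrees
    L = np.diag(d) - W                      # unnormalized: L = D - W
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_sym = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]
    return L, L_sym

def bipartition(W, normalized=True):
    """Split the vertices in two by the sign of the second eigenvector."""
    W = np.asarray(W, dtype=float)
    L, L_sym = laplacians(W)
    _, vecs = np.linalg.eigh(L_sym if normalized else L)  # ascending eigenvalues
    v = vecs[:, 1]
    if normalized:
        # Map back to the eigenvector of the random-walk Laplacian.
        v = v / np.sqrt(W.sum(axis=1))
    return (v > 0).astype(int)

# Hypothetical toy graph: two triangles joined by one weak edge (2-3).
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1

labels_norm = bipartition(W, normalized=True)
labels_unnorm = bipartition(W, normalized=False)
print(labels_norm, labels_unnorm)
```

On a graph this small and well separated both variants recover the same split; the differences discussed in the abstract concern behavior as the sample size grows, which a toy example cannot show.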
Transductive Learning via Spectral Graph Partitioning
In ICML, 2003
Cited by 152 (0 self)
We present a new method for transductive learning, which can be seen as a transductive version of the k nearest-neighbor classifier.
Learning segmentation by random walks
In Advances in Neural Information Processing, 2000
Cited by 78 (4 self)
We present a new view of image segmentation by pairwise similarities. We interpret the similarities as edge flows in a Markov random walk and study the eigenvalues and eigenvectors of the walk's transition matrix. This interpretation shows that spectral methods for clustering and segmentation have a probabilistic foundation. In particular, we prove that the Normalized Cut method arises naturally from our framework. Finally, the framework provides a principled method for learning the similarity function as a combination of features.
1 Introduction
Among the most successful methods in image segmentation are those that combine a global optimality segmentation criterion with local similarity features [3]. Similarity between two pixels $i, j$ is defined as a positive function $S_{ij}$ depending on the local image properties of the pixels (e.g. color, texture, edge flow). Local features are not only computationally convenient, they are also supported by neurological evidence about the human perception of shapes.
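The link between the random walk and Normalized Cut can be checked numerically. The sketch below assumes the transition matrix is built as P = D^{-1} S, where D is the diagonal matrix of row sums of S; under that assumption, any eigenpair P x = mu x also satisfies the generalized eigenproblem (D - S) x = (1 - mu) D x that underlies the Normalized Cut relaxation. The similarity values are invented for illustration.

```python
import numpy as np

def transition_matrix(S):
    """Row-normalize a similarity matrix into a random-walk matrix P = D^{-1} S."""
    S = np.asarray(S, dtype=float)
    return S / S.sum(axis=1, keepdims=True)

# Hypothetical symmetric similarities between four pixels.
S = np.array([
    [0.00, 1.00, 0.80, 0.10],
    [1.00, 0.00, 0.90, 0.05],
    [0.80, 0.90, 0.00, 0.20],
    [0.10, 0.05, 0.20, 0.00],
])
P = transition_matrix(S)
D = np.diag(S.sum(axis=1))

# Eigenpairs of the walk: P x = mu x.
mu, X = np.linalg.eig(P)

# Each eigenvector should also solve (D - S) x = (1 - mu) D x.
max_residual = max(
    np.abs((D - S) @ X[:, k] - (1.0 - mu[k]) * (D @ X[:, k])).max()
    for k in range(len(mu))
)
print(max_residual)  # expected to be at the level of floating-point round-off
```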
Event-based analysis of video
In Proc. CVPR, 2001
Cited by 68 (2 self)
Dynamic events can be regarded as long-term temporal objects, which are characterized by spatiotemporal features at multiple temporal scales. Based on this, we design a simple statistical distance measure between video sequences (possibly of different lengths) based on their behavioral content. This measure is non-parametric and can thus handle a wide range of dynamic events. Having an event-based distance measure between sequences, we use it for a variety of tasks, including: (i) event-based search and indexing into long video sequences (for “intelligent fast forward”), (ii) temporal segmentation of long video sequences based on behavioral content, and (iii) clustering events within a long video sequence into event-consistent sub-sequences (i.e., into event-consistent “clusters”). These tasks are performed without prior knowledge of the types of events, their models, or their temporal extents. Our simple event representation and associated distance measure support event-based search and indexing even when only one short example clip is available. However, when multiple example clips of the same event are available (either as a result of the clustering process, or supplied manually), these can be used to refine the event representation, the associated distance measure, and accordingly the quality of the detection and clustering process.
The Second Eigenvalue of the Google Matrix
2003
Cited by 62 (8 self)
We determine analytically the modulus of the second eigenvalue for the web hyperlink matrix used by Google for computing PageRank. Specifically, we prove the following statement: "For any matrix $A = [cP + (1 - c)E]^T$, where $P$ is an $n \times n$ row-stochastic matrix, $E$ is a nonnegative $n \times n$ rank-one row-stochastic matrix, and $0 \le c \le 1$, the second eigenvalue of $A$ has modulus $|\lambda_2| \le c$. Furthermore, if $P$ has at least two irreducible closed subsets, the second eigenvalue $\lambda_2 = c$." This statement has implications for the convergence rate of the standard PageRank algorithm as the web scales, for the stability of PageRank to perturbations to the link structure of the web, for the detection of Google spammers, and for the design of algorithms to speed up PageRank.
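The eigenvalue statement in this abstract can be illustrated on a small example. The sketch below assumes the matrix has the form A = [cP + (1 - c)E]^T with E = 1 v^T for a uniform probability vector v; the function names, the toy chain P, and the value c = 0.85 are all chosen here for illustration and are not taken from the paper.

```python
import numpy as np

def google_matrix(P, c, v=None):
    """Return A = [c P + (1 - c) E]^T with E = 1 v^T (rank one, row-stochastic)."""
    n = P.shape[0]
    if v is None:
        v = np.full(n, 1.0 / n)             # assumed uniform teleport vector
    E = np.outer(np.ones(n), v)
    return (c * P + (1.0 - c) * E).T

def eigenvalue_moduli(A):
    """Eigenvalue moduli of A, sorted from largest to smallest."""
    return np.sort(np.abs(np.linalg.eigvals(A)))[::-1]

# Toy row-stochastic chain with two irreducible closed subsets: {0, 1} and {2, 3}.
P = np.array([
    [0.5, 0.5, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.5, 0.5],
])
c = 0.85
moduli = eigenvalue_moduli(google_matrix(P, c))
print(moduli[:2])  # the two largest moduli; the second should equal c here
```

Because this P has two closed subsets, the second-largest modulus comes out equal to c, which matches the equality case of the quoted statement.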
Diffusion maps and coarse-graining: A unified framework for dimensionality reduction, graph partitioning and data set parameterization
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2006
Cited by 58 (5 self)
We provide evidence that non-linear dimensionality reduction, clustering and data set parameterization can be solved within one and the same framework. The main idea is to define a system of coordinates with an explicit metric that reflects the connectivity of a given data set and that is robust to noise. Our construction, which is based on a Markov random walk on the data, offers a general scheme of simultaneously reorganizing and subsampling graphs and arbitrarily shaped data sets in high dimensions using intrinsic geometry. We show that clustering in embedding spaces is equivalent to compressing operators. The objective of data partitioning and clustering is to coarse-grain the random walk on the data while at the same time preserving a diffusion operator for the intrinsic geometry or connectivity of the data set up to some accuracy. We show that the quantization distortion in diffusion space bounds the error of compression of the operator, thus giving a rigorous justification for k-means clustering in diffusion space and a precise measure of the performance of general clustering algorithms.
The Principal Components Analysis of a Graph, and its Relationships to Spectral Clustering
Proceedings of the 15th European Conference on Machine Learning (ECML 2004). Lecture Notes in Artificial Intelligence, 2004
Cited by 50 (13 self)
This work presents a novel procedure for computing (1) distances between nodes of a weighted, undirected graph, called the Euclidean Commute Time Distance (ECTD), and (2) a subspace projection of the nodes of the graph that preserves as much variance as possible in terms of the ECTD, that is, a principal components analysis of the graph. It is based on a Markov-chain model of a random walk through the graph. The model assigns transition probabilities to the links between nodes, so that a random walker can jump from node to node. A quantity called the average commute time measures the average time taken by a random walker to reach node j when starting from node i and then return to node i. The square root of this quantity, the ECTD, is a distance measure between any two nodes, and has the nice property of decreasing when the number of paths connecting two nodes increases and when the "length" of any path decreases. The ECTD can be computed from the pseudoinverse of the Laplacian matrix of the graph, which is a kernel. We finally define the Principal Components Analysis (PCA) of a graph as the subspace projection that preserves as much variance as possible, in terms of the ECTD. This graph PCA has some interesting links with spectral graph theory, in particular spectral clustering.
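As a rough illustration of computing the ECTD from the Laplacian pseudoinverse, the sketch below uses the standard commute-time identity n(i, j) = V (l+_ii + l+_jj - 2 l+_ij), where l+ denotes entries of the pseudoinverse and V is the sum of all vertex degrees. The function name and the two toy graphs are assumptions made here; the abstract itself does not spell out the formula.

```python
import numpy as np

def ectd_matrix(W):
    """Pairwise Euclidean Commute Time Distances for a weighted undirected graph."""
    W = np.asarray(W, dtype=float)
    L = np.diag(W.sum(axis=1)) - W          # graph Laplacian L = D - W
    L_plus = np.linalg.pinv(L)              # Moore-Penrose pseudoinverse
    volume = W.sum()                        # sum of all degrees
    d = np.diag(L_plus)
    commute = volume * (d[:, None] + d[None, :] - 2.0 * L_plus)
    # Clip tiny negative round-off before taking the square root.
    return np.sqrt(np.maximum(commute, 0.0))

# Path graph 0 - 1 - 2 with unit weights.
path = np.array([
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
])
# Same graph with an extra edge 0 - 2, i.e. one more path between the end nodes.
triangle = np.array([
    [0.0, 1.0, 1.0],
    [1.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
])

ectd_path = ectd_matrix(path)
ectd_triangle = ectd_matrix(triangle)
print(ectd_path[0, 2], ectd_triangle[0, 2])
```

Adding the edge between nodes 0 and 2 shortens their ECTD, consistent with the property that the distance decreases as the number of connecting paths grows.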
A Comparison of Spectral Clustering Algorithms
2003
Cited by 47 (2 self)
Spectral clustering has become quite popular over the last few years and several new algorithms have been published. In this paper, we compare several of the best-known algorithms from the point of view of clustering quality over artificial and real datasets. We implement many variations of the existing spectral algorithms and compare their performance to see which features are more important. We also demonstrate that spectral methods show competitive performance on real datasets with respect to existing methods.
A Unifying Theorem for Spectral Embedding and Clustering
2003
Cited by 45 (0 self)
Spectral methods use selected eigenvectors of a data affinity matrix to obtain a data representation that can be trivially clustered or embedded in a low-dimensional space. We present a theorem that explains, for broad classes of affinity matrices and eigenbases, why this works: For successively smaller eigenbases (i.e., using fewer and fewer of the affinity matrix's dominant eigenvalues and eigenvectors), the angles between "similar" vectors in the new representation shrink while the angles between "dissimilar" vectors grow. Specifically, the sum of the squared cosines of the angles is strictly increasing as the dimensionality of the representation decreases. Thus spectral methods work because the truncated eigenbasis amplifies structure in the data so that any heuristic post-processing is more likely to succeed. We use this result to construct a nonlinear dimensionality reduction (NLDR) algorithm for data sampled from manifolds whose intrinsic coordinate system has linear and cyclic axes, and a novel clustering-by-projections algorithm that requires no post-processing and gives superior performance on "challenge problems" from the recent literature.

