Results 1  10
of
62
Consistency of spectral clustering
, 2004
"... Consistency is a key property of statistical algorithms, when the data is drawn from some underlying probability distribution. Surprisingly, despite decades of work, little is known about consistency of most clustering algorithms. In this paper we investigate consistency of a popular family of spe ..."
Abstract

Cited by 282 (15 self)
 Add to MetaCart
Consistency is a key property of statistical algorithms, when the data is drawn from some underlying probability distribution. Surprisingly, despite decades of work, little is known about consistency of most clustering algorithms. In this paper we investigate consistency of a popular family of spectral clustering algorithms, which cluster the data with the help of eigenvectors of graph Laplacian matrices. We show that one of the two of major classes of spectral clustering (normalized clustering) converges under some very general conditions, while the other (unnormalized), is only consistent under strong additional assumptions, which, as we demonstrate, are not always satisfied in real data. We conclude that our analysis provides strong evidence for the superiority of normalized spectral clustering in practical applications. We believe that methods used in our analysis will provide a basis for future exploration of Laplacianbased methods in a statistical setting.
On the Nyström Method for Approximating a Gram Matrix for Improved KernelBased Learning
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2005
"... A problem for many kernelbased methods is that the amount of computation required to find the solution scales as O(n³), where n is the number of training examples. We develop and analyze an algorithm to compute an easilyinterpretable lowrank approximation to an nn Gram matrix G such that compu ..."
Abstract

Cited by 108 (7 self)
 Add to MetaCart
A problem for many kernelbased methods is that the amount of computation required to find the solution scales as O(n³), where n is the number of training examples. We develop and analyze an algorithm to compute an easilyinterpretable lowrank approximation to an nn Gram matrix G such that computations of interest may be performed more rapidly. The approximation is of the form G k = CW , where C is a matrix consisting of a small number c of columns of G and W k is the best rankk approximation to W , the matrix formed by the intersection between those c columns of G and the corresponding c rows of G. An important aspect of the algorithm is the probability distribution used to randomly sample the columns; we will use a judiciouslychosen and datadependent nonuniform probability distribution. Let F denote the spectral norm and the Frobenius norm, respectively, of a matrix, and let G k be the best rankk approximation to G. We prove that by choosing O(k/# ) columns both in expectation and with high probability, for both # = 2, F , and for all k : 0 rank(W ). This approximation can be computed using O(n) additional space and time, after making two passes over the data from external storage. The relationships between this algorithm, other related matrix decompositions, and the Nyström method from integral equation theory are discussed.
Diffusion Wavelets
, 2004
"... We present a multiresolution construction for efficiently computing, compressing and applying large powers of operators that have high powers with low numerical rank. This allows the fast computation of functions of the operator, notably the associated Green’s function, in compressed form, and their ..."
Abstract

Cited by 72 (12 self)
 Add to MetaCart
We present a multiresolution construction for efficiently computing, compressing and applying large powers of operators that have high powers with low numerical rank. This allows the fast computation of functions of the operator, notably the associated Green’s function, in compressed form, and their fast application. Classes of operators satisfying these conditions include diffusionlike operators, in any dimension, on manifolds, graphs, and in nonhomogeneous media. In this case our construction can be viewed as a farreaching generalization of Fast Multipole Methods, achieved through a different point of view, and of the nonstandard wavelet representation of CalderónZygmund and pseudodifferential operators, achieved through a different multiresolution analysis adapted to the operator. We show how the dyadic powers of an operator can be used to induce a multiresolution analysis, as in classical LittlewoodPaley and wavelet theory, and we show how to construct, with fast and stable algorithms, scaling function and wavelet bases associated to this multiresolution analysis, and the corresponding downsampling operators, and use them to compress the corresponding powers of the operator. This allows to extend multiscale signal processing to general spaces (such as manifolds and graphs) in a very natural way, with corresponding fast algorithms.
Protovalue functions: A laplacian framework for learning representation and control in markov decision processes
 Journal of Machine Learning Research
, 2006
"... This paper introduces a novel spectral framework for solving Markov decision processes (MDPs) by jointly learning representations and optimal policies. The major components of the framework described in this paper include: (i) A general scheme for constructing representations or basis functions by d ..."
Abstract

Cited by 66 (9 self)
 Add to MetaCart
This paper introduces a novel spectral framework for solving Markov decision processes (MDPs) by jointly learning representations and optimal policies. The major components of the framework described in this paper include: (i) A general scheme for constructing representations or basis functions by diagonalizing symmetric diffusion operators (ii) A specific instantiation of this approach where global basis functions called protovalue functions (PVFs) are formed using the eigenvectors of the graph Laplacian on an undirected graph formed from state transitions induced by the MDP (iii) A threephased procedure called representation policy iteration comprising of a sample collection phase, a representation learning phase that constructs basis functions from samples, and a final parameter estimation phase that determines an (approximately) optimal policy within the (linear) subspace spanned by the (current) basis functions. (iv) A specific instantiation of the RPI framework using leastsquares policy iteration (LSPI) as the parameter estimation method (v) Several strategies for scaling the proposed approach to large discrete and continuous state spaces, including the Nyström extension for outofsample interpolation of eigenfunctions, and the use of Kronecker sum factorization to construct compact eigenfunctions in product spaces such as factored MDPs (vi) Finally, a series of illustrative discrete and continuous control tasks, which both illustrate the concepts and provide a benchmark for evaluating the proposed approach. Many challenges remain to be addressed in scaling the proposed framework to large MDPs, and several elaboration of the proposed framework are briefly summarized at the end.
RELATIVEERROR CUR MATRIX DECOMPOSITIONS
 SIAM J. MATRIX ANAL. APPL
, 2008
"... Many data analysis applications deal with large matrices and involve approximating the matrix using a small number of “components.” Typically, these components are linear combinations of the rows and columns of the matrix, and are thus difficult to interpret in terms of the original features of the ..."
Abstract

Cited by 37 (9 self)
 Add to MetaCart
Many data analysis applications deal with large matrices and involve approximating the matrix using a small number of “components.” Typically, these components are linear combinations of the rows and columns of the matrix, and are thus difficult to interpret in terms of the original features of the input data. In this paper, we propose and study matrix approximations that are explicitly expressed in terms of a small number of columns and/or rows of the data matrix, and thereby more amenable to interpretation in terms of the original data. Our main algorithmic results are two randomized algorithms which take as input an m × n matrix A and a rank parameter k. In our first algorithm, C is chosen, and we let A ′ = CC + A, where C + is the Moore–Penrose generalized inverse of C. In our second algorithm C, U, R are chosen, and we let A ′ = CUR. (C and R are matrices that consist of actual columns and rows, respectively, of A, and U is a generalized inverse of their intersection.) For each algorithm, we show that with probability at least 1 − δ, ‖A − A ′ ‖F ≤ (1 + ɛ) ‖A − Ak‖F, where Ak is the “best ” rankk approximation provided by truncating the SVD of A, and where ‖X‖F is the Frobenius norm of the matrix X. The number of columns of C and rows of R is a lowdegree polynomial in k, 1/ɛ, and log(1/δ). Both the Numerical Linear Algebra community and the Theoretical Computer Science community have studied variants
ShapeGoogle: geometric words and expressions for invariant shape retrieval
, 2010
"... The computer vision and pattern recognition communities have recently witnessed a surge of featurebased methods in object recognition and image retrieval applications. These methods allow representing images as collections of “visual words ” and treat them using text search approaches following the ..."
Abstract

Cited by 33 (5 self)
 Add to MetaCart
The computer vision and pattern recognition communities have recently witnessed a surge of featurebased methods in object recognition and image retrieval applications. These methods allow representing images as collections of “visual words ” and treat them using text search approaches following the “bag of features ” paradigm. In this paper, we explore analogous approaches in the 3D world applied to the problem of nonrigid shape retrieval in large databases. Using multiscale diffusion heat kernels as “geometric words”, we construct compact and informative shape descriptors by means of the “bag of features ” approach. We also show that considering pairs of “geometric words ” (“geometric expressions”) allows creating spatiallysensitive bags of features with better discriminativity. Finally, adopting metric learning approaches, we show that shapes can be efficiently represented as binary codes. Our approach achieves stateoftheart results on the SHREC 2010 largescale shape retrieval benchmark.
Threedimensional Point Cloud Recognition via Distributions of Geometric Distances
, 2008
"... A geometric framework for the recognition of threedimensional objects represented by point clouds is introduced in this paper. The proposed approach is based on comparing distributions of intrinsic measurements on the point cloud. In particular, intrinsic distances are exploited as signatures for r ..."
Abstract

Cited by 26 (2 self)
 Add to MetaCart
A geometric framework for the recognition of threedimensional objects represented by point clouds is introduced in this paper. The proposed approach is based on comparing distributions of intrinsic measurements on the point cloud. In particular, intrinsic distances are exploited as signatures for representing the point clouds. The first signature we introduce is the histogram of pairwise diffusion distances between all points on the shape surface. These distances represent the probability of traveling from one point to another in a fixed number of random steps, the average intrinsic distances of all possible paths of a given number of steps between the two points. This signature is augmented by the histogram of the actual pairwise geodesic distances in the point cloud, the distribution of the ratio between these two distances, as well as the distribution of the number of times each point lies on the shortest paths between other points. These signatures are not only geometric but also invariant to bends. We further augment these signatures by the distribution of a curvature function and the distribution of a curvature weighted distance. These
Graph laplacians and their convergence on random neighborhood graphs
 Journal of Machine Learning Research
, 2006
"... Given a sample from a probability measure with support on a submanifold in Euclidean space one can construct a neighborhood graph which can be seen as an approximation of the submanifold. The graph Laplacian of such a graph is used in several machine learning methods like semisupervised learning, d ..."
Abstract

Cited by 25 (6 self)
 Add to MetaCart
Given a sample from a probability measure with support on a submanifold in Euclidean space one can construct a neighborhood graph which can be seen as an approximation of the submanifold. The graph Laplacian of such a graph is used in several machine learning methods like semisupervised learning, dimensionality reduction and clustering. In this paper we determine the pointwise limit of three different graph Laplacians used in the literature as the sample size increases and the neighborhood size approaches zero. We show that for a uniform measure on the submanifold all graph Laplacians have the same limit up to constants. However in the case of a nonuniform measure on the submanifold only the so called random walk graph Laplacian converges to the weighted LaplaceBeltrami operator.
Regularization on graphs with functionadapted diffusion process
, 2006
"... Harmonic analysis and diffusion on discrete data has been shown to lead to stateoftheart algorithms for machine learning tasks, especially in the context of semisupervised and transductive learning. The success of these algorithms rests on the assumption that the function(s) to be studied (learn ..."
Abstract

Cited by 23 (5 self)
 Add to MetaCart
Harmonic analysis and diffusion on discrete data has been shown to lead to stateoftheart algorithms for machine learning tasks, especially in the context of semisupervised and transductive learning. The success of these algorithms rests on the assumption that the function(s) to be studied (learned, interpolated, etc.) are smooth with respect to the geometry of the data. In this paper we present a method for modifying the given geometry so the function(s) to be studied are smoother with respect to the modified geometry, and thus more amenable to treatment using harmonic analysis methods. Among the many possible applications, we consider the problems of image denoising and transductive classification. In both settings, our approach improves on standard diffusion based methods.