Using the Nyström Method to Speed Up Kernel Machines
- Advances in Neural Information Processing Systems 13
, 2001
Cited by 207 (6 self)
A major problem for kernel-based predictors (such as Support Vector Machines and Gaussian processes) is that the amount of computation required to find the solution scales as $O(n^3)$, where n is the number of training examples. We show that an approximation to the eigendecomposition of the Gram matrix can be computed by the Nyström method (which is used for the numerical solution of eigenproblems). This is achieved by carrying out an eigendecomposition on a smaller system of size m < n, and then expanding the results back up to n dimensions. The computational complexity of a predictor using this approximation is $O(m^2 n)$. We report experiments on the USPS and abalone data sets and show that we can set $m \ll n$ without any significant decrease in the accuracy of the solution.
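A minimal sketch of the idea in this abstract, assuming an RBF kernel and uniformly random selection of the m subset points: the eigendecomposition is carried out on the m × m block only, and the eigenvectors are then expanded to all n points. The names `rbf_kernel`, `nystrom_eig`, `gamma`, and `seed` are illustrative and do not come from the paper.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def nystrom_eig(X, m, gamma=0.5, seed=0):
    """Approximate eigenpairs of the n x n Gram matrix from an m-point subset.

    Assumption: the subset is drawn uniformly at random without replacement.
    """
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    idx = rng.choice(n, size=m, replace=False)

    K_nm = rbf_kernel(X, X[idx], gamma)   # n x m block of the Gram matrix
    K_mm = K_nm[idx]                      # m x m block on the subset

    # Eigendecomposition of the small system only.
    lam_m, U_m = np.linalg.eigh(K_mm)
    keep = lam_m > 1e-10 * lam_m.max()    # drop numerically zero directions
    lam_m, U_m = lam_m[keep], U_m[:, keep]

    # Expand back up to n dimensions (rescaled eigenvalues and eigenvectors).
    lam = (n / m) * lam_m
    U = np.sqrt(m / n) * (K_nm @ U_m) / lam_m
    return lam, U
```

With these scalings, `(U * lam) @ U.T` equals `K_nm @ pinv(K_mm) @ K_nm.T`, a low-rank stand-in for the full Gram matrix; the expensive steps are the m × m eigendecomposition and the n × m by m × m product.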
Consistency of spectral clustering
, 2004
Cited by 170 (11 self)
Consistency is a key property of statistical algorithms, when the data is drawn from some underlying probability distribution. Surprisingly, despite decades of work, little is known about consistency of most clustering algorithms. In this paper we investigate consistency of a popular family of spectral clustering algorithms, which cluster the data with the help of eigenvectors of graph Laplacian matrices. We show that one of the two major classes of spectral clustering (normalized clustering) converges under some very general conditions, while the other (unnormalized) is only consistent under strong additional assumptions, which, as we demonstrate, are not always satisfied in real data. We conclude that our analysis provides strong evidence for the superiority of normalized spectral clustering in practical applications. We believe that methods used in our analysis will provide a basis for future exploration of Laplacian-based methods in a statistical setting.
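To make the two families concrete, here is a small sketch of a two-way split from the second eigenvector of either the unnormalized Laplacian D - W or the symmetric normalized Laplacian D^{-1/2} (D - W) D^{-1/2}. The Gaussian similarity, the `gamma` parameter, and thresholding the eigenvector at zero are assumptions made for illustration; the paper's analysis is not tied to this particular recipe.

```python
import numpy as np

def similarity(X, gamma=1.0):
    """Gaussian similarity graph on the rows of X, without self-loops."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-gamma * sq)
    np.fill_diagonal(W, 0.0)
    return W

def two_way_split(W, normalized=True):
    """Boolean cluster labels from the second-smallest Laplacian eigenvector."""
    d = W.sum(axis=1)
    L = np.diag(d) - W                     # unnormalized Laplacian D - W
    if normalized:
        s = 1.0 / np.sqrt(d)
        L = s[:, None] * L * s[None, :]    # D^{-1/2} (D - W) D^{-1/2}
    _, vecs = np.linalg.eigh(L)            # eigenvalues in ascending order
    return vecs[:, 1] > 0                  # threshold the second eigenvector
```

On two well-separated groups of points both variants recover the same split; the abstract's claim concerns their behavior as the sample grows, which a toy example like this does not demonstrate.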
Spectral grouping using the Nyström method
- IEEE Transactions on Pattern Analysis and Machine Intelligence
, 2004
Cited by 117 (1 self)
Spectral graph theoretic methods have recently shown great promise for the problem of image segmentation. However, due to the computational demands of these approaches, applications to large problems such as spatiotemporal data and high resolution imagery have been slow to appear. The contribution of this paper is a method that substantially reduces the computational requirements of grouping algorithms based on spectral partitioning, making it feasible to apply them to very large grouping problems. Our approach is based on a technique for the numerical solution of eigenfunction problems known as the Nyström method. This method allows one to extrapolate the complete grouping solution using only a small number of "typical" samples. In doing so, we leverage the fact that there are far fewer coherent groups in a scene than pixels.
Out-of-Sample Extensions for LLE, Isomap, MDS, Eigenmaps, and Spectral Clustering
- In Advances in Neural Information Processing Systems
, 2004
Cited by 69 (1 self)
Several unsupervised learning algorithms based on an eigendecomposition provide either an embedding or a clustering only for given training points, with no straightforward extension for out-of-sample examples short of recomputing eigenvectors. This paper provides a unified framework for extending Local Linear Embedding (LLE), Isomap, Laplacian Eigenmaps, Multi-Dimensional Scaling (for dimensionality reduction) as well as for Spectral Clustering. This framework is based on seeing these algorithms as learning eigenfunctions of a data-dependent kernel.
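A rough sketch of what an out-of-sample extension looks like, assuming a fixed RBF kernel rather than the data-dependent kernels the paper constructs for each algorithm: the k-th embedding coordinate of a new point x is taken as (1/λ_k) Σ_i u_ik K(x, x_i), so no eigenvectors are recomputed. The function names and the scaling convention are illustrative choices, not the paper's notation.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def fit_eigvecs(X, k, gamma=0.5):
    """Top-k eigenpairs of the training Gram matrix."""
    lam, U = np.linalg.eigh(rbf_kernel(X, X, gamma))
    order = np.argsort(lam)[::-1][:k]      # largest eigenvalues first
    return lam[order], U[:, order]

def extend(X_new, X_train, lam, U, gamma=0.5):
    """Evaluate the k eigenvector-based coordinates at new points.

    Only kernel values between the new and the training points are needed.
    """
    return rbf_kernel(X_new, X_train, gamma) @ U / lam
```

Evaluated at a training point, `extend` returns that point's row of `U`, which is the sense in which the formula extends the training embedding.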
Proto-value functions: A Laplacian framework for learning representation and control in Markov decision processes
- Journal of Machine Learning Research
, 2006
Cited by 45 (8 self)
This paper introduces a novel spectral framework for solving Markov decision processes (MDPs) by jointly learning representations and optimal policies. The major components of the framework described in this paper include: (i) A general scheme for constructing representations or basis functions by diagonalizing symmetric diffusion operators. (ii) A specific instantiation of this approach where global basis functions called proto-value functions (PVFs) are formed using the eigenvectors of the graph Laplacian on an undirected graph formed from state transitions induced by the MDP. (iii) A three-phased procedure called representation policy iteration (RPI) comprising a sample collection phase, a representation learning phase that constructs basis functions from samples, and a final parameter estimation phase that determines an (approximately) optimal policy within the (linear) subspace spanned by the (current) basis functions. (iv) A specific instantiation of the RPI framework using least-squares policy iteration (LSPI) as the parameter estimation method. (v) Several strategies for scaling the proposed approach to large discrete and continuous state spaces, including the Nyström extension for out-of-sample interpolation of eigenfunctions, and the use of Kronecker sum factorization to construct compact eigenfunctions in product spaces such as factored MDPs. (vi) Finally, a series of illustrative discrete and continuous control tasks, which both illustrate the concepts and provide a benchmark for evaluating the proposed approach. Many challenges remain to be addressed in scaling the proposed framework to large MDPs, and several elaborations of the proposed framework are briefly summarized at the end.
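Component (ii) can be illustrated with a toy sketch, assuming a chain-shaped state graph and the combinatorial Laplacian D - A (the paper also considers other Laplacians and graphs built from sampled transitions). The smoothest k eigenvectors serve as basis functions, and a least-squares fit stands in for the parameter estimation phase; it is not LSPI. All function names are hypothetical.

```python
import numpy as np

def chain_adjacency(n):
    """Adjacency matrix of an undirected chain of n states."""
    A = np.zeros((n, n))
    i = np.arange(n - 1)
    A[i, i + 1] = 1.0
    A[i + 1, i] = 1.0
    return A

def proto_value_functions(A, k):
    """The k eigenvectors of the graph Laplacian with smallest eigenvalues."""
    L = np.diag(A.sum(axis=1)) - A
    _, vecs = np.linalg.eigh(L)            # eigenvalues in ascending order
    return vecs[:, :k]

def project(basis, v):
    """Least-squares approximation of a function v on the states."""
    w, *_ = np.linalg.lstsq(basis, v, rcond=None)
    return basis @ w
```

For example, `project(proto_value_functions(chain_adjacency(20), 5), v)` approximates a length-20 vector `v` in the span of five basis functions; smooth targets on the chain are typically captured well by few of them.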
Learning Eigenfunctions Links Spectral Embedding And Kernel PCA
- Neural Computation
, 2004
Cited by 44 (5 self)
In this paper, we show a direct relation between spectral embedding methods and kernel PCA, and how both are special cases of a more general learning problem, that of learning the principal eigenfunctions of an operator defined from a kernel and the unknown data generating density. Whereas ...
The Effect of the Input Density Distribution on Kernel-based Classifiers
- Proceedings of the 17th International Conference on Machine Learning
, 2000
Cited by 40 (5 self)
The eigenfunction expansion of a kernel function $K(x, y)$ as used in support vector machines or Gaussian process predictors is studied when the input data is drawn from a distribution $p(x)$. In this case it is shown that the eigenfunctions $\{\phi_i\}$ obey the equation $\int K(x, y)\, p(x)\, \phi_i(x)\, dx = \lambda_i \phi_i(y)$. This has a number of consequences including (i) the eigenvalues/vectors of the n × n Gram matrix K obtained by evaluating the kernel at all pairs of training points $K(x_i, x_j)$ converge to the eigenvalues and eigenfunctions of the integral equation above as $n \to \infty$ and (ii) the dependence of the eigenfunctions on $p(x)$ may be useful for the class-discrimination task. We show that on a number of datasets using the RBF kernel the eigenvalue spectrum of the Gram matrix decays rapidly, and discuss how this property might be used to speed up kernel-based predictors.
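The link between the integral equation and the Gram matrix can be sketched numerically: replacing the integral over p(x) with an average over n samples turns the eigenproblem into (1/n) K u = λ u, so the eigenvalues of K/n serve as estimates of the operator eigenvalues. The example below assumes one-dimensional standard normal inputs and an RBF kernel with an illustrative `gamma`; it is not a reproduction of the paper's experiments.

```python
import numpy as np

def gram_spectrum(x, gamma=0.5):
    """Eigenvalues of K/n for an RBF kernel on 1-D inputs, largest first."""
    K = np.exp(-gamma * (x[:, None] - x[None, :]) ** 2)
    lam = np.linalg.eigvalsh(K / x.shape[0])   # ascending order
    return lam[::-1]

rng = np.random.default_rng(0)
spectrum = gram_spectrum(rng.normal(size=300))

# Fraction of the total spectrum carried by the ten largest eigenvalues.
top10_share = spectrum[:10].sum() / spectrum.sum()
```

Because the RBF kernel has K(x, x) = 1, the eigenvalues of K/n sum to 1, so `top10_share` directly measures how quickly the spectrum decays.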
Efficient spatiotemporal grouping using the Nyström method
- In Proc. IEEE Conf. Comput. Vision and Pattern Recognition
, 2001
Cited by 33 (5 self)
Spectral graph theoretic methods have recently shown great promise for the problem of image segmentation, but due to the computational demands, applications of such methods to spatiotemporal data have been slow to appear. For even a short video sequence, the set of all pairwise voxel similarities is a huge quantity of data: one second of a [...] sequence captured at [...] Hz entails on the order of [...] pairwise similarities. The contribution of this paper is a method that substantially reduces the computational requirements of grouping algorithms based on spectral partitioning, making it feasible to apply them to very large spatiotemporal grouping problems. Our approach is based on a technique for the numerical solution of eigenfunction problems known as the Nyström method. This method allows extrapolation of the complete grouping solution using only a small number of "typical" samples. In doing so, we successfully exploit the fact that there are far fewer coherent groups in an image sequence than pixels.
Learning representation and control in continuous Markov decision processes
- In Proceedings of the 21st National Conference on Artificial Intelligence
, 2006
Cited by 18 (6 self)
This paper presents a novel framework for simultaneously learning representation and control in continuous Markov decision processes. Our approach builds on the framework of proto-value functions, in which the underlying representation or basis functions are automatically derived from a spectral analysis of the state space manifold. The proto-value functions correspond to the eigenfunctions of the graph Laplacian. We describe an approach to extend the eigenfunctions to novel states using the Nyström extension. A least-squares policy iteration method is used to learn the control policy, where the underlying subspace for approximating the value function is spanned by the learned proto-value functions. A detailed set of experiments is presented using classic benchmark tasks, including the inverted pendulum and the mountain car, showing the sensitivity in performance to various parameters, and including comparisons with a parametric radial basis function method.
On the eigenspectrum of the Gram matrix and its relationship to the operator eigenspectrum
- ALT 2002, LNAI 2533
, 2002
Cited by 14 (4 self)
In this paper we analyze the relationships between the eigenvalues of the m × m Gram matrix K for a kernel k(·, ·) corresponding to a sample x1,...,xm drawn from a density p(x) and the eigenvalues of the corresponding continuous eigenproblem. We bound the differences between the two spectra and provide a performance bound on kernel PCA.

