Results 1–10 of 84
Using the Nyström Method to Speed Up Kernel Machines
Advances in Neural Information Processing Systems 13, 2001
Cited by 283 (6 self)
Abstract
A major problem for kernel-based predictors (such as Support Vector Machines and Gaussian processes) is that the amount of computation required to find the solution scales as O(n³), where n is the number of training examples. We show that an approximation to the eigendecomposition of the Gram matrix can be computed by the Nyström method (which is used for the numerical solution of eigenproblems). This is achieved by carrying out an eigendecomposition on a smaller system of size m < n, and then expanding the results back up to n dimensions. The computational complexity of a predictor using this approximation is O(m²n). We report experiments on the USPS and abalone data sets and show that we can set m ≪ n without any significant decrease in the accuracy of the solution.
Consistency of spectral clustering
2004
Cited by 282 (15 self)
Abstract
Consistency is a key property of statistical algorithms when the data are drawn from some underlying probability distribution. Surprisingly, despite decades of work, little is known about the consistency of most clustering algorithms. In this paper we investigate the consistency of a popular family of spectral clustering algorithms, which cluster the data with the help of eigenvectors of graph Laplacian matrices. We show that one of the two major classes of spectral clustering (normalized clustering) converges under very general conditions, while the other (unnormalized clustering) is only consistent under strong additional assumptions which, as we demonstrate, are not always satisfied in real data. We conclude that our analysis provides strong evidence for the superiority of normalized spectral clustering in practical applications. We believe that the methods used in our analysis will provide a basis for future exploration of Laplacian-based methods in a statistical setting.
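The two Laplacians the abstract contrasts can be sketched concretely; the affinity matrix, two-blob data, and bandwidth below are illustrative choices, not the paper's setting or analysis.

```python
import numpy as np

def graph_laplacians(W):
    """Given a symmetric affinity matrix W, return the unnormalized
    Laplacian L = D - W and the symmetric normalized Laplacian
    L_sym = I - D^{-1/2} W D^{-1/2}."""
    d = W.sum(axis=1)
    L = np.diag(d) - W
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    return L, L_sym

# Two well-separated Gaussian blobs; cluster by the sign pattern of the
# second-smallest eigenvector (the Fiedler vector) of the normalized Laplacian.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-3, 0.5, (30, 2)), rng.normal(3, 0.5, (30, 2))])
d2 = ((X[:, None] - X[None, :]) ** 2).sum(-1)
W = np.exp(-d2 / 2.0)
L, L_sym = graph_laplacians(W)
_, vecs = np.linalg.eigh(L_sym)          # eigenvalues in ascending order
labels = (vecs[:, 1] > 0).astype(int)    # sign split = 2-way clustering
```

On data this well separated both variants agree; the paper's point is about what happens in the large-sample limit, where only the normalized version is consistent in general.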
Spectral grouping using the Nyström method
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004
Cited by 188 (1 self)
Abstract
Spectral graph theoretic methods have recently shown great promise for the problem of image segmentation. However, due to the computational demands of these approaches, applications to large problems such as spatiotemporal data and high-resolution imagery have been slow to appear. The contribution of this paper is a method that substantially reduces the computational requirements of grouping algorithms based on spectral partitioning, making it feasible to apply them to very large grouping problems. Our approach is based on a technique for the numerical solution of eigenfunction problems known as the Nyström method. This method allows one to extrapolate the complete grouping solution using only a small number of "typical" samples. In doing so, we leverage the fact that there are far fewer coherent groups in a scene than pixels.
Out-of-Sample Extensions for LLE, Isomap, MDS, Eigenmaps, and Spectral Clustering
Advances in Neural Information Processing Systems, 2004
Cited by 94 (2 self)
Abstract
Several unsupervised learning algorithms based on an eigendecomposition provide either an embedding or a clustering only for the given training points, with no straightforward extension to out-of-sample examples short of recomputing eigenvectors. This paper provides a unified framework for extending Locally Linear Embedding (LLE), Isomap, Laplacian Eigenmaps, and Multi-Dimensional Scaling (for dimensionality reduction), as well as Spectral Clustering. The framework is based on seeing these algorithms as learning eigenfunctions of a data-dependent kernel.
Proto-value functions: A Laplacian framework for learning representation and control in Markov decision processes
Journal of Machine Learning Research, 2006
Cited by 66 (9 self)
Abstract
This paper introduces a novel spectral framework for solving Markov decision processes (MDPs) by jointly learning representations and optimal policies. The major components of the framework include: (i) a general scheme for constructing representations or basis functions by diagonalizing symmetric diffusion operators; (ii) a specific instantiation of this approach in which global basis functions called proto-value functions (PVFs) are formed from the eigenvectors of the graph Laplacian on an undirected graph built from state transitions induced by the MDP; (iii) a three-phased procedure called representation policy iteration (RPI), comprising a sample collection phase, a representation learning phase that constructs basis functions from samples, and a final parameter estimation phase that determines an (approximately) optimal policy within the (linear) subspace spanned by the (current) basis functions; (iv) a specific instantiation of the RPI framework using least-squares policy iteration (LSPI) as the parameter estimation method; (v) several strategies for scaling the proposed approach to large discrete and continuous state spaces, including the Nyström extension for out-of-sample interpolation of eigenfunctions and the use of Kronecker sum factorization to construct compact eigenfunctions in product spaces such as factored MDPs; and (vi) a series of illustrative discrete and continuous control tasks, which both illustrate the concepts and provide a benchmark for evaluating the proposed approach. Many challenges remain in scaling the proposed framework to large MDPs, and several elaborations of the framework are briefly summarized at the end.
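Component (ii), proto-value functions as graph-Laplacian eigenvectors, can be sketched on a toy chain MDP; the 10-state chain and the number of basis functions below are illustrative assumptions.

```python
import numpy as np

def pvf_basis(adjacency, k):
    """Proto-value functions: the k smoothest eigenvectors of the
    combinatorial graph Laplacian L = D - W over the state graph."""
    W = np.asarray(adjacency, dtype=float)
    L = np.diag(W.sum(1)) - W
    _, vecs = np.linalg.eigh(L)   # ascending eigenvalues = increasing "frequency"
    return vecs[:, :k]            # columns = basis functions over states

# Toy state space: a 10-state chain MDP; adjacency from left/right transitions
n = 10
W = np.zeros((n, n))
for s in range(n - 1):
    W[s, s + 1] = W[s + 1, s] = 1.0
Phi = pvf_basis(W, k=4)
# The first PVF is constant over states; later ones oscillate at
# increasing spatial frequency, like a Fourier basis on the chain.
```

Value functions are then approximated as linear combinations of the columns of Phi, which is what the RPI parameter estimation phase fits.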
Learning Eigenfunctions Links Spectral Embedding And Kernel PCA
Neural Computation, 2004
Cited by 65 (6 self)
Abstract
In this paper, we show a direct relation between spectral embedding methods and kernel PCA, and how both are special cases of a more general learning problem: that of learning the principal eigenfunctions of an operator defined from a kernel and the unknown data-generating density. Whereas ...
The Effect of the Input Density Distribution on Kernel-based Classifiers
Proceedings of the 17th International Conference on Machine Learning, 2000
Cited by 49 (6 self)
Abstract
The eigenfunction expansion of a kernel function K(x, y), as used in support vector machines or Gaussian process predictors, is studied when the input data are drawn from a distribution p(x). In this case it is shown that the eigenfunctions {φ_i} obey the equation ∫ K(x, y) p(x) φ_i(x) dx = λ_i φ_i(y). This has a number of consequences, including (i) the eigenvalues/eigenvectors of the n × n Gram matrix K, obtained by evaluating the kernel at all pairs of training points K(x_i, x_j), converge to the eigenvalues and eigenfunctions of the integral equation above as n → ∞, and (ii) the dependence of the eigenfunctions on p(x) may be useful for the class-discrimination task. We show on a number of datasets using the RBF kernel that the eigenvalue spectrum of the Gram matrix decays rapidly, and discuss how this property might be used to speed up kernel-based predictors.
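Both claims, convergence of the scaled Gram eigenvalues as n grows and the rapid decay of the RBF spectrum, can be checked numerically. The Gaussian input density, bandwidth, and sample sizes here are illustrative choices, not the paper's experiments.

```python
import numpy as np

def gram_eigs(n, gamma=1.0, seed=0):
    """Eigenvalues of the scaled Gram matrix (1/n) K for data drawn from
    p(x) = N(0, 1); as n grows these approach the eigenvalues of the
    integral operator defined by K and p."""
    X = np.random.default_rng(seed).normal(size=(n, 1))
    d2 = (X - X.T) ** 2                     # pairwise squared distances
    K = np.exp(-gamma * d2)                 # RBF Gram matrix
    return np.sort(np.linalg.eigvalsh(K))[::-1] / n

small = gram_eigs(100)[:5]
large = gram_eigs(1000)[:5]
# The leading eigenvalues barely move despite a 10x larger sample,
# and the spectrum falls off quickly from the top.
```

For a Gaussian density and RBF kernel the operator eigenvalues are known to decay geometrically, which is the property the abstract proposes exploiting for speed-ups.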
Efficient spatiotemporal grouping using the Nyström method
Proc. IEEE Conf. Comput. Vision and Pattern Recognition, 2001
Cited by 43 (5 self)
Abstract
Spectral graph theoretic methods have recently shown great promise for the problem of image segmentation, but due to the computational demands, applications of such methods to spatiotemporal data have been slow to appear. For even a short video sequence, the set of all pairwise voxel similarities is a huge quantity of data: a single second of video captured at standard resolution and frame rate entails an enormous number of pairwise similarities. The contribution of this paper is a method that substantially reduces the computational requirements of grouping algorithms based on spectral partitioning, making it feasible to apply them to very large spatiotemporal grouping problems. Our approach is based on a technique for the numerical solution of eigenfunction problems known as the Nyström method. This method allows extrapolation of the complete grouping solution using only a small number of "typical" samples. In doing so, we successfully exploit the fact that there are far fewer coherent groups in an image sequence than pixels.
Fast Embedding of Sparse Music Similarity Graphs
Advances in Neural Information Processing Systems, 2004
Cited by 20 (1 self)
Abstract
This paper applies fast sparse multidimensional scaling (MDS) to a large graph of music similarity, with 267K vertices that represent artists, albums, and tracks, and 3.22M edges that represent similarity between those entities. Once vertices are assigned locations in a Euclidean space, the locations can be used to browse music and to generate playlists.
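For context, classical dense MDS, the baseline that fast sparse variants accelerate, can be sketched in a few lines: double-center the squared dissimilarity matrix and take its top eigenvectors as coordinates. The toy point set below is an illustrative assumption.

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical MDS: embed points in R^k so Euclidean distances match
    the dissimilarity matrix D (the dense O(n^2) baseline)."""
    n = len(D)
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centered Gram matrix
    lam, V = np.linalg.eigh(B)
    lam, V = lam[::-1][:k], V[:, ::-1][:, :k]    # top-k components
    return V * np.sqrt(np.maximum(lam, 0))       # clip tiny negative eigenvalues

# Usage: recover a 2-D configuration from its own distance matrix
rng = np.random.default_rng(0)
P = rng.normal(size=(8, 2))
D = np.sqrt(((P[:, None] - P[None, :]) ** 2).sum(-1))
Q = classical_mds(D, k=2)
D2 = np.sqrt(((Q[:, None] - Q[None, :]) ** 2).sum(-1))
# D2 matches D up to a rigid rotation/reflection of the configuration
```

The dense formulation needs all n² dissimilarities, which is exactly what is infeasible for 267K vertices; operating on a sparse similarity graph instead is the paper's point.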
Modeling Transfer Relationships Between Learning Tasks for Improved Inductive Transfer
Cited by 20 (8 self)
Abstract
In this paper, we propose a novel graph-based method for knowledge transfer. We model the transfer relationships between source tasks by embedding the set of learned source models in a graph, using transferability as the metric. Transfer to a new problem proceeds by mapping the problem into the graph, then learning a function on this graph that automatically determines the parameters to transfer to the new learning task. This method is analogous to inductive transfer along a manifold that captures the transfer relationships between the tasks. We demonstrate improved transfer performance with this method against existing approaches in several real-world domains.