Results 1  10
of
102
Probabilistic nonlinear principal component analysis with Gaussian process latent variable models
 Journal of Machine Learning Research
, 2005
"... Summarising a high dimensional data set with a low dimensional embedding is a standard approach for exploring its structure. In this paper we provide an overview of some existing techniques for discovering such embeddings. We then introduce a novel probabilistic interpretation of principal component ..."
Abstract

Cited by 134 (14 self)
 Add to MetaCart
Summarising a high dimensional data set with a low dimensional embedding is a standard approach for exploring its structure. In this paper we provide an overview of some existing techniques for discovering such embeddings. We then introduce a novel probabilistic interpretation of principal component analysis (PCA) that we term dual probabilistic PCA (DPPCA). The DPPCA model has the additional advantage that the linear mappings from the embedded space can easily be nonlinearised through Gaussian processes. We refer to this model as a Gaussian process latent variable model (GPLVM). Through analysis of the GPLVM objective function, we relate the model to popular spectral techniques such as kernel PCA and multidimensional scaling. We then review a practical algorithm for GPLVMs in the context of large data sets and develop it to also handle discrete valued data and missing attributes. We demonstrate the model on a range of realworld and artificially generated data sets.
Principal manifolds and nonlinear dimensionality reduction via tangent space alignment zhenyue zhang, hongyuan zha
 SIAM Journal on Scientific Computing
, 2004
"... Abstract. Nonlinear manifold learning from unorganized data points is a very challenging unsupervised learning and data visualization problem with a great variety of applications. In this paper we present a new algorithm for manifold learning and nonlinear dimension reduction. Based on a set of unor ..."
Abstract

Cited by 133 (8 self)
 Add to MetaCart
Abstract. Nonlinear manifold learning from unorganized data points is a very challenging unsupervised learning and data visualization problem with a great variety of applications. In this paper we present a new algorithm for manifold learning and nonlinear dimension reduction. Based on a set of unorganized data points sampled with noise from the manifold, we represent the local geometry of the manifold using tangent spaces learned by fitting an affine subspace in a neighborhood of each data point. Those tangent spaces are aligned to give the internal global coordinates of the data points with respect to the underlying manifold by way of a partial eigendecomposition of the neighborhood connection matrix. We present a careful error analysis of our algorithm and show that the reconstruction errors are of secondorder accuracy. We illustrate our algorithm using curves and surfaces both in 2D/3D and higher dimensional Euclidean spaces, and 64by64 pixel face images with various pose and lighting conditions. We also address several theoretical and algorithmic issues for further research and improvements.
Gaussian process latent variable models for visualisation of high dimensional data
 Adv. in Neural Inf. Proc. Sys
, 2004
"... We introduce a variational inference framework for training the Gaussian process latent variable model and thus performing Bayesian nonlinear dimensionality reduction. This method allows us to variationally integrate out the input variables of the Gaussian process and compute a lower bound on the ex ..."
Abstract

Cited by 132 (5 self)
 Add to MetaCart
We introduce a variational inference framework for training the Gaussian process latent variable model and thus performing Bayesian nonlinear dimensionality reduction. This method allows us to variationally integrate out the input variables of the Gaussian process and compute a lower bound on the exact marginal likelihood of the nonlinear latent variable model. The maximization of the variational lower bound provides a Bayesian training procedure that is robust to overfitting and can automatically select the dimensionality of the nonlinear latent space. We demonstrate our method on real world datasets. The focus in this paper is on dimensionality reduction problems, but the methodology is more general. For example, our algorithm is immediately applicable for training Gaussian process models in the presence of missing or uncertain inputs. 1
Metric Learning by Collapsing Classes
"... We present an algorithm for learning a quadratic Gaussian metric (Mahalanobis distance) for use in classification tasks. Our method relies on the simple geometric intuition that a good metric is one under which points in the same class are simultaneously near each other and far from points in th ..."
Abstract

Cited by 130 (2 self)
 Add to MetaCart
We present an algorithm for learning a quadratic Gaussian metric (Mahalanobis distance) for use in classification tasks. Our method relies on the simple geometric intuition that a good metric is one under which points in the same class are simultaneously near each other and far from points in the other classes. We construct a convex optimization problem whose solution generates such a metric by trying to collapse all examples in the same class to a single point and push examples in other classes infinitely far away. We show that when the metric we learn is used in simple classifiers, it yields substantial improvements over standard alternatives on a variety of problems. We also discuss how the learned metric may be used to obtain a compact low dimensional feature representation of the original input space, allowing more efficient classification with very little reduction in performance.
Spectral hashing
 In Proc. NIPS
, 2008
"... Semantic hashing[1] seeks compact binary codes of datapoints so that the Hamming distance between codewords correlates with semantic similarity. In this paper, we show that the problem of finding a best code for a given dataset is closely related to the problem of graph partitioning and can be show ..."
Abstract

Cited by 108 (2 self)
 Add to MetaCart
Semantic hashing[1] seeks compact binary codes of datapoints so that the Hamming distance between codewords correlates with semantic similarity. In this paper, we show that the problem of finding a best code for a given dataset is closely related to the problem of graph partitioning and can be shown to be NP hard. By relaxing the original problem, we obtain a spectral method whose solutions are simply a subset of thresholded eigenvectors of the graph Laplacian. By utilizing recent results on convergence of graph Laplacian eigenvectors to the LaplaceBeltrami eigenfunctions of manifolds, we show how to efficiently calculate the code of a novel datapoint. Taken together, both learning the code and applying it to a novel point are extremely simple. Our experiments show that our codes outperform the stateofthe art. 1
On Contrastive Divergence Learning
"... Maximumlikelihood (ML) learning of Markov random fields is challenging because it requires estimates of averages that have an exponential number of terms. Markov chain Monte Carlo methods typically take a long time to converge on unbiased estimates, but Hinton (2002) showed that if the Markov ..."
Abstract

Cited by 82 (15 self)
 Add to MetaCart
Maximumlikelihood (ML) learning of Markov random fields is challenging because it requires estimates of averages that have an exponential number of terms. Markov chain Monte Carlo methods typically take a long time to converge on unbiased estimates, but Hinton (2002) showed that if the Markov chain is only run for a few steps, the learning can still work well and it approximately minimizes a di#erent function called "contrastive divergence" (CD). CD learning has been successfully applied to various types of random fields. Here, we study the properties of CD learning and show that it provides biased estimates in general, but that the bias is typically very small. Fast CD learning can therefore be used to get close to an ML solution and slow ML learning can then be used to finetune the CD solution.
Finding Approximate POMDP Solutions Through Belief Compression
, 2003
"... Standard value function approaches to finding policies for Partially Observable Markov Decision Processes (POMDPs) are generally considered to be intractable for large models. The intractability of these algorithms is to a large extent a consequence of computing an exact, optimal policy over the ent ..."
Abstract

Cited by 62 (2 self)
 Add to MetaCart
Standard value function approaches to finding policies for Partially Observable Markov Decision Processes (POMDPs) are generally considered to be intractable for large models. The intractability of these algorithms is to a large extent a consequence of computing an exact, optimal policy over the entire belief space. However, in realworld POMDP problems, computing the optimal policy for the full belief space is often unnecessary for good control even for problems with complicated policy classes. The beliefs experienced by the controller often lie near a structured, lowdimensional manifold embedded in the highdimensional belief space. Finding a good approximation to the optimal value function for only this manifold can be much easier than computing the full value function. We introduce a new method for solving largescale POMDPs by reducing the dimensionality of the belief space. We use Exponential family Principal Components Analysis (Collins, Dasgupta, & Schapire, 2002) to represent sparse, highdimensional belief spaces using lowdimensional sets of learned features of the belief state. We then plan only in terms of the lowdimensional belief features. By planning in this lowdimensional space, we can find policies for POMDP models that are orders of magnitude larger than models that can be handled by conventional techniques. We demonstrate the use of this algorithm on a synthetic problem and on mobile robot navigation tasks. 1.
Local distance preservation in the gplvm through back constraints
 In ICML
, 2006
"... The Gaussian process latent variable model (GPLVM) is a generative approach to nonlinear low dimensional embedding, that provides a smooth probabilistic mapping from latent to data space. It is also a nonlinear generalization of probabilistic PCA (PPCA) (Tipping & Bishop, 1999). While most approac ..."
Abstract

Cited by 43 (5 self)
 Add to MetaCart
The Gaussian process latent variable model (GPLVM) is a generative approach to nonlinear low dimensional embedding, that provides a smooth probabilistic mapping from latent to data space. It is also a nonlinear generalization of probabilistic PCA (PPCA) (Tipping & Bishop, 1999). While most approaches to nonlinear dimensionality methods focus on preserving local distances in data space, the GPLVM focusses on exactly the opposite. Being a smooth mapping from latent to data space, it focusses on keeping things apart in latent space that are far apart in data space. In this paper we first provide an overview of dimensionality reduction techniques, placing the emphasis on the kind of distance relation preserved. We then show how the GPLVM can be generalized, through back constraints, to additionally preserve local distances. We give illustrative experiments on common data sets. 1.