Results 1  10
of
394
GTM: The generative topographic mapping
 Neural Computation
, 1998
"... Latent variable models represent the probability density of data in a space of several dimensions in terms of a smaller number of latent, or hidden, variables. A familiar example is factor analysis which is based on a linear transformations between the latent space and the data space. In this paper ..."
Abstract

Cited by 275 (5 self)
 Add to MetaCart
Latent variable models represent the probability density of data in a space of several dimensions in terms of a smaller number of latent, or hidden, variables. A familiar example is factor analysis which is based on a linear transformations between the latent space and the data space. In this paper we introduce a form of nonlinear latent variable model called the Generative Topographic Mapping for which the parameters of the model can be determined using the EM algorithm. GTM provides a principled alternative to the widely used SelfOrganizing Map (SOM) of Kohonen (1982), and overcomes most of the significant limitations of the SOM. We demonstrate the performance of the GTM algorithm on a toy problem and on simulated data from flow diagnostics for a multiphase oil pipeline. Copyright c○MIT Press (1998). 1
Independent Factor Analysis
 Neural Computation
, 1999
"... We introduce the independent factor analysis (IFA) method for recovering independent hidden sources from their observed mixtures. IFA generalizes and unifies ordinary factor analysis (FA), principal component analysis (PCA), and independent component analysis (ICA), and can handle not only square no ..."
Abstract

Cited by 219 (9 self)
 Add to MetaCart
We introduce the independent factor analysis (IFA) method for recovering independent hidden sources from their observed mixtures. IFA generalizes and unifies ordinary factor analysis (FA), principal component analysis (PCA), and independent component analysis (ICA), and can handle not only square noiseless mixing, but also the general case where the number of mixtures differs from the number of sources and the data are noisy. IFA is a twostep procedure. In the first step, the source densities, mixing matrix and noise covariance are estimated from the observed data by maximum likelihood. For this purpose we present an expectationmaximization (EM) algorithm, which performs unsupervised learning of an associated probabilistic model of the mixing situation. Each source in our model is described by a mixture of Gaussians, thus all the probabilistic calculations can be performed analytically. In the second step, the sources are reconstructed from the observed data by an optimal nonlinear ...
A Variational Bayesian Framework for Graphical Models
 In Advances in Neural Information Processing Systems 12
, 2000
"... This paper presents a novel practical framework for Bayesian model averaging and model selection in probabilistic graphical models. Our approach approximates full posterior distributions over model parameters and structures, as well as latent variables, in an analytical manner. These posteriors ..."
Abstract

Cited by 189 (6 self)
 Add to MetaCart
This paper presents a novel practical framework for Bayesian model averaging and model selection in probabilistic graphical models. Our approach approximates full posterior distributions over model parameters and structures, as well as latent variables, in an analytical manner. These posteriors fall out of a freeform optimization procedure, which naturally incorporates conjugate priors. Unlike in large sample approximations, the posteriors are generally nonGaussian and no Hessian needs to be computed. Predictive quantities are obtained analytically. The resulting algorithm generalizes the standard Expectation Maximization algorithm, and its convergence is guaranteed. We demonstrate that this approach can be applied to a large class of models in several domains, including mixture models and source separation. 1 Introduction A standard method to learn a graphical model 1 from data is maximum likelihood (ML). Given a training dataset, ML estimates a single optimal value f...
Probabilistic nonlinear principal component analysis with Gaussian process latent variable models
 Journal of Machine Learning Research
, 2005
"... Summarising a high dimensional data set with a low dimensional embedding is a standard approach for exploring its structure. In this paper we provide an overview of some existing techniques for discovering such embeddings. We then introduce a novel probabilistic interpretation of principal component ..."
Abstract

Cited by 134 (14 self)
 Add to MetaCart
Summarising a high dimensional data set with a low dimensional embedding is a standard approach for exploring its structure. In this paper we provide an overview of some existing techniques for discovering such embeddings. We then introduce a novel probabilistic interpretation of principal component analysis (PCA) that we term dual probabilistic PCA (DPPCA). The DPPCA model has the additional advantage that the linear mappings from the embedded space can easily be nonlinearised through Gaussian processes. We refer to this model as a Gaussian process latent variable model (GPLVM). Through analysis of the GPLVM objective function, we relate the model to popular spectral techniques such as kernel PCA and multidimensional scaling. We then review a practical algorithm for GPLVMs in the context of large data sets and develop it to also handle discrete valued data and missing attributes. We demonstrate the model on a range of realworld and artificially generated data sets.
Gaussian process latent variable models for visualisation of high dimensional data
 Adv. in Neural Inf. Proc. Sys
, 2004
"... We introduce a variational inference framework for training the Gaussian process latent variable model and thus performing Bayesian nonlinear dimensionality reduction. This method allows us to variationally integrate out the input variables of the Gaussian process and compute a lower bound on the ex ..."
Abstract

Cited by 132 (5 self)
 Add to MetaCart
We introduce a variational inference framework for training the Gaussian process latent variable model and thus performing Bayesian nonlinear dimensionality reduction. This method allows us to variationally integrate out the input variables of the Gaussian process and compute a lower bound on the exact marginal likelihood of the nonlinear latent variable model. The maximization of the variational lower bound provides a Bayesian training procedure that is robust to overfitting and can automatically select the dimensionality of the nonlinear latent space. We demonstrate our method on real world datasets. The focus in this paper is on dimensionality reduction problems, but the methodology is more general. For example, our algorithm is immediately applicable for training Gaussian process models in the presence of missing or uncertain inputs. 1
Probabilistic Matrix Factorization
"... Many existing approaches to collaborative filtering can neither handle very large datasets nor easily deal with users who have very few ratings. In this paper we present the Probabilistic Matrix Factorization (PMF) model which scales linearly with the number of observations and, more importantly, pe ..."
Abstract

Cited by 132 (4 self)
 Add to MetaCart
Many existing approaches to collaborative filtering can neither handle very large datasets nor easily deal with users who have very few ratings. In this paper we present the Probabilistic Matrix Factorization (PMF) model which scales linearly with the number of observations and, more importantly, performs well on the large, sparse, and very imbalanced Netflix dataset. We further extend the PMF model to include an adaptive prior on the model parameters and show how the model capacity can be controlled automatically. Finally, we introduce a constrained version of the PMF model that is based on the assumption that users who have rated similar sets of movies are likely to have similar preferences. The resulting model is able to generalize considerably better for users with very few ratings. When the predictions of multiple PMF models are linearly combined with the predictions of Restricted Boltzmann Machines models, we achieve an error rate of 0.8861, that is nearly 7 % better than the score of Netflix’s own system. 1
Incremental Learning for Robust Visual Tracking
, 2008
"... Visual tracking, in essence, deals with nonstationary image streams that change over time. While most existing algorithms are able to track objects well in controlled environments, they usually fail in the presence of significant variation of the object’s appearance or surrounding illumination. On ..."
Abstract

Cited by 122 (10 self)
 Add to MetaCart
Visual tracking, in essence, deals with nonstationary image streams that change over time. While most existing algorithms are able to track objects well in controlled environments, they usually fail in the presence of significant variation of the object’s appearance or surrounding illumination. One reason for such failures is that many algorithms employ fixed appearance models of the target. Such models are trained using only appearance data available before tracking begins, which in practice limits the range of appearances that are modeled, and ignores the large volume of information (such as shape changes or specific lighting conditions) that becomes available during tracking. In this paper, we present a tracking method that incrementally learns a lowdimensional subspace representation, efficiently adapting online to changes in the appearance of the target. The model update, based on incremental algorithms for principal component analysis, includes two important features: a method for correctly updating the sample mean, and a for
Generalized principal component analysis (GPCA)
 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
, 2003
"... This paper presents an algebrogeometric solution to the problem of segmenting an unknown number of subspaces of unknown and varying dimensions from sample data points. We represent the subspaces with a set of homogeneous polynomials whose degree is the number of subspaces and whose derivatives at a ..."
Abstract

Cited by 117 (29 self)
 Add to MetaCart
This paper presents an algebrogeometric solution to the problem of segmenting an unknown number of subspaces of unknown and varying dimensions from sample data points. We represent the subspaces with a set of homogeneous polynomials whose degree is the number of subspaces and whose derivatives at a data point give normal vectors to the subspace passing through the point. When the number of subspaces is known, we show that these polynomials can be estimated linearly from data; hence, subspace segmentation is reduced to classifying one point per subspace. We select these points optimally from the data set by minimizing certain distance function, thus dealing automatically with moderate noise in the data. A basis for the complement of each subspace is then recovered by applying standard PCA to the collection of derivatives (normal vectors). Extensions of GPCA that deal with data in a highdimensional space and with an unknown number of subspaces are also presented. Our experiments on lowdimensional data show that GPCA outperforms existing algebraic algorithms based on polynomial factorization and provides a good initialization to iterative techniques such as Ksubspaces and Expectation Maximization. We also present applications of GPCA to computer vision problems such as face clustering, temporal video segmentation, and 3D motion segmentation from point correspondences in multiple affine views.
A generalization of principal component analysis to the exponential family
 Advances in Neural Information Processing Systems
, 2001
"... Principal component analysis (PCA) is a commonly applied technique for dimensionality reduction. PCA implicitly minimizes a squared loss function, which may be inappropriate for data that is not realvalued, such as binaryvalued data. This paper draws on ideas from the Exponential family, Generaliz ..."
Abstract

Cited by 108 (0 self)
 Add to MetaCart
Principal component analysis (PCA) is a commonly applied technique for dimensionality reduction. PCA implicitly minimizes a squared loss function, which may be inappropriate for data that is not realvalued, such as binaryvalued data. This paper draws on ideas from the Exponential family, Generalized linear models, and Bregman distances, to give a generalization of PCA to loss functions that we argue are better suited to other data types. We describe algorithms for minimizing the loss functions, and give examples on simulated data. 1
Incremental Online Learning in High Dimensions
 Neural Computation
, 2005
"... Locally weighted projection regression (LWPR) is a new algorithm for incremental nonlinear function approximation in high dimensional spaces with redundant and irrelevant input dimensions. At its core, it employs nonparametric regression with locally linear models. In order to stay computationally e ..."
Abstract

Cited by 104 (15 self)
 Add to MetaCart
Locally weighted projection regression (LWPR) is a new algorithm for incremental nonlinear function approximation in high dimensional spaces with redundant and irrelevant input dimensions. At its core, it employs nonparametric regression with locally linear models. In order to stay computationally e#cient and numerically robust, each local model performs the regression analysis with a small number of univariate regressions in selected directions in input space in the spirit of partial least squares regression. We discuss when and how local learning techniques can successfully work in high dimensional spaces and review the various techniques for local dimensionality reduction before finally deriving the LWPR algorithm. The properties of LWPR are that it i) learns rapidly with second order learning methods based on incremental training, ii) uses statistically sound stochastic leaveoneout cross validation for learning without the need to memorize training data, iii) adjusts its weighting kernels based only on local information in order to minimize the danger of negative interference of incremental learning, iv) has a computational complexity that is linear in the number of inputs, and v) can deal with a large number of  possibly redundant  inputs, as shown in various empirical evaluations with up to 90 dimensional data sets. For a probabilistic interpretation, predictive variance and confidence intervals are derived. To our knowledge, LWPR is the first truly incremental spatially localized learning method that can successfully and e#ciently operate in very high dimensional spaces.