Results 1-10 of 34
Mixtures of Probabilistic Principal Component Analysers
1998
Cited by 398 (6 self)
Principal component analysis (PCA) is one of the most popular techniques for processing, compressing and visualising data, although its effectiveness is limited by its global linearity. While nonlinear variants of PCA have been proposed, an alternative paradigm is to capture data complexity by a combination of local linear PCA projections. However, conventional PCA does not correspond to a probability density, and so there is no unique way to combine PCA models. Previous attempts to formulate mixture models for PCA have therefore to some extent been ad hoc. In this paper, PCA is formulated within a maximum-likelihood framework, based on a specific form of Gaussian latent variable model. This leads to a well-defined mixture model for probabilistic principal component analysers, whose parameters can be determined using an EM algorithm. We discuss the advantages of this model in the context of clustering, density modelling and local dimensionality reduction, and we demonstrate its applicat...
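For orientation, the Gaussian latent variable model that the abstract refers to is usually written as below. This is a sketch in the standard probabilistic PCA notation; the symbols (t, x, W, mu, sigma^2, U_q, Lambda_q, R, pi_i, M) are assumed here and do not appear in the abstract itself. U_q and Lambda_q denote the q leading eigenvectors and eigenvalues of the sample covariance, lambda_j the remaining eigenvalues, and R an arbitrary q x q rotation.

```latex
% Latent variable model: d-dimensional observation t, q-dimensional latent x (q < d)
t = W x + \mu + \epsilon, \qquad x \sim \mathcal{N}(0, I_q), \qquad \epsilon \sim \mathcal{N}(0, \sigma^2 I_d)

% Marginal density of the observation
t \sim \mathcal{N}(\mu, C), \qquad C = W W^{\top} + \sigma^2 I_d

% Maximum-likelihood solution from the eigendecomposition of the sample covariance
W_{\mathrm{ML}} = U_q (\Lambda_q - \sigma^2 I_q)^{1/2} R, \qquad
\sigma^2_{\mathrm{ML}} = \frac{1}{d - q} \sum_{j = q + 1}^{d} \lambda_j

% Mixture of M such models with mixing proportions \pi_i
p(t) = \sum_{i=1}^{M} \pi_i \, p(t \mid i), \qquad \sum_{i=1}^{M} \pi_i = 1
```

Because each component is a proper density, the components can be combined with mixing proportions and fitted by EM, which is the point the abstract makes against conventional PCA.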
Think Globally, Fit Locally: Unsupervised Learning of Low Dimensional Manifolds
Journal of Machine Learning Research, 2003
Cited by 252 (8 self)
The problem of dimensionality reduction arises in many fields of information processing, including machine learning, data compression, scientific visualization, pattern recognition, and neural computation.
Charting a Manifold
Advances in Neural Information Processing Systems 15, 2003
Cited by 162 (7 self)
In this paper we use $m_i(\mu_j) \propto \mathcal{N}(\mu_j; \mu_i, \sigma)$, with the scale parameter $\sigma$ specifying the expected size of a neighborhood on the manifold in sample space. A reasonable choice is $\sigma = r/2$, so that $2\,\mathrm{erf}(2) > 99.5\%$ of the density of $m_i(\mu_j)$ is contained in the area around $y_i$ where the manifold is expected to be locally linear. With uniform $p_i$ and $\mu_i$, $m_i(\mu_j)$ and [...] fixed, the MAP estimates of the GMM covariances are

$$\Sigma_i = \frac{\sum_j m_i(\mu_j)\left[(y_j - \mu_i)(y_j - \mu_i)^\top + (\mu_j - \mu_i)(\mu_j - \mu_i)^\top + \Sigma_j\right]}{\sum_j m_i(\mu_j)}. \qquad (3)$$

Note that each covariance $\Sigma_i$ is dependent on all other $\Sigma_j$. The MAP estimators for all covariances can be arranged into a set of fully constrained linear equations and solved exactly for their mutually optimal values. This key step brings nonlocal information about the manifold's shape into the local description of each neighborhood, ensuring that adjoining neighborhoods have similar covariances and small angles between their respective subspaces. Even if a local subset of data points are dense in a direction perpendicular to the manifold, the prior encourages the local chart to orient parallel to the manifold as part of a globally optimal solution, protecting against a pathology noted in [8]. Equation (3) is easily adapted to give a reduced number of charts and/or charts centered on local centroids.

4 Connecting the charts

We now build a connection for a set of charts specified as an arbitrary nondegenerate GMM. A GMM gives a soft partitioning of the dataset into neighborhoods of mean $\mu_k$ and covariance $\Sigma_k$. The optimal variance-preserving low-dimensional coordinate system for each neighborhood derives from its weighted principal component analysis, which is exactly specified by the eigenvectors of its covariance matrix: Eigendecompose $V_k \Lambda_k V_k^\top \leftarrow \Sigma_k$ with...
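The weighted principal component analysis of a single neighborhood can be illustrated with a minimal sketch. It assumes 2-D points and per-point weights (standing in for the soft GMM responsibilities), and uses the closed-form eigendecomposition of a symmetric 2x2 matrix. The function name `weighted_pca_2d` and the sample data are hypothetical, and this is not the paper's implementation.

```python
import math


def weighted_pca_2d(points, weights):
    """Weighted mean, eigenvalues and eigenvectors of the weighted covariance.

    Illustrative only: restricted to 2-D so the eigendecomposition has a
    closed form and no linear-algebra library is needed.
    """
    total = sum(weights)
    mx = sum(w * x for w, (x, _) in zip(weights, points)) / total
    my = sum(w * y for w, (_, y) in zip(weights, points)) / total

    # Entries of the symmetric weighted covariance [[a, b], [b, c]].
    a = sum(w * (x - mx) ** 2 for w, (x, _) in zip(weights, points)) / total
    b = sum(w * (x - mx) * (y - my) for w, (x, y) in zip(weights, points)) / total
    c = sum(w * (y - my) ** 2 for w, (_, y) in zip(weights, points)) / total

    # Closed-form eigenvalues, largest first.
    half_trace = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    eigenvalues = (half_trace + radius, half_trace - radius)

    # Angle of the principal axis, then the two orthonormal eigenvectors.
    theta = 0.5 * math.atan2(2.0 * b, a - c)
    v1 = (math.cos(theta), math.sin(theta))
    v2 = (-math.sin(theta), math.cos(theta))
    return (mx, my), eigenvalues, (v1, v2)


# Hypothetical neighborhood: four equally weighted points on the line y = x.
mean, eigenvalues, (v1, v2) = weighted_pca_2d(
    [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)], [1.0, 1.0, 1.0, 1.0]
)
```

For points lying exactly on a line, all of the variance falls on the first eigenvector, so projecting onto `v1` gives a one-dimensional local coordinate with no loss.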
Modeling the manifolds of images of handwritten digits
IEEE Transactions on Neural Networks, 1997
"... description length, density estimation. ..."
Dimension Reduction by Local Principal Component Analysis
1997
Cited by 99 (0 self)
Reducing or eliminating statistical redundancy between the components of high-dimensional vector data enables a lower-dimensional representation without significant loss of information. Recognizing the limitations of principal component analysis (PCA), researchers in the statistics and neural network communities have developed nonlinear extensions of PCA. This article develops a local linear approach to dimension reduction that provides accurate representations and is fast to compute. We exercise the algorithms on speech and image data, and compare performance with PCA and with neural network implementations of nonlinear PCA. We find that both nonlinear techniques can provide more accurate representations than PCA and show that the local linear techniques outperform neural network implementations.
Performance Animation from Low-dimensional Control Signals
ACM Transactions on Graphics, 2005
Cited by 83 (18 self)
This paper introduces an approach to performance animation that employs video cameras and a small set of retroreflective markers to create a low-cost, easy-to-use system that might someday be practical for home use. The low-dimensional control signals from the user's performance are supplemented by a database of prerecorded human motion. At run time, the system automatically learns a series of local models from a set of motion capture examples that are a close match to the marker locations captured by the cameras. These local models are then used to reconstruct the motion of the user as a full-body animation. We demonstrate the power of this approach with real-time control of six different behaviors using two video cameras and a small set of retroreflective markers. We compare the resulting animation to animation from commercial motion capture equipment with a full set of markers.
Nonlinear manifold learning for visual speech recognition
Proceedings of the Fifth International Conference on Computer Vision, 1995
Cited by 80 (2 self)
A technique for representing and learning smooth nonlinear manifolds is presented and applied to several lip reading tasks. Given a set of points drawn from a smooth manifold in an abstract feature space, the technique is capable of determining the structure of the surface and of finding the closest manifold point to a given query point. We use this technique to learn the "space of lips" in a visual speech recognition task. The learned manifold is used for tracking and extracting the lips, for interpolating between frames in an image sequence and for providing features for recognition. We describe a system based on Hidden Markov Models and this learned lip manifold that significantly improves the performance of acoustic speech recognizers in degraded environments. We also present preliminary results on a purely visual lip reader.
Global Coordination of Local Linear Models
Advances in Neural Information Processing Systems 14, 2002
Cited by 76 (2 self)
High dimensional data that lies on or near a low dimensional manifold can be described by a collection of local linear models. Such a description, however, does not provide a global parameterization of the manifold, arguably an important goal of unsupervised learning. In this paper, we show how to learn a collection of local linear models that solves this more difficult problem. Our local linear models are represented by a mixture of factor analyzers, and the "global coordination" of these models is achieved by adding a regularizing term to the standard maximum likelihood objective function. The regularizer breaks a degeneracy in the mixture model's parameter space, favoring models whose internal coordinate systems are aligned in a consistent way. As a result, the internal coordinates change smoothly and continuously as one traverses a connected path on the manifold, even when the path crosses the domains of many different local models. The regularizer takes the form of a Kullback-Leibler divergence and illustrates an unexpected application of variational methods: not to perform approximate inference in intractable probabilistic models, but to learn more useful internal representations in tractable ones.
Mapping a manifold of perceptual observations
Advances in Neural Information Processing Systems 10, 1998
Cited by 73 (2 self)
Nonlinear dimensionality reduction is formulated here as the problem of trying to find a Euclidean feature-space embedding of a set of observations that preserves as closely as possible their intrinsic metric structure – the distances between points on the observation manifold as measured along geodesic paths. Our isometric feature mapping procedure, or isomap, is able to reliably recover low-dimensional nonlinear structure in realistic perceptual data sets, such as a manifold of face images, where conventional global mapping methods find only local minima. The recovered map provides a canonical set of globally meaningful features, which allows perceptual transformations such as interpolation, extrapolation, and analogy – highly nonlinear transformations in the original observation space – to be computed with simple linear operations in feature space.
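The idea of measuring distances along geodesic paths can be sketched as shortest paths through a nearest-neighbour graph. The code below is a minimal illustration under stated assumptions: a symmetric k-nearest-neighbour graph and Floyd-Warshall shortest paths. It covers only the distance-estimation step, not the subsequent embedding, and the function name `geodesic_distances`, the choice k = 2 and the sample arc are all hypothetical.

```python
import math


def geodesic_distances(points, k):
    """Approximate geodesic distances via a k-NN graph and Floyd-Warshall."""
    n = len(points)
    euclid = [[math.dist(p, q) for q in points] for p in points]

    # Start with no edges except zero-length self-loops.
    inf = float("inf")
    graph = [[0.0 if i == j else inf for j in range(n)] for i in range(n)]

    # Connect each point to its k nearest neighbours (symmetric edges).
    for i in range(n):
        others = sorted((j for j in range(n) if j != i), key=lambda j: euclid[i][j])
        for j in others[:k]:
            graph[i][j] = graph[j][i] = euclid[i][j]

    # All-pairs shortest paths; entries stay infinite if the graph is disconnected.
    for m in range(n):
        for i in range(n):
            for j in range(n):
                via = graph[i][m] + graph[m][j]
                if via < graph[i][j]:
                    graph[i][j] = via
    return graph


# Hypothetical data: 21 points sampled along a unit semicircle.
arc = [(math.cos(math.pi * i / 20), math.sin(math.pi * i / 20)) for i in range(21)]
geo = geodesic_distances(arc, k=2)
straight = math.dist(arc[0], arc[-1])   # chord through the ambient space
along = geo[0][-1]                      # path that stays on the sampled arc
```

On this arc the straight-line distance between the endpoints is the diameter, while the graph distance approaches the arc length from below, which is the intrinsic quantity an isometric embedding would try to preserve.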
Maximum Likelihood and Minimum Classification Error Factor Analysis for Automatic Speech Recognition
IEEE Transactions on Speech and Audio Processing, 1997
Cited by 36 (3 self)
Hidden Markov models (HMMs) for automatic speech recognition rely on high dimensional feature vectors to summarize the short-time properties of speech. Correlations between features can arise when the speech signal is nonstationary or corrupted by noise. We investigate how to model these correlations using factor analysis, a statistical method for dimensionality reduction. Factor analysis uses a small number of parameters to model the covariance structure of high dimensional data. These parameters can be chosen in two ways: (i) to maximize the likelihood of observed speech signals, or (ii) to minimize the number of classification errors. We derive an Expectation-Maximization (EM) algorithm for maximum likelihood estimation and a gradient descent algorithm for improved class discrimination. Speech recognizers are evaluated on two tasks, one small-sized vocabulary (connected alphadigits) and one medium-sized vocabulary (New Jersey town names). We find that modeling feature correlations...
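The claim that factor analysis needs only a small number of parameters can be made concrete with a quick count. The sketch assumes the usual factor analysis covariance, a d x q loading matrix plus d diagonal noise variances, and compares it with a full symmetric covariance. The dimensions used (d = 39, q = 5) are hypothetical and not taken from the paper, and the count ignores the rotational redundancy of the loading matrix.

```python
def full_covariance_params(d):
    """Free parameters in a full symmetric d x d covariance matrix."""
    return d * (d + 1) // 2


def factor_analysis_params(d, q):
    """Parameters in a factor analysis covariance: loadings plus diagonal noise.

    Counts the d x q loading matrix and d noise variances; the rotational
    redundancy of the loadings is deliberately not subtracted.
    """
    return d * q + d


# Hypothetical feature dimension and number of factors.
d, q = 39, 5
full = full_covariance_params(d)        # 780
reduced = factor_analysis_params(d, q)  # 234
```

The gap widens as d grows, since the full count is quadratic in d while the factor analysis count is linear in d for a fixed number of factors.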