Results 1–10 of 76
Probabilistic Principal Component Analysis
 Journal of the Royal Statistical Society, Series B
, 1999
Abstract

Cited by 476 (5 self)
Principal component analysis (PCA) is a ubiquitous technique for data analysis and processing, but one which is not based upon a probability model. In this paper we demonstrate how the principal axes of a set of observed data vectors may be determined through maximum-likelihood estimation of parameters in a latent variable model closely related to factor analysis. We consider the properties of the associated likelihood function, giving an EM algorithm for estimating the principal subspace iteratively, and discuss, with illustrative examples, the advantages conveyed by this probabilistic approach to PCA. Keywords: Principal component analysis
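The maximum-likelihood fit described in this abstract also admits a closed form via the eigendecomposition of the sample covariance; a minimal sketch of that closed-form solution (the function name and synthetic data are illustrative, not from the paper):

```python
import numpy as np

def ppca_ml(X, q):
    """Closed-form maximum-likelihood PPCA fit.

    X : (n, d) data matrix; q : latent dimension.
    Returns the loading matrix W (d, q) and the noise variance sigma2.
    """
    Xc = X - X.mean(axis=0)                     # centre the data
    S = Xc.T @ Xc / X.shape[0]                  # sample covariance
    lam, U = np.linalg.eigh(S)                  # eigh returns ascending eigenvalues
    lam, U = lam[::-1], U[:, ::-1]              # sort descending
    sigma2 = lam[q:].mean()                     # ML noise variance: mean of discarded eigenvalues
    W = U[:, :q] * np.sqrt(np.maximum(lam[:q] - sigma2, 0.0))
    return W, sigma2

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 5)) @ rng.standard_normal((5, 5))
W, sigma2 = ppca_ml(X, q=2)
print(W.shape, sigma2)
```

The columns of W span the principal subspace; the EM algorithm the paper gives converges to the same solution iteratively.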
Mixtures of Probabilistic Principal Component Analysers
, 1998
Abstract

Cited by 398 (6 self)
Principal component analysis (PCA) is one of the most popular techniques for processing, compressing and visualising data, although its effectiveness is limited by its global linearity. While nonlinear variants of PCA have been proposed, an alternative paradigm is to capture data complexity by a combination of local linear PCA projections. However, conventional PCA does not correspond to a probability density, and so there is no unique way to combine PCA models. Previous attempts to formulate mixture models for PCA have therefore to some extent been ad hoc. In this paper, PCA is formulated within a maximum-likelihood framework, based on a specific form of Gaussian latent variable model. This leads to a well-defined mixture model for probabilistic principal component analysers, whose parameters can be determined using an EM algorithm. We discuss the advantages of this model in the context of clustering, density modelling and local dimensionality reduction, and we demonstrate its applicat...
On the distribution of the largest eigenvalue in principal components analysis
 Ann. Statist
, 2001
Abstract

Cited by 197 (2 self)
Let x_(1) denote the square of the largest singular value of an n × p matrix X, all of whose entries are independent standard Gaussian variates. Equivalently, x_(1) is the largest principal component variance of the covariance matrix X′X, or the largest eigenvalue of a p-variate Wishart distribution on n degrees of freedom with identity covariance. Consider the limit of large p and n with n/p = γ ≥ 1. When centered by µ_p = (√(n−1) + √p)² and scaled by σ_p = (√(n−1) + √p)(1/√(n−1) + 1/√p)^(1/3), the distribution of x_(1) approaches the Tracy–Widom law of order 1, which is defined in terms of the Painlevé II differential equation and can be numerically evaluated and tabulated in software. Simulations show the approximation to be informative for n and p as small as 5. The limit is derived via a corresponding result for complex Wishart matrices using methods from random matrix theory. The result suggests that some aspects of large-p multivariate distribution theory may be easier to apply in practice than their fixed-p counterparts.
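The centering and scaling constants in this abstract are simple to compute; a small simulation sketch (sample sizes and seed are illustrative) that standardizes the largest Wishart eigenvalue accordingly:

```python
import numpy as np

def tw_center_scale(n, p):
    """Centering mu_p and scaling sigma_p from the abstract above."""
    mu = (np.sqrt(n - 1) + np.sqrt(p)) ** 2
    sigma = (np.sqrt(n - 1) + np.sqrt(p)) * (1 / np.sqrt(n - 1) + 1 / np.sqrt(p)) ** (1 / 3)
    return mu, sigma

n, p = 100, 40
rng = np.random.default_rng(1)
mu, sigma = tw_center_scale(n, p)
stats = []
for _ in range(200):
    X = rng.standard_normal((n, p))
    l1 = np.linalg.eigvalsh(X.T @ X).max()    # largest eigenvalue of X'X
    stats.append((l1 - mu) / sigma)           # standardized: approx. Tracy-Widom order 1
print(np.mean(stats))
```

The standardized values should cluster near the Tracy–Widom(1) mean (about −1.21), consistent with the paper's claim that the approximation is informative even for modest n and p.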
Estimation of high-dimensional prior and posterior covariance matrices in Kalman filter variants
 Journal of Multivariate Analysis
, 2007
Abstract

Cited by 49 (5 self)
This work studies the effect of using Monte Carlo based methods to estimate high-dimensional systems. Recent focus in the geosciences has been on representing the atmospheric state using a probability density function, and, for extremely high-dimensional systems, various sample based Kalman filter techniques have been developed to address the problem of real-time assimilation of system information and observations. As the employed sample sizes are typically several orders of magnitude smaller than the system dimension, such sampling techniques inevitably induce considerable variability into the state estimate, primarily through prior and posterior sample covariance matrices. In this article we quantify this variability with mean squared error measures for two Monte Carlo based Kalman filter variants, the ensemble Kalman filter and the square-root filter. Under weak assumptions, we derive exact expressions of the error measures. In other cases, we rely on matrix expansions and provide approximations. We show that covariance shrinking (tapering) based on the Schur product of the prior sample covariance matrix and a positive definite function is a simple, computationally feasible, and very effective technique to reduce sample variability and to address rank-deficient sample covariances. We propose practical rules for obtaining optimally tapered sample covariance matrices. The theoretical results are verified and illustrated with extensive simulations.
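The Schur-product tapering described in this abstract can be sketched as follows, using a Bartlett (triangular) taper as an illustrative positive definite function (the paper's own taper choices and tuning rules are not reproduced here):

```python
import numpy as np

def taper_covariance(S, L):
    """Elementwise (Schur) product of a sample covariance S with a
    positive definite taper: here a Bartlett (triangular) taper of
    half-width L, an illustrative stand-in for the paper's choice."""
    p = S.shape[0]
    idx = np.arange(p)
    T = np.maximum(0.0, 1.0 - np.abs(idx[:, None] - idx[None, :]) / L)
    return S * T    # Schur product: zeroes out long-range sample noise

rng = np.random.default_rng(2)
X = rng.standard_normal((20, 50))       # n << p: rank-deficient sample covariance
S = np.cov(X, rowvar=False)             # 50 x 50, rank at most 19
S_tap = taper_covariance(S, L=5)
print(np.linalg.matrix_rank(S), np.linalg.matrix_rank(S_tap))
```

By the Schur product theorem the tapered matrix stays positive semidefinite, and its diagonal (the sample variances) is untouched since the taper has unit diagonal.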
On the Distribution of the Largest Principal Component
 ANN. STATIST
, 2000
Abstract

Cited by 48 (0 self)
Let x_(1) denote the square of the largest singular value of an n × p matrix X, all of whose entries are independent standard Gaussian variates. Equivalently, x_(1) is the largest principal component of the covariance matrix X′X, or the largest eigenvalue of a p-variate Wishart distribution on n degrees of freedom with identity covariance. Consider the limit of large p and n with n/p = γ ≥ 1. When centered by µ_p = (√(n−1) + √p)² and scaled by σ_p = (√(n−1) + √p)(1/√(n−1) + 1/√p)^(1/3), the distribution of x_(1) approaches the Tracy–Widom law of order 1, which is defined in terms of the Painlevé II differential equation and can be numerically evaluated and tabulated in software. Simulations show the approximation to be informative for n and p as small as 5. The limit is derived via a corresponding result for complex Wishart matrices using methods from random matrix theory. The result suggests that some aspects of large p multivariate distribution theory may be easier to ...
Tracy–Widom limit for the largest eigenvalue of a large class of complex sample covariance matrices
 ANN. PROBAB
, 2007
Abstract

Cited by 45 (6 self)
We consider the asymptotic fluctuation behavior of the largest eigenvalue of certain sample covariance matrices in the asymptotic regime where both dimensions of the corresponding data matrix go to infinity. More precisely, let X be an n × p matrix, and let its rows be i.i.d. complex normal vectors with mean 0 and covariance Σ_p. We show that for a large class of covariance matrices Σ_p, the largest eigenvalue of X∗X is asymptotically distributed (after recentering and rescaling) as the Tracy–Widom distribution that appears in the study of the Gaussian unitary ensemble. We give explicit formulas for the centering and scaling sequences that are easy to implement and involve only the spectral distribution of the population covariance, n and p. The main theorem applies to a number of covariance models found in applications. For example, well-behaved Toeplitz matrices as well as covariance matrices whose spectral distribution is a sum of atoms (under some conditions on the mass of the atoms) are among the models the theorem can handle. Generalizations of the theorem to certain spiked versions of our models and a.s. results about the largest eigenvalue are given. We also discuss a simple corollary that does not require normality of the entries of the data matrix and some consequences for applications in multivariate statistics.
Latent Variable Models for Neural Data Analysis
, 1999
Abstract

Cited by 42 (5 self)
The brain is perhaps the most complex system to have ever been subjected to rigorous scientific investigation. The scale is staggering: over 10^11 neurons, each making an average of 10^3 synapses, with computation occurring on scales ranging from a single dendritic spine, to an entire cortical area. Slowly, we are beginning to acquire experimental tools that can gather the massive amounts of data needed to characterize this system. However, to understand and interpret these data will also require substantial strides in inferential and statistical techniques. This dissertation attempts to meet this need, extending and applying the modern tools of latent variable modeling to problems in neural data analysis. It is divided
Principal Component Analysis based on Robust Estimators of the Covariance or Correlation Matrix: Influence Functions and Efficiencies
 BIOMETRIKA
, 2000
Abstract

Cited by 36 (6 self)
A robust principal component analysis can be easily performed by computing the eigenvalues and eigenvectors of a robust estimator of the covariance or correlation matrix. In this paper we derive the influence functions and the corresponding asymptotic variances for these robust estimators of eigenvalues and eigenvectors. The behavior of several of these estimators is investigated by a simulation study. Finally, the use of empirical influence functions is illustrated by a real data example.
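As one concrete robust plug-in of the kind studied in this abstract, a sketch using the spatial sign covariance matrix (an illustrative choice; the paper analyzes a broader class of robust covariance and correlation estimators):

```python
import numpy as np

def spatial_sign_pca(X, q):
    """Robust PCA from the eigenvectors of the spatial sign covariance
    matrix: a simple robust covariance surrogate, used here purely for
    illustration of the eigenvalue/eigenvector plug-in idea."""
    med = np.median(X, axis=0)                  # coordinatewise median as robust centre
    Z = X - med
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    U = Z / norms                               # project observations onto the unit sphere
    S = U.T @ U / len(U)                        # spatial sign covariance matrix
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]              # sort descending
    return vals[order][:q], vecs[:, order][:, :q]

rng = np.random.default_rng(3)
X = rng.standard_normal((300, 4)) * np.array([5.0, 2.0, 1.0, 0.5])
X[:5] += 100.0                                  # a few gross outliers
vals, vecs = spatial_sign_pca(X, q=2)
print(vecs.shape)
```

Unlike classical PCA, the recovered eigenvectors are not dragged toward the outlying observations; the influence functions derived in the paper quantify exactly this kind of resistance.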
Latent variable models
 Learning in Graphical Models
, 1999
Abstract

Cited by 31 (0 self)
Abstract. A powerful approach to probabilistic modelling involves supplementing a set of observed variables with additional latent, or hidden, variables. By defining a joint distribution over visible and latent variables, the corresponding distribution of the observed variables is then obtained by marginalization. This allows relatively complex distributions to be expressed in terms of more tractable joint distributions over the expanded variable space. One well-known example of a hidden variable model is the mixture distribution in which the hidden variable is the discrete component label. In the case of continuous latent variables we obtain models such as factor analysis. The structure of such probabilistic models can be made particularly transparent by giving them a graphical representation, usually in terms of a directed acyclic graph, or Bayesian network. In this chapter we provide an overview of latent variable models for representing continuous variables. We show how a particular form of linear latent variable model can be used to provide a probabilistic formulation of the well-known technique of principal components analysis (PCA). By extending this technique to mixtures, and hierarchical mixtures, of probabilistic PCA models we are led to a powerful interactive algorithm for data visualization. We also show how the probabilistic PCA approach can be generalized to nonlinear latent variable models leading to the Generative Topographic Mapping algorithm (GTM). Finally, we show how GTM can itself be extended to model temporal data.
SPECTRUM ESTIMATION FOR LARGE DIMENSIONAL COVARIANCE MATRICES USING RANDOM MATRIX THEORY
 SUBMITTED TO THE ANNALS OF STATISTICS
Abstract

Cited by 27 (4 self)
Estimating the eigenvalues of a population covariance matrix from a sample covariance matrix is a problem of fundamental importance in multivariate statistics; the eigenvalues of covariance matrices play a key role in many widely used techniques, in particular in Principal Component Analysis (PCA). In many modern data analysis problems, statisticians are faced with large datasets where the sample size, n, is of the same order of magnitude as the number of variables p. Random matrix theory predicts that in this context, the eigenvalues of the sample covariance matrix are not good estimators of the eigenvalues of the population covariance. We propose to use a fundamental result in random matrix theory, the Marčenko–Pastur equation, to better estimate the eigenvalues of large dimensional covariance matrices. The Marčenko–Pastur equation holds in very wide generality and under weak assumptions. The estimator we obtain can be thought of as “shrinking” in a nonlinear fashion the eigenvalues of the sample covariance matrix to estimate the population eigenvalues. Inspired by ideas of random matrix theory, we also suggest a change of point of view when thinking about estimation of high-dimensional vectors: we do not try to estimate directly the vectors but rather a probability measure that describes them. We think this is a theoretically more fruitful way to think about these problems. Our estimator gives fast and accurate results in extended simulations. Our algorithmic approach is based on convex optimization. We also show that the proposed estimator is consistent.
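The distortion that motivates this estimator is easy to reproduce: with identity population covariance and p comparable to n, the sample eigenvalues spread across the Marčenko–Pastur bulk rather than concentrating at 1. A simulation sketch (it illustrates the distortion only, not the paper's estimator):

```python
import numpy as np

# With identity population covariance, sample eigenvalues spread over the
# Marchenko-Pastur bulk [(1 - sqrt(p/n))^2, (1 + sqrt(p/n))^2] instead of
# concentrating at 1 -- the distortion the proposed estimator corrects.
n, p = 400, 200
rng = np.random.default_rng(4)
X = rng.standard_normal((n, p))
S = X.T @ X / n                        # sample covariance; population is the identity
lam = np.linalg.eigvalsh(S)
gamma = p / n
edges = ((1 - np.sqrt(gamma)) ** 2, (1 + np.sqrt(gamma)) ** 2)
print(lam.min(), lam.max(), edges)     # min/max fall near the bulk edges, far from 1
```

Here gamma = 0.5, so the bulk stretches from roughly 0.09 to 2.91 even though every population eigenvalue equals 1; inverting the Marčenko–Pastur equation is what recovers the population spectrum from such a spread-out sample spectrum.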