Results 1 - 10 of 51
Probabilistic Principal Component Analysis
- Journal of the Royal Statistical Society, Series B, 1999
Cited by 359 (5 self)
Principal component analysis (PCA) is a ubiquitous technique for data analysis and processing, but one which is not based upon a probability model. In this paper we demonstrate how the principal axes of a set of observed data vectors may be determined through maximum-likelihood estimation of parameters in a latent variable model closely related to factor analysis. We consider the properties of the associated likelihood function, giving an EM algorithm for estimating the principal subspace iteratively, and discuss, with illustrative examples, the advantages conveyed by this probabilistic approach to PCA. Keywords: Principal component analysis
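The latent variable model this abstract refers to can be written out as a short worked formulation. This is a sketch of the standard probabilistic PCA setup; the symbols (t for a d-dimensional observation, x for a q-dimensional latent vector, S for the sample covariance) are notational assumptions and are not quoted from the abstract.

```latex
% Generative model: a linear-Gaussian latent variable model with isotropic noise
\mathbf{t} = \mathbf{W}\mathbf{x} + \boldsymbol{\mu} + \boldsymbol{\epsilon},
\qquad \mathbf{x} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_q),
\qquad \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I}_d)

% Marginal distribution of the observations
\mathbf{t} \sim \mathcal{N}(\boldsymbol{\mu}, \mathbf{C}),
\qquad \mathbf{C} = \mathbf{W}\mathbf{W}^{\mathsf{T}} + \sigma^2 \mathbf{I}_d

% Maximum-likelihood estimates, with \lambda_1 \ge \dots \ge \lambda_d the eigenvalues of S,
% U_q the matrix of the q leading eigenvectors, \Lambda_q = diag(\lambda_1, ..., \lambda_q),
% and R an arbitrary q x q orthogonal matrix
\mathbf{W}_{\mathrm{ML}} = \mathbf{U}_q \left( \boldsymbol{\Lambda}_q - \sigma^2 \mathbf{I}_q \right)^{1/2} \mathbf{R},
\qquad
\sigma^2_{\mathrm{ML}} = \frac{1}{d - q} \sum_{j = q + 1}^{d} \lambda_j
```

Under these assumptions the columns of W span the principal subspace, and the noise variance is the average of the discarded eigenvalues, which is the sense in which the principal axes arise from maximum-likelihood estimation.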
Mixtures of Probabilistic Principal Component Analysers
1998
Cited by 334 (6 self)
Principal component analysis (PCA) is one of the most popular techniques for processing, compressing and visualising data, although its effectiveness is limited by its global linearity. While nonlinear variants of PCA have been proposed, an alternative paradigm is to capture data complexity by a combination of local linear PCA projections. However, conventional PCA does not correspond to a probability density, and so there is no unique way to combine PCA models. Previous attempts to formulate mixture models for PCA have therefore to some extent been ad hoc. In this paper, PCA is formulated within a maximum-likelihood framework, based on a specific form of Gaussian latent variable model. This leads to a well-defined mixture model for probabilistic principal component analysers, whose parameters can be determined using an EM algorithm. We discuss the advantages of this model in the context of clustering, density modelling and local dimensionality reduction, and we demonstrate its applicat...
Principal Components Analysis to Summarize Microarray Experiments: Application to Sporulation Time Series
- in Pacific Symposium on Biocomputing, 2000
Cited by 94 (2 self)
A series of microarray experiments produces observations of differential expression for thousands of genes across multiple conditions. It is often not clear whether a set of experiments are measuring fundamentally different gene expression states or are measuring similar states created through different mechanisms. It is useful, therefore, to define a core set of independent features for the expression states that allow them to be compared directly. Principal components analysis (PCA) is a statistical technique for determining the key variables in a multidimensional data set that explain the differences in the observations, and can be used to simplify the analysis and visualization of multidimensional data sets. We show that application of PCA to expression data (where the experimental conditions are the variables, and the gene expression measurements are the observations) allows us to summarize the ways in which gene responses vary under different conditions. Examination of the components also provides insight into the underlying factors that are measured in the experiments. We applied PCA to the publicly released yeast sporulation data set (Chu et al. 1998). In that work, 7 different measurements of gene expression were made over time. PCA on the time-points suggests that much of the observed variability in the experiment can be summarized in just 2 components—i.e. 2 variables capture most of the information. These components appear to represent (1) overall induction level and (2) change in induction level over time. We also examined the clusters proposed in the original paper, and show how they are manifested in principal component space. Our results are available on the internet at
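The idea in this abstract, treating the time points as variables and the genes as observations and then asking how much variance the first two components capture, can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' analysis or the sporulation data set: the generator `synthetic_profiles`, its level and slope parameters, and the use of power iteration are all assumptions made for the example.

```python
import random


def synthetic_profiles(n_genes=200, n_times=7, seed=1):
    """Hypothetical expression profiles: per-gene level plus linear trend plus noise."""
    rng = random.Random(seed)
    mid = (n_times - 1) / 2
    rows = []
    for _ in range(n_genes):
        level = rng.gauss(0, 2.0)   # overall induction level (assumed)
        slope = rng.gauss(0, 0.5)   # change in induction over time (assumed)
        rows.append([level + slope * (t - mid) + rng.gauss(0, 0.3)
                     for t in range(n_times)])
    return rows


def covariance(rows):
    """Sample covariance matrix; rows are observations, columns are variables."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_eigenpairs(cov, k, iters=500, seed=0):
    """Leading k eigenpairs of a symmetric PSD matrix via power iteration with deflation."""
    rng = random.Random(seed)
    d = len(cov)
    a = [row[:] for row in cov]  # work on a copy so the caller's matrix is untouched
    pairs = []
    for _ in range(k):
        v = [rng.gauss(0, 1) for _ in range(d)]
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = sum(x * x for x in w) ** 0.5
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue estimate for the unit vector v
        lam = sum(v[i] * sum(a[i][j] * v[j] for j in range(d)) for i in range(d))
        pairs.append((lam, v))
        # Deflation: remove the component just found before searching for the next
        for i in range(d):
            for j in range(d):
                a[i][j] -= lam * v[i] * v[j]
    return pairs


def explained_variance_ratio(rows, k):
    """Fraction of total variance captured by each of the first k principal components."""
    cov = covariance(rows)
    total = sum(cov[i][i] for i in range(len(cov)))
    return [lam / total for lam, _ in top_eigenpairs(cov, k)]


if __name__ == "__main__":
    ratios = explained_variance_ratio(synthetic_profiles(), 2)
    print("variance explained by PC1 and PC2:", ratios)
```

Because the synthetic profiles are built from only two underlying factors, the first two ratios should account for nearly all of the variance; real expression data would need the same check rather than an assumption that two components suffice.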
Probabilistic non-linear principal component analysis with Gaussian process latent variable models
- Journal of Machine Learning Research, 2005
Cited by 71 (10 self)
Summarising a high dimensional data set with a low dimensional embedding is a standard approach for exploring its structure. In this paper we provide an overview of some existing techniques for discovering such embeddings. We then introduce a novel probabilistic interpretation of principal component analysis (PCA) that we term dual probabilistic PCA (DPPCA). The DPPCA model has the additional advantage that the linear mappings from the embedded space can easily be nonlinearised through Gaussian processes. We refer to this model as a Gaussian process latent variable model (GP-LVM). Through analysis of the GP-LVM objective function, we relate the model to popular spectral techniques such as kernel PCA and multidimensional scaling. We then review a practical algorithm for GP-LVMs in the context of large data sets and develop it to also handle discrete valued data and missing attributes. We demonstrate the model on a range of real-world and artificially generated data sets.
Latent variable models
- Learning in Graphical Models, 1999
Cited by 27 (0 self)
A powerful approach to probabilistic modelling involves supplementing a set of observed variables with additional latent, or hidden, variables. By defining a joint distribution over visible and latent variables, the corresponding distribution of the observed variables is then obtained by marginalization. This allows relatively complex distributions to be expressed in terms of more tractable joint distributions over the expanded variable space. One well-known example of a hidden variable model is the mixture distribution in which the hidden variable is the discrete component label. In the case of continuous latent variables we obtain models such as factor analysis. The structure of such probabilistic models can be made particularly transparent by giving them a graphical representation, usually in terms of a directed acyclic graph, or Bayesian network. In this chapter we provide an overview of latent variable models for representing continuous variables. We show how a particular form of linear latent variable model can be used to provide a probabilistic formulation of the well-known technique of principal components analysis (PCA). By extending this technique to mixtures, and hierarchical mixtures, of probabilistic PCA models we are led to a powerful interactive algorithm for data visualization. We also show how the probabilistic PCA approach can be generalized to non-linear latent variable models leading to the Generative Topographic Mapping algorithm (GTM). Finally, we show how GTM can itself be extended to model temporal data.
Geometric Methods for Feature Extraction and Dimensional Reduction
- In L. Rokach and O. Maimon (Eds.), Data, 2005
Cited by 24 (1 self)
We give a tutorial overview of several geometric methods for feature extraction and dimensional reduction. We divide the methods into projective methods and methods that model the manifold on which the data lies. For projective methods, we review projection pursuit, principal component analysis (PCA), kernel PCA, probabilistic PCA, and oriented PCA; and for the manifold methods, we review multidimensional scaling (MDS), landmark MDS, Isomap, locally linear embedding, Laplacian eigenmaps and spectral clustering. The Nyström method, which links several of the algorithms, is also reviewed. The goal is to provide a self-contained review of the concepts and mathematics underlying these algorithms.
Correlated Bayesian Factor Analysis
1998
Cited by 16 (7 self)
Factor analysis is a method in multivariate statistical analysis that can help scientists determine which variables to study in a field and their relationships. We extend the Bayesian approach to factor analysis developed in 1989 by Press and Shigemasu (henceforth PS89) and revised in 1997 to model correlated observation vectors, factor score vectors, and factor loadings. Further, we place a prior distribution on the number of factors and obtain posterior estimates. Hitherto,
Fast Dimensionality Reduction and Simple PCA
1997
Cited by 14 (1 self)
A fast and simple algorithm for approximately calculating the principal components (PCs) of a data set and so reducing its dimensionality is described. This Simple Principal Components Analysis (SPCA) method was used for dimensionality reduction of two high-dimensional image databases, one of handwritten digits and one of handwritten Japanese characters. It was tested and compared with other techniques. On both databases SPCA shows a fast convergence rate compared with other methods and robustness to the reordering of the samples.
KEYWORDS: Principal component analysis, matrix diagonalization, Hebbian learning, image compression.
All correspondence should be addressed to this author. Permanent address: Instituto de Fisica Rosario, Bvd. 27 de Febrero 210 Bis, 2000 Rosario, Argentina.
1 Introduction. High dimensional data analysis is becoming increasingly common as new problems are placing greater demands on computing resources. With high dimensional data, it is difficult to unde...
M.: Segmentation of the liver using a 3D statistical shape model
2004
Cited by 13 (1 self)
This paper presents an automatic approach for segmentation of the liver from computer tomography (CT) images based on a 3D statistical shape model. Segmentation of the liver is an important prerequisite in liver surgery planning. One of the major challenges in building a 3D shape model from a training set of segmented instances of an object is the determination of the correspondence between different surfaces. We propose to use a geometric approach that is based on minimizing the distortion of the correspondence mapping between two different surfaces. For the adaption of the shape model to the image data a profile model based on the grey value appearance of the liver and its surrounding tissues in contrast enhanced CT data was developed. The robustness of this method results from a previous nonlinear diffusion filtering of the image data. Special focus is turned to the quantitative evaluation of the segmentation process. Several

