Results 11–20 of 149
Generative models for discovering sparse distributed representations
 Philosophical Transactions of the Royal Society B
, 1997
Abstract

Cited by 138 (6 self)
We describe a hierarchical, generative model that can be viewed as a nonlinear generalization of factor analysis and can be implemented in a neural network. The model uses bottom-up, top-down and lateral connections to perform Bayesian perceptual inference correctly. Once perceptual inference has been performed the connection strengths can be updated using a very simple learning rule that only requires locally available information. We demonstrate that the network learns to extract sparse, distributed, hierarchical representations.
Bayesian Parameter Estimation Via Variational Methods
, 1999
Abstract

Cited by 129 (6 self)
We consider a logistic regression model with a Gaussian prior distribution over the parameters. We show that an accurate variational transformation can be used to obtain a closed-form approximation to the posterior distribution of the parameters, thereby yielding an approximate posterior predictive model. This approach is readily extended to binary graphical models with complete observations. For graphical models with incomplete observations we utilize an additional variational transformation and again obtain a closed-form approximation to the posterior. Finally, we show that the dual of the regression problem gives a latent variable density model, the variational formulation of which leads to exactly solvable EM updates.
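The variational transformation the abstract describes can be sketched numerically. The following is a minimal NumPy illustration of the standard Jaakkola–Jordan bound for Bayesian logistic regression; the function names and the fixed-point schedule are assumptions for this sketch, not the paper's own code:

```python
import numpy as np

def lam(xi):
    # lambda(xi) = tanh(xi/2) / (4 xi), with the xi -> 0 limit equal to 1/8
    xi = np.asarray(xi, dtype=float)
    out = np.full_like(xi, 0.125)
    nz = np.abs(xi) > 1e-8
    out[nz] = np.tanh(xi[nz] / 2.0) / (4.0 * xi[nz])
    return out

def variational_logistic(X, y, m0, S0, n_iters=20):
    """Gaussian approximation N(m, S) to the posterior over the weights of a
    logistic regression, via the Jaakkola-Jordan variational bound.
    X: (n, d) inputs; y: (n,) labels in {0, 1}; N(m0, S0) is the prior."""
    S0_inv = np.linalg.inv(S0)
    xi = np.ones(len(X))                       # one variational parameter per datum
    for _ in range(n_iters):
        L = lam(xi)
        S_inv = S0_inv + 2.0 * (X.T * L) @ X   # posterior precision (closed form)
        S = np.linalg.inv(S_inv)
        m = S @ (S0_inv @ m0 + X.T @ (y - 0.5))
        # re-optimize the bound: xi_i^2 = x_i^T (S + m m^T) x_i
        M = S + np.outer(m, m)
        xi = np.sqrt(np.einsum('ij,jk,ik->i', X, M, X))
    return m, S
```

Each sweep alternates a closed-form Gaussian update with a closed-form update of the variational parameters, so no sampling or numerical optimization is needed.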
EM Algorithms for PCA and SPCA
 in Advances in Neural Information Processing Systems
, 1998
Abstract

Cited by 126 (1 self)
I present an expectation-maximization (EM) algorithm for principal component analysis (PCA). The algorithm allows a few eigenvectors and eigenvalues to be extracted from large collections of high-dimensional data. It is computationally very efficient in space and time. It also naturally accommodates missing information. I also introduce a new variant of PCA called sensible principal component analysis (SPCA) which defines a proper density model in the data space. Learning for SPCA is also done with an EM algorithm. I report results on synthetic and real data showing that these EM algorithms correctly and efficiently find the leading eigenvectors of the covariance of datasets in a few iterations using up to hundreds of thousands of datapoints in thousands of dimensions.
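The E- and M-steps for the non-probabilistic limit of this algorithm reduce to two least-squares solves per iteration, which is what makes it cheap for high-dimensional data. A minimal NumPy sketch (variable names assumed, not taken from the paper):

```python
import numpy as np

def em_pca(Y, k, n_iters=50):
    """EM algorithm for PCA: find a basis C whose column span converges to the
    k-dimensional principal subspace of the centered data Y (d x n).
    Each iteration costs O(k d n); no d x d covariance matrix is ever formed."""
    d, n = Y.shape
    rng = np.random.default_rng(0)
    C = rng.standard_normal((d, k))            # random initial basis
    for _ in range(n_iters):
        # E-step: least-squares estimate of the latent coordinates
        X = np.linalg.solve(C.T @ C, C.T @ Y)
        # M-step: re-estimate the basis given those coordinates
        C = Y @ X.T @ np.linalg.inv(X @ X.T)
    # orthonormalize so the result is directly comparable to eigenvectors
    Q, _ = np.linalg.qr(C)
    return Q
```

Only k x k matrices are inverted, so the method scales to the regimes the abstract mentions (hundreds of thousands of datapoints in thousands of dimensions) where forming the full covariance would be prohibitive.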
A Split-Merge Markov Chain Monte Carlo Procedure for the Dirichlet Process Mixture Model
 Journal of Computational and Graphical Statistics
, 2000
Abstract

Cited by 124 (0 self)
We propose a split-merge Markov chain algorithm to address the problem of inefficient sampling for conjugate Dirichlet process mixture models. Traditional Markov chain Monte Carlo methods for Bayesian mixture models, such as Gibbs sampling, can become trapped in isolated modes corresponding to an inappropriate clustering of data points. This article describes a Metropolis-Hastings procedure that can escape such local modes by splitting or merging mixture components. Our Metropolis-Hastings algorithm employs a new technique in which an appropriate proposal for splitting or merging components is obtained by using a restricted Gibbs sampling scan. We demonstrate empirically that our method outperforms the Gibbs sampler in situations where two or more components are similar in structure.
Key words: Dirichlet process mixture model, Markov chain Monte Carlo, Metropolis-Hastings algorithm, Gibbs sampler, split-merge updates
1 Introduction
Mixture models are often applied to density estim...
Testing for the equivalence of factor covariance and mean structures: The issue of partial measurement invariance
 Psychological Bulletin
, 1989
Abstract

Cited by 122 (4 self)
Addresses issues related to partial measurement invariance using a tutorial approach based on the LISREL confirmatory factor analytic model. Specifically, we demonstrate procedures for (a) using "sensitivity analyses" to establish stable and substantively well-fitting baseline models, (b) determining partially invariant measurement parameters, and (c) testing for the invariance of factor covariance and mean structures, given partial measurement invariance. We also show, explicitly, the transformation of parameters from an all-X to an all-Y model specification, for purposes of testing mean structures. These procedures are illustrated with multidimensional self-concept data from low (n = 248) and high (n = 582) academically tracked high school adolescents. An important assumption in testing for mean differences is that the measurement (Drasgow & Kanfer, 1985; Labouvie, ...
Maximum Likelihood and Covariant Algorithms for Independent Component Analysis
, 1996
Abstract

Cited by 114 (1 self)
Bell and Sejnowski (1995) have derived a blind signal processing algorithm for a nonlinear feedforward network from an information maximization viewpoint. This paper first shows that the same algorithm can be viewed as a maximum likelihood algorithm for the optimization of a linear generative model. Second, a covariant version of the algorithm is derived. This algorithm is simpler and somewhat more biologically plausible, involving no matrix inversions, and it converges in a smaller number of iterations. Third, this paper gives a partial proof of the 'folk theorem' that any mixture of sources with high-kurtosis histograms is separable by the classic ICA algorithm. Fourth, a collection of formulae is given that may be useful for the adaptation of the nonlinearity in the ICA algorithm.
1 Blind separation
Algorithms for blind separation (Jutten and Herault 1991; Comon et al. 1991; Bell and Sejnowski 1995; Hendin et al. 1994) attempt to recover source signals s from observations x whic...
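A covariant (natural-gradient) maximum-likelihood ICA update of the kind described, which avoids matrix inversions, can be sketched as follows. This is an illustrative NumPy implementation using a tanh nonlinearity suited to high-kurtosis sources, not the paper's own code; the function name and learning-rate schedule are assumptions:

```python
import numpy as np

def covariant_ica(x, n_iters=300, lr=0.1):
    """Covariant maximum-likelihood ICA.
    x: (n_sources, n_samples) zero-mean mixtures.  Applies the batch update
    W <- W + lr * (I - tanh(a) a^T / n) W, which involves no matrix
    inversion, assuming heavy-tailed (high-kurtosis) sources."""
    k, n = x.shape
    W = np.eye(k)
    for _ in range(n_iters):
        a = W @ x                                   # current source estimates
        grad = np.eye(k) - np.tanh(a) @ a.T / n
        W = W + lr * grad @ W                       # covariant step: gradient times W
    return W
```

Because the step is post-multiplied by W itself, the rule is equivariant: its trajectory does not depend on the conditioning of the unknown mixing matrix, which is the practical content of the "covariant" property.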
A Hierarchical Latent Variable Model for Data Visualization
 IEEE Transactions on Pattern Analysis and Machine Intelligence
, 1997
Abstract

Cited by 98 (10 self)
Visualization has proven to be a powerful and widely applicable tool for the analysis and interpretation of multivariate data. Most visualization algorithms aim to find a projection from the data space down to a two-dimensional visualization space. However, for complex data sets living in a high-dimensional space it is unlikely that a single two-dimensional projection can reveal all of the interesting structure. We therefore introduce a hierarchical visualization algorithm which allows the complete data set to be visualized at the top level, with clusters and subclusters of data points visualized at deeper levels. The algorithm is based on a hierarchical mixture of latent variable models, whose parameters are estimated using the expectation-maximization algorithm. We demonstrate the principle of the approach on a toy data set, and we then apply the algorithm to the visualization of a synthetic data set in 12 dimensions obtained from a simulation of multiphase flows in oil pipelines,...
Using Generative Models for Handwritten Digit Recognition
 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
, 1996
Abstract

Cited by 73 (8 self)
We describe a method of recognizing handwritten digits by fitting generative models that are built from deformable B-splines with Gaussian "ink generators" spaced along the length of the spline. The splines are adjusted using a novel elastic matching procedure based on the Expectation Maximization (EM) algorithm that maximizes the likelihood of the model generating the data. This approach has many advantages. (1) After identifying the model most likely to have generated the data, the system not only produces a classification of the digit but also a rich description of the instantiation parameters which can yield information such as the writing style. (2) During the process of explaining the image, generative models can perform recognition-driven segmentation. (3) The method involves a relatively small number of parameters and hence training is relatively easy and fast. (4) Unlike many other recognition schemes it does not rely on some form of pre-normalization of input images, but can ...
Blind Source Separation and Deconvolution: The Dynamic Component Analysis Algorithm
 Neural Computation
, 1998
Abstract

Cited by 48 (6 self)
We derive a novel family of unsupervised learning algorithms for blind separation of mixed and convolved sources. Our approach is based on formulating the separation problem as a learning task of a spatiotemporal generative model, whose parameters are adapted iteratively to minimize suitable error functions, thus ensuring stability of the algorithms. The resulting learning rules achieve separation by exploiting high-order spatiotemporal statistics of the mixture data. Different rules are obtained by learning generative models in the frequency and time domains, whereas a hybrid frequency/time model leads to the best performance. These algorithms generalize independent component analysis to the case of convolutive mixtures and exhibit superior performance on instantaneous mixtures. An extension of the relative-gradient concept to the spatiotemporal case leads to fast and efficient learning rules with equivariant properties. Our approach can incorporate information about the mixing sit...
Switching State-Space Models
 King’s College Road, Toronto M5S 3H5
, 1996
Abstract

Cited by 45 (2 self)
We introduce a statistical model for time series data with nonlinear dynamics which iteratively segments the data into regimes with approximately linear dynamics and learns the parameters of each of those regimes. This model combines and generalizes two of the most widely used stochastic time series models, the hidden Markov model and the linear dynamical system, and is related to models that are widely used in the control and econometrics literatures. It can also be derived by extending the mixture of experts neural network model (Jacobs et al., 1991) to its fully dynamical version, in which both expert and gating networks are recurrent. Inferring the posterior probabilities of the hidden states of this model is computationally intractable, and therefore the exact Expectation Maximization (EM) algorithm cannot be applied. However, we present a variational approximation which maximizes a lower bound on the log likelihood and makes use of both the forward-backward recursio...