Results 1  10
of
105
Probabilistic Inference Using Markov Chain Monte Carlo Methods
, 1993
"... Probabilistic inference is an attractive approach to uncertain reasoning and empirical learning in artificial intelligence. Computational difculties arise, however, because probabilistic models with the necessary realism and flexibility lead to complex distributions over highdimensional spaces. Rel ..."
Abstract

Cited by 567 (20 self)
 Add to MetaCart
Probabilistic inference is an attractive approach to uncertain reasoning and empirical learning in artificial intelligence. Computational difculties arise, however, because probabilistic models with the necessary realism and flexibility lead to complex distributions over highdimensional spaces. Related problems in other fields have been tackled using Monte Carlo methods based on sampling using Markov chains, providing a rich array of techniques that can be applied to problems in artificial intelligence. The "Metropolis algorithm" has been used to solve difficult problems in statistical physics for over forty years, and, in the last few years, the related method of "Gibbs sampling" has been applied to problems of statistical inference. Concurrently, an alternative method for solving problems in statistical physics by means of dynamical simulation has been developed as well, and has recently been unified with the Metropolis algorithm to produce the "hybrid Monte Carlo" method. In computer science, Markov chain sampling is the basis of the heuristic optimization technique of "simulated annealing", and has recently been used in randomized algorithms for approximate counting of large sets. In this review, I outline the role of probabilistic inference in artificial intelligence, and present the theory of Markov chains, and describe various Markov chain Monte Carlo algorithms, along with a number of supporting techniques. I try to present a comprehensive picture of the range of methods that have been developed, including techniques from the varied literature that have not yet seen wide application in artificial intelligence, but which appear relevant. As illustrative examples, I use the problems of probabilitic inference in expert systems, discovery of latent classes from data, and Bayesian learning for neural networks.
A Unifying Review of Linear Gaussian Models
, 1999
"... Factor analysis, principal component analysis, mixtures of gaussian clusters, vector quantization, Kalman filter models, and hidden Markov models can all be unified as variations of unsupervised learning under a single basic generative model. This is achieved by collecting together disparate observa ..."
Abstract

Cited by 260 (17 self)
 Add to MetaCart
Factor analysis, principal component analysis, mixtures of gaussian clusters, vector quantization, Kalman filter models, and hidden Markov models can all be unified as variations of unsupervised learning under a single basic generative model. This is achieved by collecting together disparate observations and derivations made by many previous authors and introducing a new way of linking discrete and continuous state models using a simple nonlinearity. Through the use of other nonlinearities, we show how independent component analysis is also a variation of the same basic generative model. We show that factor analysis and mixtures of gaussians can be implemented in autoencoder neural networks and learned using squared error plus the same regularization term. We introduce a new model for static data, known as sensible principal component analysis, as well as a novel concept of spatially adaptive observation noise. We also review some of the literature involving global and local mixtures of the basic models and provide pseudocode for inference and learning for all the basic models.
The EM Algorithm for Mixtures of Factor Analyzers
, 1997
"... Factor analysis, a statistical method for modeling the covariance structure of high dimensional data using a small number of latent variables, can be extended by allowing different local factor models in different regions of the input space. This results in a model which concurrently performs cluste ..."
Abstract

Cited by 225 (18 self)
 Add to MetaCart
Factor analysis, a statistical method for modeling the covariance structure of high dimensional data using a small number of latent variables, can be extended by allowing different local factor models in different regions of the input space. This results in a model which concurrently performs clustering and dimensionality reduction, and can be thought of as a reduced dimension mixture of Gaussians. We present an exact ExpectationMaximization algorithm for fitting the parameters of this mixture of factor analyzers. 1 Introduction Clustering and dimensionality reduction have long been considered two of the fundamental problems in unsupervised learning (Duda & Hart, 1973; Chapter 6). In clustering, the goal is to group data points by similarity between their features. Conversely, in dimensionality reduction, the goal is to group (or compress) features that are highly correlated. In this paper we present an EM learning algorithm for a method which combines one of the basic forms of dime...
Independent Factor Analysis
 Neural Computation
, 1999
"... We introduce the independent factor analysis (IFA) method for recovering independent hidden sources from their observed mixtures. IFA generalizes and unifies ordinary factor analysis (FA), principal component analysis (PCA), and independent component analysis (ICA), and can handle not only square no ..."
Abstract

Cited by 219 (9 self)
 Add to MetaCart
We introduce the independent factor analysis (IFA) method for recovering independent hidden sources from their observed mixtures. IFA generalizes and unifies ordinary factor analysis (FA), principal component analysis (PCA), and independent component analysis (ICA), and can handle not only square noiseless mixing, but also the general case where the number of mixtures differs from the number of sources and the data are noisy. IFA is a twostep procedure. In the first step, the source densities, mixing matrix and noise covariance are estimated from the observed data by maximum likelihood. For this purpose we present an expectationmaximization (EM) algorithm, which performs unsupervised learning of an associated probabilistic model of the mixing situation. Each source in our model is described by a mixture of Gaussians, thus all the probabilistic calculations can be performed analytically. In the second step, the sources are reconstructed from the observed data by an optimal nonlinear ...
The Helmholtz Machine
, 1995
"... Discovering the structure inherent in a set of patterns is a fundamental aim of statistical inference or learning. One fruitful approach is to build a parameterized stochastic generative model, independent draws from which are likely to produce the patterns. For all but the simplest generative model ..."
Abstract

Cited by 194 (22 self)
 Add to MetaCart
Discovering the structure inherent in a set of patterns is a fundamental aim of statistical inference or learning. One fruitful approach is to build a parameterized stochastic generative model, independent draws from which are likely to produce the patterns. For all but the simplest generative models, each pattern can be generated in exponentially many ways. It is thus intractable to adjust the parameters to maximize the probability of the observed patterns. We describe a way of finessing this combinatorial explosion by maximizing an easily computed lower bound on the probability of the observations. Our method can be viewed as a form of hierarchical selfsupervised learning that may relate to the function of bottomup and topdown cortical processing pathways.
Learning with Labeled and Unlabeled Data
, 2001
"... In this paper, on the one hand, we aim to give a review on literature dealing with the problem of supervised learning aided by additional unlabeled data. On the other hand, being a part of the author's first year PhD report, the paper serves as a frame to bundle related work by the author as well as ..."
Abstract

Cited by 165 (3 self)
 Add to MetaCart
In this paper, on the one hand, we aim to give a review on literature dealing with the problem of supervised learning aided by additional unlabeled data. On the other hand, being a part of the author's first year PhD report, the paper serves as a frame to bundle related work by the author as well as numerous suggestions for potential future work. Therefore, this work contains more speculative and partly subjective material than the reader might expect from a literature review. We give a rigorous definition of the problem and relate it to supervised and unsupervised learning. The crucial role of prior knowledge is put forward, and we discuss the important notion of inputdependent regularization. We postulate a number of baseline methods, being algorithms or algorithmic schemes which can more or less straightforwardly be applied to the problem, without the need for genuinely new concepts. However, some of them might serve as basis for a genuine method. In the literature revi...
Parameter estimation for linear dynamical systems
, 1996
"... Linear systems have been used extensively in engineering to model and control the behavior of dynamical systems. In this note, we present the Expectation Maximization (EM) algorithm for estimating the parameters of linear systems (Shumway and Sto er, 1982). We also point out the relationship between ..."
Abstract

Cited by 156 (7 self)
 Add to MetaCart
Linear systems have been used extensively in engineering to model and control the behavior of dynamical systems. In this note, we present the Expectation Maximization (EM) algorithm for estimating the parameters of linear systems (Shumway and Sto er, 1982). We also point out the relationship between linear dynamical systems, factor analysis, and hidden Markov models.
Modeling the manifolds of images of handwritten digits
 IEEE Transactions on Neural Networks
, 1997
"... description length, density estimation. ..."
Generative models for discovering sparse distributed representations
 Philosophical Transactions of the Royal Society B
, 1997
"... We describe a hierarchical, generative model that can be viewed as a nonlinear generalization of factor analysis and can be implemented in a neural network. The model uses bottomup, topdown and lateral connections to perform Bayesian perceptual inference correctly. Once perceptual inference has b ..."
Abstract

Cited by 120 (5 self)
 Add to MetaCart
We describe a hierarchical, generative model that can be viewed as a nonlinear generalization of factor analysis and can be implemented in a neural network. The model uses bottomup, topdown and lateral connections to perform Bayesian perceptual inference correctly. Once perceptual inference has been performed the connection strengths can be updated using a very simple learning rule that only requires locally available information. We demonstrate that the network learns to extract sparse, distributed, hierarchical representations.
Bayesian Parameter Estimation Via Variational Methods
, 1999
"... We consider a logistic regression model with a Gaussian prior distribution over the parameters. We show that an accurate variational transformation can be used to obtain a closed form approximation to the posterior distribution of the parameters thereby yielding an approximate posterior predictiv ..."
Abstract

Cited by 106 (5 self)
 Add to MetaCart
We consider a logistic regression model with a Gaussian prior distribution over the parameters. We show that an accurate variational transformation can be used to obtain a closed form approximation to the posterior distribution of the parameters thereby yielding an approximate posterior predictive model. This approach is readily extended to binary graphical model with complete observations. For graphical models with incomplete observations we utilize an additional variational transformation and again obtain a closed form approximation to the posterior. Finally, we show that the dual of the regression problem gives a latent variable density model, the variational formulation of which leads to exactly solvable EM updates.