Results 1 - 7 of 7
Approximating Posterior Distributions in Belief Networks using Mixtures
Advances in Neural Information Processing Systems 10, 1998
Abstract

Cited by 29 (8 self)
Exact inference in densely connected Bayesian networks is computationally intractable, and so there is considerable interest in developing effective approximation schemes. One approach which has been adopted is to bound the log likelihood using a mean-field approximating distribution. While this leads to a tractable algorithm, the mean-field distribution is assumed to be factorized and hence unimodal. In this paper we demonstrate the feasibility of using a richer class of approximating distributions based on mixtures of mean-field distributions. We derive an efficient algorithm for updating the mixture parameters and apply it to the problem of learning in sigmoid belief networks. Our results demonstrate a systematic improvement over simple mean-field theory as the number of mixture components is increased.
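As a sketch of the construction this abstract describes (notation ours, not taken from the paper), the variational bound and the two families of approximating distributions can be written as:

```latex
\ln P(V) \;\ge\; \sum_{H} Q(H) \ln \frac{P(H, V)}{Q(H)},
\qquad
Q_{\mathrm{MF}}(H) = \prod_{i} Q_i(h_i),
\qquad
Q_{\mathrm{mix}}(H) = \sum_{m=1}^{M} \pi_m \prod_{i} Q_{m,i}(h_i),
```

where the mixing coefficients satisfy $\pi_m \ge 0$ and $\sum_m \pi_m = 1$. Each component is itself a factorized (mean-field) distribution, so the mixture can place probability mass on several modes of the true posterior while each component update remains tractable.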
Ensemble learning in Bayesian neural networks
Neural Networks and Machine Learning, 1998
Abstract

Cited by 23 (5 self)
Bayesian treatments of learning in neural networks are typically based either on a local Gaussian approximation to a mode of the posterior weight distribution, or on Markov chain Monte Carlo simulations. A third approach, called ensemble learning, was introduced by Hinton and van Camp (1993). It aims to approximate the posterior distribution by minimizing the Kullback-Leibler divergence between the true posterior and a parametric approximating distribution. The original derivation of a deterministic algorithm relied on the use of a Gaussian approximating distribution with a diagonal covariance matrix and hence was unable to capture the posterior correlations between parameters. In this chapter we show how the ensemble learning approach can be extended to full-covariance Gaussian distributions while remaining computationally tractable. We also extend the framework to deal with hyperparameters, leading to a simple re-estimation procedure. One of the benefits of our approach is that it yields a strict lower bound on the marginal likelihood, in contrast to other approximate procedures.
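The cost of restricting the approximation to a diagonal covariance can be illustrated with the closed-form KL divergence between multivariate Gaussians; the snippet below is a minimal sketch (the function name and example parameters are ours, not the chapter's):

```python
import numpy as np

def gaussian_kl(mu0, cov0, mu1, cov1):
    """Closed-form KL( N(mu0, cov0) || N(mu1, cov1) )."""
    k = mu0.shape[0]
    cov1_inv = np.linalg.inv(cov1)
    diff = mu1 - mu0
    term_trace = np.trace(cov1_inv @ cov0)        # tr(S1^-1 S0)
    term_quad = diff @ cov1_inv @ diff            # Mahalanobis term
    term_logdet = np.log(np.linalg.det(cov1) / np.linalg.det(cov0))
    return 0.5 * (term_trace + term_quad - k + term_logdet)

# A full-covariance approximation can capture correlations that a
# diagonal (factorized) Gaussian must ignore:
mu = np.zeros(2)
full = np.array([[1.0, 0.8], [0.8, 1.0]])   # correlated posterior
diag = np.diag(np.diag(full))               # diagonal approximation
print(gaussian_kl(mu, full, mu, diag))      # strictly positive gap
```

The printed divergence is the irreducible gap any diagonal Gaussian leaves when the true posterior has correlated parameters, which is the motivation for the full-covariance extension.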
Variational learning in nonlinear Gaussian belief networks
Neural Computation, 1999
Abstract

Cited by 17 (6 self)
We view perceptual tasks such as vision and speech recognition as inference problems where the goal is to estimate the posterior distribution over latent variables (e.g., depth in stereo vision) given the sensory input. The recent flurry of research in independent component analysis exemplifies the importance of inferring the continuous-valued latent variables of input data. The latent variables found by this method are linearly related to the input, but perception requires nonlinear inferences such as classification and depth estimation. In this paper, we present a unifying framework for stochastic neural networks with nonlinear latent variables. Nonlinear units are obtained by passing the outputs of linear Gaussian units through various nonlinearities. We present a general variational method that maximizes a lower bound on the likelihood of a training set and give results on two visual feature extraction problems. We also show how the variational method can be used for pattern classification and compare the performance of these nonlinear networks with other methods on the problem of handwritten digit recognition.
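A minimal sketch of the generative unit described here, assuming illustrative shapes and names: a linear Gaussian unit whose output is passed through a pointwise nonlinearity.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_layer(x, W, b, noise_std, nonlinearity):
    """One layer of a nonlinear Gaussian belief network (sketch):
    a linear Gaussian unit z = Wx + b + noise, followed by a
    pointwise nonlinearity. Names are illustrative, not the
    paper's notation."""
    z = W @ x + b + noise_std * rng.standard_normal(b.shape)
    return nonlinearity(z)

x = rng.standard_normal(4)               # continuous latent causes
W = rng.standard_normal((3, 4))          # linear generative weights
b = np.zeros(3)
h = sample_layer(x, W, b, noise_std=0.1, nonlinearity=np.tanh)
print(h.shape)
```

Swapping the nonlinearity (e.g. a step function for binary units) changes the unit type without changing the linear Gaussian machinery underneath, which is the unification the paper points at.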
Advances in Algorithms for Inference and Learning in Complex Probability Models for Vision
IEEE Trans. PAMI, 2002
Abstract

Cited by 11 (5 self)
Computer vision is currently one of the most exciting areas of artificial intelligence research, largely because it has recently become possible to record, store and process large amounts of visual data. While impressive achievements have been made in pattern classification problems such as handwritten character recognition and face detection, it is even more exciting that researchers may be on the verge of introducing computer vision systems that perform scene analysis, decomposing a video into its constituent objects, lighting conditions, motion patterns, and so on. Two of the main challenges in computer vision are finding efficient models of the physics of visual scenes and finding efficient algorithms for inference and learning in these models. In this paper, we advocate the use of graph-based generative probability models and their associated inference and learning algorithms for computer vision and scene analysis. We review exact techniques and various approximate, computationally efficient techniques, including iterative conditional modes, the expectation-maximization algorithm, the mean-field method, variational techniques, structured variational techniques, Gibbs sampling, the sum-product algorithm and "loopy" belief propagation. We describe how each technique can be applied to an illustrative example of inference and learning in models of multiple, occluding objects, and compare the performance of the techniques.
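As one concrete instance of the reviewed techniques, iterative conditional modes on a tiny binary Markov random field might look like the sketch below (the model and parameters are illustrative, not the paper's occlusion model):

```python
import numpy as np

def icm_denoise(noisy, coupling=1.0, fidelity=2.0, sweeps=5):
    """Iterative conditional modes (ICM) for a binary (+/-1) Markov
    random field: each sweep sets every pixel to the state that
    maximizes its local conditional, given its 4-neighbours and the
    originally observed noisy pixel."""
    x = noisy.copy()
    H, W = x.shape
    for _ in range(sweeps):
        for i in range(H):
            for j in range(W):
                nbr = 0.0
                if i > 0: nbr += x[i - 1, j]
                if i < H - 1: nbr += x[i + 1, j]
                if j > 0: nbr += x[i, j - 1]
                if j < W - 1: nbr += x[i, j + 1]
                # local "field": smoothness prior + data fidelity
                field = coupling * nbr + fidelity * noisy[i, j]
                x[i, j] = 1 if field >= 0 else -1
    return x

clean = np.ones((8, 8))
noisy = clean.copy()
noisy[2, 3] = -1                 # one flipped pixel
restored = icm_denoise(noisy)
print(np.array_equal(restored, clean))  # True
```

ICM is the simplest (greedy, mode-seeking) member of the family the paper reviews; Gibbs sampling replaces the argmax with a draw from the same local conditional, and loopy belief propagation passes messages instead of states.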
Mixture Representations for Inference and Learning in Boltzmann Machines
Uncertainty in Artificial Intelligence: Proceedings of the Fourteenth Conference, 1998
Abstract

Cited by 7 (3 self)
Boltzmann machines are undirected graphical models with two-state stochastic variables, in which the logarithms of the clique potentials are quadratic functions of the node states. They have been widely studied in the neural computing literature, although their practical applicability has been limited by the difficulty of finding an effective learning algorithm. One well-established approach, known as mean field theory, represents the stochastic distribution using a factorized approximation. However, the corresponding learning algorithm often fails to find a good solution. We conjecture that this is due to the implicit unimodality of the mean field approximation which is therefore unable to capture multimodality in the true distribution. In this paper we use variational methods to approximate the stochastic distribution using multimodal mixtures of factorized distributions. We present results for both inference and learning to demonstrate the effectiveness of t...
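The factorized mean-field updates for a Boltzmann machine, and the mode-collapse behaviour the conjecture refers to, can be sketched as follows (all parameters are illustrative):

```python
import numpy as np

def mean_field(W, b, iters=50):
    """Fixed-point mean-field updates for a Boltzmann machine with
    symmetric weights W (zero diagonal) and biases b.  m[i]
    approximates P(s_i = 1) under the factorized distribution
    Q(s) = prod_i m_i^{s_i} (1 - m_i)^{1 - s_i}."""
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    m = np.full(b.shape, 0.51)       # small symmetry-breaking start
    for _ in range(iters):
        m = sigmoid(W @ m + b)
    return m

# Two strongly coupled units: the true distribution is bimodal
# (both on / both off), but a single factorized solution must
# commit to one mode -- the unimodality discussed above.
W = np.array([[0.0, 6.0], [6.0, 0.0]])
b = np.array([-3.0, -3.0])
m = mean_field(W, b)
print(m)
```

Starting the updates just above 0.5 drives both units toward the "both on" mode; a mixture of such factorized solutions, as proposed in the paper, can cover both modes at once.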
A Variational Bayesian Committee of Neural Networks
1999
Abstract

Cited by 5 (3 self)
Exact inference in Bayesian neural networks is analytically intractable, so approximate methods such as the evidence procedure, Monte Carlo sampling and variational inference have been proposed. In this paper we present a general overview of the Bayesian approach, with a particular emphasis on the variational procedure. We then present a new approximating distribution based on mixtures of Gaussian distributions and show how it may be implemented. We present results on a simple toy problem and on two real-world data sets.
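A mixture-of-Gaussians approximating distribution of the kind proposed here can represent a bimodal weight posterior that a single Gaussian cannot; a minimal sampling sketch (all names and parameters are ours, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_mixture(pis, mus, stds, n):
    """Draw n samples from a 1-D mixture of Gaussians, the form of
    approximating posterior discussed above."""
    comps = rng.choice(len(pis), size=n, p=pis)   # pick a component
    return rng.normal(mus[comps], stds[comps])    # sample from it

# Bimodal approximate posterior over one weight: modes at -2 and +2.
w = sample_mixture(np.array([0.5, 0.5]),
                   np.array([-2.0, 2.0]),
                   np.array([0.3, 0.3]),
                   10000)
print(float(w.mean()))   # near zero, between the two modes
```

A single Gaussian fitted to these samples would centre its mass near zero, a region where the mixture places almost no probability, which is the motivation for the richer committee-style approximation.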
Markovian Inference in Belief Networks
Presented at Machines That Learn, 1998
Abstract

Cited by 1 (1 self)
Bayesian belief networks can represent the complicated probabilistic processes that form natural sensory inputs. Once the parameters of the network have been learned, nonlinear inferences about the input can be made by computing the posterior distribution over the hidden units (e.g., depth in stereo vision) given the input. Computing the posterior distribution exactly is not practical in richly connected networks, but it turns out that by using a variational (a.k.a. mean-field) method, it is easy to find a product-form distribution that approximates the true posterior distribution. This approximation assumes that the hidden variables are independent given the current input. In this paper, we explore a more powerful variational technique that models the posterior distribution using a Markov chain. We compare this method with inference using mean fields and mixtures of mean fields in randomly generated networks. Submitted to NIPS 98, Algorithms and Architectures, oral presentation.
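The difference between the product-form approximation and the Markov-chain approximation explored here can be written (notation ours) as:

```latex
Q_{\mathrm{MF}}(h_1, \ldots, h_N) = \prod_{i=1}^{N} Q_i(h_i)
\qquad \text{vs.} \qquad
Q_{\mathrm{MC}}(h_1, \ldots, h_N) = Q_1(h_1) \prod_{i=2}^{N} Q_i(h_i \mid h_{i-1}),
```

so the Markov-chain form retains dependencies between neighbouring hidden units, dropping the full independence assumption of mean field while keeping the expectations needed for the variational bound tractable.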