Results 1-9 of 9
An Introduction to Variational Methods for Graphical Models
To appear in: M. I. Jordan (Ed.), Learning in Graphical Models
Mean Field Theory for Sigmoid Belief Networks
Journal of Artificial Intelligence Research, 1996
Cited by 123 (12 self)
We develop a mean field theory for sigmoid belief networks based on ideas from statistical mechanics.
A Mean Field Learning Algorithm For Unsupervised Neural Networks
1999
Cited by 12 (2 self)
We introduce a learning algorithm for unsupervised neural networks based on ideas from statistical mechanics. The algorithm is derived from a mean field approximation for large, layered sigmoid belief networks. We show how to (approximately) infer the statistics of these networks without resort to sampling. This is done by solving the mean field equations, which relate the statistics of each unit to those of its Markov blanket. Using these statistics as target values, the weights in the network are adapted by a local delta rule. We evaluate the strengths and weaknesses of these networks for problems in statistical pattern recognition.

1. Introduction
Multilayer neural networks trained by backpropagation provide a versatile framework for statistical pattern recognition. They are popular for many reasons, including the simplicity of the learning rule and the potential for discovering hidden, distributed representations of the problem space. Nevertheless, there are many issues that are...
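The mean-field propagation and local delta rule described in this abstract can be illustrated with a deliberately simplified sketch. This naive version propagates mean-field estimates layer by layer and omits the Markov-blanket feedback terms of the actual algorithm; all function names and parameters here are hypothetical, not the paper's.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mean_field_forward(weights, biases, mu0):
    # Propagate mean-field estimates mu layer by layer:
    # mu_next = sigmoid(W mu + b), the naive feedforward-only fixed point.
    mus = [mu0]
    for W, b in zip(weights, biases):
        mus.append(sigmoid(W @ mus[-1] + b))
    return mus

def delta_rule_step(W, b, mu_in, mu_out, target, lr=0.1):
    # Local delta rule: nudge weights and biases so each unit's
    # mean-field estimate moves toward its target value.
    err = target - mu_out
    return W + lr * np.outer(err, mu_in), b + lr * err

# Toy example: one layer mapping 3 visible means to 2 hidden means.
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 3))
b = np.zeros(2)
mu0 = np.array([0.9, 0.1, 0.5])
mu1 = mean_field_forward([W], [b], mu0)[-1]
W, b = delta_rule_step(W, b, mu0, mu1, target=np.array([1.0, 0.0]))
```

After one step the re-propagated means move monotonically toward the targets, since the sigmoid is monotone and the update shifts each unit's net input in the direction of its error.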
Attractor Dynamics in Feedforward Neural Networks
Cited by 10 (0 self)
In this article, we show that this linkage of attractor dynamics and probabilistic inference is not limited to symmetric networks or (equivalently) to models represented as undirected graphs. We investigate an attractor dynamics for feedforward networks, or directed acyclic graphs (DAGs); these are networks with directed edges but no directed loops. The probabilistic models represented by DAGs are known as Bayesian networks, and together with MRFs, they comprise the class of probabilistic models known as graphical models (Lauritzen, 1996). Like their undirected counterparts, Bayesian networks have been proposed as models of both artificial and biological intelligence (Pearl, 1988).
An Introduction to Variational Methods for Graphical Models
Machine Learning, 1998
Cited by 9 (0 self)
This paper presents a tutorial introduction to the use of variational methods for inference and learning in graphical models (Bayesian networks and Markov random fields). We present a number of examples of graphical models, including the QMR-DT database, the sigmoid belief network, the Boltzmann machine, and several variants of hidden Markov models, in which it is infeasible to run exact inference algorithms. We then introduce variational methods, which exploit laws of large numbers to transform the original graphical model into a simplified graphical model in which inference is efficient. Inference in the simplified model provides bounds on probabilities of interest in the original model. We describe a general framework for generating variational transformations based on convex duality. Finally, we return to the examples and demonstrate how variational algorithms can be formulated in each case.
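A standard instance of the convex-duality construction mentioned in this abstract is the variational bound on the logarithm (a concave function), obtained from its conjugate:

```latex
\ln x \;=\; \min_{\lambda > 0}\,\bigl\{\, \lambda x - \ln \lambda - 1 \,\bigr\},
\qquad\text{so for every fixed } \lambda > 0:\quad
\ln x \;\le\; \lambda x - \ln \lambda - 1 ,
```

with equality at \(\lambda = 1/x\) (set the derivative \(x - 1/\lambda\) to zero). A bound of this form replaces an intractable logarithmic term by a linear one at the cost of a variational parameter \(\lambda\), which is then optimized to tighten the bound.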
Prior Information and Generalized Questions
1996
Cited by 7 (4 self)
In learning problems, available information is usually divided into two categories: examples of function values (or training data) and prior information (e.g. a smoothness constraint).
Vapnik-Chervonenkis entropy of the spherical perceptron
1996
Perceptron learning of randomly labeled patterns is analyzed using a Gibbs distribution on the set of realizable labelings of the patterns. The entropy of this distribution is an extension of the Vapnik-Chervonenkis (VC) entropy, reducing to it exactly in the limit of infinite temperature. The close relationship between the VC and Gardner entropies can be seen within the replica formalism.

There has been recent progress towards understanding the relationship between the statistical physics and Vapnik-Chervonenkis (VC) approaches to learning theory [1, 2, 3, 4]. The two approaches can be unified in a statistical mechanics based on the VC entropy. This paper treats the case of learning randomly labeled patterns, or the capacity problem, and extends some of the results of previous work [5, 6] to finite temperature. As will be explained in a companion paper, this extension is important for treating the generalization problem, which occurs in the context of learning patterns labeled by a target rule. Our general framework is illustrated for the simple perceptron sgn(w · x), which maps an N-dimensional real-valued input x to a ±1-valued output. Given a sample X = (x_1, ..., x_m) of inputs, the weight vector w determines a labeling L = (l_1, ..., l_m) of the sample via l_i = sgn(w · x_i). The weight vector w defines a normal hyperplane that separates the positive from the negative examples. The training error of a labeling L with respect to a reference labeling L^0 is defined by e_t(L, L^0) = (1/m) Σ_{i=1}^{m} (1 - l_i l_i^0)/2.
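The labeling and training-error definitions in this abstract can be written out directly. The following is a minimal sketch; the helper names are mine, not the paper's.

```python
import numpy as np

def labeling(w, X):
    # Each input x_i receives label l_i = sgn(w · x_i);
    # map sgn(0) to +1 for definiteness.
    return np.where(X @ w >= 0, 1, -1)

def training_error(L, L0):
    # Fraction of inputs on which the two labelings disagree:
    # e_t(L, L0) = (1/m) * sum_i (1 - l_i * l0_i) / 2
    L, L0 = np.asarray(L), np.asarray(L0)
    return np.mean((1 - L * L0) / 2)

# Example: m = 3 inputs in N = 2 dimensions.
X = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
w = np.array([1.0, 1.0])
L = labeling(w, X)  # [1, 1, -1]
print(training_error(L, [1, -1, -1]))  # one disagreement out of three
```

Since each term (1 - l_i l0_i)/2 is 0 when the labels agree and 1 when they differ, e_t is exactly the disagreement rate, here 1/3.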
Exploration of Mean-Field Approximation for Feed-Forward Networks
1999
We present a formulation of mean-field approximation for layered feed-forward stochastic networks. In this formulation, one can obtain not only estimates of averages of the state variables of the networks but also of intra-layer correlations, the latter of which cannot be obtained by the conventional mean-field approximation. Moreover, this formulation provides a framework for treating "conditional" expectations, i.e., expectations under the constraint that external information about statistics is fed to some layers of the network, which plays an important role in several applications such as the Helmholtz machine.