## Mean Field Theory for Sigmoid Belief Networks (1996)

Venue: Journal of Artificial Intelligence Research

Citations: 116 (12 self)

### BibTeX

```bibtex
@article{Saul96meanfield,
  author  = {Lawrence K. Saul and Tommi Jaakkola and Michael I. Jordan},
  title   = {Mean Field Theory for Sigmoid Belief Networks},
  journal = {Journal of Artificial Intelligence Research},
  year    = {1996},
  volume  = {4},
  pages   = {61--76}
}
```

### Abstract

We develop a mean field theory for sigmoid belief networks based on ideas from statistical mechanics. Our mean field theory provides a tractable approximation to the true probability distribution in these networks.

### Citations

8564 | Elements of Information Theory - Cover, Thomas - 2006 |

8094 | Maximum likelihood from incomplete data via the EM algorithm - Dempster, Laird, et al. - 1977 |

Citation Context: ... In this case, the energy of each network configuration is given (up to a constant) by minus the logarithm of its probability under ... A similar average is performed in the E-step of an EM algorithm (Dempster, Laird, & Rubin, 1977); the difference here is that the average is performed over the mean field distribution, Q(H|V), rather than the true posterior, P(H|V). For a related discussion, see Neal & Hinton (1993). ...

7053 | Probabilistic Reasoning in Intelligent Systems - Pearl - 1988 |

Citation Context: ... Office of Naval Research contract N00014-94-1-0777. The authors were also supported by NSF grant CDA-9404932, ATR Research Laboratories, and Siemens Corporation. 1 Introduction: Bayesian belief networks (Pearl, 1988; Lauritzen & Spiegelhalter, 1988) provide a rich graphical representation of probabilistic models. The nodes in these networks represent random variables, while the links represent causal influences. ...

3720 | Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images - Geman, Geman - 1984 |

Citation Context: ... large number of hidden states. One approach to dealing with such networks has been to use Gibbs sampling (Pearl, 1988), a stochastic simulation methodology with roots in statistical mechanics (Geman & Geman, 1984). Our approach in this paper relies on a different tool from statistical mechanics, namely mean field theory (Parisi, 1988). The mean field approximation is well known for probabilistic models tha...

1774 | Introduction to the Theory of Neural Computation - Hertz, Krogh, et al. - 1991 |

Citation Context: ... E(S) = -Σ_ij J_ij S_i S_j - Σ_i h_i S_i + Σ_i ln[1 + exp(Σ_j J_ij S_j + h_i)] (14), as follows from eq. (6). The first two terms in this equation are familiar from Markov networks with pairwise interactions (Hertz, Krogh, & Palmer, 1991); the last term is peculiar to sigmoid belief networks. Note that the overall energy is neither a linear function of the weights nor a polynomial function of the units. This is the price we pay in ...
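
The energy function quoted in this context lends itself to a quick numerical check. The sketch below is a minimal illustration with invented toy weights (none of the values or function names come from the paper): with a lower-triangular weight matrix, the per-unit energy terms are exactly minus the log of the logistic conditionals, so the Boltzmann weights exp(-E) sum to one over all configurations.

```python
import math
from itertools import product

def energy(S, J, h):
    """Energy of configuration S for a sigmoid belief network:
    E(S) = -sum_ij J_ij S_i S_j - sum_i h_i S_i
           + sum_i ln[1 + exp(sum_j J_ij S_j + h_i)],
    with J lower triangular (weights run from parents j < i to unit i)."""
    E = 0.0
    for i in range(len(S)):
        z = sum(J[i][j] * S[j] for j in range(len(S))) + h[i]
        E += -z * S[i] + math.log1p(math.exp(z))
    return E

# Toy 3-unit network (illustrative values only).
J = [[0.0, 0.0, 0.0],
     [0.5, 0.0, 0.0],
     [-0.3, 0.8, 0.0]]
h = [0.1, -0.2, 0.4]

# Because E(S) equals -ln P(S) exactly for a DAG ordering,
# the partition sum over all 2^3 configurations is 1.
Z = sum(math.exp(-energy(S, J, h)) for S in product([0, 1], repeat=3))
```

This also makes concrete why the excerpt notes the energy is not polynomial in the units: the log-sum term couples each unit's parents nonlinearly.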

1364 | Generalized Linear Models - McCullagh, Nelder - 1989 |

Citation Context: ... propagate beliefs. In particular, P(S_i | pa(S_i)) depends on pa(S_i) only through a sum of weighted inputs, where the weights may be viewed as the parameters in a logistic regression (McCullagh & Nelder, 1983). The conditional probability distribution for S_i may be summarized as: P(S_i | pa(S_i)) = exp[(Σ_j J_ij S_j + h_i) S_i] / [1 + exp(Σ_j J_ij S_j + h_i)] (4). Note that substituting S_i = 1 in ...
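
Eq. (4) in this context is an ordinary per-unit logistic model. A minimal sketch, with invented weights and function names (not from the paper), showing that the two outcomes of S_i sum to one and that S_i = 1 recovers the standard sigmoid:

```python
import math

def p_cond(s_i, parents, weights, bias):
    """P(S_i | pa(S_i)) for a binary unit: exp(z * s_i) / (1 + exp(z)),
    where z is the weighted sum of parent states plus a bias."""
    z = sum(w * s for w, s in zip(weights, parents)) + bias
    return math.exp(z * s_i) / (1.0 + math.exp(z))

# Toy values: z = 0.5*1 + (-1.2)*0 + 0.3*1 - 0.1 = 0.7
on = p_cond(1, [1, 0, 1], [0.5, -1.2, 0.3], -0.1)
off = p_cond(0, [1, 0, 1], [0.5, -1.2, 0.3], -0.1)
```

Setting s_i = 1 gives exp(z)/(1+exp(z)) = 1/(1+exp(-z)), the logistic-regression form the excerpt refers to.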

1284 | Local Computations with Probabilities on Graphical Structures and Their Application to Expert Systems (with Discussion) - Lauritzen, Spiegelhalter - 1988 |

977 | Quantum Field Theory - Itzykson, Zuber - 1980 |

765 | A view of the em algorithm that justifies incremental, sparse, and other variants - Neal, Hinton - 1998 |

702 | Numerical Recipes - Press, Flannery, et al. - 1986 |

Citation Context: ... belong to different units in the network. The minimization over {ξ_i} therefore reduces to N independent minimizations over the interval [0, 1]. These can be done by any number of standard methods [10]. Finding the mean activities {μ_i}, on the other hand, is not so straightforward. One may naively perform gradient descent in the mean field free energy, eq. (25); the resulting gradients, however, ...

580 | The computational complexity of probabilistic inference using Bayesian belief networks - Cooper - 1990 |

Citation Context: ... conditional distributions required for inference; second, it provides a lower bound on the likelihoods required for learning. The problem of computing exact likelihoods in belief networks is NP-hard (Cooper, 1990); the same is true for approximating likelihoods to within a guaranteed degree of accuracy (Dagum & Luby, 1993). It follows that one cannot establish universal guarantees for the accuracy of the mean ...

431 | A learning algorithm for Boltzmann machines - Ackley, Hinton, et al. - 1985 |

248 | Operations for learning with graphical models - Buntine - 1994 |

Citation Context: ... of planning, reasoning, and uncertainty. Inference and learning in belief networks are possible insofar as one can efficiently compute (or approximate) the likelihood of observed patterns of evidence (Buntine, 1994; Russell, Binder, Koller, & Kanazawa, 1995). There exist provably efficient algorithms for computing likelihoods in belief networks with tree or chain-like architectures. In practice, these algorithms ...

239 | Statistical Field Theory - Parisi - 1988 |

Citation Context: ... stochastic simulation methodology with roots in statistical mechanics (Geman & Geman, 1984). Our approach in this paper relies on a different tool from statistical mechanics, namely mean field theory (Parisi, 1988). The mean field approximation is well known for probabilistic models that can be represented as undirected graphs, so-called Markov networks. For example, in Boltzmann machines (Ackley, Hinton, & Sejnowski, 1985) ...

224 | The wake-sleep algorithm for unsupervised neural networks - Hinton, Dayan, et al. - 1995 |

193 | The Helmholtz machine - Dayan, Hinton, et al. - 1995 |

181 | Connectionist learning of belief networks - Neal - 1992 |

Citation Context: ... representations as undirected graphs. As we shall see, avoiding this complexity and working directly on DAGs requires an extension of existing methods. In this paper we focus on sigmoid belief networks (Neal, 1992), for which the resulting mean field theory is most straightforward. These are networks of binary random variables whose local conditional distributions are based on log-linear models. ...

144 | A mean field theory learning algorithm for neural networks - Peterson, Anderson - 1987 |

118 | Introduction to the Theory of Neural Computation (Addison-Wesley) - Hertz, Krogh, et al. - 1991 |

Citation Context: ... lower bound on the likelihood of any partial instantiation of the network's activity. The feedforward directionality of belief networks gives rise to terms that do not appear in the mean field theory [6, 7, 8] for symmetric networks of binary units. These terms motivate the introduction of extra mean field parameters, in addition to the mean activities of the units in the network. The tightest possible bound ...

98 | Exploiting tractable substructures in intractable networks - Saul, Jordan - 1995 |

76 | Local learning in probabilistic networks with hidden variables - Russell, Binder, et al. - 1995 |

46 | Blocking Gibbs sampling in very large probabilistic expert systems - Jensen, Kong, Kjaerulff - 1993 |

41 | Computing upper and lower bounds on likelihoods in intractable networks - Jaakkola, Jordan - 1996 |

26 | Algebraic transformations of objective functions - Mjolsness, Garrett - 1990 |

Citation Context: ... descent in the mean field free energy, eq. (25); the resulting gradients, however, are quite complicated. We can make the problem more manageable by first performing a simple algebraic transformation [11]. In our case, the transformation is based on the following observation: let φ and ξ denote positive real numbers; then for all ξ, ln φ ≤ ξφ - (1 + ln ξ) (26), with strict equality holding for ξ = 1/φ. ...
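
The inequality quoted in this context is a standard variational bound, ln φ ≤ ξφ - (1 + ln ξ), linearizing the logarithm at the cost of an extra parameter ξ. A short numerical check (toy values, not from the paper) that the bound holds for positive ξ and is tight exactly at ξ = 1/φ:

```python
import math

def linear_bound(phi, xi):
    """Upper bound on ln(phi): xi*phi - (1 + ln(xi)), valid for phi, xi > 0.
    Minimizing over xi recovers ln(phi) at xi = 1/phi."""
    return xi * phi - (1.0 + math.log(xi))

phi = 2.5
gap_tight = linear_bound(phi, 1.0 / phi) - math.log(phi)  # zero gap at xi = 1/phi
gap_loose = linear_bound(phi, 1.0) - math.log(phi)        # positive gap elsewhere
```

The design point the excerpt exploits: the right-hand side is linear in φ, so its average under a factorized distribution is easy to compute, whereas the average of ln φ is not.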

21 | Fast learning by bounding likelihoods in sigmoid type belief networks - Jaakkola, Saul, et al. - 1996 |

19 | Approximating probabilistic inference in Bayesian belief networks is NP-hard - Dagum, Luby - 1993 |

9 | Annealed theories of learning - Seung - 1995 |

Citation Context: ... [4]. As in symmetric networks [6, 7], we expect the resulting mean field algorithms to scale much better to large problems than Gibbs sampling. Acknowledgements: We are grateful to the authors of refs. [4, 5, 9] for sending us early versions of their manuscripts and for providing many stimulating discussions about this work. We also thank P. Dayan and J. Tenenbaum for helpful comments on a rough draft. ...

9 | A mean field theory learning algorithm for neural networks - Peterson, Anderson - 1987 |

Citation Context: ... networks. For example, in Boltzmann machines (Ackley, Hinton, & Sejnowski, 1985), mean field learning rules have been shown to yield tremendous savings in time and computation over sampling-based methods (Peterson & Anderson, 1987). The main motivation for this work was to extend the mean field approximation for undirected graphical models to their directed counterparts. Since belief networks can be transformed to Markov networks ...

6 | Does the wake-sleep algorithm learn good density estimators? - Frey, Hinton, et al. - 1996 |

2 | Mixture model approximations for belief networks. Manuscript in preparation - Jaakkola, Jordan - 1996 |
