## Variational inference for continuous sigmoidal Bayesian networks (1996)

Venue: Sixth International Workshop on Artificial Intelligence and Statistics

Citations: 2 (2 self)

### BibTeX

```bibtex
@INPROCEEDINGS{Frey96variationalinference,
  author    = {Brendan J. Frey},
  title     = {Variational inference for continuous sigmoidal Bayesian networks},
  booktitle = {Sixth International Workshop on Artificial Intelligence and Statistics},
  year      = {1996}
}
```

### Abstract

Latent random variables can be useful for modelling covariance relationships between observed variables. The choice of whether specific latent variables ought to be continuous or discrete is often an arbitrary one. In a previous paper, I presented a "unit" that could adapt to be continuous or binary, as appropriate for the current problem, and showed how a Markov chain Monte Carlo method could be used for inference and parameter estimation in Bayesian networks of these units. In this paper, I develop a variational inference technique in the hope that it will prove to be more computationally efficient than Monte Carlo methods. After presenting promising inference results on a toy problem, I discuss why the variational technique does not work well for parameter estimation as compared to Monte Carlo.
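The variational approach summarized in the abstract fits a tractable distribution Q to the intractable posterior by minimizing a free energy F(Q; P). A minimal sketch of that idea is below; the two-unit sigmoid network, its parameters, and the grid search are hypothetical illustrations, not taken from the paper. For a single binary hidden unit the factorized Q can represent the exact posterior, so the minimizer is easy to verify by enumeration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical two-unit sigmoid belief network:
#   hidden  h in {0,1} with prior   p(h=1)   = sigmoid(b)
#   visible v in {0,1} with         p(v=1|h) = sigmoid(w*h + c)
b, w, c = 0.5, 2.0, -1.0
v = 1  # observed value of the visible unit

def log_joint(h, v):
    """log p(h, v) for the tiny network above."""
    ph = sigmoid(b) if h == 1 else 1.0 - sigmoid(b)
    pv = sigmoid(w * h + c) if v == 1 else 1.0 - sigmoid(w * h + c)
    return math.log(ph) + math.log(pv)

def free_energy(mu):
    """Variational free energy F(Q; P) = E_Q[log Q(h) - log p(h, v)]
    for the one-parameter posterior approximation Q(h=1) = mu."""
    f = 0.0
    for h, q in ((1, mu), (0, 1.0 - mu)):
        if q > 0.0:
            f += q * (math.log(q) - log_joint(h, v))
    return f

# "Inference" here is just minimizing F over a grid of candidate mu values.
mus = [i / 1000.0 for i in range(1, 1000)]
mu_star = min(mus, key=free_energy)

# Exact posterior p(h=1 | v) by enumeration, tractable for one hidden unit.
p1, p0 = math.exp(log_joint(1, v)), math.exp(log_joint(0, v))
posterior = p1 / (p1 + p0)
```

At the exact posterior the free energy equals -log p(v), which is the sense in which F(Q; P) upper-bounds the negative log evidence; in richer networks Q cannot match the posterior exactly and the gap is what the variational optimization shrinks.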

### Citations

7053 | Probabilistic Reasoning in Intelligent Systems - Pearl - 1988

Citation Context: ...are best represented using real values. A great deal of work has been done on Gaussian random variables that are linked linearly such that the joint distribution over all variables is also Gaussian (Pearl 1988; Shachter and Kenley 1989; Spiegelhalter 1990; Heckerman and Geiger 1995) --- see also "factor analysis" (Everitt 1984). Lauritzen and Wermuth (1989) and Lauritzen, Dawid, Larsen, and Leimer (1990) h...

1363 | Generalized linear models - McCullagh, Nelder - 1990

Citation Context: ...y, there has been a surge of interest in inference and parameter estimation in Bayesian networks with discrete-valued variables whose conditional distributions are modelled using logistic regression (McCullagh and Nelder 1983). Approximate inference methods for richly-connected Bayesian networks of this sort have been developed, including Markov chain Monte Carlo methods (Neal 1992), Helmholtz machines (Dayan et al. 1995;...

1284 | Local Computations with Probabilities on Graphical Structures and Their Application to Expert Systems (with Discussion) - Lauritzen, Spiegelhalter - 1988

Citation Context: ...nte Carlo. Introduction Inference in multiply-connected Bayesian networks with real-valued random variables is a difficult problem. Methods such as probability propagation (Gallager 1963; Pearl 1986; Lauritzen and Spiegelhalter 1988) are exact only for singly-connected networks, and techniques for converting multiply-connected networks to singly-connected networks often lead to overly-complex cluster variables. Real-valued varia...

1081 | Practical Methods of Optimization - Fletcher - 1981

Citation Context: ...idge. The upper bound on F(Q; P) given by the variance bound is obviously not quadratic in the s_j's or s̄_j's. I have implemented a variational optimization algorithm that uses conjugate gradients (Fletcher 1987) to minimize the upper bound on F(Q; P). Results The variational inference algorithm described above was applied to a toy network that was estimated using the "slice sampling" Markov chain Monte Ca...
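The excerpt above says the bound on F(Q; P) was minimized with conjugate gradients in the style of Fletcher. As a rough sketch of that optimizer, here is a Fletcher-Reeves conjugate-gradient loop applied to a stand-in quadratic objective f(x) = ½ x·Ax - b·x; the matrix, vector, and exact line search are illustrative assumptions, not the paper's actual (non-quadratic) bound, which would instead use a numerical line search on the gradient of F.

```python
# Fletcher-Reeves conjugate gradients on a toy quadratic
# f(x) = 0.5 * x.A.x - b.x  (a stand-in for the variational bound).
def cg_minimize(A, b, x0, iters=50):
    n = len(x0)
    mv = lambda M, v: [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
    x = list(x0)
    g = [gi - bi for gi, bi in zip(mv(A, x), b)]  # gradient: A x - b
    d = [-gi for gi in g]                         # initial search direction
    for _ in range(iters):
        Ad = mv(A, d)
        dAd = sum(di * adi for di, adi in zip(d, Ad))
        if dAd == 0.0:
            break  # converged: direction has vanished
        # Exact line search along d for a quadratic objective.
        alpha = -sum(gi * di for gi, di in zip(g, d)) / dAd
        x = [xi + alpha * di for xi, di in zip(x, d)]
        g_new = [gi - bi for gi, bi in zip(mv(A, x), b)]
        # Fletcher-Reeves beta: ||g_new||^2 / ||g||^2.
        beta = sum(gi * gi for gi in g_new) / max(sum(gi * gi for gi in g), 1e-300)
        g = g_new
        d = [-gi + beta * di for gi, di in zip(g, d)]
    return x

A = [[3.0, 1.0], [1.0, 2.0]]  # symmetric positive definite (illustrative)
b = [1.0, 1.0]
x = cg_minimize(A, b, [0.0, 0.0])  # minimizer solves A x = b
```

On a quadratic with exact line search, conjugate gradients reaches the minimizer in at most n steps; on the non-quadratic variational bound it behaves like any nonlinear CG method and must be paired with a line search.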

889 | Low-density parity-check codes - Gallager - 1962

Citation Context: ...stimation as compared to Monte Carlo. Introduction Inference in multiply-connected Bayesian networks with real-valued random variables is a difficult problem. Methods such as probability propagation (Gallager 1963; Pearl 1986; Lauritzen and Spiegelhalter 1988) are exact only for singly-connected networks, and techniques for converting multiply-connected networks to singly-connected networks often lead to overl...

764 | A view of the EM algorithm that justifies incremental, sparse, and other variants - Neal, Hinton - 1998

Citation Context: ...by Ghahramani, Jaakkola, Jordan and Saul (Saul et al. 1996; Jaakkola et al. 1996; Ghahramani and Jordan 1996). It is based on a variational interpretation of the expectation maximization algorithm (Neal and Hinton 1993) that was used to devise the Helmholtz machine (Dayan et al. 1995; Hinton et al. 1995). The idea is to introduce a second parametric distribution Q({x_j}_{j∈H}) over the hidden variables, whose par...

580 | The computational complexity of probabilistic inference using Bayesian belief networks - Cooper - 1990

Citation Context: ...f a set of visible (observed) variables {x_i}_{i∈V}, inferring the distribution P({x_j}_{j∈H} | {x_i}_{i∈V}) over the remaining set of hidden (unobserved) variables {x_j}_{j∈H}, is in general NP-hard (Cooper 1990). Inference is especially difficult when the variables are real-valued. Efficient algorithms such as probability propagation (Gallager 1963; Pearl 1986; Lauritzen and Spiegelhalter 1988) are exact onl...

489 | Factorial Hidden Markov Models - Ghahramani, Jordan - 1998

376 | Evaluating influence diagrams - Shachter - 1986

Citation Context: ...presented using real values. A great deal of work has been done on Gaussian random variables that are linked linearly such that the joint distribution over all variables is also Gaussian (Pearl 1988; Shachter and Kenley 1989; Spiegelhalter 1990; Heckerman and Geiger 1995) --- see also "factor analysis" (Everitt 1984). Lauritzen and Wermuth (1989) and Lauritzen, Dawid, Larsen, and Leimer (1990) have included discrete rand...

283 | Learning and relearning in Boltzmann machines - Hinton, Sejnowski - 1986

224 | The wake-sleep algorithm for unsupervised neural networks - Hinton, Dayan, et al. - 1995

197 | Sequential updating of conditional probabilities on directed graphical structures - Spiegelhalter, Lauritzen - 1990

193 | The Helmholtz machine - Dayan, Hinton, et al. - 1995

181 | Connectionist learning of belief networks - Neal - 1992

Citation Context: ...sing logistic regression (McCullagh and Nelder 1983). Approximate inference methods for richly-connected Bayesian networks of this sort have been developed, including Markov chain Monte Carlo methods (Neal 1992), Helmholtz machines (Dayan et al. 1995; Hinton et al. 1995), and variational (sometimes called "mean field") techniques (Saul et al. 1996; Jaakkola et al. 1996). However, some hidden variables,...

164 | Graphical models for associations between variables, some of which are qualitative and some quantitative, Annals of Statistics - Lauritzen, Wermuth - 1989

140 | Independence properties of directed Markov fields - Lauritzen, Dawid, et al. - 1990

132 | An introduction to latent variable models - Everitt - 1984

Citation Context: ...ked linearly such that the joint distribution over all variables is also Gaussian (Pearl 1988; Shachter and Kenley 1989; Spiegelhalter 1990; Heckerman and Geiger 1995) --- see also "factor analysis" (Everitt 1984). Lauritzen and Wermuth (1989) and Lauritzen, Dawid, Larsen, and Leimer (1990) have included discrete random variables within the linear Gaussian framework. Recently, inference in networks of Gaussia...

116 | Mean field theory for sigmoid belief networks - Saul, Jaakkola, et al. - 1996

46 | Markov chain Monte Carlo methods based on `slicing' the density function - Neal - 1997

Citation Context: ...opagation (Gallager 1963; Pearl 1986; Lauritzen and Spiegelhalter 1988) are exact only for singly-connected networks. In (Frey 1997), I used a Markov chain Monte Carlo method called "slice sampling" (Neal 1996) to obtain an approximate sample from P({x_j}_{j∈H} | {x_i}_{i∈V}), which could then be used for inference. In contrast to both the rather unprincipled approach of applying probability propagation to...

43 | Learning Bayesian networks: A unification for discrete and Gaussian domains - Heckerman, Geiger - 1995

40 | Bayesian neural networks and density networks - MacKay - 1995

21 | Implementation of continuous Bayesian networks using sums of weighted Gaussians - Driver, Morrell - 1995

21 | Discovering structure in continuous variables using Bayesian networks - Hofmann, Tresp - 1996

21 | Fast learning by bounding likelihoods in sigmoid type belief networks - Jaakkola, Saul, et al. - 1996

17 | EM optimization of latent-variable density models - Bishop, Svensén, et al. - 1996

1 | Continuous sigmoidal Bayesian networks trained using slice sampling - Frey - 1997

Citation Context: ...Tresp (1996) consider the case of inference and learning in continuous belief networks that may be richly connected. They use mixture models and Parzen windows to implement conditional densities. In (Frey 1997), I presented a simple, but versatile, real-valued random "unit" that can operate in several different modes ranging from deterministic to binary stochastic to continuous stochastic. This spectrum of...