## Bayesian Parameter Estimation Via Variational Methods (1999)

Citations: 105 (5 self)

### BibTeX

@MISC{Jaakkola99bayesianparameter,
    author = {Tommi S. Jaakkola and Michael I. Jordan},
    title = {Bayesian Parameter Estimation Via Variational Methods},
    year = {1999}
}

### Abstract

We consider a logistic regression model with a Gaussian prior distribution over the parameters. We show that an accurate variational transformation can be used to obtain a closed-form approximation to the posterior distribution of the parameters, thereby yielding an approximate posterior predictive model. This approach is readily extended to binary graphical models with complete observations. For graphical models with incomplete observations we utilize an additional variational transformation and again obtain a closed-form approximation to the posterior. Finally, we show that the dual of the regression problem gives a latent variable density model, the variational formulation of which leads to exactly solvable EM updates.
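The abstract's central idea can be illustrated with a small sketch. It assumes the commonly cited form of the variational lower bound on the logistic function, $\sigma(z) \ge \sigma(\xi)\exp\{(z-\xi)/2 - \lambda(\xi)(z^2-\xi^2)\}$ with $\lambda(\xi) = \tanh(\xi/2)/(4\xi)$, which is quadratic in $z$ inside the exponent and therefore keeps a Gaussian prior Gaussian. For brevity the sketch treats a single scalar parameter and updates all variational parameters in batch; the function names, the batch scheme, and the toy data are illustrative assumptions, not the paper's own (sequential) procedure.

```python
import math


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def lam(xi):
    """lambda(xi) = tanh(xi / 2) / (4 xi); its limit as xi -> 0 is 1/8."""
    if abs(xi) < 1e-8:
        return 0.125
    return math.tanh(xi / 2.0) / (4.0 * xi)


def sigmoid_lower_bound(z, xi):
    """Variational lower bound on sigmoid(z); tight at z = +xi and z = -xi."""
    return sigmoid(xi) * math.exp((z - xi) / 2.0 - lam(xi) * (z * z - xi * xi))


def fit_scalar_posterior(xs, ss, m0=0.0, v0=1.0, n_iter=50):
    """Gaussian approximation N(m, v) to the posterior over a scalar parameter.

    Assumed model: P(s | x, theta) = sigmoid((2s - 1) * theta * x), s in {0, 1},
    with prior theta ~ N(m0, v0). Each likelihood term is replaced by its
    quadratic lower bound, so the approximate posterior stays Gaussian.
    """
    xis = [1.0] * len(xs)  # one variational parameter per observation
    m, v = m0, v0
    for _ in range(n_iter):
        # Closed-form Gaussian update given the current variational parameters.
        precision = 1.0 / v0 + 2.0 * sum(lam(xi) * x * x for xi, x in zip(xis, xs))
        v = 1.0 / precision
        m = v * (m0 / v0 + sum((s - 0.5) * x for s, x in zip(ss, xs)))
        # Re-optimize each xi: xi^2 = x^2 * E[theta^2] under N(m, v).
        xis = [math.sqrt(x * x * (v + m * m)) for x in xs]
    return m, v


if __name__ == "__main__":
    # Toy data (hypothetical): positive inputs labelled 1, negative inputs labelled 0.
    mean, var = fit_scalar_posterior([1.0, 2.0, -1.0, -2.0], [1, 1, 0, 0])
    print("posterior mean:", mean, "posterior variance:", var)
```

The same two-step iteration (a Gaussian update followed by re-optimizing each $\xi$) carries over to a parameter vector by replacing the scalar precision with a precision matrix.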

### Citations

3278 | Convex analysis - Rockafellar - 1970 |

1375 | Generalized Linear Models - McCullagh, Nelder - 1989 |

1042 | Renewal theory and its - Smith - 1958 |
Citation Context: ...orm treatment of uncertainty at all levels of the modeling process. The formalism also allows ready incorporation of prior knowledge and the seamless combination of such knowledge with observed data (Bernardo & Smith 1994, Gelman 1995, Heckerman et al. 1995). The elegant semantics, however, often comes at a sizable computational cost: posterior distributions resulting from the incorporation of observed data must be rep...

905 | Learning Bayesian networks: the combination of knowledge and statistical - Heckerman, Geiger, et al. - 1995 |
Citation Context: ...levels of the modeling process. The formalism also allows ready incorporation of prior knowledge and the seamless combination of such knowledge with observed data (Bernardo & Smith 1994, Gelman 1995, Heckerman et al. 1995). The elegant semantics, however, often comes at a sizable computational cost: posterior distributions resulting from the incorporation of observed data must be represented and updated, and this gener...

835 | An Introduction to Variational Methods for Graphical Models - Jordan, Ghahramani, et al. - 1998 |

564 | Probabilistic inference using Markov chain Monte Carlo methods - Neal - 1993 |
Citation Context: ...es into improved accuracy of the approximation. Variational methods can also be contrasted with sampling techniques, which have become the method of choice in Bayesian statistics (Thomas et al. 1992, Neal 1993, Gilks et al. 1996). Sampling techniques enjoy wide applicability and can be powerful in evaluating multi-dimensional integrals and representing posterior distributions. They do not, however, yield c...

549 | Markov Chain Monte Carlo in Practice - Gilks, Richardson, et al. - 1996 |
Citation Context: ...roved accuracy of the approximation. Variational methods can also be contrasted with sampling techniques, which have become the method of choice in Bayesian statistics (Thomas et al. 1992, Neal 1993, Gilks et al. 1996). Sampling techniques enjoy wide applicability and can be powerful in evaluating multi-dimensional integrals and representing posterior distributions. They do not, however, yield closed form solution...

196 | Sequential updating of conditional probabilities on directed graphical structures. Networks - Spiegelhalter, Lauritzen - 1990 |
Citation Context: ...osterior (conjugacy), which we optimize variationally. This procedure is iterated for each successive data point. Our methods can be compared to the Laplace approximation for logistic regression (cf. Spiegelhalter & Lauritzen 1990), a closely related method which also utilizes a Gaussian approximation to the posterior. To anticipate the discussion in following sections, we will see that the variational approach has an advantag...

132 | An introduction to latent variable models - Everitt - 1984 |
Citation Context: ... [Figure 9: a) Bayesian regression problem. b) The dual problem.] density model over binary vectors is akin to the standard factor analysis model (see e.g. Everitt 1984). This model has already been used to facilitate visualization of high dimensional binary vectors (Tipping 1999). We now turn to a more technical treatment of this latent variable model. The joint di...

128 | Keeping neural networks simple by minimizing the description length of the weights - Hinton, Camp - 1993 |

98 | Exploiting tractable substructures in intractable networks - Saul, Jordan - 1996 |

20 | Variational methods in statistics - Rustagi - 1976 |
Citation Context: ... generically as variational methods. Variational techniques have been used extensively in the physics literature (see, e.g., Parisi 1988, Sakurai 1985) and have also found applications in statistics (Rustagi 1976). Roughly speaking, the objective of these methods is to transform the problem of interest into an optimization problem via the introduction of extra degrees of freedom known as variational parameter...

20 | Probabilistic visualisation of high-dimensional binary data - Tipping - 1999 |
Citation Context: ...nsity model over binary vectors is akin to the standard factor analysis model (see e.g. Everitt 1984). This model has already been used to facilitate visualization of high dimensional binary vectors (Tipping 1999). We now turn to a more technical treatment of this latent variable model. The joint distribution is given by $P(S_1, \ldots, S_n \mid X) = \int \left[ \prod_i P(S_i \mid X_i, \theta) \right] P(\theta)\, d\theta$ (34) where the conditional ...

17 | Bayesian Data Analysis. Boca - Gelman, Carlin, et al. - 2004 |
Citation Context: ...ainty at all levels of the modeling process. The formalism also allows ready incorporation of prior knowledge and the seamless combination of such knowledge with observed data (Bernardo & Smith 1994, Gelman 1995, Heckerman et al. 1995). The elegant semantics, however, often comes at a sizable computational cost: posterior distributions resulting from the incorporation of observed data must be represented and ...

13 | Mean theory for sigmoid belief networks - Saul, Jaakkola, et al. - 1996 |

10 | Duality between learning machines: a bridge between supervised and unsupervised learning - Nadal, Parga - 1994 |
Citation Context: ...st variational approximation remains accurate. 6 The dual problem. In the logistic regression formulation (eq. (1)), the parameters $\theta$ and the explanatory variables X play a dual or symmetric role (cf. Nadal and Parga 1994). In the Bayesian logistic regression setting, the symmetry is broken by associating the same parameter vector $\theta$ with multiple occurrences of the explanatory variables X as shown in Figure 9. Alternat...

9 | Ensemble learning for hidden Markov models. Unpublished manuscript - MacKay - 1997 |

9 | Spin-Glass Theory and - Mézard, Parisi, et al. |
Citation Context: ...deterministic approximation methods that we develop in this paper are known generically as variational methods. Variational techniques have been used extensively in the physics literature (see, e.g., Parisi 1988, Sakurai 1985) and have also found applications in statistics (Rustagi 1976). Roughly speaking, the objective of these methods is to transform the problem of interest into an optimization problem via...