Efficient Approximations for the Marginal Likelihood of Bayesian Networks with Hidden Variables (1996) [126 citations — 7 self]

by David Maxwell Chickering, David Heckerman
Machine Learning

Abstract:

We discuss Bayesian methods for model averaging and model selection among Bayesian-network models with hidden variables. In particular, we examine large-sample approximations for the marginal likelihood of naive-Bayes models in which the root node is hidden. Such models are useful for clustering or unsupervised learning. We consider a Laplace approximation and the less accurate but more computationally efficient approximation known as the Bayesian Information Criterion (BIC), which is equivalent to Rissanen's (1987) Minimum Description Length (MDL). Also, we consider approximations that ignore some off-diagonal elements of the observed information matrix and an approximation proposed by Cheeseman and Stutz (1995). We evaluate the accuracy of these approximations using a Monte-Carlo gold standard. In experiments with artificial and real examples, we find that (1) none of the approximations are accurate when used for model averaging, (2) all of the approximations, with the exception of BI...
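
As a rough illustration of the two large-sample scores named in the abstract, the sketch below writes out the textbook forms of the BIC and Laplace approximations to the log marginal likelihood. It is not the authors' implementation. The function and argument names are hypothetical, and the caller is assumed to have already fitted the model and to supply the log-likelihood, the log prior, the number of free parameters, and the log-determinant of the negative Hessian of the log posterior.

```python
import math


def bic_score(log_likelihood_at_mle, num_free_params, num_cases):
    """BIC approximation to the log marginal likelihood (Schwarz, 1978).

    log p(D | M) ~= log p(D | theta_hat, M) - (d / 2) * log N
    All argument names are illustrative assumptions.
    """
    return log_likelihood_at_mle - 0.5 * num_free_params * math.log(num_cases)


def laplace_score(log_likelihood_at_map, log_prior_at_map,
                  num_free_params, log_det_neg_hessian):
    """Laplace approximation to the log marginal likelihood.

    log p(D | M) ~= log p(D | theta_map, M) + log p(theta_map | M)
                    + (d / 2) * log(2 * pi) - (1 / 2) * log |A|
    where A is the negative Hessian of the log posterior at theta_map.
    The caller supplies log |A| directly (an assumption of this sketch).
    """
    return (log_likelihood_at_map
            + log_prior_at_map
            + 0.5 * num_free_params * math.log(2.0 * math.pi)
            - 0.5 * log_det_neg_hessian)


if __name__ == "__main__":
    # Made-up numbers, purely to show the call shape.
    print(bic_score(-100.0, 5, 1000))
    print(laplace_score(-10.0, -1.0, 2, 0.0))
```

BIC keeps only the terms of the Laplace expression that grow with the number of cases, which is why it needs neither the prior nor the Hessian and is cheaper to compute.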

Citations

4750 Maximum Likelihood from Incomplete Data Using the EM Algorithm – Dempster, Laird, et al. - 1977
2451 Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images – Geman, Geman - 1984
2227 UCI repository of machine learning databases – Blake, Merz - 1998
974 Estimating the dimension of a model – Schwarz - 1978
727 A Bayesian method for the induction of probabilistic networks from data – Cooper, Herskovits - 1992
684 Statistical Decision Theory and Bayesian Analysis, 2nd Edition – Berger - 1985
616 Learning Bayesian networks: The combination of knowledge and statistical data – Heckerman, Geiger, et al. - 1995
601 Bayesian Theory – Bernardo, Smith - 2000
509 Bayes factors – Kass, Raftery - 1995
362 Bayesian classification (AutoClass): Theory and results – Cheeseman, Stutz - 1996
355 Bayesian interpolation – MacKay - 1992
345 Probabilistic inference using Markov chain Monte Carlo methods – Neal - 1993
308 Markov Chain Monte Carlo in Practice – Gilks, Richardson, et al. - 1996
304 A practical Bayesian framework for backpropagation networks – MacKay - 1992
271 Causation, Prediction and Search – Spirtes, Glymour, et al. - 2000
241 Theory of Probability – Jeffreys - 1948
221 Learning Bayesian Networks – Heckerman, Geiger, et al. - 1994
178 Operations for learning with graphical models – Buntine - 1994
164 Bayesian analysis in expert systems – Spiegelhalter, Dawid, et al. - 1993
128 Marginal likelihood from the Gibbs output – Chib - 1995
128 Bayesian graphical models for discrete data – Madigan, York - 1995
119 Bayesian updating in recursive graphical models by local computation – Jensen, Lauritzen, et al. - 1990
108 Bayesian model selection in social research (with discussion) – Raftery - 1995
102 Learning by being told and learning from examples: An experimental comparison of the two methods of knowledge acquisition in the context of developing an expert system for soybean disease diagnosis – Michalski, Chilausky - 1980
100 Inference and missing data – Rubin - 1976
97 Mean field theory for sigmoid belief networks – Saul, Jaakkola, et al. - 1996
83 Assessment and propagation of model uncertainty (with discussion) – Draper - 1995
78 Improving the convergence of back propagation learning with second-order methods – Becker - 1988
72 Learning Gaussian networks – Geiger, Heckerman - 1994
71 Approximate Bayes factors and accounting for model uncertainty in generalised linear models – Raftery - 1996
69 A reference Bayesian test for nested hypotheses and its relationship to the Schwarz criterion – Kass, Wasserman - 1995
63 Local learning in probabilistic networks with hidden variables – Russell, Binder, et al. - 1995
49 PROTOS - an exemplar-based learning apprentice – Bareiss, Porter - 1987
44 The Estimation of Probabilities – Good - 1965
39 On the choice of a model to fit data from an exponential family – Haughton - 1988
34 Using EM to obtain asymptotic variance-covariance matrices: the SEM algorithm – Meng, Rubin - 1991
32 Asymptotic Model Selection for Directed Networks with Hidden Variables – Geiger, Heckerman, et al. - 1996
27 Hypothesis testing and model selection – Raftery - 1996
25 Stochastic Complexity (with discussion) – Rissanen - 1987
22 Computing second derivatives in feed-forward networks: A review – Buntine, Weigend - 1994
21 Bayesian Mixture Modeling by Monte Carlo Simulation – Neal - 1991
15 A guide to the literature on learning graphical models – Buntine - 1996
15 Optimal discriminant plane for a small number of samples and design method of classifier on the plane – Hong, Yang - 1991
13 Asymptotics in Bayesian computation – Kass, Tierney, et al. - 1988
12 Choice of basis for the Laplace approximation – MacKay - 1996
10 An application of the Laplace method to finite mixture distributions – Crawford - 1994
7 Likelihoods and priors for Bayesian networks – Heckerman, Geiger - 1996
6 Latent class models. In Handbook of statistical modeling for the social and behavioral sciences – Clogg - 1995
5 Laplace's method approximations for probabilistic inference in belief networks with continuous variables – Azevedo-Filho - 1994
3 Quantified maximum entropy. memsys5 user's manual – Gull - 1991