We discuss Bayesian methods for model averaging and model selection among Bayesian-network models with hidden variables. In particular, we examine large-sample approximations for the marginal likelihood of naive-Bayes models in which the root node is hidden. Such models are useful for clustering or unsupervised learning. We consider a Laplace approximation and the less accurate but more computationally efficient approximation known as the Bayesian Information Criterion (BIC), which is equivalent to Rissanen's (1987) Minimum Description Length (MDL). We also consider approximations that ignore some off-diagonal elements of the observed information matrix and an approximation proposed by Cheeseman and Stutz (1995). We evaluate the accuracy of these approximations using a Monte-Carlo gold standard. In experiments with artificial and real examples, we find that (1) none of the approximations are accurate when used for model averaging, (2) all of the approximations, with the exception of BI...
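To make the difference between the Laplace and BIC approximations concrete, the following is a minimal sketch on a toy model. It is not the naive-Bayes model with a hidden root studied in the paper. It assumes a single Bernoulli parameter with a Beta(a, b) prior, chosen because the exact log marginal likelihood is available in closed form and can stand in for the Monte-Carlo gold standard. All function names are hypothetical and introduced only for illustration.

```python
import math


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def exact_log_marginal(h, t, a, b):
    """Exact log p(D) for one ordered sequence with h successes and t failures."""
    return log_beta(a + h, b + t) - log_beta(a, b)


def bic_log_marginal(h, t):
    """BIC: log-likelihood at the MLE minus (d/2) log N, with d = 1 here.

    Assumes 0 < h < h + t so that the MLE lies strictly inside (0, 1).
    """
    n = h + t
    theta = h / n
    loglik = h * math.log(theta) + t * math.log(1.0 - theta)
    return loglik - 0.5 * math.log(n)


def laplace_log_marginal(h, t, a, b):
    """Laplace approximation around the MAP estimate.

    Assumes a + h > 1 and b + t > 1 so that the posterior mode is interior.
    """
    ah = a + h - 1.0  # exponent of theta in the unnormalized posterior
    bt = b + t - 1.0  # exponent of (1 - theta)
    theta = ah / (ah + bt)  # MAP estimate
    # log p(D | theta) + log p(theta), evaluated at the MAP
    log_joint = ah * math.log(theta) + bt * math.log(1.0 - theta) - log_beta(a, b)
    # Negative second derivative of the log joint at the MAP (a 1x1 "matrix")
    neg_hessian = ah / theta**2 + bt / (1.0 - theta) ** 2
    return log_joint + 0.5 * math.log(2.0 * math.pi) - 0.5 * math.log(neg_hessian)


if __name__ == "__main__":
    h, t, a, b = 60, 40, 2.0, 2.0  # illustrative values, not from the paper
    print("exact  :", exact_log_marginal(h, t, a, b))
    print("Laplace:", laplace_log_marginal(h, t, a, b))
    print("BIC    :", bic_log_marginal(h, t))
```

In this toy setting the Laplace value lands closer to the exact value than BIC does, since BIC drops the prior and the curvature terms. The abstract's conclusions concern hidden-variable models, where the likelihood is multimodal and this one-parameter sketch should not be read as evidence either way.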
4750 | Maximum Likelihood from Incomplete Data Using the EM Algorithm – Dempster, Laird, et al. – 1977
2451 | Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images – Geman, Geman – 1984
2227 | UCI repository of machine learning databases – Blake, Merz – 1998
974 | Estimating the dimension of a model – Schwarz – 1978
727 | A Bayesian method for the induction of probabilistic networks from data – Cooper, Herskovits – 1992
684 | Statistical Decision Theory and Bayesian Analysis, 2nd Edition – Berger – 1985
616 | Learning Bayesian networks: The combination of knowledge and statistical data – Heckerman, Geiger, et al. – 1995
601 | Bayesian Theory – Bernardo, Smith – 2000
509 | Bayes factors – Kass, Raftery – 1995
362 | Bayesian classification (AutoClass): Theory and results – Cheeseman, Stutz – 1996
355 | Bayesian interpolation – MacKay – 1992
345 | Probabilistic inference using Markov chain Monte Carlo methods – Neal – 1993
308 | Markov Chain Monte Carlo in Practice – Gilks, Richardson, et al. – 1996
304 | A practical Bayesian framework for backpropagation networks – MacKay – 1992
271 | Causation, Prediction and Search – Spirtes, Glymour, et al. – 2000
241 | Theory of Probability – Jeffreys – 1948
221 | Learning Bayesian Networks – Heckerman, Geiger, et al. – 1994
178 | Operations for learning with graphical models – Buntine – 1994
164 | Bayesian analysis in expert systems – Spiegelhalter, Dawid, et al. – 1993
128 | Marginal likelihood from the Gibbs output – Chib – 1995
128 | Bayesian graphical models for discrete data – Madigan, York – 1995
119 | Bayesian updating in recursive graphical models by local computation – Jensen, Lauritzen, et al. – 1990
108 | Bayesian model selection in social research (with discussion) – Raftery – 1995
102 | Learning by being told and learning from examples: An experimental comparison of the two methods of knowledge acquisition in the context of developing an expert system for soybean disease diagnosis – Michalski, Chilausky – 1980
100 | Inference and missing data – Rubin – 1976
97 | Mean field theory for sigmoid belief networks – Saul, Jaakkola, et al. – 1996
83 | Assessment and propagation of model uncertainty (with discussion) – Draper – 1995
78 | Improving the convergence of back propagation learning with second-order methods – Becker – 1988
72 | Learning Gaussian networks – Geiger, Heckerman – 1994
71 | Approximate Bayes factors and accounting for model uncertainty in generalised linear models – Raftery – 1996
69 | A reference Bayesian test for nested hypotheses and its relationship to the Schwarz criterion – Kass, Wasserman – 1995
63 | Local learning in probabilistic networks with hidden variables – Russell, Binder, et al. – 1995
49 | PROTOS - an exemplar-based learning apprentice – Bareiss, Porter – 1987
44 | The Estimation of Probabilities – Good – 1965
39 | On the choice of a model to fit data from an exponential family – Haughton – 1988
34 | Using EM to obtain asymptotic variance-covariance matrices: the SEM algorithm – Meng, Rubin – 1991
32 | Asymptotic Model Selection for Directed Networks with Hidden Variables – Geiger, Heckerman, et al. – 1996
27 | Hypothesis testing and model selection – Raftery – 1996
25 | Stochastic Complexity (with discussion) – Rissanen – 1987
22 | Computing second derivatives in feed-forward networks: A review – Buntine, Weigend – 1994
21 | Bayesian Mixture Modeling by Monte Carlo Simulation – Neal – 1991
15 | A guide to the literature on learning graphical models – Buntine – 1996
15 | Optimal discriminant plane for a small number of samples and design method of classifier on the plane – Hong, Yang – 1991
13 | Asymptotics in Bayesian computation – Kass, Tierney, et al. – 1988
12 | Choice of basis for the Laplace approximation – MacKay – 1996
10 | An application of the Laplace method to finite mixture distributions – Crawford – 1994
7 | Likelihoods and priors for Bayesian networks – Heckerman, Geiger – 1996
6 | Latent class models. In Handbook of statistical modeling for the social and behavioral sciences – Clogg – 1995
5 | Laplace's method approximations for probabilistic inference in belief networks with continuous variables – Azevedo-Filho – 1994
3 | Quantified maximum entropy. memsys5 user's manual – Gull – 1991