This literature review discusses different methods under the general rubric of learning Bayesian networks from data, and includes some overlapping work on more general probabilistic networks. Connections are drawn between the statistical, neural network, and uncertainty communities, and between the different methodological communities, such as Bayesian, description length, and classical statistics. Basic concepts for learning and Bayesian networks are introduced and methods are then reviewed. Methods are discussed for learning parameters of a probabilistic network, for learning the structure, and for learning hidden variables. The presentation avoids formal definitions and theorems, as these are plentiful in the literature, and instead illustrates key concepts with simplified examples.

Keywords: Bayesian networks, graphical models, hidden variables, learning, learning structure, probabilistic networks, knowledge discovery.

I. Introduction

Probabilistic networks or probabilistic gra...
|
4735
|
Maximum Likelihood from incomplete data via the EM algorithm
– Dempster, Laird, et al.
- 1977
|
|
4701
|
Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference
– Pearl
- 1988
|
|
1405
|
Introduction to the Theory of Neural Computation
– Hertz, Krogh, et al.
- 1991
|
|
936
|
Local Computations with Probabilities on Graphical Structures and Their Applications to Expert Systems
– Lauritzen, Spiegelhalter
- 1988
|
|
726
|
A Bayesian method for the induction of probabilistic networks from data
– Cooper, Herskovits
- 1992
|
|
653
|
Information Theory and Statistics
– Kullback
- 1959
|
|
615
|
Learning Bayesian networks: The combination of knowledge and statistical data
– Heckerman, Geiger, et al.
- 1995
|
|
615
|
Generalized linear models
– Nelder, Wedderburn
- 1972
|
|
600
|
Bayesian Theory
– Bernardo, Smith
- 1994
|
|
476
|
Judgment Under Uncertainty: Heuristics and Biases
– Kahneman, Slovic, et al.
- 1982
|
|
422
|
Bayesian Learning for Neural Networks
– Neal
- 1996
|
|
416
|
Statistical Analysis of Finite Mixture Distributions
– Titterington, Smith, et al.
- 1985
|
|
385
|
Stochastic Complexity
– Rissanen
- 1987
|
|
366
|
A study of cross-validation and bootstrap for accuracy estimation and model selection
– Kohavi
- 1995
|
|
349
|
Approximating discrete probability distributions with dependence trees
– Chow, Liu
- 1968
|
|
344
|
Probabilistic inference using Markov chain Monte Carlo methods
– Neal
- 1993
|
|
302
|
Decision theoretic generalizations of the PAC model for neural net and other learning applications
– Haussler
- 1992
|
|
293
|
Stochastic Simulation
– Ripley
- 1987
|
|
281
|
Planning and Control
– Dean, Wellman
- 1991
|
|
266
|
Graphical Models in Applied Multivariate Statistics
– Whittaker
- 1990
|
|
221
|
Learning Bayesian Networks
– Heckerman, Geiger, et al.
- 1994
|
|
183
|
Bayesian networks without tears
– Charniak
- 1991
|
|
183
|
Model selection and accounting for model uncertainty in graphical models using Occam’s window
– Madigan, Raftery
- 1994
|
|
178
|
Operations for learning with graphical models
– Buntine
- 1994
|
|
176
|
Spatial Statistics
– Ripley
- 1981
|
|
167
|
Sequential updating of conditional probabilities on directed graphical structures
– Spiegelhalter, Lauritzen
- 1990
|
|
164
|
Bayesian analysis in expert systems
– Spiegelhalter, Dawid, et al.
- 1993
|
|
156
|
Uncertainty: A Guide to Dealing with Uncertainty in Quantitative Risk and Policy Analysis
– Morgan, Henrion
- 1990
|
|
150
|
The EM algorithm for graphical association models with missing data
– Lauritzen
- 1995
|
|
148
|
Learning Bayesian belief networks: An approach based on the MDL principle
– Lam, Bacchus
- 1994
|
|
145
|
Minimum complexity density estimation
– Barron, Cover
- 1991
|
|
141
|
Connectionist learning of belief networks
– Neal
- 1992
|
|
135
|
Equivalence and synthesis of causal models
– Verma, Pearl
- 1990
|
|
135
|
Theory refinement of Bayesian networks
– Buntine
- 1991
|
|
133
|
Building Expert Systems
– Hayes-Roth, Waterman, et al.
- 1983
|
|
128
|
Bayesian graphical models for discrete data
– Madigan, York
- 1995
|
|
121
|
Graphical models for association between variables, some of which are qualitative and some quantitative
– Lauritzen, Wermuth
- 1989
|
|
112
|
Tools for Statistical Inference
– Tanner
- 1996
|
|
111
|
Probabilistic Similarity Networks
– Heckerman
- 1991
|
|
109
|
Independence properties of directed Markov fields
– Lauritzen, Dawid, et al.
- 1990
|
|
108
|
Bayesian model selection in social research (with discussion)
– Raftery
- 1995
|
|
97
|
Mean field theory for sigmoid belief networks
– Saul, Jaakkola, et al.
- 1996
|
|
96
|
Learning Classification Trees
– Buntine
- 1992
|
|
95
|
Causal Diagrams for Empirical Research
– Pearl
- 1995
|
|
95
|
Unknown attribute values in induction
– Quinlan
- 1989
|
|
90
|
MLC++: A machine learning library in C++
– Kohavi, John, et al.
- 1994
|
|
84
|
Hyper Markov laws in the statistical analysis of decomposable graphical models
– Dawid, Lauritzen
- 1993
|
|
84
|
Applications of machine learning and rule induction
– Langley, Simon
- 1995
|
|
82
|
Information and Exponential Families in Statistical Theory
– Barndorff-Nielsen
- 1978
|
|
81
|
Correlation and Causation
– Wright
- 1921
|