## Learning mixtures of DAG models (1997)

Citations: 26 (2 self)

### BibTeX

```bibtex
@TECHREPORT{Thiesson97learningmixtures,
  author      = {Bo Thiesson and Christopher Meek and David Maxwell Chickering and David Heckerman},
  title       = {Learning mixtures of DAG models},
  institution = {},
  year        = {1997}
}
```

### Abstract

We describe computationally efficient methods for learning mixtures in which each component is a directed acyclic graphical model (mixtures of DAGs, or MDAGs). We argue that simple search-and-score algorithms are infeasible for a variety of problems, and introduce a feasible approach in which parameter and structure search are interleaved and expected data is treated as if it were real data. Our approach can be viewed as a combination of (1) the Cheeseman–Stutz asymptotic approximation for the model posterior probability and (2) the Expectation–Maximization (EM) algorithm. We evaluate our procedure for selecting among MDAGs on synthetic and real examples.
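The key idea the abstract alludes to, treating expected data produced by the E step as if it were real data in the M step, can be illustrated in miniature. The sketch below is not the paper's MDAG procedure; it is a hypothetical two-component, one-dimensional Gaussian mixture fit by EM, where the fractional responsibilities from the E step are used exactly like real counts when re-estimating parameters.

```python
import numpy as np

def em_gaussian_mixture(x, n_iter=50):
    """EM for a two-component 1-D Gaussian mixture (illustrative sketch)."""
    mu = np.array([x.min(), x.max()], dtype=float)  # spread the initial means
    sigma = np.array([x.std(), x.std()])
    weight = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E step: posterior responsibility of each component for each point.
        dens = np.stack([
            weight[k] * np.exp(-0.5 * ((x - mu[k]) / sigma[k]) ** 2)
            / (sigma[k] * np.sqrt(2.0 * np.pi))
            for k in range(2)
        ])
        resp = dens / dens.sum(axis=0)
        # M step: the expected (fractional) counts in resp are treated
        # exactly like real counts when re-estimating the parameters.
        n_k = resp.sum(axis=1)
        weight = n_k / len(x)
        mu = (resp * x).sum(axis=1) / n_k
        sigma = np.sqrt((resp * (x - mu[:, None]) ** 2).sum(axis=1) / n_k)
    return weight, mu, sigma
```

In the paper's setting, this same completed-data trick is extended so that structure search over DAG components can score expected sufficient statistics as though they came from a complete dataset.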

### Citations

9193 | Maximum likelihood from incomplete data via the EM algorithm - Dempster, Laird, et al. - 1977 |

1194 | Bayesian Theory - Bernardo, Smith - 2000 |
Citation Context: ...for X, we compute the posterior distributions for each s^h and θ_s using Bayes' rule. We can use the model posterior probability for various forms of model comparison, including model averaging (e.g., Bernardo & Smith, 1994). In this work, we limit ourselves to the selection of a model with a high posterior probability. In what follows, we concentrate on model selection using the posterior model probability. To simplify... |

1148 | A Bayesian method for the induction of probabilistic networks from data - Cooper, Herskovits - 1992 |
Citation Context: ...cting among MDAGs on synthetic and real examples. 1 Introduction For almost a decade, statisticians and computer scientists have used directed-acyclic graph (DAG) models for learning from data (e.g., Cooper & Herskovits, 1992; Spirtes, Glymour, & Scheines, 1993; Spiegelhalter, Dawid, Lauritzen, & Cowell, 1993; Buntine, 1994; and Heckerman, Geiger, & Chickering, 1995). In this paper, we consider mixtures of DAG models (MDA... |

962 | Learning Bayesian networks: The combination of knowledge and statistical data - Heckerman, Geiger, et al. - 1995 |
Citation Context: ...ed directed-acyclic graph (DAG) models for learning from data (e.g., Cooper & Herskovits, 1992; Spirtes, Glymour, & Scheines, 1993; Spiegelhalter, Dawid, Lauritzen, & Cowell, 1993; Buntine, 1994; and Heckerman, Geiger, & Chickering, 1995). In this paper, we consider mixtures of DAG models (MDAG models) and methods for choosing among models in this class. MDAG models generalize DAG models, and should more accurately model domains cont... |

850 | Optimal Statistical Decisions - DeGroot - 1971 |

542 | Causation, Prediction, and Search - Spirtes, Glymour, et al. - 2000 |
Citation Context: ...tic and real examples. 1 Introduction For almost a decade, statisticians and computer scientists have used directed-acyclic graph (DAG) models for learning from data (e.g., Cooper & Herskovits, 1992; Spirtes, Glymour, & Scheines, 1993; Spiegelhalter, Dawid, Lauritzen, & Cowell, 1993; Buntine, 1994; and Heckerman, Geiger, & Chickering, 1995). In this paper, we consider mixtures of DAG models (MDAG models) and methods for choosing a... |

519 | Bayesian classification (AutoClass): theory and results - Cheeseman, Stutz - 1996 |
Citation Context: ...995), and probabilistic principal component analysis (Tipping & Bishop, 1997). MDAG models generalize a variety of mixture models including naive-Bayes models used for clustering (e.g., Clogg, 1995; Cheeseman and Stutz, 1995), mixtures of factor analytic models (Hinton, Dayan, & Revow, 1997), and mixtures of probabilistic principal component analytic models (Tipping & Bishop, 1997). There is also work related to our lear... |

406 | Evaluating influence diagrams - Shachter - 1986 |
Citation Context: ..., π_|C|. Thus, when C is hidden, we say that the multi-DAG model for X and C is a mixture of DAG models (or MDAG model) for X. An important subclass of DAG models is the Gaussian DAG model (e.g., Shachter & Kenley, 1989). In this subclass, the local distribution family for every random variable given its parents is a linear regression with Gaussian noise. It is well known that a Gaussian DAG model for X_1, ..., X_n u... |

376 | Influence diagrams - Howard, Matheson - 1984 |

357 | Model-based Gaussian and non-Gaussian clustering - Banfield, Raftery - 1993 |

331 | A tutorial on learning bayesian networks - Heckerman - 1996 |

254 | Operations for learning with graphical models - Buntine - 2004 |
Citation Context: ... scientists have used directed-acyclic graph (DAG) models for learning from data (e.g., Cooper & Herskovits, 1992; Spirtes, Glymour, & Scheines, 1993; Spiegelhalter, Dawid, Lauritzen, & Cowell, 1993; Buntine, 1994; and Heckerman, Geiger, & Chickering, 1995). In this paper, we consider mixtures of DAG models (MDAG models) and methods for choosing among models in this class. MDAG models generalize DAG models, an... |

233 | Accurate Approximations for Posterior Moments and Marginal Densities - Tierney, Kadane - 1986 |
Citation Context: ...ures, that shows the approximation to be quite good. In all experiments, it was at least as accurate and sometimes more accurate than the standard approximation obtained using Laplace’s method (e.g., Tierney & Kadane, 1986). An important idea behind the Cheeseman–Stutz approximation is that we treat data completed by the EM algorithm as if it were real data. This same idea underlies the M step of the EM algorithm. As w... |

223 | The Bayesian structural EM algorithm - Friedman - 1998 |

218 | A theory of inferred causation - Pearl, Verma - 1991 |

200 | Bayesian analysis in expert systems - Spiegelhalter, Dawid, et al. - 1993 |
Citation Context: ...n For almost a decade, statisticians and computer scientists have used directed-acyclic graph (DAG) models for learning from data (e.g., Cooper & Herskovits, 1992; Spirtes, Glymour, & Scheines, 1993; Spiegelhalter, Dawid, Lauritzen, & Cowell, 1993; Buntine, 1994; and Heckerman, Geiger, & Chickering, 1995). In this paper, we consider mixtures of DAG models (MDAG models) and methods for choosing among models in this class. MDAG models generalize... |

187 | A Database for Handwritten Text Recognition - Hull - 1994 |
Citation Context: ...here are 64 random variables corresponding to the gray-scale values [0,255] of scaled and smoothed 8-pixel x 8-pixel images of handwritten digits obtained from the CEDAR U.S. postal service database (Hull, 1994). Applications of joint prediction include image compression and digit classification. The sample sizes for the digits (“0” through “9”) range from 1293 to 1534. For each digit, we use 1100 samples f... |

184 | Efficient Approximations for the Marginal Likelihood of Incomplete Data Given a Bayesian Network - Chickering, Heckerman - 1996 |

174 | Rational Decisions - Good - 1952 |

164 | Learning Bayesian networks is NP-complete - Chickering - 1996 |

156 | Modeling the manifolds of images of handwritten digits - Hinton, Dayan, et al. - 1997 |
Citation Context: ...e structure and parameters for our search procedures as described in Section 5 with equivalent sample sizes equal to 200. The example we consider addresses the digital encoding of handwritten digits (Hinton, Dayan, & Revow, 1997). In this domain, there are 64 random variables corresponding to the gray-scale values [0,255] of scaled and smoothed 8-pixel x 8-pixel images of handwritten digits obtained from the CEDAR U.S. posta... |

148 | Propagation of probabilities, means, and variances in mixed graphical association models - Lauritzen - 1992 |

133 | Learning belief networks in the presence of missing values and hidden variables - Friedman - 1997 |

108 | Gaussian parsimonious clustering models - Celeux, G, et al. - 1995 |

102 | Reverend bayes on inference engines: a distributed hierarchical approach - Pearl - 1982 |

77 | Computing Bayes factors by combining simulation and asymptotic approximations - Diciccio, Kass, et al. - 1997 |
Citation Context: ... When data is incomplete, no tractable closed form for the marginal likelihood is available. Nonetheless, we can approximate the marginal likelihood using either Monte Carlo or large-sample methods (e.g., DiCiccio, Kass, Raftery, and Wasserman, 1995). Thus, a straightforward class of algorithms for choosing an MDAG model is to search among structures as before (e.g., perform greedy search), using some approximation for marginal likelihood. We sha... |

49 | Asymptotic model selection for directed networks with hidden variables - Geiger, Heckerman, et al. - 1996 |

48 | Latent class models - Clogg - 1995 |
Citation Context: ...Related work DAG models (single-component MDAG models) with hidden variables generalize many well-known statistical models including linear factor analysis, latent factor models (e.g., Clogg, 1995), and probabilistic principal component analysis (Tipping & Bishop, 1997). MDAG models generalize a variety of mixture models including naive-Bayes models used for clustering (e.g., Clogg, 1995; Che... |

26 | Estimating dependency structure as a hidden variable - Meila, Jordan, et al. - 1997 |


14 | Learning Bayesian networks from incomplete data - Singh |

12 | Mixtures of probabilistic principal component analyzers - Tipping, Bishop - 1999 |

10 | Beyond Bayesian networks: Similarity networks and Bayesian multinets - Geiger, Heckerman - 1996 |

10 | Likelihoods and priors for Bayesian networks - Heckerman, Geiger - 1996 |

5 | Learning Bayesian networks from data - CHICKERING - 1996 |

3 | Vehicle recognition using rule-based methods - Siebert - 1987 |