## Structured priors for structure learning (2006)

Venue: Proceedings of the 22nd Conference on Uncertainty in Artificial Intelligence (UAI)

Citations: 19 (8 self)

### BibTeX

```bibtex
@INPROCEEDINGS{Mansinghka06structuredpriors,
  author    = {V. K. Mansinghka and C. Kemp and J. B. Tenenbaum},
  title     = {Structured priors for structure learning},
  booktitle = {Proceedings of the 22nd Conference on Uncertainty in Artificial Intelligence (UAI)},
  year      = {2006},
  publisher = {AUAI Press}
}
```

### Abstract

Traditional approaches to Bayes net structure learning typically assume little regularity in graph structure other than sparseness. However, in many cases, we expect more systematicity: variables in real-world systems often group into classes that predict the kinds of probabilistic dependencies they participate in. Here we capture this form of prior knowledge in a hierarchical Bayesian framework, and exploit it to enable structure learning and type discovery from small datasets. Specifically, we present a nonparametric generative model for directed acyclic graphs as a prior for Bayes net structure learning. Our model assumes that variables come in one or more classes and that the prior probability of an edge existing between two variables is a function only of their classes. We derive an MCMC algorithm for simultaneous inference of the number of classes, the class assignments of variables, and the Bayes net structure over variables. For several realistic, sparse datasets, we show that the bias towards systematicity of connections provided by our model can yield more accurate learned networks than the traditional approach of using a uniform prior, and that the classes found by our model are appropriate.
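As a rough illustration of the abstract's core assumption — that the prior probability of an edge depends only on the classes of its endpoints — the following Python sketch samples a graph under a block-structured edge prior. This is not the paper's actual model: the function name and `edge_prob` table are hypothetical, and acyclicity is enforced by the crude simplification of only allowing edges that respect a fixed node order.

```python
import random

def sample_block_dag(num_nodes, classes, edge_prob, rng=None):
    """Sample a DAG whose edge probabilities depend only on node classes.

    classes[i] is the class label of node i; edge_prob[(a, b)] is the
    probability of an edge from a class-a node to a class-b node.
    Acyclicity is enforced by only allowing edges i -> j with i < j
    (a fixed topological order -- a hypothetical simplification).
    """
    rng = rng or random.Random(0)
    edges = []
    for i in range(num_nodes):
        for j in range(i + 1, num_nodes):
            p = edge_prob.get((classes[i], classes[j]), 0.0)
            if rng.random() < p:
                edges.append((i, j))
    return edges

# Two classes, edges only from class 0 ("causes") to class 1 ("effects"),
# mimicking the disease/symptom segregation discussed below for QMR-DT.
edges = sample_block_dag(4, [0, 0, 1, 1], {(0, 1): 1.0})
```

With edge probabilities of 0 or 1 the output is deterministic: every class-0 node connects to every later class-1 node, and no other edges appear.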

### Citations

1075 | A Bayesian Method for the Induction of Probabilistic Networks from Data
- Cooper, Herskovits
- 1992

Citation Context: ...arameterization. For simplicity, we work with discrete-state Bayesian networks with known domains, and use the standard conjugate Dirichlet-Multinomial model for their conditional probability tables [Cooper and Herskovits, 1992]. Note that the variable G in Figure 1 refers to a graph structure, with B its elaboration into a full Bayesian network with conditional probability tables. Inference in the model of Figure 1 and its...
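The conjugate Dirichlet-Multinomial model mentioned in this context lets the conditional probability tables be integrated out analytically. A minimal sketch of the resulting log marginal likelihood for one CPT column, under a symmetric Dirichlet prior (the function name is ours, and `gamma` plays the pseudo-count role described in the Murphy [2001] context below):

```python
from math import lgamma

def log_marginal_dirichlet_multinomial(counts, gamma):
    """Log marginal likelihood of multinomial counts with the parameters
    integrated out under a symmetric Dirichlet(gamma) prior:
    log [ Gamma(k*g)/Gamma(k*g + n) * prod_c Gamma(c + g)/Gamma(g) ].
    """
    k = len(counts)          # number of states of the child variable
    n = sum(counts)          # total observations for this parent config
    return (lgamma(k * gamma) - lgamma(k * gamma + n)
            + sum(lgamma(c + gamma) - lgamma(gamma) for c in counts))
```

A full network score would sum this quantity over every variable and every configuration of its parents; larger `gamma` smooths the CPTs toward uniform, smaller `gamma` expects near-deterministic tables.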

537 | Hierarchical Dirichlet processes
- Teh, Jordan, et al.
- 2006

Citation Context: ...y adding another layer to our hierarchical model. In particular, we could flexibly share causal roles across entirely different networks by replacing our CRP with the Chinese restaurant franchise of [Teh et al., 2004]. A central feature of this work is that it attempts to negotiate a principled tradeoff between the expressiveness of the space of possible abstract patterns (with the attendant advantages of particu...

510 | Learning probabilistic relational models
- Getoor, Friedman, et al.
- 2001

Citation Context: ...both this sort of network and a good characterization of this sort of regularity. At the other end of the flexibility-learnability tradeoff, frameworks such as probabilistic relational models (PRMs) [Friedman et al., 1999] provide a far more expressive language for abstract relational knowledge that can constrain the space of Bayesian networks appropriate for a given domain, but it is a significant challenge to learn...

158 | The infinite Gaussian mixture model - Rasmussen - 2000

115 | Probabilistic diagnosis using a reformulation of the INTERNIST-1/QMR knowledge base
- Middleton, Shwe, et al.
- 1991

Citation Context: ...aspects of learning and inference. Consider the domain of medical learning and reasoning. Knowledge engineers in this area have historically imposed strong structural constraints; the QMR-DT network [Shwe et al., 1991], for example, segregates nodes into diseases and symptoms, and only permits edges from the former to the latter. Recent attempts at medical knowledge engineering have continued in this tradition; fo...

47 | Improving Markov Chain Monte Carlo Model Search for Data Mining - Giudici, Castelo - 2003

44 | Learning module networks - Segal, Pe’er, et al.

42 | Bayesian Model Averaging: A Tutorial
- Hoeting, Madigan, et al.
- 1999

Citation Context: ...ions. Of course, it is straightforward to anneal the Markov chain and to periodically restart if we are only interested in the MAP value or a set of high-scoring states for selective model averaging [Madigan et al., 1996]. Our overall MCMC process decouples into moves on graphs and, if relevant, moves on the latent states of the prior (a partition of the nodes, and in the ordered case, an ordering of the groups in th...
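The graph moves mentioned in this context can be sketched as a generic Metropolis-Hastings search over edge sets, using single-edge toggles as proposals and tracking the best state seen for MAP-style use. This is our illustrative reconstruction, not the paper's sampler: the function name is hypothetical, the score is any log-posterior the caller supplies, and acyclicity is kept by the same fixed-order simplification as above.

```python
import math
import random

def mh_graph_search(score, init_edges, num_nodes, steps, rng=None):
    """Metropolis-Hastings over graphs: propose toggling one edge (i, j)
    with i < j, accept with probability min(1, exp(score' - score)),
    and remember the highest-scoring state visited."""
    rng = rng or random.Random(0)
    current = set(init_edges)
    cur_score = score(current)
    best, best_score = set(current), cur_score
    for _ in range(steps):
        i, j = rng.sample(range(num_nodes), 2)
        if i > j:
            i, j = j, i  # keep i < j so the graph stays acyclic
        proposal = set(current)
        proposal.symmetric_difference_update({(i, j)})  # toggle the edge
        prop_score = score(proposal)
        if math.log(rng.random()) < prop_score - cur_score:
            current, cur_score = proposal, prop_score
            if cur_score > best_score:
                best, best_score = set(current), cur_score
    return best, best_score
```

Annealing, as the context notes, would simply divide the acceptance exponent by a temperature that is lowered over time.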

40 | Discovering latent classes in relational data
- Kemp, Griffiths, et al.
- 2004

Citation Context: ...sted in discovering node classes that explain patterns of incoming and outgoing edges, as a step towards representation and learning of causal roles. Our starting point is the infinite blockmodel of [Kemp et al., 2004], a nonparametric generative model for directed graphs, which we modify in two ways to only produce acyclic graphs. We first describe the generative process for the ordered blockmodel: 1. Generate a...

32 | Learning Bayesian Network Parameters From Small Data Sets: Application of Noisy-OR Gates - Onisko, Druzdzel, et al. - 2000

16 | Combinatorial Stochastic Processes. Notes for Saint Flour Summer School
- Pitman, J
- 2002

Citation Context: ...irst describe the generative process for the ordered blockmodel: 1. Generate a class-assignment vector z containing a partition of the N nodes of the graph via the Chinese restaurant process or CRP [Pitman, 2002] with hyperparameter α. The CRP represents a partition in terms of a restaurant with a countably infinite number of tables, where each table corresponds to a group in the partition (see Figure 1b) an...
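The CRP step described in this context is easy to sketch: item i sits at an existing table with probability proportional to that table's current size, or starts a new table with probability proportional to α. A minimal Python version (the function name is ours):

```python
import random

def crp_partition(n, alpha, rng=None):
    """Sample a partition of n items from the Chinese restaurant process
    with concentration alpha. Returns a list of table indices, one per
    item; table t is chosen with weight = current size of table t, and a
    brand-new table with weight = alpha."""
    rng = rng or random.Random(0)
    assignments = []
    table_sizes = []
    for _ in range(n):
        weights = table_sizes + [alpha]   # last slot = open a new table
        r = rng.random() * sum(weights)
        for t, w in enumerate(weights):
            r -= w
            if r < 0:
                break
        if t == len(table_sizes):         # chose the new-table slot
            table_sizes.append(1)
        else:
            table_sizes[t] += 1
        assignments.append(t)
    return assignments
```

Because α controls the chance of opening new tables, it acts exactly as the hyperparameter governing the expected number of node classes: α → 0 collapses everything into one class, while large α fragments the nodes into many classes.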

12 | Learning Bayes net structure from sparse data sets
- Murphy
- 2001

Citation Context: ...lete discrete observations, assuming the CPTs (represented by the parameters B) have been integrated out; γ plays the role of the pseudo-counts setting the degree of expected determinism of the CPTs [Murphy, 2001]. Overall, then, our model has three free parameters (in the symmetric β case) — α, β and γ — which each give us the opportunity to encode weak prior knowledge at different levels of abstraction...

10 | GeNIeRate: An interactive generator of diagnostic Bayesian network models
- Kraaijeveld, Druzdzel
- 2005

Citation Context: ...especially appropriate for the study of gene regulatory networks, but it is not as appropriate for modeling many other domains with less regular structure. The QMR-DT [Shwe et al., 1991] or Hepar II [Kraaijeveld et al., 2005] networks, for example, contain highly regular but largely nonmodular structure, and we would like to be able to discover both this sort of network and a good characterization of this sort of regular...

4 | MinReg: A Scalable Algorithm for Learning Parsimonious Regulatory Networks in Yeast and Mammals - Pe’er, Tanay, et al. - 2006

1 | The Variational EM Algorithm for Incomplete Data: with Application to Scoring Graphical Model Structures - Beal, Ghahramani