## Exact Bayesian structure learning from uncertain interventions (2007)

Venue: | AI & Statistics (AISTATS) |

Citations: | 24 (5 self) |

### BibTeX

@INPROCEEDINGS{Eaton07exactbayesian,
  author    = {Daniel Eaton and Kevin Murphy},
  title     = {Exact {B}ayesian structure learning from uncertain interventions},
  booktitle = {AI \& Statistics},
  year      = {2007}
}

### Abstract

We show how to apply the dynamic programming algorithm of Koivisto and Sood [KS04, Koi06], which computes the exact posterior marginal edge probabilities p(G_ij = 1 | D) of a DAG G given data D, to the case where the data is obtained by interventions (experiments). In particular, we consider the case where the targets of the interventions are a priori unknown. We show that it is possible to learn the targets of intervention at the same time as learning the causal structure. We apply our exact technique to a biological data set that had previously been analyzed using MCMC [SPP+05, EW06, WGH06].
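For tiny d, the posterior edge marginals p(G_ij = 1 | D) that the abstract refers to can be made concrete by brute-force enumeration of all DAGs (the dynamic programming of [KS04, Koi06] computes the same marginals without enumeration). A minimal sketch, not from the paper: the `score` argument stands in for the unnormalized posterior p(D | G) p(G), so the constant default reduces each marginal to the fraction of DAGs containing that edge.

```python
from itertools import product

def is_acyclic(adj, d):
    # Kahn's algorithm: repeatedly remove nodes with no remaining incoming edges.
    indeg = [sum(adj[i][j] for i in range(d)) for j in range(d)]
    stack = [j for j in range(d) if indeg[j] == 0]
    seen = 0
    while stack:
        u = stack.pop()
        seen += 1
        for v in range(d):
            if adj[u][v]:
                indeg[v] -= 1
                if indeg[v] == 0:
                    stack.append(v)
    return seen == d

def edge_marginals(d, score=lambda adj: 1.0):
    """Posterior p(G_ij = 1 | D) by exhaustive enumeration of DAGs on d nodes."""
    pairs = [(i, j) for i in range(d) for j in range(d) if i != j]
    total = 0.0
    marg = {p: 0.0 for p in pairs}
    for bits in product([0, 1], repeat=len(pairs)):
        adj = [[0] * d for _ in range(d)]
        for (i, j), b in zip(pairs, bits):
            adj[i][j] = b
        if not is_acyclic(adj, d):
            continue
        w = score(adj)           # stand-in for p(D | G) p(G)
        total += w
        for (i, j) in pairs:
            if adj[i][j]:
                marg[(i, j)] += w
    return {p: m / total for p, m in marg.items()}, total
```

With the constant score, `edge_marginals(3)` visits all 25 DAGs on 3 nodes and, by symmetry, assigns each of the 6 possible edges probability 8/25.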

### Citations

1117 | Causality: Models, Reasoning, and Inference
- Pearl
- 2000

Citation Context: ...erventions, in which it is assumed that an intervention sets a single variable to a specific state (as in a randomized experiment). This is the basis of Pearl’s “do-calculus” (as in the verb “to do”) [Pea00]. A perfect intervention essentially “cuts off” the influence of the parents to the intervened node, and can be modeled as a structural change by performing “graph surgery” (removing incoming edges fr...
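The “graph surgery” described in this context is a one-line operation on a parent-set representation of G. A minimal sketch (the dict-of-parent-sets representation and node names are illustrative, not from the paper):

```python
def graph_surgery(parents, targets):
    """Perfect intervention do(X_t): cut every edge coming into each target
    node, leaving the rest of the structure untouched."""
    return {v: set() if v in targets else set(ps) for v, ps in parents.items()}

# A -> B -> C becomes A   B -> C under do(B): B is cut off from its parent A.
g = {"A": set(), "B": {"A"}, "C": {"B"}}
assert graph_surgery(g, {"B"}) == {"A": set(), "B": set(), "C": {"B"}}
```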

903 | Learning Bayesian networks: The combination of knowledge and statistical data
- Heckerman, Geiger, et al.
- 1995
Citation Context: ...probability density function (e.g., multinomial or linear Gaussian). For the parameter prior p(θ|G), we will make the usual assumptions of global and local independence, and parameter modularity (see [HGC95] for details). We will further assume that each p(θ_i) is conjugate to f_i, which allows for closed form computation of the marginal likelihood p(X^{1:N} | G) = ∫ p(X^{1:N} | G, θ) p(θ) dθ, where N is the numbe...

226 | Bayesian graphical models for discrete data
- Madigan, York
- 1995
Citation Context: ...terior over models often has many modes, so it would be unwise to pick any single model, assuming one’s goal is scientific discovery.) Standard MCMC methods for sampling from the posterior (see e.g., [MY95]) are very slow and do not mix well, due to the size of the search space and the “peakiness” of the posterior landscape. A significant advance was made by Friedman and Koller [FK03], who suggested sam...

219 | Learning the Structure of Dynamic Probabilistic Networks
- Friedman, Murphy, et al.
- 1998
Citation Context: ...05, EGS06, TP01a, TP01b] and empirically [CY99, MS03, TP01a, TP01b, WGH06]. Specifically, ... [Figure 3 (panel diagrams omitted): Top left: the “cancer network”, from [FMR98]. (a-d) are Markov equivalent. (c-g) are equivalent under an intervention on B. (h) is the unique member under an intervention on A. Based on [TP01b].]

202 | Being Bayesian about network structure
- Friedman, Koller
- 2000
Citation Context: ...sterior (see e.g., [MY95]) are very slow and do not mix well, due to the size of the search space and the “peakiness” of the posterior landscape. A significant advance was made by Friedman and Koller [FK03], who suggested sampling over the space of node orderings, which “only” has size O(d!). Koivisto and Sood [KS04, Koi06] made another significant advance, by showing that one can compute the exact post...

158 | Causal protein-signaling networks derived from multiparameter single-cell data - Sachs, Perez, et al. - 2005 |

120 | Cached sufficient statistics for efficient machine learning with large datasets
- Moore, Lee
- 1998
Citation Context: ...is the number of training cases. We have found that 95% of the overall algorithm time is spent computing these terms, even for relatively small (N ∼ 5000) datasets. Fortunately, one can use AD-trees [ML98] to speed this up. 3.3 Layering: In the case where we include the intervention nodes in the graph, we use a two layered graph structure, V = X ∪ I, where X are the regular nodes and I are the intervent...

91 | A transformational characterization of equivalent Bayesian network structures
- Chickering
- 1995
Citation Context: ...WGH06]. Specifically, each intervention determines the direction of the edges between the intervened node and its neighbors; this in turn may result in the direction of other edges being “compelled” [Chi95]. For example, in Figure 3, we see that there are 4 graphs that are Markov equivalent to the true structure; given observational data alone, this is all we can infer. However, given enough interventio...
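Chickering [Chi95] gives a transformational characterization of equivalence; the classic criterion of Verma and Pearl, same skeleton and same v-structures, gives an equivalent and easy-to-code test. A sketch on a dict-of-parent-sets representation (names illustrative, not from the paper):

```python
def skeleton(parents):
    # Undirected edge set: {child, parent} for every directed edge.
    return {frozenset((c, p)) for c, ps in parents.items() for p in ps}

def v_structures(parents):
    # Unshielded colliders a -> c <- b where a and b are non-adjacent.
    skel = skeleton(parents)
    vs = set()
    for c, ps in parents.items():
        for a in ps:
            for b in ps:
                if a < b and frozenset((a, b)) not in skel:
                    vs.add((a, c, b))
    return vs

def markov_equivalent(g1, g2):
    """Verma-Pearl criterion: equal skeletons and equal v-structures."""
    return skeleton(g1) == skeleton(g2) and v_structures(g1) == v_structures(g2)
```

On the three-node example from the surrounding text, the two chains X→Y→Z and X←Y←Z are equivalent, while the collider X→Y←Z is not equivalent to either.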

62 | Causal discovery from a mixture of experimental and observational data
- Cooper, Yoo
- 1999
Citation Context: ...the case of perfect interventions, this second factor evaluates to 1, so we can simply drop cases in which node i was set by intervention from the computation of the marginal likelihood of that node [CY99]. We can also model the case where the interventions are unreliable, by introducing a latent indicator R_i^n, where R_i^n = 1 means the intervention succeeded, and R_i^n = 0 means it failed. In this cas...
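The [CY99] trick, dropping the cases in which node i was set by intervention from that node's marginal likelihood, can be sketched for a multinomial-Dirichlet family with BDeu-style hyper-parameters α_ijk = 1/(q_i r_i). Function and argument names are illustrative, not the paper's:

```python
import math

def node_marginal_loglik(x_i, pa_config, intervened, r_i, q_i):
    """Log marginal likelihood of node i, skipping intervened cases [CY99 idea].

    x_i[n]       -- observed state of node i in case n (0..r_i-1)
    pa_config[n] -- index of node i's parent configuration in case n (0..q_i-1)
    intervened[n]-- True if node i was set by intervention in case n
    """
    alpha = 1.0 / (q_i * r_i)                 # BDeu-style hyper-parameter
    counts = [[0] * r_i for _ in range(q_i)]
    for x, j, hit in zip(x_i, pa_config, intervened):
        if not hit:                           # drop intervened cases for node i
            counts[j][x] += 1
    ll = 0.0
    for j in range(q_i):
        n_j = sum(counts[j])
        ll += math.lgamma(alpha * r_i) - math.lgamma(alpha * r_i + n_j)
        for k in range(r_i):
            ll += math.lgamma(alpha + counts[j][k]) - math.lgamma(alpha)
    return ll
```

For a binary root node (r_i = 2, q_i = 1) with one observed 0, one observed 1, and one intervened case, the intervened case contributes nothing and the marginal likelihood equals that of the two observed cases alone.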

55 | Advances in exact Bayesian structure discovery in Bayesian networks - Koivisto - 2006 |

55 | Active learning for structure in Bayesian networks - Tong, Koller - 2001 |

39 | Counting labeled acyclic digraphs
- Robinson
- 1971

Citation Context: ...y molecular biology experiments. However, we are left with a computational problem. Computing the full posterior is intractable, since there are O(d! 2^C(d,2)) DAGs (directed acyclic graphs) on d nodes [Rob73]. So all one can realistically hope to do is to ... [Footnote 1: The exact formula is given by the recurrence r(d) = Σ_{i=1}^{d} (−1)^{i+1} C(d,i) 2^{i(d−i)} r(d−i). This gives r(2) = 3, r(3) = 25, r(4...
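The footnote's recurrence for the number of labeled DAGs is easy to check numerically; a direct transcription:

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def num_dags(d):
    """Robinson's recurrence for the number of labeled DAGs on d nodes:
    r(d) = sum_{i=1}^{d} (-1)^(i+1) * C(d, i) * 2^(i*(d-i)) * r(d-i), r(0) = 1."""
    if d == 0:
        return 1
    return sum((-1) ** (i + 1) * comb(d, i) * 2 ** (i * (d - i)) * num_dags(d - i)
               for i in range(1, d + 1))
```

This reproduces the values quoted in the footnote: r(2) = 3, r(3) = 25, r(4) = 543.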

37 | Active learning of causal bayes net structure - Murphy - 2001 |

33 | Finding optimal Bayesian networks
- Chickering, Meek
- 2002

Citation Context: ...l protocol. We considered the graph structure in Figure 3, and then generated random multinomial CPDs by sampling from a Dirichlet distribution with hyper-parameters chosen by the method described in [CM02]. This ensures that there are reasonably strong dependencies between the nodes. (We used binary nodes for simplicity.) We then generated data using forwards sampling; the first 2000 cases D_0 were from...
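The particular hyper-parameter choice of [CM02] is not reproduced here, but the underlying step, drawing each row of a multinomial CPD from a Dirichlet, can be sketched via normalized Gamma variates (the concentration value α = 0.5 and the function name are illustrative assumptions):

```python
import random

def sample_cpd(q, r, alpha=0.5, seed=0):
    """Sample a q x r conditional probability table: one Dirichlet(alpha,...,alpha)
    draw per parent configuration, built from normalized Gamma(alpha, 1) variates."""
    rng = random.Random(seed)
    table = []
    for _ in range(q):
        g = [rng.gammavariate(alpha, 1.0) for _ in range(r)]
        s = sum(g)
        table.append([x / s for x in g])  # each row is a distribution over r states
    return table
```

A small α (well below 1) pushes each row's mass toward a few states, which is one way to get the “reasonably strong dependencies” the protocol asks for.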

33 | Comparative evaluation of reverse engineering gene regulatory networks with relevance networks, graphical Gaussian models and Bayesian networks
- Werhli, Grzegorczyk, et al.

Citation Context: ...methodology to a real biological data set, which had previously been analyzed using MCMC by Sachs et al [SPP+05] (who used multiple restart simulated annealing in the space of DAGs), Werhli et al. [WGH06] (who used Metropolis Hastings in the space of node orderings), and Ellis and Wong [EW06] (who used equi-energy sampling in the space of node orderings). The purpose of our experiment is to determine ...

26 | Parameter priors for directed acyclic graphical models and the characterization of several probability distributions. Annals of Statistics 30
- Geiger, Heckerman
- 2002

Citation Context: ...the number of discrete states for X_i, and q_i is the number of states for X_{G_i}. We will usually use the BDeu prior α_ijk = 1/(q_i r_i) [HGC95]. (An analogous formula can be derived for the normal-Gamma case [GH02].) The marginal likelihood of all the nodes is then given by p(X^{1:N} | G) = ∏_{i=1}^d p(X_i^{1:N} | X_{G_i}^{1:N}), where d is the number of nodes. 2.2 Perfect interventions: If we perform a perfect intervention on n...

19 | Structured priors for structure learning
- Mansinghka, Kemp, et al.
- 2003
Citation Context: ...bottleneck to tackling larger problems is the space and time limit of O(d 2^d), which limits us to about d = 20. However, one can exploit the layering idea to extend this to much larger graphs. (See [MKTG06] for some ideas on how to partition nodes into groups/layers in an unsupervised way.) Layering should also enable the learning of dynamic Bayes nets (DBNs) [FMR98]. Another issue that deserves more a...

19 | Causal discovery from changes
- Tian, Pearl
- 2001
Citation Context: ...cs into D since it is a v-structure, and the C→E arc since it is compelled. In general, given a set of interventions and observational data, we can identify a graph up to intervention equivalence (see [TP01a] for a precise definition). In Section 4.1, we will experimentally study the question of whether one can still learn the true structure from uncertain interventions (i.e., when the targets of interven...

15 | On the number of experiments sufficient and in the worst case necessary to identify all causal relations among n variables
- Eberhardt, Glymour, et al.
- 2005
Citation Context: ...r example, the three models X→Y→Z, X←Y←Z, and X←Y→Z all encode the same conditional independency statement, X ⊥ Z | Y. To distinguish between such models, we need interventional (experimental) data [EGS05]. Most previous work has focused on the case of “perfect” interventions, in which it is assumed that an intervention sets a single variable to a specific state (as in a randomized experiment). This is...

12 | Evaluating the Effect of Perturbations in Reconstructing Network Topologies - Markowetz, Spang - 2003 |

11 | Probabilistic soft interventions in conditional Gaussian networks
- Markowetz, Grossmann, et al.
- 2005

Citation Context: ...n [KHNA04]. Another way to model imperfect interventions is as “soft” interventions, in which an intervention just increases the likelihood that a node enters its target state x*_i. Markowetz et al. [MGR05] suggest using the same model of p(X_i | X_{G_i}, I_i, θ, G) as before, but now the parameters θ_i^0 and θ_i^1 have dependent hyper-parameters. In particular, θ_{ij·}^{0/1} ∼ Dir(α_{ij·}^{0/1}) in the multinomial-Dirichle...

9 | Causal discovery from changes: a Bayesian approach
- Tian, Pearl
- 2001
Citation Context: ...Figure 3: Top left: the “cancer network”, from [FMR98]. (a-d) are Markov equivalent. (c-g) are equivalent under an intervention on B. (h) is the unique member under an intervention on A. Based on [TP01b]. [CY99, MS03, TP01a, TP01b, WGH06]. Specifically, each intervention determines the direction of the edges between the intervened node and its neighbors; this in turn may result in the direction of o...

7 | Varieties of causal intervention
- Korb, Hope, et al.
- 2004

Citation Context: ...interventions are “stochastic”, meaning that they induce a distribution over states rather than a specific state [KHNA04]. A further relaxation is to assume that the effect of an intervention does not render the node independent of its parents, but simply changes the parameters of the local distribution; this has been c...

5 | Interventions and causal inference
- Eberhardt, Scheines
Citation Context: ...ntion does not render the node independent of its parents, but simply changes the parameters of the local distribution; this has been called a “mechanism change” [TP01b, TP01a] or “parametric change” [EGS06]. For many situations, this is a more realistic model than perfect interventions, since it is often impossible to force variables into specific states. In this paper, we propose a further relaxation o...

2 | Sufficient condition for pooling data from different distributions
- Eberhardt
- 2006
Citation Context: ...ture learning avoids many of the conceptual problems that arise when trying to combine the results of potentially inconsistent conditional independency tests performed on different (“mutated”) models [Ebe06]. In addition, it is particularly appropriate when the sample sizes are small, but “soft” prior knowledge is available, as in many molecular biology experiments. However, we are left with a computatio...

2 | Sampling Bayesian Networks quickly
- Ellis, Wong
- 2006

Citation Context: ...by Sachs et al [SPP+05] (who used multiple restart simulated annealing in the space of DAGs), Werhli et al. [WGH06] (who used Metropolis Hastings in the space of node orderings), and Ellis and Wong [EW06] (who used equi-energy sampling in the space of node orderings). The purpose of our experiment is to determine the exact posterior over edges, and hence to assess the quality of the MCMC techniques, a...