## ESTIMATING HIGH-DIMENSIONAL INTERVENTION EFFECTS FROM OBSERVATIONAL DATA (2009)

Citations: 9 (2 self)

### BibTeX

@MISC{Maathuis09estimatinghigh-dimensional,
    author = {Marloes H. Maathuis and Markus Kalisch and Peter Bühlmann},
    title = {ESTIMATING HIGH-DIMENSIONAL INTERVENTION EFFECTS FROM OBSERVATIONAL DATA},
    year = {2009}
}

### Abstract

We assume that we have observational data, generated from an unknown underlying directed acyclic graph (DAG) model. A DAG is typically not identifiable from observational data, but it is possible to consistently estimate the equivalence class of a DAG. Moreover, for any given DAG, causal effects can be estimated using intervention calculus. In this paper, we combine these two parts. For each DAG in the estimated equivalence class, we use intervention calculus to estimate the causal effects of the covariates on the response. This yields a collection of estimated causal effects for each covariate. We show that the distinct values in this set can be consistently estimated by an algorithm that uses only local information of the graph. This local approach is computationally fast and feasible in high-dimensional problems. We propose to use summary measures of the set of possible causal effects to determine variable importance. In particular, we use the minimum absolute value of this set, since that is a lower bound on the size of the causal effect. We demonstrate the merits of our methods in a simulation study, and on a data set about riboflavin production.
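
The local approach described in the abstract can be sketched in a few lines. The sketch below is illustrative only and rests on assumptions that the abstract does not spell out: the data are taken to follow a linear Gaussian model, so that the causal effect of a covariate on the response under a given DAG is the coefficient of that covariate in a regression of the response on the covariate and its parents; the estimated equivalence class is assumed to be summarized locally by the covariate's directed parents and its undirected neighbours; and a candidate parent set is kept only if it creates no new v-structure at the covariate. The names `locally_valid`, `possible_effects`, and `adjacent` are hypothetical and do not correspond to any package's API.

```python
from itertools import combinations

import numpy as np


def locally_valid(new_parents, parents, adjacent):
    """True if orienting new_parents -> i adds no v-structure with i as collider.

    A new v-structure a -> i <- b arises when a newly oriented parent a is
    not adjacent to another (old or new) parent b.
    """
    new_parents = list(new_parents)
    for pos, a in enumerate(new_parents):
        others = new_parents[pos + 1:] + list(parents)
        if any(not adjacent(a, b) for b in others):
            return False
    return True


def possible_effects(data, i, y, parents, siblings, adjacent):
    """Collect one estimated effect of column i on column y per valid parent set.

    parents:  columns with a directed edge into i (assumed given)
    siblings: columns joined to i by an undirected edge (assumed given)
    adjacent: hypothetical callable, adjacent(a, b) -> bool
    """
    n = data.shape[0]
    effects = []
    for k in range(len(siblings) + 1):
        for subset in combinations(sorted(siblings), k):
            if not locally_valid(subset, parents, adjacent):
                continue
            adjust = sorted(set(parents) | set(subset))
            if y in adjust:
                # The response is a parent of i here, so i cannot affect it.
                effects.append(0.0)
                continue
            design = np.column_stack([np.ones(n), data[:, [i] + adjust]])
            coef, *_ = np.linalg.lstsq(design, data[:, y], rcond=None)
            effects.append(float(coef[1]))  # coefficient of column i
    return effects


# Toy data from the chain X0 -> X1 -> Y; its equivalence class leaves
# both edges unoriented, so X1 has the undirected neighbours {0, 2}.
rng = np.random.default_rng(0)
n = 5000
x0 = rng.normal(size=n)
x1 = 1.2 * x0 + rng.normal(size=n)
y_col = 1.5 * x1 + rng.normal(size=n)
data = np.column_stack([x0, x1, y_col])

edges = {frozenset((0, 1)), frozenset((1, 2))}


def adjacent(a, b):
    return frozenset((a, b)) in edges


effects = possible_effects(data, i=1, y=2, parents=set(),
                           siblings={0, 2}, adjacent=adjacent)
lower_bound = min(abs(e) for e in effects)
```

In this toy case the candidate parent sets `{}`, `{0}`, and `{2}` are kept and `{0, 2}` is rejected, so the collection has three entries. One of them is exactly zero (the orientation in which the response is a parent of the covariate), which makes the minimum absolute value zero: the observational data alone cannot rule out that the covariate has no effect.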

### Citations

1121 | Causality: Models, Reasoning, and Inference - Pearl - 2000
Citation Context ...nal covariates. Using such observational data, we want to infer all (single gene) intervention effects. This task coincides with inferring causal effects, a well-established area in Statistics (e.g., =-=[5, 8, 10, 11, 13, 18, 24, 25, 26]-=- and [31]). We emphasize that, in our application, it is exactly the intervention or causal effect that is of interest, rather than a regression-type effect of association. If we can estimate the inte... |

1109 | Graphical Models - Lauritzen - 1996 |

905 | Learning Bayesian networks: the combination of knowledge and statistical - Heckerman, Geiger, et al. - 1995
Citation Context ...a lower bound on the size of the causal effect of Xi on Y. We use this bound to determine variable importance. There is a large existing literature on estimating the equivalence class of DAGs (e.g., =-=[2, 3, 4, 12, 14, 30, 31]-=- and [33]), and there is also a large literature on estimating causal effects when a DAG is given (e.g., [18, 19, 23, 24] and [25]). Our new approach combines these two parts in order to estimate the ... |

637 | Approximating discrete probability distributions with dependence trees - Chow, Liu - 1968
Citation Context ...a lower bound on the size of the causal effect of Xi on Y. We use this bound to determine variable importance. There is a large existing literature on estimating the equivalence class of DAGs (e.g., =-=[2, 3, 4, 12, 14, 30, 31]-=- and [33]), and there is also a large literature on estimating causal effects when a DAG is given (e.g., [18, 19, 23, 24] and [25]). Our new approach combines these two parts in order to estimate the ... |

384 | High-dimensional graphs and variable selection with the Lasso - Meinshausen, Bühlmann
Citation Context ...blems. Condition (7) in assumption (E) requires the nonzero partial correlations to be outside of the $n^{-b/2}$ range, with $b$ as in assumption (D). Note that this condition is similar to Assumption 5 in =-=[21]-=- and condition (8) in [36]. Finally, we note that assumption (F) is of the same spirit as Assumption 2 in [21]. Namely, if we scale $Y_n$ such that $\mathrm{Var}(Y_n) = \sigma^2$ for all $n$, then assumption (F) is implied... |

306 | Statistics and Causal Inference - Holland - 1986
Citation Context ...nal covariates. Using such observational data, we want to infer all (single gene) intervention effects. This task coincides with inferring causal effects, a well-established area in Statistics (e.g., =-=[5, 8, 10, 11, 13, 18, 24, 25, 26]-=- and [31]). We emphasize that, in our application, it is exactly the intervention or causal effect that is of interest, rather than a regression-type effect of association. If we can estimate the inte... |

245 | Weak Convergence and Empirical Processes: With Applications to Statistics - van der Vaart, Wellner - 1996
Citation Context ...S| − 1)|S 2 σ 2 ni|S = $P(\chi^2_{n-|S|-1} \le (n-|S|-1)/2 \mid S) \le P(\chi^2_{n-q_n-1} \le (n-1)/2)$, where $\chi^2_k$ denotes a chi-squared random variable with $k$ degrees of freedom. We now apply Bernstein’s inequality (=-=[32]-=-, Lemma 2.2.11, page 103) by writing $P(\chi^2_{n-q_n-1} \le (n-1)/2) = P(\chi^2_{n-q_n-1} - (n-q_n-1) \le -(n-1)/2 + q_n) \le P(|\chi^2_{n-q_n-1} - (n-q_n-1)| \ge (n-1)/2 - q_n)$. By viewing a $\chi^2_{n-q_n-1} - (n-q_n-1)$ ra... |

216 | Equivalence and synthesis of causal models - Verma, Pearl - 1990
Citation Context ...he causal effect of Xi on Y. We use this bound to determine variable importance. There is a large existing literature on estimating the equivalence class of DAGs (e.g., [2, 3, 4, 12, 14, 30, 31] and =-=[33]-=-), and there is also a large literature on estimating causal effects when a DAG is given (e.g., [18, 19, 23, 24] and [25]). Our new approach combines these two parts in order to estimate the multisets... |

191 | Incidence matrices and interval graphs - Fulkerson, Gross - 1965
Citation Context ...ring $\sigma = (v_1,\dots,v_n)$ of its vertices, so that each $v_i$ is a simplicial vertex in the induced subgraph $G_{\{v_i,\dots,v_n\}}$. Chordal graphs have many nice properties. We will use the following (cf. [1, 6] and =-=[9]-=-): 1. Every chordal graph G has a simplicial vertex. If G is not complete, then it has at least two nonadjacent simplicial vertices. 2. Chordality of graphs is a hereditary property: if G = (V,E) is c... |

190 | Bayesian analysis in expert systems - Spiegelhalter, Dawid, et al. - 1993
Citation Context ...a lower bound on the size of the causal effect of Xi on Y. We use this bound to determine variable importance. There is a large existing literature on estimating the equivalence class of DAGs (e.g., =-=[2, 3, 4, 12, 14, 30, 31]-=- and [33]), and there is also a large literature on estimating causal effects when a DAG is given (e.g., [18, 19, 23, 24] and [25]). Our new approach combines these two parts in order to estimate the ... |

166 | On the desirability of acyclic database schemes - Beeri, Fagin, et al. - 1983
Citation Context ...the DAG by letting it consist of 100 disconnected components (blocks) of 10 variables each. Subsequently, we equip all edges $X_i \leftarrow X_j$ with edge weights $\beta_{ij}$ which are drawn independently from a Uniform(=-=[1,2]-=-) distribution. For each $k = 1,\dots,n_{\mathrm{reps}}$ in the two settings, the DAG $G^{(k)}$ with edge weights $\beta^{(k)}_{ij}$ defines an underlying distribution on $(X^{(k)}_1,\dots,X^{(k)}_{p+1})$: let $\varepsilon_1,\dots,\varepsilon_{p+1}$, for i = 1,...... |

161 | Large-Scale Simultaneous Hypothesis Testing: The Choice of a Null Hypothesis - Efron - 2004
Citation Context ...trongly estimated causal effects that are stable in a bootstrap analysis. In order to decide which causal scores should be considered “significantly high,” we use the local false discovery rate (FDR) =-=[7]-=-. The vertical line in Figure 6 shows the cut-off for a local FDR of 10%. About 200 of the 4088 genes fall to the right of this cut-off, and hence have a local FDR that is less than 10%. According to ... |

159 | Optimal structure identification with greedy search - Chickering |

133 | Causal diagrams for epidemiologic research. Epidemiology - Greenland, Pearl, et al. - 1999
Citation Context ...nal covariates. Using such observational data, we want to infer all (single gene) intervention effects. This task coincides with inferring causal effects, a well-established area in Statistics (e.g., =-=[5, 8, 10, 11, 13, 18, 24, 25, 26]-=- and [31]). We emphasize that, in our application, it is exactly the intervention or causal effect that is of interest, rather than a regression-type effect of association. If we can estimate the inte... |

130 | Causality - Pearl - 2000 |

128 | Learning equivalence classes of Bayesian-network structures - Chickering - 2002 |

114 | On Rigid Circuit Graphs - Dirac - 1961
Citation Context ... is an ordering $\sigma = (v_1,\dots,v_n)$ of its vertices, so that each $v_i$ is a simplicial vertex in the induced subgraph $G_{\{v_i,\dots,v_n\}}$. Chordal graphs have many nice properties. We will use the following (cf. =-=[1, 6]-=- and [9]): 1. Every chordal graph G has a simplicial vertex. If G is not complete, then it has at least two nonadjacent simplicial vertices. 2. Chordality of graphs is a hereditary property: if G = (V... |

87 | Causal Inference without Counterfactuals (with comments and rejoinder) - Dawid - 2000 |

82 | Causal inference and causal explanation with background knowledge - Meek - 1995
Citation Context ...tices in S, since, otherwise, $G_{S \to i}$ contains a new v-structure with $X_i$ as collider, and this contradicts the assumption that $G_{S \to i}$ is locally valid. Next, we use the following facts that were proved in =-=[20]-=-, Proof of Theorem 3: (i) no orientation of the edges not oriented in G will create a directed cycle which includes an edge or edges that were oriented ... |

75 | Ancestral graph Markov models - Richardson, Spirtes |

74 | On model selection consistency of - Zhao, Yu
Citation Context ...sumption (E) requires the nonzero partial correlations to be outside of the $n^{-b/2}$ range, with $b$ as in assumption (D). Note that this condition is similar to Assumption 5 in [21] and condition (8) in =-=[36]-=-. Finally, we note that assumption (F) is of the same spirit as Assumption 2 in [21]. Namely, if we scale $Y_n$ such that $\mathrm{Var}(Y_n) = \sigma^2$ for all $n$, then assumption (F) is implied by requiring that Var(Xni... |

73 | Causal diagrams for empirical research. Biometrika 82 - Pearl - 1995
Citation Context ...xisting literature on estimating the equivalence class of DAGs (e.g., [2, 3, 4, 12, 14, 30, 31] and [33]), and there is also a large literature on estimating causal effects when a DAG is given (e.g., =-=[18, 19, 23, 24]-=- and [25]). Our new approach combines these two parts in order to estimate the multisets of possible causal effects Θi, i = 1,...,p. We use these multisets to determine bounds for causal effects and c... |

63 | Stability selection - Meinshausen, Bühlmann - 2010 |

56 | Causal inference from graphical models - Lauritzen - 1999 |

53 | A linear non-Gaussian acyclic model for causal discovery - Shimizu, Hoyer, et al. - 2006
Citation Context ...t in a certain sense the Gaussian assumption makes things more difficult. If the linearity assumption is retained and the errors are assumed to be non-Gaussian, then the DAG can be uniquely recovered =-=[29]-=-, preventing the need to work with equivalence classes. We note, however, that Gaussianity is essential for our consistency proofs of the algorithms in high-dimensional settings. Consistency of the PC-... |

50 | Graphical Models. Oxford Statistical Science Series 17 - Lauritzen - 1996
Citation Context ...G with independent error terms is called Markovian, with respect to the DAG. Any Markovian distribution can be factorized as $f(x_1,\dots,x_{p+1}) = \prod_{j=1}^{p+1} f(x_j \mid \mathrm{pa}_j)$; see [25], Theorem 3.1, page 297; see also =-=[17]-=-, Section 3.2.2, for a formulation in terms of directed local or global Markov properties. In order to represent the effect of an intervention on a set of variables, [16] and [23] introduced so-called... |

48 | Estimating high-dimensional directed acyclic graphs with the PC-algorithm - Kalisch, Bühlmann |

33 | Confounding and collapsibility in causal inference - Greenland, Pearl, et al. - 1999 |

20 | Uniform consistency in causal inference - Robins, Scheines, et al. - 2003
Citation Context ... (cf. [12] and [30]). In this paper, we will use the PC-algorithm, since this algorithm is computationally feasible and asymptotically consistent in sparse high-dimensional settings [14]. We refer to =-=[28]-=- and [35] for a discussion about pointwise versus uniform consistency of the PC-algorithm. 2.2. Intervention calculus. We now give a brief introduction to intervention calculus, mostly based on [24] a... |

18 | Causal inference via ancestral graph models. In Highly structured stochastic systems - Richardson, Spirtes - 2003
Citation Context ...s among the observed variables, these CPDAGs may no longer be interpreted causally. Relaxing the assumption of unmeasured confounders is possible by extending our methodology to ancestral graphs (see =-=[26, 27]-=- and [34]) which allow for hidden variables. However, deriving bounds for causal effects when the underlying ancestral graph is unknown is an open issue. Another interesting direction of future resear... |

16 | On specifying graphical models for causation, and the identification problem - Freedman - 2003 |

11 | Statistics and causal inference: a review - Pearl - 2003
Citation Context ...orization formula: $f(x_1,\dots,x_{p+1} \mid do(X_i = x_i')) = \begin{cases} \prod_{j=1, j \ne i}^{p+1} f(x_j \mid \mathrm{pa}_j)\big|_{x_i = x_i'}, & \text{if } x_i = x_i', \\ 0, & \text{otherwise}, \end{cases}$ (1) where $f(x_j \mid \mathrm{pa}_j)$ are the pre-intervention conditional distributions (=-=[25]-=-, Corollary 3.1, page 297). Note that this formula uses the DAG structure (determining the sets $\mathrm{pa}_j$) to write the interventional distribution on the left-hand... |

7 | Assessment of structured socioeconomic effects on health. Epidemiology - Kaufman, Kaufman
Citation Context ...eorem 3.1, page 297; see also [17], Section 3.2.2, for a formulation in terms of directed local or global Markov properties. In order to represent the effect of an intervention on a set of variables, =-=[16]-=- and [23] introduced so-called do or set operators. In particular, they used expressions of the form $f(y \mid do(X_i = x_i'))$ or $f(y \mid set(X_i = x_i'))$ to denote the distribution of Y that would occur if t... |

6 | Causal reasoning with ancestral graphs - Zhang - 2008
Citation Context ...bserved variables, these CPDAGs may no longer be interpreted causally. Relaxing the assumption of unmeasured confounders is possible by extending our methodology to ancestral graphs (see [26, 27] and =-=[34]-=-) which allow for hidden variables. However, deriving bounds for causal effects when the underlying ancestral graph is unknown is an open issue. Another interesting direction of future research consis... |

6 | Strong faithfulness and uniform consistency in causal inference - Zhang, Spirtes - 2003
Citation Context ...] and [30]). In this paper, we will use the PC-algorithm, since this algorithm is computationally feasible and asymptotically consistent in sparse high-dimensional settings [14]. We refer to [28] and =-=[35]-=- for a discussion about pointwise versus uniform consistency of the PC-algorithm. 2.2. Intervention calculus. We now give a brief introduction to intervention calculus, mostly based on [24] and [25]. ... |

3 | R-package pcalg: Estimating the skeleton and equivalence class of a dag. Available at http://cran.r-project.org - Kalisch, Mächler - 2008
Citation Context ...p + 1 is the number of vertices of the DAG and en is the expected neighborhood size of the DAG. The simulation of a single DAG with edge weights proceeds as follows. First, we use the R-package pcalg =-=[15]-=- to simulate a random DAG on X1,...,Xp+1 with the pre-specified expected neighborhood size en. In setting 2, we enforce a special block structure on the DAG by letting it consist of 100 disconnected c... |

1 | R-package ggm: Graphical gaussian models. Available at http://cran.r-project.org - Marchetti, Drton - 2006
Citation Context ...xisting literature on estimating the equivalence class of DAGs (e.g., [2, 3, 4, 12, 14, 30, 31] and [33]), and there is also a large literature on estimating causal effects when a DAG is given (e.g., =-=[18, 19, 23, 24]-=- and [25]). Our new approach combines these two parts in order to estimate the multisets of possible causal effects Θi, i = 1,...,p. We use these multisets to determine bounds for causal effects and c... |

1 | Stability selection. Preprint. Available at arXiv:0809.2932v1 - Meinshausen, Bühlmann - 2008
Citation Context ...ubsampling frequency is larger than a certain cut-off. Surprisingly, this cut-off can be determined via controlling a multiple testing error rate. Details of such a generic procedure are described in =-=[22]-=-. 4.2. Incoherences with sample versions. Two types of incoherences may occur in the sample version of the PC-algorithm (but the probability of these incoherences converges to zero as the sample size ... |