## Estimating high-dimensional directed acyclic graphs with the PC-algorithm (2007)

Venue: Journal of Machine Learning Research

Citations: 47 (4 self-citations)

### BibTeX

```bibtex
@ARTICLE{Kalisch07estimatinghigh-dimensional,
  author  = {Markus Kalisch and Peter Bühlmann},
  title   = {Estimating high-dimensional directed acyclic graphs with the {PC}-algorithm},
  journal = {Journal of Machine Learning Research},
  year    = {2007}
}
```

### Abstract

We consider the PC-algorithm (Spirtes et al., 2000) for estimating the skeleton and equivalence class of a very high-dimensional directed acyclic graph (DAG) with corresponding Gaussian distribution. The PC-algorithm is computationally feasible and often very fast for sparse problems with many nodes (variables), and it has the attractive property of automatically achieving high computational efficiency as a function of the sparseness of the true underlying DAG. We prove uniform consistency of the algorithm for very high-dimensional, sparse DAGs where the number of nodes is allowed to grow quickly with the sample size $n$, as fast as $O(n^a)$ for any $0 < a < \infty$. The sparseness assumption is rather minimal, requiring only that the neighborhoods in the DAG are of lower order than the sample size $n$. We also demonstrate the PC-algorithm on simulated data.
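The skeleton phase summarized above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation (their R package pcalg appears in the citation list): it assumes Gaussian data summarized by a correlation matrix `C` and sample size `n`, tests each conditional independence with Fisher's z-transform of the partial correlation (a common choice for Gaussian data), and computes partial correlations with the standard recursive formula. The names `partial_corr`, `fisher_z_stat` and `pc_skeleton` are hypothetical.

```python
import math
from itertools import combinations
from statistics import NormalDist


def partial_corr(C, i, j, S):
    """Partial correlation of variables i and j given the tuple S,
    computed recursively from the correlation matrix C."""
    if not S:
        return C[i][j]
    h, rest = S[0], S[1:]
    r_ij = partial_corr(C, i, j, rest)
    r_ih = partial_corr(C, i, h, rest)
    r_jh = partial_corr(C, j, h, rest)
    return (r_ij - r_ih * r_jh) / math.sqrt((1 - r_ih**2) * (1 - r_jh**2))


def fisher_z_stat(r, n, k):
    """sqrt(n - k - 3) * |atanh(r)|; roughly |N(0, 1)| under independence."""
    r = max(min(r, 1 - 1e-12), -1 + 1e-12)  # guard against atanh(+-1)
    return math.sqrt(n - k - 3) * abs(math.atanh(r))


def pc_skeleton(C, n, alpha=0.01):
    """Return (edges, sepsets) from a sketch of the PC skeleton phase."""
    p = len(C)
    adj = {i: set(range(p)) - {i} for i in range(p)}  # start from the complete graph
    sepset = {}
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    level = 0  # size of the conditioning sets tested in this round
    while any(len(adj[i]) > level for i in range(p)):
        for i in range(p):
            for j in sorted(adj[i]):
                # condition only on current neighbours of i other than j
                for S in combinations(sorted(adj[i] - {j}), level):
                    r = partial_corr(C, i, j, S)
                    if fisher_z_stat(r, n, level) <= crit:
                        # independence not rejected: delete the edge, remember S
                        adj[i].discard(j)
                        adj[j].discard(i)
                        sepset[frozenset((i, j))] = set(S)
                        break
        level += 1
    edges = {(i, j) for i in range(p) for j in adj[i] if i < j}
    return edges, sepset


if __name__ == "__main__":
    # population correlations of the chain X0 -> X1 -> X2 plus an isolated X3
    b = 0.8
    C = [[1, b, b * b, 0],
         [b, 1, b, 0],
         [b * b, b, 1, 0],
         [0, 0, 0, 1]]
    edges, sep = pc_skeleton(C, n=1000)
    print(sorted(edges))           # [(0, 1), (1, 2)]
    print(sep[frozenset((0, 2))])  # {1}
```

The conditioning sets grow one size per round and are drawn only from current neighbours, which is why sparse graphs stop after few rounds. With real data, `C` would be the sample correlation matrix, and test errors can then remove or keep the wrong edges.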

### Citations

1139 | An Introduction to Multivariate Statistical Analysis - Anderson - 1985

1102 | Graphical Models - Lauritzen - 1996

903 | Learning Bayesian networks: The combination of knowledge and statistical data - Heckerman, Geiger, et al. - 1995

Citation context: ...or moderate. For example, the search space may be restricted to trees as in MWST (Maximum Weight Spanning Trees; see Chow and Liu, 1968; **Heckerman et al., 1995**), or a greedy search is employed. The greedy DAG search can be improved by exploiting probabilistic equivalence relations, and the search space can be reduced from individual DAGs to equivalence clas...
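The MWST restriction mentioned in this context can be sketched for Gaussian variables as follows. This assumes the mutual information of two jointly Gaussian variables, $-\tfrac{1}{2}\log(1-\rho_{ij}^2)$, as the edge weight (Chow and Liu's 1968 paper treats discrete distributions), and uses Kruskal's algorithm with a union-find structure; `chow_liu_tree` is a hypothetical name.

```python
import math


def chow_liu_tree(C):
    """Maximum weight spanning tree with Gaussian mutual-information weights."""
    p = len(C)
    parent = list(range(p))

    def find(v):
        # union-find root lookup with path halving
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    # mutual information of two jointly Gaussian variables with correlation r
    weights = [(-0.5 * math.log(1 - C[i][j] ** 2), i, j)
               for i in range(p) for j in range(i + 1, p)]
    tree = set()
    for w, i, j in sorted(weights, reverse=True):  # heaviest edges first
        ri, rj = find(i), find(j)
        if ri != rj:  # skip edges that would close a cycle
            parent[ri] = rj
            tree.add((i, j))
    return tree


if __name__ == "__main__":
    b = 0.8
    C = [[1, b, b * b], [b, 1, b], [b * b, b, 1]]  # chain X0 - X1 - X2
    print(sorted(chow_liu_tree(C)))  # [(0, 1), (1, 2)]
```

Because the Gaussian weight is monotone in $|\rho_{ij}|$, the same tree results from ranking edges by absolute correlation; the search never leaves the space of trees, which is the restriction the context refers to.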

637 | Approximating discrete probability distributions with dependence trees - Chow, Liu - 1968

382 | High-dimensional graphs and variable selection with the lasso - Meinshausen, Bühlmann

246 | Learning Bayesian networks - Neapolitan - 2004

Citation context: ...skeleton. 1. Introduction. Graphical models are a popular probabilistic tool to analyze and visualize conditional independence relationships between random variables (see Edwards, 2000; Lauritzen, 1996; **Neapolitan, 2004**). Major building blocks of the models are nodes, which represent random variables, and edges, which encode conditional dependence relations of the enclosing vertices. The structure of conditional inde...

222 | On model selection consistency of Lasso - Zhao, Yu - 2006

217 | Equivalence and synthesis of causal models - Verma, Pearl - 1990

205 | A theory of inferred causation - Pearl, Verma - 1991

166 | Introduction to graphical modelling - Edwards - 2000

Citation context: ...graphical model, PC-algorithm, skeleton. 1. Introduction. Graphical models are a popular probabilistic tool to analyze and visualize conditional independence relationships between random variables (see **Edwards, 2000**; Lauritzen, 1996; Neapolitan, 2004). Major building blocks of the models are nodes, which represent random variables, and edges, which encode conditional dependence relations of the enclosing vertices...

159 | Optimal structure identification with greedy search - Hemmecke, Chickering

129 | Learning equivalence classes of Bayesian-network structure - Chickering - 2002

82 | Causal inference and causal explanation with background knowledge - Meek - 1995

Citation context: ...completed perfectly, that is, if there was no error while testing conditional independencies (it is not enough to assume that the skeleton was estimated correctly), the second part will never fail (see **Meek, 1995b**). Therefore, we easily obtain: Theorem 2. Assume (A1)-(A4). Denote by $\hat{G}_{CPDAG}(\alpha_n)$ the estimate from the entire PC-algorithm in Sections 2.2.2 and 2.3 and by $G_{CPDAG}$ the true CPDAG from the DAG $G$. Then,...
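The "second part" referred to here orients the estimated skeleton. As a rough sketch only: the code below orients each unshielded triple i - k - j into a v-structure i -> k <- j whenever k is absent from the separation set of (i, j), then repeatedly applies just the first of Meek's orientation rules (orient k - l as k -> l when some i -> k exists with i and l nonadjacent). The remaining Meek rules and any handling of conflicting orientations caused by testing errors are omitted, and the function name `orient` and its input format are hypothetical.

```python
from itertools import combinations


def orient(p, edges, sepset):
    """Return directed edges (a, b), meaning a -> b, for skeleton `edges`
    on nodes 0..p-1; `sepset` maps frozenset({i, j}) to a separating set."""
    adj = {v: set() for v in range(p)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    arrows = set()
    # Step 1: v-structures from unshielded triples i - k - j
    for k in range(p):
        for i, j in combinations(sorted(adj[k]), 2):
            if j not in adj[i] and k not in sepset[frozenset((i, j))]:
                arrows.add((i, k))
                arrows.add((j, k))
    # Step 2: Meek's rule 1 only, applied until nothing changes
    changed = True
    while changed:
        changed = False
        for i, k in list(arrows):
            for l in adj[k]:
                if l == i or l in adj[i]:
                    continue  # rule 1 needs i and l nonadjacent
                if (k, l) in arrows or (l, k) in arrows:
                    continue  # k - l is already oriented
                arrows.add((k, l))
                changed = True
    return arrows


if __name__ == "__main__":
    # skeleton 0 - 1 - 2 where 1 did not separate 0 and 2: a collider at 1
    print(sorted(orient(3, {(0, 1), (1, 2)}, {frozenset((0, 2)): set()})))
    # [(0, 1), (2, 1)]
```

Edges left out of the returned set stay undirected, which is how a CPDAG records orientations that the data cannot determine.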

75 | The max-min hill-climbing Bayesian network structure learning algorithm - Tsamardinos, Brown, Aliferis - 2006

49 | Strong completeness and faithfulness in Bayesian networks - Meek - 1995

39 | Counting labeled acyclic digraphs - Robinson - 1971

37 | On feature selection: Learning with exponentially many irrelevant features as training examples - Ng - 1998

30 | Bayesian analysis in expert systems (with discussion) - Spiegelhalter, Dawid, et al. - 1993

29 | Tractable learning of large Bayes net structures from sparse data - Goldenberg, Moore - 2004

21 | New light on the correlation coefficient and its transforms - Hotelling - 1953

Citation context: ...symmetry implies $\mathbb{P}_{\rho}[\hat\rho < \rho - \gamma] = \mathbb{P}_{\tilde\rho}[\hat\rho > \tilde\rho + \gamma]$ with $\tilde\rho = -\rho$ (6). Thus, it suffices to show that $\mathbb{P}[\hat\rho > \rho + \gamma] = \mathbb{P}_{\rho}[\hat\rho > \rho + \gamma]$ decays exponentially in $n$, uniformly for all $\rho$. It has been shown (**Hotelling, 1953**, p. 201, Formula (29)) that for $-1 < \rho < 1$, $\mathbb{P}[\hat\rho > \rho + \gamma] \le \cdots$ (7) ...
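The claim that $\mathbb{P}[\hat\rho > \rho + \gamma]$ shrinks quickly with $n$ can be checked informally by simulation. The sketch below is only a Monte Carlo illustration for one fixed $\rho$ and $\gamma$; it says nothing about uniformity in $\rho$, which is what the argument via Hotelling's formula provides. The helper names `sample_corr` and `upper_tail` and the chosen values of $\rho$, $\gamma$ and $n$ are hypothetical.

```python
import math
import random


def sample_corr(xs, ys):
    """Sample (Pearson) correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def upper_tail(rho, gamma, n, reps=1000, seed=0):
    """Monte Carlo estimate of P[rho_hat > rho + gamma] for bivariate Gaussian data."""
    rng = random.Random(seed)
    s = math.sqrt(1 - rho**2)
    hits = 0
    for _ in range(reps):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [rho * x + s * rng.gauss(0, 1) for x in xs]  # corr(x, y) = rho
        hits += sample_corr(xs, ys) > rho + gamma
    return hits / reps


if __name__ == "__main__":
    for n in (20, 80, 320):
        print(n, upper_tail(0.5, 0.1, n))
```

Each fourfold increase in $n$ should shrink the estimated tail probability markedly, consistent with (but not a proof of) exponential decay.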

20 | Uniform consistency in causal inference - Robins, Scheines, et al. - 2003

13 | Enumerating Markov equivalence classes of acyclic digraph models - Gillispie, Perlman - 2001

Citation context: ...seems quite promising when having few or a moderate number of nodes, it is limited by the fact that the space of equivalence classes is conjectured to grow super-exponentially in the nodes as well (**Gillispie and Perlman, 2001**). Bayesian approaches for DAGs, which are computationally very intensive, include Spiegelhalter et al. (1993) and Heckerman et al. (1995). An interesting alternative to greedy or structurally restric...

6 | The distribution of the partial correlation coefficient - Fisher - 1924

Citation context: ...is complete (note that the proof assumed sample size $n + 1$). Lemma 1 can be easily extended to partial correlations, as shown by Fisher (1924), using projections for Gaussian distributions. Lemma 2 (**Fisher, 1924**). Assume (A1) (without requiring faithfulness). If the cumulative distribution function of $\hat\rho_{n;i,j}$ is denoted by $F(\cdot \mid n, \rho_{n;i,j})$, then the cdf of the sample partial correlation $\hat\rho_{n;i,j|k}$ with $|k| = \ldots$
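The projection idea behind Lemma 2 can be illustrated for a single conditioning variable: the sample partial correlation of $X_i$ and $X_j$ given $X_k$ equals the sample correlation of the least-squares residuals of $X_i$ and $X_j$ after regressing each on $X_k$ (with intercept), and it agrees with the closed form $(r_{ij} - r_{ik} r_{jk}) / \sqrt{(1 - r_{ik}^2)(1 - r_{jk}^2)}$. The sketch below checks this numerically on simulated data; it is an illustration under these assumptions, not code from the paper, and all helper names are hypothetical. In Fisher's classical result, such a statistic computed from $n$ Gaussian observations with $|k|$ conditioning variables is distributed like an ordinary sample correlation from $n - |k|$ observations.

```python
import math
import random


def corr(a, b):
    """Sample (Pearson) correlation."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def residuals(y, z):
    """Residuals of the least-squares regression of y on z (with intercept)."""
    n = len(y)
    my, mz = sum(y) / n, sum(z) / n
    beta = (sum((v - mz) * (w - my) for v, w in zip(z, y))
            / sum((v - mz) ** 2 for v in z))
    return [w - my - beta * (v - mz) for v, w in zip(z, y)]


def pcorr_projection(x, y, z):
    """Partial correlation of x and y given z via projections (residuals)."""
    return corr(residuals(x, z), residuals(y, z))


def pcorr_formula(x, y, z):
    """Same quantity from the closed-form expression in pairwise correlations."""
    rxy, rxz, ryz = corr(x, y), corr(x, z), corr(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz**2) * (1 - ryz**2))


if __name__ == "__main__":
    rng = random.Random(1)
    z = [rng.gauss(0, 1) for _ in range(200)]
    x = [0.7 * v + rng.gauss(0, 1) for v in z]
    y = [-0.4 * v + 0.3 * u + rng.gauss(0, 1) for v, u in zip(z, x)]
    print(pcorr_projection(x, y, z), pcorr_formula(x, y, z))  # equal up to rounding
```

For larger conditioning sets the same projection view holds with multiple regression, which is what lets the single-correlation results carry over to partial correlations.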

6 | Strong faithfulness and uniform consistency in causal inference - Zhang, Spirtes - 2003

1 | pcalg: an R-package for the PC-algorithm (in progress) - Kalisch - 2005