## A simple constraint-based algorithm for efficiently mining observational databases for causal relationships (1997)

Venue: Data Mining and Knowledge Discovery

Citations: 28 (2 self)

### BibTeX

@ARTICLE{Cooper97asimple,
  author = {Gregory F. Cooper},
  title = {A simple constraint-based algorithm for efficiently mining observational databases for causal relationships},
  journal = {Data Mining and Knowledge Discovery},
  year = {1997},
  volume = {1},
  pages = {203--224}
}

### Abstract

This paper presents a simple, efficient computer-based method for discovering causal relationships from databases that contain observational data. Observational data is passively observed, as contrasted with experimental data. Most of the databases available for data mining are observational. There is great potential for mining such databases to discover causal relationships. We illustrate how observational data can constrain the causal relationships among measured variables, sometimes to the point that we can conclude that one variable is causing another variable. The presentation here is based on a constraint-based approach to causal discovery. A primary purpose of this paper is to present the constraint-based causal discovery method in the simplest possible fashion in order to (1) readily convey the basic ideas that underlie more complex constraint-based causal discovery techniques, and (2) permit interested readers to rapidly program and apply the method to their own databases, as a start toward using more elaborate causal discovery algorithms.

### Citations

7052 |
Probabilistic reasoning in intelligent systems: networks of plausible inference
- Pearl
- 1988
Citation Context ...node are interpreted as directly causing that node, relative to the other nodes in the model. For a more detailed discussion of Bayesian networks, see (Castillo et al., 1997; Jensen, 1996; =-=Pearl, 1988-=-; Spirtes et al., 1993). Let S be the graphical structure of G and let P be the joint probability distribution represented by G. By definition, S is a directed, acyclic graph. A node in S denotes a va... |

1122 |
Statistical Analysis with Missing Data
- Little, Rubin
- 1987
Citation Context ...at X causes Y . A third solution to the problem is to fill in each missing value of each variable with some admissible value for the variable. There are numerous methods for assigning missing values (=-=Little and Rubin, 1987-=-). Hopefully, of course, the substituted values correspond closely to the actual, underlying values, but in general there is no guarantee that this will be the case. Assumption 2 (Discrete ... |

1075 | A Bayesian Method for the Induction
- Cooper, Herskovits
- 1992
Citation Context ...ay become very small, thus jeopardizing the validity of Assumption 6. 5.2. Bayesian causal discovery Bayesian methods have been developed for discovering causal relationships from observational data (=-=Cooper and Herskovits, 1992-=-; Heckerman, 1996; Heckerman et al., 1995). These methods differ in several ways from constraint-based methods. First, the methods take a user-specified prior probability over Bayesian network structu... |

905 |
An Introduction to Bayesian Networks
- Jensen
- 1996
Citation Context ... parents of a node are interpreted as directly causing that node, relative to the other nodes in the model. For a more detailed discussion of Bayesian networks, see (Castillo et al., 1997; =-=Jensen, 1996-=-; Pearl, 1988; Spirtes et al., 1993). Let S be the graphical structure of G and let P be the joint probability distribution represented by G. By definition, S is a directed, acyclic graph. A node in S... |

903 | Learning Bayesian networks: The combination of knowledge and statistical data
- Heckerman, Geiger, et al.
- 1995
Citation Context ...lidity of Assumption 6. 5.2. Bayesian causal discovery Bayesian methods have been developed for discovering causal relationships from observational data (Cooper and Herskovits, 1992; Heckerman, 1996; =-=Heckerman et al., 1995-=-). These methods differ in several ways from constraint-based methods. First, the methods take a user-specified prior probability over Bayesian network structures and parameters. If the user has littl... |

854 | A tutorial on learning with bayesian networks
- Heckerman
- 1995
Citation Context ...opardizing the validity of Assumption 6. 5.2. Bayesian causal discovery Bayesian methods have been developed for discovering causal relationships from observational data (Cooper and Herskovits, 1992; =-=Heckerman, 1996-=-; Heckerman et al., 1995). These methods differ in several ways from constraint-based methods. First, the methods take a user-specified prior probability over Bayesian network structures and parameter... |

496 |
Causation, Prediction, and Search
- Spirtes, Glymour, et al.
- 1993
Citation Context ...ed and complex algorithms. Additionally, the basic ideas presented here should make it easier for readers to understand the general theory of constraint-based causal discovery (Pearl and Verma, 1991; =-=Spirtes et al., 1993-=-). In Section 5, we discuss the relationship between constraint-based and Bayesian methods for causal discovery. 2. Assumptions for causal discovery In this section, we describe six assumptions that a... |

186 |
Discrete Multivariate Analysis
- Bishop, Fienberg, et al.
- 1975
Citation Context ... database D to determine if A is marginally independent of B. The function Independent might, for example, be based on classical statistical tests such as a chi-squared or a G² test of independence (=-=Bishop et al., 1975-=-). For such classical tests, we would need to specify the statistical threshold(s) needed to apply a given implementation of Independent. The appendix contains a Bayesian implementation of Independent... |
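The classical option mentioned in this citation context can be sketched as follows. This is a minimal illustration, not the paper's implementation (the paper's appendix gives a Bayesian version instead): it computes the G² (likelihood-ratio) statistic for two discrete columns and compares it with a user-supplied threshold. The names `g2_statistic`, `independent`, and `critical_value` are hypothetical, and the default threshold of 3.841 assumes two binary variables (one degree of freedom) tested at the 0.05 level.

```python
import math
from collections import Counter


def g2_statistic(xs, ys):
    """Return the G^2 (likelihood-ratio) statistic for two discrete columns.

    xs and ys are equal-length sequences holding the values of variables
    A and B, one pair per database case (an assumed data layout).
    """
    if len(xs) != len(ys):
        raise ValueError("columns must have the same length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # observed counts for each (a, b) cell
    x_counts = Counter(xs)        # marginal counts for A
    y_counts = Counter(ys)        # marginal counts for B
    g2 = 0.0
    for (x, y), observed in joint.items():
        # Expected count under marginal independence of A and B.
        expected = x_counts[x] * y_counts[y] / n
        # Cells with zero observed count contribute nothing and never
        # appear in `joint`, so the logarithm is always defined.
        g2 += 2.0 * observed * math.log(observed / expected)
    return g2


def independent(xs, ys, critical_value=3.841):
    """Hypothetical stand-in for Independent(A, B).

    Returns True when G^2 does not exceed the threshold. The default
    threshold is only appropriate for two binary variables (df = 1,
    alpha = 0.05); other table sizes need a different critical value.
    """
    return g2_statistic(xs, ys) <= critical_value
```

For example, `independent([0, 0, 1, 1] * 25, [0, 1, 0, 1] * 25)` accepts independence because every cell matches its expected count, while two identical columns are rejected. Passing the threshold in explicitly mirrors the remark that a classical test requires the user to specify the statistical threshold.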

176 | Efficient approximations for the marginal likelihood of Bayesian networks with hidden variables
- Chickering, Heckerman
- 1997
Citation Context ...ation with current Bayesian methods often is intractable, even when the causal graphs contain only a few variables. The use of sampling and approximation methods, however, recently has shown promise (=-=Chickering and Heckerman, 1996-=-). In summary, even though exact application of Bayesian methods often is intractable, approximate solutions may be acceptable. Another challenge of applying Bayesian methods for causal discovery is t... |

174 |
Expert Systems and Probabilistic Network Models
- Castillo, Gutierrez, et al.
- 1997
Citation Context ...an network in which the parents of a node are interpreted as directly causing that node, relative to the other nodes in the model. For a more detailed discussion of Bayesian networks, see (=-=Castillo et al., 1997-=-; Jensen, 1996; Pearl, 1988; Spirtes et al., 1993). Let S be the graphical structure of G and let P be the joint probability distribution represented by G. By definition, S is a directed, acyclic grap... |

172 | Causal diagrams for empirical research
- Pearl
- 1995
Citation Context ... captures exactly the conditional independence relationships that are implied by the Markov condition (Geiger et al., 1990; Meek, 1995; Pearl, 1988). The following is a definition of d-separation (=-=Pearl, 1994-=-): Let A, B, and C be disjoint subsets of the nodes in S. Let p be any acyclic path between a node in A and a node in B, where an acyclic path is any succession of arcs, regardless of their directions... |

110 |
Identifying independence in Bayesian networks
- Geiger, Verma, et al.
- 1990
Citation Context ...us, they will give us no information about the distribution of X. A criterion called d-separation captures exactly the conditional independence relationships that are implied by the Markov condition (=-=Geiger et al., 1990-=-; Meek, 1995; Pearl, 1988). The following is a definition of d-separation (Pearl, 1994): Let A, B, and C be disjoint subsets of the nodes in S. Let p be any acyclic path between a node in A and a n... |

58 |
Bayesian belief networks: from construction to evidence
- Bouckaert
- 1995
Citation Context ...ptions 1 through 6 hold, and when there are no hidden variables, in the large sample limit the Bayesian methods and PC will identify the same set of causal relationships among the measured variables (=-=Bouckaert, 1995-=-). If there are hidden variables, however, the Bayesian methods can make distinctions that PC and FCI cannot make. For example, the Bayesian methods sometimes can determine the likely number of values... |

49 |
Strong completeness and faithfulness in Bayesian networks
- Meek
- 1995
Citation Context ... no information about the distribution of X. A criterion called d-separation captures exactly the conditional independence relationships that are implied by the Markov condition (Geiger et al., 1990; =-=Meek, 1995-=-; Pearl, 1988). The following is a definition of d-separation (Pearl, 1994): Let A, B, and C be disjoint subsets of the nodes in S. Let p be any acyclic path between a node in A and a node in B, wh... |

26 |
Computer-Based Probabilistic-Network Construction, Doctoral Dissertation
- Herskovits
- 1991
Citation Context ...e + f + g + h)/(a + b + c + d + e + f + g + h) > t then Dependent := true else Dependent := false; end {Dependent}. One Bayesian metric for computing Pr(S, D), which is derived in Cooper and Herskovits (=-=Herskovits, 1991-=-), is as follows. Let Z be a set of n discrete variables, where a variable Xi in Z has ri possible value assignments: (vi1,...,viri ). Let D be a database of m cases, where each case contains a value ... |

24 |
An evaluation of an algorithm for inductive learning of Bayesian belief networks using simulated data sets
- Aliferis, Cooper
- 1994
Citation Context ...r of graph variables. In simulation experiments, however, the application of Bayesian methods with heuristic search techniques has been effective in recovering causal structure on measured variables (=-=Aliferis and Cooper, 1994-=-; Cooper and Herskovits, 1992; Heckerman et al., 1995). When there are hidden variables, exact computation with current Bayesian methods often is intractable, even when the causal graphs contain only ... |

18 | Identifying independencies in causal graphs with feedback - Pearl, Dechter - 1996 |

14 |
Causal discovery from data in the presence of selection bias
- Cooper
- 1995
Citation Context ...l emergency room, where we have been collecting our data. In this situation, it is possible for X and Y to be dependent, due to selection bias, even though none of the relationships in Table 3 holds (=-=Cooper, 1995-=-). Such bias can persist, regardless of how large the sample size, and it may lead to LCD erroneously concluding that X causes Y . Although a detailed treatment is beyond the scope of this paper, rese... |

6 |
Understanding causality
- J, Garcia
- 1977
Citation Context ... those more sophisticated and complex algorithms. Additionally, the basic ideas presented here should make it easier for readers to understand the general theory of constraint-based causal discovery (=-=Pearl and Verma, 1991-=-; Spirtes et al., 1993). In Section 5, we discuss the relationship between constraint-based and Bayesian methods for causal discovery. 2. Assumptions for causal discovery In this section, we describe ... |

1 |
Web page on Software for Learning Belief Networks from Data, http://bayes.stat.washington.edu/almond/belfit.html#BNG
- Almond
- 1997
Citation Context ...rking assumptions. This is an important empirical issue that needs to be addressed much more extensively. Hopefully, this paper will encourage readers to apply causal discovery methods to their data (=-=Almond, 1997-=-; Heckerman, 1996; Scheines et al., 1995), and thereby help to determine the real-world utility of the methods. Appendix This appendix describes one poss... |

1 | A discovery algorithm for directed causal graphs - Richardson - 1996 |

1 |
Tetrad II: Tools for Causal Modeling (with software). Mahwah
- Scheines, Spirtes, et al.
- 1995
Citation Context ...causal discovery algorithms that uses background knowledge about an uncaused variable (W ). The PC and FCI algorithms are described in detail in (Spirtes et al., 1993) and are available commercially (=-=Scheines et al., 1995-=-). While these two algorithms certainly are more difficult to implement than LCD, in an absolute sense they are not especially difficult to implement. PC assumes no hidden variables, while FCI allows ... |