## Causal discovery via MML (1996)

Venue: Proceedings of the Thirteenth International Conference on Machine Learning

Citations: 21 (10 self)

### BibTeX

@INPROCEEDINGS{Wallace96causaldiscovery,
  author = {Chris Wallace and Kevin B. Korb and Honghua Dai},
  title = {Causal discovery via MML},
  booktitle = {Proceedings of the Thirteenth International Conference on Machine Learning},
  year = {1996},
  pages = {516--524},
  publisher = {Morgan Kaufmann}
}

### Abstract

Automating the learning of causal models from sample data is a key step toward incorporating machine learning into decision-making and reasoning under uncertainty. This paper presents a Bayesian approach to the discovery of causal models, using a Minimum Message Length (MML) method. We have developed encoding and search methods for discovering linear causal models. The initial experimental results presented in this paper show that the MML induction approach can recover causal models from generated data which are quite accurate reflections of the original models and compare favorably with those of TETRAD II (Spirtes et al. 1994), even when TETRAD II is supplied with prior temporal information and MML is not.

### Citations

7319 |
Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference
- Pearl
- 1988
Citation Context ...iscovery, minimum message length, MML induction, Bayesian learning, causal modeling, inductive inference, machine learning. 1 Introduction Bayesian network technology, despite being only a decade old [19, 17], has already been applied to a wide variety of tasks involving reasoning under uncertainty. Even more recently the interest that has developed in the use of Bayesian networks has transferred to the l...

942 | Learning Bayesian networks: The combination of knowledge and statistical data, Machine Learning
- Heckerman, Geiger, et al.
- 1995
Citation Context ... a wide variety of tasks involving reasoning under uncertainty. Even more recently the interest that has developed in the use of Bayesian networks has transferred to the learning of Bayesian networks [5, 11, 14], which is very natural given the known difficulties and limitations of knowledge acquisition techniques, and especially the difficulties of eliciting probability estimates from domain experts [6]. A...

315 |
An information measure for classification
- Wallace, Boulton
- 1968
Citation Context ...nce tests, which ignore the prior probabilities of the candidate causal models. Here we report on initial results of our investigation into the use of Wallace's Minimum Message Length Principle (MML) [27, 29, 28] to search for and evaluate linear causal models given sample data, including experimental comparisons with TETRAD II. A linear causal model for a population is an ordered pair $\langle V, E \rangle$, where V is a ...

236 |
The American occupational structure
- Blau, Duncan
- 1967
Citation Context ...AD II model(s) returned when no prior temporal information was made available. 4.1 Blau and Duncan's Model. The Blau and Duncan model of Figure 2(a) is their stratification process model of occupation [2]. The variables are to be interpreted: $x_1$ Introduced by authors (1); $x_2$ Father's education (2); $x_3$ Father's occupation (2); $x_4$ Respondent's occupation in 1962 (3); $x_5$ Respondent's education (3); $x_6$...

222 |
Equivalence and synthesis of causal models
- Verma, Pearl
- 1990
Citation Context ...rstanding statistically equivalent structures are: Theorem 1 (Verma and Pearl, 1990) Two causal structures are statistically equivalent if and only if they have identical skeletons and v-structures [26]. V-structures refer to triples of nodes such that in the subgraph on the three nodes one of them is the common effect of the other two and the two parents are not adjacent. Chickering has extende...
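The criterion quoted in Theorem 1 can be checked mechanically. The following is a minimal sketch, not code from the paper: it assumes each causal structure is a DAG over the same node set, given as a list of `(parent, child)` arcs, and the function names are hypothetical.

```python
from itertools import combinations


def skeleton(edges):
    """Undirected adjacencies of a DAG given as (parent, child) arcs."""
    return {frozenset(edge) for edge in edges}


def v_structures(edges):
    """Triples (a, child, b) with a -> child <- b and a, b not adjacent."""
    skel = skeleton(edges)
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
    found = set()
    for child, ps in parents.items():
        for a, b in combinations(sorted(ps), 2):
            if frozenset((a, b)) not in skel:
                found.add((a, child, b))
    return found


def statistically_equivalent(edges1, edges2):
    """Theorem 1 test: identical skeletons and identical v-structures.

    Assumption: both arc lists describe DAGs over the same set of nodes.
    """
    return (skeleton(edges1) == skeleton(edges2)
            and v_structures(edges1) == v_structures(edges2))


# Illustrative three-variable structures (hypothetical, not from the paper).
chain = [("x1", "x2"), ("x2", "x3")]
reversed_chain = [("x3", "x2"), ("x2", "x1")]
collider = [("x1", "x2"), ("x3", "x2")]
```

Under this test the chain and its reversal fall in the same equivalence class, while the collider does not, because it introduces a v-structure at `x2` that the chain lacks.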

195 | Learning Bayesian Belief Networks: An Approach Based on
- Lam, Bacchus
- 1994
Citation Context ... a wide variety of tasks involving reasoning under uncertainty. Even more recently the interest that has developed in the use of Bayesian networks has transferred to the learning of Bayesian networks [5, 11, 14], which is very natural given the known difficulties and limitations of knowledge acquisition techniques, and especially the difficulties of eliciting probability estimates from domain experts [6]. A...

191 |
Estimation and inference by compact coding
- Wallace, Freeman
- 1987
Citation Context ...nce tests, which ignore the prior probabilities of the candidate causal models. Here we report on initial results of our investigation into the use of Wallace's Minimum Message Length Principle (MML) [27, 29, 28] to search for and evaluate linear causal models given sample data, including experimental comparisons with TETRAD II. A linear causal model for a population is an ordered pair $\langle V, E \rangle$, where V is a ...

181 | Erlbaum Associates - Lawrence - 1978 |

176 |
Modern Factor Analysis
- HARMAN
Citation Context ...ame Problem in particular. The social sciences have developed over the course of the century a battery of statistical methods for studying causal models of social phenomena, including factor analysis [10], path analysis [30], and structural equation modeling [7, 15]. The causal models so studied are more limited than Bayesian networks: effect variables are strictly additive, linear functions of exogen...

150 |
Probabilistic Reasoning in Expert Systems
- Neapolitan
- 1990
Citation Context ...iscovery, minimum message length, MML induction, Bayesian learning, causal modeling, inductive inference, machine learning. 1 Introduction Bayesian network technology, despite being only a decade old [19, 17], has already been applied to a wide variety of tasks involving reasoning under uncertainty. Even more recently the interest that has developed in the use of Bayesian networks has transferred to the l...

126 |
Latent variable models: An introduction to factor, path and structural analysis. Mahwah, NJ: Lawrence Erlbaum Associates
- Loehlin
- 1987
Citation Context ...ed over the course of the century a battery of statistical methods for studying causal models of social phenomena, including factor analysis [10], path analysis [30], and structural equation modeling [7, 15]. The causal models so studied are more limited than Bayesian networks: effect variables are strictly additive, linear functions of exogenous variables. Although this is a significant limitation, only...

113 |
The Interpretation of Interaction in Contingency Tables
- Simpson
- 1951
Citation Context ...utperforming the MML search, in particular by picking up the weak link between variables 2 and 4. Of some interest is the coincidental fact that the Fiji model contains a version of Simpson's paradox [22]. That is, in the original model the combined effect of the causal links from variables 1 to 2 to 3 almost exactly counterbalances the direct effect from variable 1 to 3, leaving a marginal correlatio...

93 | A transformational characterization of equivalent Bayesian network structures
- Chickering
- 1995
Citation Context ... two and the two parents are not adjacent. Chickering has extended the Verma and Pearl criterion of statistical equivalence by converting it into a topological criterion that is even simpler to apply [4]: Theorem 2 (Chickering, 1995) Two causal structures are statistically equivalent if and only if there exists a sequence of covered arc reversals that transforms one into the other. A covered arc is o...

88 |
The Theory of Probability
- Reichenbach
- 1949
Citation Context ...between the random variables being considered. This is a kind of primitive, unaided form of induction which, on the one hand, is a necessary prerequisite to more sophisticated forms of induction (see [20]) and, on the other hand, is easier to set up and examine. The drawback with experimenting with primitive induction is that it is generally much harder to get useful results from it than from similar ...

81 |
Introduction to structural equation models
- Duncan
- 1975
Citation Context ...ed over the course of the century a battery of statistical methods for studying causal models of social phenomena, including factor analysis [10], path analysis [30], and structural equation modeling [7, 15]. The causal models so studied are more limited than Bayesian networks: effect variables are strictly additive, linear functions of exogenous variables. Although this is a significant limitation, only...

72 |
The method of path coefficients
- Wright
- 1934
Citation Context ...cular. The social sciences have developed over the course of the century a battery of statistical methods for studying causal models of social phenomena, including factor analysis [10], path analysis [30], and structural equation modeling [7, 15]. The causal models so studied are more limited than Bayesian networks: effect variables are strictly additive, linear functions of exogenous variables. Altho...

70 |
A Bayesian method for constructing Bayesian belief networks from databases
- Cooper, Herskovits
Citation Context ... a wide variety of tasks involving reasoning under uncertainty. Even more recently the interest that has developed in the use of Bayesian networks has transferred to the learning of Bayesian networks [5, 11, 14], which is very natural given the known difficulties and limitations of knowledge acquisition techniques, and especially the difficulties of eliciting probability estimates from domain experts [6]. A...

48 |
Counting linear extensions is #P-complete
- Brightwell, Winkler
- 1991
Citation Context ...nown as the linear extensions of the DAG. Hence, $L^{(s)}_1 = \log K! + \frac{K(K-1)}{2} - \log M$ (4). Unfortunately, counting the number of linear extensions of a DAG is known to be an NP-hard problem [3]. There are efficient means of producing an upper bound to M [12], and we are investigating its use to provide an estimate for the value of M. In the meantime, we calculate M by brute force, with the...
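The brute-force calculation of M mentioned in this context, and its use in equation (4), can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the graph is assumed acyclic, logarithms are taken to base 2 (the excerpt does not state the base), and the function names are hypothetical.

```python
from itertools import permutations
from math import factorial, log2


def count_linear_extensions(nodes, edges):
    """Count total orderings of `nodes` consistent with every (parent, child) arc.

    Brute force over all K! permutations, so only practical for small K.
    """
    count = 0
    for order in permutations(nodes):
        position = {node: i for i, node in enumerate(order)}
        if all(position[p] < position[c] for p, c in edges):
            count += 1
    return count


def structure_message_length(nodes, edges):
    """Equation (4): log K! + K(K-1)/2 - log M.

    Assumptions: base-2 logarithms, and `edges` forms a DAG (so M >= 1).
    """
    k = len(nodes)
    m = count_linear_extensions(nodes, edges)
    return log2(factorial(k)) + k * (k - 1) / 2 - log2(m)


# Illustrative use on a hypothetical three-node chain.
nodes = ["x1", "x2", "x3"]
chain_length = structure_message_length(nodes, [("x1", "x2"), ("x2", "x3")])
```

Enumerating permutations costs on the order of K! steps, which is consistent with the excerpt's remark that the brute-force technique is only a stopgap.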

35 |
On the conductance of order Markov chains
- Karzanov, Khachiyan
- 1991
Citation Context ...$\log K! + \frac{K(K-1)}{2} - \log M$ (4). Unfortunately, counting the number of linear extensions of a DAG is known to be an NP-hard problem [3]. There are efficient means of producing an upper bound to M [12], and we are investigating its use to provide an estimate for the value of M. In the meantime, we calculate M by brute force, with the understanding that this technique will be applicable only to mod...

32 |
A General selection criterion for inductive inference
- Georgeff, Wallace
- 1984
Citation Context ...nce tests, which ignore the prior probabilities of the candidate causal models. Here we report on initial results of our investigation into the use of Wallace's Minimum Message Length Principle (MML) [27, 29, 28] to search for and evaluate linear causal models given sample data, including experimental comparisons with TETRAD II. A linear causal model for a population is an ordered pair $\langle V, E \rangle$, where V is a ...

30 |
Causal Modeling
- Asher
- 1983
Citation Context ...[Figure 4: Fiji Fertility Model; panels (a) Original, (b) MML, (c) TETRAD II, (d) TETRAD II] 4.4 Goldberg's Model. Figure 5 illustrates Goldberg's mediation model [1] (p. 43) of voter preferences. The variables are: $x_1$ Father's social characteristics (1); $x_2$ Respondent social characteristics (2); $x_3$ Father's party identification (2); $x_4$ Respondent's party identif...

26 | Some Algebraic Properties of the Reticular Action Model - McArdle, McDonald - 1984 |

26 |
Causality from Probability
- Spirtes, Glymour, et al.
- 1990
Citation Context ...ommercially available program TETRAD II [25]. Their methods, however, while now incorporating a number of principles based upon Judea Pearl's work (specifically, what they call Principles I and II in [23]), otherwise rely upon orthodox statistical techniques, such as significance tests, which ignore the prior probabilities of the candidate causal models. Here we report on initial results of our invest...

24 |
MML and Bayesianism: similarities and differences
- Oliver, Baxter
- 1994
Citation Context ...stribution, the likelihood function is $P(y \mid \Theta) = P(y \mid \sigma, \{a_k\}) = \prod_{n=1}^{N} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\Big(-\frac{1}{2\sigma^2}\big(y_n - \sum_k a_k x_{nk}\big)^2\Big)$ (16). (Footnote 1: We are simplifying here by ignoring a volume term; see [18].) And the message length for encoding the data given the model is given by $L(D \mid H) = -\ln P(\text{data} \mid \text{parameters})$ (17) $= -\ln P(y \mid \sigma, \{a_k\})$ (18) $= -\ln\Big[\prod_{i=1}^{N} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(y_i - \cdots}$ ...
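Equations (16) to (18) in this context amount to a Gaussian negative log-likelihood for a linear model. The sketch below evaluates it directly; it is an illustration only, with a hypothetical function name and argument layout, and it omits the volume term that the quoted footnote also sets aside.

```python
from math import log, pi


def data_message_length(y, x, coeffs, sigma):
    """L(D|H) = -ln P(y | sigma, {a_k}) in nats, following equations (16)-(18).

    y      : list of N observed effect values y_n
    x      : list of N rows, each holding the K parent values x_nk
    coeffs : list of K path coefficients a_k
    sigma  : standard deviation of the Gaussian residual (assumed > 0)
    """
    total = 0.0
    for y_n, x_n in zip(y, x):
        residual = y_n - sum(a_k * x_nk for a_k, x_nk in zip(coeffs, x_n))
        # -ln of one Gaussian factor: 0.5*ln(2*pi*sigma^2) + residual^2/(2*sigma^2)
        total += 0.5 * log(2 * pi * sigma ** 2) + residual ** 2 / (2 * sigma ** 2)
    return total


# Hypothetical toy data: y is exactly 2*x, so coefficient 2.0 leaves no residual.
x_rows = [[1.0], [2.0]]
y_vals = [2.0, 4.0]
exact_fit = data_message_length(y_vals, x_rows, [2.0], 1.0)
poor_fit = data_message_length(y_vals, x_rows, [0.0], 1.0)
```

A model that fits the data more closely yields a shorter second part of the message, which is what lets the total message length trade model complexity against fit.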

12 |
Dependence, Inequality, and the Growth of the Tertiary: A Comparative Analysis of Less Developed Countries. American Sociological Review 45:531–52
- Evans, Timberlake
- 1980
Citation Context ...[Figure 2: Blau and Duncan's Model; panels include (g) TETRAD II (5), (h) TETRAD II (6), (i) TETRAD II (7)] 4.2 Evans' Model. Figure 3 illustrates Peter Evans' income inequality Model II [8] and the models induced by MML and TETRAD II. The variables are: $x_1$ Investment dependence (2); $x_2$ Per capita GDP (2); $x_3$ Change in service sector employment (3); $x_4$ Gini index of income inequality (4...

10 |
TETRAD II: Tools for causal modeling. Lawrence Erlbaum Associates
- Scheines, Spirtes, et al.
- 1994
Citation Context ...approach can recover causal models from generated data which are quite accurate reflections of the original models; our results compare favorably with those of the TETRAD II program of Spirtes et al. [25] even when their algorithm is supplied with prior temporal information and MML is not. Keywords: Causal discovery, minimum message length, MML induction, Bayesian learning, causal modeling, inductive ...

5 |
Inductive learning and defeasible inference
- Korb
- 1995
Citation Context ...n, the automation of their learning is potentially tantamount to the automation of scientific inductive practice, and promises, thereby, a substantial advance in solving the "AI problem" in general [13] and the Frame Problem in particular. The social sciences have developed over the course of the century a battery of statistical methods for studying causal models of social phenomena, including facto...

5 |
Causal Models of Publishing Productivity in Psychology
- Rodgers, Maranto
- 1989
Citation Context ...[Figure 6: Miller and Stokes' Model; panels (a) Original, (b) MML, (c) TETRAD II, (d) TETRAD II] 4.6 Rodgers and Maranto's Model. Rodgers and Maranto's model [21] of publishing productivity among academic psychologists is shown in Figure 7(a). The variables are to be interpreted: $x_1$ Ability (1); $x_2$ Graduate school program quality (2); $x_3$ Quality of the first ...

4 |
Judgment under Uncertainty: Heuristics and Biases
- Kahneman, Slovic, Tversky
- 1982
Citation Context ...1, 14], which is very natural given the known difficulties and limitations of knowledge acquisition techniques, and especially the difficulties of eliciting probability estimates from domain experts [6]. As these networks are plausibly understood (in many cases) as describing the causal structure of some physical phenomenon, the automation of their learning is potentially tantamount to the automatio...

4 |
MML and Bayesianism: similarities and differences
- Oliver, Baxter
- 1994
Citation Context ...1987), the message length for encoding the parameters is $L^{(p)} = -\ln\frac{h(\text{parms})}{\sqrt{F}} = \frac{1}{2}\ln F - \ln h(\text{parms})$ (5), where $F = 2N\sigma^{-2(K+1)}|A|$ (6) is the Fisher information and where $A$ is the data matrix $(x_i \cdot x_j)_{K \times K}$ with $x_j = (x_{1j}, \ldots, x_{Nj})$ for $j = 1, 2, \ldots, K$. (Footnote 1: We are simplifying here by ignoring a volume term; see (Oliver and Baxter 1994).) Assuming the parameters are normally distributed and a prior of $1/\sigma$ for $\sigma$, h(parms) = pr...

1 | Judgment under Uncertainty: Heuristics and Biases - Dependence - 1982 |

1 |
Discovering Causal Structure: Artificial Intelligence, Philosophy of Science, and Statistical Modeling
- Glymour, Scheines, et al.
- 1987
Citation Context ...y of the issues for AI, there has been only one substantial research program aimed at the automated learning of linear models, which is that of Clark Glymour et al. at Carnegie Mellon University (see Glymour et al. 1987 and Spirtes et al. 1993). Their approach has shown successes, leading to the program TETRAD II (Spirtes et al. 1994). Their methods, however, while now incorporating principles based upon Judea Pearl...