## Learning Bayesian belief networks: An approach based on the MDL principle (1994)

### Download Links

- [www.cs.iastate.edu]
- [newlogos.uwaterloo.ca]
- [www.cs.toronto.edu]

### Other Repositories/Bibliography

- DBLP

Venue: Computational Intelligence

Citations: 188 (8 self)

### BibTeX

@ARTICLE{Lam94learningbayesian,
  author = {Wai Lam and Fahiem Bacchus},
  title = {Learning Bayesian belief networks: An approach based on the MDL principle},
  journal = {Computational Intelligence},
  year = {1994},
  volume = {10},
  pages = {269--293}
}

### Abstract

A new approach for learning Bayesian belief networks from raw data is presented. The approach is based on Rissanen's Minimal Description Length (MDL) principle, which is particularly well suited for this task. Our approach does not require any prior assumptions about the distribution being learned. In particular, our method can learn unrestricted multiply-connected belief networks. Furthermore, unlike other approaches, our method allows us to trade off accuracy and complexity in the learned model. This is important since if the learned model is very complex (highly connected) it can be conceptually and computationally intractable. In such a case it would be preferable to use a simpler model even if it is less accurate. The MDL principle offers a reasoned method for making this tradeoff. We also show that our method generalizes previous approaches based on Kullback cross-entropy. Experiments have been conducted to demonstrate the feasibility of the approach.

Keywords: Knowledge Acquisition; Bayes Nets; Uncertainty Reasoning.
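
The accuracy versus complexity tradeoff described in the abstract can be illustrated with a generic two-part description length. The sketch below is not the encoding scheme of the paper: the function names, the `bits_per_param` constant, and the choice to charge a fixed number of bits per free conditional-probability parameter are all assumptions made for illustration. The data cost is the empirical conditional entropy of each variable given its parents, summed over all rows.

```python
import math
from collections import Counter


def data_bits(rows, child, parents):
    """Bits needed to encode column `child` given its parents,
    using the empirical conditional frequencies as the code."""
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in rows)
    parent_counts = Counter(tuple(r[p] for p in parents) for r in rows)
    return -sum(c * math.log2(c / parent_counts[pa])
                for (pa, _), c in joint.items())


def model_bits(arity, child, parents, bits_per_param):
    """Assumed model cost: a fixed number of bits per free parameter
    of the child's conditional probability table."""
    n_params = (arity[child] - 1) * math.prod(arity[p] for p in parents)
    return bits_per_param * n_params


def description_length(rows, arity, structure, bits_per_param=8.0):
    """Total two-part length: model cost plus data cost.
    `structure` maps each variable to the list of its parents."""
    return sum(model_bits(arity, c, ps, bits_per_param) + data_bits(rows, c, ps)
               for c, ps in structure.items())
```

Under these assumptions, adding an edge lowers the data cost only if the parent actually helps predict the child, and it always raises the model cost. On data where one variable copies another, the structure with the edge gets the shorter total length; on independent data the extra parameters are not repaid and the empty structure is preferred.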

### Citations

7052 |
Probabilistic Reasoning in Intelligent Systems
- Pearl
- 1988
Citation Context ...ch (1990) that can discover a minimal-edge I-map. A network structure is an I-map of a probability distribution if every independence relation exhibited in the network holds also in the distribution (=-=Pearl, 1988-=-; Geiger and Pearl, 1990). However, their approach is again limited to polytrees; it is only guaranteed to work in the case where the underlying distribution has a polytree structure. All of the above... |

1240 |
On information and sufficiency
- Kullback, Leibler
- 1951
Citation Context ...ned tree network was the closest of all tree networks to the underlying distribution of the raw data. The criterion of "closeness" they used was the well-known Kullback-Leibler cross-entropy measure (=-=Kullback and Leibler, 1951-=-). The main restriction of this work was that it could only learn tree structures. Hence, if the raw data was the result of a non-tree structured distribution, the learned structure could be very inac... |

1160 |
Modeling by shortest data description
- Rissanen
- 1978
Citation Context ...ill capable of learning a complex network if no simpler network is sufficiently accurate. To make this tradeoff we use a well-studied formalism: Rissanen's Minimum Description Length (MDL) Principle (=-=Rissanen, 1978-=-). Besides the reasons given above, making a tradeoff between accuracy and usefulness seems to be particularly important when learning from raw data. The raw data is itself only an approximate picture... |

637 | Approximating discrete probability distributions with dependence trees
- Chow, Liu
- 1968
Citation Context ..., we can develop an approach to evaluating cross-entropy that uses local computation over low-order marginals. This approach is an extension of previous work due to Chow and Liu (1968). Chow and Liu (=-=Chow and Liu, 1968-=-) developed a method for finding a tree structure that minimized the cross-entropy, and their method was extended by Rebane and Pearl (1987) to finding polytrees with minimal cross-entropy. Theorem 3.... |
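
The tree-learning method referred to in this context is commonly implemented as a maximum-weight spanning tree over pairwise mutual information, which among tree structures is equivalent to minimizing cross-entropy to the empirical distribution. The following is a minimal sketch under stated assumptions: the data are discrete and given as a list of dicts, the function names are hypothetical, and the returned edges are undirected (choosing a root and orienting the tree is left out).

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(rows, x, y):
    """Empirical mutual information (in bits) between columns x and y."""
    n = len(rows)
    pxy = Counter((r[x], r[y]) for r in rows)
    px = Counter(r[x] for r in rows)
    py = Counter(r[y] for r in rows)
    return sum((c / n) * math.log2(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())


def chow_liu_edges(rows, variables):
    """Kruskal-style maximum spanning tree over pairwise mutual information."""
    weighted = sorted(((mutual_information(rows, x, y), x, y)
                       for x, y in combinations(variables, 2)), reverse=True)
    parent = {v: v for v in variables}  # union-find forest

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    edges = []
    for _, x, y in weighted:
        rx, ry = find(x), find(y)
        if rx != ry:  # adding this edge does not create a cycle
            parent[rx] = ry
            edges.append((x, y))
    return edges
```

Because only pairwise (second-order) marginals are needed, the cost is dominated by computing one mutual-information value per pair of variables.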

580 |
The computational complexity of probabilistic inference using bayesian belief networks
- Cooper
- 1990
Citation Context ...ifficult to deal with. It is well known that in the worst case it is intractable to compute posterior probabilities in multiply-connected Bayesian networks; to be precise this computation is NP-Hard (=-=Cooper, 1990-=-). Furthermore, the time complexity of the known algorithms increases with the degree of connectivity of the network. For large multiply-connected networks approximation algorithms are often used, eit... |

498 |
Stochastic Complexity
- Rissanen
- 1989
Citation Context ...as "X is independent of Y, given Z", Geiger et al. developed an approach [6] that can discover a minimal-edge I-map [10]. (Footnote 1: Rissanen provides a lucid and convincing argument that discovering useful models is the real concern of science =-=[14]-=-.) However, their approach is again limited to polytrees; it is only guaranteed to work in the case where the underlying distribution has an exact polytree structure.... |

373 | Fusion, Propagation and Structuring in Belief Networks - Pearl - 1986 |

287 | Bayesian Updating in Causal Probabilistic Networks by Local Computations - Jensen, Lauritzen, et al. - 1990 |

254 | Approximating Probabilistic Inference in Bayesian Belief Networks
- Dagum
- 1993
Citation Context ...ree of connectivity of the network. For large multiply-connected networks approximation algorithms are often used, either based on stochastic simulation, e.g., (Chavez and Cooper, 1990; Chavez, 1990; =-=Dagum and Chavez, 1991-=-; Fung and Chang, 1990; Henrion, 1987; Pearl, 1987; Shachter and Peot, 1990), or search through the space of alternative instantiations, e.g., (Cooper, 1984; Henrion, 1990; Henrion, 1991; Peng and Reg... |

239 |
The ALARM monitoring system: A case study with two probabilistic inference techniques for belief networks
- Beinlich, Suermondt, et al.
- 1989
Citation Context ...t paradigm for representing and reasoning with uncertainty. Systems based on Bayesian networks have been constructed in a number of different application areas, ranging from medical diagnosis, e.g., (=-=Beinlich et al., 1989-=-), to reasoning about the oil market, e.g., (Abramson, 1991). Despite these successes, a major obstacle to using Bayesian networks lies in the difficulty of constructing them in complex domains. It ca... |

217 |
Equivalence and synthesis of causal models
- Verma, Pearl
- 1990
Citation Context ...nnected networks, which topologically are directed acyclic graphs (dags). Recently, Spirtes et al. [16] have developed an algorithm that can construct multiply-connected networks. And Verma and Pearl =-=[17, 11]-=- have developed what they call an IC-Algorithm that can also recover these kinds of structures. However, both approaches require that the underlying distribution being learned be dag-isomorphic. But,... |

205 | A theory of inferred causation - Pearl, Verma - 1991 |

184 |
Propagating uncertainty in Bayesian networks by probabilistic logic sampling
- Henrion
- 1988
Citation Context ...multiply-connected networks approximation algorithms are often used, either based on stochastic simulation, e.g., (Chavez and Cooper, 1990; Chavez, 1990; Dagum and Chavez, 1991; Fung and Chang, 1990; =-=Henrion, 1987-=-; Pearl, 1987; Shachter and Peot, 1990), or search through the space of alternative instantiations, e.g., (Cooper, 1984; Henrion, 1990; Henrion, 1991; Peng and Reggia, 1987a; Peng and Reggia, 1987b). ... |

156 |
Simulation approaches to general probabilistic inference on belief networks
- Shachter, Peot
- 1990
Citation Context ...approximation algorithms are often used, either based on stochastic simulation, e.g., (Chavez and Cooper, 1990; Chavez, 1990; Dagum and Chavez, 1991; Fung and Chang, 1990; Henrion, 1987; Pearl, 1987; =-=Shachter and Peot, 1990-=-), or search through the space of alternative instantiations, e.g., (Cooper, 1984; Henrion, 1990; Henrion, 1991; Peng and Reggia, 1987a; Peng and Reggia, 1987b). In practice these algorithms allow one... |

100 |
Weighing and integrating evidence for stochastic simulation in bayesian networks
- Fung, Chang
- 1990
Citation Context ...he network. For large multiply-connected networks approximation algorithms are often used, either based on stochastic simulation, e.g., (Chavez and Cooper, 1990; Chavez, 1990; Dagum and Chavez, 1991; =-=Fung and Chang, 1990-=-; Henrion, 1987; Pearl, 1987; Shachter and Peot, 1990), or search through the space of alternative instantiations, e.g., (Cooper, 1984; Henrion, 1990; Henrion, 1991; Peng and Reggia, 1987a; Peng and R... |

100 | Causality: Models
- Pearl
- 2000
Citation Context ...t can discover a minimal-edge I-map. A network structure is an I-map of a probability distribution if every independence relation exhibited in the network holds also in the distribution (Pearl, 1988; =-=Geiger and Pearl, 1990-=-). However, their approach is again limited to polytrees; it is only guaranteed to work in the case where the underlying distribution has a polytree structure. All of the above approaches fail to reco... |

93 |
Evidential reasoning using stochastic simulation of causal models
- Pearl
- 1987
Citation Context ...ted networks approximation algorithms are often used, either based on stochastic simulation, e.g., (Chavez and Cooper, 1990; Chavez, 1990; Dagum and Chavez, 1991; Fung and Chang, 1990; Henrion, 1987; =-=Pearl, 1987-=-; Shachter and Peot, 1990), or search through the space of alternative instantiations, e.g., (Cooper, 1984; Henrion, 1990; Henrion, 1991; Peng and Reggia, 1987a; Peng and Reggia, 1987b). In practice t... |

85 | Data Compression
- Lelewer, Hirschberg
Citation Context ...ing distribution each atomic event $e_i$ has probability $p_i$. Then Huffman's algorithm, when run using these probabilities, will assign event $e_i$ a codeword of length approximately $-\log_2(p_i)$ (=-=Lelewer and Hirschberg, 1987-=-). When we have $N$ data points, where $N$ is large, we would expect that there will be $N p_i$ occurrences of event $e_i$. Hence, the length of the string encoding the database will be approximately $-N$... |
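
The relation between event probabilities and codeword lengths in this context can be checked numerically. The sketch below builds Huffman codeword lengths with a heap and computes the approximate database length as N times the entropy of the event distribution. The function names and the example distribution are assumptions for illustration; they do not come from the paper.

```python
import heapq
import math
from itertools import count


def huffman_lengths(probs):
    """Codeword length per symbol for a Huffman code built from `probs`."""
    tie = count()  # tie-breaker so the heap never compares symbol lists
    heap = [(p, next(tie), [sym]) for sym, p in probs.items()]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in probs}
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for sym in s1 + s2:  # every merge adds one bit to the merged symbols
            lengths[sym] += 1
        heapq.heappush(heap, (p1 + p2, next(tie), s1 + s2))
    return lengths


def expected_database_bits(probs, n_rows):
    """Approximate encoding length: -N * sum_i p_i * log2(p_i)."""
    return -n_rows * sum(p * math.log2(p) for p in probs.values() if p > 0)
```

For probabilities that are exact powers of two, such as 1/2, 1/4, 1/8, 1/8, the Huffman lengths equal the negative base-2 logarithm of each probability exactly (1, 2, 3, 3 bits). For other distributions the lengths are only approximately equal to that value, which is why the text says "approximately".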

67 |
A Bayesian Method for Constructing Bayesian Belief Networks from Databases
- Cooper, Herskovits
- 1991
Citation Context ... the function D respectively. One additional feature of our approach, in particular a feature of our heuristic search algorithm, is that we did not require a user supplied ordering of variables, cf. (=-=Cooper and Herskovits, 1991-=-). We feel that this experiment demonstrates that our approach is feasible for recovering Bayesian networks of practical size. In the third set of experiments, the original Bayesian network G6 consist... |

47 | The recovery of causal poly-trees from statistical data - Rebane, Pearl - 1989 |

47 | Counting unlabeled acyclic digraphs - Robinson - 1977 |

45 |
A probabilistic causal model for diagnostic problem solving, part I: integrating symbolic causal inference with numeric probabilistic inference
- Peng, Reggia
Citation Context ...Chavez, 1991; Fung and Chang, 1990; Henrion, 1987; Pearl, 1987; Shachter and Peot, 1990), or search through the space of alternative instantiations, e.g., (Cooper, 1984; Henrion, 1990; Henrion, 1991; =-=Peng and Reggia, 1987a-=-; Peng and Reggia, 1987b). In practice these algorithms allow one to reason with more complex networks than can be handled by the exact algorithms. However, it has recently been shown that in general... |

39 |
Search-based methods to bound diagnostic probabilities in very large belief nets
- Henrion
- 1991
Citation Context ...990; Dagum and Chavez, 1991; Fung and Chang, 1990; Henrion, 1987; Pearl, 1987; Shachter and Peot, 1990), or search through the space of alternative instantiations, e.g., (Cooper, 1984; Henrion, 1990; =-=Henrion, 1991-=-; Peng and Reggia, 1987a; Peng and Reggia, 1987b). In practice these algorithms allow one to reason with more complex networks than can be handled by the exact algorithms. However, it has recently bee... |

28 | Learning causal trees from dependence information - Geiger, Paz, et al. - 1990 |

26 |
A randomized approximation algorithm for probabilistic inference on Bayesian belief networks
- Chavez, Cooper
- 1990
Citation Context ...known algorithms increases with the degree of connectivity of the network. For large multiply-connected networks approximation algorithms are often used, either based on stochastic simulation, e.g., (=-=Chavez and Cooper, 1990-=-; Chavez, 1990; Dagum and Chavez, 1991; Fung and Chang, 1990; Henrion, 1987; Pearl, 1987; Shachter and Peot, 1990), or search through the space of alternative instantiations, e.g., (Cooper, 1984; Henr... |

24 | Causality from Probability - Spirtes, Glymour, et al. - 1990 |

19 |
On Information and Sufficiency
- Kullback, Leibler
- 1951
Citation Context ...ned tree network was the closest of all tree networks to the underlying distribution of the raw data. The criterion of "closeness" they used was the well-known Kullback-Leibler cross-entropy measure (=-=Kullback and Leibler, 1951-=-). The main restriction of this work was that it could only learn tree structures. Hence, if the raw data was the result of a non-tree structured distribution, the learned structure could be very inac... |

15 | Introduction to Algorithms. MIT-Press - Cormen, Leiserson, et al. - 1989 |

13 |
Arco1: an application of belief networks to the oil market
- Abramson
- 1991
Citation Context ...s based on Bayesian networks have been constructed in a number of different application areas, ranging from medical diagnosis, e.g., (Beinlich et al., 1989), to reasoning about the oil market, e.g., (=-=Abramson, 1991-=-). Despite these successes, a major obstacle to using Bayesian networks lies in the difficulty of constructing them in complex domains. It can be a very time-consuming and error-prone task to specify ... |

9 |
Nestor: A computer-based medical diagnosis that integrates causal and probabilistic knowledge
- Cooper
- 1984
Citation Context ...z and Cooper, 1990; Chavez, 1990; Dagum and Chavez, 1991; Fung and Chang, 1990; Henrion, 1987; Pearl, 1987; Shachter and Peot, 1990), or search through the space of alternative instantiations, e.g., (=-=Cooper, 1984-=-; Henrion, 1990; Henrion, 1991; Peng and Reggia, 1987a; Peng and Reggia, 1987b). In practice these algorithms allow one to reason with more complex networks than can be handled by the exact algorithms... |

8 |
The minimum description length principle and its application to online learning of handprinted characters
- Gao, Li
- 1989
Citation Context ... a total ordering. 3 The MDL Principle In this section we will discuss in greater detail Rissanen's Minimal Description Length (MDL) principle, a well studied formalism in learning theory, see e.g., (=-=Gao and Li, 1989-=-; Rissanen, 1978). The MDL principle is based on the idea that the best model of a collection of data items is the model that minimizes the sum of 1. the length of the encoding of the model, and 2. th... |

4 |
Towards efficient inference in multiply connected belief networks
- Henrion
- 1990
Citation Context ...1990; Chavez, 1990; Dagum and Chavez, 1991; Fung and Chang, 1990; Henrion, 1987; Pearl, 1987; Shachter and Peot, 1990), or search through the space of alternative instantiations, e.g., (Cooper, 1984; =-=Henrion, 1990-=-; Henrion, 1991; Peng and Reggia, 1987a; Peng and Reggia, 1987b). In practice these algorithms allow one to reason with more complex networks than can be handled by the exact algorithms. However, it h... |

4 |
Propagating uncertainty in Bayesian networks by probabilistic logic sampling
- Henrion
- 1988
Citation Context ...multiply-connected networks approximation algorithms are often used, either based on stochastic simulation, e.g., (Chavez and Cooper, 1990; Chavez, 1990; Dagum and Chavez, 1991; Fung and Chang, 1990; =-=Henrion, 1987-=-; Pearl, 1987; Shachter and Peot, 1990), or search through the space of alternative instantiations, e.g., (Cooper, 1984; Henrion, 1990; Henrion, 1991; Peng and Reggia, 1987a; Peng and Reggia, 1987b). ... |

3 |
Architectures and approximation algorithms for probabilistic expert systems
- Chavez
- 1990
Citation Context ...s with the degree of connectivity of the network. For large multiply-connected networks approximation algorithms are often used, either based on stochastic simulation, e.g., (Chavez and Cooper, 1990; =-=Chavez, 1990-=-; Dagum and Chavez, 1991; Fung and Chang, 1990; Henrion, 1987; Pearl, 1987; Shachter and Peot, 1990), or search through the space of alternative instantiations, e.g., (Cooper, 1984; Henrion, 1990; Hen... |

2 |
Towards efficient inference in multiply connected belief networks
- Henrion
- 1990
Citation Context ...1990; Chavez, 1990; Dagum and Chavez, 1991; Fung and Chang, 1990; Henrion, 1987; Pearl, 1987; Shachter and Peot, 1990), or search through the space of alternative instantiations, e.g., (Cooper, 1984; =-=Henrion, 1990-=-; Henrion, 1991; Peng and Reggia, 1987a; Peng and Reggia, 1987b). In practice these algorithms allow one to reason with more complex networks than can be handled by the exact algorithms. However, it h... |