## Learning Hybrid Bayesian Networks by MML

Venue: Proc. 19th Australian Joint Conf. on AI, LNAI, 2006

Citations: 2 (2 self)

### BibTeX

@INPROCEEDINGS{odonnell2006learning,
  author = {Rodney T. O'Donnell and Lloyd Allison and Kevin B. Korb},
  title = {Learning Hybrid {B}ayesian Networks by {MML}},
  booktitle = {Proc. 19th Australian Joint Conf. on AI, LNAI},
  year = {2006}
}

### Abstract

We use a Markov chain Monte Carlo (MCMC) MML algorithm to learn hybrid Bayesian networks from observational data. Hybrid networks represent local structure using conditional probability tables (CPTs), logit models, decision trees, or hybrid models, i.e., combinations of the three. We compare this method with alternative local-structure learning algorithms using the MDL and BDe metrics. Results are presented for both real and artificial data sets. Hybrid models compare favourably to other local-structure learners, allowing simple representations given limited data combined with richer representations given massive data.
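The abstract's case for non-CPT local structure rests on parameter counts: a full CPT needs one probability row per joint parent configuration, so its size grows exponentially in the number of parents, which is what makes decision-tree or logit representations attractive as parents accumulate. A minimal sketch of that count (the helper name is hypothetical, not from the paper):

```python
def cpt_param_count(parent_arities, child_arity=2):
    """Free parameters of a full CPT: one (child_arity - 1)-vector of
    probabilities per joint configuration of the parent variables."""
    n_configs = 1
    for arity in parent_arities:
        n_configs *= arity
    return n_configs * (child_arity - 1)

# Three binary parents of a binary child: 2^3 = 8 free parameters.
print(cpt_param_count([2, 2, 2]))     # → 8
# Ten binary parents: 1024 parameters — a tree exploiting local
# structure can often represent the same conditional far more cheaply.
print(cpt_param_count([2] * 10))      # → 1024
```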

### Citations

7114 | The mathematical theory of communication
- Shannon
- 1949

Citation Context: ...ating a fully parameterized model, using Bayes's Theorem [9], P(H&D) = P(H)·P(D|H) = P(D)·P(H|D), for a hypothesis (e.g., a Bayesian network), H, and data, D, and on Shannon's law for optimal codes [10], msgLen(E) = −log P(E), requiring an event, E, to have a code of length −log P(E). Lengths can be measured in bits (log base 2) or, if mathematically convenient, in nits (natural logs). In any case,...
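Shannon's law as quoted in this snippet, msgLen(E) = −log P(E), can be sketched directly; the helper names are hypothetical, and the bits/nits distinction is just the choice of logarithm base:

```python
import math

def msg_len_bits(p):
    """Shannon code length, in bits, for an event of probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(p)

def msg_len_nits(p):
    """The same length in nits (natural logarithm)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log(p)

# An event with probability 1/8 gets a 3-bit code.
print(msg_len_bits(0.125))   # → 3.0
# Conversion: nits = bits * ln 2.
print(msg_len_nits(0.125))
```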

1243 | Modeling by shortest data description
- Rissanen
- 1978

Citation Context: ...tions exist for many useful problems [12,13,14]. The work described here relies on stochastic MML approximations [3]. Minimum description length (MDL) inference was developed as an alternative to MML [15] and uses the same message length paradigm. MDL, however, favours universal priors and the selection of a model class rather than a parameterized model. A detailed comparison of MML and MDL has been g...

1141 | A Bayesian method for the induction of probabilistic networks from data, Machine Learning 9
- Cooper, Herskovits
- 1992

Citation Context: ...heuristic search the DAG with the shortest description length is accepted as the best model. 4.2 BDe Scheme Heckerman et al.'s BDe metric [6], based on a previous Bayesian metric of Cooper and Herskovits [18], has been augmented with decision trees by Friedman [1], whose implementation is used for comparison in section 7. The suggested prior is based on an edit distance from an expert-supplied network. Friedman uses a...

955 | Learning Bayesian networks: The combination of knowledge and statistical data, Machine Learning 20
- Heckerman, Geiger, et al.
- 1995

Citation Context: ...on a node-by-node basis (hybrid models). We compare our approach with Nir Friedman's implementation [1] of the Bayesian BDe metric [6] and the MDL [7] metric, which were also applied to learning local structure using decision trees, although without hybrid model learning. [Figure: example network over A, B, C with table of P(x|A,B,C)]...

451 | An Essay Towards Solving a Problem in the Doctrine of Chances (1764)
- Bayes
- 1958

Citation Context: ...ly adapting representational complexity to the amount of data available. 2 Metrics Minimum Message Length (MML) inference is a method of estimating a fully parameterized model, using Bayes's Theorem [9], P(H&D) = P(H)·P(D|H) = P(D)·P(H|D), for a hypothesis (e.g., a Bayesian network), H, and data, D, and on Shannon's law for optimal codes [10], msgLen(E) = −log P(E), requiring an event, E, to have a...

324 | An information measure for classification
- Wallace, Boulton
- 1968

Citation Context: ...l see how this is useful for Bayesian networks. Strict MML is computationally infeasible for all but the simplest problems [11], but practical, efficient approximations exist for many useful problems [12,13,14]. The work described here relies on stochastic MML approximations [3]. Minimum description length (MDL) inference was developed as an alternative to MML [15] and uses the same message length paradigm....

247 | Learning Bayesian Networks with Local Structure. Learning and Inference in Graphical Models
- Friedman, Goldszmidt
- 1998

Citation Context: ...f non-CPT models quickly becomes apparent as more parent variables are added. The advantage of using local structures that are more economical than CPTs in Bayesian networks has been clearly shown in [1,2] and elsewhere. Here we apply CaMML (Causal discovery via MML) [3,4,5] to the learning of local structure in Bayesian networks in an especially flexible way, using either full CPTs, logit models or de...

94 | A transformational characterization of equivalent Bayesian network structures
- Chickering
- 1995

Citation Context: ..., if any; we use conditional probability tables (CPTs), logit models and decision trees [8] for this. An important concept when dealing with Bayesian nets is the 'statistical equivalence class' (SEC) [17]. Two DAGs in the same equivalence class can be parameterized to give an identical joint probability distribution; there is no way to distinguish between the two using only observational data over the...

93 | Statistical and Inductive Inference by Minimum Message Length
- Wallace
- 2005

Citation Context: ...l see how this is useful for Bayesian networks. Strict MML is computationally infeasible for all but the simplest problems [11], but practical, efficient approximations exist for many useful problems [12,13,14]. The work described here relies on stochastic MML approximations [3]. Minimum description length (MDL) inference was developed as an alternative to MML [15] and uses the same message length paradigm....

89 | Coding decision trees
- Wallace, Patrick
- 1991

Citation Context: ...[2]. Here we extend CaMML to decision trees, making it possible to compare the results effectively with local structure learning elsewhere. This requires a new coding technique, distinct from that of [8], since the decision trees involved in local structure have unique constraints. The hybrid learning likewise extends earlier work, allowing all varieties of local structure to be represented in forms...

32 | Bayesian Artificial Intelligence
- Korb, Nicholson
- 2003

Citation Context: ...re added. The advantage of using local structures that are more economical than CPTs in Bayesian networks has been clearly shown in [1,2] and elsewhere. Here we apply CaMML (Causal discovery via MML) [3,4,5] to the learning of local structure in Bayesian networks in an especially flexible way, using either full CPTs, logit models or decision trees, or any combination of these determined...

32 | MDL and MML: Similarities and differences
- Baxter, Oliver

Citation Context: ...aradigm. MDL, however, favours universal priors and the selection of a model class rather than a parameterized model. A detailed comparison of MML and MDL has been given elsewhere by Baxter and Oliver [16]. BDe has its roots in Bayesian statistics. Like MDL, BDe attempts to find a model class rather than a parameterized model. BDe integrates over its prior on continuous parameters, whereas MML tries to...

28 | Learning linear causal models by MML sampling
- Wallace, Korb
- 1999

14 | The complexity of strict minimum message length inference
- Farr, Wallace

Citation Context: ...o discrete, parameters may be stated with less than maximum precision; we will see how this is useful for Bayesian networks. Strict MML is computationally infeasible for all but the simplest problems [11], but practical, efficient approximations exist for many useful problems [12,13,14]. The work described here relies on stochastic MML approximations [3]. Minimum description length (MDL) inference was...

11 | Learning Bayesian networks with restricted causal interactions
- Neil, Wallace, et al.
- 1999

Citation Context: ...late Decision Tree costs is obviously a good thing to do, but it becomes especially important when dealing with hybrid models so that CPTs and trees can compete fairly. 5.3 Logit We follow Neil et al. [2] in supporting MML logit models of local structure. A CPT treats each parameter independently. It is common, however, for there to be local structure, with some parameters dependent upon others. A fir...

10 | Models for machine learning and data mining in functional programming
- Allison
- 2005

Citation Context: ...l see how this is useful for Bayesian networks. Strict MML is computationally infeasible for all but the simplest problems [11], but practical, efficient approximations exist for many useful problems [12,13,14]. The work described here relies on stochastic MML approximations [3]. Minimum description length (MDL) inference was developed as an alternative to MML [15] and uses the same message length paradigm....

3 | Learning Bayesian belief networks
- Lam, Bacchus
- 1994

Citation Context: ...on a node-by-node basis (hybrid models). We compare our approach with Nir Friedman's implementation [1] of the Bayesian BDe metric [6] and the MDL [7] metric, which were also applied to learning local structure using decision trees, although without hybrid model learning. [Figure: example network over A, B, C with table of P(x|A,B,C)]...

2 | Causal discovery with prior information
- O'Donnell, Nicholson, et al.
- 2006

Citation Context: ...re added. The advantage of using local structures that are more economical than CPTs in Bayesian networks has been clearly shown in [1,2] and elsewhere. Here we apply CaMML (Causal discovery via MML) [3,4,5] to the learning of local structure in Bayesian networks in an especially flexible way, using either full CPTs, logit models or decision trees, or any combination of these determined...