## Model Selection Criteria for Learning Belief Nets: An Empirical Comparison (2000)


Venue: ICML '00

Citations: 12 (2 self)

### BibTeX

```
@INPROCEEDINGS{Allen00modelselection,
  author    = {Tim Van Allen and Russ Greiner},
  title     = {Model Selection Criteria for Learning Belief Nets: An Empirical Comparison},
  booktitle = {ICML '00},
  year      = {2000},
  pages     = {1047--1054}
}
```


### Abstract

We are interested in the problem of learning the dependency structure of a belief net, which involves a trade-off between simplicity and goodness of fit to the training data. We describe the results of an empirical comparison of three standard model selection criteria, viz., a Minimum Description Length criterion (MDL), Akaike's Information Criterion (AIC), and a Cross-Validation criterion, applied to this problem. Our results suggest that AIC and Cross-Validation are both good criteria for avoiding overfitting, but MDL does not work well in this context.

1. Introduction. In learning a model of a data-generating process from a random sample, a fundamental problem is finding the right balance between the complexity of the model and its goodness of fit to the training data. A more complex model can usually achieve a closer fit to the training data, but this may be because the model reflects not just significant regularities in the data but also minor variations due to random sampling...

### Citations

7440 | Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference - Pearl - 1988

Citation Context: "...results suggest that AIC and XV are both good criteria for avoiding overfitting, but MDL does not work well in this context. This report focuses on the challenge of learning the (Bayesian) belief net BN [Pea88] that has minimum KL-divergence [KL51] from the true distribution D over a set of discrete variables X, i.e., the network that minimizes info(BN, D) = -Σ_x P_D(X = x) log P_BN(X = x), from a fixed..."
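The score quoted in the context above is a cross-entropy: the net closest to the true distribution (in KL-divergence) is the one minimizing it. A minimal numerical sketch, with hypothetical distributions over four joint states (the paper's actual networks and variables are not reproduced here):

```python
import math

def info(p_true, p_model):
    """Cross-entropy info(BN, D) = -sum_x P_D(x) * log2 P_BN(x), in bits.
    Minimizing this over candidate nets is equivalent to minimizing
    KL-divergence from the true distribution P_D."""
    return -sum(p * math.log2(q) for p, q in zip(p_true, p_model) if p > 0)

# Hypothetical true distribution over 4 joint states, and two candidate nets.
p_d = [0.5, 0.25, 0.125, 0.125]
bn_exact = [0.5, 0.25, 0.125, 0.125]   # matches D exactly
bn_uniform = [0.25, 0.25, 0.25, 0.25]  # ignores all structure

print(info(p_d, bn_exact))    # 1.75 bits -- the entropy of D, the minimum
print(info(p_d, bn_uniform))  # 2.0 bits -- strictly worse
```

The score is minimized exactly when the model distribution equals the true one, at which point it equals the entropy of D.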

2689 | Estimating the dimension of a model - Schwarz - 1978

1132 | A Bayesian method for the induction of probabilistic networks from data - Cooper, Herskovits - 1992

895 | A tutorial on learning with Bayesian Networks - Heckerman - 1995

812 | Cross-validatory choice and assessment of statistical predictions - Stone - 1974

Citation Context: "...Another approach is to use only part of the sample to set the parameters, and use the rest of the sample to get an unbiased estimate of the true error. This latter approach is called Cross-Validation (Stone, 1974). In this paper, we compare these three model selection criteria, in the context of learning belief nets (defined in Section 2.1). We had two goals in carrying out this research. The first was to find a g..."

325 | Stochastic Complexity in Statistical Inquiry - Rissanen - 1989

Citation Context: "...a penalty to the training error so that more complex models have to fit the data considerably better than smaller models in order to outscore them. Two standard criteria are Minimum Description Length (MDL) (Rissanen, 1989) and Akaike's Information Criterion (AIC) (Bozdogan, 1987). Another approach is to use only part of the sample to set the parameters, and use the rest of the sample to get an unbiased estimate of the..."

244 | Learning Bayesian networks with local structure - Friedman, Goldszmidt - 1996

227 | Model Selection and Akaike's Information Criterion (AIC): The General Theory and its Analytical Extensions - Bozdogan - 1987

Citation Context: "...to fit the data considerably better than smaller models in order to outscore them. Two standard criteria are Minimum Description Length (MDL) (Rissanen, 1989) and Akaike's Information Criterion (AIC) (Bozdogan, 1987). Another approach is to use only part of the sample to set the parameters, and use the rest of the sample to get an unbiased estimate of the true error. This latter approach is called Cross-Validati..."

199 | Learning Bayesian belief networks: An approach based on the MDL principle - Lam, Bacchus - 1994

147 | Model Selection - Linhart, Zucchini - 1986

Citation Context: "...samples of 1 datum each. This family of methods goes under the generic name of Cross-Validation, being respectively called "simple", "k-fold", and "leave-one-out" Cross-Validation (Stone, 1974; Linhart & Zucchini, 1986). For our experiments, we used the simple version, dividing the sample into two equal-size subsamples, one for training and one for validation. XV(h, s) = info(h(s1), s2), where s has been split..."
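The simple cross-validation score in the context above can be sketched as follows. The learner `fit` and the multinomial data here are stand-ins for the paper's belief-net learner, used only to show the split-train-score structure:

```python
import math
from collections import Counter

def fit(sample):
    """Stand-in learner: maximum-likelihood multinomial with add-one smoothing
    (NOT the paper's belief-net structure learner)."""
    counts = Counter(sample)
    states = sorted(set(sample))
    total = len(sample) + len(states)
    return {x: (counts[x] + 1) / total for x in states}

def neg_log_lik(model, sample):
    """info(h, s): average negative log-likelihood of s under h, in bits."""
    return -sum(math.log2(model.get(x, 1e-12)) for x in sample) / len(sample)

def simple_xv(sample):
    """XV(h, s) = info(h(s1), s2): train on the first half of the sample,
    score on the held-out second half."""
    mid = len(sample) // 2
    s1, s2 = sample[:mid], sample[mid:]
    return neg_log_lik(fit(s1), s2)

data = ['a', 'a', 'b', 'a', 'a', 'b', 'a', 'b']
print(round(simple_xv(data), 3))
```

Because the validation half is never seen during training, the score is an unbiased estimate of the true error, which is what lets cross-validation penalize overfitting without an explicit complexity term.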

110 | An Experimental and Theoretical Comparison of Model Selection Methods - Kearns, Mansour, et al. - 1997

50 | On the sample complexity of learning Bayesian networks - Friedman, Yakhini - 1996

43 | On information and sufficiency - Kullback, Leibler - 1951

Citation Context: "...good criteria for avoiding overfitting, but MDL does not work well in this context. This report focuses on the challenge of learning the (Bayesian) belief net BN [Pea88] that has minimum KL-divergence [KL51] from the true distribution D over a set of discrete variables X, i.e., the network that minimizes info(BN, D) = -Σ_x P_D(X = x) log P_BN(X = x), from a fixed training sample s drawn i.i.d. from D..."

40 | Stochastic Complexity (with discussion) - Rissanen - 1987

Citation Context: "...in order to outscore them. Two standard criteria are: Minimum Description Length (MDL), which seeks the h that minimizes MDL(h, s) = info(h(s), s) + (k log |s|) / (2|s|), where k is the number of parameters of h [Ris87]; and Akaike's Information Criterion (AIC), which uses AIC(h, s) = info(h(s), s) + (k log e) / |s|, where the log e is simply to convert from nats to bits [Boz87]. Another approach, called Cross-Validation (X..."
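The two penalty terms in the context above are easy to compare directly. A short sketch (the model size k and sample sizes are hypothetical; both penalties are per-instance scores in bits, matching the normalized info term):

```python
import math

def mdl_penalty(k, n):
    """MDL complexity penalty per instance: (k * log2 n) / (2 n)."""
    return k * math.log2(n) / (2 * n)

def aic_penalty(k, n):
    """AIC complexity penalty per instance: (k * log2 e) / n."""
    return k * math.log2(math.e) / n

# For fixed k, MDL's penalty exceeds AIC's by a factor of (log n) / (2 log e),
# so MDL punishes extra parameters much harder as the sample grows.
k = 20
for n in (100, 1000, 10000):
    print(n, round(mdl_penalty(k, n), 4), round(aic_penalty(k, n), 4))
```

This gap is consistent with the paper's finding: the heavier MDL penalty biases selection toward overly simple networks, while AIC's lighter constant-factor penalty tracks cross-validation more closely.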

3 | On information and sufficiency - Kullback, Leibler - 1951

Citation Context: "...optimal code based on the distribution given by h. When s is a sequence of i.i.d. instances x1, x2, ..., xm of X, then: DL(s, h) = -Σ_{i=1}^m log P(X = xi | T = h). KL-divergence (Kullback & Leibler, 1951) is a standard measure of error for distribution learning. If t is the "true" model and h is a hypothesized model, the KL-divergence of h from t is given by: KLD(t ‖ h) = Σ_x P(X = x | T = t) log [P(X = x | T = t) / P(X = x | T = h)]..."
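The KLD formula in the context above can be checked numerically. A minimal sketch with hypothetical three-state distributions for t and h:

```python
import math

def kld(p_t, p_h):
    """KL-divergence KLD(t || h) = sum_x P(x|t) * log2(P(x|t) / P(x|h)).
    Zero iff the two distributions agree; positive otherwise."""
    return sum(p * math.log2(p / q) for p, q in zip(p_t, p_h) if p > 0)

t = [0.7, 0.2, 0.1]  # hypothetical "true" distribution
h = [0.6, 0.3, 0.1]  # hypothesized model

print(kld(t, t))            # 0.0 -- a model diverges from itself by nothing
print(round(kld(t, h), 4))  # small positive value: h is close to t
```

Note the asymmetry: KLD(t ‖ h) weights the log-ratio by the true distribution t, which is why it serves as an error measure for distribution learning rather than a metric.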