## Sequential and Factorized NML models

### BibTeX

```bibtex
@misc{Silander_sequentialand,
  author = {Tomi Silander and Teemu Roos and Petri Myllymäki},
  title  = {Sequential and Factorized NML models},
  year   = {}
}
```

### Abstract

Bayesian networks are among the most popular model classes for discrete, vector-valued i.i.d. data. Currently the most popular model selection criterion for Bayesian networks follows the Bayesian paradigm. However, this method has recently been reported to be very sensitive to the choice of prior hyper-parameters [1]. On the other hand, the general model selection criteria, AIC [2] and BIC [3], are derived through asymptotics and their behavior is suboptimal for small sample sizes.

### Citations

2320 | Estimating the dimension of a model
- Schwarz
- 1978

Citation context: "...an paradigm. However, this method has recently been reported to be very sensitive to the choice of prior hyper-parameters [1]. On the other hand, the general model selection criteria, AIC [2] and BIC [3], are derived through asymptotics and their behavior is suboptimal for small sample sizes. This extended abstract is based on an unpublished manuscript [4] in which we introduce a new effective scorin..."

1242 | Information theory and an extension of the maximum likelihood principle
- Akaike
- 1973

Citation context: "...llows Bayesian paradigm. However, this method has recently been reported to be very sensitive to the choice of prior hyper-parameters [1]. On the other hand, the general model selection criteria, AIC [2] and BIC [3], are derived through asymptotics and their behavior is suboptimal for small sample sizes. This extended abstract is based on an unpublished manuscript [4] in which we introduce a new effe..."

905 | Learning Bayesian networks: the combination of knowledge and statistical data
- Heckerman, Geiger, et al.
- 1995

Citation context: "...marginal likelihood [5]. However, all popular Bayesian network selection criteria S(G, D) feature a convenient decomposability, $\mathrm{SCORE}(G, D) = \sum_{i=1}^{n} S(D_i, D_{G_i})$ (3), that makes implementing a heuristic search for models easier [6]. Many popular scoring functions avoid overfitting by balancing the fit to the data and the complexity of the model. A common form of this idea can be expressed as SCORE(G, D) = l..."
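The decomposability in equation (3) is what makes heuristic search practical: changing one family re-scores only that family, and the total score is the sum of the local family scores. A minimal sketch using BIC as the local score (names are illustrative, not from the paper; the parameter count uses the observed parent configurations as a simplification):

```python
from math import log
from collections import Counter

def family_bic(child, parents, data, r_child):
    """Local score S(D_i, D_{G_i}) for one child column given its parent columns.
    BIC here is only an illustrative decomposable local score."""
    n = len(data)
    # Counts of (parent configuration, child value) and of parent configurations alone.
    joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    marg = Counter(tuple(row[p] for p in parents) for row in data)
    loglik = sum(c * log(c / marg[cfg]) for (cfg, _), c in joint.items())
    # Simplification: count parameters per *observed* parent configuration.
    num_params = (r_child - 1) * max(len(marg), 1)
    return loglik - 0.5 * num_params * log(n)

def score(graph, data, arities):
    """SCORE(G, D) = sum over families of local scores, the form of equation (3).
    `graph` maps each variable index to its list of parent indices."""
    return sum(family_bic(i, parents, data, arities[i])
               for i, parents in enumerate(graph))
```

On perfectly correlated binary columns this favors adding the edge, and the total is exactly the sum of the two family scores, so a local search move never needs to re-evaluate untouched families.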

275 | Fisher information and stochastic complexity
- Rissanen
- 1996

Citation context: "...$\theta^{BD}_{ijk} = \frac{N_{ijk} + \alpha_{ijk}}{\sum_{k'=1}^{r_i} (N_{ijk'} + \alpha_{ijk'})}$ (7). The factorized normalized maximum likelihood (fNML) score is based on the normalized maximum likelihood (NML) distribution [7, 8], $P_{\mathrm{NML}}(D \mid M) = \frac{\hat{P}(D \mid M)}{\sum_{D'} \hat{P}(D' \mid M)}$ (8), where the normalization is over all data sets $D'$ of a fixed size $N$. The log of the normalizing factor is called the parametric complexity or the regr..."
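A direct way to see what equation (8) says is to compute $P_{\mathrm{NML}}$ by brute force for a single $r$-ary column: normalize the maximized likelihood by its sum over all $r^N$ sequences of length $N$. The sketch below (function names are illustrative, not from the paper) is exponential in $N$ and meant only for toy sizes:

```python
from itertools import product
from collections import Counter

def max_likelihood(seq):
    """Maximized likelihood P-hat(seq): product over observed symbols of (n_k / N)^{n_k}."""
    n = len(seq)
    p = 1.0
    for c in Counter(seq).values():
        p *= (c / n) ** c
    return p

def nml_probability(seq, r):
    """Equation (8): P-hat(seq) divided by the sum of P-hat over all
    r-ary sequences of the same length."""
    n = len(seq)
    norm = sum(max_likelihood(d) for d in product(range(r), repeat=n))
    return max_likelihood(seq) / norm
```

The normalizing sum computed here is exactly the quantity whose logarithm is the parametric complexity (regret) mentioned in the context.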

25 | Learning Bayesian Networks Is NP-Complete. Learning from Data, volume 112
- Chickering
- 1996

Citation context: "...n network models for n variables is super-exponential, and the model selection task has been shown to be NP-hard for practically all model selection criteria such as AIC, BIC, and marginal likelihood [5]. However, all popular Bayesian network selection criteria S(G, D) feature a convenient decomposability, $\mathrm{SCORE}(G, D) = \sum_{i=1}^{n} S(D_i, D_{G_i})$ (3), that makes implementing a heuristic search for models easier [6]..."

24 | A linear-time algorithm for computing the multinomial stochastic complexity
- Kontkanen, Myllymäki
- 2007

Citation context: "...alizing sum goes over all the possible $D_i$-column vectors of length $N$, i.e., $D'_i \in \{1, \ldots, r_i\}^N$. Using recently discovered methods for calculating the regret for a single $r$-ary multinomial variable [9], the fNML criterion can be calculated as efficiently as other decomposable scores. For predictive purposes it is natural to parameterize the model learned with the fNML score by predictive conditiona..."
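As a rough illustration of how the multinomial regret can be obtained in linear time, the sketch below implements the recurrence I associate with the cited linear-time algorithm: $C(1, n) = 1$, $C(2, n) = \sum_{h=0}^{n} \binom{n}{h} (h/n)^h ((n-h)/n)^{n-h}$, and $C(K, n) = C(K-1, n) + \frac{n}{K-2}\, C(K-2, n)$ for $K \ge 3$. Treat the recurrence as an assumption to verify against the cited paper:

```python
from math import comb

def multinomial_complexity(K, n):
    """Normalizing sum C(K, n): the sum over all K-ary sequences of length n
    of the maximized likelihood. log C(K, n) is the regret."""
    c1 = 1.0  # K = 1: a single symbol, maximized likelihood is always 1
    # K = 2: closed-form binomial sum (note 0.0 ** 0 == 1.0 in Python).
    c2 = sum(comb(n, h) * (h / n) ** h * ((n - h) / n) ** (n - h)
             for h in range(n + 1))
    if K == 1:
        return c1
    if K == 2:
        return c2
    prev2, prev1 = c1, c2
    for k in range(3, K + 1):
        prev2, prev1 = prev1, prev1 + (n / (k - 2)) * prev2
    return prev1
```

After the $O(n)$ binomial base case, each additional value count $K$ costs $O(1)$, which is what makes fNML competitive with other decomposable scores.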

7 | On sensitivity of the MAP Bayesian network structure to the equivalent sample size parameter
- Silander, Kontkanen, et al.
- 2007

Citation context: "...he most popular model selection criterion for Bayesian networks follows Bayesian paradigm. However, this method has recently been reported to be very sensitive to the choice of prior hyper-parameters [1]. On the other hand, the general model selection criteria, AIC [2] and BIC [3], are derived through asymptotics and their behavior is suboptimal for small sample sizes. This extended abstract is based..."

5 | Conditional NML models
- Rissanen, Roos
- 2007

Citation context: "...on can be calculated as efficiently as other decomposable scores. For predictive purposes it is natural to parameterize the model learned with the fNML score by predictive conditional NML parameters [10], $\theta_{ijk} = \frac{e(N_{ijk})(N_{ijk} + 1)}{\sum_{k'=1}^{r_i} e(N_{ijk'})(N_{ijk'} + 1)}$ (9), where $e(n) = \left(\frac{n+1}{n}\right)^n$. Empirical tests with real data sets indicate that the fNML selection criterion performs very well in a code le..."
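The predictive parameters of equation (9) reduce to a simple reweighting of the counts within one parent configuration. A minimal sketch, assuming the convention $e(0) = 1$ (the displayed formula is undefined at $n = 0$):

```python
def cnml_predictive(counts):
    """Equation (9): theta_k proportional to e(N_k) * (N_k + 1),
    with e(n) = ((n + 1) / n) ** n, for the counts of one parent configuration."""
    def e(n):
        # Assumption: e(0) taken as 1; the formula in the text is undefined at n = 0.
        return 1.0 if n == 0 else ((n + 1) / n) ** n
    weights = [e(nk) * (nk + 1) for nk in counts]
    total = sum(weights)
    return [w / total for w in weights]
```

Since $e(n) \to \mathrm{e}$ as $n$ grows, the weights behave roughly like $N_k + 1$ for large counts, i.e., close to Laplace smoothing, while differing for small counts.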

3 | Factorized normalized maximum likelihood criterion for learning bayesian network structures,” Submitted for PGM08
- Silander, Roos, et al.
- 2008
(Show Context)
Citation Context ...al model selection criteria, AIC [2] and BIC [3], are derived through asymptotics and their behavior is suboptimal for small sample sizes. This extended abstract is based on an unpublished manuscript =-=[4]-=- in which we introduce a new effective scoring criterion for learning Bayesian network structures, the factorized normalized maximum likelihood (fNML). This score features no tunable parameters thus a... |