Results 1 - 4 of 4
Model Selection by Normalized Maximum Likelihood
, 2005
Cited by 12 (3 self)
The Minimum Description Length (MDL) principle is an information-theoretic approach to inductive inference that originated in algorithmic coding theory. In this approach, data are viewed as codes to be compressed by the model. From this perspective, models are compared on their ability to compress a data set by extracting useful information in the data apart from random noise. The goal of model selection is to identify the model, from a set of candidate models, that permits the shortest description length (code) of the data. Since Rissanen originally formalized the problem using the crude ‘two-part code’ MDL method in the 1970s, many significant strides have been made, especially in the 1990s, culminating in the development of the refined ‘universal code’ MDL method, dubbed Normalized Maximum Likelihood (NML). It represents an elegant solution to the model selection problem. The present paper provides a tutorial review on these latest developments with a special focus on NML. An application example of NML in cognitive modeling is also provided.
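For orientation, the NML distribution mentioned in this abstract is usually written as below for discrete data. The notation is an assumption and does not come from the abstract itself: x^n is the observed data, \hat{\theta}(\cdot) is the maximum likelihood estimate, and the sum runs over all data sets y^n of the same size.

```latex
\hat{p}_{\mathrm{NML}}(x^n)
  = \frac{p\bigl(x^n \mid \hat{\theta}(x^n)\bigr)}
         {\sum_{y^n} p\bigl(y^n \mid \hat{\theta}(y^n)\bigr)},
\qquad
-\log \hat{p}_{\mathrm{NML}}(x^n)
  = -\log p\bigl(x^n \mid \hat{\theta}(x^n)\bigr)
    + \log \sum_{y^n} p\bigl(y^n \mid \hat{\theta}(y^n)\bigr).
```

The second expression is the description length compared across candidate models: a goodness-of-fit term plus a complexity term that depends only on the model class and the sample size.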
Bayesian Network Structure Learning using Factorized NML Universal Models
, 2008
Cited by 7 (4 self)
Universal codes/models can be used for data compression and model selection by the minimum description length (MDL) principle. For many interesting model classes, such as Bayesian networks, the minimax regret optimal normalized maximum likelihood (NML) universal model is computationally very demanding. We suggest a computationally feasible alternative to NML for Bayesian networks, the factorized NML universal model, where the normalization is done locally for each variable. This can be seen as an approximate sum-product algorithm. We show that this new universal model performs extremely well in model selection, compared to the existing state-of-the-art, even for small sample sizes.
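The idea of local normalization can be illustrated with a minimal sketch. It assumes discrete variables and scores one variable given its parents by normalizing a multinomial model separately for each parent configuration. The function names (`multinomial_normalizer`, `fnml_local_score`) and the count-table layout are hypothetical, and the simple quadratic-time recurrence used here is chosen for clarity; it is not claimed to be the authors' implementation.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def multinomial_normalizer(num_values, n):
    """Sum of maximized multinomial likelihoods over all sequences of length n.

    Uses the recurrence that splits off one category: r observations fall in
    the first num_values - 1 categories, n - r in the last one.
    """
    if num_values == 1 or n == 0:
        return 1.0
    total = 0.0
    for r in range(n + 1):
        # Python evaluates 0.0 ** 0 as 1.0, which is the convention needed here.
        weight = math.comb(n, r) * (r / n) ** r * ((n - r) / n) ** (n - r)
        total += weight * multinomial_normalizer(num_values - 1, r)
    return total


def fnml_local_score(counts):
    """Locally normalized log-score of one variable.

    counts[j][k] is the number of times the variable takes value k while its
    parents are in configuration j (hypothetical layout for this sketch).
    """
    score = 0.0
    for row in counts:
        n_j = sum(row)
        for n_jk in row:
            if n_jk > 0:
                score += n_jk * math.log(n_jk / n_j)  # maximized log-likelihood
        score -= math.log(multinomial_normalizer(len(row), n_j))  # local penalty
    return score


if __name__ == "__main__":
    # A binary variable that copies its binary parent, 16 observations.
    with_parent = fnml_local_score([[8, 0], [0, 8]])
    without_parent = fnml_local_score([[8, 8]])
    print(with_parent > without_parent)
```

In this toy comparison the parent set with the higher score would be preferred; a full structure search would add up such local scores over all variables.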
Monte Carlo Estimation of Minimax Regret with an Application to MDL Model Selection
, 2008
Cited by 2 (1 self)
Minimum description length (MDL) model selection, in its modern NML formulation, involves a model complexity term which is equivalent to minimax/maximin regret. When the data are discrete-valued, the complexity term is a logarithm of a sum of maximized likelihoods over all possible datasets. Because the sum has an exponential number of terms, its evaluation is in many cases intractable. In the continuous case, the sum is replaced by an integral for which a closed form is available in only a few cases. We present an approach based on Monte Carlo sampling, which works for all model classes, and gives strongly consistent estimators of the minimax regret. The estimates converge almost surely to the correct value with increasing number of iterations. For the important class of Markov models, one of the presented estimators is particularly efficient: in empirical experiments, accuracy that is sufficient for model selection is usually achieved already on the first iteration, even for long sequences.
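As a rough illustration of the complexity term described here, the sketch below uses the Bernoulli model, where the sum over all 2^n binary sequences can be grouped by the number of ones and computed exactly. It then compares that value with a plain Monte Carlo estimate that draws sequences uniformly at random. The uniform proposal is an assumption made for simplicity, not one of the estimators presented in the paper, and all function names are hypothetical.

```python
import math
import random


def max_likelihood(k, n):
    """Maximized Bernoulli likelihood of a sequence with k ones out of n."""
    p = k / n  # maximum likelihood estimate
    # Python evaluates 0.0 ** 0 as 1.0, so k = 0 and k = n need no special case.
    return (p ** k) * ((1 - p) ** (n - k))


def exact_complexity(n):
    """Log of the sum of maximized likelihoods over all 2**n sequences."""
    total = sum(math.comb(n, k) * max_likelihood(k, n) for k in range(n + 1))
    return math.log(total)


def monte_carlo_complexity(n, num_samples, rng):
    """Estimate the same quantity from uniformly sampled sequences.

    The sum equals 2**n times the expected maximized likelihood of a
    uniformly drawn sequence, so the sample mean is rescaled by 2**n.
    """
    acc = 0.0
    for _ in range(num_samples):
        k = sum(rng.random() < 0.5 for _ in range(n))  # ones in a uniform sequence
        acc += max_likelihood(k, n)
    return n * math.log(2) + math.log(acc / num_samples)


if __name__ == "__main__":
    rng = random.Random(0)
    print(exact_complexity(6))
    print(monte_carlo_complexity(6, 20000, rng))
```

The uniform proposal has high variance for larger n because most of the mass comes from rare, highly regular sequences, which is one reason more carefully designed estimators are of interest.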
Keep it Simple Stupid – On the Effect of Lower-Order Terms in BIC-Like Criteria
We study BIC-like model selection criteria. In particular, we approximate the lower-order terms, which typically include the constant log ∫ √(det I(θ)) dθ, where I(θ) is the Fisher information at parameter value θ. We observe that the constant can sometimes be a huge negative number that dominates the other terms in the criterion for moderate sample sizes. At least in the case of Markov sources, including the lower-order terms in the criteria dramatically degrades model selection accuracy. A take-home lesson is to keep it simple.
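To make the role of the constant concrete, the sketch below compares three penalties for the one-parameter Bernoulli model: the exact NML complexity, the BIC-style leading term (k/2) log n, and the refined asymptotic formula (k/2) log(n/(2π)) + log ∫ √(det I(θ)) dθ. For the Bernoulli model I(θ) = 1/(θ(1−θ)) and the integral equals π. The model choice and function names are assumptions for illustration; the Markov sources studied in the paper are not reproduced here, and in this toy case the constant is small and positive rather than hugely negative.

```python
import math


def log_binom(n, k):
    """Logarithm of the binomial coefficient, computed via log-gamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def log_max_likelihood(k, n):
    """Log of the maximized Bernoulli likelihood for k ones out of n."""
    out = 0.0
    if k > 0:
        out += k * math.log(k / n)
    if k < n:
        out += (n - k) * math.log((n - k) / n)
    return out


def exact_complexity(n):
    """Exact NML complexity of the Bernoulli model (log-sum-exp for stability)."""
    terms = [log_binom(n, k) + log_max_likelihood(k, n) for k in range(n + 1)]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


def bic_penalty(n, k=1):
    """Leading-order term only: (k/2) log n."""
    return 0.5 * k * math.log(n)


def asymptotic_complexity(n, k=1):
    """Leading term plus the Fisher-information constant (log pi for Bernoulli)."""
    return 0.5 * k * math.log(n / (2 * math.pi)) + math.log(math.pi)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, exact_complexity(n), bic_penalty(n), asymptotic_complexity(n))
```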