Model Selection by Normalized Maximum Likelihood
2005
Abstract

Cited by 12 (3 self)
The Minimum Description Length (MDL) principle is an information-theoretic approach to inductive inference that originated in algorithmic coding theory. In this approach, data are viewed as codes to be compressed by the model. From this perspective, models are compared on their ability to compress a data set by extracting useful information in the data apart from random noise. The goal of model selection is to identify the model, from a set of candidate models, that permits the shortest description length (code) of the data. Since Rissanen originally formalized the problem using the crude ‘two-part code’ MDL method in the 1970s, many significant strides have been made, especially in the 1990s, culminating in the refined ‘universal code’ MDL method dubbed Normalized Maximum Likelihood (NML). NML represents an elegant solution to the model selection problem. The present paper provides a tutorial review of these latest developments, with a special focus on NML. An application example of NML in cognitive modeling is also provided.
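As a minimal sketch of how NML scores data, assume the simplest model class: i.i.d. binary data under a Bernoulli model. NML assigns P_NML(x^n) = P(x^n | θ̂(x^n)) / C_n, where θ̂ is the maximum likelihood estimate and C_n sums the maximized likelihood over all 2^n sequences; ln C_n is the parametric complexity, and the NML code length is -ln P_NML(x^n). The function names are hypothetical, and the computation is done in log space so that moderate n does not underflow.

```python
from math import exp, lgamma, log


def max_log_likelihood(k: int, n: int) -> float:
    """ln P(x^n | theta_hat) for n bits with k ones, theta_hat = k / n (0 ln 0 = 0)."""
    ll = 0.0
    if k > 0:
        ll += k * log(k / n)
    if k < n:
        ll += (n - k) * log((n - k) / n)
    return ll


def log_comb(n: int, k: int) -> float:
    """ln of the binomial coefficient (n choose k), via log-gamma."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def log_parametric_complexity(n: int) -> float:
    """ln C_n: the maximized likelihood summed over all 2^n sequences,
    grouped by the number of ones k (there are n-choose-k such sequences)."""
    return log(sum(exp(log_comb(n, k) + max_log_likelihood(k, n)) for k in range(n + 1)))


def nml_code_length(k: int, n: int) -> float:
    """NML code length in nats: -ln P(x^n | theta_hat) + ln C_n."""
    return -max_log_likelihood(k, n) + log_parametric_complexity(n)


# Toy two-model comparison: Bernoulli (scored by NML) vs. a parameter-free fair coin,
# whose code length is n ln 2. MDL prefers whichever model gives the shorter code.
n, k = 20, 16
print("Bernoulli NML:", round(nml_code_length(k, n), 3), "fair coin:", round(n * log(2), 3))
```

The fair coin has no free parameters, so it pays no complexity term; the Bernoulli model pays ln C_n but can fit skewed data better. This is the trade-off the abstract describes as extracting useful information apart from noise.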
An empirical study of MDL model selection with infinite parametric complexity
J. Mathematical Psychology, 2006
Abstract

Cited by 4 (2 self)
Parametric complexity is a central concept in MDL model selection. In practice it often turns out to be infinite, even for quite simple models such as the Poisson and geometric families. In such cases, MDL model selection based on NML and Bayesian inference based on Jeffreys' prior cannot be used. Several ways to resolve this problem have been proposed. We conduct experiments to compare and evaluate their behaviour on small sample sizes. We find interestingly poor behaviour for the plug-in predictive code; a restricted NML model performs quite well, but it is questionable whether the results validate its theoretical motivation. The Bayesian model with the improper Jeffreys' prior is the most dependable.
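To see why parametric complexity can be infinite, consider the Poisson model with a single observation (n = 1). The ML mean is μ̂ = x, so the normalizing sum is Σ_x x^x e^(-x) / x!, and by Stirling's approximation each term behaves like 1/sqrt(2πx), which is not summable. The sketch below (hypothetical helper names) computes truncated partial sums; they keep growing roughly like the square root of the cutoff instead of converging.

```python
from math import exp, lgamma, log


def poisson_max_likelihood(x: int) -> float:
    """P(x | mu_hat) for one Poisson observation, where mu_hat = x."""
    if x == 0:
        return 1.0  # mu_hat = 0 puts all probability mass on the outcome 0
    # x^x e^{-x} / x!, evaluated in log space to avoid overflow
    return exp(x * log(x) - x - lgamma(x + 1))


def truncated_shtarkov_sum(max_x: int) -> float:
    """Partial normalizing sum over outcomes 0..max_x for sample size n = 1."""
    return sum(poisson_max_likelihood(x) for x in range(max_x + 1))


for max_x in (10, 100, 1_000, 10_000):
    print(max_x, round(truncated_shtarkov_sum(max_x), 2))
```

Because the full sum diverges, NML is undefined for this model without some modification, which is why the remedies compared in the paper are needed.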
Prequential plug-in codes that achieve optimal redundancy rates even if the model is wrong
arXiv:1002.0757, 2010
Abstract

Cited by 2 (2 self)
We analyse the prequential plug-in codes relative to one-parameter exponential families M. We show that if data are sampled i.i.d. from some distribution outside M, then the redundancy of any plug-in prequential code grows at a rate larger than $\tfrac{1}{2}\ln n$ in the worst case. This means that plug-in codes, such as the Rissanen-Dawid ML code, may perform worse than other important universal codes such as the two-part MDL, Shtarkov and Bayes codes, for which the redundancy is always $\tfrac{1}{2}\ln n + O(1)$. However, we also show that a slight modification of the ML plug-in code, “almost” in the model, does achieve the optimal redundancy even if the true distribution is outside M.
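A prequential plug-in code predicts each outcome x_{i+1} with the model distribution indexed by an estimate computed from x_1, ..., x_i, and its code length is the cumulative log loss. The sketch below does this for the Poisson family (a one-parameter exponential family) and reports the regret against the best Poisson chosen in hindsight. The pseudo-observation `mu0`, which makes the first prediction well defined, is our own hypothetical smoothing choice; it is not the "almost in the model" modification from the paper. Note that this computes individual-sequence regret, whereas the abstract's result concerns redundancy under a data-generating distribution outside M.

```python
from math import lgamma, log


def poisson_neg_log_prob(x: int, mu: float) -> float:
    """-ln P(x | mu) for a Poisson distribution, in nats (0 * ln 0 taken as 0)."""
    log_term = x * log(mu) if x > 0 else 0.0
    return mu - log_term + lgamma(x + 1)


def plugin_code_length(xs: list[int], mu0: float = 1.0) -> float:
    """Prequential plug-in code length: each outcome is coded with the Poisson
    whose mean is a smoothed ML estimate from the preceding outcomes."""
    total, running_sum = 0.0, 0
    for i, x in enumerate(xs):
        # Hypothetical smoothing: mu0 acts as one extra pseudo-observation.
        mu_hat = (running_sum + mu0) / (i + 1)
        total += poisson_neg_log_prob(x, mu_hat)
        running_sum += x
    return total


def regret(xs: list[int], mu0: float = 1.0) -> float:
    """Plug-in code length minus the code length under the best Poisson in hindsight."""
    mu_ml = sum(xs) / len(xs)
    best = sum(poisson_neg_log_prob(x, mu_ml) for x in xs)
    return plugin_code_length(xs, mu0) - best
```

Feeding `regret` long sequences that are overdispersed relative to any Poisson (for example, geometric draws) is one way to probe, empirically, the misspecified regime the abstract describes.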
Following the Flattened Leader
Abstract

Cited by 1 (1 self)
We analyze the regret, measured in terms of log loss, of the maximum likelihood (ML) sequential prediction strategy. This “follow the leader” strategy also defines one of the main versions of Minimum Description Length model selection. We proved in prior work for single-parameter exponential family models that (a) in the misspecified case, the redundancy of follow-the-leader is not $\tfrac{1}{2}\log n + O(1)$, as it is for other universal prediction strategies; as such, the strategy also yields suboptimal individual-sequence regret and inferior model selection performance; and (b) in general it is not possible to achieve the optimal redundancy when predictions are constrained to the distributions in the considered model. Here we describe a simple “flattening” of the sequential ML and related predictors that does achieve the optimal worst-case individual-sequence regret of $(k/2)\log n + O(1)$ for k-parameter exponential family models on bounded outcome spaces; for unbounded spaces, we provide almost-sure results. Simulations show a major improvement in the resulting model selection criterion.
Maximum Likelihood vs. Sequential Normalized Maximum Likelihood in Online Density Estimation
Abstract
The paper considers sequential prediction of individual sequences with log loss (online density estimation) using an exponential family of distributions. We first analyze the regret of the maximum likelihood (“follow the leader”) strategy. We find that this strategy (1) is suboptimal and (2) requires an additional assumption about the boundedness of the data sequence. We then show that both problems can be addressed by adding the currently predicted outcome to the calculation of the maximum likelihood, followed by normalization of the distribution. The strategy obtained in this way is known in the literature as the sequential normalized maximum likelihood (SNML) or last-step minimax strategy. We show for the first time that for general exponential families, the regret is bounded by the familiar $(k/2)\log n$ and is thus optimal up to $O(1)$. We also show the relationship to the Bayes strategy with Jeffreys' prior.
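The SNML step described above can be sketched for the Bernoulli family (k = 1, bounded outcomes): for each candidate next bit, append it to the past, compute the maximized likelihood of the extended sequence, and normalize over the two candidates. This is a minimal illustration under that single-family assumption, with hypothetical function names; it is not the paper's general exponential-family treatment.

```python
from math import exp, log


def max_log_likelihood(k: int, n: int) -> float:
    """ln P(x^n | theta_hat) for n bits with k ones, theta_hat = k / n (0 ln 0 = 0)."""
    ll = 0.0
    if k > 0:
        ll += k * log(k / n)
    if k < n:
        ll += (n - k) * log((n - k) / n)
    return ll


def snml_prob_one(k: int, n: int) -> float:
    """SNML probability that the next bit is 1, given k ones among n past bits.

    Each candidate outcome is appended to the past before the ML fit; the two
    resulting maximized likelihoods are then normalized to sum to one."""
    ll_one = max_log_likelihood(k + 1, n + 1)   # candidate next bit = 1
    ll_zero = max_log_likelihood(k, n + 1)      # candidate next bit = 0
    return 1.0 / (1.0 + exp(ll_zero - ll_one))


def snml_code_length(bits: list[int]) -> float:
    """Cumulative log loss (nats) of the SNML predictor on a 0/1 sequence."""
    total, k = 0.0, 0
    for n, b in enumerate(bits):
        p_one = snml_prob_one(k, n)
        total -= log(p_one if b == 1 else 1.0 - p_one)
        k += b
    return total


def snml_regret(bits: list[int]) -> float:
    """SNML log loss minus the log loss of the best Bernoulli in hindsight."""
    return snml_code_length(bits) + max_log_likelihood(sum(bits), len(bits))


# Compare the regret on alternating sequences with (1/2) ln n for reference.
for n in (10, 100, 1000):
    bits = [i % 2 for i in range(n)]
    print(n, round(snml_regret(bits), 3), round(0.5 * log(n), 3))
```

Unlike plain follow-the-leader, which would assign probability 0 to a bit value not yet seen (for example, a 0 after an all-ones prefix), SNML always gives both outcomes positive probability, because the candidate outcome is included in the ML fit.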