Model Selection by Normalized Maximum Likelihood, 2005
Abstract

Cited by 23 (9 self)
The Minimum Description Length (MDL) principle is an information-theoretic approach to inductive inference that originated in algorithmic coding theory. In this approach, data are viewed as codes to be compressed by the model. From this perspective, models are compared on their ability to compress a data set by extracting the useful information in the data from the random noise. The goal of model selection is to identify the model, from a set of candidate models, that permits the shortest description length (code) of the data. Since Rissanen originally formalized the problem using the crude ‘two-part code’ MDL method in the 1970s, many significant strides have been made, especially in the 1990s, culminating in the development of the refined ‘universal code’ MDL method, dubbed Normalized Maximum Likelihood (NML), which represents an elegant solution to the model selection problem. The present paper provides a tutorial review of these latest developments, with a special focus on NML. An application example of NML in cognitive modeling is also provided.
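To make the NML idea in this abstract concrete, here is an illustrative sketch (our own, not taken from the paper; function names are hypothetical): for the Bernoulli model class the parametric complexity is the sum, over all 2^n binary sequences, of each sequence's maximized likelihood, and it can be computed exactly for small n by grouping sequences by their number of ones.

```python
from math import comb, log

def bernoulli_nml_complexity(n):
    """Parametric complexity C(n) of the Bernoulli model class:
    the sum of maximized likelihoods over all 2^n binary sequences,
    grouped by the number of ones k (comb(n, k) sequences each)."""
    total = 0.0
    for k in range(n + 1):
        p = k / n
        # maximized likelihood of any sequence with k ones: p^k (1-p)^(n-k)
        maxlik = (p ** k) * ((1 - p) ** (n - k))  # 0 ** 0 == 1 in Python
        total += comb(n, k) * maxlik
    return total

def nml_code_length(ones, n):
    """NML description length (in nats) of a binary sequence with
    `ones` ones out of n: the maximized-likelihood code length plus
    the log of the parametric complexity."""
    p = ones / n
    maxlik = (p ** ones) * ((1 - p) ** (n - ones))
    return -log(maxlik) + log(bernoulli_nml_complexity(n))
```

For n = 2 the complexity is 1 + 2 · 0.25 + 1 = 2.5, so every sequence pays a fixed penalty of log 2.5 nats on top of its maximized-likelihood code length; the model minimizing this total is the MDL choice.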
Bayesian Network Structure Learning using Factorized NML Universal Models, 2008
Abstract

Cited by 7 (4 self)
Universal codes/models can be used for data compression and model selection by the minimum description length (MDL) principle. For many interesting model classes, such as Bayesian networks, the minimax-regret-optimal normalized maximum likelihood (NML) universal model is computationally very demanding. We suggest a computationally feasible alternative to NML for Bayesian networks, the factorized NML universal model, where the normalization is done locally for each variable. This can be seen as an approximate sum-product algorithm. We show that this new universal model performs extremely well in model selection, compared to the existing state of the art, even for small sample sizes.
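The local normalization can be sketched as follows (an illustrative brute-force version under our own naming, not the authors' implementation, which relies on efficient recurrences rather than enumeration): for one variable with K values observed n times under a single parent configuration, the local NML score normalizes the maximized multinomial likelihood by the sum of maximized likelihoods over all possible count vectors.

```python
from itertools import product
from math import factorial, log

def multinomial_complexity(K, n):
    """Sum of maximized likelihoods over all length-n data sets of a
    K-valued variable, enumerated by count vectors (brute force:
    feasible only for small K and n)."""
    total = 0.0
    for counts in product(range(n + 1), repeat=K):
        if sum(counts) != n:
            continue
        coef = factorial(n)            # multinomial coefficient ...
        maxlik = 1.0
        for k in counts:
            coef //= factorial(k)      # ... n! / (k_1! ... k_K!)
            maxlik *= (k / n) ** k     # 0 ** 0 == 1 in Python
        total += coef * maxlik
    return total

def local_fnml_score(counts):
    """fNML log-score (in nats) for one variable within one parent
    configuration, given its observed value counts."""
    n = sum(counts)
    loglik = sum(k * log(k / n) for k in counts if k > 0)
    return loglik - log(multinomial_complexity(len(counts), n))
```

The full fNML network score sums `local_fnml_score` over every variable and parent configuration; doing the normalization locally, rather than over all joint data sets at once, is what makes the approach tractable compared with global NML.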
Monte Carlo Estimation of Minimax Regret with an Application to MDL Model Selection, 2008
Abstract

Cited by 3 (1 self)
Minimum description length (MDL) model selection, in its modern NML formulation, involves a model complexity term that is equivalent to the minimax/maximin regret. When the data are discrete-valued, the complexity term is the logarithm of a sum of maximized likelihoods over all possible data sets. Because the sum has an exponential number of terms, its evaluation is in many cases intractable. In the continuous case, the sum is replaced by an integral for which a closed form is available in only a few cases. We present an approach based on Monte Carlo sampling, which works for all model classes and gives strongly consistent estimators of the minimax regret. The estimates converge almost surely to the correct value as the number of iterations increases. For the important class of Markov models, one of the presented estimators is particularly efficient: in empirical experiments, accuracy sufficient for model selection is usually achieved already on the first iteration, even for long sequences.
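One simple instance of such a sampling scheme (a uniform-proposal sketch under assumed names, not necessarily one of the paper's specific estimators): for binary sequences, the complexity satisfies C(n) = 2^n · E[maximized likelihood] under the uniform distribution over sequences, so averaging the maximized likelihoods of uniformly drawn sequences gives a strongly consistent estimate of log C(n).

```python
import random
from math import log

def mc_log_complexity(n, iters, seed=0):
    """Monte Carlo estimate of log C(n) for the Bernoulli model class:
    draw binary sequences uniformly, average their maximized
    likelihoods, and rescale by the number of sequences, 2^n."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(iters):
        k = bin(rng.getrandbits(n)).count("1")   # number of ones
        p = k / n
        acc += (p ** k) * ((1 - p) ** (n - k))   # maximized likelihood
    return n * log(2) + log(acc / iters)
```

By the strong law of large numbers this converges almost surely to log C(n), but the variance of the naive uniform proposal grows with n, which is why better-designed estimators matter for long sequences, as the abstract indicates for Markov models.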
Keep it Simple Stupid – On the Effect of Lower-Order Terms in BIC-Like Criteria
Abstract
Abstract—We study BIC-like model selection criteria. In particular, we approximate the lower-order terms, which typically include the constant log ∫ √det I(θ) dθ, where I(θ) is the Fisher information at parameter value θ. We observe that this constant can sometimes be a huge negative number that dominates the other terms in the criterion for moderate sample sizes. At least in the case of Markov sources, including the lower-order terms in the criteria dramatically degrades model selection accuracy. A take-home lesson is to keep it simple.
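For context, the lower-order terms in question come from the standard asymptotic expansion of the NML code length in the MDL literature (stated here for reference; this is not the paper's own derivation), whose leading terms recover BIC:

```latex
-\log P_{\mathrm{NML}}(x^n)
  \;=\; -\log P\bigl(x^n \mid \hat{\theta}(x^n)\bigr)
  \;+\; \frac{k}{2}\log\frac{n}{2\pi}
  \;+\; \log\!\int\!\sqrt{\det I(\theta)}\,d\theta
  \;+\; o(1)
```

where k is the number of parameters and I(θ) the Fisher information. Truncating after the (k/2) log n part yields BIC; the abstract's observation is that the remaining constant can be so large in magnitude that retaining it hurts selection at moderate n.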
On the Minimum Description Length Complexity of Multinomial Processing Tree Models
Abstract
Multinomial processing tree (MPT) modeling is a statistical methodology that has been widely and successfully applied for measuring hypothesized latent cognitive processes in selected experimental paradigms. This paper concerns the model complexity of MPT models. Complexity is a key and necessary concept to consider in the evaluation and selection of quantitative models. A complex model with many parameters often overfits the data, capturing random noise over and above the underlying regularities, and should therefore be appropriately penalized. It has been well established and demonstrated in multiple studies that, in addition to the number of parameters, a model’s functional form, which refers to the way in which parameters are combined in the model equation, can also have significant effects on complexity. Given that MPT models vary greatly in their functional forms (tree structures and parameter/category assignments), it is of interest to evaluate their effects on complexity. Addressing this issue from the minimum description length (MDL) viewpoint, we prove a series of propositions concerning the various ways in which functional form contributes to the complexity of MPT models. Computational issues of complexity are also discussed.
An Information Geometry Approach to Shape Density Minimum Description Length Model Selection
Abstract
For advantages such as richer representational power and inherent robustness to noise, probability density functions are becoming a staple for complex problems in shape analysis. We consider a principled, geometric approach to selecting the model order for a class of shape density models in which the square root of the distribution is expanded in an orthogonal series. The free parameters associated with these estimators can then be rigorously selected using the Minimum Description Length (MDL) criterion for model selection. Under these models, the MDL is shown to have a closed-form representation, which is atypical for most applications of MDL in density estimation. We provide a straightforward application of our derivations by using this closed-form MDL criterion to select the optimal multiresolution level(s) for a class of square-root wavelet density estimators. Experimental evaluation of our technique is conducted on one- and two-dimensional density estimation problems in shape analysis, with comparative analysis against other popular model selection criteria such as the Bayesian and Akaike information criteria.