Results 1–6 of 6
A tutorial introduction to the minimum description length principle
In Advances in Minimum Description Length: Theory and Applications, 2005
Model Selection by Normalized Maximum Likelihood
, 2005
Abstract

Cited by 12 (3 self)
The Minimum Description Length (MDL) principle is an information-theoretic approach to inductive inference that originated in algorithmic coding theory. In this approach, data are viewed as codes to be compressed by the model. From this perspective, models are compared on their ability to compress a data set by extracting useful information in the data apart from random noise. The goal of model selection is to identify the model, from a set of candidate models, that permits the shortest description length (code) of the data. Since Rissanen originally formalized the problem using the crude ‘two-part code’ MDL method in the 1970s, many significant strides have been made, especially in the 1990s, culminating in the development of the refined ‘universal code’ MDL method, dubbed Normalized Maximum Likelihood (NML). It represents an elegant solution to the model selection problem. The present paper provides a tutorial review of these latest developments with a special focus on NML. An application example of NML in cognitive modeling is also provided.
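To make the NML idea concrete, here is a small illustrative sketch (our own, not taken from the paper): for the Bernoulli model, the NML probability of a binary sequence is its maximized likelihood divided by the sum of maximized likelihoods over all sequences of the same length, and that sum can be computed exactly by grouping sequences by their count of ones. All function names are hypothetical.

```python
from math import comb

def bernoulli_ml(k, n):
    """Maximized Bernoulli likelihood of a sequence with k ones out of n
    (theta-hat = k/n, with the convention 0**0 = 1)."""
    p = k / n
    return (p ** k) * ((1 - p) ** (n - k))

def bernoulli_nml(k, n):
    """NML probability of any particular sequence with k ones out of n:
    maximized likelihood divided by the sum of maximized likelihoods
    over all 2**n sequences, grouped by their count of ones."""
    norm = sum(comb(n, j) * bernoulli_ml(j, n) for j in range(n + 1))
    return bernoulli_ml(k, n) / norm

# Sanity check: the NML probabilities of all 2**n sequences sum to one.
n = 10
total = sum(comb(n, k) * bernoulli_nml(k, n) for k in range(n + 1))
print(round(total, 10))  # 1.0
```

The NML code length of a sequence is then the negative logarithm of its NML probability; model selection compares these code lengths across candidate model classes.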
MDL histogram density estimation
In Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics, 2007
Abstract

Cited by 11 (4 self)
We regard histogram density estimation as a model selection problem. Our approach is based on the information-theoretic minimum description length (MDL) principle, which can be applied to tasks such as data clustering, density estimation, image denoising and model selection in general. MDL-based model selection is formalized via the normalized maximum likelihood (NML) distribution, which has several desirable optimality properties. We show how this framework can be applied to learning generic, irregular (variable-width bin) histograms, and how to compute the NML model selection criterion efficiently. We also derive a dynamic programming algorithm for finding both the MDL-optimal bin count and the cut point locations in polynomial time. Finally, we demonstrate our approach via simulation tests.
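The dynamic programming idea can be sketched as follows. This is a simplified stand-in, not the paper's algorithm: each candidate bin is scored with a crude two-part code (negative log-likelihood of the density estimate plus 0.5·log n per bin height) rather than the exact NML criterion, and all names are hypothetical.

```python
import math
from bisect import bisect_right

def histogram_mdl(data, grid):
    """Choose histogram cut points from a candidate grid by dynamic
    programming, minimizing a crude two-part code length in nats.
    O(m**2) in the number of grid points, mirroring the polynomial-time
    DP structure over cut points."""
    data = sorted(data)
    n = len(data)
    m = len(grid) - 1
    # cum[i] = number of data points <= grid[i]
    cum = [bisect_right(data, g) for g in grid]

    def bin_cost(j, i):
        """Code length contribution of one bin spanning (grid[j], grid[i]]."""
        nj = cum[i] - cum[j]
        w = grid[i] - grid[j]
        penalty = 0.5 * math.log(n)      # crude price of one height parameter
        if nj == 0:
            return penalty               # empty bin: no data to encode
        return -nj * math.log(nj / (n * w)) + penalty

    best = [0.0] + [float("inf")] * m    # best[i]: optimal cost up to grid[i]
    back = [0] * (m + 1)
    for i in range(1, m + 1):
        for j in range(i):
            c = best[j] + bin_cost(j, i)
            if c < best[i]:
                best[i], back[i] = c, j
    # Recover the optimal cut points by walking the backpointers.
    cuts, i = [grid[m]], m
    while i > 0:
        i = back[i]
        cuts.append(grid[i])
    return best[m], sorted(cuts)

# Two well-separated clumps in (0, 1) over a uniform candidate grid.
data = [0.05, 0.1, 0.12, 0.15, 0.85, 0.9, 0.92, 0.95]
grid = [i / 10 for i in range(11)]
cost, cuts = histogram_mdl(data, grid)
```

The returned cut points tend to isolate the dense regions from the empty middle, since a fine uniform binning pays a parameter penalty for every bin while a single wide bin fits the data poorly.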
NML Computation Algorithms for Tree-Structured Multinomial Bayesian Networks
, 2007
Abstract

Cited by 6 (5 self)
Typical problems in bioinformatics involve large discrete datasets. Therefore, in order to apply statistical methods in such domains, it is important to develop efficient algorithms suitable for discrete data. The minimum description length (MDL) principle is a theoretically well-founded, general framework for performing statistical inference. The mathematical formalization of MDL is based on the normalized maximum likelihood (NML) distribution, which has several desirable theoretical properties. In the case of discrete data, straightforward computation of the NML distribution requires exponential time with respect to the sample size, since the definition involves a sum over all the possible data samples of a fixed size. In this paper, we first review some existing algorithms for efficient NML computation in the case of multinomial and naive Bayes model families. Then we proceed by extending these algorithms to more complex, tree-structured Bayesian networks.
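For the multinomial family, the exponential sum in the NML definition reduces to a normalizing constant C(K, n) over count vectors, and this constant is known (from Kontkanen and Myllymäki's work on efficient NML computation) to satisfy the linear recurrence C(K, n) = C(K−1, n) + (n/(K−2))·C(K−2, n). The sketch below, with our own function names, computes C(K, n) both by brute-force enumeration and via the recurrence, so the two can be checked against each other on small instances.

```python
from math import comb

def multinomial_norm_bruteforce(K, n):
    """Normalizing sum of the multinomial NML distribution,
    C(K, n) = sum over counts h1..hK (summing to n) of
    multinomial(n; h) * prod((hk/n)**hk), by direct enumeration.
    Exponential in K and n; for verification only."""
    def rec(cells, left):
        if cells == 1:
            return (left / n) ** left if left else 1.0
        total = 0.0
        for h in range(left + 1):
            w = (h / n) ** h if h else 1.0
            total += comb(left, h) * w * rec(cells - 1, left - h)
        return total
    return rec(K, n)

def multinomial_norm_recurrence(K, n):
    """Same quantity via the linear recurrence
    C(K, n) = C(K-1, n) + (n / (K-2)) * C(K-2, n),
    with C(1, n) = 1 and C(2, n) a single binomial sum."""
    c1 = 1.0
    # h = 0 and h = n each contribute exactly 1 to the binomial sum.
    c2 = 2.0 + sum(comb(n, h) * (h / n) ** h * ((n - h) / n) ** (n - h)
                   for h in range(1, n))
    if K == 1:
        return c1
    for k in range(3, K + 1):
        c1, c2 = c2, c2 + n / (k - 2) * c1
    return c2

print(round(multinomial_norm_recurrence(3, 5), 4))  # 8.5104
```

The stochastic complexity of the data is then the maximized log-likelihood minus log C(K, n); the recurrence makes this linear in K, avoiding the exponential enumeration mentioned in the abstract.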
MMLD Inference of the Poisson and Geometric Models
Abstract
This paper examines MMLD-based approximations for the inference of two univariate probability distributions: the geometric distribution, parameterised in terms of a mean parameter, and the Poisson distribution. The focus is on both parameter estimation and hypothesis testing properties of the approximation. The new parameter estimators are compared to the MML87 estimators in terms of bias, squared error risk and KL divergence risk. Empirical experiments demonstrate that the MMLD parameter estimates are more biased, and feature higher squared error risk, than the corresponding MML87 estimators. In contrast, the two criteria are virtually indistinguishable in the hypothesis testing experiment.
An Empirical Comparison of NML Clustering Algorithms
Abstract
Clustering can be defined as a data assignment problem where the goal is to partition the data into non-hierarchical groups of items. In our previous work, we suggested an information-theoretic criterion, based on the minimum description length (MDL) principle, for defining the goodness of a clustering of data. The basic idea behind this framework is to optimize the total code length over the data by encoding together data items belonging to the same cluster. In this setting, efficient coding is possible only by exploiting underlying regularities that are common to the members of a cluster, which means that this approach produces an implicitly defined similarity metric between the data items. Formally, the global code length criterion to be optimized is defined by using the intuitively appealing universal normalized maximum likelihood (NML) code, which has been shown to produce optimal code lengths in the worst-case sense. In this paper, we focus on the optimization aspect of the clustering problem, and study five algorithms that can be used for efficiently searching the exponentially-sized clustering space. As the suggested NML clustering criterion can be used for comparing clusterings with different numbers of cluster labels, the number of clusters is not known beforehand, and determining it is part of the optimization process. In the empirical part of the paper we compare the performance of the suggested algorithms in the task of optimizing the NML clustering criterion using several real-world datasets.
Index Terms—minimum description length, normalized maximum likelihood, clustering, EM algorithm, K-means algorithm
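The core intuition here, that a clustering is good when encoding each cluster's members together yields a short total code, can be illustrated with a toy two-part code on binary vectors. This is our own simplified stand-in, not the NML criterion used in the paper, and all names are hypothetical.

```python
import math

def code_length(clusters, dim):
    """Total two-part code length (in bits) for a clustering of binary
    vectors: each cluster encodes its members coordinate-by-coordinate
    with empirical Bernoulli codes, plus 0.5*log2(n) bits per estimated
    parameter. A toy stand-in for an MDL clustering criterion."""
    n = sum(len(c) for c in clusters)
    total = 0.0
    for c in clusters:
        m = len(c)
        for d in range(dim):
            k = sum(v[d] for v in c)          # ones in coordinate d
            for cnt in (k, m - k):            # -cnt*log2(cnt/m) per symbol group
                if cnt:
                    total += -cnt * math.log2(cnt / m)
            total += 0.5 * math.log2(n)       # price of one Bernoulli parameter
    return total

# Items sharing regularities compress better when clustered together:
a = [(1, 1, 0), (1, 1, 0), (1, 0, 0)]   # ones concentrated in the prefix
b = [(0, 0, 1), (0, 0, 1), (0, 1, 1)]   # ones concentrated in the suffix
good = code_length([a, b], dim=3)
bad = code_length([[a[0], b[0], a[1]], [b[1], a[2], b[2]]], dim=3)
print(good < bad)  # True
```

Mixing the two groups forces every coordinate's empirical distribution toward the uninformative 1/3–2/3 range, so the mixed partition's code is strictly longer; this is the implicitly defined similarity metric the abstract describes.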