Results 1 
7 of
7
Minimum Description Length Induction, Bayesianism, and Kolmogorov Complexity
 IEEE Transactions on Information Theory
, 1998
"... The relationship between the Bayesian approach and the minimum description length approach is established. We sharpen and clarify the general modeling principles MDL and MML, abstracted as the ideal MDL principle and defined from Bayes's rule by means of Kolmogorov complexity. The basic conditi ..."
Abstract

Cited by 79 (7 self)
 Add to MetaCart
The relationship between the Bayesian approach and the minimum description length approach is established. We sharpen and clarify the general modeling principles MDL and MML, abstracted as the ideal MDL principle and defined from Bayes's rule by means of Kolmogorov complexity. The basic condition under which the ideal principle should be applied is encapsulated as the Fundamental Inequality, which in broad terms states that the principle is valid when the data are random, relative to every contemplated hypothesis and also these hypotheses are random relative to the (universal) prior. Basically, the ideal principle states that the prior probability associated with the hypothesis should be given by the algorithmic universal probability, and the sum of the log universal probability of the model plus the log of the probability of the data given the model should be minimized. If we restrict the model class to the finite sets then application of the ideal principle turns into Kolmogorov's mi...
Hypothesis Selection and Testing by the MDL Principle
 The Computer Journal
, 1998
"... ses where the variance is known or taken as a parameter. 1. INTRODUCTION Although the term `hypothesis' in statistics is synonymous with that of a probability `model' as an explanation of data, hypothesis testing is not quite the same problem as model selection. This is because usually ..."
Abstract

Cited by 64 (3 self)
 Add to MetaCart
(Show Context)
ses where the variance is known or taken as a parameter. 1. INTRODUCTION Although the term `hypothesis' in statistics is synonymous with that of a probability `model' as an explanation of data, hypothesis testing is not quite the same problem as model selection. This is because usually a particular hypothesis, called the `null hypothesis', has already been selected as a favorite model and it will be abandoned in favor of another model only when it clearly fails to explain the currently available data. In model selection, by contrast, all the models considered are regarded on the same footing and the objective is simply to pick the one that best explains the data. For the Bayesians certain models may be favored in terms of a prior probability, but in the minimum description length (MDL) approach to be outlined below, prior knowledge of any kind is to be used in selecting the tentative models, which in the end, unlike in the Bayesians' case, can and will be fitted to data
Kolmogorov’s structure functions and model selection
 IEEE Trans. Inform. Theory
"... approach to statistics and model selection. Let data be finite binary strings and models be finite sets of binary strings. Consider model classes consisting of models of given maximal (Kolmogorov) complexity. The “structure function ” of the given data expresses the relation between the complexity l ..."
Abstract

Cited by 44 (15 self)
 Add to MetaCart
approach to statistics and model selection. Let data be finite binary strings and models be finite sets of binary strings. Consider model classes consisting of models of given maximal (Kolmogorov) complexity. The “structure function ” of the given data expresses the relation between the complexity level constraint on a model class and the least logcardinality of a model in the class containing the data. We show that the structure function determines all stochastic properties of the data: for every constrained model class it determines the individual bestfitting model in the class irrespective of whether the “true ” model is in the model class considered or not. In this setting, this happens with certainty, rather than with high probability as is in the classical case. We precisely quantify the goodnessoffit of an individual model with respect to individual data. We show that—within the obvious constraints—every graph is realized by the structure function of some data. We determine the (un)computability properties of the various functions contemplated and of the “algorithmic minimal sufficient statistic.” Index Terms— constrained minimum description length (ML) constrained maximum likelihood (MDL) constrained bestfit model selection computability lossy compression minimal sufficient statistic nonprobabilistic statistics Kolmogorov complexity, Kolmogorov Structure function prediction sufficient statistic
Algorithmic Complexity and Stochastic Properties of Finite Binary Sequences
, 1999
"... This paper is a survey of concepts and results related to simple Kolmogorov complexity, prefix complexity and resourcebounded complexity. We also consider a new type of complexity statistical complexity closely related to mathematical statistics. Unlike other discoverers of algorithmic complexit ..."
Abstract

Cited by 17 (0 self)
 Add to MetaCart
This paper is a survey of concepts and results related to simple Kolmogorov complexity, prefix complexity and resourcebounded complexity. We also consider a new type of complexity statistical complexity closely related to mathematical statistics. Unlike other discoverers of algorithmic complexity, A. N. Kolmogorov's leading motive was developing on its basis a mathematical theory more adequately substantiating applications of probability theory, mathematical statistics and information theory. Kolmogorov wanted to deduce properties of a random object from its complexity characteristics without use of the notion of probability. In the first part of this paper we present several results in this direction. Though the subsequent development of algorithmic complexity and randomness was different, algorithmic complexity has successful applications in a traditional probabilistic framework. In the second part of the paper we consider applications to the estimation of parameters and the definition of Bernoulli sequences. All considerations have finite combinatorial character. 1.
Redundancy of universal coding, Kolmogorov complexity, and Hausdorff dimension (Abstract)
, 2003
"... In [1, 4, 5], under a suitable condition, it is shown that asymptotic codelengths of sequences generated by a parametric model Pθ is given as follows; − log P ˆ θ + k 2 log n + o(log n), Pθ − a.e., (1) where ˆ θ is the maximumlikelihood estimator, k is the dimension of parameter space, n is the sam ..."
Abstract

Cited by 1 (1 self)
 Add to MetaCart
(Show Context)
In [1, 4, 5], under a suitable condition, it is shown that asymptotic codelengths of sequences generated by a parametric model Pθ is given as follows; − log P ˆ θ + k 2 log n + o(log n), Pθ − a.e., (1) where ˆ θ is the maximumlikelihood estimator, k is the dimension of parameter space, n is the sample size, and the base of log is 2. In view of the proof of Rissanen [4], the second term of (1) is the description of the maximum likelihood estimator ˆ θ with (log n)/2 bit accuracy, therefore, it is natural to study a universal coding obtained by compressing the description of the maximum likelihood estimator. In fact, Vovk [6] studied a universal coding for Bernoulli model with codelength inf θ − log Pθ + K(θn), (2) where θ ranges over computable real, and K is the prefix Kolmogorov complexity [2, 3]. In order to study the code (2), we study asymptotic expansion of Bayes mixture � Pθdm(θ) with two kind of priors. One is a prior that is singular with respect to Lebesgue measure, and another is a priori probability on Euclidean space. By considering prior of Bayes mixture to be a priori probability on Euclidean space, we extend the universal coding (2) to multidimensional parameter space, and show a universal coding whose codelength is − log P ˆ θ + k� K(description of θ j upto (log n)/2 bit n)+O(log log n), Pθ−a.e., j=1 (3) where θ = (θ 1, · · · , θ k). On the other hand Rissanen [5] showed that the codelength (1) is optimal up to O(log n) term except for parameters in a set of
On the Convergence Speed of MDL Predictions for Bernoulli Sequences
, 2004
"... We consider the Minimum Description Length principle for online sequence prediction. If the underlying model class is discrete, then the total expected square loss is a particularly interesting performance measure: (a) this quantity is bounded, implying convergence with probability one, and (b) it a ..."
Abstract
 Add to MetaCart
(Show Context)
We consider the Minimum Description Length principle for online sequence prediction. If the underlying model class is discrete, then the total expected square loss is a particularly interesting performance measure: (a) this quantity is bounded, implying convergence with probability one, and (b) it additionally specifies a rate of convergence. Generally, for MDL only exponential loss bounds hold, as opposed to the linear bounds for a Bayes mixture. We show that this is even the case if the model class contains only Bernoulli distributions. We derive a new upper bound on the prediction error for countable Bernoulli classes. This implies a small bound (comparable to the one for Bayes mixtures) for certain important model classes. The results apply to many Machine Learning tasks including classification and hypothesis testing. We provide arguments that our theorems generalize to countable classes of i.i.d. models.
MDL Convergence Speed for Bernoulli Sequences ∗
, 2006
"... The Minimum Description Length principle for online sequence estimation/prediction in a proper learning setup is studied. If the underlying model class is discrete, then the total expected square loss is a particularly interesting performance measure: (a) this quantity is finitely bounded, implying ..."
Abstract
 Add to MetaCart
(Show Context)
The Minimum Description Length principle for online sequence estimation/prediction in a proper learning setup is studied. If the underlying model class is discrete, then the total expected square loss is a particularly interesting performance measure: (a) this quantity is finitely bounded, implying convergence with probability one, and (b) it additionally specifies the convergence speed. For MDL, in general one can only have loss bounds which are finite but exponentially larger than those for Bayes mixtures. We show that this is even the case if the model class contains only Bernoulli distributions. We derive a new upper bound on the prediction error for countable Bernoulli classes. This implies a small bound (comparable to the one for Bayes mixtures) for certain important model classes. We discuss the application to Machine Learning tasks such as classification and hypothesis testing, and generalization to countable classes of i.i.d. models.