Results 1–10 of 12
Minimum Description Length Induction, Bayesianism, and Kolmogorov Complexity
 IEEE Transactions on Information Theory
, 1998
"... The relationship between the Bayesian approach and the minimum description length approach is established. We sharpen and clarify the general modeling principles MDL and MML, abstracted as the ideal MDL principle and defined from Bayes's rule by means of Kolmogorov complexity. The basic condition un ..."
Abstract

Cited by 67 (7 self)
The relationship between the Bayesian approach and the minimum description length approach is established. We sharpen and clarify the general modeling principles MDL and MML, abstracted as the ideal MDL principle and defined from Bayes's rule by means of Kolmogorov complexity. The basic condition under which the ideal principle should be applied is encapsulated as the Fundamental Inequality, which in broad terms states that the principle is valid when the data are random relative to every contemplated hypothesis, and these hypotheses are in turn random relative to the (universal) prior. Basically, the ideal principle states that the prior probability associated with the hypothesis should be given by the algorithmic universal probability, and the sum of the log universal probability of the model plus the log of the probability of the data given the model should be minimized. If we restrict the model class to the finite sets then application of the ideal principle turns into Kolmogorov's mi...
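The selection rule in the abstract above, minimizing a model code length plus a data-given-model code length, can be sketched in a few lines. This is a minimal illustration only: zlib stands in for the uncomputable Kolmogorov complexity, and the model descriptions and set sizes are hypothetical.

```python
import math
import zlib

def two_part_code_length(model_description: bytes, set_size: int) -> float:
    """Two-part code length: bits to describe the model (zlib as a crude,
    computable stand-in for Kolmogorov complexity) plus log2 of the number
    of data items the model admits (the data-to-model code)."""
    model_bits = 8 * len(zlib.compress(model_description))
    data_bits = math.log2(set_size) if set_size > 1 else 0.0
    return model_bits + data_bits

# Hypothetical finite-set models for a length-16 binary string:
tight = two_part_code_length(b"alternating 01 strings, n=16", 2)
loose = two_part_code_length(b"all binary strings with n=16", 2 ** 16)
```

One would then select the hypothesis with the smaller total; with a real compressor the model-description term is only an upper bound on the complexity the ideal principle calls for.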
Algorithmic Statistics
 IEEE Transactions on Information Theory
, 2001
"... While Kolmogorov complexity is the accepted absolute measure of information content of an individual finite object, a similarly absolute notion is needed for the relation between an individual data sample and an individual model summarizing the information in the data, for example, a finite set (or ..."
Abstract

Cited by 52 (14 self)
While Kolmogorov complexity is the accepted absolute measure of information content of an individual finite object, a similarly absolute notion is needed for the relation between an individual data sample and an individual model summarizing the information in the data, for example, a finite set (or probability distribution) where the data sample typically came from. The statistical theory based on such relations between individual objects can be called algorithmic statistics, in contrast to classical statistical theory that deals with relations between probabilistic ensembles. We develop the algorithmic theory of statistic, sufficient statistic, and minimal sufficient statistic. This theory is based on two-part codes consisting of the code for the statistic (the model summarizing the regularity, the meaningful information, in the data) and the model-to-data code. In contrast to the situation in probabilistic statistical theory, the algorithmic relation of (minimal) sufficiency is an absolute relation between the individual model and the individual data sample. We distinguish implicit and explicit descriptions of the models. We give characterizations of the algorithmic (Kolmogorov) minimal sufficient statistic for all data samples for both description modes; in the explicit mode under some constraints. We also strengthen and elaborate earlier results on the "Kolmogorov structure function" and "absolutely nonstochastic objects": those rare objects for which the simplest models that summarize their relevant information (minimal sufficient statistics) are at least as complex as the objects themselves. We demonstrate a close relation between the probabilistic notions and the algorithmic ones: (i) in both cases there is an "information non-increase" law; (ii) it is shown that a function is a...
Kolmogorov’s structure functions and model selection
 IEEE Trans. Inform. Theory
"... approach to statistics and model selection. Let data be finite binary strings and models be finite sets of binary strings. Consider model classes consisting of models of given maximal (Kolmogorov) complexity. The “structure function ” of the given data expresses the relation between the complexity l ..."
Abstract

Cited by 32 (14 self)
approach to statistics and model selection. Let data be finite binary strings and models be finite sets of binary strings. Consider model classes consisting of models of given maximal (Kolmogorov) complexity. The "structure function" of the given data expresses the relation between the complexity-level constraint on a model class and the least log-cardinality of a model in the class containing the data. We show that the structure function determines all stochastic properties of the data: for every constrained model class it determines the individual best-fitting model in the class, irrespective of whether the "true" model is in the model class considered or not. In this setting, this happens with certainty, rather than with high probability as in the classical case. We precisely quantify the goodness-of-fit of an individual model with respect to individual data. We show that, within the obvious constraints, every graph is realized by the structure function of some data. We determine the (un)computability properties of the various functions contemplated and of the "algorithmic minimal sufficient statistic." Index Terms—Constrained minimum description length (MDL), constrained maximum likelihood (ML), constrained best-fit model selection, computability, lossy compression, minimal sufficient statistic, nonprobabilistic statistics, Kolmogorov complexity, Kolmogorov structure function, prediction, sufficient statistic.
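The trade-off the structure function describes, a complexity budget on the model versus the least log-cardinality of a model containing the data, can be made concrete with a toy sketch. The model family below (prefix-matching sets S_k, charged k bits each) is an illustrative assumption, not Kolmogorov's general class of finite sets.

```python
import math

def toy_structure_function(x: str, alpha: int) -> float:
    """Least log2-cardinality over models containing x whose complexity
    stays within the budget alpha. Hypothetical model family:
    S_k = all binary strings of len(x) agreeing with x on the first k
    bits; S_k costs k bits and has cardinality 2 ** (len(x) - k)."""
    n = len(x)
    best = math.inf
    for k in range(n + 1):
        if k <= alpha:                  # model within the complexity budget
            best = min(best, n - k)     # log2 |S_k| = n - k
    return best

x = "1011001110101100"
curve = [toy_structure_function(x, a) for a in range(len(x) + 1)]
```

For this family the curve drops by one bit of cardinality per extra bit of allowed model complexity until it reaches zero at the singleton {x}; over the full class of finite sets the structure function can, as the abstract notes, realize essentially any such non-increasing graph.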
Algorithmic Complexity and Stochastic Properties of Finite Binary Sequences
, 1999
"... This paper is a survey of concepts and results related to simple Kolmogorov complexity, prefix complexity and resourcebounded complexity. We also consider a new type of complexity statistical complexity closely related to mathematical statistics. Unlike other discoverers of algorithmic complexit ..."
Abstract

Cited by 17 (0 self)
This paper is a survey of concepts and results related to simple Kolmogorov complexity, prefix complexity, and resource-bounded complexity. We also consider a new type of complexity, statistical complexity, closely related to mathematical statistics. Unlike other discoverers of algorithmic complexity, A. N. Kolmogorov's leading motive was to develop on its basis a mathematical theory more adequately substantiating the applications of probability theory, mathematical statistics, and information theory. Kolmogorov wanted to deduce properties of a random object from its complexity characteristics, without use of the notion of probability. In the first part of this paper we present several results in this direction. Though the subsequent development of algorithmic complexity and randomness took a different course, algorithmic complexity has found successful applications in a traditional probabilistic framework. In the second part of the paper we consider applications to the estimation of parameters and the definition of Bernoulli sequences. All considerations have a finite combinatorial character.
Kolmogorov’s structure functions with an application to the foundations of model selection
 In Proc. 43rd Symposium on Foundations of Computer Science
, 2002
"... We vindicate, for the first time, the rightness of the original “structure function”, proposed by Kolmogorov in 1974, by showing that minimizing a twopart code consisting of a model subject to (Kolmogorov) complexity constraints, together with a datatomodel code, produces a model of best fit (for ..."
Abstract

Cited by 10 (0 self)
We vindicate, for the first time, the rightness of the original "structure function", proposed by Kolmogorov in 1974, by showing that minimizing a two-part code, consisting of a model subject to (Kolmogorov) complexity constraints together with a data-to-model code, produces a model of best fit (for which the data is maximally "typical"). The method thus separates all possible model information from the remaining accidental information. This result gives a foundation for MDL and related methods in model selection. Settlement of this long-standing question is the more remarkable since the minimal randomness deficiency function (measuring maximal "typicality") itself cannot be monotonically approximated, but the shortest two-part code can. We furthermore show that both the structure function and the minimum randomness deficiency function can assume all shapes over their full domain (improving an independent unpublished result of Levin on the former function from the early 70s, and extending a partial result of V'yugin on the latter function from the late 80s, as well as recent results on prediction loss measured by "snooping curves"). We give an explicit realization of optimal two-part codes at all levels of model complexity. We determine the (un)computability properties of the various functions and the "algorithmic sufficient statistic" considered. In our setting the models are finite sets, but the analysis is valid, up to logarithmic additive terms, for the model class of computable probability density functions, or the model class of total recursive functions.
Kolmogorov's Contributions to the Foundations of Probability
"... Andrei Nikolaevich Kolmogorov was the foremost contributor to the mathematical and philosophical foundations of probability in the twentieth century, and his thinking on the topic is still potent today. In this article we first review the three stages of Kolmogorov's work on the foundations of proba ..."
Abstract

Cited by 7 (2 self)
Andrei Nikolaevich Kolmogorov was the foremost contributor to the mathematical and philosophical foundations of probability in the twentieth century, and his thinking on the topic is still potent today. In this article we first review the three stages of Kolmogorov's work on the foundations of probability: (1) his formulation of measure-theoretic probability, 1933; (2) his frequentist theory of probability, 1963; and (3) his algorithmic theory of randomness, 1965–1987. We also discuss another approach to the foundations of probability, based on martingales, that Kolmogorov did not consider.
Rate distortion and denoising of individual data using Kolmogorov complexity
 IEEE Trans. Inform. Theory
, 2010
"... Abstract—We examine the structure of families of distortion balls from the perspective of Kolmogorov complexity. Special attention is paid to the canonical ratedistortion function of a source word which returns the minimal Kolmogorov complexity of all distortion balls containing that word subject t ..."
Abstract

Cited by 5 (1 self)
We examine the structure of families of distortion balls from the perspective of Kolmogorov complexity. Special attention is paid to the canonical rate-distortion function of a source word, which returns the minimal Kolmogorov complexity of all distortion balls containing that word subject to a bound on their cardinality. This canonical rate-distortion function is related to the more standard algorithmic rate-distortion function for the given distortion measure. Examples are given of list distortion, Hamming distortion, and Euclidean distortion. The algorithmic rate-distortion function can behave differently from Shannon's rate-distortion function. To this end, we show that the canonical rate-distortion function can and does assume a wide class of shapes (unlike Shannon's); we relate low algorithmic mutual information to low Kolmogorov complexity (and consequently suggest that certain aspects of the mutual-information formulation of Shannon's rate-distortion function behave differently than would an analogous formulation using algorithmic mutual information); and we explore the notion that low-complexity distortion balls containing a given word capture the interesting properties of that word (which is hard to formalize in Shannon's theory), which suggests an approach to denoising. Index Terms—Algorithmic rate distortion, characterization, denoising, distortion families, fitness of destination words, individual data, Kolmogorov complexity, rate distortion, shapes of curves. The distortion measure expresses which aspects of the source word are relevant in the setting at hand, and which aspects are irrelevant (such as noise). For example, in application to lossy compression of a sound file this results in a compressed file where, among others, the very high and very low inaudible frequencies have been suppressed. The distortion measure is chosen such that it penalizes the deletion of the inaudible frequencies only lightly, because they are not relevant for the auditory experience.
We study rate distortion of individual source words using Kolmogorov complexity and show how it is related to denoising. The classical probabilistic theory is reviewed in Appendix A. Computability notions are reviewed
Kolmogorov's Complexity Conception of Probability
 Probability Theory: Philosophy, Recent History and Relations to Science
, 2000
"... Kolmogorov's goal in proposing his complexity conception of probability was to provide a better foundation for the applications of probability (as opposed to the theory of probability; he believed that his 1933 axioms were sufficient for the theory of probability). The complexity conception was ..."
Abstract

Cited by 2 (2 self)
Kolmogorov's goal in proposing his complexity conception of probability was to provide a better foundation for the applications of probability (as opposed to the theory of probability; he believed that his 1933 axioms were sufficient for the theory of probability). The complexity conception was a natural development of Kolmogorov's earlier frequentist conception, combined with (a) his conviction that only finite data sequences are of any interest in the applications of probability, and (b) Turing's discovery of the universal computing device. Besides the complexity conception itself, its developments by Martin-Löf, Levin, and others will be briefly discussed; I will also list some advantages and limitations of Kolmogorov's complexity conception and the algorithmic theory of randomness in general.
Towards an Algorithmic Statistics (Extended Abstract)
"... ) Peter G'acs ? , John Tromp, and Paul Vit'anyi ?? Abstract. While Kolmogorov complexity is the accepted absolute measure of information content of an individual finite object, a similarly absolute notion is needed for the relation between an individual data sample and an individual model su ..."
Abstract
Peter Gács, John Tromp, and Paul Vitányi. While Kolmogorov complexity is the accepted absolute measure of information content of an individual finite object, a similarly absolute notion is needed for the relation between an individual data sample and an individual model summarizing the information in the data, for example, a finite set where the data sample typically came from. The statistical theory based on such relations between individual objects can be called algorithmic statistics, in contrast to ordinary statistical theory that deals with relations between probabilistic ensembles. We develop a new algorithmic theory of typical statistic, sufficient statistic, and minimal sufficient statistic. We take statistical theory to ideally consider the following problem: given a data sample and a family of models (hypotheses), one wants to select the model that produced the data. But a priori it is possible that the data is atypical for the...
Rate Distortion and Denoising of Individual Data Using Kolmogorov Complexity
"... We examine the structure of families of distortion balls from the perspective of Kolmogorov complexity. Special attention is paid to the canonical ratedistortion function of a source word which returns the minimal Kolmogorov complexity of all distortion balls containing that word subject to a bound ..."
Abstract
We examine the structure of families of distortion balls from the perspective of Kolmogorov complexity. Special attention is paid to the canonical rate-distortion function of a source word, which returns the minimal Kolmogorov complexity of all distortion balls containing that word subject to a bound on their cardinality. This canonical rate-distortion function is related to the more standard algorithmic rate-distortion function for the given distortion measure. Examples are given of list distortion, Hamming distortion, and Euclidean distortion. The algorithmic rate-distortion function can behave differently from Shannon's rate-distortion function. To this end, we show that the canonical rate-distortion function can and does assume a wide class of shapes (unlike Shannon's); we relate low algorithmic mutual information to low Kolmogorov complexity (and consequently suggest that certain aspects of the mutual-information formulation of Shannon's rate-distortion function behave differently than would an analogous formulation using algorithmic mutual information); and we explore the notion that low-complexity distortion balls containing a given word capture the interesting properties of that word (which is hard to formalize in Shannon's theory), which suggests an approach to denoising.