Results 1-10 of 12
Minimum Description Length Induction, Bayesianism, and Kolmogorov Complexity
 IEEE Transactions on Information Theory
, 1998
"... The relationship between the Bayesian approach and the minimum description length approach is established. We sharpen and clarify the general modeling principles MDL and MML, abstracted as the ideal MDL principle and defined from Bayes's rule by means of Kolmogorov complexity. The basic condition un ..."
Abstract

Cited by 67 (7 self)
The relationship between the Bayesian approach and the minimum description length approach is established. We sharpen and clarify the general modeling principles MDL and MML, abstracted as the ideal MDL principle and defined from Bayes's rule by means of Kolmogorov complexity. The basic condition under which the ideal principle should be applied is encapsulated as the Fundamental Inequality, which in broad terms states that the principle is valid when the data are random relative to every contemplated hypothesis, and these hypotheses are in turn random relative to the (universal) prior. Basically, the ideal principle states that the prior probability associated with the hypothesis should be given by the algorithmic universal probability, and the sum of the log universal probability of the model plus the log probability of the data given the model should be minimized. If we restrict the model class to the finite sets, then application of the ideal principle turns into Kolmogorov's minimal sufficient statistic.
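In symbols (a sketch using standard notation from the Kolmogorov-complexity literature, not quoted from the paper), the ideal MDL principle selects
\[ H_{\mathrm{MDL}} \;=\; \arg\min_{H} \bigl[\, -\log \mathbf{m}(H) \;-\; \log P(D \mid H) \,\bigr], \]
where $D$ is the data, $H$ ranges over the contemplated hypotheses, $\mathbf{m}$ is the universal algorithmic prior with $-\log \mathbf{m}(H) = K(H) + O(1)$, and $K$ is prefix Kolmogorov complexity.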
Algorithmic Statistics
 IEEE Transactions on Information Theory
, 2001
"... While Kolmogorov complexity is the accepted absolute measure of information content of an individual finite object, a similarly absolute notion is needed for the relation between an individual data sample and an individual model summarizing the information in the data, for example, a finite set (or ..."
Abstract

Cited by 52 (14 self)
While Kolmogorov complexity is the accepted absolute measure of the information content of an individual finite object, a similarly absolute notion is needed for the relation between an individual data sample and an individual model summarizing the information in the data, for example, a finite set (or probability distribution) from which the data sample typically came. The statistical theory based on such relations between individual objects can be called algorithmic statistics, in contrast to classical statistical theory, which deals with relations between probabilistic ensembles. We develop the algorithmic theory of statistic, sufficient statistic, and minimal sufficient statistic. This theory is based on two-part codes consisting of the code for the statistic (the model summarizing the regularity, the meaningful information, in the data) and the model-to-data code. In contrast to the situation in probabilistic statistical theory, the algorithmic relation of (minimal) sufficiency is an absolute relation between the individual model and the individual data sample. We distinguish implicit and explicit descriptions of the models. We give characterizations of the algorithmic (Kolmogorov) minimal sufficient statistic for all data samples, for both description modes (in the explicit mode under some constraints). We also strengthen and elaborate earlier results on the "Kolmogorov structure function" and "absolutely nonstochastic objects": those rare objects for which the simplest models that summarize their relevant information (minimal sufficient statistics) are at least as complex as the objects themselves. We demonstrate a close relation between the probabilistic notions and the algorithmic ones: (i) in both cases there is an "information non-increase" law; (ii) it is shown that a function is a...
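As a sketch of the two-part codes mentioned here (standard notation assumed, not quoted from the paper): a finite-set model $S \ni x$ for data $x$ has two-part code length
\[ \Lambda(S) \;=\; K(S) + \log |S|, \]
the model part $K(S)$ plus the model-to-data part $\log |S|$ (the index of $x$ within $S$); $S$ is an algorithmic sufficient statistic for $x$ when $\Lambda(S) = K(x) + O(1)$, i.e., the two-part description is as short as the best one-part description.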
Kolmogorov’s structure functions and model selection
 IEEE Transactions on Information Theory
"... approach to statistics and model selection. Let data be finite binary strings and models be finite sets of binary strings. Consider model classes consisting of models of given maximal (Kolmogorov) complexity. The “structure function ” of the given data expresses the relation between the complexity l ..."
Abstract

Cited by 32 (14 self)
approach to statistics and model selection. Let data be finite binary strings and models be finite sets of binary strings. Consider model classes consisting of models of given maximal (Kolmogorov) complexity. The "structure function" of the given data expresses the relation between the complexity-level constraint on a model class and the least log-cardinality of a model in the class containing the data. We show that the structure function determines all stochastic properties of the data: for every constrained model class it determines the individual best-fitting model in the class, irrespective of whether the "true" model is in the model class considered or not. In this setting, this happens with certainty, rather than with high probability as in the classical case. We precisely quantify the goodness-of-fit of an individual model with respect to individual data. We show that, within the obvious constraints, every graph is realized by the structure function of some data. We determine the (un)computability properties of the various functions contemplated and of the "algorithmic minimal sufficient statistic." Index Terms: constrained minimum description length (MDL), constrained maximum likelihood (ML), constrained best-fit model selection, computability, lossy compression, minimal sufficient statistic, nonprobabilistic statistics, Kolmogorov complexity, Kolmogorov structure function, prediction, sufficient statistic.
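Written out in the usual formulation (notation assumed rather than quoted), the structure function of data $x$ at complexity level $\alpha$ is
\[ h_x(\alpha) \;=\; \min_{S} \{\, \log |S| \;:\; x \in S,\; K(S) \le \alpha \,\}, \]
the least log-cardinality of a finite-set model of Kolmogorov complexity at most $\alpha$ that contains $x$.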
Syntactic Measures of Complexity
, 1999
"... page 14 Declaration  page 15 Notes of copyright and the ownership of intellectual property rights  page 15 The Author  page 16 Acknowledgements  page 16 1  Introduction  page 17 1.1  Background  page 17 1.2  The Style of Approach  page 18 1.3  Motivation  page 19 1.4  Style of ..."
Abstract

Cited by 23 (2 self)
Table of contents (excerpt): Declaration; Notes of copyright and the ownership of intellectual property rights; The Author; Acknowledgements; 1 Introduction (1.1 Background; 1.2 The Style of Approach; 1.3 Motivation; 1.4 Style of Presentation; 1.5 Outline of the Thesis); 2 Models and Modelling (2.1 Some Types of Models; 2.2 Combinations of Models; 2.3 Parts of the Modelling Apparatus; 2.4 Models in Machine Learning; 2.5 The Philosophical Background to the Rest of this Thesis); 3 Problems and Properties (3.1 Examples of Common Usage: 3.1.1 A case of nails; 3.1.2 Writing a thesis; 3.1.3 Mathematics; 3.1.4 A gas; 3.1.5 An ant hill; 3.1.6 A car engine; 3.1.7 A cell as part of an organism ...)
Kolmogorov’s structure functions with an application to the foundations of model selection
 In Proc. 43rd Symposium on Foundations of Computer Science
, 2002
"... We vindicate, for the first time, the rightness of the original “structure function”, proposed by Kolmogorov in 1974, by showing that minimizing a twopart code consisting of a model subject to (Kolmogorov) complexity constraints, together with a datatomodel code, produces a model of best fit (for ..."
Abstract

Cited by 10 (0 self)
We vindicate, for the first time, the rightness of the original "structure function", proposed by Kolmogorov in 1974, by showing that minimizing a two-part code, consisting of a model subject to (Kolmogorov) complexity constraints together with a data-to-model code, produces a model of best fit (for which the data is maximally "typical"). The method thus separates all possible model information from the remaining accidental information. This result gives a foundation for MDL, and related methods, in model selection. Settlement of this long-standing question is the more remarkable since the minimal randomness deficiency function (measuring maximal "typicality") itself cannot be monotonically approximated, but the shortest two-part code can. We furthermore show that both the structure function and the minimal randomness deficiency function can assume all shapes over their full domain (improving an independent unpublished result of Levin on the former function from the early 70s, and extending a partial result of V'yugin on the latter function from the late 80s, as well as recent results on prediction loss measured by "snooping curves"). We give an explicit realization of optimal two-part codes at all levels of model complexity. We determine the (un)computability properties of the various functions considered and of the "algorithmic sufficient statistic." In our setting the models are finite sets, but the analysis is valid, up to logarithmic additive terms, for the model class of computable probability density functions, or the model class of total recursive functions.
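The fitness notion invoked here is usually quantified by the randomness deficiency (standard definitions, assumed rather than quoted):
\[ \delta(x \mid S) \;=\; \log |S| - K(x \mid S), \qquad \beta_x(\alpha) \;=\; \min_{S} \{\, \delta(x \mid S) \;:\; x \in S,\; K(S) \le \alpha \,\}, \]
so the result states that the two-part-code minimizer at each complexity level $\alpha$ also achieves $\beta_x(\alpha)$, i.e., maximal typicality, up to logarithmic additive terms.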
PAGODA: A Model for Autonomous Learning in Probabilistic Domains
, 1992
"... as a testbed for designing intelligent agents. The system consists of an overall agent architecture and five components within the architecture. The five components are: 1. GoalDirected Learning (GDL), a decisiontheoretic method for selecting learning goals. 2. Probabilistic Bias Evaluation (PBE) ..."
Abstract

Cited by 6 (2 self)
as a testbed for designing intelligent agents. The system consists of an overall agent architecture and five components within the architecture. The five components are:
1. Goal-Directed Learning (GDL), a decision-theoretic method for selecting learning goals.
2. Probabilistic Bias Evaluation (PBE), a technique for using probabilistic background knowledge to select learning biases for the learning goals.
3. Uniquely Predictive Theories (UPTs) and Probability Computation using Independence (PCI), a probabilistic representation and Bayesian inference method for the agent's theories.
4. A probabilistic learning component, consisting of a heuristic search algorithm and a Bayesian method for evaluating proposed theories.
5. A decision-theoretic probabilistic planner, which searches through the probability space defined by the agent's current theory to select the best action.
PAGODA is given as input an initial planning goal (its ove...
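As a rough formalization of component 5 (an illustrative expected-utility sketch; the symbols $A$, $s$, $T$, and $U$ are assumptions for exposition, not PAGODA's published notation), the planner selects
\[ a^{*} \;=\; \arg\max_{a \in A} \sum_{s} P(s \mid a, T)\, U(s), \]
the action maximizing expected utility over outcomes $s$ predicted by the agent's current theory $T$.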
Sophistication Revisited
 Proceedings of the 30th International Colloquium on Automata, Languages and Programming
, 2003
"... The Kolmogorov structure function divides the smallest program producing a string in two parts: the useful information present in the string, called sophistication if based on total functions, and the remaining accidental information. We revisit the notion of sophistication due to Koppel, formal ..."
Abstract

Cited by 5 (4 self)
The Kolmogorov structure function divides the smallest program producing a string into two parts: the useful information present in the string, called sophistication if based on total functions, and the remaining accidental information. We revisit the notion of sophistication due to Koppel, formalize a connection between sophistication and a variation of computational depth (intuitively, the useful or nonrandom information in a string), prove the existence of strings with maximum sophistication, and show that they encode solutions of the halting problem, i.e., they are the deepest of all strings.
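In the formalization usually attributed to Koppel (notation assumed, not quoted from this paper), the sophistication of a string $x$ at significance level $c$ is
\[ \mathrm{soph}_c(x) \;=\; \min \{\, |p| \;:\; p \text{ total},\; U(p, d) = x \text{ for some } d,\; |p| + |d| \le K(x) + c \,\}, \]
where $U$ is a universal machine: $|p|$ captures the useful (structural) information and $|d|$ the remaining accidental information.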
Towards an Algorithmic Statistics (Extended Abstract)
"... ) Peter G'acs ? , John Tromp, and Paul Vit'anyi ?? Abstract. While Kolmogorov complexity is the accepted absolute measure of information content of an individual finite object, a similarly absolute notion is needed for the relation between an individual data sample and an individual model su ..."
Abstract
Peter Gács, John Tromp, and Paul Vitányi. Abstract: While Kolmogorov complexity is the accepted absolute measure of the information content of an individual finite object, a similarly absolute notion is needed for the relation between an individual data sample and an individual model summarizing the information in the data, for example, a finite set from which the data sample typically came. The statistical theory based on such relations between individual objects can be called algorithmic statistics, in contrast to ordinary statistical theory, which deals with relations between probabilistic ensembles. We develop a new algorithmic theory of the typical statistic, sufficient statistic, and minimal sufficient statistic. 1 Introduction. We take statistical theory to ideally consider the following problem: given a data sample and a family of models (hypotheses), one wants to select the model that produced the data. But a priori it is possible that the data is atypical for the...
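The typicality notion underlying this program is commonly made precise as follows (a standard definition, assumed here rather than quoted): $x$ is $\beta$-typical for a finite set $S \ni x$ if
\[ K(x \mid S) \;\ge\; \log |S| - \beta, \]
i.e., $x$ carries no significant regularity beyond its membership in $S$.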
Sophisticated Infinite Sequences
"... Abstract. In this paper we revisit the notion of sophistication for infinite sequences. Koppel defined sophistication of an object as the length of the shortest (finite) total program (p) that with some (finite or infinite) data (d) produce it and p  + d  is smaller than the shortest description ..."
Abstract
Abstract. In this paper we revisit the notion of sophistication for infinite sequences. Koppel defined the sophistication of an object as the length of the shortest (finite) total program p that, together with some (finite or infinite) data d, produces it, where |p| + |d| is smaller than the shortest description of the object plus a constant. However, the notion of "description of infinite sequences" is not appropriately defined. In this work, we propose a new definition of sophistication for infinite sequences as the limit of the ratio between the sophistication of the initial segments and their length. As the main results, we prove that highly sophisticated sequences are dense when sophistication is defined with lim sup, and that the set of sequences with sophistication equal to zero is also dense when we consider the definition with lim inf. We also prove that, similarly to what happens for finite strings, sophistication and depth for infinite sequences are distinct complexity measures.
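The proposed limit-ratio definition can be written out as follows (a sketch; $\omega_{1:n}$ denotes the length-$n$ initial segment and $\mathrm{soph}_c$ the finite-string sophistication):
\[ \underline{\mathrm{Soph}}(\omega) \;=\; \liminf_{n\to\infty} \frac{\mathrm{soph}_c(\omega_{1:n})}{n}, \qquad \overline{\mathrm{Soph}}(\omega) \;=\; \limsup_{n\to\infty} \frac{\mathrm{soph}_c(\omega_{1:n})}{n}, \]
the lim inf and lim sup variants corresponding to the two density results stated in the abstract.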
unknown title
, 2002
"... Abstract. The information in an individual finite object (like a binary string) is commonly measured by its Kolmogorov complexity. One can divide that information into two parts: the information accounting for the useful regularity present in the object and the information accounting for the remaini ..."
Abstract
Abstract. The information in an individual finite object (like a binary string) is commonly measured by its Kolmogorov complexity. One can divide that information into two parts: the information accounting for the useful regularity present in the object and the information accounting for the remaining accidental information. There can be several ways (model classes) in which the regularity is expressed. Kolmogorov proposed the model class of finite sets, generalized later to computable probability mass functions. The resulting theory, known as algorithmic statistics, analyzes the algorithmic sufficient statistic when the statistic is restricted to the given model class. However, the most general way to proceed is perhaps to express the useful information as a recursive function. The resulting measure has been called the "sophistication" of the object. We develop the theory of the recursive-function statistic: the maximum and minimum value, the existence of absolutely nonstochastic objects (objects that have maximal sophistication: all the information in them is meaningful and there is no residual randomness), its relation to the more restricted model classes of finite sets and computable probability distributions (in particular with respect to the algorithmic (Kolmogorov) minimal sufficient statistic), the relation to the halting problem, and further algorithmic properties.
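With total recursive functions as models (notation assumed, mirroring the finite-set case above), the two-part description of $x$ consists of a total program $p$ and data $d$ with $p(d) = x$, and the function statistic minimizes $|p|$ subject to
\[ |p| + |d| \;\le\; K(x) + c, \]
so $|d|$ plays the role for function models that $\log |S|$ plays for finite-set models.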