Discussion on Kolmogorov complexity and statistical analysis (0)

by Shen
Venue: The Computer Journal

Results 1 - 6 of 6

Algorithmic Statistics

by Péter Gács, John T. Tromp, Paul M.B. Vitányi - IEEE Transactions on Information Theory, 2001
Abstract - Cited by 41 (8 self)
While Kolmogorov complexity is the accepted absolute measure of information content of an individual finite object, a similarly absolute notion is needed for the relation between an individual data sample and an individual model summarizing the information in the data, for example, a finite set (or probability distribution) where the data sample typically came from. The statistical theory based on such relations between individual objects can be called algorithmic statistics, in contrast to classical statistical theory that deals with relations between probabilistic ensembles. We develop the algorithmic theory of statistic, sufficient statistic, and minimal sufficient statistic. This theory is based on two-part codes consisting of the code for the statistic (the model summarizing the regularity, the meaningful information, in the data) and the model-to-data code. In contrast to the situation in probabilistic statistical theory, the algorithmic relation of (minimal) sufficiency is an absolute relation between the individual model and the individual data sample. We distinguish implicit and explicit descriptions of the models. We give characterizations of algorithmic (Kolmogorov) minimal sufficient statistic for all data samples for both description modes (in the explicit mode, under some constraints). We also strengthen and elaborate earlier results on the "Kolmogorov structure function" and "absolutely non-stochastic objects," those rare objects for which the simplest models that summarize their relevant information (minimal sufficient statistics) are at least as complex as the objects themselves. We demonstrate a close relation between the probabilistic notions and the algorithmic ones: (i) in both cases there is an "information non-increase" law; (ii) it is shown that a function is a...
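The two-part code mentioned in this abstract can be written out as a worked inequality. The following is a sketch in commonly used notation, not a quotation from the paper: it assumes prefix Kolmogorov complexity K, a finite set S as the model, and an explicit description of S; the symbols x and S are chosen here for illustration.

```latex
\begin{align*}
  % Describe S first, then the index of x within S:
  K(x) &\le K(S) + \log_2 |S| + O(1), && x \in S, \\
  % S is called an algorithmic sufficient statistic for x when the bound is tight:
  K(S) + \log_2 |S| &= K(x) + O(1).
\end{align*}
```

Under these assumptions, a minimal sufficient statistic is a sufficient statistic S for which K(S) is as small as possible.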

Kolmogorov's Structure Functions and Model Selection

by Nikolai Vereshchagin, Paul Vitányi - IEEE Trans. Inform. Theory, 2003
Abstract - Cited by 17 (7 self)
In 1974 Kolmogorov proposed a non-probabilistic approach to statistics, an individual combinatorial relation between the data and its model, expressed by the so-called "structure function" of the data. We show that the structure function determines all stochastic properties of the data in the sense of determining the best-fitting model at every model-complexity level. A consequence is this: minimizing the data-to-model code length (finding the ML estimator or MDL estimator), in a class of contemplated models of prescribed maximal (Kolmogorov) complexity, always results in a model of best fit, irrespective of whether the source producing the data is in the model class considered. In this setting, code minimization always separates optimal model information from the remaining accidental information, and not only with high probability. The function that maps the maximal allowed model complexity to the goodness-of-fit (expressed as minimal "randomness deficiency") of the best model cannot itself be monotonically approximated. However, the shortest one-part or two-part code above can -- implicitly optimizing this elusive goodness-of-fit. We show that -- within the obvious constraints -- every graph is realized by the structure function of some data. We determine the (un)computability properties of the various functions contemplated and of the "algorithmic minimal sufficient statistic."
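For readers who want the functions named in this abstract spelled out, one common way to write them is sketched below. The symbols h, \beta, \lambda, \delta and the complexity bound \alpha are notational assumptions made here; the abstract itself only names the concepts.

```latex
\begin{align*}
  % Structure function: log-size of the smallest model of complexity at most alpha
  h_x(\alpha) &= \min_{S} \{ \log_2 |S| : x \in S,\ K(S) \le \alpha \}, \\
  % Randomness deficiency of x in S: how far x is from a typical element of S
  \delta(x \mid S) &= \log_2 |S| - K(x \mid S), \\
  % Best achievable fit at model-complexity level alpha
  \beta_x(\alpha) &= \min_{S} \{ \delta(x \mid S) : x \in S,\ K(S) \le \alpha \}, \\
  % Shortest two-part code at model-complexity level alpha
  \lambda_x(\alpha) &= \min_{S} \{ K(S) + \log_2 |S| : x \in S,\ K(S) \le \alpha \}.
\end{align*}
```

In this notation, the abstract's claim is that a model attaining the minimum in the code-length functions also attains, up to small additive terms, the minimum deficiency \beta_x(\alpha).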

Kolmogorov's Structure Functions with an Application to the Foundations of Model Selection

by Nikolai Vereshchagin, Paul Vitányi - In Proceedings of the 43rd Annual Symposium on Foundations of Computer Science. IEEE Computer Society, 2002
Abstract - Cited by 9 (0 self)
In 1974 Kolmogorov proposed a non-probabilistic approach to statistics, an individual combinatorial relation between the data and its model. We vindicate, for the first time, the rightness of the original "structure function", proposed by Kolmogorov: minimizing the data-to-model code length (finding the ML estimator or MDL estimator), in a class of contemplated models of prescribed maximal (Kolmogorov) complexity, always results in a model of best fit (expressed as minimal randomness deficiency). We show that both the structure function and the minimum randomness deficiency function can assume all shapes over their full domain (improving an old result of L.A. Levin and both an old and a recent one of V.V. Vyugin). We determine the (un)computability properties of the various functions and "algorithmic sufficient statistic."

Towards an Algorithmic Statistics (Extended Abstract)

by Peter Gács, John Tromp, Paul Vitányi
Abstract
While Kolmogorov complexity is the accepted absolute measure of information content of an individual finite object, a similarly absolute notion is needed for the relation between an individual data sample and an individual model summarizing the information in the data, for example, a finite set where the data sample typically came from. The statistical theory based on such relations between individual objects can be called algorithmic statistics, in contrast to ordinary statistical theory that deals with relations between probabilistic ensembles. We develop a new algorithmic theory of typical statistic, sufficient statistic, and minimal sufficient statistic.
1 Introduction
We take statistical theory to ideally consider the following problem: Given a data sample and a family of models (hypotheses), one wants to select the model that produced the data. But a priori it is possible that the data is atypical for the...

Kolmogorov Complexity and Model Selection

by Nikolay Vereshchagin
Abstract
The goal of statistics is to provide explanations (models) of observed data. We are given some data and have to infer a plausible probabilistic hypothesis explaining it. Consider, for example, the following scenario. We are given a “black box”. We have turned the box on (only once) and it has produced a sequence x of a million bits. Given x, we have to infer a hypothesis about the black box. Classical mathematical statistics does not study this question. It considers only the case when we are given the results of many independent tests of the box. However, in real life, there are experiments that cannot be repeated. In some such cases common sense does provide a reasonable explanation of x. Here are three examples: (1) The black box has printed a million zeros. In this case we would probably say that the box is able to produce only zeros. (2) The box has produced a sequence without any regularities. In this case we would say that the box produces a million independent random bits. (3) The first half of the sequence consists of zeros and the second half has no regularities. In this case we would say that the box produces 500000 zeros and then 500000 independent random bits. Let us try to understand the mechanism of such common-sense reasoning. First, we can observe that in each of the three cases we have inferred a finite set A including x. In the first case, A consists of x only. In the second case, A consists of all sequences of length one million. In the third case, the set includes all sequences whose first half consists of only zeros. Second, in all three cases the set A can be described in a small number of bits. That is, A has low Kolmogorov complexity. Third, all regularities present in x are shared by all other elements of A. That is, x is a “typical element of A”.
It seems that common-sense reasoning works as follows: given a string x of n bits we find a finite set A of strings of length n containing x such that (1) A has low Kolmogorov complexity (we are interested in simple explanations)
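The three black-box examples can be imitated numerically with a rough sketch. Kolmogorov complexity is uncomputable, so the code below makes loud assumptions that are not in the text: zlib output length stands in as an upper-bound proxy for the cost of describing x itself, a doubled bit-length stands in for the cost of describing n, and only the three candidate sets A from the examples are considered. All function names are hypothetical.

```python
import random
import zlib


def describe_int_bits(n: int) -> int:
    """Assumed cost, in bits, of a self-delimiting description of the integer n."""
    return 2 * max(1, n.bit_length())


def compressed_bits(data: bytes) -> int:
    """zlib output size in bits: a computable stand-in (assumption) for K(x)."""
    return 8 * len(zlib.compress(data, 9))


def candidate_models(x: bytes):
    """Yield (name, model_cost_bits, log2_size_bits) for each candidate set A containing x."""
    n_bits = 8 * len(x)
    half = len(x) // 2
    # A = {x}: describing A means describing x itself; log2|A| = 0.
    yield "singleton", compressed_bits(x), 0
    # A = all strings of length n: only n must be described; log2|A| = n.
    yield "all_strings", describe_int_bits(n_bits), n_bits
    # A = strings of length n whose first half is all zeros (only if x is in A).
    if not any(x[:half]):
        yield "zero_prefix", describe_int_bits(n_bits), n_bits - 8 * half


def best_model(x: bytes) -> str:
    """Return the candidate with the smallest proxied two-part length K(A) + log2|A|."""
    return min(candidate_models(x), key=lambda m: m[1] + m[2])[0]


def demo(num_bytes: int = 1000, seed: int = 0) -> dict:
    """Pick a model for an all-zero, a random, and a half-zero/half-random string."""
    rng = random.Random(seed)
    noise = bytes(rng.getrandbits(8) for _ in range(num_bytes))
    tail = bytes(rng.getrandbits(8) for _ in range(num_bytes // 2))
    samples = {
        "zeros": bytes(num_bytes),
        "noise": noise,
        "mixed": bytes(num_bytes // 2) + tail,
    }
    return {name: best_model(x) for name, x in samples.items()}
```

Calling `demo()` returns, for each sample string, the name of the candidate set with the shortest proxied two-part description. The sketch only ranks a fixed, hand-picked list of sets; it does not search over all simple sets, and it does not check typicality directly.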

Algorithmic Minimal Sufficient Statistics: a New Definition

by Nikolay Vereshchagin, 2010
Abstract
We express some criticism about the definition of an algorithmic sufficient statistic and, in particular, of an algorithmic minimal sufficient statistic. We propose another definition, which has better properties.