## Algorithmic Statistics (2001)

### Download Links

- [www.cwi.nl]
- [arxiv.org]
- [homepages.cwi.nl]
- DBLP

### Other Repositories/Bibliography

Venue: IEEE Transactions on Information Theory

Citations: 50 (13 self)

### BibTeX

```bibtex
@ARTICLE{Gács01algorithmicstatistics,
  author  = {Péter Gács and John T. Tromp and Paul M.B. Vitányi},
  title   = {Algorithmic Statistics},
  journal = {IEEE Transactions on Information Theory},
  year    = {2001},
  volume  = {47},
  pages   = {2443--2463}
}
```

### Abstract

While Kolmogorov complexity is the accepted absolute measure of information content of an individual finite object, a similarly absolute notion is needed for the relation between an individual data sample and an individual model summarizing the information in the data, for example, a finite set (or probability distribution) where the data sample typically came from. The statistical theory based on such relations between individual objects can be called algorithmic statistics, in contrast to classical statistical theory that deals with relations between probabilistic ensembles. We develop the algorithmic theory of statistic, sufficient statistic, and minimal sufficient statistic. This theory is based on two-part codes consisting of the code for the statistic (the model summarizing the regularity, the meaningful information, in the data) and the model-to-data code. In contrast to the situation in probabilistic statistical theory, the algorithmic relation of (minimal) sufficiency is an absolute relation between the individual model and the individual data sample. We distinguish implicit and explicit descriptions of the models. We give characterizations of algorithmic (Kolmogorov) minimal sufficient statistic for all data samples for both description modes (in the explicit mode under some constraints). We also strengthen and elaborate earlier results on the "Kolmogorov structure function" and "absolutely non-stochastic objects", those rare objects for which the simplest models that summarize their relevant information (minimal sufficient statistics) are at least as complex as the objects themselves. We demonstrate a close relation between the probabilistic notions and the algorithmic ones: (i) in both cases there is an "information non-increase" law; (ii) it is shown that a function is a...
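To make the two-part-code idea from the abstract concrete, here is a minimal Python sketch (not from the paper): a finite-set model S is described first, and the data is then indexed among the members of S using log₂|S| bits. The model class, the bit budget for describing S, and the sample string are all invented for illustration; the true model cost K(S) is uncomputable, so a hand-chosen stand-in is used.

```python
import math
from itertools import combinations

def two_part_code_length(data, model, model_description_bits):
    """Two-part code length: bits to describe the model S, plus
    log2|S| bits to pick out the data among the members of S.
    `model_description_bits` stands in for the (uncomputable)
    complexity K(S) of the model and is supplied by hand here."""
    assert data in model, "a model must contain the data it summarizes"
    return model_description_bits + math.log2(len(model))

# Toy model class: all 8-bit strings containing exactly two 1s.
n = 8
model = set()
for ones in combinations(range(n), 2):
    bits = ["0"] * n
    for i in ones:
        bits[i] = "1"
    model.add("".join(bits))

data = "01000100"  # a "typical" member of the model
# Assume describing "8-bit strings with two 1s" costs 10 bits (made up).
total = two_part_code_length(data, model, model_description_bits=10)
print(f"|S| = {len(model)}, two-part code length = {total:.2f} bits")
```

A good model keeps the total small: taking S = {data} makes the index part zero but inflates the model part to the complexity of the data itself, which mirrors the sufficiency trade-off the abstract describes.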

### Citations

9231 |
Elements of Information Theory
- Cover, Thomas
- 1990
Citation Context: ...an (implicitly or explicitly described) shortest program P∗ for P, a shortest binary program computing x (that is, of length K(x | P∗)) cannot be significantly shorter than its Shannon–Fano code [5] of length −log P(x), that is, K(x | P∗) >+ −log P(x) (an inequality up to an additive constant). By definition, we fix some agreed upon constant β ≥ 0, and require K(x | P∗) ≥ −log P(x) − β. As before, we will not indicate the depende...
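The Shannon–Fano code length −log P(x) quoted in this excerpt is easy to compute for a toy distribution; the distribution below is made up for illustration and is unrelated to the paper's P.

```python
import math

def shannon_fano_length(p_x):
    """Ideal code length, in bits, for an outcome with probability p_x."""
    return -math.log2(p_x)

# Hypothetical distribution over four outcomes.
P = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
lengths = {x: shannon_fano_length(p) for x, p in P.items()}
print(lengths)  # {'a': 1.0, 'b': 2.0, 'c': 3.0, 'd': 3.0}
avg = sum(p * lengths[x] for x, p in P.items())
print(avg)  # 1.75 -- the entropy of P, in bits
```

More probable outcomes get shorter codewords, which is why a shortest program for x given the model cannot beat −log P(x) by much.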

1782 | An Introduction to Kolmogorov Complexity and Its Applications, 2nd edn
- Li, Vitányi
- 1997
Citation Context: ...of the halting problem for Turing machines—for example, if K(K(y) | y) = O(1) for all y, then the halting problem can be shown to be decidable. This is known to be false. It is customary, [14], [7], [10], to write explicitly “K(x | y)” and “K(x | y, K(y))”. Even though the difference between these two quantities is not very large, these small differences do matter in the sequel. In fact, not only the...

561 | Three approaches to the quantitative definition of information - Kolmogorov - 1965

427 |
A formal theory of inductive inference
- Solomonoff
- 1964
Citation Context: ...Historically, the idea of assigning to each object a probability consisting of the summed negative exponentials of the lengths of all programs computing the object, was first proposed by Solomonoff [19]. Then, the shorter programs contribute more probability than the longer ones. His aim, ultimately successful in terms of theory (see [10]) and as inspiration for developing applied versions [2], was ...

340 | A theory of program size formally identical to information theory
- Chaitin
- 1975
Citation Context: ...ram to compute x if y is furnished as an auxiliary input to the computation. This conditional definition requires a warning since different authors use the same notation but mean different things. In [3] the author writes “K(x | y)” to actually mean “K(x | y, K(y)),” notationally hiding the intended supplementary auxiliary information “K(y).” This abuse of notation has the additional handicap that no...

339 |
The definition of random sequences
- Martin-Löf
- 1966
Citation Context: ...code (based on the model Sd = {d}) that is as concise as the shortest single code. The description of d given S∗ cannot be significantly shorter than log |S|. By the theory of Martin-Löf randomness [16] this means that d is a “typical” element of S. (Footnote: it is also called the Kolmogorov sufficient statistic.) In general there can be many algorithmic sufficient statistics for data d; a shortest among the...

315 | The minimum description length principle in coding and modeling
- Barron, Rissanen, et al.
- 1998
Citation Context: ...noff [19]. Then, the shorter programs contribute more probability than the longer ones. His aim, ultimately successful in terms of theory (see [10]) and as inspiration for developing applied versions [2], was to develop a general prediction method. Kolmogorov [11] introduced the complexity proper. The prefix-version of Kolmogorov complexity used in this paper was introduced in [14] and also treated l...

273 |
On the mathematical foundations of theoretical statistics
- Fisher
- 1922
Citation Context: ...tcome of n coin tosses and T(D) = s then Pr(d | T(D) = s) = 1/(n choose s) and Pr(d | T(D) ≠ s) = 0. This can be shown to imply (I.2) and therefore T is a sufficient statistic for Θ. According to Fisher [6]: “The statistic chosen should summarise the whole of the relevant information supplied by the sample. This may be called the Criterion of Sufficiency . . . In the case of the normal curve of distribu...
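The sufficiency claim in this excerpt, that for n coin tosses with the number of heads as statistic Pr(d | T(D) = s) = 1/(n choose s) regardless of the coin's bias, can be checked numerically. A small sketch (the bias values and outcome strings are arbitrary):

```python
from math import comb

def conditional_prob(d, s, theta):
    """Pr(D = d | T(D) = s) for i.i.d. coin tosses with bias theta,
    where T counts the heads ('1' characters) in the outcome string."""
    n = len(d)
    if d.count("1") != s:
        return 0.0  # Pr(d | T(D) = s) = 0 when d has a different head count
    # P(d) = theta^s (1-theta)^(n-s) for any particular d with s heads;
    # P(T = s) = C(n, s) * theta^s (1-theta)^(n-s); the ratio 1/C(n, s)
    # does not involve theta -- which is exactly what sufficiency means.
    p_d = theta**s * (1 - theta) ** (n - s)
    p_T = comb(n, s) * p_d
    return p_d / p_T

# The conditional distribution is the same for every bias:
for theta in (0.3, 0.5, 0.9):
    assert abs(conditional_prob("0110", 2, theta) - 1 / comb(4, 2)) < 1e-12
```

Because the conditional distribution given T is independent of the parameter, the count of heads carries all the information the sample has about the bias, which is Fisher's Criterion of Sufficiency in miniature.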

101 |
Laws of information conservation (non-growth) and aspects of the foundation of probability theory
- Levin
- 1974
Citation Context: ...ecidability of the halting problem for Turing machines—for example, if K(K(y) | y) = O(1) for all y, then the halting problem can be shown to be decidable. This is known to be false. It is customary, [14], [7], [10], to write explicitly “K(x | y)” and “K(x | y, K(y))”. Even though the difference between these two quantities is not very large, these small differences do matter in the sequel. In fact, n...

94 |
On the symmetry of algorithmic information
- Gács
- 1974
Citation Context: ...| y)” meaning that just “y” is given in the conditional. As it happens, “y, K(y)” represents more information than just “y”. For example, K(K(y) | y) can be almost as large as log K(y) by a result in [7]: for l(y) = n it has an upper bound of log n for all y, and for some y’s it has a lower bound of log n − log log n. In fact, this result quantifies the undecidability of the halting problem for Turin...

32 |
Minimum description length induction, Bayesianism, and Kolmogorov complexity
- Vitányi, Li
- 2000
Citation Context: ...ary strings. This paper is one of a triad of papers dealing with the best individual model for individual data: The present paper supplies the basic theoretical underpinning by way of two-part codes, [20] derives ideal versions of applied methods (MDL) inspired by the theory, and [9] treats experimental applications thereof. Probabilistic Statistics: In ordinary statistical theory one proceeds as foll...

28 |
Algorithms and randomness
- Kolmogorov, Uspensky
- 1987
Citation Context: ...his can be generalized to computable probability mass functions for which the data is “typical.” Related aspects of “randomness deficiency” (formally defined later in (IV.1)) were formulated in [12], [13] and studied in [17], [21]. Algorithmic mutual information, and the associated non-increase law, were studied in [14], [15]. Despite its evident epistemological prominence in the theory of hypothesis ...

21 | Applying MDL to learn best model granularity
- Gao, Li, et al.
- 2000
Citation Context: ...ual model for individual data: The present paper supplies the basic theoretical underpinning by way of two-part codes, [20] derives ideal versions of applied methods (MDL) inspired by the theory, and [9] treats experimental applications thereof. Probabilistic Statistics: In ordinary statistical theory one proceeds as follows, see for example [5]: Suppose two discrete random variables X, Y have a join...

17 |
Kolmogorov complexity, data compression, and inference
- Cover
- 1985
Citation Context: ...involves separating regularity (structure) in the data from random effects. In a restricted setting where the models are finite sets a way to proceed was suggested by Kolmogorov, attribution in [17], [4], [5]. Given data d, the goal is to identify the “most likely” finite set S of which d is a “typical” element. Finding a set of which the data is typical is reminiscent of selecting the appropriate ma...

17 | Algorithmic complexity and stochastic properties of finite binary sequences
- V’yugin
Citation Context: ...ts of the algorithmic sufficient statistic have been studied before, for example as related to the “Kolmogorov structure function” [17], [4], and “absolutely non-stochastic objects” [17], [21], [18], [22], notions also defined or suggested by Kolmogorov at the mentioned meeting. This work primarily studies quantification of the “non-sufficiency” of an algorithmic statistic, when the latter is restrict...

14 |
On the defect of randomness of a finite object with respect to measures with given complexity bounds
- V’yugin
Citation Context: ...Cover [4], [5] interpreted this approach as a (sufficient) statistic. The “statistic” of the data is expressed as a finite set of which the data is a “typical” member. Following Shen [17] (see also [21], [18], [20]), this can be generalized to computable probability mass functions for which the data is “typical.” Related aspects of “randomness deficiency” (formally defined later in (IV.1)) were form...

12 | Discussion on Kolmogorov complexity and statistical analysis
- Shen
Citation Context: ...interpretation. One can also consider notions of near-typical and near-optimal that arise from replacing the β in (III.1) by some slowly growing functions, such as O(log l(x)) or O(log k) as in [17], [18]. In [17], [21], a function of k and x is defined as the lack of typicality of x in sets of complexity at most k, and they then consider the minimum k for which this function becomes =+ 0 (equal to 0 up to an additive constant) or very sma...

9 |
The concept of (α, β)-stochasticity in the Kolmogorov sense, and its properties
- Shen
Citation Context: ...this involves separating regularity (structure) in the data from random effects. In a restricted setting where the models are finite sets a way to proceed was suggested by Kolmogorov, attribution in [17], [4], [5]. Given data d, the goal is to identify the “most likely” finite set S of which d is a “typical” element. Finding a set of which the data is typical is reminiscent of selecting the appropria...

6 |
On logical foundations of probability theory
- Kolmogorov
- 1983
Citation Context: ...0]), this can be generalized to computable probability mass functions for which the data is “typical.” Related aspects of “randomness deficiency” (formally defined later in (IV.1)) were formulated in [12], [13] and studied in [17], [21]. Algorithmic mutual information, and the associated non-increase law, were studied in [14], [15]. Despite its evident epistemological prominence in the theory of hypot...


3 | Randomness conservation inequalities; information and independence in mathematical theories - Levin - 1984
