Results 1  10
of
77
The Similarity Metric
 IEEE TRANSACTIONS ON INFORMATION THEORY
, 2003
"... A new class of distances appropriate for measuring similarity relations between sequences, say one type of similarity per distance, is studied. We propose a new "normalized information distance", based on the noncomputable notion of Kolmogorov complexity, and show that it is in this class and it min ..."
Abstract

Cited by 186 (26 self)
 Add to MetaCart
A new class of distances appropriate for measuring similarity relations between sequences, say one type of similarity per distance, is studied. We propose a new "normalized information distance", based on the noncomputable notion of Kolmogorov complexity, and show that it is in this class and it minorizes every computable distance in the class (that is, it is universal in that it discovers all computable similarities). We demonstrate that it is a metric and call it the similarity metric. This theory forms the foundation for a new practical tool. To evidence generality and robustness we give two distinctive applications in widely divergent areas using standard compression programs like gzip and GenCompress. First, we compare whole mitochondrial genomes and infer their evolutionary history. This results in a first completely automatic computed whole mitochondrial phylogeny tree. Secondly, we fully automatically compute the language tree of 52 different languages.
Minimum Description Length Induction, Bayesianism, and Kolmogorov Complexity
 IEEE Transactions on Information Theory
, 1998
"... The relationship between the Bayesian approach and the minimum description length approach is established. We sharpen and clarify the general modeling principles MDL and MML, abstracted as the ideal MDL principle and defined from Bayes's rule by means of Kolmogorov complexity. The basic condition un ..."
Abstract

Cited by 66 (7 self)
 Add to MetaCart
The relationship between the Bayesian approach and the minimum description length approach is established. We sharpen and clarify the general modeling principles MDL and MML, abstracted as the ideal MDL principle and defined from Bayes's rule by means of Kolmogorov complexity. The basic condition under which the ideal principle should be applied is encapsulated as the Fundamental Inequality, which in broad terms states that the principle is valid when the data are random, relative to every contemplated hypothesis and also these hypotheses are random relative to the (universal) prior. Basically, the ideal principle states that the prior probability associated with the hypothesis should be given by the algorithmic universal probability, and the sum of the log universal probability of the model plus the log of the probability of the data given the model should be minimized. If we restrict the model class to the finite sets then application of the ideal principle turns into Kolmogorov's mi...
Algorithmic Statistics
 IEEE Transactions on Information Theory
, 2001
"... While Kolmogorov complexity is the accepted absolute measure of information content of an individual finite object, a similarly absolute notion is needed for the relation between an individual data sample and an individual model summarizing the information in the data, for example, a finite set (or ..."
Abstract

Cited by 49 (13 self)
 Add to MetaCart
While Kolmogorov complexity is the accepted absolute measure of information content of an individual finite object, a similarly absolute notion is needed for the relation between an individual data sample and an individual model summarizing the information in the data, for example, a finite set (or probability distribution) where the data sample typically came from. The statistical theory based on such relations between individual objects can be called algorithmic statistics, in contrast to classical statistical theory that deals with relations between probabilistic ensembles. We develop the algorithmic theory of statistic, sufficient statistic, and minimal sufficient statistic. This theory is based on twopart codes consisting of the code for the statistic (the model summarizing the regularity, the meaningful information, in the data) and the modeltodata code. In contrast to the situation in probabilistic statistical theory, the algorithmic relation of (minimal) sufficiency is an absolute relation between the individual model and the individual data sample. We distinguish implicit and explicit descriptions of the models. We give characterizations of algorithmic (Kolmogorov) minimal sufficient statistic for all data samples for both description modes in the explicit mode under some constraints. We also strengthen and elaborate earlier results on the "Kolmogorov structure function" and "absolutely nonstochastic objects" those rare objects for which the simplest models that summarize their relevant information (minimal sucient statistics) are at least as complex as the objects themselves. We demonstrate a close relation between the probabilistic notions and the algorithmic ones: (i) in both cases there is an "information nonincrease" law; (ii) it is shown that a function is a...
Discovering Neural Nets With Low Kolmogorov Complexity And High Generalization Capability
 Neural Networks
, 1997
"... Many neural net learning algorithms aim at finding "simple" nets to explain training data. The expectation is: the "simpler" the networks, the better the generalization on test data (! Occam's razor). Previous implementations, however, use measures for "simplicity" that lack the power, universali ..."
Abstract

Cited by 49 (30 self)
 Add to MetaCart
Many neural net learning algorithms aim at finding "simple" nets to explain training data. The expectation is: the "simpler" the networks, the better the generalization on test data (! Occam's razor). Previous implementations, however, use measures for "simplicity" that lack the power, universality and elegance of those based on Kolmogorov complexity and Solomonoff's algorithmic probability. Likewise, most previous approaches (especially those of the "Bayesian" kind) suffer from the problem of choosing appropriate priors. This paper addresses both issues. It first reviews some basic concepts of algorithmic complexity theory relevant to machine learning, and how the SolomonoffLevin distribution (or universal prior) deals with the prior problem. The universal prior leads to a probabilistic method for finding "algorithmically simple" problem solutions with high generalization capability. The method is based on Levin complexity (a timebounded generalization of Kolmogorov comple...
Logical Depth and Physical Complexity
 THE UNIVERSAL TURING MACHINE: A HALFCENTURY SURVEY
, 1988
"... Some mathematical and natural objects (a random sequence, a sequence of zeros, a perfect crystal, a gas) are intuitively trivial, while others (e.g. the human body, the digits of #) contain internal evidence of a nontrivial causal history. We formalize this ..."
Abstract

Cited by 48 (0 self)
 Add to MetaCart
Some mathematical and natural objects (a random sequence, a sequence of zeros, a perfect crystal, a gas) are intuitively trivial, while others (e.g. the human body, the digits of #) contain internal evidence of a nontrivial causal history. We formalize this
Hierarchies Of Generalized Kolmogorov Complexities And Nonenumerable Universal Measures Computable In The Limit
 INTERNATIONAL JOURNAL OF FOUNDATIONS OF COMPUTER SCIENCE
, 2000
"... The traditional theory of Kolmogorov complexity and algorithmic probability focuses on monotone Turing machines with oneway writeonly output tape. This naturally leads to the universal enumerable SolomonoLevin measure. Here we introduce more general, nonenumerable but cumulatively enumerable m ..."
Abstract

Cited by 38 (20 self)
 Add to MetaCart
The traditional theory of Kolmogorov complexity and algorithmic probability focuses on monotone Turing machines with oneway writeonly output tape. This naturally leads to the universal enumerable SolomonoLevin measure. Here we introduce more general, nonenumerable but cumulatively enumerable measures (CEMs) derived from Turing machines with lexicographically nondecreasing output and random input, and even more general approximable measures and distributions computable in the limit. We obtain a natural hierarchy of generalizations of algorithmic probability and Kolmogorov complexity, suggesting that the "true" information content of some (possibly in nite) bitstring x is the size of the shortest nonhalting program that converges to x and nothing but x on a Turing machine that can edit its previous outputs. Among other things we show that there are objects computable in the limit yet more random than Chaitin's "number of wisdom" Omega, that any approximable measure of x is small for any x lacking a short description, that there is no universal approximable distribution, that there is a universal CEM, and that any nonenumerable CEM of x is small for any x lacking a short enumerating program. We briey mention consequences for universes sampled from such priors.
Discovering Solutions with Low Kolmogorov Complexity and High Generalization Capability
, 1995
"... Many machine learning algorithms aim at finding "simple" rules to explain training data. The expectation is: the "simpler" the rules, the better the generalization on test data ( Occam's razor). Most practi cal implementations, however, use measures for "simplicity" that lack the power, univ ..."
Abstract

Cited by 37 (25 self)
 Add to MetaCart
Many machine learning algorithms aim at finding "simple" rules to explain training data. The expectation is: the "simpler" the rules, the better the generalization on test data ( Occam's razor). Most practi cal implementations, however, use measures for "simplicity" that lack the power, universality and elegance of those based on Kolmogorov complexity and Solomonoff's algorithmic probability. Likewise, most pre vious approaches (especially those of the "Bayesian" kind) suffer from the problem of choosing appropriate priors. This paper ad dresses both issues. It first reviews some ba sic concepts of algorithmic complexity theory relevant to machine learning, and how the SolomonoffLevin distribution (or universal prior) deals with the prior problem. The uni versal prior leads to a probabilistic method for finding "algorithmically simple" problem solutions with high generalization capability.
Information Distance
, 1997
"... While Kolmogorov complexity is the accepted absolute measure of information content in an individual finite object, a similarly absolute notion is needed for the information distance between two individual objects, for example, two pictures. We give several natural definitions of a universal inf ..."
Abstract

Cited by 36 (4 self)
 Add to MetaCart
While Kolmogorov complexity is the accepted absolute measure of information content in an individual finite object, a similarly absolute notion is needed for the information distance between two individual objects, for example, two pictures. We give several natural definitions of a universal information metric, based on length of shortest programs for either ordinary computations or reversible (dissipationless) computations. It turns out that these definitions are equivalent up to an additive logarithmic term. We show that the information distance is a universal cognitive similarity distance. We investigate the maximal correlation of the shortest programs involved, the maximal uncorrelation of programs (a generalization of the SlepianWolf theorem of classical information theory), and the density properties of the discrete metric spaces induced by the information distances. A related distance measures the amount of nonreversibility of a computation. Using the physical theo...
Uniform test of algorithmic randomness over a general space
 Theoretical Computer Science
, 2005
"... ABSTRACT. The algorithmic theory of randomness is well developed when the underlying space is the set of finite or infinite sequences and the underlying probability distribution is the uniform distribution or a computable distribution. These restrictions seem artificial. Some progress has been made ..."
Abstract

Cited by 35 (4 self)
 Add to MetaCart
ABSTRACT. The algorithmic theory of randomness is well developed when the underlying space is the set of finite or infinite sequences and the underlying probability distribution is the uniform distribution or a computable distribution. These restrictions seem artificial. Some progress has been made to extend the theory to arbitrary Bernoulli distributions (by MartinLöf), and to arbitrary distributions (by Levin). We recall the main ideas and problems of Levin’s theory, and report further progress in the same framework. The issues are the following: – Allow noncompact spaces (like the space of continuous functions, underlying the Brownian motion). – The uniform test (deficiency of randomness) dP (x) (depending both on the outcome x and the measure P) should be defined in a general and natural way. – See which of the old results survive: existence of universal tests, conservation of randomness, expression of tests in terms of description complexity, existence of a universal measure, expression of mutual information as ”deficiency of independence”. – The negative of the new randomness test is shown to be a generalization of complexity in continuous spaces; we show that the addition theorem survives. The paper’s main contribution is introducing an appropriate framework for studying these questions and related ones (like statistics for a general family of distributions). 1.
Averagecase computational complexity theory
 Complexity Theory Retrospective II
, 1997
"... ABSTRACT Being NPcomplete has been widely interpreted as being computationally intractable. But NPcompleteness is a worstcase concept. Some NPcomplete problems are \easy on average", but some may not be. How is one to know whether an NPcomplete problem is \di cult on average"? The the ..."
Abstract

Cited by 31 (2 self)
 Add to MetaCart
ABSTRACT Being NPcomplete has been widely interpreted as being computationally intractable. But NPcompleteness is a worstcase concept. Some NPcomplete problems are \easy on average", but some may not be. How is one to know whether an NPcomplete problem is \di cult on average"? The theory of averagecase computational complexity, initiated by Levin about ten years ago, is devoted to studying this problem. This paper is an attempt to provide an overview of the main ideas and results in this important new subarea of complexity theory. 1