Results 1  10
of
36
The Similarity Metric
 IEEE TRANSACTIONS ON INFORMATION THEORY
, 2003
"... A new class of distances appropriate for measuring similarity relations between sequences, say one type of similarity per distance, is studied. We propose a new "normalized information distance", based on the noncomputable notion of Kolmogorov complexity, and show that it is in this class and it min ..."
Abstract

Cited by 186 (26 self)
 Add to MetaCart
A new class of distances appropriate for measuring similarity relations between sequences, say one type of similarity per distance, is studied. We propose a new "normalized information distance", based on the noncomputable notion of Kolmogorov complexity, and show that it is in this class and it minorizes every computable distance in the class (that is, it is universal in that it discovers all computable similarities). We demonstrate that it is a metric and call it the similarity metric. This theory forms the foundation for a new practical tool. To evidence generality and robustness we give two distinctive applications in widely divergent areas using standard compression programs like gzip and GenCompress. First, we compare whole mitochondrial genomes and infer their evolutionary history. This results in a first completely automatic computed whole mitochondrial phylogeny tree. Secondly, we fully automatically compute the language tree of 52 different languages.
A tutorial introduction to the minimum description length principle
 in Advances in Minimum Description Length: Theory and Applications. 2005
"... ..."
Simplicity: A unifying principle in cognitive science?
 Trends in Cognitive Sciences
, 2003
"... This article reviews research exploring the idea that simplicity does, indeed, drive a wide range of cognitive processes. We outline mathematical theory, computational results, and empirical data underpinning this viewpoint. Key words: simplicity, Kolmogorov complexity, codes, learning, induction, B ..."
Abstract

Cited by 48 (2 self)
 Add to MetaCart
This article reviews research exploring the idea that simplicity does, indeed, drive a wide range of cognitive processes. We outline mathematical theory, computational results, and empirical data underpinning this viewpoint. Key words: simplicity, Kolmogorov complexity, codes, learning, induction, Bayesian inference 30word summary:This article outlines the proposal that many aspects of cognition, from perception, to language acquisition, to highlevel cognition involve finding patterns that provide the simplest explanation of available data. 3 The cognitive system finds patterns in the data that it receives. Perception involves finding patterns in the external world, from sensory input. Language acquisition involves finding patterns in linguistic input, to determine the structure of the language. Highlevel cognition involves finding patterns in information, to form categories, and to infer causal relations. Simplicity and the problem of induction A fundamental puzzle is what we term the problem of induction: infinitely many patterns are compatible with any finite set of data (see Box 1). So, for example, an infinity of curves pass through any finite set of points (Box 1a); an infinity of symbol sequences are compatible with any subsequence of symbols (Box 1b); infinitely many grammars are compatible with any finite set of observed sentences (Box 1c); and infinitely many perceptual organizations can fit any specific visual input (Box 1d). What principle allows the cognitive system to solve the problem of induction, and choose appropriately from these infinite sets of possibilities? Any such principle must meet two criteria: (i) it must solve the problem of induction successfully; (ii) it must explain empirical data in cognition. We argue that the best approach to (i)...
Kolmogorov’s structure functions and model selection
 IEEE Trans. Inform. Theory
"... approach to statistics and model selection. Let data be finite binary strings and models be finite sets of binary strings. Consider model classes consisting of models of given maximal (Kolmogorov) complexity. The “structure function ” of the given data expresses the relation between the complexity l ..."
Abstract

Cited by 32 (13 self)
 Add to MetaCart
approach to statistics and model selection. Let data be finite binary strings and models be finite sets of binary strings. Consider model classes consisting of models of given maximal (Kolmogorov) complexity. The “structure function ” of the given data expresses the relation between the complexity level constraint on a model class and the least logcardinality of a model in the class containing the data. We show that the structure function determines all stochastic properties of the data: for every constrained model class it determines the individual bestfitting model in the class irrespective of whether the “true ” model is in the model class considered or not. In this setting, this happens with certainty, rather than with high probability as is in the classical case. We precisely quantify the goodnessoffit of an individual model with respect to individual data. We show that—within the obvious constraints—every graph is realized by the structure function of some data. We determine the (un)computability properties of the various functions contemplated and of the “algorithmic minimal sufficient statistic.” Index Terms— constrained minimum description length (ML) constrained maximum likelihood (MDL) constrained bestfit model selection computability lossy compression minimal sufficient statistic nonprobabilistic statistics Kolmogorov complexity, Kolmogorov Structure function prediction sufficient statistic
Causal inference using the algorithmic Markov condition
, 2008
"... Inferring the causal structure that links n observables is usually basedupon detecting statistical dependences and choosing simple graphs that make the joint measure Markovian. Here we argue why causal inference is also possible when only single observations are present. We develop a theory how to g ..."
Abstract

Cited by 11 (11 self)
 Add to MetaCart
Inferring the causal structure that links n observables is usually basedupon detecting statistical dependences and choosing simple graphs that make the joint measure Markovian. Here we argue why causal inference is also possible when only single observations are present. We develop a theory how to generate causal graphs explaining similarities between single objects. To this end, we replace the notion of conditional stochastic independence in the causal Markov condition with the vanishing of conditional algorithmic mutual information anddescribe the corresponding causal inference rules. We explain why a consistent reformulation of causal inference in terms of algorithmic complexity implies a new inference principle that takes into account also the complexity of conditional probability densities, making it possible to select among Markov equivalent causal graphs. This insight provides a theoretical foundation of a heuristic principle proposed in earlier work. We also discuss how to replace Kolmogorov complexity with decidable complexity criteria. This can be seen as an algorithmic analog of replacing the empirically undecidable question of statistical independence with practical independence tests that are based on implicit or explicit assumptions on the underlying distribution. email:
Kolmogorov Complexity and Information Theory  With An Interpretation . . .
"... We compare the elementary theories of Shannon information and Kolmogorov complexity, the extent to which they have a common purpose, and where they are fundamentally different. We discuss and relate the basic notions of both theories: Shannon entropy, Kolmogorov complexity, Shannon mutual informati ..."
Abstract

Cited by 10 (2 self)
 Add to MetaCart
We compare the elementary theories of Shannon information and Kolmogorov complexity, the extent to which they have a common purpose, and where they are fundamentally different. We discuss and relate the basic notions of both theories: Shannon entropy, Kolmogorov complexity, Shannon mutual information and Kolmogorov (`algorithmic') mutual information. We explain how universal coding may be viewed as a middle ground between the two theories. We consider Shannon's rate distortion theory, which quantifies useful (in a certain sense) information. We use the communication of information as our guiding motif, and we explain how it relates to sequential questionanswer sessions.
Kolmogorov’s structure functions with an application to the foundations of model selection
 In Proc. 43rd Symposium on Foundations of Computer Science
, 2002
"... We vindicate, for the first time, the rightness of the original “structure function”, proposed by Kolmogorov in 1974, by showing that minimizing a twopart code consisting of a model subject to (Kolmogorov) complexity constraints, together with a datatomodel code, produces a model of best fit (for ..."
Abstract

Cited by 9 (0 self)
 Add to MetaCart
We vindicate, for the first time, the rightness of the original “structure function”, proposed by Kolmogorov in 1974, by showing that minimizing a twopart code consisting of a model subject to (Kolmogorov) complexity constraints, together with a datatomodel code, produces a model of best fit (for which the data is maximally “typical”). The method thus separates all possible model information from the remaining accidental information. This result gives a foundation for MDL, and related methods, in model selection. Settlement of this longstanding question is the more remarkable since the minimal randomness deficiency function (measuring maximal “typicality”) itself cannot be monotonically approximated, but the shortest twopart code can. We furthermore show that both the structure function and the minimum randomness deficiency function can assume all shapes over their full domain (improving an independent unpublished result of Levin on the former function of the early 70s, and extending a partial result of V’yugin on the latter function of the late 80s and also recent results on prediction loss measured by “snooping curves”). We give an explicit realization of optimal twopart codes at all levels of model complexity. We determine the (un)computability properties of the various functions and “algorithmic sufficient statistic ” considered. In our setting the models are finite sets, but the analysis is valid, up to logarithmic additive terms, for the model class of computable probability density functions, or the model class of total recursive functions. 1
Causal models as minimal descriptions of multivariate systems. http://parallel.vub.ac.be/∼jan
, 2006
"... ABSTRACT. By applying the minimality principle for model selection, one should seek the model that describes the data by a code of minimal length. Learning is viewed as data compression that exploits the regularities or qualitative properties found in the data, in order to build a model containing t ..."
Abstract

Cited by 8 (0 self)
 Add to MetaCart
ABSTRACT. By applying the minimality principle for model selection, one should seek the model that describes the data by a code of minimal length. Learning is viewed as data compression that exploits the regularities or qualitative properties found in the data, in order to build a model containing the meaningful information. The theory of causal modeling can be interpreted by this approach. The regularities are the conditional independencies reducing a factorization and the vstructure regularities. In the absence of other regularities, a causal model is faithful and offers a minimal description of a probability distribution. The causal interpretation of a faithful Bayesian network is motivated by the canonical representation it offers and faithfulness. A causal model decomposes the distribution into independent atomic blocks and is able to explain all qualitative properties found in the data. The existence of faithful models depends on the additional regularities in the data. Local structure of the conditional probability distributions allow further compression of the model. Interfering regularities, however, generate conditional independencies that do not follow from the Markov condition. These regularities has to be incorporated into an augmented model for which the inference algorithms are adapted to take into account their influences. But for other regularities, like patterns in a string, causality does not offer a modeling framework that leads to a minimal description. 1
Individual Communication Complexity
 In Proc. STACS, LNCS
, 2004
"... We initiate the theory of communication complexity of individual inputs held by the agents, rather than worstcase or averagecase. We consider total, partial, and partially correct protocols, oneway versus twoway, with and without help bits. ..."
Abstract

Cited by 7 (5 self)
 Add to MetaCart
We initiate the theory of communication complexity of individual inputs held by the agents, rather than worstcase or averagecase. We consider total, partial, and partially correct protocols, oneway versus twoway, with and without help bits.