Results 11–20 of 50
Applying MDL to Learning Best Model Granularity
, 1994
"... The Minimum Description Length (MDL) principle is solidly based on a provably ideal method of inference using Kolmogorov complexity. We test how the theory behaves in practice on a general problem in model selection: that of learning the best model granularity. The performance of a model depends ..."
Abstract

Cited by 20 (8 self)
The Minimum Description Length (MDL) principle is solidly based on a provably ideal method of inference using Kolmogorov complexity. We test how the theory behaves in practice on a general problem in model selection: that of learning the best model granularity. The performance of a model depends critically on the granularity, for example the choice of precision of the parameters. Too high precision generally involves modeling of accidental noise and too low precision may lead to confusion of models that should be distinguished. This precision is often determined ad hoc. In MDL the best model is the one that most compresses a two-part code of the data set: this embodies "Occam's Razor." In two quite different experimental settings the theoretical value determined using MDL coincides with the best value found experimentally. In the first experiment the task is to recognize isolated handwritten characters in one subject's handwriting, irrespective of size and orientation. Base...
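The granularity-selection idea is concrete enough to sketch. Below is a minimal, hypothetical illustration (not the paper's experimental setup): the single mean parameter of a unit-variance Gaussian is quantized at various bit precisions, and the two-part code length — bits to state the parameter plus bits to encode the data given the quantized parameter — is minimized over the precision.

```python
import math
import random

def two_part_code_length(data, mean_hat, bits):
    """Two-part MDL cost: bits to state the quantized mean, plus the
    code length of the data under a unit-variance Gaussian at that mean."""
    step = 2.0 ** (-bits)
    q_mean = round(mean_hat / step) * step   # the mean at the chosen precision
    model_cost = bits                        # crude cost of one parameter
    # -log2 likelihood of the data given the quantized mean (sigma = 1)
    data_cost = sum(0.5 * math.log2(2 * math.pi)
                    + ((x - q_mean) ** 2) / (2 * math.log(2)) for x in data)
    return model_cost + data_cost

random.seed(0)
data = [random.gauss(1.37, 1.0) for _ in range(200)]
mean_hat = sum(data) / len(data)

# Sweep precisions; MDL picks the one with the shortest total code.
costs = {b: two_part_code_length(data, mean_hat, b) for b in range(1, 16)}
best_bits = min(costs, key=costs.get)
```

Too many bits waste model cost on accidental noise in the sample mean; too few bits distort the parameter and inflate the data cost — the minimum sits in between.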
Discovering patterns to extract protein–protein interactions from the literature
 Part II. Bioinformatics
, 2005
"... doi:10.1093/bioinformatics/bti493 ..."
Application of Kolmogorov complexity and universal codes to identity testing and nonparametric testing of serial independence for time series
, 2006
"... ..."
General Loss Bounds for Universal Sequence Prediction
, 2001
"... The Bayesian framework is ideally suited for induction problems. The probability of observing $x_k$ at time $k$, given past observations $x_1...x_{k1}$ can be computed with Bayes' rule if the true distribution $\mu$ of the sequences $x_1x_2x_3...$ is known. The problem, however, is that in many cas ..."
Abstract

Cited by 14 (9 self)
The Bayesian framework is ideally suited for induction problems. The probability of observing $x_k$ at time $k$, given past observations $x_1...x_{k-1}$, can be computed with Bayes' rule if the true distribution $\mu$ of the sequences $x_1x_2x_3...$ is known. The problem, however, is that in many cases one does not even have a reasonable estimate of the true distribution. In order to overcome this problem a universal distribution $\xi$ is defined as a weighted sum of distributions $\mu_i\in M$, where $M$ is any countable set of distributions including $\mu$. This is a generalization of Solomonoff induction, in which $M$ is the set of all enumerable semimeasures. Systems which predict $y_k$, given $x_1...x_{k-1}$, and which receive loss $l_{x_k y_k}$ if $x_k$ is the true next symbol of the sequence are considered. It is proven that using the universal $\xi$ as a prior is nearly as good as using the unknown true distribution $\mu$. Furthermore, games of chance, defined as a sequence of bets, observations, and rewards, are studied. The time needed to reach the winning zone is estimated. Extensions to arbitrary alphabets, partial and delayed prediction, and more active systems are discussed.
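The mixture construction can be sketched in a few lines. Here a small finite grid of Bernoulli distributions stands in for the countable class $M$ (the grid and all names are illustrative, not from the paper): $\xi$ is the weighted sum of the class members, and Bayes' rule reweights them after each observation.

```python
import random

# Hypothesis class M: Bernoulli(theta) for a small grid of biases,
# with uniform prior weights (any positive weights summing to 1 work).
thetas = [i / 10 for i in range(1, 10)]
weights = [1.0 / len(thetas)] * len(thetas)

def xi_predict(weights):
    """Mixture probability that the next bit is 1: xi(1|x_<k) = sum_i w_i * theta_i."""
    return sum(w * t for w, t in zip(weights, thetas))

def bayes_update(weights, bit):
    """Posterior reweighting after observing one bit (Bayes' rule)."""
    posterior = [w * (t if bit == 1 else 1 - t) for w, t in zip(weights, thetas)]
    z = sum(posterior)
    return [p / z for p in posterior]

random.seed(1)
true_theta = 0.7      # the unknown mu generating the sequence
for _ in range(500):
    bit = 1 if random.random() < true_theta else 0
    weights = bayes_update(weights, bit)

# After many observations the weight concentrates near the true bias,
# so predicting with xi is nearly as good as predicting with mu.
prediction = xi_predict(weights)
```

The posterior weight of the hypothesis closest to $\mu$ grows exponentially relative to the others, which is the mechanism behind the loss bounds in the paper.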
Model Selection by Normalized Maximum Likelihood
, 2005
"... The Minimum Description Length (MDL) principle is an information theoretic approach to inductive inference that originated in algorithmic coding theory. In this approach, data are viewed as codes to be compressed by the model. From this perspective, models are compared on their ability to compress a ..."
Abstract

Cited by 12 (3 self)
The Minimum Description Length (MDL) principle is an information-theoretic approach to inductive inference that originated in algorithmic coding theory. In this approach, data are viewed as codes to be compressed by the model. From this perspective, models are compared on their ability to compress a data set by extracting useful information in the data apart from random noise. The goal of model selection is to identify the model, from a set of candidate models, that permits the shortest description length (code) of the data. Since Rissanen originally formalized the problem using the crude ‘two-part code’ MDL method in the 1970s, many significant strides have been made, especially in the 1990s, with the culmination of the development of the refined ‘universal code’ MDL method, dubbed Normalized Maximum Likelihood (NML). It represents an elegant solution to the model selection problem. The present paper provides a tutorial review on these latest developments with a special focus on NML. An application example of NML in cognitive modeling is also provided.
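For a concrete toy instance of NML, the Bernoulli model class admits a tractable normalizer by grouping the $2^n$ binary sequences by their number of ones. This is a standard textbook computation, sketched here as an illustration rather than the paper's own example:

```python
from math import comb, log2

def bernoulli_ml(k, n):
    """Maximized likelihood of a binary sequence with k ones out of n."""
    if k == 0 or k == n:
        return 1.0
    p = k / n
    return (p ** k) * ((1 - p) ** (n - k))

def nml_code_length(k, n):
    """NML code length in bits:
    -log2 [ P(x | theta_hat(x)) / sum_y P(y | theta_hat(y)) ].
    The normalizer sums the maximized likelihood over all 2^n sequences,
    grouped by their count of ones via binomial coefficients."""
    normalizer = sum(comb(n, j) * bernoulli_ml(j, n) for j in range(n + 1))
    return -log2(bernoulli_ml(k, n)) + log2(normalizer)
```

The second term, the log of the normalizer, is the parametric complexity: a fixed penalty the model pays for its flexibility, independent of the observed data. A maximally regular sequence (all ones, `k = n`) gets a much shorter code than a maximally mixed one (`k = n/2`).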
Compact genetic codes as a search strategy of evolutionary processes
 In Foundations of Genetic Algorithms 8 (FOGA VIII), LNCS
, 2005
"... Abstract. The choice of genetic representation crucially determines the capability of evolutionary processes to find complex solutions in which many variables interact. The question is how good genetic representations can be found and how they can be adapted online to account for what can be learned ..."
Abstract

Cited by 11 (2 self)
Abstract. The choice of genetic representation crucially determines the capability of evolutionary processes to find complex solutions in which many variables interact. The question is how good genetic representations can be found and how they can be adapted online to account for what can be learned about the structure of the problem from previous samples. We address these questions in a scenario that we term indirect Estimation-of-Distribution: We consider a decorrelated search distribution (mutational variability) on a variable-length genotype space. A one-to-one encoding onto the phenotype space then needs to induce an adapted phenotypic variability incorporating the dependencies between phenotypic variables that have been observed successful previously. Formalizing this in the framework of Estimation-of-Distribution Algorithms, an adapted phenotypic variability can be characterized as minimizing the Kullback-Leibler divergence to a population of previously selected individuals (parents). Our core result is a relation between the Kullback-Leibler divergence and the description length of the encoding in the specific scenario, stating that compact codes provide a way to minimize this divergence. A proposed class of Compression Evolutionary Algorithms and preliminary experiments with an L-system compression scheme illustrate the approach. We also discuss the implications for the self-adaptive evolution of genetic representations on the basis of neutrality (σ-evolution) towards compact codes.
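The divergence–code-length relation rests on a standard identity: the expected code length when encoding samples from the parent distribution p with a code optimal for a search distribution q equals H(p) + D(p‖q). A minimal numerical check (the distributions here are made up for illustration, not taken from the paper):

```python
from math import log2

def entropy(p):
    """Shannon entropy H(p) in bits."""
    return -sum(pi * log2(pi) for pi in p if pi > 0)

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) in bits."""
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def expected_code_length(p, q):
    """Expected bits when encoding samples from p with a code optimal for q:
    H(p, q) = H(p) + D(p || q).  Minimizing description length over q
    therefore minimizes the divergence to the parent distribution p."""
    return -sum(pi * log2(qi) for pi, qi in zip(p, q) if pi > 0)

parents = [0.5, 0.25, 0.25]   # distribution of previously selected individuals
search  = [0.4, 0.4, 0.2]     # candidate search distribution / code
```

Since H(p) is fixed by the parents, a representation whose induced code compresses them well (small expected code length) is exactly one whose search distribution has small divergence to them.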
Unsupervised Lexical Learning as Inductive Inference
, 2000
"... To learn a language, the learners must first learn its words, the essential building blocks for utterances. The difficulty in learning words lies in the unavailability of explicit word boundaries in speech input. The learners have to infer lexical items with some innately endowed learning mechanism( ..."
Abstract

Cited by 10 (5 self)
To learn a language, learners must first learn its words, the essential building blocks for utterances. The difficulty in learning words lies in the unavailability of explicit word boundaries in speech input. The learners have to infer lexical items with some innately endowed learning mechanism(s) for regularity detection: regularities in the speech normally indicate word patterns. With respect to Zipf's least-effort principle and Chomsky's thoughts on the minimality of grammar for human language, we hypothesise a cognitive mechanism underlying language learning that seeks the least-effort representation for input data. Accordingly, lexical learning is to infer the minimal-cost representation for the input under the constraint of permissible representations for lexical items. The main theme of this thesis is to examine how far this learning mechanism can go in unsupervised lexical learning from real language data without any predefined (e.g., prosodic and phonotactic) cues, but entirely resting on statistical induction of structural patterns for the most economic representation of the data. We first review
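The least-effort idea can be illustrated with a crude two-part cost (the 27-symbol alphabet cost and the toy corpus are assumptions for illustration, not the thesis's actual coding scheme): a word-level lexicon pays for spelling out its entries, but then encodes the unsegmented corpus compactly as word tokens under their empirical frequencies.

```python
from math import log2
from collections import Counter

def description_length(lexicon, segmentation):
    """Two-part MDL cost of a segmentation: bits to spell out the lexicon
    (each character at log2(27) bits, a crude stand-in for a real code)
    plus bits to encode the corpus as lexicon tokens under their
    empirical frequencies."""
    lexicon_cost = sum(len(w) for w in lexicon) * log2(27)
    counts = Counter(segmentation)
    n = len(segmentation)
    corpus_cost = -sum(c * log2(c / n) for c in counts.values())
    return lexicon_cost + corpus_cost

corpus = "thedogsawthedog"   # speech input: no explicit word boundaries
# Hypothesis A: a word lexicon induced from the recurring patterns.
words = ["the", "dog", "saw"]
seg_words = ["the", "dog", "saw", "the", "dog"]
# Hypothesis B: no structure -- every character is its own "word".
chars = sorted(set(corpus))
seg_chars = list(corpus)

dl_words = description_length(words, seg_words)
dl_chars = description_length(chars, seg_chars)
```

Because "the" and "dog" recur, the word-level hypothesis yields the shorter total description, which is exactly the signal an MDL-driven learner uses to posit word boundaries.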
Compact representations as a search strategy: compression EDAs
 Theoretical Computer Science
"... The choice of representation crucially determines the capability of search processes to find complex solutions in which many variables interact. The question is how good representations can be found and how they can be adapted online to account for what can be learned about the structure of the prob ..."
Abstract

Cited by 8 (2 self)
The choice of representation crucially determines the capability of search processes to find complex solutions in which many variables interact. The question is how good representations can be found and how they can be adapted online to account for what can be learned about the structure of the problem from previous samples. We address these questions in a scenario that we term indirect Estimation-of-Distribution: We consider a decorrelated search distribution (mutational variability) on a variable-length genotype space. A one-to-one encoding onto the phenotype space then needs to induce an adapted phenotypic search distribution incorporating the dependencies between phenotypic variables that have been observed successful previously. Formalizing this in the framework of Estimation-of-Distribution Algorithms, an adapted phenotypic search distribution can be characterized as minimizing the Kullback-Leibler divergence to a population of previously selected samples (parents). The paper derives a relation between this Kullback-Leibler divergence and the description length of the encoding, stating that compact representations provide a way to minimize the divergence. A proposed class of Compression Evolutionary Algorithms and experiments with a grammar-based compression scheme illustrate the new concept. Key words: Estimation-of-Distribution Algorithms, factorial representations, compression, minimal description length, Evolutionary Algorithms, genotype-phenotype mapping.
On the Existence and Convergence of Computable Universal Priors
 In Proc. 14th International Conf. on Algorithmic Learning Theory (ALT2003), volume 2842 of LNAI
, 2003
"... Solomonoff unified Occam's razor and Epicurus' principle of multiple explanations to one elegant, formal, universal theory of inductive inference, which initiated the field of algorithmic information theory. His central result is that the posterior of his universal semimeasure M converges rapidly to ..."
Abstract

Cited by 7 (7 self)
Solomonoff unified Occam's razor and Epicurus' principle of multiple explanations into one elegant, formal, universal theory of inductive inference, which initiated the field of algorithmic information theory. His central result is that the posterior of his universal semimeasure M converges rapidly to the true sequence-generating posterior μ, if the latter is computable. Hence, M is eligible as a universal predictor in case of unknown μ. We investigate the existence and convergence of computable universal (semi)measures for a hierarchy of computability classes: finitely computable, estimable, enumerable, and approximable. For instance, M is known...