Complexity Distortion Theory
- Proceedings 1997 IEEE International Symposium on Information Theory
, 1997
Cited by 16 (2 self)
We investigate the efficiency of lossy algorithmic representations of information and show that "Complexity Distortion" is asymptotically equivalent to Rate Distortion for stationary ergodic sources.
I. Introduction
The concept of efficiently representing information dates back to the late 1940s with the pioneering work of C. E. Shannon. Since then, compression has been a continuously growing research field with numerous fundamental contributions to communication system design and implementation. Several diverse techniques have emerged recently, including fractals, model-based coding, and sophisticated programmable designs that allow flexibility and extensibility via downloadable code [2]. Most of these techniques cannot be analyzed within traditional theoretical frameworks. We propose a new theory, called Complexity Distortion Theory, which uses complexities or length of lossy descriptions to provide a much broader and unifying perspective on media representation. The key component of t...
General Loss Bounds for Universal Sequence Prediction
, 2001
Cited by 12 (8 self)
The Bayesian framework is ideally suited for induction problems. The probability of observing $x_k$ at time $k$, given past observations $x_1...x_{k-1}$ can be computed with Bayes' rule if the true distribution $\mu$ of the sequences $x_1x_2x_3...$ is known. The problem, however, is that in many cases one does not even have a reasonable estimate of the true distribution. In order to overcome this problem a universal distribution $\xi$ is defined as a weighted sum of distributions $\mu_i\in M$, where $M$ is any countable set of distributions including $\mu$. This is a generalization of Solomonoff induction, in which $M$ is the set of all enumerable semi-measures. Systems which predict $y_k$, given $x_1...x_{k-1}$ and which receive loss $l_{x_k y_k}$ if $x_k$ is the true next symbol of the sequence are considered. It is proven that using the universal $\xi$ as a prior is nearly as good as using the unknown true distribution $\mu$. Furthermore, games of chance, defined as a sequence of bets, observations, and rewards are studied. The time needed to reach the winning zone is estimated. Extensions to arbitrary alphabets, partial and delayed prediction, and more active systems are discussed.
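The mixture construction described in this abstract can be illustrated with a minimal sketch. It assumes a small finite class of Bernoulli models in place of the countable class $M$; the names `mixture_predict` and `best_action`, the parameter values, and the example data are hypothetical and are not taken from the paper.

```python
def mixture_predict(thetas, weights, bits):
    """Update mixture weights by Bayes' rule and predict the next bit.

    thetas  -- P(bit = 1) under each candidate model (assumed class M)
    weights -- prior weights w_i, positive and summing to 1
    bits    -- observed binary sequence x_1 ... x_{k-1}
    Returns (posterior weights, xi-probability that the next bit is 1).
    """
    posterior = list(weights)
    for b in bits:
        # Reweight each model by the probability it assigned to the observed bit.
        posterior = [w * (t if b == 1 else 1.0 - t)
                     for w, t in zip(posterior, thetas)]
        total = sum(posterior)
        posterior = [w / total for w in posterior]
    # xi's predictive probability is the posterior-weighted average.
    p_one = sum(w * t for w, t in zip(posterior, thetas))
    return posterior, p_one


def best_action(p_one, loss):
    """Pick the action y minimizing expected loss; loss[x][y] is the loss
    for action y when the true next symbol is x."""
    probs = [1.0 - p_one, p_one]
    expected = [sum(probs[x] * loss[x][y] for x in range(2))
                for y in range(len(loss[0]))]
    return min(range(len(expected)), key=expected.__getitem__)


# Illustrative data only: three candidate models with a uniform prior.
thetas = [0.1, 0.5, 0.9]
weights = [1 / 3, 1 / 3, 1 / 3]
bits = [1, 1, 0, 1, 1, 1, 1, 0, 1, 1]
posterior, p_one = mixture_predict(thetas, weights, bits)
action = best_action(p_one, [[0, 1], [1, 0]])  # 0/1 loss
print(posterior, p_one, action)
```

The sketch only shows the mechanics of predicting with $\xi$ and choosing a loss-minimizing action; it does not reproduce the loss bounds proven in the paper.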
Discovering patterns to extract protein–protein interactions from the literature
- Part II. Bioinformatics
, 2005
doi:10.1093/bioinformatics/bti493
Compact genetic codes as a search strategy of evolutionary processes
- In Foundations of Genetic Algorithms 8 (FOGA VIII), LNCS
, 2005
Cited by 10 (2 self)
The choice of genetic representation crucially determines the capability of evolutionary processes to find complex solutions in which many variables interact. The question is how good genetic representations can be found and how they can be adapted online to account for what can be learned about the structure of the problem from previous samples. We address these questions in a scenario that we term indirect Estimation-of-Distribution: We consider a decorrelated search distribution (mutational variability) on a variable length genotype space. A one-to-one encoding onto the phenotype space then needs to induce an adapted phenotypic variability incorporating the dependencies between phenotypic variables that have previously been observed to be successful. Formalizing this in the framework of Estimation-of-Distribution Algorithms, an adapted phenotypic variability can be characterized as minimizing the Kullback-Leibler divergence to a population of previously selected individuals (parents). Our core result is a relation between the Kullback-Leibler divergence and the description length of the encoding in the specific scenario, stating that compact codes provide a way to minimize this divergence. A proposed class of Compression Evolutionary Algorithms and preliminary experiments with an L-system compression scheme illustrate the approach. We also discuss the implications for the self-adaptive evolution of genetic representations on the basis of neutrality (σ-evolution) towards compact codes.
Unsupervised Lexical Learning as Inductive Inference
, 2000
Cited by 8 (4 self)
To learn a language, learners must first learn its words, the essential building blocks of utterances. The difficulty in learning words lies in the unavailability of explicit word boundaries in speech input. Learners have to infer lexical items with some innately endowed learning mechanism(s) for regularity detection, since regularities in the speech normally indicate word patterns. With respect to Zipf's least-effort principle and Chomsky's thoughts on the minimality of grammar for human language, we hypothesise a cognitive mechanism underlying language learning that seeks the least-effort representation for input data. Accordingly, lexical learning is to infer the minimal-cost representation for the input under the constraint of permissible representation for lexical items. The main theme of this thesis is to examine how far this learning mechanism can go in unsupervised lexical learning from real language data without any pre-defined (e.g., prosodic and phonotactic) cues, resting entirely on statistical induction of structural patterns for the most economical representation of the data. We first review
On the Existence and Convergence of Computable Universal Priors
- In Proc. 14th International Conf. on Algorithmic Learning Theory (ALT-2003), volume 2842 of LNAI
, 2003
Cited by 6 (6 self)
Solomonoff unified Occam's razor and Epicurus' principle of multiple explanations to one elegant, formal, universal theory of inductive inference, which initiated the field of algorithmic information theory. His central result is that the posterior of his universal semimeasure M converges rapidly to the true sequence generating posterior μ, if the latter is computable. Hence, M is eligible as a universal predictor in case of unknown μ. We investigate the existence and convergence of computable universal (semi)measures for a hierarchy of computability classes: finitely computable, estimable, enumerable, and approximable. For instance, M is known...
Model Selection by Normalized Maximum Likelihood
, 2005
Cited by 6 (1 self)
The Minimum Description Length (MDL) principle is an information theoretic approach to inductive inference that originated in algorithmic coding theory. In this approach, data are viewed as codes to be compressed by the model. From this perspective, models are compared on their ability to compress a data set by extracting useful information in the data apart from random noise. The goal of model selection is to identify the model, from a set of candidate models, that permits the shortest description length (code) of the data. Since Rissanen originally formalized the problem using the crude ‘two-part code’ MDL method in the 1970s, many significant strides have been made, especially in the 1990s, with the culmination of the development of the refined ‘universal code’ MDL method, dubbed Normalized Maximum Likelihood (NML). It represents an elegant solution to the model selection problem. The present paper provides a tutorial review on these latest developments with a special focus on NML. An application example of NML in cognitive modeling is also provided.
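As a small illustration of the NML idea summarized above, the following sketch computes the NML code length of a binary sequence under the Bernoulli model class, where the normalizer can be summed exactly. The Bernoulli setting and the function names are assumptions chosen for illustration; they are not the application example from the paper.

```python
import math


def max_likelihood(k, n):
    """Maximized Bernoulli likelihood of one sequence with k ones out of n."""
    if k == 0 or k == n:
        return 1.0  # the ML estimate puts all mass on the observed symbol
    p = k / n
    return p ** k * (1.0 - p) ** (n - k)


def nml_normalizer(n):
    """Sum of maximized likelihoods over all 2**n binary sequences,
    grouped by their number of ones."""
    return sum(math.comb(n, k) * max_likelihood(k, n) for k in range(n + 1))


def nml_code_length(k, n):
    """NML code length in bits: -log2 of (maximized likelihood / normalizer)."""
    return -math.log2(max_likelihood(k, n)) + math.log2(nml_normalizer(n))


# Compare against a fixed fair-coin model, which always costs n bits.
n = 10
for k in (5, 9):
    print(k, nml_code_length(k, n), float(n))
```

Under these assumptions the Bernoulli class yields a shorter code than the fair-coin model for a strongly biased sequence and a longer one for a balanced sequence, which is the kind of trade-off between fit and complexity that NML-based model selection weighs.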
Compact representations as a search strategy: compression edas
- Theoretical Computer Science
Cited by 6 (2 self)
The choice of representation crucially determines the capability of search processes to find complex solutions in which many variables interact. The question is how good representations can be found and how they can be adapted online to account for what can be learned about the structure of the problem from previous samples. We address these questions in a scenario that we term indirect Estimation-of-Distribution: We consider a decorrelated search distribution (mutational variability) on a variable length genotype space. A one-to-one encoding onto the phenotype space then needs to induce an adapted phenotypic search distribution incorporating the dependencies between phenotypic variables that have previously been observed to be successful. Formalizing this in the framework of Estimation-of-Distribution Algorithms, an adapted phenotypic search distribution can be characterized as minimizing the Kullback-Leibler divergence to a population of previously selected samples (parents). The paper derives a relation between this Kullback-Leibler divergence and the description length of the encoding, stating that compact representations provide a way to minimize the divergence. A proposed class of Compression Evolutionary Algorithms and experiments with a grammar-based compression scheme illustrate the new concept. Key words: Estimation-of-Distribution Algorithms, factorial representations, compression, minimal description length, Evolutionary Algorithms, genotype-phenotype mapping.
Optimality of Universal Bayesian Sequence Prediction for General Loss and Alphabet
- In
, 2002
Cited by 5 (1 self)
The Bayesian framework is ideally suited for induction problems. The probability of observing $x_t$ at time $t$, given past observations $x_1...x_{t-1}$ can be computed with Bayes' rule if the true generating distribution $\mu$ of the sequences $x_1x_2x_3...$ is known. The problem, however, is that in many cases one does not even have a reasonable guess of the true distribution. In order to overcome this problem a universal (or mixture) distribution $\xi$ is defined as a weighted sum or integral of distributions $\nu\!\in\!\M$, where $\M$ is any countable or continuous set of distributions including $\mu$. This is a generalization of Solomonoff induction, in which $\M$ is the set of all enumerable semi-measures. It is shown for several performance measures that using the universal $\xi$ as a prior is nearly as good as using the unknown true distribution $\mu$. In a sense, this solves the problem of the unknown prior in a universal way. All results are obtained for general finite alphabet. Convergence of $\xi$ to $\mu$ in a conditional mean squared sense and of $\xi/\mu\to 1$ with $\mu$ probability $1$ is proven. The number of additional errors $E_\xi$ made by the optimal universal prediction scheme based on $\xi$ minus the number of errors $E_\mu$ of the optimal informed prediction scheme based on $\mu$ is proven to be bounded by $O(\sqrt{E_\mu})$. The prediction framework is generalized to arbitrary loss functions. A system is allowed to take an action $y_t$, given $x_1...x_{t-1}$ and receives loss $\ell_{x_t y_t}$ if $x_t$ is the next symbol of the sequence. No assumptions on $\ell$ are necessary, besides boundedness. Optimal universal $\Lambda_\xi$ and optimal informed $\Lambda_\mu$ prediction schemes are defined and the total loss of $\Lambda_\xi$ is bounded in terms of the total loss of $\Lambda_\mu$, similar to the error bounds. We show that the bounds are tight and that no other predictor can lead to smaller bounds.
Furthermore, for various performance measures we show Pareto-optimality of $\xi$ in the sense that there is no other predictor which performs better or equal in all environments $\nu\in\M$ and strictly better in at least one. So, optimal predictors can (w.r.t.\ most performance measures in expectation) be based on the mixture $\xi$. Finally we give an Occam's razor argument that Solomonoff's choice $w_\nu\sim 2^{-K(\nu)}$ for the weights is optimal, where $K(\nu)$ is the length of the shortest program describing $\nu$. Furthermore, games of chance, defined as a sequence of bets, observations, and rewards are studied. The average profit achieved by the $\Lambda_\xi$ scheme rapidly converges to the best possible profit. The time needed to reach the winning zone is proportional to the relative entropy of $\mu$ and $\xi$. The prediction schemes presented here are compared to the weighted majority algorithm(s). Although the algorithms, the settings, and the proofs are quite different the bounds of both schemes have a very similar structure. Extensions to infinite alphabets, partial, delayed and probabilistic prediction, classification, and more active systems are briefly discussed.
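For orientation, the mixture and the relative entropy mentioned in this abstract can be written out as follows. This is a sketch for the countable case only; the notation $x_{1:n}$, $D_n$, and the use of $\mathcal{M}$ for $\M$ are assumptions made here, and the displayed inequality is the standard dominance argument rather than a bound quoted from the abstract.

```latex
% Mixture over a countable class with positive weights summing to at most one
\xi(x_{1:n}) \;=\; \sum_{\nu \in \mathcal{M}} w_\nu \, \nu(x_{1:n}),
\qquad w_\nu > 0, \quad \sum_{\nu \in \mathcal{M}} w_\nu \le 1 .

% Dominance: since \mu \in \mathcal{M}, dropping all other terms gives
\xi(x_{1:n}) \;\ge\; w_\mu \, \mu(x_{1:n}) .

% Hence the relative entropy of \mu and \xi is bounded for every n
D_n(\mu \,\|\, \xi)
\;=\; \sum_{x_{1:n}} \mu(x_{1:n}) \ln \frac{\mu(x_{1:n})}{\xi(x_{1:n})}
\;\le\; \ln w_\mu^{-1} .
```

With weights of the form $w_\nu \sim 2^{-K(\nu)}$, the right-hand side grows linearly in the description length $K(\mu)$ of the true distribution.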
Strong asymptotic assertions for discrete MDL in regression and classification
- In Benelearn 2005 (Ann. Machine Learning Conf. of Belgium and the
, 2005
Cited by 5 (2 self)
We study the properties of the MDL (or maximum penalized complexity) estimator for Regression and Classification, where the underlying model class is countable. We show in particular a finite bound on the Hellinger losses under the only assumption that there is a “true” model contained in the class. This implies almost sure convergence of the predictive distribution to the true one at a fast rate. It corresponds to Solomonoff’s central theorem of universal induction, however with a bound that is exponentially larger.

