Results 11  20
of
79
Application of Kolmogorov complexity and universal codes to identity testing and nonparametric testing of serial independence for time series
, 2006
"... ..."
Model Selection by Normalized Maximum Likelihood
, 2005
"... The Minimum Description Length (MDL) principle is an information theoretic approach to inductive inference that originated in algorithmic coding theory. In this approach, data are viewed as codes to be compressed by the model. From this perspective, models are compared on their ability to compress a ..."
Abstract

Cited by 24 (9 self)
 Add to MetaCart
The Minimum Description Length (MDL) principle is an information theoretic approach to inductive inference that originated in algorithmic coding theory. In this approach, data are viewed as codes to be compressed by the model. From this perspective, models are compared on their ability to compress a data set by extracting useful information in the data apart from random noise. The goal of model selection is to identify the model, from a set of candidate models, that permits the shortest description length (code) of the data. Since Rissanen originally formalized the problem using the crude ‘twopart code ’ MDL method in the 1970s, many significant strides have been made, especially in the 1990s, with the culmination of the development of the refined ‘universal code’ MDL method, dubbed Normalized Maximum Likelihood (NML). It represents an elegant solution to the model selection problem. The present paper provides a tutorial review on these latest developments with a special focus on NML. An application example of NML in cognitive modeling is also provided.
Quantum Kolmogorov complexity based on classical descriptions
 IEEE Trans. Inform. Theory
, 2001
"... Abstract—We develop a theory of the algorithmic information in bits contained in an individual pure quantum state. This extends classical Kolmogorov complexity to the quantum domain retaining classical descriptions. Quantum Kolmogorov complexity coincides with the classical Kolmogorov complexity on ..."
Abstract

Cited by 24 (1 self)
 Add to MetaCart
(Show Context)
Abstract—We develop a theory of the algorithmic information in bits contained in an individual pure quantum state. This extends classical Kolmogorov complexity to the quantum domain retaining classical descriptions. Quantum Kolmogorov complexity coincides with the classical Kolmogorov complexity on the classical domain. Quantum Kolmogorov complexity is upper bounded and can be effectively approximated from above under certain conditions. With high probability a quantum object is incompressible. Upper and lower bounds of the quantum complexity of multiple copies of individual pure quantum states are derived and may shed some light on the nocloning properties of quantum states. In the quantum situation complexity is not subadditive. We discuss some relations with “nocloning ” and “approximate cloning ” properties. Keywords — Algorithmic information theory, quantum; classical descriptions of quantum states; information theory, quantum; Kolmogorov complexity, quantum; quantum cloning. I.
Meaningful Information
, 2006
"... The information in an individual finite object (like a binary string) is commonly measured by its Kolmogorov complexity. One can divide that information into two parts: the information accounting for the useful regularity present in the object and the information accounting for the remaining accide ..."
Abstract

Cited by 23 (0 self)
 Add to MetaCart
The information in an individual finite object (like a binary string) is commonly measured by its Kolmogorov complexity. One can divide that information into two parts: the information accounting for the useful regularity present in the object and the information accounting for the remaining accidental information. There can be several ways (model classes) in which the regularity is expressed. Kolmogorov has proposed the model class of finite sets, generalized later to computable probability mass functions. The resulting theory, known as Algorithmic Statistics, analyzes the algorithmic sufficient statistic when the statistic is restricted to the given model class. However, the most general way to proceed is perhaps to express the useful information as a total recursive function. The resulting measure has been called the “sophistication ” of the object. We develop the theory of recursive functions statistic, the maximum and minimum value, the existence of absolutely nonstochastic objects (that have maximal sophistication—all the information in them is meaningful and there is no residual randomness), determine its relation with the more restricted model classes of finite sets, and computable probability distributions, in particular with respect to the algorithmic (Kolmogorov) minimal sufficient statistic, the relation to the halting problem and further algorithmic properties.
Convergence and Error Bounds for Universal Prediction of Nonbinary Sequences
 Proceedings of the 12th Eurpean Conference on Machine Learning (ECML2001
, 2001
"... Solomonoff's uncomputable universal prediction scheme ß allows to predict the next symbol x k of a sequence x 1 ...x k1 for any Turing computable, but otherwise unknown, probabilistic environment µ . This scheme will be generalized to arbitrary environmental classes, which, ..."
Abstract

Cited by 22 (16 self)
 Add to MetaCart
Solomonoff's uncomputable universal prediction scheme &szlig; allows to predict the next symbol x k of a sequence x 1 ...x k1 for any Turing computable, but otherwise unknown, probabilistic environment &micro; . This scheme will be generalized to arbitrary environmental classes, which, among others, allows the construction of computable universal prediction schemes &szlig; . Convergence of &szlig; to &micro; in a conditional mean squared sense and with &micro; probability 1 is proven. It is shown that the average number of prediction errors made by the universal &szlig; scheme rapidly converges to those made by the best possible informed &micro; scheme. The schemes, theorems and proofs are given for general finite alphabet, which results in additional complications as compared to the binary case. Several extensions of the presented theory and results are outlined. They include general loss functions and bounds, games of chance, infinite alphabet, partial and delayed prediction, classification, and more active systems.
Applying MDL to Learning Best Model Granularity
, 1994
"... The Minimum Description Length (MDL) principle is solidly based on a provably ideal method of inference using Kolmogorov complexity. We test how the theory behaves in practice on a general problem in model selection: that of learning the best model granularity. The performance of a model depends ..."
Abstract

Cited by 21 (8 self)
 Add to MetaCart
The Minimum Description Length (MDL) principle is solidly based on a provably ideal method of inference using Kolmogorov complexity. We test how the theory behaves in practice on a general problem in model selection: that of learning the best model granularity. The performance of a model depends critically on the granularity, for example the choice of precision of the parameters. Too high precision generally involves modeling of accidental noise and too low precision may lead to confusion of models that should be distinguished. This precision is often determined ad hoc. In MDL the best model is the one that most compresses a twopart code of the data set: this embodies "Occam's Razor." In two quite different experimental settings the theoretical value determined using MDL coincides with the best value found experimentally. In the first experiment the task is to recognize isolated handwritten characters in one subject's handwriting, irrespective of size and orientation. Base...
Convergence and Loss Bounds for Bayesian Sequence Prediction
 In
, 2003
"... The probability of observing $x_t$ at time $t$, given past observations $x_1...x_{t1}$ can be computed with Bayes rule if the true generating distribution $\mu$ of the sequences $x_1x_2x_3...$ is known. If $\mu$ is unknown, but known to belong to a class $M$ one can base ones prediction on the Baye ..."
Abstract

Cited by 19 (19 self)
 Add to MetaCart
(Show Context)
The probability of observing $x_t$ at time $t$, given past observations $x_1...x_{t1}$ can be computed with Bayes rule if the true generating distribution $\mu$ of the sequences $x_1x_2x_3...$ is known. If $\mu$ is unknown, but known to belong to a class $M$ one can base ones prediction on the Bayes mix $\xi$ defined as a weighted sum of distributions $ u\in M$. Various convergence results of the mixture posterior $\xi_t$ to the true posterior $\mu_t$ are presented. In particular a new (elementary) derivation of the convergence $\xi_t/\mu_t\to 1$ is provided, which additionally gives the rate of convergence. A general sequence predictor is allowed to choose an action $y_t$ based on $x_1...x_{t1}$ and receives loss $\ell_{x_t y_t}$ if $x_t$ is the next symbol of the sequence. No assumptions are made on the structure of $\ell$ (apart from being bounded) and $M$. The Bayesoptimal prediction scheme $\Lambda_\xi$ based on mixture $\xi$ and the Bayesoptimal informed prediction scheme $\Lambda_\mu$ are defined and the total loss $L_\xi$ of $\Lambda_\xi$ is bounded in terms of the total loss $L_\mu$ of $\Lambda_\mu$. It is shown that $L_\xi$ is bounded for bounded $L_\mu$ and $L_\xi/L_\mu\to 1$ for $L_\mu\to \infty$. Convergence of the instantaneous losses is also proven.
A tutorial introduction to Bayesian models of cognitive development
"... We present an introduction to Bayesian inference as it is used in probabilistic models of cognitive development. Our goal is to provide an intuitive and accessible guide to the what, the how, and the why of the Bayesian approach: what sorts of problems and data the framework is most relevant for, an ..."
Abstract

Cited by 17 (1 self)
 Add to MetaCart
(Show Context)
We present an introduction to Bayesian inference as it is used in probabilistic models of cognitive development. Our goal is to provide an intuitive and accessible guide to the what, the how, and the why of the Bayesian approach: what sorts of problems and data the framework is most relevant for, and how and why it may be useful for developmentalists. We emphasize a qualitative understanding of Bayesian inference, but also include information about additional resources for those interested in the cognitive science applications, mathematical foundations, or machine learning details in more depth. In addition, we discuss some important interpretation issues that often arise when evaluating Bayesian models in cognitive science.
General Loss Bounds for Universal Sequence Prediction
, 2001
"... The Bayesian framework is ideally suited for induction problems. The probability of observing $x_k$ at time $k$, given past observations $x_1...x_{k1}$ can be computed with Bayes' rule if the true distribution $\mu$ of the sequences $x_1x_2x_3...$ is known. The problem, however, is that in man ..."
Abstract

Cited by 14 (9 self)
 Add to MetaCart
(Show Context)
The Bayesian framework is ideally suited for induction problems. The probability of observing $x_k$ at time $k$, given past observations $x_1...x_{k1}$ can be computed with Bayes' rule if the true distribution $\mu$ of the sequences $x_1x_2x_3...$ is known. The problem, however, is that in many cases one does not even have a reasonable estimate of the true distribution. In order to overcome this problem a universal distribution $\xi$ is defined as a weighted sum of distributions $\mu_i\in M$, where $M$ is any countable set of distributions including $\mu$. This is a generalization of Solomonoff induction, in which $M$ is the set of all enumerable semimeasures. Systems which predict $y_k$, given $x_1...x_{k1}$ and which receive loss $l_{x_k y_k}$ if $x_k$ is the true next symbol of the sequence are considered. It is proven that using the universal $\xi$ as a prior is nearly as good as using the unknown true distribution $\mu$. Furthermore, games of chance, defined as a sequence of bets, observations, and rewards are studied. The time needed to reach the winning zone is estimated. Extensions to arbitrary alphabets, partial and delayed prediction, and more active systems are discussed.