Results 1 - 9 of 9
Minimum Description Length Induction, Bayesianism, and Kolmogorov Complexity
 IEEE Transactions on Information Theory
, 1998
Abstract

Cited by 67 (7 self)
The relationship between the Bayesian approach and the minimum description length approach is established. We sharpen and clarify the general modeling principles MDL and MML, abstracted as the ideal MDL principle and defined from Bayes's rule by means of Kolmogorov complexity. The basic condition under which the ideal principle should be applied is encapsulated as the Fundamental Inequality, which in broad terms states that the principle is valid when the data are random relative to every contemplated hypothesis, and these hypotheses are in turn random relative to the (universal) prior. Basically, the ideal principle states that the prior probability associated with the hypothesis should be given by the algorithmic universal probability, and that the sum of the log universal probability of the model plus the log of the probability of the data given the model should be minimized. If we restrict the model class to finite sets, then application of the ideal principle turns into Kolmogorov's mi...
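The two-part minimization described in this abstract can be sketched concretely. Kolmogorov complexity is uncomputable, so any runnable illustration must substitute a computable code; the snippet below (illustrative only, all names our own) selects a Bernoulli parameter by minimizing the sum of the model code length and the code length of the data given the model:

```python
import math

def two_part_code_length(data, p, param_bits=8):
    """L(H) + L(D|H): bits to name a Bernoulli parameter on a
    2**param_bits grid, plus the Shannon-Fano code length of the data."""
    ones = sum(data)
    zeros = len(data) - ones
    return (param_bits
            + ones * -math.log2(p)
            + zeros * -math.log2(1 - p))

def mdl_select(data, param_bits=8):
    """Pick the hypothesis (grid point) minimizing total description length."""
    grid = [i / 2**param_bits for i in range(1, 2**param_bits)]
    return min(grid, key=lambda p: two_part_code_length(data, p, param_bits))
```

On a sample that is 75% ones, the grid point 0.75 minimizes the total code length: with the model cost fixed at `param_bits`, the data term dominates and is smallest at the empirical frequency.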
An Entropic Estimator for Structure Discovery
, 1999
Abstract

Cited by 35 (0 self)
We introduce a novel framework for simultaneous structure and parameter learning in hidden-variable conditional probability models, based on an entropic prior and a solution for its maximum a posteriori (MAP) estimator. The MAP estimate minimizes uncertainty in all respects: cross-entropy between model and data; entropy of the model; entropy of the data's descriptive statistics. Iterative estimation extinguishes weakly supported parameters, compressing and sparsifying the model. Trimming operators accelerate this process by removing excess parameters and, unlike most pruning schemes, guarantee an increase in posterior probability. Entropic estimation takes an overcomplete random model and simplifies it, inducing the structure of relations between hidden and observed variables. Applied to hidden Markov models (HMMs), it finds a concise finite-state machine representing the hidden structure of a signal. We entropically model music, handwriting, and video time-series, and show that the res...
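The entropic prior and the trimming guarantee can be illustrated on a single multinomial. Below is a minimal sketch (the function names and threshold are our own, not the paper's): with the entropic prior P(θ) proportional to the product of θ_i^θ_i, the log posterior is the log-likelihood minus the entropy of θ, and trimming a weakly supported parameter that has no observed counts raises that posterior:

```python
import math

def log_posterior(counts, theta):
    """log P(data | theta) + log P_e(theta) for a multinomial, with the
    entropic prior P_e(theta) proportional to prod theta_i ** theta_i."""
    loglik = sum(c * math.log(t) for c, t in zip(counts, theta) if c > 0)
    neg_entropy = sum(t * math.log(t) for t in theta if t > 0)
    return loglik + neg_entropy

def trim(counts, theta, eps=0.05):
    """Remove parameters with negligible probability mass, renormalize."""
    kept = [(c, t) for c, t in zip(counts, theta) if t > eps]
    z = sum(t for _, t in kept)
    return [c for c, _ in kept], [t / z for _, t in kept]
```

For example, with counts [50, 50, 0] and theta [0.49, 0.49, 0.02], removing the unused third parameter increases both the likelihood term (the kept parameters grow) and the prior term (the entropy of θ shrinks).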
Algorithmic Complexity and Stochastic Properties of Finite Binary Sequences
, 1999
Abstract

Cited by 17 (0 self)
This paper is a survey of concepts and results related to simple Kolmogorov complexity, prefix complexity, and resource-bounded complexity. We also consider a new type of complexity, statistical complexity, closely related to mathematical statistics. Unlike other discoverers of algorithmic complexity, A. N. Kolmogorov's leading motive was to develop on its basis a mathematical theory that more adequately substantiates applications of probability theory, mathematical statistics, and information theory. Kolmogorov wanted to deduce properties of a random object from its complexity characteristics without using the notion of probability. In the first part of this paper we present several results in this direction. Though the subsequent development of algorithmic complexity and randomness took a different path, algorithmic complexity has found successful applications in a traditional probabilistic framework. In the second part of the paper we consider applications to the estimation of parameters and the definition of Bernoulli sequences. All considerations have a finite combinatorial character.
Ideal MDL and Its Relation To Bayesianism
Abstract

Cited by 10 (2 self)
Statistics-based inference methods such as minimum message length (MML) and minimum description length (MDL) are widely applied approaches. They are the tools to use with particular machine learning praxis such as simulated annealing, genetic algorithms, genetic programming, artificial neural networks, and the like. These methods select the hypothesis which minimizes the sum of the length of the description of the hypothesis (also called the `model') and the length of the description of the data relative to the hypothesis. Ideally, MDL uses shortest effective descriptions and is expressed in terms of Kolmogorov complexity. We derive Ideal MDL from first principles and explain correspondences and differences between MDL and Bayesian reasoning, in particular why the latter is prone to overfitting and the former is not. Specifically, we show that Ideal MDL can be reduced to the Bayesian approach using the universal prior distribution, provided the minimum description length is reached for ...
On Prediction by Data Compression
 In 9th European Conference on Machine Learning. Lecture Notes in Artificial Intelligence
, 1997
Abstract

Cited by 10 (0 self)
Traditional wisdom has it that the better a theory compresses the learning data concerning some phenomenon under investigation, the better we learn and generalize, and the better the theory predicts unknown data. This belief is vindicated in practice but apparently has not been rigorously proved in a general setting. Making these ideas rigorous involves the length of the shortest effective description of an individual object: its Kolmogorov complexity. In a previous paper we have shown that optimal compression is almost always a best strategy in hypothesis identification (an ideal form of the minimum description length (MDL) principle). Whereas the single best hypothesis does not necessarily give the best prediction, we demonstrate that compression is nonetheless almost always the best strategy in prediction methods in the style of R. Solomonoff.
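The claim that compression is almost always the best strategy for prediction concerns the uncomputable ideal (Kolmogorov complexity and Solomonoff's universal prior); a crude, computable toy version can be built with an off-the-shelf compressor as a stand-in. The sketch below is ours, not the paper's method:

```python
import zlib

def compressed_len(s: bytes) -> int:
    """Length of the zlib-compressed string, used as a rough upper-bound
    proxy for the Kolmogorov complexity of s."""
    return len(zlib.compress(s, 9))

def predict_next(history: bytes, alphabet=(b"0", b"1")) -> bytes:
    """Predict the continuation whose total description compresses best."""
    return min(alphabet, key=lambda sym: compressed_len(history + sym))
```

Note that zlib's block granularity makes single-symbol comparisons coarse; the proxy only illustrates the direction of the argument, not its strength.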
Learning concise models of human activity from ambient video via a structure-inducing M-step estimator
, 1997
Abstract

Cited by 7 (4 self)
We introduce a method for structure discovery in data and use it to learn a normative theory about the behavior of the visual world from coarse image representations. The theory takes the form of a concise probabilistic automaton (specifically, a continuous-output hidden Markov model, or HMM), but the induction method applies generally to any conditional probability model. The learning algorithm introduces and exploits an entropic prior for fast, simultaneous estimation of model structure and parameters. Although not motivated as such, the prior and its maximum a posteriori (MAP) estimator can be understood as an exact formulation of minimum description length (MDL) for Bayesian point estimation; we present an exact solution for the MAP estimator which thus folds MDL into the M-step of expectation-maximization (EM) algorithms. Consequently there is no speculative or wasted computation as in search-based MDL approaches. In contrast to conventionally trained HMMs, entropically trained mod...
Computational Machine Learning in Theory and Praxis
, 1995
Abstract

Cited by 4 (0 self)
In the last few decades a computational approach to machine learning has emerged, based on paradigms from recursion theory and the theory of computation. Such ideas include learning in the limit, learning by enumeration, and probably approximately correct (pac) learning. These models usually are not suitable in practical situations. In contrast, statistics-based inference methods have enjoyed a long and distinguished career. Currently, Bayesian reasoning in various forms, minimum message length (MML), and minimum description length (MDL) are widely applied approaches. They are the tools to use with particular machine learning praxis such as simulated annealing, genetic algorithms, genetic programming, artificial neural networks, and the like. These statistical inference methods select the hypothesis which minimizes the sum of the length of the description of the hypothesis (also called the `model') and the length of the description of the data relative to the hypothesis. It appears to us th...
Simplicity, Information, Kolmogorov Complexity, and Prediction
, 1998
Abstract
In contrast to statistical entropy, which measures the quantity of information in an average object of a given probabilistic ensemble, Kolmogorov complexity is the quantity of absolute information in an individual object. It provides a novel notion of randomness and resolves problems in probability theory, statistical information theory, and philosophy.