Results 1 - 5 of 5
Minimum Description Length Induction, Bayesianism, and Kolmogorov Complexity
- IEEE Transactions on Information Theory, 1998
Cited by 60 (7 self)
The relationship between the Bayesian approach and the minimum description length approach is established. We sharpen and clarify the general modeling principles MDL and MML, abstracted as the ideal MDL principle and defined from Bayes's rule by means of Kolmogorov complexity. The basic condition under which the ideal principle should be applied is encapsulated as the Fundamental Inequality, which in broad terms states that the principle is valid when the data are random, relative to every contemplated hypothesis and also these hypotheses are random relative to the (universal) prior. Basically, the ideal principle states that the prior probability associated with the hypothesis should be given by the algorithmic universal probability, and the sum of the log universal probability of the model plus the log of the probability of the data given the model should be minimized. If we restrict the model class to the finite sets then application of the ideal principle turns into Kolmogorov's mi...
Document Classification Using a Finite Mixture Model
- In Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics, 1997
Cited by 20 (3 self)
We propose a new method of classifying documents into categories. We define for each category a finite mixture model based on soft clustering of words. We treat the problem of classifying documents as that of conducting statistical hypothesis testing over finite mixture models, and employ the EM algorithm to efficiently estimate parameters in a finite mixture model. Experimental results indicate that our method outperforms existing methods.
On Prediction by Data Compression
- In 9th European Conference on Machine Learning. Lecture Notes in Artificial Intelligence, 1997
Cited by 9 (0 self)
Traditional wisdom has it that the better a theory compresses the learning data concerning some phenomenon under investigation, the better we learn, generalize, and the better the theory predicts unknown data. This belief is vindicated in practice but apparently has not been rigorously proved in a general setting. Making these ideas rigorous involves the length of the shortest effective description of an individual object: its Kolmogorov complexity. In a previous paper we have shown that optimal compression is almost always a best strategy in hypotheses identification (an ideal form of the minimum description length (MDL) principle). Whereas the single best hypothesis does not necessarily give the best prediction, we demonstrate that nonetheless compression is almost always the best strategy in prediction methods in the style of R. Solomonoff. 1 Introduction Given a body of data concerning some phenomenon under investigation, we want to select the most plausible hypothesis from amon...
Learning concise models of human activity from ambient video via a structure-inducing M-step estimator
- 1997
Cited by 7 (4 self)
We introduce a method for structure discovery in data and use it to learn a normative theory about the behavior of the visual world from coarse image representations. The theory takes the form of a concise probabilistic automaton -- specifically, a continuous-output hidden Markov model (HMM) -- but the induction method applies generally to any conditional probability model. The learning algorithm introduces and exploits an entropic prior for fast, simultaneous estimation of model structure and parameters. Although not motivated as such, the prior and its maximum a posteriori (MAP) estimator can be understood as an exact formulation of minimum description length (MDL) for Bayesian point estimation; we present an exact solution for the MAP estimator which thus folds MDL into the M-step of expectation-maximization (EM) algorithms. Consequently there is no speculative or wasted computation as in search-based MDL approaches. In contrast to conventionally trained HMMs, entropically trained mod...
Simplicity, Information, Kolmogorov Complexity, and Prediction
- 1998
In contrast to statistical entropy which measures the quantity of information in an average object of a given probabilistic ensemble, Kolmogorov complexity is the quantity of absolute information in an individual object. It is a novel notion of randomness and resolves problems in probability theory, statistical information theory, and philosophy.

