Results 1 - 6 of 6
Minimum Description Length Induction, Bayesianism, and Kolmogorov Complexity
 IEEE Transactions on Information Theory
, 1998
Abstract

Cited by 67 (7 self)
The relationship between the Bayesian approach and the minimum description length approach is established. We sharpen and clarify the general modeling principles MDL and MML, abstracted as the ideal MDL principle and defined from Bayes's rule by means of Kolmogorov complexity. The basic condition under which the ideal principle should be applied is encapsulated as the Fundamental Inequality, which in broad terms states that the principle is valid when the data are random relative to every contemplated hypothesis, and these hypotheses are in turn random relative to the (universal) prior. Basically, the ideal principle states that the prior probability associated with the hypothesis should be given by the algorithmic universal probability, and that the sum of the negative log universal probability of the model and the negative log probability of the data given the model should be minimized. If we restrict the model class to the finite sets then application of the ideal principle turns into Kolmogorov's mi...
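The two-part minimization the abstract describes can be illustrated in a few lines. This is a hedged sketch only: the ideal principle uses the (uncomputable) algorithmic universal probability, so the prior code lengths and likelihoods below are hand-chosen stand-ins, and the names `two_part_code_length` and `ideal_mdl_select` are illustrative, not from the paper.

```python
import math

# Hedged sketch of ideal-MDL model selection. L(H) stands in for the
# (uncomputable) negative log universal probability of the hypothesis;
# the total two-part code length is L(H) + L(D|H) = L(H) - log2 P(D|H).

def two_part_code_length(prior_bits, likelihood):
    # prior_bits: illustrative code length of the hypothesis, in bits
    # likelihood: P(data | hypothesis)
    return prior_bits - math.log2(likelihood)

def ideal_mdl_select(hypotheses):
    # hypotheses: list of (name, prior_bits, likelihood) tuples;
    # pick the one minimizing the total description length
    return min(hypotheses, key=lambda h: two_part_code_length(h[1], h[2]))[0]

candidates = [
    ("simple",   2.0, 0.05),  # short to describe, fits the data poorly
    ("medium",   4.0, 0.40),  # moderate complexity, fits well
    ("complex", 20.0, 0.50),  # long to describe, fits only slightly better
]
best = ideal_mdl_select(candidates)  # trades fit against model complexity
```

The selection penalizes the complex model even though it fits best, which is the essence of the two-part code.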
Document Classification Using a Finite Mixture Model
 In Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics
, 1997
Abstract

Cited by 26 (3 self)
We propose a new method of classifying documents into categories. We define for each category a finite mixture model based on soft clustering of words. We treat the problem of classifying documents as that of conducting statistical hypothesis testing over finite mixture models, and employ the EM algorithm to efficiently estimate parameters in a finite mixture model. Experimental results indicate that our method outperforms existing methods.
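The scheme can be sketched concretely. This is a minimal illustration, not the paper's exact formulation: fixed word distributions stand in for the paper's soft word clusters, EM fits only the mixture weights, and classification picks the category whose mixture assigns the document the highest log-likelihood. All function names are hypothetical.

```python
import math

# Hedged sketch: each category is a mixture of word distributions.
# em_mixture_weights fits the mixture weights pi_k for fixed component
# distributions P_k(w) via EM; classify compares per-category likelihoods.

def em_mixture_weights(docs, components, iters=50):
    k = len(components)
    pi = [1.0 / k] * k
    for _ in range(iters):
        counts = [0.0] * k
        for doc in docs:
            # E-step: posterior responsibility of each component for the doc
            joint = [pi[j] * math.prod(components[j].get(w, 1e-9) for w in doc)
                     for j in range(k)]
            z = sum(joint)
            for j in range(k):
                counts[j] += joint[j] / z
        # M-step: re-estimate mixture weights from expected counts
        pi = [c / len(docs) for c in counts]
    return pi

def log_likelihood(doc, pi, components):
    return math.log(sum(pi[j] * math.prod(components[j].get(w, 1e-9) for w in doc)
                        for j in range(len(pi))))

def classify(doc, category_models):
    # category_models: {category_name: (pi, components)}
    return max(category_models,
               key=lambda c: log_likelihood(doc, *category_models[c]))
```

Hypothesis testing over the fitted mixtures then reduces to comparing these per-category log-likelihoods.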
A Probabilistic Approach to Lexical Semantic Knowledge Acquisition and Structural Disambiguation
, 1998
Abstract

Cited by 14 (3 self)
Structural disambiguation in sentence analysis is still a central problem in natural language processing. Past research has verified that lexical semantic knowledge can, to a large extent, resolve this problem. Although many studies have addressed the lexical knowledge acquisition problem, further investigation, especially investigation based on a principled methodology, is still needed; this is the problem I address in this thesis.
On Prediction by Data Compression
 In 9th European Conference on Machine Learning. Lecture Notes in Artificial Intelligence
, 1997
Abstract

Cited by 10 (0 self)
Traditional wisdom has it that the better a theory compresses the learning data concerning some phenomenon under investigation, the better we learn, generalize, and the better the theory predicts unknown data. This belief is vindicated in practice but apparently has not been rigorously proved in a general setting. Making these ideas rigorous involves the length of the shortest effective description of an individual object: its Kolmogorov complexity. In a previous paper we have shown that optimal compression is almost always a best strategy in hypotheses identification (an ideal form of the minimum description length (MDL) principle). Whereas the single best hypothesis does not necessarily give the best prediction, we demonstrate that nonetheless compression is almost always the best strategy in prediction methods in the style of R. Solomonoff.

1 Introduction

Given a body of data concerning some phenomenon under investigation, we want to select the most plausible hypothesis from amon...
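The compression-prediction link can be demonstrated with a crude sketch. Kolmogorov complexity is uncomputable, so a general-purpose compressor (here `zlib`) stands in as a rough upper bound; the function names are illustrative and the approach is only a heuristic stand-in for Solomonoff-style prediction, not the paper's construction.

```python
import zlib

# Hedged sketch: approximate "shortest description" by zlib-compressed
# length, and predict the next symbol as the one whose continuation
# compresses best together with the observed history.

def compressed_len(s: bytes) -> int:
    # Compressed size in bytes at maximum compression level
    return len(zlib.compress(s, 9))

def predict_next(history: str, alphabet: str) -> str:
    # Choose the symbol minimizing the compressed length of history + symbol
    return min(alphabet,
               key=lambda c: compressed_len((history + c).encode()))
```

For short strings the compressor's overhead dominates, so the heuristic only becomes informative as the history grows; this mirrors the asymptotic character of the formal results.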
Learning concise models of human activity from ambient video via a structure-inducing M-step estimator
, 1997
Abstract

Cited by 7 (4 self)
We introduce a method for structure discovery in data and use it to learn a normative theory about the behavior of the visual world from coarse image representations. The theory takes the form of a concise probabilistic automaton, specifically a continuous-output hidden Markov model (HMM), but the induction method applies generally to any conditional probability model. The learning algorithm introduces and exploits an entropic prior for fast, simultaneous estimation of model structure and parameters. Although not motivated as such, the prior and its maximum a posteriori (MAP) estimator can be understood as an exact formulation of minimum description length (MDL) for Bayesian point estimation; we present an exact solution for the MAP estimator which thus folds MDL into the M-step of expectation-maximization (EM) algorithms. Consequently there is no speculative or wasted computation as in search-based MDL approaches. In contrast to conventionally trained HMMs, entropically trained mod...
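The effect of the entropic prior can be illustrated on the simplest case. This is a hedged sketch, not the paper's exact M-step: the prior P(θ) ∝ exp(−H(θ)) = ∏ᵢ θᵢ^θᵢ favors low-entropy parameters, and for a two-outcome multinomial with expected counts (w₁, w₂) the MAP objective is f(p) = (w₁ + p) log p + (w₂ + 1 − p) log(1 − p). Rather than the paper's exact solution, we maximize f numerically; the function name `entropic_map` is hypothetical.

```python
import math

# Hedged sketch: MAP estimate of a two-outcome multinomial under the
# entropic prior exp(-H(theta)). f is concave here (for counts >= 1),
# so a ternary search over (0, 1) finds the maximizer.

def entropic_map(w1, w2, tol=1e-10):
    def f(p):
        # log posterior: likelihood term plus entropic-prior term
        return (w1 + p) * math.log(p) + (w2 + 1 - p) * math.log(1 - p)
    lo, hi = 1e-9, 1 - 1e-9
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2

w1, w2 = 5.0, 1.0
ml = w1 / (w1 + w2)           # maximum-likelihood estimate: 5/6
map_p = entropic_map(w1, w2)  # pulled past 5/6 toward the sparser extreme
```

The MAP estimate is more extreme than the maximum-likelihood one, which is the "concise model" pressure the abstract describes: the prior drives near-uniform parameters toward sparse, low-entropy configurations.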
Simplicity, Information, Kolmogorov Complexity, and Prediction
, 1998
Abstract
In contrast to statistical entropy, which measures the quantity of information in an average object of a given probabilistic ensemble, Kolmogorov complexity is the quantity of absolute information in an individual object. It provides a novel notion of randomness and resolves problems in probability theory, statistical information theory, and philosophy.