Results 1  10
of
32
Akaike’s information criterion and recent developments in information complexity
 Journal of Mathematical Psychology
"... criterion (AIC). Then, we present some recent developments on a new entropic or information complexity (ICOMP) criterion of Bozdogan (1988a, 1988b, 1990, 1994d, 1996, 1998a, 1998b) for model selection. A rationale for ICOMP as a model selection criterion is that it combines a badnessoffit term (su ..."
Abstract

Cited by 112 (9 self)
 Add to MetaCart
criterion (AIC). Then, we present some recent developments on a new entropic or information complexity (ICOMP) criterion of Bozdogan (1988a, 1988b, 1990, 1994d, 1996, 1998a, 1998b) for model selection. A rationale for ICOMP as a model selection criterion is that it combines a badnessoffit term (such as minus twice the maximum log likelihood) with a measure of complexity of a model differently than AIC, or its variants, by taking into account the interdependencies of the parameter estimates as well as the dependencies of the model residuals. We operationalize the general form of ICOMP based on the quantification of the concept of overall model complexity in terms of the estimated inverseFisher information matrix. This approach results in an approximation to the sum of two KullbackLeibler distances. Using the correlational form of the complexity, we further provide yet another form of ICOMP to take into account the interdependencies (i.e., correlations) among the parameter estimates of the model. Later, we illustrate the practical utility and the importance of this new model selection criterion by providing several
Minimum Description Length Induction, Bayesianism, and Kolmogorov Complexity
 IEEE Transactions on Information Theory
, 1998
"... The relationship between the Bayesian approach and the minimum description length approach is established. We sharpen and clarify the general modeling principles MDL and MML, abstracted as the ideal MDL principle and defined from Bayes's rule by means of Kolmogorov complexity. The basic conditi ..."
Abstract

Cited by 78 (8 self)
 Add to MetaCart
The relationship between the Bayesian approach and the minimum description length approach is established. We sharpen and clarify the general modeling principles MDL and MML, abstracted as the ideal MDL principle and defined from Bayes's rule by means of Kolmogorov complexity. The basic condition under which the ideal principle should be applied is encapsulated as the Fundamental Inequality, which in broad terms states that the principle is valid when the data are random, relative to every contemplated hypothesis and also these hypotheses are random relative to the (universal) prior. Basically, the ideal principle states that the prior probability associated with the hypothesis should be given by the algorithmic universal probability, and the sum of the log universal probability of the model plus the log of the probability of the data given the model should be minimized. If we restrict the model class to the finite sets then application of the ideal principle turns into Kolmogorov's mi...
Discovering Neural Nets With Low Kolmogorov Complexity And High Generalization Capability
 Neural Networks
, 1997
"... Many neural net learning algorithms aim at finding "simple" nets to explain training data. The expectation is: the "simpler" the networks, the better the generalization on test data (! Occam's razor). Previous implementations, however, use measures for "simplicity&quo ..."
Abstract

Cited by 54 (30 self)
 Add to MetaCart
(Show Context)
Many neural net learning algorithms aim at finding "simple" nets to explain training data. The expectation is: the "simpler" the networks, the better the generalization on test data (! Occam's razor). Previous implementations, however, use measures for "simplicity" that lack the power, universality and elegance of those based on Kolmogorov complexity and Solomonoff's algorithmic probability. Likewise, most previous approaches (especially those of the "Bayesian" kind) suffer from the problem of choosing appropriate priors. This paper addresses both issues. It first reviews some basic concepts of algorithmic complexity theory relevant to machine learning, and how the SolomonoffLevin distribution (or universal prior) deals with the prior problem. The universal prior leads to a probabilistic method for finding "algorithmically simple" problem solutions with high generalization capability. The method is based on Levin complexity (a timebounded generalization of Kolmogorov comple...
Kolmogorov’s structure functions and model selection
 IEEE Trans. Inform. Theory
"... approach to statistics and model selection. Let data be finite binary strings and models be finite sets of binary strings. Consider model classes consisting of models of given maximal (Kolmogorov) complexity. The “structure function ” of the given data expresses the relation between the complexity l ..."
Abstract

Cited by 53 (16 self)
 Add to MetaCart
(Show Context)
approach to statistics and model selection. Let data be finite binary strings and models be finite sets of binary strings. Consider model classes consisting of models of given maximal (Kolmogorov) complexity. The “structure function ” of the given data expresses the relation between the complexity level constraint on a model class and the least logcardinality of a model in the class containing the data. We show that the structure function determines all stochastic properties of the data: for every constrained model class it determines the individual bestfitting model in the class irrespective of whether the “true ” model is in the model class considered or not. In this setting, this happens with certainty, rather than with high probability as is in the classical case. We precisely quantify the goodnessoffit of an individual model with respect to individual data. We show that—within the obvious constraints—every graph is realized by the structure function of some data. We determine the (un)computability properties of the various functions contemplated and of the “algorithmic minimal sufficient statistic.” Index Terms— constrained minimum description length (ML) constrained maximum likelihood (MDL) constrained bestfit model selection computability lossy compression minimal sufficient statistic nonprobabilistic statistics Kolmogorov complexity, Kolmogorov Structure function prediction sufficient statistic
Fifty Years of Shannon Theory
, 1998
"... A brief chronicle is given of the historical development of the central problems in the theory of fundamental limits of data compression and reliable communication. ..."
Abstract

Cited by 49 (1 self)
 Add to MetaCart
A brief chronicle is given of the historical development of the central problems in the theory of fundamental limits of data compression and reliable communication.
Algorithmic Theories Of Everything
, 2000
"... The probability distribution P from which the history of our universe is sampled represents a theory of everything or TOE. We assume P is formally describable. Since most (uncountably many) distributions are not, this imposes a strong inductive bias. We show that P(x) is small for any universe x lac ..."
Abstract

Cited by 36 (15 self)
 Add to MetaCart
The probability distribution P from which the history of our universe is sampled represents a theory of everything or TOE. We assume P is formally describable. Since most (uncountably many) distributions are not, this imposes a strong inductive bias. We show that P(x) is small for any universe x lacking a short description, and study the spectrum of TOEs spanned by two Ps, one reflecting the most compact constructive descriptions, the other the fastest way of computing everything. The former derives from generalizations of traditional computability, Solomonoff’s algorithmic probability, Kolmogorov complexity, and objects more random than Chaitin’s Omega, the latter from Levin’s universal search and a natural resourceoriented postulate: the cumulative prior probability of all x incomputable within time t by this optimal algorithm should be 1/t. Between both Ps we find a universal cumulatively enumerable measure that dominates traditional enumerable measures; any such CEM must assign low probability to any universe lacking a short enumerating program. We derive Pspecific consequences for evolving observers, inductive reasoning, quantum physics, philosophy, and the expected duration of our universe.
Algorithmic Complexity and Stochastic Properties of Finite Binary Sequences
, 1999
"... This paper is a survey of concepts and results related to simple Kolmogorov complexity, prefix complexity and resourcebounded complexity. We also consider a new type of complexity statistical complexity closely related to mathematical statistics. Unlike other discoverers of algorithmic complexit ..."
Abstract

Cited by 18 (0 self)
 Add to MetaCart
This paper is a survey of concepts and results related to simple Kolmogorov complexity, prefix complexity and resourcebounded complexity. We also consider a new type of complexity statistical complexity closely related to mathematical statistics. Unlike other discoverers of algorithmic complexity, A. N. Kolmogorov's leading motive was developing on its basis a mathematical theory more adequately substantiating applications of probability theory, mathematical statistics and information theory. Kolmogorov wanted to deduce properties of a random object from its complexity characteristics without use of the notion of probability. In the first part of this paper we present several results in this direction. Though the subsequent development of algorithmic complexity and randomness was different, algorithmic complexity has successful applications in a traditional probabilistic framework. In the second part of the paper we consider applications to the estimation of parameters and the definition of Bernoulli sequences. All considerations have finite combinatorial character. 1.
Discovering Problem Solutions with Low Kolmogorov Complexity and High Generalization Capability
 MACHINE LEARNING: PROCEEDINGS OF THE TWELFTH INTERNATIONAL CONFERENCE
, 1994
"... Many machine learning algorithms aim at finding "simple" rules to explain training data. The expectation is: the "simpler" the rules, the better the generalization on test data (! Occam's razor). Most practical implementations, however, use measures for "simplicity" ..."
Abstract

Cited by 16 (8 self)
 Add to MetaCart
Many machine learning algorithms aim at finding "simple" rules to explain training data. The expectation is: the "simpler" the rules, the better the generalization on test data (! Occam's razor). Most practical implementations, however, use measures for "simplicity" that lack the power, universality and elegance of those based on Kolmogorov complexity and Solomonoff's algorithmic probability. Likewise, most previous approaches (especially those of the "Bayesian" kind) suffer from the problem of choosing appropriate priors. This paper addresses both issues. It first reviews some basic concepts of algorithmic complexity theory relevant to machine learning, and how the SolomonoffLevin distribution (or universal prior) deals with the prior problem. The universal prior leads to a probabilistic method for finding "algorithmically simple" problem solutions with high generalization capability. The method is based on Levin complexity (a timebounded generalization of Kolmogorov complexity) and...
Kolmogorov’s structure functions with an application to the foundations of model selection
 In Proc. 43rd Symposium on Foundations of Computer Science
, 2002
"... We vindicate, for the first time, the rightness of the original “structure function”, proposed by Kolmogorov in 1974, by showing that minimizing a twopart code consisting of a model subject to (Kolmogorov) complexity constraints, together with a datatomodel code, produces a model of best fit (for ..."
Abstract

Cited by 16 (0 self)
 Add to MetaCart
(Show Context)
We vindicate, for the first time, the rightness of the original “structure function”, proposed by Kolmogorov in 1974, by showing that minimizing a twopart code consisting of a model subject to (Kolmogorov) complexity constraints, together with a datatomodel code, produces a model of best fit (for which the data is maximally “typical”). The method thus separates all possible model information from the remaining accidental information. This result gives a foundation for MDL, and related methods, in model selection. Settlement of this longstanding question is the more remarkable since the minimal randomness deficiency function (measuring maximal “typicality”) itself cannot be monotonically approximated, but the shortest twopart code can. We furthermore show that both the structure function and the minimum randomness deficiency function can assume all shapes over their full domain (improving an independent unpublished result of Levin on the former function of the early 70s, and extending a partial result of V’yugin on the latter function of the late 80s and also recent results on prediction loss measured by “snooping curves”). We give an explicit realization of optimal twopart codes at all levels of model complexity. We determine the (un)computability properties of the various functions and “algorithmic sufficient statistic ” considered. In our setting the models are finite sets, but the analysis is valid, up to logarithmic additive terms, for the model class of computable probability density functions, or the model class of total recursive functions. 1
UPCROSSING INEQUALITIES FOR STATIONARY SEQUENCES AND APPLICATIONS
, 2006
"... Abstract. Let g be a function which assigns to each stationary process (Xn) ∞ n=1 and to each sample X1... Xn of the process a real number g(X1,...,Xn), which may also depend on the distribution of (Xn). We obtain effective bounds on the probability that the sequence (g(X1,...,Xn)) ∞ n=1 crosses a ..."
Abstract

Cited by 10 (0 self)
 Add to MetaCart
(Show Context)
Abstract. Let g be a function which assigns to each stationary process (Xn) ∞ n=1 and to each sample X1... Xn of the process a real number g(X1,...,Xn), which may also depend on the distribution of (Xn). We obtain effective bounds on the probability that the sequence (g(X1,...,Xn)) ∞ n=1 crosses a fixed interval some number of times in terms of a quantity measuring the “average subadditivity ” of g. As applications we derive universal upcrossing inequalities for Kingman’s subadditive ergodic theorem, the ShannonMcMillanBreiman theorem and the Kolmogorov complexity statistic. 1.