Results 1 - 5 of 5
Discovering Neural Nets With Low Kolmogorov Complexity And High Generalization Capability
Neural Networks, 1997
"... Many neural net learning algorithms aim at finding "simple" nets to explain training data. The expectation is: the "simpler" the networks, the better the generalization on test data (! Occam's razor). Previous implementations, however, use measures for "simplicity" that lack the power, universali ..."
Abstract

Cited by 49 (30 self)
Many neural net learning algorithms aim at finding "simple" nets to explain training data. The expectation is: the "simpler" the networks, the better the generalization on test data (→ Occam's razor). Previous implementations, however, use measures for "simplicity" that lack the power, universality and elegance of those based on Kolmogorov complexity and Solomonoff's algorithmic probability. Likewise, most previous approaches (especially those of the "Bayesian" kind) suffer from the problem of choosing appropriate priors. This paper addresses both issues. It first reviews some basic concepts of algorithmic complexity theory relevant to machine learning, and how the Solomonoff-Levin distribution (or universal prior) deals with the prior problem. The universal prior leads to a probabilistic method for finding "algorithmically simple" problem solutions with high generalization capability. The method is based on Levin complexity (a time-bounded generalization of Kolmogorov complexity) ...
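For orientation, the two quantities this abstract leans on can be written down compactly. The following is a standard-notation sketch (U a universal prefix Turing machine, ℓ(p) the length of program p, t(p) its runtime), not necessarily the paper's exact formulation:

```latex
% Solomonoff-Levin universal prior: the probability that U emits x
% when its program tape is filled with fair coin flips.
M(x) = \sum_{p \,:\, U(p)=x} 2^{-\ell(p)}

% Levin complexity Kt: a time-bounded generalization of Kolmogorov
% complexity that charges the logarithm of runtime on top of length.
Kt(x) = \min_{p \,:\, U(p)=x} \left\{ \ell(p) + \log t(p) \right\}
```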
Hierarchies Of Generalized Kolmogorov Complexities And Nonenumerable Universal Measures Computable In The Limit
INTERNATIONAL JOURNAL OF FOUNDATIONS OF COMPUTER SCIENCE, 2000
"... The traditional theory of Kolmogorov complexity and algorithmic probability focuses on monotone Turing machines with oneway writeonly output tape. This naturally leads to the universal enumerable SolomonoLevin measure. Here we introduce more general, nonenumerable but cumulatively enumerable m ..."
Abstract

Cited by 38 (20 self)
The traditional theory of Kolmogorov complexity and algorithmic probability focuses on monotone Turing machines with one-way write-only output tape. This naturally leads to the universal enumerable Solomonoff-Levin measure. Here we introduce more general, nonenumerable but cumulatively enumerable measures (CEMs) derived from Turing machines with lexicographically nondecreasing output and random input, and even more general approximable measures and distributions computable in the limit. We obtain a natural hierarchy of generalizations of algorithmic probability and Kolmogorov complexity, suggesting that the "true" information content of some (possibly infinite) bitstring x is the size of the shortest nonhalting program that converges to x and nothing but x on a Turing machine that can edit its previous outputs. Among other things we show that there are objects computable in the limit yet more random than Chaitin's "number of wisdom" Omega, that any approximable measure of x is small for any x lacking a short description, that there is no universal approximable distribution, that there is a universal CEM, and that any nonenumerable CEM of x is small for any x lacking a short enumerating program. We briefly mention consequences for universes sampled from such priors.
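The central definition is compact enough to state here; this is a sketch in assumed notation (μ a measure on strings, ≤ the lexicographic order), following the abstract's description rather than quoting the paper:

```latex
% mu is a cumulatively enumerable measure (CEM) iff its cumulative
% measure CM is enumerable, i.e. the limit of a computable function
% g(x, n) that is nondecreasing in n:
CM(x) = \sum_{y \le x} \mu(y), \qquad
CM(x) = \lim_{n \to \infty} g(x, n), \quad g(x, n+1) \ge g(x, n)
```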
Algorithmic Theories Of Everything
2000
"... The probability distribution P from which the history of our universe is sampled represents a theory of everything or TOE. We assume P is formally describable. Since most (uncountably many) distributions are not, this imposes a strong inductive bias. We show that P(x) is small for any universe x lac ..."
Abstract

Cited by 31 (15 self)
The probability distribution P from which the history of our universe is sampled represents a theory of everything or TOE. We assume P is formally describable. Since most (uncountably many) distributions are not, this imposes a strong inductive bias. We show that P(x) is small for any universe x lacking a short description, and study the spectrum of TOEs spanned by two Ps, one reflecting the most compact constructive descriptions, the other the fastest way of computing everything. The former derives from generalizations of traditional computability, Solomonoff's algorithmic probability, Kolmogorov complexity, and objects more random than Chaitin's Omega, the latter from Levin's universal search and a natural resource-oriented postulate: the cumulative prior probability of all x incomputable within time t by this optimal algorithm should be 1/t. Between both Ps we find a universal cumulatively enumerable measure that dominates traditional enumerable measures; any such CEM must assign low probability to any universe lacking a short enumerating program. We derive P-specific consequences for evolving observers, inductive reasoning, quantum physics, philosophy, and the expected duration of our universe.
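The resource-oriented postulate admits a one-line formalization. In the sketch below, t_OPT(x) is hypothetical shorthand (not the paper's notation) for the time the optimal algorithm needs to output universe x:

```latex
% Postulate: the cumulative prior probability of all universes x
% incomputable within time t by the optimal algorithm is 1/t.
\sum_{x \,:\, t_{\mathrm{OPT}}(x) > t} P(x) = \frac{1}{t}
```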
Discovering Problem Solutions with Low Kolmogorov Complexity and High Generalization Capability
MACHINE LEARNING: PROCEEDINGS OF THE TWELFTH INTERNATIONAL CONFERENCE, 1994
"... Many machine learning algorithms aim at finding "simple" rules to explain training data. The expectation is: the "simpler" the rules, the better the generalization on test data (! Occam's razor). Most practical implementations, however, use measures for "simplicity" that lack the power, universality ..."
Abstract

Cited by 16 (8 self)
Many machine learning algorithms aim at finding "simple" rules to explain training data. The expectation is: the "simpler" the rules, the better the generalization on test data (→ Occam's razor). Most practical implementations, however, use measures for "simplicity" that lack the power, universality and elegance of those based on Kolmogorov complexity and Solomonoff's algorithmic probability. Likewise, most previous approaches (especially those of the "Bayesian" kind) suffer from the problem of choosing appropriate priors. This paper addresses both issues. It first reviews some basic concepts of algorithmic complexity theory relevant to machine learning, and how the Solomonoff-Levin distribution (or universal prior) deals with the prior problem. The universal prior leads to a probabilistic method for finding "algorithmically simple" problem solutions with high generalization capability. The method is based on Levin complexity (a time-bounded generalization of Kolmogorov complexity) and...
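The time-bounded search that such a method builds on can be sketched in a few lines. This is a minimal illustration of Levin-style universal search, not the paper's implementation; `run` (an interpreter with a step limit, returning None on timeout) and `is_solution` (a test against the training data) are assumed helpers:

```python
from itertools import product

def levin_search(run, is_solution, max_phase=20):
    """Search bit-string programs in order of Levin complexity:
    in phase i, each program p gets about 2**(i - len(p)) steps,
    so short, fast programs are found first."""
    for phase in range(1, max_phase + 1):
        for length in range(1, phase + 1):
            budget = 2 ** (phase - length)  # time share ~ 2**(-length)
            for bits in product("01", repeat=length):
                program = "".join(bits)
                output = run(program, max_steps=budget)  # None on timeout
                if output is not None and is_solution(output):
                    # len(program) + log2(runtime) <= phase: low Kt.
                    return program
    return None  # no solution found within the phase budget
```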
The Fastest Way of Computing All Universes
"... Is there a short and fast program that can compute the precise history of our universe, including all seemingly random but possibly actually deterministic and pseudorandom quantum fluctuations? There is no physical evidence against this possibility. So let us start searching! We already know a shor ..."
Abstract
Is there a short and fast program that can compute the precise history of our universe, including all seemingly random but possibly actually deterministic and pseudorandom quantum fluctuations? There is no physical evidence against this possibility. So let us start searching! We already know a short program that computes all constructively computable universes in parallel, each in the asymptotically fastest way. Assuming ours is computed by this optimal method, we can predict that it is among the fastest compatible with our existence. This yields testable predictions. Note: This paper extends an overview of previous work [51–54, 58, 59] presented in a survey for the German edition of Scientific American [61].
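The "short program" referred to is a dovetailer over all candidate programs. Below is a toy sketch under stated assumptions, not the paper's algorithm: `step` is a hypothetical interpreter that runs a bit-string program from scratch for a given number of steps and returns its output so far.

```python
from itertools import product

def dovetail_all(step, phases=15):
    """Phase i runs every program p with len(p) <= i for
    2**(i - len(p)) steps, so each program receives a fixed
    fraction ~2**(-len(p)) of total compute; any computable
    output thus appears with an essentially constant-factor
    slowdown relative to its own fastest program."""
    outputs = {}
    for i in range(1, phases + 1):
        for length in range(1, i + 1):
            steps = 2 ** (i - length)
            for bits in product("01", repeat=length):
                p = "".join(bits)
                outputs[p] = step(p, steps)  # rerun p for `steps` steps
    return outputs
```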