Optimal Ordered Problem Solver
2002
Abstract

Cited by 62 (20 self)
We present a novel, general, optimally fast, incremental way of searching for a universal algorithm that solves each task in a sequence of tasks. The Optimal Ordered Problem Solver (OOPS) continually organizes and exploits previously found solutions to earlier tasks, efficiently searching not only the space of domain-specific algorithms, but also the space of search algorithms. Essentially we extend the principles of optimal nonincremental universal search to build an incremental universal learner that is able to improve itself through experience.
Discovering Neural Nets With Low Kolmogorov Complexity And High Generalization Capability
Neural Networks, 1997
Abstract

Cited by 49 (30 self)
Many neural net learning algorithms aim at finding "simple" nets to explain training data. The expectation is: the "simpler" the networks, the better the generalization on test data (→ Occam's razor). Previous implementations, however, use measures for "simplicity" that lack the power, universality and elegance of those based on Kolmogorov complexity and Solomonoff's algorithmic probability. Likewise, most previous approaches (especially those of the "Bayesian" kind) suffer from the problem of choosing appropriate priors. This paper addresses both issues. It first reviews some basic concepts of algorithmic complexity theory relevant to machine learning, and how the Solomonoff-Levin distribution (or universal prior) deals with the prior problem. The universal prior leads to a probabilistic method for finding "algorithmically simple" problem solutions with high generalization capability. The method is based on Levin complexity (a time-bounded generalization of Kolmogorov complexity)...
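Levin complexity can be written Kt(x) = min over programs p that output x of l(p) + log2 t(p), where l(p) is the program's length in bits and t(p) its running time. Levin's universal search finds low-Kt programs by running, in phase i = 1, 2, ..., every program p with l(p) <= i for 2^(i - l(p)) steps. The Python sketch below is illustrative only: the four-instruction toy machine, `run`, and `levin_search` are hypothetical names, not from the paper, and the sketch enumerates all bitstrings instead of a prefix-free (self-delimiting) program set, which costs an extra factor of roughly i per phase.

```python
from itertools import product
from typing import Optional, Tuple

# Hypothetical toy machine (2 bits per instruction), for illustration only:
#   "00": acc += 1   "01": acc *= 2   "10": halt, output acc   "11": jump to first instruction
def run(program: str, max_steps: int) -> Optional[int]:
    """Run `program` for at most `max_steps` steps; return its output, or None if it
    does not halt within the budget, runs off its end, or has odd length."""
    if len(program) % 2:
        return None
    ops = [program[k:k + 2] for k in range(0, len(program), 2)]
    acc, pc = 0, 0
    for _ in range(max_steps):
        if pc >= len(ops):
            return None           # ran off the end without halting
        op = ops[pc]
        if op == "10":
            return acc            # halt instruction: output the accumulator
        if op == "11":
            pc = 0                # jump back to the start; a program that reaches this never halts
            continue
        acc = acc + 1 if op == "00" else acc * 2
        pc += 1
    return None                   # time budget exhausted


def levin_search(target: int, max_phase: int = 20) -> Optional[Tuple[str, int]]:
    """Phase i gives every program p with l(p) <= i exactly 2**(i - l(p)) steps,
    i.e. a share 2**-l(p) of the phase budget 2**i. Returns (program, phase)."""
    for i in range(1, max_phase + 1):
        for length in range(1, i + 1):
            budget = 2 ** (i - length)
            for bits in product("01", repeat=length):
                program = "".join(bits)
                if run(program, budget) == target:
                    return program, i
    return None


print(levin_search(3))  # ('00000010', 10): three increments, then halt
```

A program of length l that halts after t steps first receives enough time in phase l + log2 t (rounded up), so the phase at which a solution appears tracks Kt. Here the 8-bit program `00000010` needs 4 steps and is found in phase 10, since 2^(10 - 8) = 4.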
Hierarchies Of Generalized Kolmogorov Complexities And Nonenumerable Universal Measures Computable In The Limit
International Journal of Foundations of Computer Science, 2000
Abstract

Cited by 38 (20 self)
The traditional theory of Kolmogorov complexity and algorithmic probability focuses on monotone Turing machines with one-way write-only output tape. This naturally leads to the universal enumerable Solomonoff-Levin measure. Here we introduce more general, nonenumerable but cumulatively enumerable measures (CEMs) derived from Turing machines with lexicographically nondecreasing output and random input, and even more general approximable measures and distributions computable in the limit. We obtain a natural hierarchy of generalizations of algorithmic probability and Kolmogorov complexity, suggesting that the "true" information content of some (possibly infinite) bitstring x is the size of the shortest nonhalting program that converges to x and nothing but x on a Turing machine that can edit its previous outputs. Among other things we show that there are objects computable in the limit yet more random than Chaitin's "number of wisdom" Omega, that any approximable measure of x is small for any x lacking a short description, that there is no universal approximable distribution, that there is a universal CEM, and that any nonenumerable CEM of x is small for any x lacking a short enumerating program. We briefly mention consequences for universes sampled from such priors.
A computer scientist’s view of life, the universe, and everything
Foundations of Computer Science: Potential - Theory - Cognition, 1997
Abstract

Cited by 38 (15 self)
Is the universe computable? If so, it may be much cheaper in terms of information requirements to compute all computable universes instead of just ours. I apply basic concepts of Kolmogorov complexity theory to the set of possible universes, and chat about perceived and true randomness, life, generalization, and learning in a given universe. Preliminaries. Assumptions. A long time ago, the Great Programmer wrote a program that runs all possible universes on His Big Computer. “Possible” means “computable”: (1) Each universe evolves on a discrete time scale. (2) Any universe’s state at a given time is describable by a finite number of bits. One of the many universes is ours, despite some who evolved in it and claim it is incomputable. Computable universes. Let TM denote an arbitrary universal Turing machine with unidirectional output tape. TM’s input and output symbols are “0”, “1”, and “,” (comma). TM’s possible input programs can be ordered...
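A single program can run all computable universes, even though most of them never halt, by interleaving their computations (dovetailing). The Python sketch below is illustrative only and is not the construction used in the paper: `dovetail` and `toy_universe` are hypothetical names, and each universe is modeled as a Python generator that yields its successive states.

```python
from itertools import count
from typing import Callable, Iterator, List, Tuple


def toy_universe(k: int) -> Iterator[int]:
    """Hypothetical universe k: its state at step t is k * t; it never halts."""
    for t in count():
        yield k * t


def dovetail(make_universe: Callable[[int], Iterator[int]],
             rounds: int) -> List[Tuple[int, int]]:
    """In round r, start universe r, then advance every started, still-running
    universe by one step. Returns the (universe index, state) pairs produced."""
    running: List[Tuple[int, Iterator[int]]] = []
    trace: List[Tuple[int, int]] = []
    for r in range(rounds):
        running.append((r, make_universe(r)))
        still_running = []
        for idx, universe in running:
            try:
                trace.append((idx, next(universe)))
                still_running.append((idx, universe))
            except StopIteration:
                pass  # this computation halted; drop it
        running = still_running
    return trace
```

After n rounds, universe k has been advanced n - k times, so every universe eventually receives arbitrarily many steps although none of them has to halt.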
Discovering Solutions with Low Kolmogorov Complexity and High Generalization Capability
1995
Abstract

Cited by 37 (25 self)
Many machine learning algorithms aim at finding "simple" rules to explain training data. The expectation is: the "simpler" the rules, the better the generalization on test data (→ Occam's razor). Most practical implementations, however, use measures for "simplicity" that lack the power, universality and elegance of those based on Kolmogorov complexity and Solomonoff's algorithmic probability. Likewise, most previous approaches (especially those of the "Bayesian" kind) suffer from the problem of choosing appropriate priors. This paper addresses both issues. It first reviews some basic concepts of algorithmic complexity theory relevant to machine learning, and how the Solomonoff-Levin distribution (or universal prior) deals with the prior problem. The universal prior leads to a probabilistic method for finding "algorithmically simple" problem solutions with high generalization capability.
On Learning How to Learn Learning Strategies
1995
Abstract

Cited by 36 (15 self)
This paper introduces the "incremental self-improvement paradigm". Unlike previous methods, incremental self-improvement encourages a reinforcement learning system to improve the way it learns, and to improve the way it improves the way it learns ..., without significant theoretical limitations: the system is able to "shift its inductive bias" in a universal way. Its major features are: (1) There is no explicit difference between "learning", "meta-learning", and other kinds of information processing. Using a Turing-machine-equivalent programming language, the system itself occasionally executes self-delimiting, initially highly random "self-modification programs" which modify the context-dependent probabilities of future action sequences (including future self-modification programs). (2) The system keeps only those probability modifications computed by "useful" self-modification programs: those which bring about more payoff (reward, reinforcement) per time than all previous self-modi...
Algorithmic Theories Of Everything
2000
Abstract

Cited by 31 (15 self)
The probability distribution P from which the history of our universe is sampled represents a theory of everything or TOE. We assume P is formally describable. Since most (uncountably many) distributions are not, this imposes a strong inductive bias. We show that P(x) is small for any universe x lacking a short description, and study the spectrum of TOEs spanned by two Ps, one reflecting the most compact constructive descriptions, the other the fastest way of computing everything. The former derives from generalizations of traditional computability, Solomonoff’s algorithmic probability, Kolmogorov complexity, and objects more random than Chaitin’s Omega, the latter from Levin’s universal search and a natural resource-oriented postulate: the cumulative prior probability of all x incomputable within time t by this optimal algorithm should be 1/t. Between both Ps we find a universal cumulatively enumerable measure that dominates traditional enumerable measures; any such CEM must assign low probability to any universe lacking a short enumerating program. We derive P-specific consequences for evolving observers, inductive reasoning, quantum physics, philosophy, and the expected duration of our universe.
A General Method for Incremental Self-Improvement and Multi-Agent Learning in Unrestricted Environments
In Evolutionary Computation: Theory and Applications. Scientific Publ. Co., Singapore, 1996
Abstract

Cited by 22 (8 self)
I describe a novel paradigm for reinforcement learning (RL) with limited computational resources in realistic, non-resettable environments. The learner's policy is an arbitrary modifiable algorithm mapping environmental inputs and internal states to outputs and new internal states. As in the real world, any event in system life and any learning process computing policy modifications may affect future performance and preconditions of future learning processes. Unlike in most previous RL approaches, the expected reward for a certain behavior may change during successive "trials". At a given time in system life, there is only one single training example to evaluate the current long-term usefulness of any given previous policy modification, namely the average reinforcement per time since that modification occurred. At certain times in system life called checkpoints, such singular observations are used by a stack-based backtracking method which invalidates certain previous po...
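The abstract is cut off before stating the exact invalidation rule. The Python sketch below shows stack-based backtracking under an assumed criterion: at a checkpoint, the average reinforcement per time since each still-valid modification must strictly exceed that since the valid modification below it, and the oldest must beat the average since the start of system life; otherwise the most recent modification is undone and popped. `Modification`, `checkpoint`, and the dictionary-valued policy are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Modification:
    time: float                # time at which the policy modification occurred
    reward_at_time: float      # cumulative reinforcement R(time) at that moment
    undo: Dict[str, float] = field(default_factory=dict)  # old values of the entries it overwrote


def reward_per_time(t0: float, r0: float, now: float, reward_now: float) -> float:
    """Average reinforcement per time since t0 (assumes now > t0)."""
    return (reward_now - r0) / (now - t0)


def success_story_holds(stack: List[Modification], now: float, reward_now: float) -> bool:
    """Assumed criterion: reward per time since each valid modification strictly exceeds
    that since the modification below it, starting from time 0 (start of system life)."""
    previous = reward_per_time(0.0, 0.0, now, reward_now)
    for mod in stack:
        current = reward_per_time(mod.time, mod.reward_at_time, now, reward_now)
        if current <= previous:
            return False
        previous = current
    return True


def checkpoint(stack: List[Modification], policy: Dict[str, float],
               now: float, reward_now: float) -> int:
    """Undo and pop the most recent modification until the criterion holds.
    Returns the number of modifications invalidated."""
    invalidated = 0
    while stack and not success_story_holds(stack, now, reward_now):
        mod = stack.pop()
        policy.update(mod.undo)    # restore the policy entries this modification changed
        invalidated += 1
    return invalidated
```

For example, with one modification made at time 10 when cumulative reward was 10, a checkpoint at time 20 with cumulative reward 30 keeps it (2.0 per time since the modification versus 1.5 since the start), whereas cumulative reward 12 invalidates it (0.2 versus 0.6). Popping in reverse order also restores entries touched by several modifications correctly.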
Gödel Machines: Self-Referential Universal Problem Solvers Making Provably Optimal Self-Improvements
2003
Abstract

Cited by 16 (7 self)
An old dream of computer scientists is to build an optimally efficient universal problem solver. We show how to solve arbitrary computational problems in an optimal fashion inspired by Kurt Gödel's celebrated self-referential formulas (1931). Our Gödel machine's initial software includes an axiomatic description of: the Gödel machine's hardware, the problem-specific utility function (such as the expected future reward of a robot), known aspects of the environment, costs of actions and computations, and the initial software itself (this is possible without introducing circularity). It also includes a typically suboptimal initial problem-solving policy and an asymptotically optimal proof searcher searching the space of computable proof techniques, that is, programs whose outputs are proofs. Unlike previous approaches, the self-referential Gödel machine will rewrite any part of its software, including axioms and proof searcher, as soon as it has found a proof that this will improve its future performance, given its typically limited computational resources. We show that self-rewrites are globally optimal (no local minima!) since provably none of all the alternative rewrites and proofs (those that could be found by continuing the proof search) are worth waiting for.
Discovering Problem Solutions with Low Kolmogorov Complexity and High Generalization Capability
Machine Learning: Proceedings of the Twelfth International Conference, 1994
Abstract

Cited by 16 (8 self)
Many machine learning algorithms aim at finding "simple" rules to explain training data. The expectation is: the "simpler" the rules, the better the generalization on test data (→ Occam's razor). Most practical implementations, however, use measures for "simplicity" that lack the power, universality and elegance of those based on Kolmogorov complexity and Solomonoff's algorithmic probability. Likewise, most previous approaches (especially those of the "Bayesian" kind) suffer from the problem of choosing appropriate priors. This paper addresses both issues. It first reviews some basic concepts of algorithmic complexity theory relevant to machine learning, and how the Solomonoff-Levin distribution (or universal prior) deals with the prior problem. The universal prior leads to a probabilistic method for finding "algorithmically simple" problem solutions with high generalization capability. The method is based on Levin complexity (a time-bounded generalization of Kolmogorov complexity) and...