Results 1  10
of
41
Shifting Inductive Bias with SuccessStory Algorithm, Adaptive Levin Search, and Incremental SelfImprovement
 MACHINE LEARNING
, 1997
"... We study task sequences that allow for speeding up the learner's average reward intake through appropriate shifts of inductive bias (changes of the learner's policy). To evaluate longterm effects of bias shifts setting the stage for later bias shifts we use the "successstory algorithm" (SSA). SSA ..."
Abstract

Cited by 62 (30 self)
 Add to MetaCart
We study task sequences that allow for speeding up the learner's average reward intake through appropriate shifts of inductive bias (changes of the learner's policy). To evaluate longterm effects of bias shifts setting the stage for later bias shifts we use the "successstory algorithm" (SSA). SSA is occasionally called at times that may depend on the policy itself. It uses backtracking to undo those bias shifts that have not been empirically observed to trigger longterm reward accelerations (measured up until the current SSA call). Bias shifts that survive SSA represent a lifelong success history. Until the next SSA call, they are considered useful and build the basis for additional bias shifts. SSA allows for plugging in a wide variety of learning algorithms. We plug in (1) a novel, adaptive extension of Levin search and (2) a method for embedding the learner's policy modification strategy within the policy itself (incremental selfimprovement). Our inductive transfer case studies...
Optimal Ordered Problem Solver
, 2002
"... We present a novel, general, optimally fast, incremental way of searching for a universal algorithm that solves each task in a sequence of tasks. The Optimal Ordered Problem Solver (OOPS) continually organizes and exploits previously found solutions to earlier tasks, eciently searching not only the ..."
Abstract

Cited by 62 (20 self)
 Add to MetaCart
We present a novel, general, optimally fast, incremental way of searching for a universal algorithm that solves each task in a sequence of tasks. The Optimal Ordered Problem Solver (OOPS) continually organizes and exploits previously found solutions to earlier tasks, eciently searching not only the space of domainspecific algorithms, but also the space of search algorithms. Essentially we extend the principles of optimal nonincremental universal search to build an incremental universal learner that is able to improve itself through experience.
The Speed Prior: A New Simplicity Measure Yielding NearOptimal Computable Predictions
 Proceedings of the 15th Annual Conference on Computational Learning Theory (COLT 2002), Lecture Notes in Artificial Intelligence
, 2002
"... Solomonoff's optimal but noncomputable method for inductive inference assumes that observation sequences x are drawn from an recursive prior distribution p(x). Instead of using the unknown p() he predicts using the celebrated universal enumerable prior M() which for all exceeds any recursive p() ..."
Abstract

Cited by 51 (20 self)
 Add to MetaCart
Solomonoff's optimal but noncomputable method for inductive inference assumes that observation sequences x are drawn from an recursive prior distribution p(x). Instead of using the unknown p() he predicts using the celebrated universal enumerable prior M() which for all exceeds any recursive p(), save for a constant factor independent of x. The simplicity measure M() naturally implements "Occam's razor " and is closely related to the Kolmogorov complexity of . However, M assigns high probability to certain data that are extremely hard to compute. This does not match our intuitive notion of simplicity. Here we suggest a more plausible measure derived from the fastest way of computing data. In absence of contrarian evidence, we assume that the physical world is generated by a computational process, and that any possibly infinite sequence of observations is therefore computable in the limit (this assumption is more radical and stronger than Solomonoff's).
A computer scientist’s view of life, the universe, and everything
 Foundations of Computer Science: Potential  Theory  Cognition
, 1997
"... Is the universe computable? If so, it may be much cheaper in terms of information requirements to compute all computable universes instead of just ours. I apply basic concepts of Kolmogorov complexity theory to the set of possible universes, and chat about perceived and true randomness, life, genera ..."
Abstract

Cited by 38 (15 self)
 Add to MetaCart
Is the universe computable? If so, it may be much cheaper in terms of information requirements to compute all computable universes instead of just ours. I apply basic concepts of Kolmogorov complexity theory to the set of possible universes, and chat about perceived and true randomness, life, generalization, and learning in a given universe. Preliminaries Assumptions. A long time ago, the Great Programmer wrote a program that runs all possible universes on His Big Computer. “Possible ” means “computable”: (1) Each universe evolves on a discrete time scale. (2) Any universe’s state at a given time is describable by a finite number of bits. One of the many universes is ours, despite some who evolved in it and claim it is incomputable. Computable universes. Let TM denote an arbitrary universal Turing machine with unidirectional output tape. TM’s input and output symbols are “0”, “1”, and “, ” (comma). TM’s possible input programs can be ordered
The Fastest And Shortest Algorithm For All WellDefined Problems
, 2002
"... An algorithm M is described that solves any welldefined problem p as quickly as the fastest algorithm computing a solution to p, save for a factor of 5 and loworder additive terms. M optimally distributes resources between the execution of provably correct psolving programs and an enumeration of ..."
Abstract

Cited by 35 (7 self)
 Add to MetaCart
An algorithm M is described that solves any welldefined problem p as quickly as the fastest algorithm computing a solution to p, save for a factor of 5 and loworder additive terms. M optimally distributes resources between the execution of provably correct psolving programs and an enumeration of all proofs, including relevant proofs of program correctness and of time bounds on program runtimes. M avoids Blum's speedup theorem by ignoring programs without correctness proof. M has broader applicability and can be faster than Levin's universal search, the fastest method for inverting functions save for a large multiplicative constant. An extension of Kolmogorov complexity and two novel natural measures of function complexity are used to show that the most efficient program computing some function f is also among the shortest programs provably computing f.
Reinforcement Learning With SelfModifying Policies
 IN S. THRUN , L. PRATT (EDS.), LEARNING TO LEARN
, 1997
"... A learner's modifiable components are called its policy. An algorithm that modifies the policy is a learning algorithm. If the learning algorithm has modifiable components represented as part of the policy, then we speak of a selfmodifying policy (SMP). SMPs can modify the way they modify themselve ..."
Abstract

Cited by 33 (22 self)
 Add to MetaCart
A learner's modifiable components are called its policy. An algorithm that modifies the policy is a learning algorithm. If the learning algorithm has modifiable components represented as part of the policy, then we speak of a selfmodifying policy (SMP). SMPs can modify the way they modify themselves etc. They are of interest in situations where the initial learning algorithm itself can be improved by experience  this is what we call "learning to learn". How can we force some (stochastic) SMP to trigger better and better selfmodifications? The successstory algorithm (SSA) addresses this question in a lifelong reinforcement learning context. During the learner's lifetime, SSA is occasionally called at times computed according to SMP itself. SSA uses backtracking to undo those SMPgenerated SMPmodifications that have not been empirically observed to trigger lifelong reward accelerations (measured up until the current SSA call  this evaluates the longterm effects of SMPmodifi...
Algorithmic Theories Of Everything
, 2000
"... The probability distribution P from which the history of our universe is sampled represents a theory of everything or TOE. We assume P is formally describable. Since most (uncountably many) distributions are not, this imposes a strong inductive bias. We show that P(x) is small for any universe x lac ..."
Abstract

Cited by 31 (15 self)
 Add to MetaCart
The probability distribution P from which the history of our universe is sampled represents a theory of everything or TOE. We assume P is formally describable. Since most (uncountably many) distributions are not, this imposes a strong inductive bias. We show that P(x) is small for any universe x lacking a short description, and study the spectrum of TOEs spanned by two Ps, one reflecting the most compact constructive descriptions, the other the fastest way of computing everything. The former derives from generalizations of traditional computability, Solomonoff’s algorithmic probability, Kolmogorov complexity, and objects more random than Chaitin’s Omega, the latter from Levin’s universal search and a natural resourceoriented postulate: the cumulative prior probability of all x incomputable within time t by this optimal algorithm should be 1/t. Between both Ps we find a universal cumulatively enumerable measure that dominates traditional enumerable measures; any such CEM must assign low probability to any universe lacking a short enumerating program. We derive Pspecific consequences for evolving observers, inductive reasoning, quantum physics, philosophy, and the expected duration of our universe.
Towards a universal theory of artificial intelligence based on algorithmic probability and sequential decisions
 Proceedings of the 12 th Eurpean Conference on Machine Learning (ECML2001
, 2001
"... Abstract. Decision theory formally solves the problem of rational agents in uncertain worlds if the true environmental probability distribution is known. Solomonoff’s theory of universal induction formally solves the problem of sequence prediction for unknown distributions. We unify both theories an ..."
Abstract

Cited by 26 (10 self)
 Add to MetaCart
Abstract. Decision theory formally solves the problem of rational agents in uncertain worlds if the true environmental probability distribution is known. Solomonoff’s theory of universal induction formally solves the problem of sequence prediction for unknown distributions. We unify both theories and give strong arguments that the resulting universal AIξ model behaves optimally in any computable environment. The major drawback of the AIξ model is that it is uncomputable. To overcome this problem, we construct a modified algorithm AIξ tl, which is still superior to any other time t and length l bounded agent. The computation time of AIξ tl is of the order t·2 l. 1
Gödel machines: Fully selfreferential optimal universal selfimprovers
 Goertzel and C. Pennachin, Artificial General Intelligence
, 2006
"... Summary. We present the first class of mathematically rigorous, general, fully selfreferential, selfimproving, optimally efficient problem solvers. Inspired by Kurt Gödel’s celebrated selfreferential formulas (1931), such a problem solver rewrites any part of its own code as soon as it has found ..."
Abstract

Cited by 25 (12 self)
 Add to MetaCart
Summary. We present the first class of mathematically rigorous, general, fully selfreferential, selfimproving, optimally efficient problem solvers. Inspired by Kurt Gödel’s celebrated selfreferential formulas (1931), such a problem solver rewrites any part of its own code as soon as it has found a proof that the rewrite is useful, where the problemdependent utility function and the hardware and the entire initial code are described by axioms encoded in an initial proof searcher which is also part of the initial code. The searcher systematically and efficiently tests computable proof techniques (programs whose outputs are proofs) until it finds a provably useful, computable selfrewrite. We show that such a selfrewrite is globally optimal—no local maxima!—since the code first had to prove that it is not useful to continue the proof search for alternative selfrewrites. Unlike previous nonselfreferential methods based on hardwired proof searchers, ours not only boasts an optimal order of complexity but can optimally reduce any slowdowns hidden by the O()notation, provided the utility of such speedups is provable at all. 1
Exploring the Predictable
, 2002
"... Details of complex event sequences are often not predictable, but their reduced abstract representations are. I study an embedded active learner that can limit its predictions to almost arbitrary computable aspects of spatiotemporal events. It constructs probabilistic algorithms that (1) control in ..."
Abstract

Cited by 23 (10 self)
 Add to MetaCart
Details of complex event sequences are often not predictable, but their reduced abstract representations are. I study an embedded active learner that can limit its predictions to almost arbitrary computable aspects of spatiotemporal events. It constructs probabilistic algorithms that (1) control interaction with the world, (2) map event sequences to abstract internal representations (IRs), (3) predict IRs from IRs computed earlier. Its goal is to create novel algorithms generating IRs useful for correct IR predictions, without wasting time on those learned before. This requires an adaptive novelty measure which is implemented by a coevolutionary scheme involving two competing modules collectively designing (initially random) algorithms representing experiments. Using special instructions, the modules can bet on the outcome of IR predictions computed by algorithms they have agreed upon. If their opinions dier then the system checks who's right, punishes the loser (the surprised one), and rewards the winner. An evolutionary or reinforcement learning algorithm forces each module to maximize reward. This motivates both modules to lure each other into agreeing upon experiments involving predictions that surprise it. Since each module essentially can veto experiments it does not consider profitable, the system is motivated to focus on those computable aspects of the environment where both modules still have confident but different opinions. Once both share the same opinion on a particular issue (via the loser's learning process, e.g., the winner is simply copied onto the loser), the winner loses a source of reward  an incentive to shift the focus of interest onto novel experiments. My simulations include an example where surprisegeneration of this kind helps to speed up ...