## Optimal Ordered Problem Solver (2002)

### BibTeX

@MISC{Schmidhuber02optimalordered,

author = {Jürgen Schmidhuber},

title = {Optimal Ordered Problem Solver},

year = {2002}

}

### Abstract

We present a novel, general, optimally fast, incremental way of searching for a universal algorithm that solves each task in a sequence of tasks. The Optimal Ordered Problem Solver (OOPS) continually organizes and exploits previously found solutions to earlier tasks, efficiently searching not only the space of domain-specific algorithms, but also the space of search algorithms. Essentially, we extend the principles of optimal nonincremental universal search to build an incremental universal learner that is able to improve itself through experience.

### Citations

5304 |
Neural Networks for Pattern Recognition
- Bishop
- 1995
Citation Context ...primitive instructions for massively parallel cellular automata [54, 56, 60], or on a few nonlinear operations on matrix-like data structures such as those used in recurrent neural network research [5]. For example, we could use the principles of OOPS to create a non-gradient-based, near-bias-optimal variant of Hochreiter’s successful recurrent network metalearner [13]. It should also be of interes...

4147 |
Artificial Intelligence: A Modern Approach
- Russell, Norvig
- 1995
Citation Context ...urrent invocation of Try. This will also restore instruction pointer ip(r0) and original search distribution p(r0). Return the value of Done. (Footnote 10:) In planning terminology [36], Try conducts a depth-first search in program space, where the branches of the search tree are program prefixes (each modifying a bunch of task-specific states), and backtracking is triggered once th...

3005 |
Adaptation in Natural and Artificial Systems
- Holland
- 1975
Citation Context ...uch as Genetic Programming (GP) [8, 2]. Unlike logic-based program synthesizers [12, 57, 9], program evolvers use biology-inspired concepts of Evolutionary Computation [34, 48] and Genetic Algorithms [14] to evolve better and better computer programs. Most existing GP implementations, however, do not even allow for programs with loops and recursion, thus ignoring a main motivation for search in progra...

1779 | An Introduction to Kolmogorov Complexity and its Applications
- Li, Vitányi
- 1993
Citation Context ...profit from earlier solutions? At first naive glance this seems unlikely, since most possible pairs of symbol strings (such as problem-solving programs) do not share any algorithmic information (e.g., [28]). Why not? Most possible combinations of strings x, y are algorithmically incompressible, that is, the shortest algorithm computing y, given x, has the size of the shortest algorithm computing y, giv...
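The incompressibility statement in the context above can be written compactly. The notation is an assumption (the excerpt itself uses only words): K(y) stands for the length of the shortest program computing y, and K(y | x) for the length of the shortest program computing y when x is given as input.

```latex
K(y \mid x) \approx K(y) \qquad \text{for most pairs of strings } (x, y)
```

Read this way, knowing x saves essentially nothing when describing y, which is why reuse of earlier solutions cannot help on arbitrary pairs of problems.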

1389 | Reinforcement learning: A survey
- Kaelbling, Littman, et al.
- 1996
Citation Context ...rogram space. They either have very limited search spaces (where solution candidate runtime is not even an issue), or are far from bias-optimal, or both. Similarly, traditional reinforcement learners [20] are neither general nor close to being bias-optimal. Hsearch and Lsearch / Osearch (Sections 2.1, 2.2, 2.3) are nonincremental in the sense that they do not attempt to minimize their constant slowdow...

1283 |
On Computable Numbers, with an Application to the Entscheidungsproblem
- Turing
- 1937
Citation Context ...earch algorithms do not even mention a very simple asymptotically optimal algorithm for problems with quickly verifiable solutions: Method 2.1 (Lsearch) Given a problem and a universal Turing machine [53], every 2^n steps on average execute one instruction of the n-th binary string (interpreted as a program) in an alphabetical list of all strings, until one of them finds a solution. Given some problem...
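The time-sharing idea behind Method 2.1 can be illustrated with a minimal sketch. It is not the paper's implementation: it uses the common phase-doubling variant of Levin search, in which phase i gives every program of length l at most 2^(i - l) steps, so shorter programs receive exponentially more time. The toy interpreter `run` (bit `0` adds one, bit `1` doubles), the function names, and the `max_phase` cutoff are all hypothetical choices made for illustration.

```python
from itertools import product


def run(program, max_steps):
    """Toy interpreter (an assumption, not a universal Turing machine).

    Each bit is one instruction acting on an accumulator that starts at 0:
    '0' adds 1, '1' doubles. Returns the final accumulator, or None if the
    program does not finish within max_steps instructions.
    """
    acc = 0
    steps = 0
    for bit in program:
        if steps >= max_steps:
            return None  # time budget exhausted before the program finished
        acc = acc + 1 if bit == "0" else acc * 2
        steps += 1
    return acc


def lsearch(is_solution, max_phase=20):
    """Phase-doubling Levin-style search over all binary programs.

    In phase i, every program of length l <= i may run for at most
    2 ** (i - l) steps. Returns (program, phase) for the first program
    whose output passes the quick check is_solution, or None if nothing
    is found within max_phase phases.
    """
    for phase in range(1, max_phase + 1):
        for length in range(1, phase + 1):
            budget = 2 ** (phase - length)
            # Enumerate programs of this length in alphabetical order.
            for bits in product("01", repeat=length):
                program = "".join(bits)
                result = run(program, budget)
                if result is not None and is_solution(result):
                    return program, phase
    return None


if __name__ == "__main__":
    # Search for a program whose output equals 6; verification is one comparison.
    print(lsearch(lambda value: value == 6))
```

The quick verifiability mentioned in the excerpt corresponds to the `is_solution` callback: candidates are cheap to check even though they are expensive to find. Because budgets double each phase, the total work is dominated by the final phase.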

706 |
Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. Frommann-Holzboog
- Rechenberg
- 1973
Citation Context ...te [33] and simpler heuristics such as Genetic Programming (GP) [8, 2]. Unlike logic-based program synthesizers [12, 57, 9], program evolvers use biology-inspired concepts of Evolutionary Computation [34, 48] and Genetic Algorithms [14] to evolve better and better computer programs. Most existing GP implementations, however, do not even allow for programs with loops and recursion, thus ignoring a main mot...

589 |
Theory of Self-Reproducing Automata
- von Neumann
- 1966
Citation Context ...ial languages are not traditional programming languages similar to the Forth-like one from Section 5, but instead based on a handful of primitive instructions for massively parallel cellular automata [54, 56, 60], or on a few nonlinear operations on matrix-like data structures such as those used in recurrent neural network research [5]. For example, we could use the principles of OOPS to create a non-gradie...

558 |
Three Approaches to the Quantitative Definition of Information. Problems of Information Transmission
- Kolmogorov
- 1965
Citation Context ...lity properties of Lsearch and Hsearch. Binary self-delimiting programs were studied [27, 7] in the context of Turing machines [53] and the theory of Kolmogorov complexity and algorithmic probability [49, 22]. Here we will use a more practical, not necessarily binary framework that does not exclude long programs with high probability. Subsection 3.1 will introduce notation. Subsection 3.2 will introduce a...

424 |
A formal theory of inductive inference
- Solomonoff
- 1964
Citation Context ...lity properties of Lsearch and Hsearch. Binary self-delimiting programs were studied [27, 7] in the context of Turing machines [53] and the theory of Kolmogorov complexity and algorithmic probability [49, 22]. Here we will use a more practical, not necessarily binary framework that does not exclude long programs with high probability. Subsection 3.1 will introduce notation. Subsection 3.2 will introduce a...

419 |
Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme, I. Monatshefte für Mathematik und Physik 38
- Gödel
- 1931
Citation Context ...g states. We focus on S being the set of integers and Q := {1, 2, ..., n_Q} representing a set of n_Q instructions of some programming language. (The first universal programming language due to Gödel [11] was based on integers as well, but ours will be more practical.) Q and n_Q may be variable: new tokens may be defined by combining previous tokens, just as traditional programming languages allow for ...

367 | Ant algorithms for discrete optimization - Dorigo, Caro, et al. - 1999 |

342 | A theory of program size formally identical to information theory
- Chaitin
- 1975
Citation Context ...given task, programs that halt because they have found a solution or encountered some error cannot request any more tokens. Given the current task, no halting program can be the prefix of another one [27, 7]. On a different task, however, the same program may continue to request additional tokens; this is important for our novel approach. Access to previous solutions. Let p^n denote a found prefix solvi...

287 |
Genetic programming: An introduction
- Banzhaf, Nordin, et al.
- 1998
Citation Context ... by analogy, etc. can be found in Mitchell’s book [30]. Relatively recent general attempts include program evolvers such as Olsson’s Adate [33] and simpler heuristics such as Genetic Programming (GP) [8, 2]. Unlike logic-based program synthesizers [12, 57, 9], program evolvers use biology-inspired concepts of Evolutionary Computation [34, 48] and Genetic Algorithms [14] to evolve better and better compu...

277 | GPS: A program that simulates human thought - Newell, Simon - 1961 |

256 | No free lunch theorems for search
- Wolpert, Macready
- 1995
Citation Context ...thm computing y, given nothing (typically a bit more than l(y) symbols), which means that x does not tell us anything about y. (Papers in evolutionary computation often mention no free lunch theorems [59], which are variations of this ancient insight of theoretical computer science.) Typically, however, successive real-world problems are not sampled from a uniform i.i.d. distribution on a large set of ...

236 | Application of theorem proving to problem solving
- Green
- 1969
Citation Context ...k [30]. Relatively recent general attempts include program evolvers such as Olsson’s Adate [33] and simpler heuristics such as Genetic Programming (GP) [8, 2]. Unlike logic-based program synthesizers [12, 57, 9], program evolvers use biology-inspired concepts of Evolutionary Computation [34, 48] and Genetic Algorithms [14] to evolve better and better computer programs. Most existing GP implementations, howev...

232 |
A representation for the adaptive generation of simple sequential programs
- Cramer
- 1985
Citation Context ... by analogy, etc. can be found in Mitchell’s book [30]. Relatively recent general attempts include program evolvers such as Olsson’s Adate [33] and simpler heuristics such as Genetic Programming (GP) [8, 2]. Unlike logic-based program synthesizers [12, 57, 9], program evolvers use biology-inspired concepts of Evolutionary Computation [34, 48] and Genetic Algorithms [14] to evolve better and better compu...

229 |
Conservative logic
- Fredkin, Toffoli
- 1982
Citation Context ...rsible computation will encounter fundamental heating problems associated with high density computing [4]. Remarkably, however, OOPS can be naturally implemented using reversible computing strategies [10], since it completely resets all state modifications due to the programs it tests. But even when we naively extrapolate Moore’s law, within the next century OOPS will hit the Bremermann limit [6]: app...

198 |
Numerische Optimierung von Computer-modellen mittels der Evolutionsstrategie
- Schwefel
- 1977
Citation Context ...te [33] and simpler heuristics such as Genetic Programming (GP) [8, 2]. Unlike logic-based program synthesizers [12, 57, 9], program evolvers use biology-inspired concepts of Evolutionary Computation [34, 48] and Genetic Algorithms [14] to evolve better and better computer programs. Most existing GP implementations, however, do not even allow for programs with loops and recursion, thus ignoring a main mot...

175 | Extending planning graphs to an ADL subset
- Koehler, Nebel, et al.
- 1997
Citation Context ...mber of moves grows only linearly with the number of disks, not exponentially; we were able to replicate their results for n up to 5 [23].) Traditional AI planning procedures (e.g., chapter V of [36], [21]) do not learn but systematically explore all possible move combinations, using only absolutely necessary task-specific primitives (while OOPS will later use more than 70 general instructions, most of...

125 |
Shift of Bias For Inductive Concept Learning
- Utgoff
- 1985
Citation Context ..., 35], much work has been done to develop mostly heuristic machine learning algorithms that solve new problems based on experience with previous problems, by incrementally shifting the inductive bias [55]. Many pointers to learning by chunking, learning by macros, hierarchical learning, learning by analogy, etc. can be found in Mitchell’s book [30]. Relatively recent general attempts include program e...

123 |
Universal sequential search problems
- Levin
- 1973
Citation Context ...tested prefix may completely reshape the most likely paths through the search space of its own continuations, based on experience ignored by Levin’s and Hutter’s nonincremental optimal search methods [26, 17]. This may introduce significant problem class-specific knowledge derived from solutions to earlier tasks. Two searches. Novel OOPS provides equal resources for two near-bias-optimal searches (see b...

108 | Adaptation in Natural and Artificial Systems. The - Holland - 1975 |

77 |
The Thermodynamics of Computation - A Review
- Bennett
- 1982
Citation Context ...r by cost, reflecting Moore’s empirical law first formulated in 1965. Within a few decades nonreversible computation will encounter fundamental heating problems associated with high density computing [4]. Remarkably, however, OOPS can be naturally implemented using reversible computing strategies [10], since it completely resets all state modifications due to the programs it tests. But even when we n...

67 | Inductive functional programming using incremental program transformation
- Olsson
- 1995
Citation Context ...chunking, learning by macros, hierarchical learning, learning by analogy, etc. can be found in Mitchell’s book [30]. Relatively recent general attempts include program evolvers such as Olsson’s Adate [33] and simpler heuristics such as Genetic Programming (GP) [8, 2]. Unlike logic-based program synthesizers [12, 57, 9], program evolvers use biology-inspired concepts of Evolutionary Computation [34, 48...

63 | Shifting inductive bias with success-story algorithm, adaptive Levin search, and incremental self-improvement - Schmidhuber, Zhao, et al. - 1997 |

55 | Learning and Problem solving with multilayer connectionist systems
- Anderson
- 1986
Citation Context ...er, it is essential to efficiently allocate time to algorithm tests. This is what OOPS does, in near-bias-optimal incremental fashion. Untrained humans find it hard to solve instances n > 6. Anderson [1] applied traditional reinforcement learning methods and was able to solve instances up to n = 3, solvable within at most 7 moves. Langley [24] used learning production systems and was able to solve in...

52 | The Speed Prior: a new simplicity measure yielding near-optimal computable predictions
- Schmidhuber
Citation Context ...were used to solve machine learning toy problems unsolvable by traditional methods [58, 47]. Probabilistic alternatives based on probabilistically chosen maximal program runtimes in Speed-Prior style [41, 45] also outperformed traditional methods on certain toy problems [39, 40]. 2.4 Incremental Search? Since Newell & Simon’s early attempts at building a “General Problem Solver” [32, 35], much work has ...

49 | Discovering neural nets with low kolmogorov complexity and high generalization capability
- Schmidhuber
- 1997
Citation Context ...nal methods [58, 47]. Probabilistic alternatives based on probabilistically chosen maximal program runtimes in Speed-Prior style [41, 45] also outperformed traditional methods on certain toy problems [39, 40]. 2.4 Incremental Search? Since Newell & Simon’s early attempts at building a “General Problem Solver” [32, 35], much work has been done to develop mostly heuristic machine learning algorithms that ...

48 | Universal sequential search problems. Problems of Information Transmission - Levin - 1973 |

48 | An ant colony system hybridized with a new local search for the sequential ordering problem - Gambardella, Dorigo |

46 | Random Processes and Transformations
- Ulam
Citation Context ...ial languages are not traditional programming languages similar to the Forth-like one from Section 5, but instead based on a handful of primitive instructions for massively parallel cellular automata [54, 56, 60], or on a few nonlinear operations on matrix-like data structures such as those used in recurrent neural network research [5]. For example, we could use the principles of OOPS to create a non-gradie...

40 | Learning to Search: From Weak Methods to Domain-Specific Heuristics
- Langley
- 1985
Citation Context ...ned humans find it hard to solve instances n > 6. Anderson [1] applied traditional reinforcement learning methods and was able to solve instances up to n = 3, solvable within at most 7 moves. Langley [24] used learning production systems and was able to solve instances up to n = 5, solvable within at most 31 moves. (Side note: Baum and Durdanovic also applied an alternative reinforcement learner based...

40 | Hierarchies of generalized Kolmogorov complexities and nonenumerable universal measures computable in the limit - Schmidhuber |

39 | Logic program synthesis
- Deville, Lau
- 1994
Citation Context ...k [30]. Relatively recent general attempts include program evolvers such as Olsson’s Adate [33] and simpler heuristics such as Genetic Programming (GP) [8, 2]. Unlike logic-based program synthesizers [12, 57, 9], program evolvers use biology-inspired concepts of Evolutionary Computation [34, 48] and Genetic Algorithms [14] to evolve better and better computer programs. Most existing GP implementations, howev...

39 |
Properties of the Bucket Brigade
- Holland
- 1985
Citation Context ...er’s cumulative reward per time interval. Our earlier meta-GP algorithm [37] was designed to learn better GP-like strategies. We also combined Holland’s principles of reinforcement learning economies [15] with a “self-referential” metalearning approach [37]. Our gradient-based metalearning technique [38] for continuous program spaces of differentiable recurrent neural networks (RNNs) was also design...

38 | Reinforcement Learning with Self-Modifying Policies
- Schmidhuber, Zhao, et al.
- 1997
Citation Context ...of incremental search for improved, probabilistically generated code that modifies the probability distribution on the possible code continuations has been used before: our incremental self-improvers [46] use the success-story algorithm SSA [46] to undo those self-generated probability modifications that in the long run do not contribute to increasing the learner’s cumulative reward per time interval....

37 | The fastest and shortest algorithm for all well-defined problems
- Hutter
Citation Context ...tested prefix may completely reshape the most likely paths through the search space of its own continuations, based on experience ignored by Levin’s and Hutter’s nonincremental optimal search methods [26, 17]. This may introduce significant problem class-specific knowledge derived from solutions to earlier tasks. Two searches. Novel OOPS provides equal resources for two near-bias-optimal searches (see b...

37 | Discovering solutions with low Kolmogorov complexity and high generalization capability
- Schmidhuber
- 1995
Citation Context ...nal methods [58, 47]. Probabilistic alternatives based on probabilistically chosen maximal program runtimes in Speed-Prior style [41, 45] also outperformed traditional methods on certain toy problems [39, 40]. 2.4 Incremental Search? Since Newell & Simon’s early attempts at building a “General Problem Solver” [32, 35], much work has been done to develop mostly heuristic machine learning algorithms that ...

34 | A formal theory of inductive inference - Solomonoff |

34 | Ultimate physical limits to computation
- Lloyd
Citation Context ...ts. But even when we naively extrapolate Moore’s law, within the next century OOPS will hit the Bremermann limit [6]: approximately 10^51 operations per second on 10^32 bits for the “ultimate laptop” [29] with 1 kg of mass and 1 liter of volume. Clearly, the Bremermann limit constrains the maximal “conceptual jump size” [50, 51] from one problem to the next. For example, given some prior code bias der...

33 |
Theory formation by heuristic search
- Lenat
- 1983
Citation Context ...tions on learning to learn or metalearning [37], where the goal is to learn better learning algorithms through self-improvement without human intervention (compare Lenat’s human-assisted self-improver [25]). In particular, the concept of incremental search for improved, probabilistically generated code that modifies the probability distribution on the possible code continuations has been used before: o...

32 |
Evolutionary principles in self-referential learning. On learning how to learn: The meta-meta-meta...-hook. Diploma thesis, Technische Universität München
- Schmidhuber
- 1987
Citation Context ...ent tasks. One contribution of this paper is to overcome this drawback in a principled way. Our method draws inspiration from several of our previous publications on learning to learn or metalearning [37], where the goal is to learn better learning algorithms through self-improvement without human intervention (compare Lenat’s human-assisted self-improver [25]). In particular, the concept of incrementa...

32 | Algorithmic Theories of Everything
- Schmidhuber
- 2000
Citation Context ...were used to solve machine learning toy problems unsolvable by traditional methods [58, 47]. Probabilistic alternatives based on probabilistically chosen maximal program runtimes in Speed-Prior style [41, 45] also outperformed traditional methods on certain toy problems [39, 40]. 2.4 Incremental Search? Since Newell & Simon’s early attempts at building a “General Problem Solver” [32, 35], much work has ...

32 |
PROW: A Step Towards Automatic Program Writing
- Lee
- 1977
Citation Context ...k [30]. Relatively recent general attempts include program evolvers such as Olsson’s Adate [33] and simpler heuristics such as Genetic Programming (GP) [8, 2]. Unlike logic-based program synthesizers [12, 57, 9], program evolvers use biology-inspired concepts of Evolutionary Computation [34, 48] and Genetic Algorithms [14] to evolve better and better computer programs. Most existing GP implementations, howev...

31 | Self-optimizing and Pareto-optimal policies in general environments based on Bayes-mixtures
- Hutter
Citation Context ...t future tasks from previous ones, and currently does not spend a fraction of its time on solving predicted tasks. This is what an optimal universal reinforcement learner based on Hutter’s AIXI model [16, 18] would do. Future research should lead to a marriage of the asymptotically optimal AIXI and the near-bias-optimal OOPS. 3.4 Example Initial Programming Language The efficient search and backtracking m...

31 |
The SOAR Papers
- Rosenbloom, Laird, et al.
- 1993
Citation Context ...eed-Prior style [41, 45] also outperformed traditional methods on certain toy problems [39, 40]. 2.4 Incremental Search? Since Newell & Simon’s early attempts at building a “General Problem Solver” [32, 35], much work has been done to develop mostly heuristic machine learning algorithms that solve new problems based on experience with previous problems, by incrementally shifting the inductive bias [55]....

30 | An application of algorithmic probability to problems in artificial intelligence
- Solomonoff
- 1986
Citation Context ...antially faster than OOPS. A byproduct of this optimality property is that it gives us a natural and precise measure of bias and bias shifts, conceptually related to Solomonoff’s conceptual jump size [50, 51]. An example initial language. For an illustrative application, we wrote an interpreter for a stack-based universal programming language inspired by Forth [31], with initial primitives for defining an...

24 | Solving POMDPs with Levin search and EIRA
- Wiering, Schmidhuber
- 1996
Citation Context ...on and nonessential speed-ups due to halting programs if there are any). Nonbinary, nonuniversal variants of Osearch were used to solve machine learning toy problems unsolvable by traditional methods [58, 47]. Probabilistic alternatives based on probabilistically chosen maximal program runtimes in Speed-Prior style [41, 45] also outperformed traditional methods on certain toy problems [39, 40]. 2.4 Incr...