## Gödel Machines: Self-Referential Universal Problem Solvers Making Provably Optimal Self-Improvements (2003)

Citations: 16 (7 self)

### BibTeX

```bibtex
@MISC{Schmidhuber03goedelmachines,
  author = {Jürgen Schmidhuber},
  title = {Gödel Machines: Self-Referential Universal Problem Solvers Making Provably Optimal Self-Improvements},
  year = {2003}
}
```

### Abstract

An old dream of computer scientists is to build an optimally efficient universal problem solver. We show how to solve arbitrary computational problems in an optimal fashion inspired by Kurt Gödel's celebrated self-referential formulas (1931). Our Gödel machine's initial software includes an axiomatic description of: the Gödel machine's hardware, the problem-specific utility function (such as the expected future reward of a robot), known aspects of the environment, costs of actions and computations, and the initial software itself (this is possible without introducing circularity). It also includes a typically sub-optimal initial problem-solving policy and an asymptotically optimal proof searcher that searches the space of computable proof techniques, that is, programs whose outputs are proofs. Unlike previous approaches, the self-referential Gödel machine will rewrite any part of its software, including axioms and proof searcher, as soon as it has found a proof that this will improve its future performance, given its typically limited computational resources. We show that self-rewrites are globally optimal (no local minima!), since provably none of the alternative rewrites and proofs (those that could be found by continuing the proof search) are worth waiting for.
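
The self-rewrite principle in the abstract can be illustrated in miniature. The sketch below is invented scaffolding, not the paper's formal machinery: it replaces proof search over an axiomatized self-description with exhaustive evaluation in a tiny, fully known environment (where exhaustive evaluation does constitute a proof of improvement), and the policy is just a state-to-action table.

```python
# Toy sketch of the Gödel machine control flow described in the abstract.
# CAUTION: everything here is illustrative. A real Gödel machine searches a
# space of formal proofs about its own axiomatized hardware and software;
# this toy replaces "proof" with exhaustive evaluation in a fully known,
# deterministic micro-environment, where evaluation IS a valid proof.

REWARD = {("s0", "a"): 0.0, ("s0", "b"): 1.0,
          ("s1", "a"): 2.0, ("s1", "b"): 0.5}   # known environment axioms

def utility(policy):
    """Expected future reward under the (deterministic) environment axioms."""
    return sum(REWARD[(s, policy[s])] for s in ("s0", "s1"))

def prove_improvement(policy):
    """Stand-in proof searcher: return a rewrite that provably raises utility,
    or None. Here the 'proof' is an exhaustive check of all candidates."""
    for state in policy:
        for action in ("a", "b"):
            candidate = dict(policy, **{state: action})
            if utility(candidate) > utility(policy):
                return candidate
    return None

def godel_machine_loop(policy):
    """Rewrite any part of the policy as soon as improvement is proven."""
    while (better := prove_improvement(policy)) is not None:
        policy = better          # the self-rewrite ('switchprog')
    return policy

print(godel_machine_loop({"s0": "a", "s1": "b"}))  # {'s0': 'b', 's1': 'a'}
```

The loop halts exactly when no provable improvement remains, mirroring the "no local minima" claim: a rewrite is executed only once it is proven that no alternative worth waiting for exists in this (here, finite) search space.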

### Citations

3773 | Reinforcement Learning: An Introduction
- Sutton, Barto
- 1998
Citation Context ...te by any method. Or the interface to the environment is Markovian [30], that is, the current input always uniquely identifies the environmental state—a lot of work has been done on this special case [28, 2, 51]. Even more restrictively, the environment may evolve in completely predictable fashion known in advance. All such prior assumptions are perfectly formalizable. (d) Uncertainty axioms: Standard axioms...

1682 | An Introduction to Kolmogorov Complexity and its Applications
- Li, Vitányi
- 1997
Citation Context ...enerating all proofs in order of their sizes. To produce a certain proof, this approach takes time exponential in proof size. Instead our p(1) will produce many proofs with low algorithmic complexity [48, 20, 25] much more quickly. It runs and evaluates proof techniques composed from instructions of the p(1)-encoded language L. For example, L may be a variant of PROLOG [7] or the universal Forth [26]-inspired ...

1303 | Reinforcement Learning: A Survey
- Kaelbling, Littman, et al.
- 1996
Citation Context ...lly unknown environment that produces a continual stream of inputs and feedback signals, such as in autonomous robot control tasks, where the goal may be to maximize expected cumulative future reward [18]. This may require the solution of essentially arbitrary problems (examples in Section 3.2 formulate traditional problems as special cases). 1.1 Previous Work: Best General Methods Need Proof Searcher...

1157 | On computable numbers, with an application to the Entscheidungsproblem
- Turing
- 1936
Citation Context ...3, . . . of Aixi(t,l)'s lifetime, action y(k) results in perception x(k) and reward r(k), where all quantities may depend on the complete history. Using a universal computer such as a Turing machine [52], Aixi(t,l) needs an initial offline setup phase (prior to interaction with the environment) to examine all proofs of length at most lP, filtering out those that identify programs (of maximal size l a...

613 | Some studies in machine learning using the game of checkers
- Samuel
- 1959
Citation Context ...te by any method. Or the interface to the environment is Markovian [30], that is, the current input always uniquely identifies the environmental state—a lot of work has been done on this special case [28, 2, 51]. Even more restrictively, the environment may evolve in completely predictable fashion known in advance. All such prior assumptions are perfectly formalizable. (d) Uncertainty axioms: Standard axioms...

521 | Three approaches to the quantitative definition of information
- Kolmogorov
- 1965
Citation Context ...enerating all proofs in order of their sizes. To produce a certain proof, this approach takes time exponential in proof size. Instead our p(1) will produce many proofs with low algorithmic complexity [48, 20, 25] much more quickly. It runs and evaluates proof techniques composed from instructions of the p(1)-encoded language L. For example, L may be a variant of PROLOG [7] or the universal Forth [26]-inspired ...

409 | First-Order Logic and Automated Theorem Proving
- Fitting
- 2012
Citation Context ...text, 'Gödel's putative theorem-proving machine' [27]! ... 1.3 Fast Initial Proof Searcher: There are many ways of initializing the proof searcher. But since proof verification is a fast business (e.g., [9]), we may construct an asymptotically optimal initialization called Bias-Optimal Proof Search (Biops)—see Section 2.3. Biops uses variants of Universal Search [22] and the Optimal Ordered Problem Solve...

404 | A formal theory of inductive inference
- Solomonoff
- 1964
Citation Context ...enerating all proofs in order of their sizes. To produce a certain proof, this approach takes time exponential in proof size. Instead our p(1) will produce many proofs with low algorithmic complexity [48, 20, 25] much more quickly. It runs and evaluates proof techniques composed from instructions of the p(1)-encoded language L. For example, L may be a variant of PROLOG [7] or the universal Forth [26]-inspired ...

385 | Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme. Monatshefte für Mathematik und Physik
- Gödel
- 1931
Citation Context ...provability, using Cantor's diagonalization trick [5] to demonstrate that formal systems such as traditional mathematics are either flawed in a certain sense or contain unprovable but true statements [10]. Since Gödel's exhibition of the fundamental limits of proof and computation, and Konrad Zuse's subsequent construction of the first working programmable computer (1935-1941), there has been a lot of...

330 | A theory of program size formally identical to information theory
- Chaitin
Citation Context ...dered Problem Solver Oops [38, 40] to a sequence of proof search tasks. As long as p = p(1) (which is true at least as long as no target theorem has been found), p searches a space of self-delimiting [23, 6, 25, 38] programs written in L. The reader is asked to consult previous work for details [38, 40]; here we just outline the basic procedure: any currently running proof technique w is an instruction sequence ...
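
"Self-delimiting" programs, as in the context above, form a prefix-free set: no valid program is a proper prefix of another, so the machine knows when a program ends without an end marker, and the lengths satisfy Kraft's inequality, sum of 2^(-length) ≤ 1. A minimal check (the concrete code sets are made up for illustration):

```python
# What "self-delimiting" means for the program spaces in [23, 6, 25, 38]:
# no valid program is a proper prefix of another valid program, so a program
# can be read symbol by symbol and recognized as complete the moment it ends.
# A consequence is Kraft's inequality: sum over programs of 2**(-length) <= 1.
# The concrete code sets below are made-up examples, not from the paper.

def is_prefix_free(codes):
    codes = sorted(codes)                  # a prefix, if any, sorts adjacent
    return all(not b.startswith(a) for a, b in zip(codes, codes[1:]))

def kraft_sum(codes):
    return sum(2.0 ** -len(c) for c in codes)

ok = ["0", "10", "110", "111"]             # prefix-free, Kraft sum = 1.0
bad = ["0", "01", "11"]                    # "0" is a prefix of "01"

print(is_prefix_free(ok), kraft_sum(ok))   # True 1.0
print(is_prefix_free(bad))                 # False
```

Prefix-freeness is what lets a search procedure allot each program a well-defined probability 2^(-length) and sample or enumerate programs without ambiguity.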

321 | Shadows of the Mind
- Penrose
- 1994
Citation Context ...r 'Goedel machine', to avoid the Umlaut. But 'Godel machine' would not be quite correct. Not to be confused with what Penrose calls, in a different context, 'Gödel's putative theorem-proving machine' [27]! ... 1.3 Fast Initial Proof Searcher: There are many ways of initializing the proof searcher. But since proof verification is a fast business (e.g., [9]), we may construct an asymptotically optimal init...

270 | Genetic Programming: An Introduction
- Banzhaf, Nordin, et al.
- 1998
Citation Context ...courage provably optimal ones. Similar drawbacks hold for Lenat's human-assisted, non-autonomous, self-modifying learner [21], our Meta-Genetic Programming [29] extending Cramer's Genetic Programming [8, 1], our metalearning economies [29] extending Holland's machine learning economies [14], and gradient-based metalearners for continuous program spaces of differentiable recurrent neural networks [31, 12...

239 | Adaptive Control Processes
- Bellman
- 1961
Citation Context ...te by any method. Or the interface to the environment is Markovian [30], that is, the current input always uniquely identifies the environmental state—a lot of work has been done on this special case [28, 2, 51]. Even more restrictively, the environment may evolve in completely predictable fashion known in advance. All such prior assumptions are perfectly formalizable. (d) Uncertainty axioms: Standard axioms...

216 | A representation for the adaptive generation of simple sequential programs
- Cramer
- 1985
Citation Context ...courage provably optimal ones. Similar drawbacks hold for Lenat's human-assisted, non-autonomous, self-modifying learner [21], our Meta-Genetic Programming [29] extending Cramer's Genetic Programming [8, 1], our metalearning economies [29] extending Holland's machine learning economies [14], and gradient-based metalearners for continuous program spaces of differentiable recurrent neural networks [31, 12...

153 | Grundbegriffe der Wahrscheinlichkeitsrechnung
- Kolmogorov
- 1933
Citation Context ...in completely predictable fashion known in advance. All such prior assumptions are perfectly formalizable. (d) Uncertainty axioms: Standard axioms for arithmetics and calculus and probability theory [19] and statistics and string manipulation that (in conjunction with the environment axioms) allow for constructing proofs concerning (possibly uncertain) properties of s as well as bounds on expected re...

146 | A machine-independent theory of the complexity of recursive functions
- Blum
- 1967
Citation Context ...l axioms. In particular, one can construct examples of environments and utility functions that make it impossible for the Gödel machine to ever prove a target theorem. Compare Blum's speed-up theorem [3, 4] based on certain incomputable predicates. Similarly, a realistic Gödel machine with limited resources cannot profit from self-improvements whose usefulness it cannot prove within its time and space c...

129 | Über den anschaulichen Inhalt der quantentheoretischen Kinematik und Mechanik. Zeitschrift für Physik
- Heisenberg
- 1927
Citation Context ...after the fact). The values of other variables at given times, however, may not be deducible at all. Such limits of self-observability are reminiscent of Heisenberg's celebrated uncertainty principle [11], which states that certain physical measurements are necessarily imprecise, since the measuring process affects the measured quantity. ... real current state. If a target theorem has been found, check...

118 | Universal sequential search problems
- Levin
- 1973
Citation Context ...ssentially arbitrary problems (examples in Section 3.2 formulate traditional problems as special cases). 1.1 Previous Work: Best General Methods Need Proof Searchers! Neither Levin's universal search [22] nor its incremental extension, the Optimal Ordered Problem Solver [38, 40], nor Solomonoff's recent ideas [50] are 'universal enough' for such general setups, and our earlier self-modifying online le...

101 | Randomness conservation inequalities: Information and independence in mathematical theories
- Levin
- 1984
Citation Context ...e. 2.3.1 How the Initial Proof Searcher May Use Biops to Solve the First Proof Search Task: Biops first invokes a variant of Levin's universal search [22] (Levin attributes similar ideas to Adleman [24]). Universal search is a simple, asymptotically optimal [22, 24, 25, 16, 50], near-bias-optimal [38, 40] way of solving a broad class of problems whose solutions can be quickly verified. It was origin...
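
Universal search, which the context above invokes, allots each candidate program p a share of runtime on the order of 2^(-length(p)) and grows the total budget phase by phase, so it solves the task within a constant factor of the time of the fastest program that solves it. A toy sketch (the two-instruction accumulator "language" and the goal predicate are invented for illustration):

```python
# Toy illustration of Levin's universal search [22], which Biops builds on.
# Hypothetical setup: a "program" is a string over {'+', '*'} run on an
# accumulator starting at 0 ('+' adds 1, '*' doubles), and the quickly
# verifiable goal is "output equals 7". The phase/budget schedule is the
# essential idea: in phase i, each program p runs for 2**(i - len(p)) steps,
# i.e., a time share proportional to 2**(-len(p)).
from itertools import product

def run(program, budget):
    """Execute up to `budget` instructions; return output only if p halts."""
    acc = 0
    for steps, op in enumerate(program, start=1):
        if steps > budget:
            return None                     # out of time in this phase
        acc = acc + 1 if op == '+' else acc * 2
    return acc

def universal_search(goal, max_phase=20):
    for phase in range(1, max_phase + 1):
        for length in range(1, phase + 1):
            budget = 2 ** (phase - length)  # time share ~ 2**(-length)
            for program in product('+*', repeat=length):
                out = run(program, budget)
                if out is not None and goal(out):
                    return ''.join(program)
    return None

print(universal_search(lambda out: out == 7))  # prints +++*+
```

Because no program of fewer than five instructions can compute 7 in this toy language, the search returns a shortest solution; the exponential budget schedule is what makes the method asymptotically optimal despite trying all programs.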

66 | Gödel, Escher, Bach: An Eternal Golden Braid
- Hofstadter
- 1999
Citation Context ...roof searcher. The Gödel machine may be viewed as a self-referential universal problem solver that can formally talk about itself, in particular about its performance. It may 'step outside of itself' [13] by rewriting its axioms and utility function or augmenting its hardware, provided this is provably useful. Its conceptual simplicity notwithstanding, the Gödel machine explicitly addresses the 'Grand...

62 | Optimal ordered problem solver
- Schmidhuber
- 2004
Citation Context ...ional problems as special cases). 1.1 Previous Work: Best General Methods Need Proof Searchers! Neither Levin's universal search [22] nor its incremental extension, the Optimal Ordered Problem Solver [38, 40], nor Solomonoff's recent ideas [50] are 'universal enough' for such general setups, and our earlier self-modifying online learning systems [29, 32, 46, 45, 47] are not necessarily optimal. Hutter's r...

62 | Shifting inductive bias with success-story algorithm, adaptive Levin seach, and incremental self-improvement
- Schmidhuber, Zhao, et al.
- 1997
Citation Context ...ental extension, the Optimal Ordered Problem Solver [38, 40], nor Solomonoff's recent ideas [50] are 'universal enough' for such general setups, and our earlier self-modifying online learning systems [29, 32, 46, 45, 47] are not necessarily optimal. Hutter's recent Aixi model [15] does execute optimal actions in very general environments evolving according to arbitrary, unknown, yet computable probabilistic laws, but...

53 | Reinforcement learning in Markovian and nonMarkovian environments
- Schmidhuber
- 1991
Citation Context ...stic computer program [53, 34] sampled from the Speed Prior [39], which assigns low probability to environments that are hard to compute by any method. Or the interface to the environment is Markovian [30], that is, the current input always uniquely identifies the environmental state—a lot of work has been done on this special case [28, 2, 51]. Even more restrictively, the environment may evolve in com...

51 | The speed prior: A new simplicity measure yielding near-optimal computable predictions
- Schmidhuber
- 2002
Citation Context ...vious history [48, 49, 15], or at least limit-computable [36, 37]. Or, more restrictively, the environment may be some unknown but deterministic computer program [53, 34] sampled from the Speed Prior [39], which assigns low probability to environments that are hard to compute by any method. Or the interface to the environment is Markovian [30], that is, the current input always uniquely identifies the ...

49 | Discovering neural nets with low kolmogorov complexity and high generalization capability
- Schmidhuber
- 1997
Citation Context ...or time-optimal backtracking in program space to perform efficient storage management on realistic, limited computers. Previous practical variants and extensions of universal search have been applied [33, 35, 47, 38, 40] to offline program search tasks where the program inputs are fixed such that the same program always produces the same results. This is not the case in the present online setting: the same proof tech...

38 | A computer scientist’s view of life, the universe, and everything
- Schmidhuber
Citation Context ...tion that is computable, given the previous history [48, 49, 15], or at least limit-computable [36, 37]. Or, more restrictively, the environment may be some unknown but deterministic computer program [53, 34] sampled from the Speed Prior [39], which assigns low probability to environments that are hard to compute by any method. Or the interface to the environment is Markovian [30], that is, the current inp...

38 | Hierarchies of generalized Kolmogorov complexities and nonenumerable universal measures computable in the limit
- Schmidhuber
Citation Context ...example, it may be known in advance that the environment is sampled from an unknown probability distribution that is computable, given the previous history [48, 49, 15], or at least limit-computable [36, 37]. Or, more restrictively, the environment may be some unknown but deterministic computer program [53, 34] sampled from the Speed Prior [39], which assigns low probability to environments that are hard ...

37 | Properties of the Bucket brigade
- Holland
- 1985
Citation Context ...utonomous, self-modifying learner [21], our Meta-Genetic Programming [29] extending Cramer's Genetic Programming [8, 1], our metalearning economies [29] extending Holland's machine learning economies [14], and gradient-based metalearners for continuous program spaces of differentiable recurrent neural networks [31, 12]. All these methods, however, could be used to seed p(1) with an initial policy. 3.5...

37 | On learning how to learn learning strategies
- Schmidhuber
- 1994
Citation Context ...ental extension, the Optimal Ordered Problem Solver [38, 40], nor Solomonoff's recent ideas [50] are 'universal enough' for such general setups, and our earlier self-modifying online learning systems [29, 32, 46, 45, 47] are not necessarily optimal. Hutter's recent Aixi model [15] does execute optimal actions in very general environments evolving according to arbitrary, unknown, yet computable probabilistic laws, but...

37 | Discovering solutions with low Kolmogorov complexity and high generalization capability
- Schmidhuber
- 1995
Citation Context ...or time-optimal backtracking in program space to perform efficient storage management on realistic, limited computers. Previous practical variants and extensions of universal search have been applied [33, 35, 47, 38, 40] to offline program search tasks where the program inputs are fixed such that the same program always produces the same results. This is not the case in the present online setting: the same proof tech...

36 | Über eine Eigenschaft des Inbegriffes aller reellen algebraischen Zahlen
- Cantor
Citation Context ...ding arbitrary proofs, given an arbitrary enumerable set of axioms. He went on to construct self-referential formal statements that claim their own unprovability, using Cantor's diagonalization trick [5] to demonstrate that formal systems such as traditional mathematics are either flawed in a certain sense or contain unprovable but true statements [10]. Since Gödel's exhibition of the fundamental lim...

35 | The fastest and shortest algorithm for all well-defined problems
- Hutter
Citation Context ...tationally intractable, especially when M includes all computable distributions. This drawback motivated work on the time-bounded, asymptotically optimal Aixi(t,l) system [15] and the related Hsearch [16], both already discussed in the introduction. Both methods could be used to seed the Gödel machine with an initial policy. Unlike Aixi(t,l) and Hsearch, however, the Gödel machine is not only asymptot...

34 | Reinforcement learning with self-modifying policies
- Schmidhuber, Zhao, et al.
- 1997
Citation Context ...ental extension, the Optimal Ordered Problem Solver [38, 40], nor Solomonoff's recent ideas [50] are 'universal enough' for such general setups, and our earlier self-modifying online learning systems [29, 32, 46, 45, 47] are not necessarily optimal. Hutter's recent Aixi model [15] does execute optimal actions in very general environments evolving according to arbitrary, unknown, yet computable probabilistic laws, but...

33 | Theory formation by heuristic search
- Lenat
- 1983
Citation Context ...ures the usefulness of previous self-modifications, and does not necessarily encourage provably optimal ones. Similar drawbacks hold for Lenat's human-assisted, non-autonomous, self-modifying learner [21], our Meta-Genetic Programming [29] extending Cramer's Genetic Programming [8, 1], our metalearning economies [29] extending Holland's machine learning economies [14], and gradient-based metalearners ...

31 | Self-optimizing and Pareto-optimal policies in general environments based on Bayes-mixtures
- Hutter
Citation Context ...8, 49], where the sum of the weights does not exceed 1. In cycle k + 1, Aixi selects as next action the first in an action sequence maximizing ξ-predicted reward up to some given horizon. Recent work [17] demonstrated Aixi's optimal use of observations as follows. The Bayes-optimal policy pξ based on the mixture ξ is self-optimizing in the sense that its average utility value converges asymptoticall...
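
The Bayes-mixture decision rule described in this context can be sketched for a horizon of one step. Everything below is an invented toy stand-in: real Aixi mixes over all enumerable environment distributions, which is incomputable; here ξ mixes over just two made-up deterministic reward models.

```python
# Toy sketch of the Bayes-mixture idea behind Aixi, as described above.
# Hypothetical setup: the environment is one of a handful of known
# deterministic reward models; xi is the weighted mixture over them.
# Real Aixi mixes over ALL computable environments (incomputable) and
# plans over a multi-step horizon; this toy uses a horizon of one.

MODELS = {                                  # made-up candidate environments
    "always_a": lambda a: 1.0 if a == "a" else 0.0,
    "always_b": lambda a: 1.0 if a == "b" else 0.0,
}
PRIOR = {"always_a": 0.5, "always_b": 0.5}  # weights sum to at most 1

def xi_reward(action, weights):
    """Mixture-predicted reward for one action."""
    return sum(w * MODELS[m](action) for m, w in weights.items())

def best_action(weights, actions=("a", "b")):
    return max(actions, key=lambda a: xi_reward(a, weights))

def update(weights, action, observed_reward):
    """Bayes update for deterministic models: keep only the consistent ones.
    Assumes at least one model stays consistent with the observation."""
    new = {m: w for m, w in weights.items()
           if MODELS[m](action) == observed_reward}
    total = sum(new.values())
    return {m: w / total for m, w in new.items()}

w = dict(PRIOR)
w = update(w, "a", 0.0)    # acting 'a' earned nothing -> rules out always_a
print(best_action(w))      # the mixture policy now picks "b"
```

The self-optimizing property quoted above corresponds to the posterior concentrating on the true model, after which the mixture policy acts optimally.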

31 | Evolutionary principles in self-referential learning
- Schmidhuber
- 1987

31 | Algorithmic theories of everything
- Schmidhuber
- 2000
Citation Context ...example, it may be known in advance that the environment is sampled from an unknown probability distribution that is computable, given the previous history [48, 49, 15], or at least limit-computable [36, 37]. Or, more restrictively, the environment may be some unknown but deterministic computer program [53, 34] sampled from the Speed Prior [39], which assigns low probability to environments that are hard ...

26 | Towards a universal theory of artificial intelligence based on algorithmic probability and sequential decisions
- Hutter
Citation Context ...s recent ideas [50] are 'universal enough' for such general setups, and our earlier self-modifying online learning systems [29, 32, 46, 45, 47] are not necessarily optimal. Hutter's recent Aixi model [15] does execute optimal actions in very general environments evolving according to arbitrary, unknown, yet computable probabilistic laws, but only under the unrealistic assumption of unlimited computati...

23 | Laws of information conservation (nongrowth) and aspects of the foundation of probability theory
- Levin
Citation Context ...dered Problem Solver Oops [38, 40] to a sequence of proof search tasks. As long as p = p(1) (which is true at least as long as no target theorem has been found), p searches a space of self-delimiting [23, 6, 25, 38] programs written in L. The reader is asked to consult previous work for details [38, 40]; here we just outline the basic procedure: any currently running proof technique w is an instruction sequence ...

20 | On effective procedures for speeding up algorithms
- Blum
- 1971
Citation Context ...l axioms. In particular, one can construct examples of environments and utility functions that make it impossible for the Gödel machine to ever prove a target theorem. Compare Blum's speed-up theorem [3, 4] based on certain incomputable predicates. Similarly, a realistic Gödel machine with limited resources cannot profit from self-improvements whose usefulness it cannot prove within its time and space c...

16 | FORTH - a language for interactive computing
- Moore, Leach

16 | A self-referential weight matrix
- Schmidhuber
- 1993
Citation Context ... [8, 1], our metalearning economies [29] extending Holland's machine learning economies [14], and gradient-based metalearners for continuous program spaces of differentiable recurrent neural networks [31, 12]. All these methods, however, could be used to seed p(1) with an initial policy. 3.5.2 Gödel Machine vs Oops and Oops-rl: The Optimal Ordered Problem Solver Oops [38, 40] (used by Biops in Section 2.3)...

16 | Simple principles of metalearning
- Schmidhuber, Zhao, et al.
- 1996

15 | The new AI: General & sound & relevant for physics
- Schmidhuber
- 2007
Citation Context ...er general reinforcement learner (Oops-rl), one module learning a predictive model of the environment, the other one using this world model to search for an action sequence maximizing expected reward [38, 42]. Despite the bias-optimality properties of Oops for certain ordered task sequences, however, Oops-rl is not necessarily the best way of spending limited computation time in general RL situations. A p...

14 | Bias-optimal incremental problem solving
- Schmidhuber
- 2002
Citation Context ...ional problems as special cases). 1.1 Previous Work: Best General Methods Need Proof Searchers! Neither Levin's universal search [22] nor its incremental extension, the Optimal Ordered Problem Solver [38, 40], nor Solomonoff's recent ideas [50] are 'universal enough' for such general setups, and our earlier self-modifying online learning systems [29, 32, 46, 45, 47] are not necessarily optimal. Hutter's r...

14 | Complexity-based induction systems
- Solomonoff
- 1978
Citation Context ...in reaction to sequences of outputs y. For example, it may be known in advance that the environment is sampled from an unknown probability distribution that is computable, given the previous history [48, 49, 15], or at least limit-computable [36, 37]. Or, more restrictively, the environment may be some unknown but deterministic computer program [53, 34] sampled from the Speed Prior [39], which assigns low pro...

13 | Learning to learn using gradient descent
- Hochreiter, Younger, et al.
- 2001
Citation Context ... [8, 1], our metalearning economies [29] extending Holland's machine learning economies [14], and gradient-based metalearners for continuous program spaces of differentiable recurrent neural networks [31, 12]. All these methods, however, could be used to seed p(1) with an initial policy. 3.5.2 Gödel Machine vs Oops and Oops-rl: The Optimal Ordered Problem Solver Oops [38, 40] (used by Biops in Section 2.3)...