## Universal Algorithmic Intelligence: A mathematical top-down approach (2005)

Venue: Artificial General Intelligence

Citations: 22 (6 self)

### BibTeX

```bibtex
@inproceedings{Hutter05universalalgorithmic,
  author    = {Marcus Hutter},
  title     = {Universal Algorithmic Intelligence: A mathematical top-down approach},
  booktitle = {Artificial General Intelligence},
  year      = {2005},
  pages     = {200--227},
  publisher = {Springer-Verlag}
}
```

### Abstract

Artificial intelligence; algorithmic probability; sequential decision theory; rational

### Citations

4208 | Artificial Intelligence: A Modern Approach (2nd Edition) - Russell, Norvig - 2002 |

Citation context: "...consciousness. An annotated bibliography concludes this chapter. 2 Agents in Known Probabilistic Environments. The general framework for AI might be viewed as the design and study of intelligent agents [53]. An agent is a cybernetic system with some internal state, which acts with output y_k on some environment in cycle k, perceives some input x_k from the environment, and updates its internal state. Then ..."

4165 | Reinforcement Learning - An Introduction - Sutton, Barto - 1998 |

4144 | Introduction to Automata Theory, Languages, and Computation - Hopcroft, Ullman - 1979 |

2991 | Game Theory - Fudenberg, Tirole - 1991 |

Citation context: "...(the game). The players are assumed to be rational, taking into account all information they possess. The different goals of the players are usually in conflict. For an introduction to game theory, see [16, 48, 53, 47]. If we interpret the AI system as one player, and the environment models the other rational player and provides the reinforcement feedback r_k, we see that the agent-environment config..."

2901 | Dynamic programming - Bellman - 1957 |

1915 | A Course in Game Theory - Osborne, Rubinstein - 1994 |

1782 | An Introduction to Kolmogorov Complexity and Its Applications, 2nd edn - Li, Vitányi - 1997 |

1754 | A theory of the learnable - Valiant - 1984 |

1405 | Reinforcement learning: A survey - Kaelbling, Littman, et al. - 1996 |

1170 | Theory of Games and Economic Behavior - Neumann, Morgenstern - 1944 |

Citation context: "...and output spaces X and Y and the lifetime of the agent, and their universal choice, which we have in mind. There is nothing remarkable in this section; it is the essence of sequential decision theory [47, 2, 3, 66], presented in a new form. Notation and formulas needed in later sections are simply developed. There are two major remaining problems: the problem of the unknown true probability distribution µ, which ..."

804 | Nonlinear Programming. Athena Scientific - Bertsekas - 1995 |

701 | The weighted majority algorithm - Littlestone, Warmuth - 1994 |

Citation context: "...Y and m_k. ξ^{t̃,l̃} has to be evaluated |Y|^{h_k} · |X|^{h_k} times in (22). It is possible to optimize the algorithm and perform the computation within time t(ẏ_k^{AIξ,t̃,l̃}) = O(|Y|^{h_k} · |X|^{h_k} · 2^{l̃} · t̃) (43) per cycle. If we assume that the computation time of µ^{t̃,l̃} is exactly t̃ for all arguments, the brute-force time t̄ for calculating the sums and maxima in (11) is t̄(ẏ_k^{AIµ,t̃,l̃}) ≥ |Y|^{h_k} · |X|^{h_k} · t̃..."
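The weighted-majority scheme named in this entry has a well-known standard form that fits in a few lines; the sketch below is a generic textbook version (the function name, data layout, and penalty factor `beta` are illustrative choices, not taken from the indexed paper):

```python
def weighted_majority(expert_preds, outcomes, beta=0.5):
    """Sketch of the weighted-majority algorithm: keep one weight per
    expert, predict by weighted vote over the binary predictions {0, 1},
    and multiply the weight of every wrong expert by beta after each
    round. Returns the master's mistake count."""
    weights = [1.0] * len(expert_preds[0])
    mistakes = 0
    for preds, y in zip(expert_preds, outcomes):
        # Weighted vote: total weight behind predicting 1 vs. 0.
        vote1 = sum(w for w, p in zip(weights, preds) if p == 1)
        vote0 = sum(w for w, p in zip(weights, preds) if p == 0)
        guess = 1 if vote1 >= vote0 else 0
        if guess != y:
            mistakes += 1
        # Penalize the experts that were wrong this round.
        weights = [w * (beta if p != y else 1.0)
                   for w, p in zip(weights, preds)]
    return mistakes
```

The point of the cited paper is the mistake bound: with a perfect expert in the pool, the master's mistakes grow only logarithmically in the number of experts.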

538 | The Emperor's New Mind - Penrose - 1989 |

Citation context: "...able to behave in an intelligent way, is the very basis of AI, justifying the hope to be able to construct agents that eventually reach and outperform human intelligence. For a contrary viewpoint see [45, 49, 50]. It is not necessary to discuss here what is meant by 'reasonable time/intelligence' and 'sufficient power'. What we are interested in, in this section, is whether there is a computable version AIXI..."

433 | First-Order Logic and Automated Theorem Proving - Fitting - 1996 |

427 | A formal theory of inductive inference - Solomonoff - 1964 |

Citation context: "...complexity measure, any monotone decreasing function of K, e.g. ξ(x) = 2^{−K(x)}, would satisfy this criterion. But ξ also has to satisfy the probability axioms, so we have to be a bit more careful. Solomonoff [61, 62] defined the universal prior ξ(x) as the probability that the output of a universal Turing machine U starts with x when provided with fair coin flips on the input tape. Formally, ξ can be defined as ξ..."
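The formal definition the snippet cuts off has a standard form; the following is the usual textbook statement of Solomonoff's prior, reconstructed here rather than quoted from the indexed paper:

```latex
\xi(x) \;=\; \sum_{p \,:\, U(p)=x*} 2^{-\ell(p)}
```

where the sum ranges over all minimal programs p for which the universal prefix machine U outputs a string starting with x, and ℓ(p) is the length of p in bits.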

340 | A theory of program size formally identical to information theory - Chaitin - 1975 |

318 | Inductive Inference: Theory and Methods - Angluin, Smith - 1983 |

Citation context: "...K(y); K(xy) ≤⁺ K(x)+K(y). v) K(x|y,K(y))+K(y) =⁺ K(x,y) =⁺ K(y,x) =⁺ K(y|x,K(x))+K(x). vi) K(f(x)) ≤⁺ K(x)+K(f) if f: B* → B* is recursive/computable. vii) K(x) ≤⁺ −log₂ P(x)+K(P) if P: B* → [0,1] is recursive and Σ_x P(x) ≤ 1. All (in)equalities remain valid if K is (further) conditioned under some z, i.e. K(...) ❀ K(...|z) and K(...|y) ❀ K(...|y,z). Those stated are all valid within an additive ..."

222 | Mathematical logic - Shoenfield - 1967 |

189 | Statistical theory: the prequential approach - Dawid - 1984 |

181 | Universal Artificial Intelligence: Sequential Decisions based on Algorithmic Probability - Hutter - 2005 |

Citation context: "...systems should be to be useful to humans. The problem is that, except for special cases [footnote: This article grew out of the technical report [19] and summarizes and contains excerpts of the Springer book [30]], we know neither the utility function nor the environment in which the agent will operate in advance. The mathematical theory, coined AIXI, is supposed to solve these problems. Assu..."

170 | Shadows of the Mind: A Search for the Missing Science of Consciousness - Penrose - 1994 |

Citation context: "...able to behave in an intelligent way, is the very basis of AI, justifying the hope to be able to construct agents that eventually reach and outperform human intelligence. For a contrary viewpoint see [45, 49, 50]. It is not necessary to discuss here what is meant by 'reasonable time/intelligence' and 'sufficient power'. What we are interested in, in this section, is whether there is a computable version AIXI..."

161 | Universal prediction of individual sequences - Feder, Merhav, et al. - 1992 |

Citation context: "...of ξ. This could be done in the same way as the theory of universal induction was downscaled, with many insights, to the Minimum Description Length principle [40, 52] or to the domain of finite automata [14]. The AIXI model might similarly serve as a supermodel or as the very definition of (universal unbiased) intelligence, from which specialized models could be derived. Implementation and approximation. ..."

130 | Complexity-based induction systems: comparisons and convergence theorems - Solomonoff - 1978 |

Citation context: "...complexity measure, any monotone decreasing function of K, e.g. ξ(x) = 2^{−K(x)}, would satisfy this criterion. But ξ also has to satisfy the probability axioms, so we have to be a bit more careful. Solomonoff [61, 62] defined the universal prior ξ(x) as the probability that the output of a universal Turing machine U starts with x when provided with fair coin flips on the input tape. Formally, ξ can be defined as ξ..."

123 | Universal sequential search problems - Levin - 1973 |

101 | Laws of information conservation (non-growth) and aspects of the foundation of probability theory - Levin - 1974 |

Citation context: "...universal (so-called prefix) Turing machine U with unidirectional binary input and output tapes and a bidirectional work tape [42, 30]. We can then define the (conditional) prefix Kolmogorov complexity [5, 17, 33, 38] of a binary string x as the length l of the shortest program p for which U outputs the binary string x (given y). Definition 4 (Kolmogorov complexity). Let U be a universal prefix Turing machine. Th..."
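Definition 4, which the snippet truncates, has a standard statement; the following is reconstructed from the usual definition of conditional prefix Kolmogorov complexity, not quoted from the indexed paper:

```latex
K(x \mid y) \;:=\; \min_{p} \{\, \ell(p) : U(p, y) = x \,\}, \qquad K(x) \;:=\; K(x \mid \epsilon)
```

i.e. the length of the shortest program p for which the universal prefix machine U outputs x given y, with the unconditional complexity obtained by conditioning on the empty string ε.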

94 | On the symmetry of algorithmic information - Gács - 1974 |

Citation context: "...universal (so-called prefix) Turing machine U with unidirectional binary input and output tapes and a bidirectional work tape [42, 30]. We can then define the (conditional) prefix Kolmogorov complexity [5, 17, 33, 38] of a binary string x as the length l of the shortest program p for which U outputs the binary string x (given y). Definition 4 (Kolmogorov complexity). Let U be a universal prefix Turing machine. Th..."

83 | In defence of probability - Cheeseman - 1985 |

Citation context: "...have to be learned by AIξ from scratch. Other theories, like fuzzy logic, possibility theory, Dempster-Shafer theory, and so on, are partly outdated and partly reducible to Bayesian probability theory [7, 8]. The interpretation and consequences of the evidence gap g := 1 − Σ_{x_k} ξ(yx_{<k} yx_k) > 0 in ξ may be similar to those in Dempster-Shafer theory. Boolean logical reasoning about the external world plays, at ..."

61 | Optimal ordered problem solver - Schmidhuber |

Citation context: "...sharp in the AIXI model. Training. We have not said much about the training process itself, as it is not specific to the AIXI model and has been discussed in the literature in various forms and disciplines [63, 56, 57]. By a training process we mean a sequence of simple-to-complex tasks to solve, with the simpler ones helping in learning the more complex ones. A serious discussion would be out of place. To repeat a..."

60 | Inductive Reasoning and Kolmogorov Complexity - Li, Vitányi - 1992 |

52 | Probability, frequency and reasonable expectation - Cox - 1946 |

51 | The speed prior: A new simplicity measure yielding near-optimal computable predictions - Schmidhuber - 2002 |

Citation context: "...as 'weighted majority'. 6.1 Time-Limited Probability Distributions. In the literature one can find time-limited versions of Kolmogorov complexity [11, 12, 32] and the time-limited universal semimeasure [39, 42, 55]. In the following, we utilize and adapt the latter and see how far we get. One way to define a time-limited universal chronological semimeasure is as a mixture over enumerable chronological semimeasu..."

49 | Discovering neural nets with low Kolmogorov complexity and high generalization capability - Schmidhuber - 1997 |

Citation context: "...Description Length [40, 52]. Resource-bounded complexity is discussed in [11, 12, 14, 32, 51], resource-bounded universal probability in [39, 42, 55]. Implementations are rare and mainly due to Schmidhuber [9, 54, 58, 56, 57]. Excellent reviews with a philosophical touch are [41, 64]. For an older general review of inductive inference see Angluin [1]. Sequential decision theory. The other ingredient in our AIξ model is se..."

48 | Universal sequential search problems. Problems of Information Transmission - Levin - 1973 |

Citation context: "...other interesting applications can be found in [6, 59, 69]. Related topics are the Weighted Majority algorithm invented by Littlestone and Warmuth [44], universal forecasting by Vovk [68], Levin search [37], PAC-learning introduced by Valiant [67], and Minimum Description Length [40, 52]. Resource-bounded complexity is discussed in [11, 12, 14, 32, 51], resource-bounded universal probability in [39, 42, ..."

43 | The discovery of algorithmic probability - Solomonoff - 1997 |

Citation context: "...[11, 12, 14, 32, 51], resource-bounded universal probability in [39, 42, 55]. Implementations are rare and mainly due to Schmidhuber [9, 54, 58, 56, 57]. Excellent reviews with a philosophical touch are [41, 64]. For an older general review of inductive inference see Angluin [1]. Sequential decision theory. The other ingredient in our AIξ model is sequential decision theory. We do not need much more than the..."

40 | An inquiry into computer understanding - Cheeseman - 1988 |

Citation context: "...have to be learned by AIξ from scratch. Other theories, like fuzzy logic, possibility theory, Dempster-Shafer theory, and so on, are partly outdated and partly reducible to Bayesian probability theory [7, 8]. The interpretation and consequences of the evidence gap g := 1 − Σ_{x_k} ξ(yx_{<k} yx_k) > 0 in ξ may be similar to those in Dempster-Shafer theory. Boolean logical reasoning about the external world plays, at ..."

36 | The fastest and shortest algorithm for all well-defined problems - Hutter |

Citation context: "...and inference rules. A proof is a sequence of formulas, where each formula is either an axiom or inferred from previous formulas in the sequence by applying the inference rules. Details can be found in [25] in a related construction, or in any textbook on logic or proof theory, e.g. [15, 60]. We only need to know that provability and Turing machines can be formalized. The setup time in the theorem is jus..."

30 | Self-optimizing and Pareto-optimal policies in general environments based on Bayes-mixtures - Hutter - 2002 |

Citation context: "...ssive effective environments, then AIXI reduces to the Λξ predictor that has been shown to perform well. One can show that these loss/value bounds generalize to wider classes, at least asymptotically [26]. Promising classes are, again, the ones described in Sect. 4.3. In particular, for ergodic MDPs we showed that AIζ is self-optimizing. Obviously, the least we must demand from M to have a chance of findi..."

30 | An application of algorithmic probability to problems in artificial intelligence - Solomonoff - 1986 |

Citation context: "...sharp in the AIXI model. Training. We have not said much about the training process itself, as it is not specific to the AIXI model and has been discussed in the literature in various forms and disciplines [63, 56, 57]. By a training process we mean a sequence of simple-to-complex tasks to solve, with the simpler ones helping in learning the more complex ones. A serious discussion would be out of place. To repeat a..."

26 | Towards a universal theory of artificial intelligence based on algorithmic probability and sequential decisions - Hutter |

Citation context: "...previous reports or publications or the book [30] by the author: The AIξ model was first introduced and discussed in March 2000 in [19] in a 62-page-long report. More succinct descriptions were published in [23, 24]. The AIξ model has been argued to formally solve a number of problem classes, including sequence prediction, strategic games, function minimization, and reinforcement and supervised learning [19]. A variant ..."

26 | Universal portfolio selection - Vovk, Watkins - 1998 |

Citation context: "...with Chaitin [4, 5], this was the invention of what is now called Algorithmic Information Theory. For further literature and many applications see [42]. Other interesting applications can be found in [6, 59, 69]. Related topics are the Weighted Majority algorithm invented by Littlestone and Warmuth [44], universal forecasting by Vovk [68], Levin search [37], PAC-learning introduced by Valiant [67], and Minimu..."

25 | On the notion of infinite pseudorandom sequences. Theoretical Computer Science - Ko - 1986 |

Citation context: "...in [44, 68] for sequence prediction, and is referred to as 'weighted majority'. 6.1 Time-Limited Probability Distributions. In the literature one can find time-limited versions of Kolmogorov complexity [11, 12, 32] and the time-limited universal semimeasure [39, 42, 55]. In the following, we utilize and adapt the latter and see how far we get. One way to define a time-limited universal chronological semimeasure..."

24 | Optimality of universal Bayesian prediction for general loss and alphabet - Hutter - 2001 |

Citation context: "...s no other predictor that leads to equal or smaller loss in all environments. Any improvement achieved by some predictor Λ over Λξ in some environments is balanced by a deterioration in other environments [29]. 4 The Universal Algorithmic Agent AIXI. Active systems, like game playing (SG) and optimization (FM), cannot be reduced to induction systems. The main idea of this work is to generalize universal ind..."

22 | New error bounds for Solomonoff prediction - Hutter - 1999 |

Citation context: "...timal [26]. The construction of a general fastest algorithm for all well-defined problems [25] arose from the construction of the time-bounded AIXItl model [23]. Convergence [28] and tight [29] error [22, 20] and loss [21, 27] bounds for Solomonoff's universal sequence prediction scheme have been proven. Loosely related ideas on a market/economy-based reinforcement learner [36] and gradient-based ..."

22 | Convergence and loss bounds for Bayesian sequence prediction - Hutter - 2003 |

Citation context: "...ence not only for the true output ẏ_k, but also for alternative outputs y_k. ẏ^ξ_k converges to ẏ^µ_k if ξ converges uniformly to µ, i.e. if in addition to (23), |µ(yx_{<k} y′_k x′_k) − ξ(yx_{<k} y′_k x′_k)| < c · |µ(yx_{<k} yx_k) − ξ(yx_{<k} yx_k)| ∀ y′_k x′_k (27) holds for some constant c (at least in a µ-expected sense). We call µ satisfying (27) uniform. For uniform µ one can show (26) with appropriately..."

21 | Convergence and error bounds of universal prediction for general alphabet - Hutter - 2001 |

Citation context: "...timal [26]. The construction of a general fastest algorithm for all well-defined problems [25] arose from the construction of the time-bounded AIXItl model [23]. Convergence [28] and tight [29] error [22, 20] and loss [21, 27] bounds for Solomonoff's universal sequence prediction scheme have been proven. Loosely related ideas on a market/economy-based reinforcement learner [36] and gradient-based ..."

18 | A Theory of Universal Artificial Intelligence based on Algorithmic Complexity - Hutter - 2000 |

Citation context: "...of animals and humans is to survive and spread. The goal of AI systems should be to be useful to humans. The problem is that, except for special cases [footnote: This article grew out of the technical report [19] and summarizes and contains excerpts of the Springer book [30]], we know neither the utility function nor the environment in which the agent will operate in advance. The mathematical ..."

18 | Dynamic Programming - Bellman - 1957 |

Citation context: "...ẏẋ_{<k}. The expected reward for the next m−k+1 cycles (given the above history) is called the value of policy p and is given by a conditional probability: V^{pµ}_{km}(ẏẋ_{<k}) := (Σ_{q∈Q̇_k} µ(q) V^{pq}_{km}) / (Σ_{q∈Q̇_k} µ(q)). (2) Policy p and environment µ do not determine history ẏẋ_{<k}, unlike the deterministic case, because the history is no longer deterministically determined by p and q, but depends on p a..."

17 | Reinforcement learning: a survey - Kaelbling, Littman, et al. - 1996 |

Citation context: "...correct prediction (ẏ_k = ż_k) and 0 otherwise. One can show [19, 30] that the number of wrong predictions E^{AIξ}_∞ of the AIξ model (30) in these environments is bounded (within a multiplicative constant) by E^{AIξ}_∞ ≤ 2^{K(ż_{1:∞})} < ∞ (31) for a computable deterministic environment string ż_1 ż_2 .... The intuitive interpretation is that each wrong prediction eliminates at least one program p of size l(p) ≤⁺ K(ż). The size is smaller ..."

14 | Game-playing and game-learning automata - Michie - 1966 |

Citation context: "...(Σ over the x_i) and maximizing over the y_i. This has to be done in chronological order to correctly incorporate the dependencies of x_i and y_i on the history. This is essentially the expectimax algorithm/tree [46, 53]. The AIµ model is optimal in the sense that no other policy leads to higher expected reward. The value for a general policy p can be written in the form V^{pµ}_{km}(yx_{<k}) := Σ (r_k + ... + r_m) µ(yx_{<k} yx_{k:m}..."
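The expectimax recursion the snippet describes, maximizing over the agent's outputs and averaging (taking the expectation) over the environment's inputs in chronological order, can be sketched generically; the tree encoding below is a hypothetical illustration, not the paper's notation:

```python
def expectimax(node, agent_turn):
    """Sketch of the expectimax recursion: at agent nodes take the max
    over available actions, at chance (environment) nodes take the
    probability-weighted average. Leaves are {'value': r}; agent nodes
    are {'children': [subtree, ...]}; chance nodes are
    {'children': [(prob, subtree), ...]}."""
    if 'value' in node:
        return node['value']
    if agent_turn:
        # The agent picks the action with the highest expected value.
        return max(expectimax(child, False) for child in node['children'])
    # The environment responds stochastically: average over outcomes.
    return sum(p * expectimax(child, True) for p, child in node['children'])
```

Alternating max and expectation layers down to the horizon is exactly the chronological evaluation order the excerpt insists on.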

14 | General loss bounds for universal sequence prediction - Hutter - 2001 |

Citation context: "...If µ is known, Λµ is obviously the best prediction scheme in the sense of achieving minimal expected loss (L^{Λµ}_n ≤ L^Λ_n for any Λ). One can prove the following loss bound for the universal Λξ predictor [21, 20, 27]: 0 ≤ L^{Λξ}_n − L^{Λµ}_n ≤ 2 ln 2 · K(µ) + 2·√(L^{Λµ}_n · ln 2 · K(µ)). (19) Together with L_n ≤ n this shows that (1/n)·L^{Λξ}_n − (1/n)·L^{Λµ}_n = O(n^{−1/2}), i.e. asymptotically Λξ achieves the optimal average l..."