## On the foundations of universal sequence prediction (2006)

Venue: | In Proc. 3rd Annual Conference on Theory and Applications of Models of Computation (TAMC’06), volume 3959 of LNCS |

Citations: | 11 - 4 self |

### BibTeX

@INPROCEEDINGS{Hutter06onthe,
  author = {Marcus Hutter},
  title = {On the foundations of universal sequence prediction},
  booktitle = {Proc. 3rd Annual Conference on Theory and Applications of Models of Computation (TAMC’06)},
  series = {LNCS},
  volume = {3959},
  year = {2006},
  pages = {408--420},
  publisher = {Springer}
}

### Abstract

Solomonoff completed the Bayesian framework by providing a rigorous, unique, formal, and universal choice for the model class and the prior. We discuss in breadth how and in which sense universal (non-i.i.d.) sequence prediction solves various (philosophical) problems of traditional Bayesian sequence prediction. We show that Solomonoff’s model possesses many desirable properties: fast convergence and strong bounds; and, in contrast to most classical continuous prior densities, it has no zero p(oste)rior problem (i.e. it can confirm universal hypotheses), is reparametrization and regrouping invariant, and avoids the old-evidence and updating problem. It even performs well (actually better) in non-computable environments.
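As a rough illustration of the Bayesian mixture prediction the abstract refers to, the Python sketch below mixes a small finite class of Bernoulli environments under a discrete prior. It is a toy under stated assumptions, not Solomonoff's (incomputable) universal mixture: the class `thetas`, the uniform `weights`, the sample `bits`, and the function name `mixture_predict` are all hypothetical choices made for this example.

```python
import math


def mixture_predict(thetas, weights, bits):
    """Return (P(next bit = 1), posterior weights) for a Bayes mixture
    over Bernoulli(theta) environments after observing `bits`.

    Hypothetical helper for illustration; not taken from the paper.
    """
    # Log-posterior (up to a constant) of each environment: log prior + log likelihood.
    log_post = []
    for theta, w in zip(thetas, weights):
        log_lik = sum(math.log(theta if b == 1 else 1.0 - theta) for b in bits)
        log_post.append(math.log(w) + log_lik)

    # Normalize in log space to avoid underflow on long sequences.
    m = max(log_post)
    unnorm = [math.exp(lp - m) for lp in log_post]
    z = sum(unnorm)
    post = [u / z for u in unnorm]

    # Mixture prediction: posterior-weighted average of each environment's prediction.
    p_one = sum(p * theta for p, theta in zip(post, thetas))
    return p_one, post


# Assumed toy setup: three candidate environments, uniform (non-zero) prior.
thetas = [0.2, 0.5, 0.8]
weights = [1.0 / len(thetas)] * len(thetas)
bits = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1] * 20  # 200 bits, 80% ones

p_one, post = mixture_predict(thetas, weights, bits)
print(p_one, post)
```

Because every environment in the class starts with non-zero prior weight, the posterior in this toy run concentrates on θ = 0.8 and the mixture's prediction approaches that environment's prediction. This is the finite-class analogue of the convergence behaviour the abstract describes.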

### Citations

1682 | An Introduction to Kolmogorov Complexity and its Applications
- Li, Vitányi
- 1997
Citation Context ...s). Let us choose some universal (so-called prefix) Turing machine U with binary input=program tape, $\mathcal{X}$-ary output tape, and bidirectional work tape. We can then define the prefix Kolmogorov complexity [LV97] of string x as the length ℓ of the shortest binary program p for which U outputs x: $K(x) := \min_p \{\ell(p) : U(p) = x\}$. For non-string objects o (like numbers and functions) we define $K(o) := K(\langle o\rangle)$, wh... |

404 | A formal theory of inductive inference
- Solomonoff
- 1964
Citation Context ...f in ν or should be non-informative or neutral or objective if no prior knowledge is available. But these are informal and ambiguous considerations outside the formal Bayesian framework. Solomonoff’s [Sol64] rigorous, essentially unique, formal, and universal solution to this problem is to consider a single large universal class $\mathcal{M}_U$ suitable for all induction problems. The corresponding universal prior w ... |

184 | The complexity of finite objects and the development of the concepts of information and randomness by means of the theory of algorithms
- Zvonkin, Levin
- 1970
Citation Context ...s not enumerable, since the class of computable functions $f:\mathcal{X}^*\to\mathbb{R}$ is not enumerable (halting problem), nor is it decidable whether a function is a measure. Hence ξ is completely incomputable. Levin [ZL70] had the idea to “slightly” extend the class and include also lower semi-computable semimeasures. One can show that this class $\mathcal{M}_U=\{\nu_1,\nu_2,\ldots\}$ is enumerable, hence $\xi_U(x) = \sum_{\nu\in\mathcal{M}_U} w^U_\nu\,\nu(x)$ (10) is itself l... |

179 | Clustering by compression - Cilibrasi, Vitányi |

173 | Universal Artificial Intelligence: Sequential Decisions Based on Algorithmic Probability
- Hutter
Citation Context ...[$s_t$] $\le D_n(\mu\|\xi) := E\left[\ln\frac{\mu(\omega_{1:n})}{\xi(\omega_{1:n})}\right] \le \ln w_\mu^{-1}$ (4), where $s_t = s_t(\mu_t,\xi_t)$ can be the squared Euclidean or Hellinger or absolute or KL distance between $\mu_t$ and $\xi_t$, or the squared Bayes-regret [Hut04]. The first inequality actually holds for any two (semi)measures, and the last inequality follows from (2). These bounds (with n=∞) imply $\xi(x_t|\omega_{<t}) - \mu(x_t|\omega_{<t}) \to 0$ for any $x_t$ rapidly w.p.1 for $t\to\infty$. On... |

162 | The selection of prior distributions by formal rules - Kass, Wasserman - 1996 |

127 | Complexity-based induction systems: comparisons and convergence theorems - Solomonoff - 1978 |

123 | Inferences from multinomial data: learning about a bag of marbles (with discussion)
- Walley
- 1996
Citation Context ...$w(\theta)\propto 1-\theta$ is not uniform, i.e. the indifference principle is not invariant under splitting/grouping, or general regrouping. Regrouping invariance is regarded as a very important and desirable property [Wal96]. We now consider general i.i.d. processes $\nu_\theta(x) = \prod_{i=1}^d \theta_i^{n_i}$. Dirichlet priors $w(\theta) \propto \prod_{i=1}^d \theta_i^{\alpha_i-1}$ form a natural conjugate class ($w(\theta|x) \propto \prod_{i=1}^d \theta_i^{n_i+\alpha_i-1}$) and are the default priors for multin... |

112 | Bayes or Bust? A Critical Examination of Bayesian Confirmation Theory
- Earman
- 1992
Citation Context ...radiction. What we are interested in is a formal general theory for prediction. Bayesian sequence prediction. The Bayesian framework is the most consistent and successful framework developed thus far [Ear93]. A Bayesian considers a set of environments=hypotheses=models $\mathcal{M}$ which includes the true data generating probability distribution µ. From one’s prior belief $w_\nu$ in environment $\nu\in\mathcal{M}$ and the observed dat... |

107 | Information-theoretic asymptotics of Bayes methods
- Clarke, Barron
- 1990
Citation Context ... group of the problem (like permutations for finite $\mathcal{M}$ or translations for $\Theta=\mathbb{R}$) and require the prior to be invariant under group transformations. Another solution is the minimax approach by Bernardo [CB90] which minimizes (the quite tight) bound (6) for the worst $\mu\in\mathcal{M}$. Choice $w(\theta)\propto\sqrt{\det\bar{j}_n(\theta)}$ equalizes and hence minimizes (6). Problems are that there may be no obvious symmetry, the resulting prior can ... |

86 | Statistical and Inductive Inference by Minimum Message Length
- Wallace
- 2005
Citation Context ... but since they are only semi-computable, they have to be (crudely) approximated in practice. Levin complexity [LV97], Schmidhuber’s speed prior, the minimal message and description length principles [Wal05], and off-the-shelf compressors like Lempel-Ziv are such approximations, which have been successfully applied to a plethora of problems [CV05, Sch04]. Summary. We compared traditional Bayesian sequenc... |

62 | Optimal ordered problem solver - Schmidhuber - 2004 |

38 | Hierarchies of generalized Kolmogorov complexities and nonenumerable universal measures computable in the limit
- Schmidhuber
Citation Context ...r most n by (7b) and (7c), most $\nu_n$ have prior approximately reciprocal to their index n. In some sense $\mathcal{M}_U$ is the largest class of environments for which ξ is in some sense computable [Hut04], but see [Sch02] for even larger classes. The problem of old evidence. An important problem in Bayesian inference in general and (Bayesian) confirmation theory [Ear93] in particular is how to deal with ‘old evidence’... |

12 | On the convergence speed of MDL predictions for Bernoulli sequences
- Poland, Hutter
- 2004
Citation Context ...ssign non-zero prior to all computable θ. This is another motivation for the universal prior $w^U_\theta = 2^{-K(\theta)}$ (8) constructed in Section 3. It is difficult but not impossible to operate with such a prior [PH04]. One may want to mix the discrete prior $w^U_\nu$ with a continuous (e.g. uniform) prior density, so that the set of non-computable θ keeps a non-zero density. Although possible, we will see that this is ... |
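The citation contexts above note that Kolmogorov complexity K is only semi-computable and is approximated in practice, for instance by off-the-shelf Lempel-Ziv-style compressors. The Python sketch below shows one such crude stand-in, using the standard-library `zlib` compressor. The function names `approx_complexity_bits` and `log2_prior_weight` are hypothetical, and the compressed length is only a rough, compressor-dependent proxy for K, not K itself.

```python
import random
import zlib


def approx_complexity_bits(data: bytes) -> int:
    """Crude computable proxy for K(data): the length in bits of the
    zlib-compressed representation. This is NOT true Kolmogorov complexity."""
    return 8 * len(zlib.compress(data, 9))


def log2_prior_weight(data: bytes) -> int:
    """log2 of the stand-in weight 2^(-approx K). Kept in log space because
    the weight itself underflows for all but tiny inputs."""
    return -approx_complexity_bits(data)


rng = random.Random(0)  # fixed seed so the example is reproducible
regular = b"01" * 500                                   # highly regular, 1000 bytes
noisy = bytes(rng.getrandbits(8) for _ in range(1000))  # pseudo-random, 1000 bytes

print(approx_complexity_bits(regular), approx_complexity_bits(noisy))
```

Under this proxy the regular string receives a much shorter description, and hence a much larger stand-in prior weight, than the pseudo-random string of the same length, mirroring the preference for simple hypotheses that a prior of the form 2^(-K) encodes.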