## General Loss Bounds for Universal Sequence Prediction (2001)

### Download Links

- [ftp.idsia.ch]
- [arxiv.org]
- [www.hutter1.net]
- DBLP

### Other Repositories/Bibliography

Citations: 14 (9 self)

### BibTeX

```bibtex
@MISC{Hutter01generalloss,
  author = {Marcus Hutter},
  title  = {General Loss Bounds for Universal Sequence Prediction},
  year   = {2001}
}
```

### Abstract

The Bayesian framework is ideally suited for induction problems. The probability of observing $x_k$ at time $k$, given past observations $x_1...x_{k-1}$, can be computed with Bayes' rule if the true distribution $\mu$ of the sequences $x_1x_2x_3...$ is known. The problem, however, is that in many cases one does not even have a reasonable estimate of the true distribution. To overcome this problem, a universal distribution $\xi$ is defined as a weighted sum of distributions $\mu_i\in M$, where $M$ is any countable set of distributions including $\mu$. This generalizes Solomonoff induction, in which $M$ is the set of all enumerable semi-measures. Systems which predict $y_k$, given $x_1...x_{k-1}$, and which receive loss $l_{x_k y_k}$ if $x_k$ is the true next symbol of the sequence, are considered. It is proven that using the universal $\xi$ as a prior is nearly as good as using the unknown true distribution $\mu$. Furthermore, games of chance, defined as sequences of bets, observations, and rewards, are studied. The time needed to reach the winning zone is estimated. Extensions to arbitrary alphabets, partial and delayed prediction, and more active systems are discussed.
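The mixture predictor described in the abstract admits a compact numerical illustration. The sketch below is a toy, not the paper's construction: the Bernoulli class, the function name `xi_predict`, and all parameter values are hypothetical. It computes $\xi$'s next-symbol probability as a posterior-weighted mixture over a finite class $M$ that contains the true $\mu$:

```python
# Minimal sketch of Bayes-mixture prediction over a finite class M.
# Each mu_i is a hypothetical i.i.d. Bernoulli(theta_i) source over the
# binary alphabet {0, 1}; weights are the prior weights w_i of the mu_i.

def xi_predict(weights, thetas, history):
    """Return xi(x_k = 1 | x_1 ... x_{k-1}) as a posterior-weighted mixture."""
    # Posterior weight of mu_i is proportional to w_i * mu_i(history).
    posterior = []
    for w, theta in zip(weights, thetas):
        likelihood = 1.0
        for x in history:
            likelihood *= theta if x == 1 else 1.0 - theta
        posterior.append(w * likelihood)
    z = sum(posterior)
    # Mixture prediction: sum_i posterior_i * mu_i(next symbol = 1), normalized.
    return sum(p * theta for p, theta in zip(posterior, thetas)) / z

# True source mu = Bernoulli(0.8) is contained in M, as the setup requires.
thetas = [0.2, 0.5, 0.8]
weights = [1.0 / 3.0] * 3
history = [1, 1, 0, 1, 1, 1, 1, 0, 1, 1]  # 8 ones, 2 zeros
p1 = xi_predict(weights, thetas, history)
print(round(p1, 3))  # → 0.762, already close to the true parameter 0.8
```

As the history grows, the posterior concentrates on the true $\mu$, which is the mechanism behind the loss bounds: predicting with $\xi$ instead of the unknown $\mu$ costs only an additive penalty controlled by the prior weight of $\mu$.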

### Citations

1688 | An Introduction to Kolmogorov Complexity and Its Applications
- Li, Vitányi
- 1997
Citation Context: ...the performance of ξ relative to µ, and to apply the results to games of chance. Details and proofs can be found in [Hut01]. There are good introductions and surveys of Solomonoff sequence prediction [LV97], inductive inference [AS83, Sol97], reasoning under uncertainty [Grü98], and competitive online statistics [Vov99] with interesting relations to this work. See [Hut01] and subsection 5.4 for details....

670 | The Weighted Majority Algorithm - Littlestone, Warmuth - 1994 |

523 | Three approaches to the quantitative definition of information - Kolmogorov - 1965 |

499 | Stochastic Complexity - Rissanen - 1989 |

314 | How to use expert advice - Cesa-Bianchi, Freund, et al. - 1997 |

306 | Inductive inference: theory and methods - Angluin, Smith - 1983 |

183 | The complexity of finite objects and the development of the concepts of information and randomness by means of the theory of algorithms - Zvonkin, Levin - 1970 |

171 | Statistical theory: the prequential approach - Dawid - 1984 |

156 | Universal prediction of individual sequences
- Feder, Merhav, et al.
- 1992
Citation Context: ...ted by S. If one considers only finite-state automata instead of general Turing machines, one can attain a quickly computable, universal finite-state prediction scheme similar to that of Feder et al. [FMG92], which itself is related to the famous Lempel-Ziv data compression algorithm. If one has extra knowledge on the source generating the sequence, one might further reduce M and increase w. A detailed a...

127 | Complexity-based induction systems: comparisons and convergence theorems
- Solomonoff
- 1978
Citation Context: ...bstitution for the usually unknown µ. The framework can easily be generalized to other probability classes and weights [Sol78]. (Footnote: We use the term distribution slightly imprecisely for a probability measure.) 1.3 Contents The main aim of this work is to prove expected loss bounds for general loss functions which measure the performance of ξ relative to µ, and to apply the results to games of chance. Deta...

118 | Universal sequential search problems - Levin - 1973 |

101 | Randomness Conservation Inequalities: Information and Independence in Mathematical Theories - Levin - 1984 |

74 | Sequential prediction of individual sequences under general loss functions
- Haussler, Kivinen, et al.
- 1998
Citation Context: ...n the set of environments M. The basic pβn algorithm has been extended in different directions: incorporation of different initial weights (ln|E| → ln 1/wi) [LW89, Vov92], more general loss functions [HKW98], continuous valued outcomes [HKW98], and multi-dimensional predictions [KW99] (but not yet for the absolute loss). The works of Yamanishi [Yam97] and [Yam98] lie somewhat in between WM and this work;...

66 | Minimum description length induction, Bayesianism, and Kolmogorov complexity
- Vitányi, Li
Citation Context: ...el of the environment. Learning and exploitation are melted together in the framework of universal Bayesian prediction. A separation of these two aspects in the spirit of hypothesis learning with MDL [VL00] could lead to new insights. The attempt at an information theoretic interpretation of Theorem 4 may be made more rigorous in this or another way. In the end, this may lead to a simpler proof of Theor...

63 | Competitive on-line statistics
- Vovk
Citation Context: ...n [Hut01]. There are good introductions and surveys of Solomonoff sequence prediction [LV97], inductive inference [AS83, Sol97], reasoning under uncertainty [Grü98], and competitive online statistics [Vov99] with interesting relations to this work. See [Hut01] and subsection 5.4 for details. Section 2 explains notation and defines the generalized universal distribution ξ as the wµi-weighted sum of probab...

58 | Inductive reasoning and Kolmogorov complexity
- Li, Vitányi
- 1992
Citation Context: ...y the results to games of chance. Details and proofs can be found in [Hut01]. For an excellent introduction to Solomonoff induction one should consult the book of Li and Vitányi [LV97] or the article [LV92] for a short course. (Footnote: We use the term distribution (slightly imprecise) for a probability measure.) Historical surveys of inductive reasoning/inference can be found in [AS83, Sol97]. Section 2 explai...

57 | Averaging expert predictions
- Kivinen, Warmuth
- 1999
Citation Context: ...rent directions: incorporation of different initial weights (ln|E| → ln 1/wi) [LW89, Vov92], more general loss functions [HKW98], continuous valued outcomes [HKW98], and multi-dimensional predictions [KW99] (but not yet for the absolute loss). The works of Yamanishi [Yam97] and [Yam98] lie somewhat in between WM and this work; “WM” techniques are used to prove expected loss bounds (but only for sequence...

49 | A formal theory of inductive inference: Part 1 and 2
- Solomonoff
- 1964
Citation Context: ...any cases we do not even have a reasonable guess of the true distribution µ. What is the true probability of weather sequences, stock charts, or sunrises? 1.2 Universal Sequence Prediction Solomonoff [Sol64] had the idea to define a universal probability distribution ξ as a weighted average over all possible computable probability distributions. Lower weights were assigned to more complex distributions...

49 | A decision-theoretic extension of stochastic complexity and its applications to learning
- Yamanishi
- 1998
Citation Context: ...LW89, Vov92], more general loss functions [HKW98], continuous valued outcomes [HKW98], and multi-dimensional predictions [KW99] (but not yet for the absolute loss). The works of Yamanishi [Yam97] and [Yam98] lie somewhat in between WM and this work; “WM” techniques are used to prove expected loss bounds (but only for sequences of independent symbols/experiments and different classes of loss functions). F...

40 | The discovery of algorithmic probability - Solomonoff - 1997 |

24 | Optimality of universal Bayesian prediction for general loss and alphabet
- Hutter
- 2001
Citation Context: ...this work is to prove expected loss bounds for general loss functions which measure the performance of ξ relative to µ, and to apply the results to games of chance. Details and proofs can be found in [Hut01]. There are good introductions and surveys of Solomonoff sequence prediction [LV97], inductive inference [AS83, Sol97], reasoning under uncertainty [Grü98], and competitive online statistics [Vov99] w...

18 | A theory of universal artificial intelligence based on algorithmic complexity
- Hutter
- 2000
Citation Context: ..., which itself leads to some reward or loss. If the action itself can influence the environment we enter the domain of acting agents which has been analyzed in the context of universal probability in [Hut00]. To stay in the framework of (passive) prediction we have to assume that the action itself does not influence the environment. Let l_{x_t y_t} ∈ ℝ be the received loss when taking action y_t ∈ Y and x_t ∈ {0...

12 | How to use expert advice
- Cesa-Bianchi, et al.
- 1997
Citation Context: ...loss $L_\varepsilon(x) := \sum_{t=1}^n |x_t - x_t^\varepsilon|$ of the best expert ε ∈ E have been proven. It is possible to fine tune β and to eliminate the necessity of knowing n in advance. The most general bound of this kind is [Ces97] $L_p(x) \le L_\varepsilon(x) + 2.8\ln|E| + 4\sqrt{L_\varepsilon(x)\ln|E|}$ (18). It is interesting that our bound in Theorem 2 (with $H_n \le \ln|M|$ for uniform weights) has a quite similar structure as this bound, although the algor...

12 | Universal forecasting algorithms
- Vovk
- 1992
Citation Context: ...o be proven. 5.4 The Weighted Majority Algorithm(s) The Weighted Majority (WM) algorithm is a related universal forecasting algorithm. It was invented by Littlestone and Warmuth [LW89, LW94] and Vovk [Vov92] and further developed in [Ces97, HKW98, KW99] and others. Many variations known by many names have meanwhile been invented. Early works in this direction are [Daw84, Ris89]. See [Vov99] for a review...

9 | Algorithmic information and evolution - Chaitin - 1991 |

8 | Recursively enumerable reals and Chaitin Ω numbers - Calude, Hertling, et al. - 1998 |

6 | The Minimum Description Length Principle and Reasoning under Uncertainty
- Grünwald
- 1998
Citation Context: ...f chance. Details and proofs can be found in [Hut01]. There are good introductions and surveys of Solomonoff sequence prediction [LV97], inductive inference [AS83, Sol97], reasoning under uncertainty [Grü98], and competitive online statistics [Vov99] with interesting relations to this work. See [Hut01] and subsection 5.4 for details. Section 2 explains notation and defines the generalized universal distr...

5 | Algorithmic theories of everything (Report IDSIA-20-00)
- Schmidhuber
- 2000
Citation Context: ...to include all enumerable semi-measures, we attain Solomonoff’s [Sol64, Sol78] universal probability, apart from normalization, which has to be treated differently in this case. Recently, Schmidhuber [Sch00] has further enlarged M to include all cumulatively enumerable semi-measures. In all cases, ξ is not finitely computable, but can still be approximated to arbitrary but not prespecifiable precision. I...

4 | Optimality of universal prediction for general loss and alphabet
- Hutter
Citation Context: ...ility classes and weights, to prove bounds for general loss functions which measure the performance of ξ relative to µ, and to apply the results to games of chance. Details and proofs can be found in [Hut01]. For an excellent introduction to Solomonoff induction one should consult the book of Li and Vitányi [LV97] or the article [LV92] for a short course. Historical... (Footnote: We use the term distribution (slight...

2 | On-line maximum likelihood prediction with respect to general loss functions
- Yamanishi
- 1997
Citation Context: ...ln 1/wi) [LW89, Vov92], more general loss functions [HKW98], continuous valued outcomes [HKW98], and multi-dimensional predictions [KW99] (but not yet for the absolute loss). The works of Yamanishi [Yam97] and [Yam98] lie somewhat in between WM and this work; “WM” techniques are used to prove expected loss bounds (but only for sequences of independent symbols/experiments and different classes of loss f...
