Results 1–7 of 7
On Universal Prediction and Bayesian Confirmation
 Theoretical Computer Science
, 2007
Abstract

Cited by 23 (13 self)
The Bayesian framework is a well-studied and successful framework for inductive reasoning, which includes hypothesis testing and confirmation, parameter estimation, sequence prediction, classification, and regression. But standard statistical guidelines for choosing the model class and prior are not always available or can fail, in particular in complex situations. Solomonoff completed the Bayesian framework by providing a rigorous, unique, formal, and universal choice for the model class and the prior. I discuss in breadth how and in which sense universal (non-i.i.d.) sequence prediction solves various (philosophical) problems of traditional Bayesian sequence prediction. I show that Solomonoff’s model possesses many desirable properties: strong total and future bounds, and weak instantaneous bounds; in contrast to most classical continuous prior densities it has no zero p(oste)rior problem, i.e. it can confirm universal hypotheses; it is reparametrization and regrouping invariant; and it avoids the old-evidence and updating problem. It even performs well
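Solomonoff’s universal choice of model class and prior mentioned in this abstract is usually written as follows (a standard textbook formulation, not quoted from the paper itself):

```latex
M(x) \;=\; \sum_{p \,:\, U(p)=x*} 2^{-\ell(p)}
```

where $U$ is a universal monotone Turing machine, $p$ ranges over (minimal) programs whose output starts with the string $x$, and $\ell(p)$ is the length of $p$ in bits. Shorter programs, i.e. simpler explanations, receive exponentially larger weight.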
Universal Algorithmic Intelligence: A mathematical top-down approach
 Artificial General Intelligence
, 2005
Abstract

Cited by 23 (6 self)
Artificial intelligence; algorithmic probability; sequential decision theory; rational
Optimality of Universal Bayesian Sequence Prediction for General Loss and Alphabet
 In
, 2002
Abstract

Cited by 6 (1 self)
The Bayesian framework is ideally suited for induction problems. The probability of observing $x_t$ at time $t$, given past observations $x_1...x_{t-1}$, can be computed with Bayes' rule if the true generating distribution $\mu$ of the sequences $x_1x_2x_3...$ is known. The problem, however, is that in many cases one does not even have a reasonable guess of the true distribution. In order to overcome this problem, a universal (or mixture) distribution $\xi$ is defined as a weighted sum or integral of distributions $\nu\!\in\!\mathcal{M}$, where $\mathcal{M}$ is any countable or continuous set of distributions including $\mu$. This is a generalization of Solomonoff induction, in which $\mathcal{M}$ is the set of all enumerable semimeasures. It is shown for several performance measures that using the universal $\xi$ as a prior is nearly as good as using the unknown true distribution $\mu$. In a sense, this solves the problem of the unknown prior in a universal way. All results are obtained for a general finite alphabet. Convergence of $\xi$ to $\mu$ in a conditional mean squared sense and of $\xi/\mu\to 1$ with $\mu$-probability $1$ is proven. The number of additional errors $E_\xi$ made by the optimal universal prediction scheme based on $\xi$, minus the number of errors $E_\mu$ of the optimal informed prediction scheme based on $\mu$, is proven to be bounded by $O(\sqrt{E_\mu})$. The prediction framework is generalized to arbitrary loss functions: a system is allowed to take an action $y_t$, given $x_1...x_{t-1}$, and receives loss $\ell_{x_t y_t}$ if $x_t$ is the next symbol of the sequence. No assumptions on $\ell$ are necessary besides boundedness. Optimal universal $\Lambda_\xi$ and optimal informed $\Lambda_\mu$ prediction schemes are defined, and the total loss of $\Lambda_\xi$ is bounded in terms of the total loss of $\Lambda_\mu$, similar to the error bounds. We show that the bounds are tight and that no other predictor can lead to smaller bounds.
Furthermore, for various performance measures we show Pareto-optimality of $\xi$ in the sense that there is no other predictor which performs better or equal in all environments $\nu\in\mathcal{M}$ and strictly better in at least one. So optimal predictors can (w.r.t. most performance measures in expectation) be based on the mixture $\xi$. Finally, we give an Occam's razor argument that Solomonoff's choice $w_\nu\sim 2^{-K(\nu)}$ for the weights is optimal, where $K(\nu)$ is the length of the shortest program describing $\nu$. Furthermore, games of chance, defined as a sequence of bets, observations, and rewards, are studied. The average profit achieved by the $\Lambda_\xi$ scheme rapidly converges to the best possible profit. The time needed to reach the winning zone is proportional to the relative entropy of $\mu$ and $\xi$. The prediction schemes presented here are compared to the weighted majority algorithm(s). Although the algorithms, the settings, and the proofs are quite different, the bounds of both schemes have a very similar structure. Extensions to infinite alphabets; partial, delayed, and probabilistic prediction; classification; and more active systems are briefly discussed.
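The mixture construction described in this abstract can be illustrated concretely for a tiny finite class $\mathcal{M}$ of Bernoulli environments. The sketch below is illustrative only (all names and the particular class are my choices, not the paper's): $\xi$ predicts via the weight-averaged class members, and Bayes' rule updates the weights, so the mixture's predictions approach those of the true environment $\mu$.

```python
# Illustrative sketch: Bayesian mixture prediction over a finite class
# M = {Bernoulli(theta)} of environments. Exact rational arithmetic via
# fractions avoids floating-point noise.
from fractions import Fraction

def mixture_predictor(thetas, weights, bits):
    """Return xi's predictive probability of symbol 1 before each observation."""
    w = list(weights)          # current posterior weights over the class M
    preds = []
    for b in bits:
        # xi(1 | x_<t) = sum_nu w_nu * nu(1 | x_<t); for Bernoulli, nu(1) = theta
        preds.append(sum(wi * th for wi, th in zip(w, thetas)))
        # Bayes update: w_nu <- w_nu * nu(b | x_<t) / xi(b | x_<t)
        like = [th if b == 1 else 1 - th for th in thetas]
        z = sum(wi * li for wi, li in zip(w, like))
        w = [wi * li / z for wi, li in zip(w, like)]
    return preds

# A sequence drawn (by hand) to resemble mu = Bernoulli(3/4): mostly ones.
seq = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1]
thetas = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]
weights = [Fraction(1, 3)] * 3          # uniform prior over the class
preds = mixture_predictor(thetas, weights, seq)
```

With a uniform prior the first prediction is exactly $1/2$; after observing the sequence, the posterior concentrates on $\theta = 3/4$ and the predictions move toward it, mirroring (in miniature) the convergence $\xi \to \mu$ that the paper proves in general.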
On Semimeasures Predicting Martin-Löf Random Sequences
, 2006
Abstract

Cited by 4 (3 self)
Solomonoff’s central result on induction is that the prediction of a universal semimeasure M converges rapidly and with probability 1 to the true sequence-generating predictor µ, if the latter is computable. Hence, M is eligible as a universal sequence predictor in case of unknown µ. Despite some nearby results and proofs in the literature, the stronger result of convergence for all (Martin-Löf) random sequences remained open. Such a convergence result would be particularly interesting and natural, since randomness can be defined in terms of M itself. We show that there are universal semimeasures M which do not converge to µ on all µ-random sequences, i.e. we give a partial negative answer to the open problem. We also provide a positive answer for some non-universal semimeasures. We define the incomputable measure D as a mixture over all computable measures and the enumerable semimeasure W as a mixture over all enumerable nearly-measures. We show that W converges to D and D to µ on all random sequences. The Hellinger distance, measuring closeness of two distributions, plays a central role.
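The Hellinger distance that plays the central role here is, for the predictive distributions of two (semi)measures over the next symbol, commonly written as (a standard definition, not quoted from the abstract):

```latex
h_t(\mu,\nu \mid x_{<t}) \;=\; \sum_{a\in\mathcal{X}}
\Big(\sqrt{\mu(a \mid x_{<t})}-\sqrt{\nu(a \mid x_{<t})}\Big)^{2}
```

where $\mathcal{X}$ is the alphabet and $x_{<t}$ the observed prefix; convergence of predictors is then expressed as $h_t \to 0$ along the sequence.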
Universal convergence of semimeasures on individual random sequences, in
 Proc. 15th Int. Conf. Algorithmic Learning Theory (ALT’04), LNAI
, 2004
Abstract

Cited by 3 (2 self)
Solomonoff’s central result on induction is that the posterior of a universal semimeasure M converges rapidly and with probability 1 to the true sequence-generating posterior µ, if the latter is computable. Hence, M is eligible as a universal sequence predictor in case of unknown µ. Despite some nearby results and proofs in the literature, the stronger result of convergence for all (Martin-Löf) random sequences remained open. Such a convergence result would be particularly interesting and natural, since randomness can be defined in terms of M itself. We show that there are universal semimeasures M which do not converge for all random sequences, i.e. we give a partial negative answer to the open problem. We also provide a positive answer for some non-universal semimeasures. We define the incomputable measure D as a mixture over all computable measures and the enumerable semimeasure W as a mixture over all enumerable nearly-measures. We show that W converges to D and D to µ on all random sequences. The Hellinger distance, measuring closeness of two distributions, plays a central role.
On generalized computable universal priors and their convergence
 Theoretical Computer Science
Abstract

Cited by 2 (2 self)
Solomonoff unified Occam’s razor and Epicurus’ principle of multiple explanations into one elegant, formal, universal theory of inductive inference, which initiated the field of algorithmic information theory. His central result is that the posterior of the universal semimeasure M converges rapidly to the true sequence-generating posterior µ, if the latter is computable. Hence, M is eligible as a universal predictor in case of unknown µ. The first part of the paper investigates the existence and convergence of computable universal (semi)measures for a hierarchy of computability classes: recursive, estimable, enumerable, and approximable. For instance, M is known to be enumerable, but not estimable, and to dominate all enumerable semimeasures. We present proofs for discrete and continuous semimeasures. The second part investigates more closely the types of convergence possibly implied by universality: in difference and in ratio, with probability 1, in mean sum, and for Martin-Löf random sequences. We introduce a generalized concept of randomness for individual sequences and use it to exhibit difficulties regarding these issues. In particular, we show that convergence fails (holds) on generalized-random sequences in gappy (dense) Bernoulli classes.
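The dominance property of M stated in this abstract is usually expressed as a multiplicative lower bound (a standard formulation, consistent with the weights in the optimality paper above):

```latex
M(x) \;\ge\; w_\nu \, \nu(x)
\quad \text{for all enumerable semimeasures } \nu \text{ and all strings } x,
\qquad w_\nu \sim 2^{-K(\nu)} .
```

Dominance is what drives the convergence results: the cumulative prediction error of M against a computable µ is bounded in terms of $\ln w_\mu^{-1}$, a constant independent of the sequence length.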