On the Existence and Convergence of Computable Universal Priors
 In Proc. 14th International Conf. on Algorithmic Learning Theory (ALT2003), volume 2842 of LNAI
, 2003
Cited by 7 (7 self)
Solomonoff unified Occam's razor and Epicurus' principle of multiple explanations to one elegant, formal, universal theory of inductive inference, which initiated the field of algorithmic information theory. His central result is that the posterior of his universal semimeasure M converges rapidly to the true sequence generating posterior μ, if the latter is computable. Hence, M is eligible as a universal predictor in case of unknown μ. We investigate the existence and convergence of computable universal (semi)measures for a hierarchy of computability classes: finitely computable, estimable, enumerable, and approximable. For instance, M is known...
Optimality of Universal Bayesian Sequence Prediction for General Loss and Alphabet
 In
, 2002
Cited by 6 (1 self)
The Bayesian framework is ideally suited for induction problems. The probability of observing $x_t$ at time $t$, given past observations $x_1...x_{t-1}$ can be computed with Bayes' rule if the true generating distribution $\mu$ of the sequences $x_1x_2x_3...$ is known. The problem, however, is that in many cases one does not even have a reasonable guess of the true distribution. In order to overcome this problem a universal (or mixture) distribution $\xi$ is defined as a weighted sum or integral of distributions $\nu\!\in\!\M$, where $\M$ is any countable or continuous set of distributions including $\mu$. This is a generalization of Solomonoff induction, in which $\M$ is the set of all enumerable semimeasures. It is shown for several performance measures that using the universal $\xi$ as a prior is nearly as good as using the unknown true distribution $\mu$. In a sense, this solves the problem of the unknown prior in a universal way. All results are obtained for general finite alphabet. Convergence of $\xi$ to $\mu$ in a conditional mean squared sense and of $\xi/\mu\to 1$ with $\mu$ probability $1$ is proven. The number of additional errors $E_\xi$ made by the optimal universal prediction scheme based on $\xi$ minus the number of errors $E_\mu$ of the optimal informed prediction scheme based on $\mu$ is proven to be bounded by $O(\sqrt{E_\mu})$. The prediction framework is generalized to arbitrary loss functions. A system is allowed to take an action $y_t$, given $x_1...x_{t-1}$ and receives loss $\ell_{x_t y_t}$ if $x_t$ is the next symbol of the sequence. No assumptions on $\ell$ are necessary, besides boundedness. Optimal universal $\Lambda_\xi$ and optimal informed $\Lambda_\mu$ prediction schemes are defined and the total loss of $\Lambda_\xi$ is bounded in terms of the total loss of $\Lambda_\mu$, similar to the error bounds. We show that the bounds are tight and that no other predictor can lead to smaller bounds.
Furthermore, for various performance measures we show Pareto-optimality of $\xi$ in the sense that there is no other predictor which performs better or equal in all environments $\nu\in\M$ and strictly better in at least one. So, optimal predictors can (w.r.t.\ most performance measures in expectation) be based on the mixture $\xi$. Finally we give an Occam's razor argument that Solomonoff's choice $w_\nu\sim 2^{-K(\nu)}$ for the weights is optimal, where $K(\nu)$ is the length of the shortest program describing $\nu$. Furthermore, games of chance, defined as a sequence of bets, observations, and rewards are studied. The average profit achieved by the $\Lambda_\xi$ scheme rapidly converges to the best possible profit. The time needed to reach the winning zone is proportional to the relative entropy of $\mu$ and $\xi$. The prediction schemes presented here are compared to the weighted majority algorithm(s). Although the algorithms, the settings, and the proofs are quite different the bounds of both schemes have a very similar structure. Extensions to infinite alphabets, partial, delayed and probabilistic prediction, classification, and more active systems are briefly discussed.
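The mixture idea in this abstract can be illustrated with a minimal sketch. It assumes a toy model class of three Bernoulli environments with uniform prior weights; the class, the weights, and the function name `bayes_mixture_predict` are illustrative assumptions, not taken from the paper, which treats general countable or continuous classes.

```python
def bayes_mixture_predict(thetas, weights, bits):
    """Sequentially predict a binary sequence with a Bayes mixture.

    thetas  -- hypothetical model class: each theta is P(bit = 1) under one
               Bernoulli environment (must lie strictly between 0 and 1)
    weights -- prior weights w_nu over the class, summing to 1
    bits    -- observed sequence of 0/1 values

    Returns the mixture's predictive probability of a 1 before each
    observation, and the posterior weights after the last observation.
    """
    w = list(weights)
    predictions = []
    for x in bits:
        # Mixture prediction: weighted sum of the models' predictions.
        predictions.append(sum(wi * th for wi, th in zip(w, thetas)))
        # Bayes' rule: reweight each model by the likelihood of the new bit.
        likelihood = [th if x == 1 else 1.0 - th for th in thetas]
        norm = sum(wi * li for wi, li in zip(w, likelihood))
        w = [wi * li / norm for wi, li in zip(w, likelihood)]
    return predictions, w


if __name__ == "__main__":
    # Toy run: twenty 1s in a row shift the posterior towards theta = 0.8.
    preds, posterior = bayes_mixture_predict(
        [0.2, 0.5, 0.8], [1 / 3, 1 / 3, 1 / 3], [1] * 20
    )
    print(preds[0], preds[-1], posterior)
```

In this toy run the predictive probability starts at the prior average and moves towards the prediction of the model that best explains the data, which is the qualitative behaviour the convergence results describe.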
Sequential predictions based on algorithmic complexity
, 2004
Cited by 6 (6 self)
This paper studies sequence prediction based on the monotone Kolmogorov complexity Km=−log m, i.e. based on universal deterministic/one-part MDL. m is extremely close to Solomonoff’s universal prior M, the latter being an excellent predictor in deterministic as well as probabilistic environments, where performance is measured in terms of convergence of posteriors or losses. Despite this closeness to M, it is difficult to assess the prediction quality of m, since little is known about the closeness of their posteriors, which are the important quantities for prediction. We show that for deterministic computable environments, the “posterior” and losses of m converge, but rapid convergence could only be shown on-sequence; the off-sequence convergence can be slow. In probabilistic environments, neither the posterior nor the losses converge,
A Complete Theory of Everything (will be subjective)
, 2010
Cited by 6 (4 self)
Increasingly encompassing models have been suggested for our world. Theories range from generally accepted to increasingly speculative to apparently bogus. The progression of theories from ego- to geo- to helio-centric models to universe and multiverse theories and beyond was accompanied by a dramatic increase in the sizes of the postulated worlds, with humans being expelled from their center to ever more remote and random locations. Rather than leading to a true theory of everything, this trend faces a turning point after which the predictive power of such theories decreases (actually to zero). Incorporating the location and other capacities of the observer into such theories avoids this problem and allows one to distinguish meaningful from predictively meaningless theories. This also leads to a truly complete theory of everything consisting of a (conventional objective) theory of everything plus a (novel subjective) observer process. The observer localization is neither based on the controversial anthropic principle, nor has it anything to do with the quantum-mechanical
New millennium AI and the convergence of history
 Challenges to Computational Intelligence
, 2007
Cited by 6 (4 self)
Artificial Intelligence (AI) has recently become a real formal science: the new millennium brought the first mathematically sound, asymptotically optimal, universal problem solvers, providing a new, rigorous foundation for the previously largely heuristic field of General AI and embedded agents. At the same time there has been rapid progress in practical methods for learning true sequence-processing programs, as opposed to traditional methods limited to stationary pattern association. Here we will briefly review some of the new results, and speculate about future developments, pointing out that the time intervals between the most notable events in over 40,000 years or 2^9 lifetimes of human history have sped up exponentially, apparently converging to zero within the next few decades. Or is this impression just a by-product of the way humans allocate memory space to past events?
On Semimeasures Predicting Martin-Löf Random Sequences
, 2006
Cited by 4 (3 self)
Solomonoff’s central result on induction is that the prediction of a universal semimeasure M converges rapidly and with probability 1 to the true sequence generating predictor µ, if the latter is computable. Hence, M is eligible as a universal sequence predictor in case of unknown µ. Despite some nearby results and proofs in the literature, the stronger result of convergence for all (Martin-Löf) random sequences remained open. Such a convergence result would be particularly interesting and natural, since randomness can be defined in terms of M itself. We show that there are universal semimeasures M which do not converge to µ on all µ-random sequences, i.e. we give a partial negative answer to the open problem. We also provide a positive answer for some non-universal semimeasures. We define the incomputable measure D as a mixture over all computable measures and the enumerable semimeasure W as a mixture over all enumerable nearly-measures. We show that W converges to D and D to µ on all random sequences. The Hellinger distance measuring closeness of two distributions plays a central role.
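For readers unfamiliar with the Hellinger distance mentioned above, here is a minimal sketch for two discrete distributions over the same finite alphabet. It uses the common normalised form, the square root of half the sum of squared differences of square roots; normalisation conventions vary, and the one used in the paper may differ. The function name `hellinger` is an illustrative choice.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions.

    p, q -- sequences of probabilities over the same finite alphabet.
    Uses the normalised convention, so the result lies in [0, 1]:
    0 for identical distributions, 1 for distributions with disjoint support.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(0.5 * squared)


if __name__ == "__main__":
    print(hellinger([0.5, 0.5], [0.5, 0.5]))  # identical distributions
    print(hellinger([1.0, 0.0], [0.0, 1.0]))  # disjoint support
    print(hellinger([0.9, 0.1], [0.6, 0.4]))
```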
Algorithmic complexity bounds on future prediction errors
 INFORMATION AND COMPUTATION
, 2007
Cited by 3 (3 self)
We bound the future loss when predicting any (computably) stochastic sequence online. Solomonoff finitely bounded the total deviation of his universal predictor M from the true distribution µ by the algorithmic complexity of µ. Here we assume that we are at a time t > 1 and have already observed x = x_1...x_t. We bound the future prediction performance on x_{t+1}x_{t+2}... by a new variant of algorithmic complexity of µ given x, plus the complexity of the randomness deficiency of x. The new complexity is monotone in its condition in the sense that this complexity can only decrease if the condition is prolonged. We also briefly discuss potential generalizations to Bayesian model classes and to classification problems.
Monotone conditional complexity bounds on future prediction errors
 In Proc. 16th International Conf. on Algorithmic Learning Theory (ALT’05), volume 3734 of LNAI
, 2005
Cited by 3 (3 self)
We bound the future loss when predicting any (computably) stochastic sequence online. Solomonoff finitely bounded the total deviation of his universal predictor M from the true distribution µ by the algorithmic complexity of µ. Here we assume we are at a time t > 1 and already observed x = x_1...x_t. We bound the future prediction performance on x_{t+1}x_{t+2}... by a new variant of algorithmic complexity of µ given x, plus the complexity of the randomness deficiency of x. The new complexity is monotone in its condition in the sense that this complexity can only decrease if the condition is prolonged. We also briefly discuss potential generalizations to Bayesian model classes and to classification problems. Keywords: Kolmogorov complexity, posterior bounds, online sequential prediction,
Universal convergence of semimeasures on individual random sequences
 In Proc. 15th Int. Conf. Algorithmic Learning Theory (ALT’04), LNAI
, 2004
Cited by 3 (2 self)
Solomonoff’s central result on induction is that the posterior of a universal semimeasure M converges rapidly and with probability 1 to the true sequence generating posterior µ, if the latter is computable. Hence, M is eligible as a universal sequence predictor in case of unknown µ. Despite some nearby results and proofs in the literature, the stronger result of convergence for all (Martin-Löf) random sequences remained open. Such a convergence result would be particularly interesting and natural, since randomness can be defined in terms of M itself. We show that there are universal semimeasures M which do not converge for all random sequences, i.e. we give a partial negative answer to the open problem. We also provide a positive answer for some non-universal semimeasures. We define the incomputable measure D as a mixture over all computable measures and the enumerable semimeasure W as a mixture over all enumerable nearly-measures. We show that W converges to D and D to µ on all random sequences. The Hellinger distance measuring closeness of two distributions plays a central role.