Results 1–10 of 22
The Speed Prior: A New Simplicity Measure Yielding Near-Optimal Computable Predictions
Proceedings of the 15th Annual Conference on Computational Learning Theory (COLT 2002), Lecture Notes in Artificial Intelligence, 2002
Cited by 63 (21 self)

Abstract
Solomonoff's optimal but noncomputable method for inductive inference assumes that observation sequences x are drawn from a recursive prior distribution p(x). Instead of using the unknown p(), he predicts using the celebrated universal enumerable prior M(), which for all x exceeds any recursive p(x), save for a constant factor independent of x. The simplicity measure M() naturally implements "Occam's razor" and is closely related to the Kolmogorov complexity of x. However, M assigns high probability to certain data that are extremely hard to compute. This does not match our intuitive notion of simplicity. Here we suggest a more plausible measure derived from the fastest way of computing data. In the absence of contrarian evidence, we assume that the physical world is generated by a computational process, and that any possibly infinite sequence of observations is therefore computable in the limit (this assumption is more radical and stronger than Solomonoff's).
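The contrast between M-style weighting and a runtime-penalized weighting can be made concrete with a deliberately artificial toy machine. Everything below (the `run` rule, the length encoding, the 1/time penalty) is invented for illustration and is not Schmidhuber's actual Speed Prior construction; it only shows how charging for computation time can reverse the ranking of a short-but-slow program and a longer-but-fast one.

```python
from fractions import Fraction

# Toy contrast between an M-style weight 2^{-l(p)} and a weight that also
# penalizes computation time. The machine, the length encoding, and the
# 1/time penalty are all invented for illustration only.

def run(seed, steps):
    """Return (output, time): each step appends a parity bit and is charged
    time proportional to the current output length."""
    out, time = seed, len(seed)
    for _ in range(steps):
        out += str(out.count("1") % 2)
        time += len(out)
    return out, time

def program_length(seed, steps):
    # crude self-delimiting code: seed bits plus a length-prefixed step count
    return len(seed) + 2 * steps.bit_length() + 1

def m_weight(seed, steps):
    return Fraction(1, 2 ** program_length(seed, steps))

def s_weight(seed, steps):
    _, time = run(seed, steps)
    return m_weight(seed, steps) / time   # extra penalty for slow programs

slow = ("1", 400)      # tiny description, lots of computation
fast = ("1" * 20, 0)   # bigger description, halts immediately

print(m_weight(*slow) > m_weight(*fast))  # True: M-style favors the short program
print(s_weight(*slow) < s_weight(*fast))  # True: time penalty reverses the order
```

The toy captures only the qualitative point of the abstract: data that is cheap to describe but expensive to compute receives high M-style weight, while a time-penalized measure demotes it.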
Universal Algorithmic Intelligence: A mathematical top-down approach
Artificial General Intelligence, 2005
Cited by 31 (7 self)

Abstract
Artificial intelligence; algorithmic probability; sequential decision theory; rational
Convergence and Loss Bounds for Bayesian Sequence Prediction
2003
Cited by 22 (21 self)

Abstract
The probability of observing $x_t$ at time $t$, given past observations $x_1...x_{t-1}$, can be computed with Bayes' rule if the true generating distribution $\mu$ of the sequences $x_1x_2x_3...$ is known. If $\mu$ is unknown but known to belong to a class $M$, one can base one's prediction on the Bayes mix $\xi$, defined as a weighted sum of the distributions $\nu\in M$. Various convergence results of the mixture posterior $\xi_t$ to the true posterior $\mu_t$ are presented. In particular, a new (elementary) derivation of the convergence $\xi_t/\mu_t\to 1$ is provided, which additionally gives the rate of convergence. A general sequence predictor is allowed to choose an action $y_t$ based on $x_1...x_{t-1}$ and receives loss $\ell_{x_t y_t}$ if $x_t$ is the next symbol of the sequence. No assumptions are made on the structure of $\ell$ (apart from being bounded) and $M$. The Bayes-optimal prediction scheme $\Lambda_\xi$ based on the mixture $\xi$ and the Bayes-optimal informed prediction scheme $\Lambda_\mu$ are defined, and the total loss $L_\xi$ of $\Lambda_\xi$ is bounded in terms of the total loss $L_\mu$ of $\Lambda_\mu$. It is shown that $L_\xi$ is bounded for bounded $L_\mu$ and $L_\xi/L_\mu\to 1$ for $L_\mu\to\infty$. Convergence of the instantaneous losses is also proven.
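The Bayes mix and its posterior update can be sketched for a finite class of Bernoulli distributions. The class $M$, the uniform prior weights, the true parameter 0.7, and the sample size are illustrative choices, not taken from the paper; the sketch only shows the posterior of $\xi$ concentrating so that its prediction approaches $\mu$'s, in the spirit of $\xi_t/\mu_t\to 1$.

```python
import random

# Bayes mixture xi over a finite class M of Bernoulli(theta) distributions.
# The true distribution mu is Bernoulli(0.7); xi's weights are updated with
# Bayes' rule after each observed bit. All concrete values are toy choices.

thetas = [0.1, 0.3, 0.5, 0.7, 0.9]            # the class M
weights = [1.0 / len(thetas)] * len(thetas)   # uniform prior weights
mu_theta = 0.7                                 # true generating parameter

random.seed(0)

def xi_prob_of_one(w):
    """Mixture probability that the next bit is 1."""
    return sum(wi * t for wi, t in zip(w, thetas))

for t in range(2000):
    x = 1 if random.random() < mu_theta else 0
    # posterior update: w_i <- w_i * P_i(x), then renormalize
    likelihoods = [th if x == 1 else 1 - th for th in thetas]
    weights = [wi * li for wi, li in zip(weights, likelihoods)]
    z = sum(weights)
    weights = [wi / z for wi in weights]

# xi's predictive probability approaches mu's as data accumulates
ratio = xi_prob_of_one(weights) / mu_theta
print(round(ratio, 3))
```

After 2000 bits the posterior mass sits almost entirely on the true parameter, so the ratio of the two predictive probabilities is close to 1.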
The New AI: General & Sound & Relevant for Physics
ARTIFICIAL GENERAL INTELLIGENCE (ACCEPTED 2002), 2003
Cited by 18 (9 self)

Abstract
Most traditional artificial intelligence (AI) systems of the past 50 years are either very limited, or based on heuristics, or both. The new millennium, however, has brought substantial progress in the field of theoretically optimal and practically feasible algorithms for prediction, search, inductive inference based on Occam’s razor, problem solving, decision making, and reinforcement learning in environments of a very general type. Since inductive inference is at the heart of all inductive sciences, some of the results are relevant not only for AI and computer science but also for physics, provoking nontraditional predictions based on Zuse’s thesis of the computer-generated universe.
Sequence prediction based on monotone complexity
In Proc. 16th Annual Conference on Learning Theory (COLT’03), volume 2777 of LNAI, 2003
Cited by 14 (14 self)

Abstract
This paper studies sequence prediction based on the monotone Kolmogorov complexity Km = −log m, i.e. based on universal deterministic/one-part MDL. m is extremely close to Solomonoff’s prior M, the latter being an excellent predictor in deterministic as well as probabilistic environments, where performance is measured in terms of convergence of posteriors or losses. Despite this closeness to M, it is difficult to assess the prediction quality of m, since little is known about the closeness of their posteriors, which are the important quantities for prediction. We show that for deterministic computable environments, the “posterior” and losses of m converge, but rapid convergence could only be shown on-sequence; the off-sequence behavior is unclear. In probabilistic environments, neither the posterior nor the losses converge, in general.
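Since Km is incomputable, any executable illustration of one-part MDL prediction must substitute a computable proxy. The sketch below uses zlib's compressed length as a crude, clearly labeled stand-in for Km and predicts the next symbol that minimizes it; the proxy, the binary alphabet, and the test sequence are assumptions for illustration, not the paper's construction.

```python
import zlib

# One-part-MDL-flavored prediction: choose the continuation of smallest
# description length. Km is incomputable, so zlib's compressed length is
# used here as a crude computable proxy (an assumption for illustration).

def mdl_predict(x: bytes, alphabet=(b"0", b"1")) -> bytes:
    """Predict the next symbol of x by minimizing compressed length."""
    return min(alphabet,
               key=lambda b: len(zlib.compress(x + b, 9)))

seq = b"01" * 50           # a deterministic, highly regular sequence
pred = mdl_predict(seq)     # the pattern-continuing symbol compresses best
print(pred)
```

On a strictly periodic sequence the regular continuation never compresses worse than a pattern break, which is the deterministic on-sequence regime where the abstract reports convergence.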
On the Existence and Convergence of Computable Universal Priors
In Proc. 14th International Conf. on Algorithmic Learning Theory (ALT-2003), volume 2842 of LNAI, 2003
Cited by 9 (7 self)

Abstract
Solomonoff unified Occam's razor and Epicurus' principle of multiple explanations into one elegant, formal, universal theory of inductive inference, which initiated the field of algorithmic information theory. His central result is that the posterior of his universal semimeasure M converges rapidly to the true sequence-generating posterior μ, if the latter is computable. Hence, M is eligible as a universal predictor in case of unknown μ. We investigate the existence and convergence of computable universal (semi)measures for a hierarchy of computability classes: finitely computable, estimable, enumerable, and approximable. For instance, M is known...
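The four computability classes named in the abstract are standard notions; a compact statement (paraphrased from the algorithmic-information-theory literature, not quoted from the paper) is:

```latex
\begin{itemize}
  \item \emph{finitely computable}: a Turing machine outputs $f(x)$ exactly and halts;
  \item \emph{estimable}: some computable $g(x,\varepsilon)$ satisfies
        $|g(x,\varepsilon)-f(x)|<\varepsilon$ for every rational $\varepsilon>0$;
  \item \emph{enumerable} (lower semicomputable): a computable sequence
        $f_t(x)$ converges to $f(x)$ monotonically from below;
  \item \emph{approximable}: a computable sequence $f_t(x)\to f(x)$ converges,
        but not necessarily monotonically and with no effective error bound.
\end{itemize}
```

Each class strictly contains the previous one, which is what makes the hierarchy a natural axis along which to ask where computable universal (semi)measures can exist.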
Open problems in universal induction & intelligence
Algorithms, 2009
Optimality of Universal Bayesian Sequence Prediction for General Loss and Alphabet
2002
Cited by 6 (1 self)

Abstract
The Bayesian framework is ideally suited for induction problems. The probability of observing $x_t$ at time $t$, given past observations $x_1...x_{t-1}$, can be computed with Bayes' rule if the true generating distribution $\mu$ of the sequences $x_1x_2x_3...$ is known. The problem, however, is that in many cases one does not even have a reasonable guess of the true distribution. In order to overcome this problem, a universal (or mixture) distribution $\xi$ is defined as a weighted sum or integral of distributions $\nu\in\mathcal{M}$, where $\mathcal{M}$ is any countable or continuous set of distributions including $\mu$. This is a generalization of Solomonoff induction, in which $\mathcal{M}$ is the set of all enumerable semimeasures. It is shown for several performance measures that using the universal $\xi$ as a prior is nearly as good as using the unknown true distribution $\mu$. In a sense, this solves the problem of the unknown prior in a universal way. All results are obtained for a general finite alphabet. Convergence of $\xi$ to $\mu$ in a conditional mean squared sense and of $\xi/\mu\to 1$ with $\mu$-probability $1$ is proven. The number of additional errors $E_\xi$ made by the optimal universal prediction scheme based on $\xi$, minus the number of errors $E_\mu$ of the optimal informed prediction scheme based on $\mu$, is proven to be bounded by $O(\sqrt{E_\mu})$. The prediction framework is generalized to arbitrary loss functions. A system is allowed to take an action $y_t$, given $x_1...x_{t-1}$, and receives loss $\ell_{x_t y_t}$ if $x_t$ is the next symbol of the sequence. No assumptions on $\ell$ are necessary besides boundedness. The optimal universal $\Lambda_\xi$ and optimal informed $\Lambda_\mu$ prediction schemes are defined, and the total loss of $\Lambda_\xi$ is bounded in terms of the total loss of $\Lambda_\mu$, similar to the error bounds. We show that the bounds are tight and that no other predictor can lead to smaller bounds.
Furthermore, for various performance measures we show Pareto-optimality of $\xi$ in the sense that there is no other predictor which performs better than or equal to it in all environments $\nu\in\mathcal{M}$ and strictly better in at least one. So optimal predictors can (w.r.t. most performance measures in expectation) be based on the mixture $\xi$. Finally, we give an Occam's razor argument that Solomonoff's choice $w_\nu\sim 2^{-K(\nu)}$ for the weights is optimal, where $K(\nu)$ is the length of the shortest program describing $\nu$. Furthermore, games of chance, defined as a sequence of bets, observations, and rewards, are studied. The average profit achieved by the $\Lambda_\xi$ scheme rapidly converges to the best possible profit. The time needed to reach the winning zone is proportional to the relative entropy of $\mu$ and $\xi$. The prediction schemes presented here are compared to the weighted majority algorithm(s). Although the algorithms, the settings, and the proofs are quite different, the bounds of both schemes have a very similar structure. Extensions to infinite alphabets; partial, delayed, and probabilistic prediction; classification; and more active systems are briefly discussed.
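The schemes $\Lambda_\xi$ and $\Lambda_\mu$ with an arbitrary bounded loss matrix can be sketched on binary sequences. The class $\mathcal{M}$ of three Bernoulli parameters, the particular loss values, and the horizon are toy assumptions; the sketch only illustrates that the mixture-based scheme's total loss tracks the informed scheme's, consistent with $L_\xi/L_\mu\to 1$.

```python
import random

# Bayes-optimal schemes Lambda_xi (mixture-based) and Lambda_mu (informed)
# for binary sequences and an arbitrary bounded loss matrix loss[(x, y)]:
# the loss incurred by action y when the next symbol turns out to be x.
# The class M, the weights, and the loss values are toy assumptions.

random.seed(1)
thetas = [0.2, 0.5, 0.8]                  # class M of Bernoulli parameters
w = [1/3, 1/3, 1/3]                        # prior weights of the mixture xi
mu = 0.8                                   # true parameter (contained in M)
loss = {(0, 0): 0.0, (0, 1): 1.0,          # bounded, otherwise arbitrary
        (1, 0): 0.3, (1, 1): 0.0}

def best_action(p1):
    """Action minimizing expected loss when P(next bit = 1) = p1."""
    exp_loss = {y: (1 - p1) * loss[(0, y)] + p1 * loss[(1, y)] for y in (0, 1)}
    return min(exp_loss, key=exp_loss.get)

L_xi = L_mu = 0.0
for t in range(5000):
    x = 1 if random.random() < mu else 0
    p_xi = sum(wi * th for wi, th in zip(w, thetas))  # xi's prediction
    L_xi += loss[(x, best_action(p_xi))]               # Lambda_xi acts on xi
    L_mu += loss[(x, best_action(mu))]                 # Lambda_mu acts on mu
    lik = [th if x else 1 - th for th in thetas]       # Bayes update of xi
    w = [wi * li for wi, li in zip(w, lik)]
    s = sum(w)
    w = [wi / s for wi in w]

print(L_xi, L_mu)   # total losses stay close, as the bounds predict
```

After the mixture's posterior concentrates on the true parameter, both schemes choose identical actions, so the whole gap between $L_\xi$ and $L_\mu$ comes from a short initial phase.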
Sequential predictions based on algorithmic complexity
2004
Cited by 6 (6 self)

Abstract
This paper studies sequence prediction based on the monotone Kolmogorov complexity Km = −log m, i.e. based on universal deterministic/one-part MDL. m is extremely close to Solomonoff’s universal prior M, the latter being an excellent predictor in deterministic as well as probabilistic environments, where performance is measured in terms of convergence of posteriors or losses. Despite this closeness to M, it is difficult to assess the prediction quality of m, since little is known about the closeness of their posteriors, which are the important quantities for prediction. We show that for deterministic computable environments, the “posterior” and losses of m converge, but rapid convergence could only be shown on-sequence; the off-sequence convergence can be slow. In probabilistic environments, neither the posterior nor the losses converge, in general.