Results 1–10 of 18
Universal Algorithmic Intelligence: A mathematical top-down approach
Artificial General Intelligence, 2005. Cited by 22 (6 self).
Keywords: Artificial intelligence; algorithmic probability; sequential decision theory; rational …
On Universal Prediction and Bayesian Confirmation
Theoretical Computer Science, 2007. Cited by 22 (13 self).
The Bayesian framework is a well-studied and successful framework for inductive reasoning, which includes hypothesis testing and confirmation, parameter estimation, sequence prediction, classification, and regression. But standard statistical guidelines for choosing the model class and prior are not always available, or can fail, in particular in complex situations. Solomonoff completed the Bayesian framework by providing a rigorous, unique, formal, and universal choice for the model class and the prior. I discuss in breadth how, and in which sense, universal (non-i.i.d.) sequence prediction solves various (philosophical) problems of traditional Bayesian sequence prediction. I show that Solomonoff's model possesses many desirable properties: strong total and future bounds, and weak instantaneous bounds; in contrast to most classical continuous prior densities, it has no zero p(oste)rior problem, i.e. it can confirm universal hypotheses; it is reparametrization and regrouping invariant; and it avoids the old-evidence and updating problems. It even performs well …
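As a pointer for readers, here is the universal choice the abstract refers to, in standard algorithmic-information-theory notation (our summary, not text from the paper):

```latex
% Solomonoff's universal prior (standard definitions, our notation):
% a Bayes mixture over all lower semicomputable semimeasures, with
% complexity-based weights; U is a universal monotone Turing machine.
\[
  M(x) \;=\; \sum_{p\,:\,U(p)=x*} 2^{-\ell(p)}
       \;\stackrel{\times}{=}\; \sum_{\nu\in\mathcal{M}} 2^{-K(\nu)}\,\nu(x),
  \qquad
  M(x_t \mid x_{<t}) \;=\; \frac{M(x_{<t}x_t)}{M(x_{<t})}.
\]
```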
Adaptive online prediction by following the perturbed leader
Journal of Machine Learning Research, 2005. Cited by 21 (9 self).
When applying aggregating strategies to Prediction with Expert Advice, the learning rate must be adaptively tuned. The natural choice of √(complexity/current loss) renders the analysis of Weighted Majority derivatives quite complicated. In particular, for arbitrary weights there have been no results proven so far. The analysis of the alternative "Follow the Perturbed Leader" (FPL) algorithm from Kalai and Vempala (2003), based on Hannan's algorithm, is easier. We derive loss bounds for adaptive learning rate and both finite expert classes with uniform weights and countable expert classes with arbitrary weights. For the former setup, our loss bounds match the best known results so far, while for the latter our results are new.
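For orientation, a minimal runnable sketch of an FPL strategy of the kind the abstract analyzes, with exponentially distributed perturbations, a complexity penalty, and an illustrative adaptive learning rate; all names and the toy losses are ours, not the paper's:

```python
import numpy as np

def fpl_choose(cum_loss, complexity, t, rng):
    """One round of Follow the Perturbed Leader (FPL).

    cum_loss   : (n,) array, cumulative past losses of the n experts
    complexity : (n,) array, penalties k_i = -ln(w_i) from the prior weights
    t          : current round, 1-based; drives the adaptive learning rate
    """
    eta = 1.0 / np.sqrt(t)                    # adaptive rate (illustrative choice)
    q = rng.exponential(size=cum_loss.shape)  # i.i.d. Exp(1) perturbations
    # Follow the leader of the perturbed, complexity-penalized losses.
    return int(np.argmin(cum_loss + (complexity - q) / eta))

# Usage: two experts with uniform weights; expert 1 incurs less loss.
rng = np.random.default_rng(0)
cum_loss = np.zeros(2)
k = -np.log(np.array([0.5, 0.5]))             # uniform prior weights
for t in range(1, 201):
    i = fpl_choose(cum_loss, k, t, rng)       # expert we follow this round
    cum_loss += np.array([0.9, 0.1])          # observed per-round losses
```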
Sequence prediction based on monotone complexity
In Proc. 16th Annual Conference on Learning Theory (COLT'03), volume 2777 of LNAI, 2003. Cited by 14 (14 self).
This paper studies sequence prediction based on the monotone Kolmogorov complexity $Km = -\log m$, i.e. based on universal deterministic/one-part MDL. $m$ is extremely close to Solomonoff's prior $M$, the latter being an excellent predictor in deterministic as well as probabilistic environments, where performance is measured in terms of convergence of posteriors or losses. Despite this closeness to $M$, it is difficult to assess the prediction quality of $m$, since little is known about the closeness of their posteriors, which are the important quantities for prediction. We show that for deterministic computable environments, the "posterior" and losses of $m$ converge, but rapid convergence could only be shown on-sequence; the off-sequence behavior is unclear. In probabilistic environments, neither the posterior nor the losses converge, in general.
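For readers new to the notation, the standard definitions behind $Km$, $m$, and $M$ (our summary; the paper's formal setup may differ in details):

```latex
% Monotone complexity and its semimeasure (standard forms; U is a
% universal monotone machine, \ell(p) the program length):
\[
  Km(x) \;=\; \min\{\ell(p) : U(p) = x*\}, \qquad
  m(x) \;=\; 2^{-Km(x)}, \qquad
  M(x) \;\ge\; m(x).
\]
% One-part MDL prediction selects the next symbol maximizing m:
\[
  x_t^{\mathrm{MDL}} \;=\; \arg\max_{a}\, m(x_{<t}a).
\]
```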
A Philosophical Treatise of Universal Induction
Entropy, 2011. Cited by 11 (7 self).
Understanding inductive reasoning is a problem that has engaged mankind for thousands of years. It is relevant to a wide range of fields and is integral to the philosophy of science. It has been tackled by many great minds, from philosophers to scientists to mathematicians, and more recently computer scientists. In this article we argue the case for Solomonoff induction, a formal inductive framework which combines algorithmic information theory with the Bayesian framework. Although it achieves excellent theoretical results and is based on solid philosophical foundations, the technical knowledge necessary for understanding this framework has caused it to remain largely unknown and unappreciated in the wider scientific community. The main contribution of this article is to convey Solomonoff induction and its related concepts in a generally accessible form, with the aim of bridging this technical gap. In the process we examine the major historical contributions that led to the formulation of Solomonoff induction, as well as criticisms of Solomonoff induction and of induction in general. In particular we examine how Solomonoff induction addresses many issues that have plagued other inductive systems, such as the black ravens paradox and the confirmation problem, and we compare this approach with other recent approaches.
Prediction with expert advice by following the perturbed leader for general weights
Springer-Verlag Berlin Heidelberg, 2004. Cited by 10 (7 self).
When applying aggregating strategies to Prediction with Expert Advice, the learning rate must be adaptively tuned. The natural choice of √(complexity/current loss) renders the analysis of Weighted Majority derivatives quite complicated. In particular, for arbitrary weights there have been no results proven so far. The analysis of the alternative "Follow the Perturbed Leader" (FPL) algorithm from [KV03] (based on Hannan's algorithm) is easier. We derive loss bounds for adaptive learning rate and both finite expert classes with uniform weights and countable expert classes with arbitrary weights. For the former setup, our loss bounds match the best known results so far, while for the latter our results are (to our knowledge) new.
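As a complement to the code sketch above, the shape of the weighted setting the abstract describes, in our hedged paraphrase (the paper's exact bounds differ in constants and lower-order terms):

```latex
% FPL with general weights (our paraphrase): expert i has prior weight
% w^i with \sum_i w^i \le 1 and complexity k^i = -\ln w^i. At step t,
\[
  I_t \;=\; \arg\min_i \Big( s^i_{1:t-1} + \frac{k^i - q^i}{\eta_t} \Big),
  \qquad q^i \sim \mathrm{Exp}(1) \text{ i.i.d.},
\]
% and with an adaptively tuned \eta_t the regret against each expert i
% in a countable class grows roughly like \sqrt{k^i \cdot \text{loss}},
% up to lower-order terms.
```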
Optimality of Universal Bayesian Sequence Prediction for General Loss and Alphabet
2002. Cited by 6 (1 self).
The Bayesian framework is ideally suited for induction problems. The probability of observing $x_t$ at time $t$, given past observations $x_1 \ldots x_{t-1}$, can be computed with Bayes' rule if the true generating distribution $\mu$ of the sequences $x_1 x_2 x_3 \ldots$ is known. The problem, however, is that in many cases one does not even have a reasonable guess of the true distribution. In order to overcome this problem, a universal (or mixture) distribution $\xi$ is defined as a weighted sum or integral of distributions $\nu \in \mathcal{M}$, where $\mathcal{M}$ is any countable or continuous set of distributions including $\mu$. This is a generalization of Solomonoff induction, in which $\mathcal{M}$ is the set of all enumerable semimeasures. It is shown for several performance measures that using the universal $\xi$ as a prior is nearly as good as using the unknown true distribution $\mu$. In a sense, this solves the problem of the unknown prior in a universal way. All results are obtained for a general finite alphabet. Convergence of $\xi$ to $\mu$ in a conditional mean squared sense, and of $\xi/\mu \to 1$ with $\mu$-probability 1, is proven. The number of additional errors $E_\xi$ made by the optimal universal prediction scheme based on $\xi$, minus the number of errors $E_\mu$ of the optimal informed prediction scheme based on $\mu$, is proven to be bounded by $O(\sqrt{E_\mu})$. The prediction framework is generalized to arbitrary loss functions: a system is allowed to take an action $y_t$, given $x_1 \ldots x_{t-1}$, and receives loss $\ell_{x_t y_t}$ if $x_t$ is the next symbol of the sequence. No assumptions on $\ell$ are necessary besides boundedness. Optimal universal $\Lambda_\xi$ and optimal informed $\Lambda_\mu$ prediction schemes are defined, and the total loss of $\Lambda_\xi$ is bounded in terms of the total loss of $\Lambda_\mu$, similar to the error bounds. We show that the bounds are tight and that no other predictor can lead to smaller bounds. Furthermore, for various performance measures we show Pareto-optimality of $\xi$, in the sense that there is no other predictor which performs better than or equal to $\xi$ in all environments $\nu \in \mathcal{M}$ and strictly better in at least one. So optimal predictors can (w.r.t. most performance measures, in expectation) be based on the mixture $\xi$. Finally, we give an Occam's razor argument that Solomonoff's choice $w_\nu \sim 2^{-K(\nu)}$ for the weights is optimal, where $K(\nu)$ is the length of the shortest program describing $\nu$. Furthermore, games of chance, defined as sequences of bets, observations, and rewards, are studied. The average profit achieved by the $\Lambda_\xi$ scheme rapidly converges to the best possible profit. The time needed to reach the winning zone is proportional to the relative entropy of $\mu$ and $\xi$. The prediction schemes presented here are compared to the weighted majority algorithm(s). Although the algorithms, the settings, and the proofs are quite different, the bounds of both schemes have a very similar structure. Extensions to infinite alphabets; partial, delayed, and probabilistic prediction; classification; and more active systems are briefly discussed.
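To make the mixture construction concrete, a minimal sketch for a finite class $\mathcal{M}$ of Bernoulli environments; the class, the weights, and the 0/1 loss are toy choices of ours, not the paper's general setting:

```python
import numpy as np

# Toy model class M: Bernoulli(theta) environments over a binary alphabet,
# with prior weights w_nu summing to 1. xi is the Bayes mixture.
thetas = np.array([0.2, 0.5, 0.8])
w = np.array([1/3, 1/3, 1/3])           # prior weights w_nu

def xi_predict(x_past, w, thetas):
    """Mixture probability xi(x_t = 1 | x_<t) via Bayes' rule."""
    ones = sum(x_past)
    zeros = len(x_past) - ones
    lik = thetas**ones * (1 - thetas)**zeros   # likelihood under each nu
    post = w * lik
    post /= post.sum()                         # posterior weights
    return post @ thetas                       # sum_nu post_nu * nu(1)

# Optimal action Lambda_xi for 0/1 loss: predict the more probable symbol.
rng = np.random.default_rng(0)
mu = 0.8                                       # true environment (in M)
x, errors = [], 0
for t in range(1000):
    guess = int(xi_predict(x, w, thetas) >= 0.5)   # loss-minimizing action
    nxt = int(rng.random() < mu)                   # next symbol from mu
    errors += int(guess != nxt)
    x.append(nxt)
```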
Sequential predictions based on algorithmic complexity
2004. Cited by 6 (6 self).
This paper studies sequence prediction based on the monotone Kolmogorov complexity $Km = -\log m$, i.e. based on universal deterministic/one-part MDL. $m$ is extremely close to Solomonoff's universal prior $M$, the latter being an excellent predictor in deterministic as well as probabilistic environments, where performance is measured in terms of convergence of posteriors or losses. Despite this closeness to $M$, it is difficult to assess the prediction quality of $m$, since little is known about the closeness of their posteriors, which are the important quantities for prediction. We show that for deterministic computable environments, the "posterior" and losses of $m$ converge, but rapid convergence could only be shown on-sequence; the off-sequence convergence can be slow. In probabilistic environments, neither the posterior nor the losses converge, in general.
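Spelled out, the on-/off-sequence distinction the abstract uses (our paraphrase in standard notation):

```latex
% In a deterministic computable environment generating x_1 x_2 x_3 ...,
% on-sequence convergence means the probability of the true next symbol
% tends to one:
\[
  m(x_t \mid x_{<t}) \;\to\; 1,
\]
% while off-sequence behavior concerns the competing symbols,
\[
  m(a \mid x_{<t}) \;\to\; 0 \quad \text{for } a \neq x_t,
\]
% whose convergence can be slow even when on-sequence convergence is fast.
```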
Strong asymptotic assertions for discrete MDL in regression and classification
In Benelearn 2005 (Ann. Machine Learning Conf. of Belgium and the Netherlands), 2005. Cited by 6 (2 self).
We study the properties of the MDL (or maximum penalized complexity) estimator for regression and classification, where the underlying model class is countable. We show in particular a finite bound on the Hellinger losses under the sole assumption that there is a "true" model contained in the class. This implies almost sure convergence of the predictive distribution to the true one at a fast rate. It corresponds to Solomonoff's central theorem of universal induction, albeit with a bound that is exponentially larger.
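For reference, the standard form of the discrete MDL estimator the abstract studies (our notation; the paper's exact statement may differ):

```latex
% Discrete MDL (maximum penalized complexity) over a countable class
% \mathcal{M} with prior weights w_\nu (standard form; our notation):
\[
  \hat{\nu}_t \;=\; \arg\max_{\nu\in\mathcal{M}} w_\nu\,\nu(x_{1:t})
           \;=\; \arg\min_{\nu\in\mathcal{M}} \big( -\log\nu(x_{1:t}) + k_\nu \big),
  \qquad k_\nu = -\log w_\nu.
\]
% Flavor of the result: if the true \mu lies in \mathcal{M}, the total
% Hellinger loss of the MDL predictive distribution is finite, giving
% fast almost-sure convergence to \mu.
```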