Results 11–20 of 49
Bias-Optimal Incremental Problem Solving
In Advances in Neural Information Processing Systems 15, 2003
Cited by 14 (8 self)
Abstract:
Given is a problem sequence and a probability distribution (the bias) on programs computing solution candidates. We present an optimally fast way of incrementally solving each task in the sequence. Bias shifts are computed by program prefixes that modify the distribution on their suffixes by reusing successful code for previous tasks (stored in non-modifiable memory). No tested program gets more runtime than its probability times the total search time. In illustrative experiments, ours becomes the first general system to learn a universal solver for arbitrary disk Towers of Hanoi tasks (minimal solution size 2^n − 1). It demonstrates the advantages of incremental learning by profiting from previously solved, simpler tasks involving samples of a simple context-free language.
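The runtime guarantee in this abstract (no tested program gets more runtime than its probability times the total search time) is the hallmark of Levin-style search, which the paper builds on. A minimal sketch of that allocation scheme, where the phase-doubling loop and the toy "programs" with made-up success thresholds are illustrative assumptions, not the paper's actual system:

```python
def levin_style_search(programs, solves, max_phase=25):
    """Phase-based search: in phase k (total budget 2**k steps), each
    program p runs for about prob(p) * 2**k steps, so over the whole
    search no program receives more than roughly its probability times
    the total time spent.

    programs: list of (name, probability) pairs.
    solves:   callable(name, steps) -> True if the program solves the
              task when given `steps` steps in a single run.
    """
    total_spent = 0
    for phase in range(max_phase):
        budget = 2 ** phase
        for name, prob in programs:
            steps = max(1, int(prob * budget))
            total_spent += steps
            if solves(name, steps):
                return name, total_spent
    return None, total_spent

# Toy illustration: each "program" succeeds once a single run gives it
# enough steps (the thresholds are invented for the example).
progs = [("fast", 0.5), ("medium", 0.3), ("slow", 0.2)]
need = {"fast": 40, "medium": 10, "slow": 1000}
winner, spent = levin_style_search(progs, lambda p, s: s >= need[p])
```

Note how the cheap-but-likely and expensive-but-unlikely candidates are interleaved rather than tried to completion one by one; the winner is whichever first fits inside its probability-weighted share of a phase.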
A Monte Carlo AIXI Approximation
In J. Artif. Intell. Res.
Cited by 13 (6 self)
Abstract:
This paper describes a computationally feasible approximation to the AIXI agent, a universal reinforcement learning agent for arbitrary environments. AIXI is scaled down in two key ways: First, the class of environment models is restricted to all prediction suffix trees of a fixed maximum depth. This allows a Bayesian mixture of environment models to be computed in time proportional to the logarithm of the size of the model class. Secondly, the finite-horizon expectimax search is approximated by an asymptotically convergent Monte Carlo Tree Search technique. This scaled-down AIXI agent is empirically shown to be effective on a wide class of toy problem domains, ranging from simple fully observable games to small POMDPs. We explore the limits of this approximate agent and propose a general heuristic framework for scaling this technique to much larger problems.
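The "Bayesian mixture of environment models" can be illustrated with a much-simplified stand-in: instead of the paper's context-tree weighting mixture over all prediction suffix trees (which a recursive tree computation evaluates in logarithmic time), mix a handful of fixed-order Markov models, each predicting with the KT estimator, and update the model weights by Bayes' rule after every bit. Everything below is this flat simplification, not the paper's algorithm:

```python
def run_mixture(bits, max_depth=3):
    """Sequential Bayesian mixture over Markov models of order 0..max_depth.
    Each model predicts with the KT estimator on its own context counts;
    model weights are updated by Bayes' rule after every bit.
    Returns the predictive probability assigned to each observed bit."""
    counts = [dict() for _ in range(max_depth + 1)]   # per-model context counts
    weights = [1.0 / (max_depth + 1)] * (max_depth + 1)
    assigned = []
    for t, b in enumerate(bits):
        # each model's KT probability that the next bit is 1
        preds = []
        for d in range(max_depth + 1):
            ctx = tuple(bits[max(0, t - d):t])
            zeros, ones = counts[d].get(ctx, (0, 0))
            preds.append((ones + 0.5) / (zeros + ones + 1.0))
        p1 = sum(w * p for w, p in zip(weights, preds))
        assigned.append(p1 if b == 1 else 1.0 - p1)
        # Bayes update of the model weights, then of each model's counts
        posterior = [w * (p if b == 1 else 1.0 - p)
                     for w, p in zip(weights, preds)]
        norm = sum(posterior)
        weights = [w / norm for w in posterior]
        for d in range(max_depth + 1):
            ctx = tuple(bits[max(0, t - d):t])
            zeros, ones = counts[d].get(ctx, (0, 0))
            counts[d][ctx] = (zeros + (b == 0), ones + (b == 1))
    return assigned

# On an alternating sequence the order >= 1 models quickly dominate the
# mixture, and the probability assigned to each observed bit approaches 1.
probs = run_mixture([0, 1] * 50)
```

The same posterior-reweighting idea is what CTW performs over the exponentially large class of suffix trees; the restriction to a fixed maximum depth is exactly what makes that class finite.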
A Monte-Carlo AIXI Approximation
2009
Cited by 12 (5 self)
Abstract:
This paper describes a computationally feasible approximation to the AIXI agent, a universal reinforcement learning agent for arbitrary environments. AIXI is scaled down in two key ways: First, the class of environment models is restricted to all prediction suffix trees of a fixed maximum depth. This allows a Bayesian mixture of environment models to be computed in time proportional to the logarithm of the size of the model class. Secondly, the finite-horizon expectimax search is approximated by an asymptotically convergent Monte Carlo Tree Search technique. This scaled-down AIXI agent is empirically shown to be effective on a wide class of toy problem domains, ranging from simple fully observable games to small POMDPs. We explore the limits of this approximate agent and propose a general heuristic framework for scaling this technique to much larger problems.
On the foundations of universal sequence prediction
In Proc. 3rd Annual Conference on Theory and Applications of Models of Computation (TAMC’06), volume 3959 of LNCS, 2006
Cited by 11 (4 self)
Abstract:
Solomonoff completed the Bayesian framework by providing a rigorous, unique, formal, and universal choice for the model class and the prior. We discuss in breadth how and in which sense universal (non-i.i.d.) sequence prediction solves various (philosophical) problems of traditional Bayesian sequence prediction. We show that Solomonoff’s model possesses many desirable properties: fast convergence and strong bounds; in contrast to most classical continuous prior densities it has no zero p(oste)rior problem, i.e. it can confirm universal hypotheses; it is reparametrization and regrouping invariant; and it avoids the old-evidence and updating problem. It even performs well (actually better) in non-computable environments.
Progress in Incremental Machine Learning
2003
Cited by 10 (4 self)
Abstract:
We will describe recent developments in a system for machine learning that we've been working on for some time (Sol 86, Sol 89). It is meant to be a "Scientist's Assistant" of great power and versatility in many areas of science and mathematics. It differs from other ambitious work in this area in that we are not so much interested in knowledge itself as we are in how it is acquired: how machines may learn. To start off, the system will learn to solve two very general kinds of problems. Most, but perhaps not all, problems in science and engineering are of these two kinds.
Feature Markov Decision Processes
Cited by 6 (5 self)
Abstract:
General-purpose intelligent learning agents cycle through (complex, non-MDP) sequences of observations, actions, and rewards. On the other hand, reinforcement learning is well-developed for small finite-state Markov Decision Processes (MDPs). So far it is an art performed by human designers to extract the right state representation out of the bare observations, i.e. to reduce the agent setup to the MDP framework. Before we can think of mechanizing this search for suitable MDPs, we need a formal objective criterion. The main contribution of this article is to develop such a criterion. I also integrate the various parts into one learning algorithm. Extensions to more realistic dynamic Bayesian networks are developed in the companion article [Hut09].
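The flavor of such a criterion can be illustrated in toy form: score a candidate state map by the code length of the observation stream under the empirical model the map induces, so that maps exposing the right structure make the future cheap to code. The coding scheme and the toy maps below are invented stand-ins, not the article's actual cost function (which also codes rewards and penalizes map complexity):

```python
from collections import defaultdict
from math import log2

def code_length_bits(pairs):
    """Sequentially code each next symbol given the current abstract
    state, using Laplace-smoothed empirical counts; return total bits.
    pairs: iterable of (state, next_symbol)."""
    counts = defaultdict(lambda: defaultdict(int))
    bits = 0.0
    for s, x in pairs:
        total = sum(counts[s].values())
        k = len(counts[s]) + 1  # crude estimate of the alphabet size
        p = (counts[s][x] + 1.0) / (total + k)
        bits += -log2(p)
        counts[s][x] += 1
    return bits

# Toy comparison: observations alternate 0,1,0,1,...  A state map that
# remembers the last observation makes the stream nearly free to code;
# one that forgets everything pays about one bit per step.
obs = [0, 1] * 100
remember = [(obs[t], obs[t + 1]) for t in range(len(obs) - 1)]
forget = [("*", obs[t + 1]) for t in range(len(obs) - 1)]
```

Minimizing such a code length over candidate maps is one concrete way to make "extract the right state representation" an optimization problem rather than an art.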
Adaptive Online Time Allocation to Search Algorithms
In Machine Learning: ECML 2004, Proceedings of the 15th European Conference on Machine Learning, 2004
Cited by 5 (5 self)
Abstract:
Given is a search problem or a sequence of search problems, as well as a set of potentially useful search algorithms. We propose a general framework for online allocation of computation time to search algorithms based on experience with their performance so far. In an example instantiation, we use simple linear extrapolation of performance for allocating time to various simultaneously running genetic algorithms characterized by different parameter values. Despite the large number of searchers tested in parallel, on various tasks this rather general approach compares favorably to a more specialized state-of-the-art heuristic; in one case it is nearly two orders of magnitude faster.
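The "simple linear extrapolation" instantiation can be sketched directly: keep a (time, score) history per searcher and hand each time slice to the one whose extrapolated score one slice ahead is highest. The greedy slice loop and the toy score curves below are assumptions for illustration, not the paper's exact allocator:

```python
def allocate_slices(searchers, n_slices, slice_len=1.0):
    """Greedy adaptive allocation: after one warm-up slice per searcher,
    each further slice goes to the searcher whose linearly extrapolated
    score one slice ahead is highest.

    searchers: dict name -> callable(total_time) -> best score found when
    the searcher has received total_time units in total (assumed monotone;
    a stand-in for actually running the algorithms)."""
    spent = {name: 0.0 for name in searchers}
    history = {name: [] for name in searchers}   # (total_time, score)
    for name, run in searchers.items():          # warm-up slice each
        spent[name] += slice_len
        history[name].append((spent[name], run(spent[name])))
    for _ in range(n_slices):
        def predicted(name):
            pts = history[name]
            if len(pts) < 2:                     # no slope yet
                return pts[-1][1]
            (t0, s0), (t1, s1) = pts[-2], pts[-1]
            return s1 + (s1 - s0) / (t1 - t0) * slice_len
        best = max(searchers, key=predicted)
        spent[best] += slice_len
        history[best].append((spent[best], searchers[best](spent[best])))
    return spent

# Toy "searchers" whose best-so-far score grows at different rates:
# the allocator should starve the slow one.
fast = lambda t: min(1.0, 0.3 * t)
slow = lambda t: min(1.0, 0.05 * t)
spent = allocate_slices({"fast": fast, "slow": slow}, n_slices=10)
```

A flat slope (a plateaued searcher) drives its prediction down toward its current score, which is how poorly performing searchers lose their share of future time.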
The Universal Distribution and Machine Learning
In The Computer Journal, 2003
Cited by 4 (1 self)
Abstract:
I will discuss two main topics in this lecture: First, the Universal Distribution and some of its properties: its accuracy, its incomputability, its subjectivity. Secondly, I’m going to tell how to use this distribution to create very intelligent machines. Many years ago, in 1960, I discovered what we now call the Universal Probability Distribution (Sol 60). It is the probability distribution on all possible output strings of a universal computer with random input. It seemed to solve all kinds of prediction problems and resolve serious difficulties in the foundations of Bayesian statistics. Suppose we have a string, x, and we want to know its universal probability with respect to machine M. There will be many inputs to machine M that will give x as output. Say s_i is the i-th such input. If s_i is of length L(s_i) bits, the probability that a random binary input would be s_i is just 2^(−L(s_i)). To get the probability that x will be produced by any of its programs, we sum the probabilities of all of them to get P_M(x), the probability assigned to x by the universal distribution, using machine M as reference:

P_M(x) = Σ_i 2^(−L(s_i))    (1)

It is easy to use this distribution for prediction: if x is a binary string, then the probability that 1 will be the next symbol of x is just P_M(x1) / (P_M(x0) + P_M(x1)). Five years later, in 1965, Kolmogorov, not yet having read my paper, independently …
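The sum in equation (1) can be made concrete with a small (and of course non-universal) reference machine: enumerate every program up to a length bound, run it, and add 2^(−L(s)) for each program whose output is exactly x. The machine below is an invented toy, chosen so that strings with short regularities (repetitions) receive extra probability mass:

```python
from itertools import product

def toy_machine(program):
    """Invented toy reference machine: (number of leading 1s) + 1 gives
    a repeat count, a single '0' separates it from the payload, and the
    output is the payload repeated that many times."""
    i = 0
    while i < len(program) and program[i] == "1":
        i += 1
    if i >= len(program):            # no separator: no output
        return None
    payload = program[i + 1:]
    if not payload:
        return None
    return payload * (i + 1)

def universal_prob(x, machine, max_len=12):
    """P_M(x): sum of 2^(-L(s)) over every program s of up to max_len
    bits whose output is exactly x (equation (1), with a length cutoff
    in place of the full infinite enumeration)."""
    total = 0.0
    for n in range(1, max_len + 1):
        for bits in product("01", repeat=n):
            if machine("".join(bits)) == x:
                total += 2.0 ** -n
    return total

# A string with a short regularity ("01" repeated) collects probability
# from more than one program, so it outweighs an irregular string of
# the same length; P_M(x1) / (P_M(x0) + P_M(x1)) then predicts the
# next bit as in the lecture.
p_repeat = universal_prob("0101", toy_machine)
p_plain = universal_prob("0110", toy_machine)
```

For this toy machine, "0101" is produced both by the literal program "00101" and by the shorter "1001" ("01" twice), while "0110" has only its literal program, which is the compressibility effect the lecture describes.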
POWERPLAY: Training an Increasingly General Problem Solver by Continually Searching for the Simplest Still Unsolvable Problem
2011
Cited by 4 (3 self)
Abstract:
Most of computer science focuses on automatically solving given computational problems. I focus on automatically inventing or discovering problems in a way inspired by the playful behavior of animals and humans, to train a more and more general problem solver from scratch in an unsupervised fashion. At any given time, the novel algorithmic framework POWERPLAY searches the space of possible pairs of new tasks and modifications of the current problem solver, until it finds a more powerful problem solver that provably solves all previously learned tasks plus the new one, while the unmodified predecessor does not. The new task and its corresponding task-solving skill are those first found and validated. Newly invented tasks may require making previously learned skills more efficient. The greedy search of typical POWERPLAY variants orders candidate pairs of tasks and solver modifications by their conditional computational complexity, given the stored experience so far. This biases the search towards pairs that can be described compactly and validated quickly. Standard problem solver architectures of personal computers or neural networks tend to generalize by solving numerous tasks outside the self-invented training set; POWERPLAY’s ongoing search for novelty keeps fighting to extend beyond the generalization abilities of its present solver. The continually increasing repertoire of problem solving procedures can be exploited …
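The search invariant described here (the new solver must solve every previously learned task plus one new task its predecessor could not) can be caricatured in a few lines. Everything concrete below, tasks as integers and "skill" as a threshold, is an invented stand-in for the framework's far more general task and solver spaces:

```python
def powerplay_toy(n_rounds):
    """Caricature of the PowerPlay loop. Tasks are integers t; a solver
    with skill k solves task t iff t <= k. Each round searches, in order
    of increasing task number (our stand-in for description complexity),
    for the simplest task the current solver fails, together with a
    solver modification that masters it while keeping the repertoire."""
    skill = 0
    repertoire = []
    for _ in range(n_rounds):
        for task in range(1, skill + 10):
            if task > skill:                         # predecessor fails it
                new_skill = task                     # candidate modification
                if all(t <= new_skill for t in repertoire):  # no forgetting
                    skill = new_skill
                    repertoire.append(task)
                    break
    return repertoire
```

In this degenerate setting the repertoire simply grows 1, 2, 3, ...; the interesting behavior in the real framework comes from task and solver spaces where "simplest still-unsolvable" is a genuine search problem.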