A Monte Carlo AIXI Approximation
, 2009
Cited by 28 (9 self)
This paper describes a computationally feasible approximation to the AIXI agent, a universal reinforcement learning agent for arbitrary environments. AIXI is scaled down in two key ways. First, the class of environment models is restricted to all prediction suffix trees of a fixed maximum depth. This allows a Bayesian mixture of environment models to be computed in time proportional to the logarithm of the size of the model class. Second, the finite-horizon expectimax search is approximated by an asymptotically convergent Monte Carlo Tree Search technique. This scaled-down AIXI agent is empirically shown to be effective on a wide class of toy problem domains, ranging from simple fully observable games to small POMDPs. We explore the limits of this approximate agent and propose a general heuristic framework for scaling this technique to much larger problems.
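As a rough illustration of the Bayesian mixture mentioned in the abstract above, here is a minimal sketch, assuming a tiny hand-picked model class of fixed-bias coin models rather than prediction suffix trees. The names `mixture_predict`, `mixture_update`, and the `thetas` values are hypothetical. This naive version costs time linear in the number of models; it does not reproduce the logarithmic-time computation the paper describes.

```python
from typing import List


def mixture_predict(weights: List[float], thetas: List[float]) -> float:
    """Mixture probability that the next bit is 1 (posterior-weighted average)."""
    return sum(w * t for w, t in zip(weights, thetas))


def mixture_update(weights: List[float], thetas: List[float], bit: int) -> List[float]:
    """Bayes rule: reweight each model by the probability it gave the observed bit."""
    likelihoods = [t if bit == 1 else 1.0 - t for t in thetas]
    unnormalised = [w * l for w, l in zip(weights, likelihoods)]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


# Hypothetical model class: three coins with fixed bias towards 1.
thetas = [0.1, 0.5, 0.9]
weights = [1.0 / 3.0] * 3  # uniform prior over the three models

for bit in [1, 1, 0, 1, 1, 1]:
    weights = mixture_update(weights, thetas, bit)

print(weights)                           # posterior over the three models
print(mixture_predict(weights, thetas))  # predictive probability of a 1
```

After mostly-ones data, the posterior mass shifts towards the model with the highest bias, and the mixture prediction moves with it.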
Feature reinforcement learning: Part I. Unstructured MDPs
 Journal of Artificial General Intelligence
, 2009
Cited by 23 (9 self)
www.hutter1.net
General-purpose, intelligent, learning agents cycle through sequences of observations, actions, and rewards that are complex, uncertain, unknown, and non-Markovian. On the other hand, reinforcement learning is well-developed for small finite-state Markov decision processes (MDPs). Up to now, extracting the right state representations out of bare observations, that is, reducing the general agent setup to the MDP framework, has been an art that involves significant effort by designers. The primary goal of this work is to automate the reduction process and thereby significantly expand the scope of many existing reinforcement learning algorithms and the agents that employ them. Before we can think of mechanizing this search for suitable MDPs, we need a formal objective criterion. The main contribution of this article is to develop such a criterion. I also integrate the various parts into one learning algorithm. Extensions to more realistic dynamic Bayesian networks are developed in Part
A Monte Carlo AIXI Approximation
 J. Artif. Intell. Res
Cited by 21 (11 self)
This paper describes a computationally feasible approximation to the AIXI agent, a universal reinforcement learning agent for arbitrary environments. AIXI is scaled down in two key ways. First, the class of environment models is restricted to all prediction suffix trees of a fixed maximum depth. This allows a Bayesian mixture of environment models to be computed in time proportional to the logarithm of the size of the model class. Second, the finite-horizon expectimax search is approximated by an asymptotically convergent Monte Carlo Tree Search technique. This scaled-down AIXI agent is empirically shown to be effective on a wide class of toy problem domains, ranging from simple fully observable games to small POMDPs. We explore the limits of this approximate agent and propose a general heuristic framework for scaling this technique to much larger problems.
Feature dynamic Bayesian networks
 In AGI
, 2009
Cited by 12 (8 self)
Feature Markov Decision Processes (ΦMDPs) [Hut09] are well-suited for learning agents in general environments. Nevertheless, unstructured (Φ)MDPs are limited to relatively simple environments. Structured MDPs like Dynamic Bayesian Networks (DBNs) are used for large-scale real-world problems. In this article I extend ΦMDP to ΦDBN. The primary contribution is to derive a cost criterion that allows the most relevant features to be extracted automatically from the environment, leading to the “best” DBN representation. I discuss all building blocks required for a complete general learning algorithm.
Open problems in universal induction & intelligence
 Algorithms
, 2009
Constructing States for Reinforcement Learning
Cited by 5 (1 self)
POMDPs are the models of choice for reinforcement learning (RL) tasks where the environment cannot be observed directly. In many applications we need to learn the POMDP structure and parameters from experience, and this is considered to be a difficult problem. In this paper we address this issue by modeling the hidden environment with a novel class of models that are less expressive, but easier to learn and plan with, than POMDPs. We call these models deterministic Markov models (DMMs), which are deterministic-probabilistic finite automata from learning theory, extended with actions to the sequential (rather than i.i.d.) setting. Conceptually, we extend the Utile Suffix Memory method of McCallum to handle long-term memory. We describe DMMs, give Bayesian algorithms for learning and planning with them, and present experimental results on some standard POMDP tasks, as well as on tasks that illustrate the method's efficacy.
Deterministic-Probabilistic Models for Partially Observable Reinforcement Learning Problems
In this paper we consider learning the environment model in reinforcement learning tasks where the environment cannot be fully observed. The most popular frameworks for environment modeling are POMDPs and PSRs, but they are considered difficult to learn. We propose to bypass this hard problem by assuming that (a) the sufficient statistic of any history can be represented as one of finitely many states and (b) this state is given by a deterministic map from histories to the finite state space. This finite set of states can be interpreted as the state space of an MDP, which can then be used to plan. Now the learning problem is to estimate this deterministic history-state map. One of the earliest approaches in this direction is McCallum’s USM algorithm. Our work can roughly be understood as extending this general idea by replacing prediction suffix trees, used in USM, with deterministic-probabilistic finite automata from learning theory. In this paper we describe our model, derive a pseudo-Bayesian inference criterion, and show its consistency. We also describe a heuristic algorithm that uses the criterion to learn the models, along with experiments showing its efficacy.
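To make the idea of a deterministic history-state map concrete, the following is a minimal sketch, assuming the simplest possible map: the state is the last `k` observations. This is a fixed-length suffix map chosen for illustration, not the automaton-based map or the pseudo-Bayesian criterion from the paper. The names `phi`, `estimate_transitions`, and the `(action, observation, reward)` step layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Step = Tuple[int, int, float]  # (action, observation, reward); assumed layout
State = Tuple[int, ...]


def phi(history: List[Step], k: int = 1) -> State:
    """Hypothetical deterministic map: the state is the last k observations."""
    return tuple(obs for _, obs, _ in history[-k:])


def estimate_transitions(
    history: List[Step], k: int = 1
) -> Dict[Tuple[State, int], Dict[State, float]]:
    """Estimate MDP transition frequencies over the states induced by phi."""
    counts: Dict[Tuple[State, int], Dict[State, int]] = defaultdict(
        lambda: defaultdict(int)
    )
    # Start at t = k so that every state is built from a full window.
    for t in range(k, len(history)):
        state = phi(history[:t], k)           # state before step t
        action = history[t][0]                # action taken at step t
        next_state = phi(history[: t + 1], k)  # state after step t
        counts[(state, action)][next_state] += 1
    return {
        sa: {s2: n / sum(nexts.values()) for s2, n in nexts.items()}
        for sa, nexts in counts.items()
    }


# Toy history: one action (0), observations alternate 0, 1, 0, 1.
history = [(0, 0, 0.0), (0, 1, 0.0), (0, 0, 0.0), (0, 1, 1.0)]
model = estimate_transitions(history, k=1)
print(model)
```

Once such a map is fixed, the induced states and estimated transitions form an ordinary finite MDP that standard planning methods can use; the learning problem in the abstract is the choice of the map itself.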