Results 1 – 5 of 5
Consistency of Feature Markov Processes
, 2010
"... We are studying long term sequence prediction (forecasting). We approach this by investigating criteria for choosing a compact useful state representation. The state is supposed to summarize useful information from the history. We want a method that is asymptotically consistent in the sense it will ..."
Abstract

Cited by 9 (7 self)
We are studying long-term sequence prediction (forecasting). We approach this by investigating criteria for choosing a compact, useful state representation. The state is supposed to summarize useful information from the history. We want a method that is asymptotically consistent in the sense that it will provably, eventually, choose only among alternatives that satisfy an optimality property related to the criterion used. We extend our work to the case where there is side information that one can take advantage of and, furthermore, we briefly discuss the active setting, where an agent takes actions to achieve desirable outcomes.
Learning to make predictions in partially observable environments without a generative model
 Journal of Artificial Intelligence Research
, 2011
"... When faced with the problem of learning a model of a highdimensional environment, a common approach is to limit the model to make only a restricted set of predictions, thereby simplifying the learning problem. These partial models may be directly useful for making decisions or may be combined toget ..."
Abstract

Cited by 5 (1 self)
When faced with the problem of learning a model of a high-dimensional environment, a common approach is to limit the model to make only a restricted set of predictions, thereby simplifying the learning problem. These partial models may be directly useful for making decisions or may be combined to form a more complete, structured model. However, in partially observable (non-Markov) environments, standard model-learning methods learn generative models, i.e. models that provide a probability distribution over all possible futures (such as POMDPs). It is not straightforward to restrict such models to make only certain predictions, and doing so does not always simplify the learning problem. In this paper we present prediction profile models: non-generative partial models for partially observable systems that make only a given set of predictions, and are therefore far simpler than generative models in some cases. We formalize the problem of learning a prediction profile model as a transformation of the original model-learning problem, and show empirically that one can learn prediction profile models that make a small set of important predictions even in systems that are too complex for standard generative models.
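To make the contrast concrete, here is a minimal, invented illustration (not the paper's algorithm): a partial model that answers only one designated prediction, "is the next observation equal to `target`?", tracked as an empirical frequency per context, instead of modelling the full distribution over futures. All names here are hypothetical.

```python
from collections import defaultdict

class PartialPredictor:
    """Tracks a single designated prediction P(next obs = target | context),
    rather than a full generative distribution over all possible futures."""

    def __init__(self, target):
        self.target = target
        # context -> [hits, trials]
        self.counts = defaultdict(lambda: [0, 0])

    def update(self, context, next_obs):
        """Record one observed transition from `context`."""
        hits, trials = self.counts[context]
        self.counts[context] = [hits + int(next_obs == self.target), trials + 1]

    def predict(self, context):
        """Empirical estimate of the designated prediction (0.5 if unseen)."""
        hits, trials = self.counts[context]
        return hits / trials if trials else 0.5
```

A generative model would have to represent the distribution over every possible future; this sketch only maintains the one query it was built for.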
Learning Partially Observable Models Using Temporally Abstract Decision Trees
"... This paper introduces timeline trees, which are partial models of partially observable environments. Timeline trees are given some specific predictions to make and learn a decision tree over history. The main idea of timeline trees is to use temporally abstract features to identify and split on feat ..."
Abstract

Cited by 1 (1 self)
This paper introduces timeline trees, which are partial models of partially observable environments. Timeline trees are given some specific predictions to make and learn a decision tree over history. The main idea of timeline trees is to use temporally abstract features to identify and split on features of key events, spread arbitrarily far apart in the past (whereas previous decision-tree-based methods have been limited to a finite suffix of history). Experiments demonstrate that timeline trees can learn to make high-quality predictions in complex, partially observable environments with high-dimensional observations (e.g. an arcade game).
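A temporally abstract feature of the kind described can be sketched as follows (a hypothetical example, not the paper's implementation): a feature whose value depends on when a key event last occurred, however far back that is.

```python
def steps_since(history, event):
    """Temporally abstract feature: number of steps since `event` last
    occurred in `history`, or None if it never occurred. A decision tree
    can split on such a feature regardless of how deep in the past the
    event lies, whereas a fixed-suffix feature sees only the last k
    observations."""
    for steps_back, obs in enumerate(reversed(history)):
        if obs == event:
            return steps_back
    return None
```

For instance, `steps_since(['key', 'a', 'b', 'door'], 'key')` returns 3, even though 'key' is outside any short fixed suffix ending at 'door'.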
Feature Reinforcement Learning using Looping Suffix Trees
"... There has recently been much interest in historybased methods using suffix trees to solve POMDPs. However, these suffix trees cannot efficiently represent environments that have longterm dependencies. We extend the recently introduced CTΦMDP algorithm to the space of looping suffix trees which hav ..."
Abstract

Cited by 1 (0 self)
There has recently been much interest in history-based methods using suffix trees to solve POMDPs. However, these suffix trees cannot efficiently represent environments that have long-term dependencies. We extend the recently introduced CTΦMDP algorithm to the space of looping suffix trees, which have previously only been used in solving deterministic POMDPs. The resulting algorithm replicates results from CTΦMDP for environments with short-term dependencies, while it outperforms LSTM-based methods on T-Maze, a deep-memory environment.
Deterministic-Probabilistic Models For Partially Observable Reinforcement Learning Problems
"... In this paper we consider learning the environment model in reinforcement learning tasks where the environment cannot be fully observed. The most popular frameworks for environment modeling are POMDPs and PSRs but they are considered difficult to learn. We propose to bypass this hard problem by assu ..."
Abstract
In this paper we consider learning the environment model in reinforcement learning tasks where the environment cannot be fully observed. The most popular frameworks for environment modeling are POMDPs and PSRs, but they are considered difficult to learn. We propose to bypass this hard problem by assuming that (a) the sufficient statistic of any history can be represented as one of finitely many states and (b) this state is given by a deterministic map from histories to the finite state space. This finite set of states can be interpreted as the state space of an MDP, which can then be used to plan. The learning problem is then to estimate this deterministic history-state map. One of the earliest approaches in this direction is McCallum’s USM algorithm. Our work can roughly be understood as extending this general idea by replacing the prediction suffix trees used in USM with deterministic-probabilistic finite automata from learning theory. In this paper we describe our model, derive a pseudo-Bayesian inference criterion, and show its consistency. We also describe a heuristic algorithm that uses the criterion to learn the models, along with experiments showing its efficacy.
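The assumed deterministic history-to-state map can be illustrated with a minimal sketch (invented for illustration; a fixed-suffix map is the simplest case of what a prediction suffix tree realizes, not the automaton model the paper proposes):

```python
def suffix_state(history, depth=2):
    """Deterministic history-to-state map: here, simply the last `depth`
    observations. Histories that agree on this suffix are pooled into a
    single state, whose transition and reward statistics can then be
    estimated and planned over as in an MDP."""
    return tuple(history[-depth:])

# Two different histories collapse to the same state, so their
# statistics are aggregated when building the induced MDP.
assert suffix_state(('a', 'b', 'x', 'y')) == suffix_state(('c', 'x', 'y')) == ('x', 'y')
```

A looping or automaton-based map can identify histories that a fixed-depth suffix cannot, which is what richer representations such as deterministic-probabilistic finite automata are for.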