Closing the learning-planning loop with predictive state representations. http://arxiv.org/abs/0912.2385
, 2009
Cited by 33 (14 self)
A central problem in artificial intelligence is to choose actions to maximize reward in a partially observable, uncertain environment. To do so, we must learn an accurate model of our environment, and then plan to maximize reward. Unfortunately, learning algorithms often recover a model which is too inaccurate to support planning or too large and complex for planning to be feasible; or, they require large amounts of prior domain knowledge or fail to provide important guarantees such as statistical consistency. To begin to fill this gap, we propose a novel algorithm which provably learns a compact, accurate model directly from sequences of action-observation pairs. To evaluate the learner, we then close the loop from observations to actions: we plan in the learned model and recover a policy which is near-optimal in the original environment (not the model). In more detail, we present a spectral algorithm for learning a Predictive State Representation (PSR). We demonstrate the algorithm by learning a model of a simulated high-dimensional, vision-based mobile robot planning task, and then performing approximate point-based planning in the learned model. This experiment shows that the learned PSR captures the essential features of the environment, allows accurate prediction with a small number of parameters, and enables successful and efficient planning. Our algorithm has several benefits which have not appeared together in any previous PSR learner: it is computationally efficient and statistically consistent; it handles high-dimensional observations and long time horizons by working from real-valued features of observation sequences; and finally, our close-the-loop experiments provide an end-to-end practical test.
Learning to make predictions in partially observable environments without a generative model
 Journal of Artificial Intelligence Research
, 2011
Cited by 5 (1 self)
When faced with the problem of learning a model of a high-dimensional environment, a common approach is to limit the model to make only a restricted set of predictions, thereby simplifying the learning problem. These partial models may be directly useful for making decisions or may be combined together to form a more complete, structured model. However, in partially observable (non-Markov) environments, standard model-learning methods learn generative models, i.e. models that provide a probability distribution over all possible futures (such as POMDPs). It is not straightforward to restrict such models to make only certain predictions, and doing so does not always simplify the learning problem. In this paper we present prediction profile models: non-generative partial models for partially observable systems that make only a given set of predictions, and are therefore far simpler than generative models in some cases. We formalize the problem of learning a prediction profile model as a transformation of the original model-learning problem, and show empirically that one can learn prediction profile models that make a small set of important predictions even in systems that are too complex for standard generative models.
Spectral Approaches to Learning Predictive Representations
, 2011
Cited by 1 (1 self)
A central problem in artificial intelligence is to choose actions to maximize reward in a partially observable, uncertain environment. To do so, we must obtain an accurate environment model, and then plan to maximize reward. However, for complex domains, specifying a model by hand can be a time-consuming process. This motivates an alternative approach: learning a model directly from observations. Unfortunately, learning algorithms often recover a model that is too inaccurate to support planning or too large and complex for planning to succeed; or, they require excessive prior domain knowledge or fail to provide guarantees such as statistical consistency. To address this gap, we propose spectral subspace identification algorithms which provably learn compact, accurate, predictive models of partially observable dynamical systems directly from sequences of action-observation pairs. Our research agenda includes several variations of this general approach: batch algorithms and online algorithms, kernel-based algorithms for learning models in high- and infinite-dimensional feature spaces, and manifold-based identification algorithms. All of these approaches share a common framework: they are statistically consistent, computationally efficient, and easy to implement using established matrix-algebra techniques. Additionally, we show that our framework generalizes a variety of successful spectral
A Data-Driven Statistical Framework for Post-Grasp Manipulation
Cited by 1 (1 self)
Grasping an object is usually only an intermediate goal for a robotic manipulator. To finish the task, the robot needs to know where the object is in its hand and what action to execute. This paper presents a general statistical framework to address these problems. Given a novel object, the robot learns a statistical model of grasp state conditioned on sensor values. The robot also builds a statistical model of the requirements of the task in terms of grasp state accuracy. Both of these models are constructed by offline experiments. The online process then grasps objects and chooses actions to maximize likelihood of success. This paper describes the framework in detail, and demonstrates its effectiveness experimentally in placing, dropping, and insertion tasks. To construct statistical models, the robot performed over 8000 grasp trials, and over 1000 trials each of placing, dropping and insertion.
Temporal-Difference Networks for Dynamical Systems with Continuous Observations and Actions
Temporal-difference (TD) networks are a class of predictive state representations that use well-established TD methods to learn models of partially observable dynamical systems. Previous research with TD networks has dealt only with dynamical systems with finite sets of observations and actions. We present an algorithm for learning TD network representations of dynamical systems with continuous observations and actions. Our results show that the algorithm is capable of learning accurate and robust models of several noisy continuous dynamical systems. The algorithm presented here is the first fully incremental method for learning a predictive representation of a continuous dynamical system.
Research Experience Research Scientist
, 2001
Actively pursuing research into structured dynamical systems modeling with Bayesian nonparametrics, planning and model building for reinforcement learning, structured policy priors for policy learning, and universal inference for probabilistic programming languages. Current applied thrusts include reinforcement learning for multicore systems, machine learning for oil discovery, and generative models of machine vision. Contributed to funding efforts for AFOSR and Shell Oil.
Learning Latent Variable and Predictive Models of Dynamical Systems
, 2009
Despite the single author listed on the cover, this dissertation is not the product of one person alone. I would like to acknowledge many, many people who influenced me, my life and my work. They have all aided this research in different ways over the years and helped it come to a successful conclusion. Geoff Gordon, my advisor, has taught me a lot over the years: how to think methodically and analyze a problem, how to formulate problems mathematically, and how to choose interesting problems. From the outset, he has helped me develop the ideas that went into the thesis. Andrew Moore, my first advisor, got me started in machine learning and data mining and helped make this field fun and accessible to me, and his guidance and mentoring were crucial for work done early in my Ph.D. Both Geoff and Andrew are the very best kind of advisor I could have asked for: really smart, knowledgeable, caring and hands-on. They showed me how to be a good researcher while staying relaxed, calm and happy. Though I wasn’t always able to strike that balance, the example they set was essential for me to be able to make it through without burning out in the process. All the members of the AUTON lab deserve much thanks, especially Artur Dubrawski
Hilbert Space Embeddings of Predictive State Representations
Predictive State Representations (PSRs) are an expressive class of models for controlled stochastic processes. PSRs represent state as a set of predictions of future observable events. Because PSRs are defined entirely in terms of observable data, statistically consistent estimates of PSR parameters can be learned efficiently by manipulating moments of observed training data. Most learning algorithms for PSRs have assumed that actions and observations are finite with low cardinality. In this paper, we generalize PSRs to infinite sets of observations and actions, using the recent concept of Hilbert space embeddings of distributions. The essence is to represent the state as one or more nonparametric conditional embedding operators in a Reproducing Kernel Hilbert Space (RKHS) and leverage recent work in kernel methods to estimate, predict, and update the representation. We show that these Hilbert space embeddings of PSRs are able to gracefully handle continuous actions and observations, and that our learned models outperform competing system identification algorithms on several prediction benchmarks.
Representations
, 2012
A central problem in artificial intelligence is to choose actions to maximize reward in a partially observable, uncertain environment. To do so, we must obtain an accurate environment model, and then plan to maximize reward. However, for complex domains, specifying a model by hand can be a time-consuming process. This motivates an alternative approach: learning a model directly from observations. Unfortunately, learning algorithms often recover a model that is too inaccurate to support planning or too large and complex for planning to succeed; or, they require excessive prior domain knowledge or fail to provide guarantees such as statistical consistency. To address this gap, we propose spectral subspace identification algorithms which provably learn compact, accurate, predictive models of partially observable dynamical systems directly from sequences of action-observation pairs. Our research agenda includes several variations of this general approach: spectral methods for classical models like Kalman filters and hidden Markov models, batch algorithms and online algorithms, and kernel-based algorithms for learning models in high- and infinite-dimensional feature spaces. All of these approaches share