Exponential Family Predictive Representations of State (2008)

by David Wingate

Results 1 - 7 of 7

Closing the learning-planning loop with predictive state representations. http://arxiv.org/abs/0912.2385

by Byron Boots, Sajid M. Siddiqi, Geoffrey J. Gordon, 2009
Cited by 15 (10 self)
A central problem in artificial intelligence is to choose actions to maximize reward in a partially observable, uncertain environment. To do so, we must learn an accurate model of our environment, and then plan to maximize reward. Unfortunately, learning algorithms often recover a model which is too inaccurate to support planning or too large and complex for planning to be feasible; or, they require large amounts of prior domain knowledge or fail to provide important guarantees such as statistical consistency. To begin to fill this gap, we propose a novel algorithm which provably learns a compact, accurate model directly from sequences of action-observation pairs. To evaluate the learner, we then close the loop from observations to actions: we plan in the learned model and recover a policy which is near-optimal in the original environment (not the model). In more detail, we present a spectral algorithm for learning a Predictive State Representation (PSR). We demonstrate the algorithm by learning a model of a simulated high-dimensional, vision-based mobile robot planning task, and then performing approximate point-based planning in the learned model. This experiment shows that the learned PSR captures the essential features of the environment, allows accurate prediction with a small number of parameters, and enables successful and efficient planning. Our algorithm has several benefits which have not appeared together in any previous PSR learner: it is computationally efficient and statistically consistent; it handles high-dimensional observations and long time horizons by working from real-valued features of observation sequences; and finally, our close-the-loop experiments provide an end-to-end practical test.
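To give a feel for what a spectral learner of this kind computes, here is a minimal sketch in Python. It is not the algorithm from the paper above: it assumes a much simpler setting (an uncontrolled hidden Markov model with no actions, in the style of the well-known observable-operator construction), and it uses exact low-order moments from a made-up toy model instead of estimates from data. The names `spectral_params`, `spectral_prob`, and `forward_prob`, and all numeric values, are hypothetical.

```python
import numpy as np

# Hypothetical 2-state, 3-observation HMM, used only to produce exact moments.
T = np.array([[0.7, 0.2],
              [0.3, 0.8]])       # T[i, j] = P(next state = i | state = j)
O = np.array([[0.6, 0.1],
              [0.3, 0.3],
              [0.1, 0.6]])       # O[x, j] = P(observation = x | state = j)
pi = np.array([0.5, 0.5])        # initial state distribution
m, n_obs = T.shape[0], O.shape[0]

# Low-order observable moments (in practice these are estimated from data).
P1 = O @ pi                                  # P1[x]      = P(x1 = x)
P21 = O @ T @ np.diag(pi) @ O.T              # P21[i, j]  = P(x2 = i, x1 = j)
P3x1 = [O @ T @ np.diag(O[x]) @ T @ np.diag(pi) @ O.T
        for x in range(n_obs)]               # P3x1[x][i, j] = P(x3 = i, x2 = x, x1 = j)

def spectral_params(P1, P21, P3x1, m):
    """Recover observable-operator parameters from moments via a rank-m SVD."""
    U = np.linalg.svd(P21)[0][:, :m]         # top-m left singular vectors
    b1 = U.T @ P1                            # initial predictive state
    binf = np.linalg.pinv(P21.T @ U) @ P1    # normalization vector
    B = [U.T @ P @ np.linalg.pinv(U.T @ P21) for P in P3x1]
    return b1, binf, B

def spectral_prob(seq, b1, binf, B):
    """Probability of an observation sequence under the learned operators."""
    b = b1
    for x in seq:
        b = B[x] @ b
    return float(binf @ b)

def forward_prob(seq):
    """Reference value from the standard forward recursion on the true HMM."""
    alpha = pi
    for x in seq:
        alpha = T @ (O[x] * alpha)
    return float(alpha.sum())

b1, binf, B = spectral_params(P1, P21, P3x1, m)
seq = [0, 2, 1, 1]
print(spectral_prob(seq, b1, binf, B), forward_prob(seq))
```

With exact moments and full-rank `T` and `O`, the two printed probabilities should agree up to floating-point error. With moments estimated from samples, the agreement is only approximate and improves as more data is used.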

Spectral Approaches to Learning Predictive Representations

by Byron Boots, Geoffrey J. Gordon (chair), J. Andrew Bagnell, Dieter Fox, 2011
Cited by 1 (1 self)
A central problem in artificial intelligence is to choose actions to maximize reward in a partially observable, uncertain environment. To do so, we must obtain an accurate environment model, and then plan to maximize reward. However, for complex domains, specifying a model by hand can be a time-consuming process. This motivates an alternative approach: learning a model directly from observations. Unfortunately, learning algorithms often recover a model that is too inaccurate to support planning or too large and complex for planning to succeed; or, they require excessive prior domain knowledge or fail to provide guarantees such as statistical consistency. To address this gap, we propose spectral subspace identification algorithms which provably learn compact, accurate, predictive models of partially observable dynamical systems directly from sequences of action-observation pairs. Our research agenda includes several variations of this general approach: batch algorithms and online algorithms, kernel-based algorithms for learning models in high- and infinite-dimensional feature spaces, and manifold-based identification algorithms. All of these approaches share a common framework: they are statistically consistent, computationally efficient, and easy to implement using established matrix-algebra techniques. Additionally, we show that our framework generalizes a variety of successful spectral

Temporal-Difference Networks for Dynamical Systems with Continuous Observations and Actions

by Christopher M. Vigorito
Temporal-difference (TD) networks are a class of predictive state representations that use well-established TD methods to learn models of partially observable dynamical systems. Previous research with TD networks has dealt only with dynamical systems with finite sets of observations and actions. We present an algorithm for learning TD network representations of dynamical systems with continuous observations and actions. Our results show that the algorithm is capable of learning accurate and robust models of several noisy continuous dynamical systems. The algorithm presented here is the first fully incremental method for learning a predictive representation of a continuous dynamical system.

Predictive Representations For Sequential Decision Making Under Uncertainty

by Abdeslam Boularias, 2010
Abstract not found

Research Experience Research Scientist

by David Wingate, 2001
Actively pursuing research into structured dynamical systems modeling with Bayesian nonparametrics, planning and model building for reinforcement learning, structured policy priors for policy learning, and universal inference for probabilistic programming languages. Current applied thrusts include reinforcement learning for multicore systems, machine learning for oil discovery, and generative models of machine vision. Contributed to funding efforts for AFOSR and Shell Oil.

Learning Latent Variable and Predictive Models of Dynamical Systems

by Sajid M. Siddiqi, Andrew W. Moore, Jeff Schneider, Zoubin Ghahramani, 2009
Despite the single author listed on the cover, this dissertation is not the product of one person alone. I would like to acknowledge many, many people who influenced me, my life and my work. They have all aided this research in different ways over the years and helped it come to a successful conclusion. Geoff Gordon, my advisor, has taught me a lot over the years: how to think methodically and analyze a problem, how to formulate problems mathematically, and how to choose interesting problems. From the outset, he has helped me develop the ideas that went into the thesis. Andrew Moore, my first advisor, got me started in machine learning and data mining and helped make this field fun and accessible to me, and his guidance and mentoring were crucial for work done early in my Ph.D. Both Geoff and Andrew are the very best kind of advisor I could have asked for: really smart, knowledgeable, caring and hands-on. They showed me how to be a good researcher while staying relaxed, calm and happy. Though I wasn’t always able to strike that balance, the example they set was essential for me to be able to make it through without burning out in the process. All the members of the AUTON lab deserve much thanks, especially Artur Dubrawski

A Data-Driven Statistical Framework for Post-Grasp Manipulation

by Robert Paolini, Alberto Rodriguez, Siddhartha S. Srinivasa, Matthew T. Mason
Grasping an object is usually only an intermediate goal for a robotic manipulator. To finish the task, the robot needs to know where the object is in its hand and what action to execute. This paper presents a general statistical framework to address these problems. Given a novel object, the robot learns a statistical model of grasp state conditioned on sensor values. The robot also builds a statistical model of the requirements of the task in terms of grasp state accuracy. Both of these models are constructed by offline experiments. The online process then grasps objects and chooses actions to maximize likelihood of success. This paper describes the framework in detail, and demonstrates its effectiveness experimentally in placing, dropping, and insertion tasks. To construct statistical models, the robot performed over 8000 grasp trials, and over 1000 trials each of placing, dropping and insertion.
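The online step described in this abstract (combine a model of grasp state given sensor values with a model of task requirements, then pick the action most likely to succeed) can be illustrated with a small sketch. This is only one possible reading, assuming a discrete set of grasp states and actions; the state names, action names, probabilities, and the functions `expected_success` and `best_action` are all hypothetical and do not come from the paper.

```python
# Hypothetical posterior over discrete grasp states given the current sensor
# values, P(state | sensors). In the framework above this would come from a
# model learned in offline experiments; these numbers are made up.
grasp_posterior = {"centered": 0.6, "offset": 0.3, "tilted": 0.1}

# Hypothetical task-requirement model, P(success | state, action).
success_given_state = {
    "no_adjust":  {"centered": 0.9, "offset": 0.5, "tilted": 0.2},
    "push_left":  {"centered": 0.7, "offset": 0.7, "tilted": 0.6},
    "push_right": {"centered": 0.8, "offset": 0.2, "tilted": 0.1},
}

def expected_success(action, posterior, success_model):
    """Marginalize the success probability over the uncertain grasp state."""
    return sum(p * success_model[action][state] for state, p in posterior.items())

def best_action(posterior, success_model):
    """Return the action with the highest expected probability of success."""
    return max(success_model,
               key=lambda a: expected_success(a, posterior, success_model))

for action in success_given_state:
    print(action, round(expected_success(action, grasp_posterior, success_given_state), 3))
print("chosen:", best_action(grasp_posterior, success_given_state))
```

The design choice here is to marginalize over the grasp state rather than commit to the single most likely state, so an action that is robust across plausible states can win over one that is only good for the most probable state.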