Autonomous discovery of temporal abstractions from interaction with an environment (2002)

by A. E. McGovern

Citing documents: results 1-10 of 25

Recent advances in hierarchical reinforcement learning

by Andrew G. Barto, 2003
Cited by 119 (18 self)
A preliminary unedited version of this paper was incorrectly published as part of Volume

Intrinsically motivated learning of hierarchical collections of skills

by Andrew G. Barto, 2004
Cited by 80 (15 self)
Humans and other animals often engage in activities for their own sakes rather than as steps toward solving practical problems. Psychologists call these intrinsically motivated behaviors. What we learn during intrinsically motivated behavior is essential for our development as competent autonomous entities able to efficiently solve a wide range of practical problems as they arise. In this paper we present initial results from a computational study of intrinsically motivated learning aimed at allowing artificial agents to construct and extend hierarchies of reusable skills that are needed for competent autonomy. At the core of the model are recent theoretical and algorithmic advances in computational reinforcement learning, specifically, new concepts related to skills and new learning algorithms for learning with skill hierarchies.

Using relative novelty to identify useful temporal abstractions in reinforcement learning

by Andrew G. Barto - In Proceedings of the Twenty-First International Conference on Machine Learning, 2004
Cited by 51 (11 self)
We present a new method for automatically creating useful temporal abstractions in reinforcement learning. We argue that states that allow the agent to transition to a different region of the state space are useful subgoals, and we propose a method for identifying them using the concept of relative novelty. When such a state is identified, a temporally extended activity (e.g., an option) is generated that takes the agent efficiently to this state. We illustrate the utility of the method in a number of tasks.
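
A minimal Python sketch of the relative-novelty idea, under assumptions the abstract does not state: the novelty of a visit is taken as 1/sqrt(n), where n counts visits to that state so far, and the score of the state visited at step t is the summed novelty of the `lag` visits after it divided by the summed novelty of the `lag` visits before it. The function name `relative_novelty_scores` and the `lag` parameter are hypothetical.

```python
import math
from collections import Counter


def relative_novelty_scores(trajectory, lag=3):
    """Score each time step by (novelty after) / (novelty before).

    Assumed definitions (illustrative, not from the abstract):
    novelty of a visit = 1 / sqrt(number of visits to that state so far);
    only steps with a full window of `lag` visits on both sides are scored.
    """
    counts = Counter()
    novelty = []
    for state in trajectory:
        counts[state] += 1
        novelty.append(1.0 / math.sqrt(counts[state]))

    scores = {}
    for t in range(lag, len(trajectory) - lag):
        before = sum(novelty[t - lag:t])
        after = sum(novelty[t + 1:t + 1 + lag])
        scores[t] = after / before
    return scores


# Toy trajectory: the agent circles between states 0 and 1, passes through
# state 9, and then enters states it has never visited.
traj = [0, 1, 0, 1, 0, 1, 9, 5, 6, 7]
scores = relative_novelty_scores(traj, lag=3)
best_t = max(scores, key=scores.get)
print(best_t, traj[best_t])  # prints: 6 9
```

Here the highest score falls on the visit to state 9, the kind of "doorway" into a new region that the abstract treats as a useful subgoal. How scores are aggregated across trajectories and thresholded before an option is built is not described in the abstract.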

Proto-value functions: A Laplacian framework for learning representation and control in Markov decision processes

by Sridhar Mahadevan, Mauro Maggioni, Carlos Guestrin - Journal of Machine Learning Research, 2006
Cited by 45 (8 self)
This paper introduces a novel spectral framework for solving Markov decision processes (MDPs) by jointly learning representations and optimal policies. The major components of the framework described in this paper include:
(i) a general scheme for constructing representations or basis functions by diagonalizing symmetric diffusion operators;
(ii) a specific instantiation of this approach where global basis functions called proto-value functions (PVFs) are formed using the eigenvectors of the graph Laplacian on an undirected graph formed from state transitions induced by the MDP;
(iii) a three-phased procedure called representation policy iteration (RPI), comprising a sample collection phase, a representation learning phase that constructs basis functions from samples, and a final parameter estimation phase that determines an (approximately) optimal policy within the (linear) subspace spanned by the (current) basis functions;
(iv) a specific instantiation of the RPI framework using least-squares policy iteration (LSPI) as the parameter estimation method;
(v) several strategies for scaling the proposed approach to large discrete and continuous state spaces, including the Nyström extension for out-of-sample interpolation of eigenfunctions, and the use of Kronecker sum factorization to construct compact eigenfunctions in product spaces such as factored MDPs; and
(vi) a series of illustrative discrete and continuous control tasks, which both illustrate the concepts and provide a benchmark for evaluating the proposed approach.
Many challenges remain to be addressed in scaling the proposed framework to large MDPs, and several elaborations of the proposed framework are briefly summarized at the end.
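
A short NumPy sketch of the proto-value-function step in (ii): build an undirected adjacency matrix from observed transitions, form a graph Laplacian, and keep the eigenvectors with the smallest eigenvalues as basis functions. The choice of the symmetric normalized Laplacian I - D^(-1/2) W D^(-1/2), the unweighted edges, and the function name `proto_value_functions` are assumptions for illustration; the paper may use a different Laplacian or edge weighting.

```python
import numpy as np


def proto_value_functions(transitions, n_states, k):
    """Return the k smallest eigenvalues and eigenvectors of a graph Laplacian.

    transitions: iterable of (s, s_next) pairs of integer state indices.
    Assumption: unweighted, undirected edges and the normalized Laplacian.
    """
    W = np.zeros((n_states, n_states))
    for s, s_next in transitions:
        if s != s_next:
            W[s, s_next] = W[s_next, s] = 1.0  # symmetric edge

    degree = W.sum(axis=1)
    d_inv_sqrt = np.zeros(n_states)
    nonzero = degree > 0
    d_inv_sqrt[nonzero] = 1.0 / np.sqrt(degree[nonzero])

    # L = I - D^{-1/2} W D^{-1/2}, a symmetric matrix
    L = np.eye(n_states) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    return eigvals[:k], eigvecs[:, :k]


# A five-state chain 0-1-2-3-4 observed through sampled transitions.
chain = [(0, 1), (1, 2), (2, 3), (3, 4)]
vals, basis = proto_value_functions(chain, n_states=5, k=3)
print(basis.shape)  # prints: (5, 3)
```

Row `s` of `basis` can serve as the feature vector of state `s` in a linear value-function approximator, such as the one LSPI fits in (iv).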

PolicyBlocks: An Algorithm for Creating Useful Macro-Actions in Reinforcement Learning

by Marc Pickett, Andrew G. Barto - Proceedings of the Nineteenth International Conference on Machine Learning, 2002
Cited by 23 (8 self)
We present PolicyBlocks, an algorithm by which a reinforcement learning agent can extract useful macro-actions from a set of related tasks. The agent creates macro-actions by finding commonalities in solutions to previous tasks. Using these macro-actions accelerates the learning of future related tasks.
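
A toy Python sketch of "finding commonalities in solutions to previous tasks", assuming each solved task's policy is stored as a dict from state to action. A candidate macro-action is the partial policy on which a subset of stored policies agree, scored by its size times the number of stored policies that contain it. The scoring rule, the `max_subset` limit, and the names `intersect` and `best_macro` are hypothetical, not details given in the abstract.

```python
from itertools import combinations


def intersect(policies):
    """Partial policy (state -> action) on which all given policies agree."""
    common = dict(policies[0])
    for policy in policies[1:]:
        common = {s: a for s, a in common.items() if policy.get(s) == a}
    return common


def best_macro(policies, max_subset=2):
    """Return the highest-scoring common partial policy.

    Assumed score: (number of states in the block) *
    (number of stored policies that agree with the whole block).
    """
    best, best_score = {}, 0
    for r in range(2, max_subset + 1):
        for subset in combinations(policies, r):
            candidate = intersect(list(subset))
            coverage = sum(
                all(p.get(s) == a for s, a in candidate.items()) for p in policies
            )
            score = len(candidate) * coverage
            if score > best_score:
                best, best_score = candidate, score
    return best


p1 = {0: "R", 1: "R", 2: "U", 3: "U", 4: "L", 5: "L"}
p2 = {0: "R", 1: "R", 2: "U", 3: "U", 4: "L", 5: "R"}
p3 = {0: "R", 1: "R", 2: "U", 3: "D", 4: "D", 5: "D"}
print(best_macro([p1, p2, p3]))
```

In this example the first two policies agree on five states, so that block outscores the three-state block shared by all three policies (5 × 2 = 10 versus 3 × 3 = 9).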

Intrinsically motivated reinforcement learning: A promising framework for developmental robot learning

by Andrew Stout, George D. Konidaris, Andrew G. Barto - In The AAAI Spring Symposium on Developmental Robotics, 2005
Cited by 18 (1 self)
One of the primary challenges of developmental robotics is the question of how to learn and represent increasingly complex behavior in a self-motivated, open-ended way. Barto, Singh, and Chentanez (Barto, Singh, & Chentanez 2004; Singh, Barto, & Chentanez 2004) have recently presented an algorithm for intrinsically motivated reinforcement learning that strives to achieve broad competence in an environment in a task-nonspecific manner by incorporating internal reward to build a hierarchical collection of skills. This paper suggests that with its emphasis on task-general, self-motivated, and hierarchical learning, intrinsically motivated reinforcement learning is an obvious choice for organizing behavior in developmental robotics. We present additional preliminary results from a gridworld abstraction of a robot environment and advocate a layered learning architecture for applying the algorithm on a physically embodied system.
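
A minimal Python sketch of how an internal reward can fade as a skill is mastered. For illustration only, it assumes that reaching a salient event yields an intrinsic reward of tau * (1 - p), where p is the agent's current predicted probability of reaching that event from the starting state, and that p is then nudged toward 1. The class name `SalientEventSkill` and the parameters `tau` and `lr` are hypothetical; the abstract does not give the reward formula.

```python
class SalientEventSkill:
    """Hypothetical bookkeeping for one salient event and the skill that reaches it."""

    def __init__(self, tau=1.0, lr=0.5):
        self.tau = tau  # scale of the intrinsic reward (assumed)
        self.lr = lr    # step size for the prediction update (assumed)
        self.pred = {}  # state -> predicted probability of reaching the event

    def intrinsic_reward(self, start_state):
        """Reward for reaching the event, proportional to how unexpected it was."""
        p = self.pred.get(start_state, 0.0)
        reward = self.tau * (1.0 - p)
        # The event did occur, so move the prediction toward 1.
        self.pred[start_state] = p + self.lr * (1.0 - p)
        return reward


skill = SalientEventSkill()
print([skill.intrinsic_reward("s0") for _ in range(3)])  # prints: [1.0, 0.5, 0.25]
```

Repeatedly achieving the same event from the same state yields smaller and smaller rewards, so an agent driven by this signal would tend to move on to events it cannot yet predict.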

Transfer of Policies Based on Trajectory Libraries

by Martin Stolle, Joel Chestnutt, Christopher G. Atkeson
Cited by 8 (1 self)
Libraries of trajectories are a promising way of creating policies for difficult problems. However, it is often undesirable or even impossible to create a new library for every task. We present a method for transferring libraries across tasks, which allows us to build libraries by learning from demonstration on one task and to apply them to similar tasks. Representing the libraries in a feature-based space is key to supporting transfer. We also search through the library to ensure that a complete path to the goal is possible. Results are shown for the Little Dog task. Little Dog is a quadruped robot
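
A hypothetical Python sketch of checking that a library can produce a complete path to the goal. It assumes each library segment is summarized by a start and an end point in some feature space, that one segment can follow another when its start lies within `radius` of the previous segment's end, and that a breadth-first search over these links is an acceptable search. The function name, the distance test, and `radius` are all assumptions, not the paper's method.

```python
import math
from collections import deque


def find_library_path(segments, start, goal, radius=1.0):
    """Breadth-first search for a chain of library segments from start to goal.

    segments: list of (start_feature, end_feature) tuples of floats.
    Returns the list of segment indices, or None if no complete path exists.
    """
    def close(a, b):
        return math.dist(a, b) <= radius

    frontier = deque()
    parent = {}
    for i, (seg_start, _) in enumerate(segments):
        if close(seg_start, start):
            frontier.append(i)
            parent[i] = None

    while frontier:
        i = frontier.popleft()
        if close(segments[i][1], goal):
            path = []
            while i is not None:  # walk back to the first segment
                path.append(i)
                i = parent[i]
            return path[::-1]
        for j, (seg_start, _) in enumerate(segments):
            if j not in parent and close(segments[i][1], seg_start):
                parent[j] = i
                frontier.append(j)
    return None


library = [((0.0, 0.0), (2.0, 0.0)), ((2.0, 0.0), (4.0, 0.0)),
           ((4.0, 0.0), (6.0, 0.0)), ((10.0, 10.0), (12.0, 10.0))]
print(find_library_path(library, (0.0, 0.0), (6.0, 0.0)))    # prints: [0, 1, 2]
print(find_library_path(library, (0.0, 0.0), (12.0, 10.0)))  # prints: None
```

The last segment never chains onto the others, so a goal near its end is reported as unreachable. This is the kind of gap a completeness check would flag before a library is reused on a new task.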

Self-Organizing Perceptual and Temporal Abstraction for Robot Reinforcement

by Jefferson Provost, Benjamin J. Kuipers, Risto Miikkulainen, 2004
Cited by 7 (0 self)
A major current challenge in reinforcement learning research is to extend methods that work well on discrete, short-range, low-dimensional problems to continuous, high-diameter, high-dimensional problems, such as robot navigation using high-resolution sensors. We present a method whereby a robot in a continuous world can, with little prior knowledge of its sensorimotor system, environment, and task, improve task learning by first using a self-organizing feature map to develop a set of higher-level perceptual features while exploring using primitive, local actions. Then, using those features, the agent can build a set of high-level actions that carry it between perceptually distinctive states in the environment. This method
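
A compact NumPy sketch of a one-dimensional Kohonen self-organizing map, a standard form of the "self-organizing feature map" the abstract mentions. The index of the best-matching unit then acts as a discrete perceptual feature. The map size, the linearly decaying learning-rate and neighborhood schedules, the evenly spaced initialization from the data, and the names `train_som` and `perceptual_feature` are assumptions, not the paper's settings.

```python
import numpy as np


def train_som(data, n_units=4, epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Train a 1-D self-organizing map and return its unit weight vectors."""
    rng = np.random.default_rng(seed)
    # Initialize units from evenly spaced data points (an assumed, simple choice).
    init_idx = np.linspace(0, len(data) - 1, n_units).astype(int)
    weights = data[init_idx].astype(float)
    positions = np.arange(n_units)  # unit coordinates on the 1-D map

    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)
        sigma = max(sigma0 * (1.0 - frac), 1e-3)
        for x in data[rng.permutation(len(data))]:
            winner = np.argmin(np.linalg.norm(weights - x, axis=1))
            # Gaussian neighborhood on the map around the winning unit
            h = np.exp(-((positions - winner) ** 2) / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
    return weights


def perceptual_feature(weights, x):
    """Index of the best-matching unit, used as a discrete perceptual feature."""
    return int(np.argmin(np.linalg.norm(weights - x, axis=1)))


# Two well-separated groups of sensor readings.
data = np.array([[0.0, 0.0], [0.1, 0.1], [0.2, 0.2],
                 [5.0, 5.0], [5.1, 5.1], [5.2, 5.2]])
som = train_som(data, n_units=2)
print([perceptual_feature(som, x) for x in data])
```

Readings from the same group map to the same unit. In a robot setting, changes in the winning unit would be candidate "perceptually distinctive" boundaries between which high-level actions could be learned.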

The development of hierarchical knowledge in robot systems

by Stephen W. Hart (Rachel Keen, Member; Andrew Barto, Department Chair), 2009
Cited by 7 (0 self)
This dissertation would not have been possible without the help and support of many people. Most of all, I would like to extend my gratitude to Rod Grupen for many years of inspiring work, our discussions, and his guidance. Without his support and vision, I cannot imagine that the journey would have been as enormously enjoyable and rewarding as it turned out to be. I am very excited about what we discovered during my time at UMass, but there is much more to be done. I look forward to what comes next! In addition to providing professional inspiration, Rod was a great person to work with and for—creating a warm and encouraging laboratory atmosphere, motivating us to stay in shape for his annual half-marathons, and ensuring a sufficient amount of cake at the weekly lab meetings. Thanks for all your support, Rod! I am very grateful to my thesis committee—Andy Barto, David Jensen, and Rachel Keen—for many encouraging and inspirational discussions. Their comments and feedback significantly contributed to the form of this document. I would especially

TTree: Tree-Based State Generalization with Temporally Abstract Actions

by William T. B. Uther, Manuela M. Veloso - In Proceedings of SARA-2002, 2002
Cited by 6 (0 self)
In this chapter we describe the Trajectory Tree, or TTree, algorithm. TTree uses a small set of supplied policies to help solve a Semi-Markov Decision Problem (SMDP). The algorithm uses a learned tree-based discretization of the state space as an abstract state description and both user-supplied and auto-generated policies as temporally abstract actions. It uses a generative model of the world to sample the transition function for the abstract SMDP defined by those state and temporal abstractions, and then finds a policy for that abstract SMDP. This policy for the abstract SMDP can then be mapped back...
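
A small Python sketch of the final step the abstract describes. Once a generative model has been used to sample outcomes for each abstract state and temporally abstract action, the abstract SMDP can be solved with a value-iteration backup that discounts by gamma raised to each sampled duration. The sample format (reward accumulated during the action, its duration in primitive steps, and the next abstract state, or None at termination) and the function names are assumptions; the abstract does not say which solver TTree uses.

```python
def smdp_value_iteration(samples, gamma=0.9, iters=100):
    """Solve a sampled abstract SMDP.

    samples[(state, action)] = list of (reward, duration, next_state) tuples,
    with next_state None when the episode terminates (assumed format).
    Returns a dict of Q-values keyed by (state, action).
    """
    states = {s for s, _ in samples}
    actions = {s: [a for (s2, a) in samples if s2 == s] for s in states}
    q = {sa: 0.0 for sa in samples}
    for _ in range(iters):
        v = {s: max(q[(s, a)] for a in actions[s]) for s in states}
        new_q = {}
        for sa, outcomes in samples.items():
            total = 0.0
            for reward, duration, s_next in outcomes:
                future = 0.0 if s_next is None else v.get(s_next, 0.0)
                total += reward + gamma ** duration * future  # SMDP backup
            new_q[sa] = total / len(outcomes)
        q = new_q
    return q


def greedy_policy(q):
    """Pick the highest-valued abstract action in each abstract state."""
    policy = {}
    for (s, a), value in q.items():
        if s not in policy or value > q[(s, policy[s])]:
            policy[s] = a
    return policy


samples = {
    ("A", "opt"): [(0.0, 2, "B")],   # temporally abstract action, 2 steps
    ("A", "stay"): [(0.0, 1, "A")],  # one-step self-loop
    ("B", "goal"): [(1.0, 1, None)],
}
Q = smdp_value_iteration(samples)
print(greedy_policy(Q))
```

Here the two-step action from A to B is worth 0.9² × 1 = 0.81, which beats the one-step self-loop (0.9 × 0.81 = 0.729), so the greedy policy chooses `opt` in A.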

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2010 The Pennsylvania State University