Results 1 - 10
of
49
Recent advances in hierarchical reinforcement learning
, 2003
"... A preliminary unedited version of this paper was incorrectly published as part of Volume ..."
Abstract
-
Cited by 119 (18 self)
- Add to MetaCart
A preliminary unedited version of this paper was incorrectly published as part of Volume
Intrinsically motivated learning of hierarchical collections of skills
, 2004
"... Humans and other animals often engage in activities for their own sakes rather than as steps toward solving practical problems. Psychologists call these intrinsically motivated behaviors. What we learn during intrinsically motivated behavior is essential for our development as competent autonomous e ..."
Abstract
-
Cited by 80 (15 self)
- Add to MetaCart
Humans and other animals often engage in activities for their own sakes rather than as steps toward solving practical problems. Psychologists call these intrinsically motivated behaviors. What we learn during intrinsically motivated behavior is essential for our development as competent autonomous entities able to efficiently solve a wide range of practical problems as they arise. In this paper we present initial results from a computational study of intrinsically motivated learning aimed at allowing artificial agents to construct and extend hierarchies of reusable skills that are needed for competent autonomy. At the core of the model are recent theoretical and algorithmic advances in computational reinforcement learning, specifically, new concepts related to skills and new learning algorithms for learning with skill hierarchies. 1
Using relative novelty to identify useful temporal abstractions in reinforcement learning
- In Proceedings of the Twenty-First International Conference on Machine Learning
, 2004
"... We present a new method for automatically creating useful temporal abstractions in reinforcement learning. We argue that states that allow the agent to transition to a different region of the state space are useful subgoals, and propose a method for identifying them using the concept of relative nov ..."
Abstract
-
Cited by 51 (11 self)
- Add to MetaCart
We present a new method for automatically creating useful temporal abstractions in reinforcement learning. We argue that states that allow the agent to transition to a different region of the state space are useful subgoals, and propose a method for identifying them using the concept of relative novelty. When such a state is identified, a temporallyextended activity (e.g., an option) is generated that takes the agent efficiently to this state. We illustrate the utility of the method in a number of tasks. 1.
Proto-value functions: A laplacian framework for learning representation and control in markov decision processes
- Journal of Machine Learning Research
, 2006
"... This paper introduces a novel spectral framework for solving Markov decision processes (MDPs) by jointly learning representations and optimal policies. The major components of the framework described in this paper include: (i) A general scheme for constructing representations or basis functions by d ..."
Abstract
-
Cited by 45 (8 self)
- Add to MetaCart
This paper introduces a novel spectral framework for solving Markov decision processes (MDPs) by jointly learning representations and optimal policies. The major components of the framework described in this paper include: (i) A general scheme for constructing representations or basis functions by diagonalizing symmetric diffusion operators (ii) A specific instantiation of this approach where global basis functions called proto-value functions (PVFs) are formed using the eigenvectors of the graph Laplacian on an undirected graph formed from state transitions induced by the MDP (iii) A three-phased procedure called representation policy iteration comprising of a sample collection phase, a representation learning phase that constructs basis functions from samples, and a final parameter estimation phase that determines an (approximately) optimal policy within the (linear) subspace spanned by the (current) basis functions. (iv) A specific instantiation of the RPI framework using least-squares policy iteration (LSPI) as the parameter estimation method (v) Several strategies for scaling the proposed approach to large discrete and continuous state spaces, including the Nyström extension for out-of-sample interpolation of eigenfunctions, and the use of Kronecker sum factorization to construct compact eigenfunctions in product spaces such as factored MDPs (vi) Finally, a series of illustrative discrete and continuous control tasks, which both illustrate the concepts and provide a benchmark for evaluating the proposed approach. Many challenges remain to be addressed in scaling the proposed framework to large MDPs, and several elaboration of the proposed framework are briefly summarized at the end.
Identifying useful subgoals in reinforcement learning by local graph partitioning
- In Proceedings of the Twenty-Second International Conference on Machine Learning
, 2005
"... We present a new subgoal-based method for automatically creating useful skills in reinforcement learning. Our method identifies subgoals by partitioning local state transition graphs—those that are constructed using only the most recent experiences of the agent. The local scope of our subgoal discov ..."
Abstract
-
Cited by 32 (7 self)
- Add to MetaCart
We present a new subgoal-based method for automatically creating useful skills in reinforcement learning. Our method identifies subgoals by partitioning local state transition graphs—those that are constructed using only the most recent experiences of the agent. The local scope of our subgoal discovery method allows it to successfully identify the type of subgoals we seek—states that lie between two densely-connected regions of the state space—while producing an algorithm with low computational cost.
Building portable options: Skill transfer in reinforcement learning
- Proceedings of the 20th International Joint Conference on Artificial Intelligence
, 2007
"... The options framework provides methods for reinforcement learning agents to build new high-level skills. However, since options are usually learned in the same state space as the problem the agent is solving, they cannot be used in other tasks that are similar but have different state spaces. We int ..."
Abstract
-
Cited by 32 (9 self)
- Add to MetaCart
The options framework provides methods for reinforcement learning agents to build new high-level skills. However, since options are usually learned in the same state space as the problem the agent is solving, they cannot be used in other tasks that are similar but have different state spaces. We introduce the notion of learning options in agentspace, the space generated by a feature set that is present and retains the same semantics across successive problem instances, rather than in problemspace. Agent-space options can be reused in later tasks that share the same agent-space but have different problem-spaces. We present experimental results demonstrating the use of agent-space options in building transferrable skills, and show that they perform best when used in conjunction with problem-space options. 1
An Algebraic Approach to Abstraction in Reinforcement Learning
, 2003
"... To operate e#ectively in complex environments learning agents have to selectively ignore irrelevant details by forming useful abstractions. In this article we outline a formulation of abstraction for reinforcement learning approaches to stochastic sequential decision problems modeled as semiMarkov D ..."
Abstract
-
Cited by 28 (1 self)
- Add to MetaCart
To operate e#ectively in complex environments learning agents have to selectively ignore irrelevant details by forming useful abstractions. In this article we outline a formulation of abstraction for reinforcement learning approaches to stochastic sequential decision problems modeled as semiMarkov Decision Processes (SMDPs). Building on existing algebraic approaches, we propose the concept of SMDP homomorphism and argue that it provides a useful tool for a rigorous study of abstraction for SMDPs. We apply this framework to di#erent classes of abstractions that arise in hierarchical systems and discuss relativized options, a framework for compactly specifying a related family of temporally-extended actions. Additional details of this work are described in refs. [1, 2, 3].
Probabilistic Policy Reuse in a Reinforcement Learning Agent
- IN AAMAS ’06: PROCEEDINGS OF THE FIFTH INTERNATIONAL JOINT CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS
, 2006
"... We contribute Policy Reuse as a technique to improve a reinforcement learning agent with guidance from past learned similar policies. Our method relies on using the past policies as a probabilistic bias where the learning agent faces three choices: the exploitation of the ongoing learned policy, the ..."
Abstract
-
Cited by 25 (2 self)
- Add to MetaCart
We contribute Policy Reuse as a technique to improve a reinforcement learning agent with guidance from past learned similar policies. Our method relies on using the past policies as a probabilistic bias where the learning agent faces three choices: the exploitation of the ongoing learned policy, the exploration of random unexplored actions, and the exploitation of past policies. We introduce the algorithm and its major components: an exploration strategy to include the new reuse bias, and a similarity function to estimate the similarity of past policies with respect to a new one. We provide empirical results demonstrating that Policy Reuse improves the learning performance over different strategies that learn without reuse. Interestingly and almost as a side effect, Policy Reuse also identifies classes of similar policies revealing a basis of core policies of the domain. We demonstrate that such a basis can be built incrementally, contributing the learning of the structure of a domain.
Developing Navigation Behavior through Self-Organizing Distinctive State Abstraction
- CONNECTION SCIENCE
, 2006
"... A major challenge in reinforcement learning research is to extend methods that have worked well on discrete, short-range, low-dimensional problems to continuous, high-diameter, high-dimensional problems, such as robot navigation using high-resolution sensors. Self-Organizing Distinctivestate Abs ..."
Abstract
-
Cited by 19 (5 self)
- Add to MetaCart
A major challenge in reinforcement learning research is to extend methods that have worked well on discrete, short-range, low-dimensional problems to continuous, high-diameter, high-dimensional problems, such as robot navigation using high-resolution sensors. Self-Organizing Distinctivestate Abstraction (SODA) is a new, generic method by which a robot in a continuous world can better learn to navigate by learning a set of high-level features and building temporally-extended actions to carry it between distinctive states based on those features. A SODA agent first uses a selforganizing feature map to develop a set of high-level perceptual features while exploring the environment with primitive, local actions. The agent then builds a set of high-level actions composed of generic trajectoryfollowing and hill-climbing control laws that carry it between the states at local maxima of feature activations. In an experiment on a simulated robot navigation task, the SODA agent learns to perform a task requiring 300 small-scale, local actions using as few as 9 new, temporally-extended actions, significantly improving learning time over navigating with the local actions.

