Results 1 - 10 of 53
Proto-value functions: A Laplacian framework for learning representation and control in Markov decision processes
Journal of Machine Learning Research, 2006
"... This paper introduces a novel spectral framework for solving Markov decision processes (MDPs) by jointly learning representations and optimal policies. The major components of the framework described in this paper include: (i) A general scheme for constructing representations or basis functions by d ..."
Abstract
-
Cited by 92 (10 self)
- Add to MetaCart
(Show Context)
This paper introduces a novel spectral framework for solving Markov decision processes (MDPs) by jointly learning representations and optimal policies. The major components of the framework described in this paper include: (i) a general scheme for constructing representations or basis functions by diagonalizing symmetric diffusion operators; (ii) a specific instantiation of this approach in which global basis functions called proto-value functions (PVFs) are formed from the eigenvectors of the graph Laplacian on an undirected graph built from state transitions induced by the MDP; (iii) a three-phased procedure called representation policy iteration (RPI), comprising a sample collection phase, a representation learning phase that constructs basis functions from samples, and a final parameter estimation phase that determines an (approximately) optimal policy within the (linear) subspace spanned by the (current) basis functions; (iv) a specific instantiation of the RPI framework using least-squares policy iteration (LSPI) as the parameter estimation method; (v) several strategies for scaling the proposed approach to large discrete and continuous state spaces, including the Nyström extension for out-of-sample interpolation of eigenfunctions, and the use of Kronecker sum factorization to construct compact eigenfunctions in product spaces such as factored MDPs; and (vi) a series of illustrative discrete and continuous control tasks, which both illustrate the concepts and provide a benchmark for evaluating the proposed approach. Many challenges remain to be addressed in scaling the proposed framework to large MDPs, and several elaborations of the proposed framework are briefly summarized at the end.
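As a rough illustration of the proto-value-function construction described above, the Python sketch below builds the normalized graph Laplacian of a small grid-world state graph and takes its smoothest eigenvectors as basis functions. The grid domain, the function names, and the choice of eight basis functions are illustrative assumptions, not details taken from the paper.

import numpy as np

def build_grid_adjacency(rows, cols):
    # Adjacency matrix of an undirected 4-neighbour grid graph over the state space.
    n = rows * cols
    W = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            s = r * cols + c
            for dr, dc in ((1, 0), (0, 1)):
                nr, nc = r + dr, c + dc
                if nr < rows and nc < cols:
                    t = nr * cols + nc
                    W[s, t] = W[t, s] = 1.0
    return W

def proto_value_functions(W, num_basis):
    # Smoothest eigenvectors of the normalized Laplacian L = I - D^{-1/2} W D^{-1/2}.
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.eye(len(W)) - D_inv_sqrt @ W @ D_inv_sqrt
    eigvals, eigvecs = np.linalg.eigh(L)   # eigenvalues in ascending order
    return eigvecs[:, :num_basis]          # columns are basis functions over states

W = build_grid_adjacency(10, 10)
Phi = proto_value_functions(W, num_basis=8)   # 100 states x 8 proto-value functions

The columns of Phi would then play the role of the feature matrix handed to a linear parameter-estimation method such as LSPI.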
Identifying useful subgoals in reinforcement learning by local graph partitioning
In Proceedings of the Twenty-Second International Conference on Machine Learning, 2005
"... We present a new subgoal-based method for automatically creating useful skills in reinforcement learning. Our method identifies subgoals by partitioning local state transition graphs—those that are constructed using only the most recent experiences of the agent. The local scope of our subgoal discov ..."
Abstract
-
Cited by 69 (10 self)
- Add to MetaCart
(Show Context)
We present a new subgoal-based method for automatically creating useful skills in reinforcement learning. Our method identifies subgoals by partitioning local state transition graphs—those that are constructed using only the most recent experiences of the agent. The local scope of our subgoal discovery method allows it to successfully identify the type of subgoals we seek—states that lie between two densely-connected regions of the state space—while producing an algorithm with low computational cost.
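A simplified sketch of the bottleneck idea in the abstract above: build an undirected graph from the agent's recent transitions and split it in two, treating states whose edges cross the cut as subgoal candidates. The paper uses a local cut-based partitioning criterion; the plain spectral bisection below is only a generic stand-in, and all names are illustrative.

import numpy as np

def local_transition_graph(transitions, num_states):
    # Undirected adjacency matrix built from the agent's recent (s, s') transitions.
    W = np.zeros((num_states, num_states))
    for s, s_next in transitions:
        W[s, s_next] += 1.0
        W[s_next, s] += 1.0
    return W

def candidate_subgoals(W):
    # Bisect the graph with the Fiedler vector of its Laplacian, then return the
    # states that have edges crossing the cut; these border the two dense regions.
    L = np.diag(W.sum(axis=1)) - W
    _, eigvecs = np.linalg.eigh(L)
    side = eigvecs[:, 1] >= 0
    return [s for s in range(len(W))
            if W[s].sum() > 0 and W[s, side != side[s]].sum() > 0]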
Building portable options: Skill transfer in reinforcement learning
In Proceedings of the 20th International Joint Conference on Artificial Intelligence, 2007
"... The options framework provides methods for reinforcement learning agents to build new high-level skills. However, since options are usually learned in the same state space as the problem the agent is solving, they cannot be used in other tasks that are similar but have different state spaces. We int ..."
Abstract
-
Cited by 57 (12 self)
- Add to MetaCart
(Show Context)
The options framework provides methods for reinforcement learning agents to build new high-level skills. However, since options are usually learned in the same state space as the problem the agent is solving, they cannot be used in other tasks that are similar but have different state spaces. We introduce the notion of learning options in agent-space, the space generated by a feature set that is present and retains the same semantics across successive problem instances, rather than in problem-space. Agent-space options can be reused in later tasks that share the same agent-space but have different problem-spaces. We present experimental results demonstrating the use of agent-space options in building transferable skills, and show that they perform best when used in conjunction with problem-space options.
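The distinction between problem-space and agent-space options in the abstract above can be made concrete with a minimal container type: an agent-space option's initiation set, policy, and termination condition are functions of the agent's own observations rather than of task-specific state identifiers. The Option dataclass and the example feature names below are illustrative assumptions, not the paper's code.

from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Option:
    initiation: Callable[[Any], bool]     # can the option start here?
    policy: Callable[[Any], int]          # which action to take
    termination: Callable[[Any], float]   # probability of terminating

# A problem-space option keys on task-specific state ids and cannot transfer.
goto_door_task1 = Option(
    initiation=lambda state_id: state_id in {3, 4, 5},
    policy=lambda state_id: 0,
    termination=lambda state_id: 1.0 if state_id == 5 else 0.0,
)

# An agent-space option keys on features the agent always observes (e.g. a sensor
# reading about a beacon), so the same object can be reused in a new problem-space.
approach_beacon = Option(
    initiation=lambda obs: obs["beacon_visible"],
    policy=lambda obs: 1 if obs["beacon_bearing"] > 0 else 2,
    termination=lambda obs: 1.0 if obs["beacon_distance"] < 0.5 else 0.0,
)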
Skill Discovery in Continuous Reinforcement Learning Domains using Skill Chaining
"... We introduce skill chaining, a skill discovery method for reinforcement learning agents in continuous domains. Skill chaining produces chains of skills leading to an end-of-task reward. We demonstrate experimentally that skill chaining is able to create appropriate skills in a challenging continuous ..."
Abstract
-
Cited by 39 (8 self)
- Add to MetaCart
(Show Context)
We introduce skill chaining, a skill discovery method for reinforcement learning agents in continuous domains. Skill chaining produces chains of skills leading to an end-of-task reward. We demonstrate experimentally that skill chaining is able to create appropriate skills in a challenging continuous domain and that doing so results in performance gains.
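The chaining loop described above can be summarized in a short, hedged sketch: each new skill is trained to reach the initiation set of the previously discovered skill, starting from the end-of-task goal. learn_skill_to_reach and fit_initiation_classifier stand in for an RL algorithm and a classifier; both names are assumptions, not the paper's implementation.

def skill_chaining(env, reaches_task_goal, num_skills,
                   learn_skill_to_reach, fit_initiation_classifier):
    # learn_skill_to_reach(env, target) -> (option_policy, states_from_which_it_succeeded)
    # fit_initiation_classifier(states) -> predicate over states (the initiation set)
    skills = []
    target = reaches_task_goal           # first target: the end-of-task goal region
    for _ in range(num_skills):
        option_policy, successful_starts = learn_skill_to_reach(env, target)
        initiation = fit_initiation_classifier(successful_starts)
        skills.append((initiation, option_policy, target))
        target = initiation              # the next skill chains onto this one
    return skills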
An intrinsic reward mechanism for efficient exploration
University of Pittsburgh, 2006
"... How should a reinforcement learning agent act if its sole purpose is to efficiently learn an optimal policy for later use? In other words, how should it explore, to be able to exploit later? We formulate this problem as a Markov Decision Process by explicitly modeling the internal state of the agent ..."
Abstract
-
Cited by 36 (3 self)
- Add to MetaCart
How should a reinforcement learning agent act if its sole purpose is to efficiently learn an optimal policy for later use? In other words, how should it explore, to be able to exploit later? We formulate this problem as a Markov decision process by explicitly modeling the internal state of the agent and propose a principled heuristic for its solution. We present experimental results in a number of domains, and also explore the algorithm's use for learning a policy for a skill given its reward function, an important but neglected component of skill discovery.
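The paper derives its exploration bonus from an explicit model of the agent's internal state; as a loose illustration of where an intrinsic reward enters a value-based learner, the sketch below adds a generic count-based novelty bonus (a different, simpler mechanism) to a tabular Q-learning update. The env interface (reset, step, actions) and all parameter values are assumptions.

import collections
import numpy as np

def q_learning_with_novelty_bonus(env, num_episodes, alpha=0.1, gamma=0.99, beta=0.1):
    Q = collections.defaultdict(float)      # (state, action) -> estimated value
    visits = collections.defaultdict(int)   # state visit counts for the bonus
    for _ in range(num_episodes):
        state, done = env.reset(), False
        while not done:
            action = max(env.actions(state), key=lambda a: Q[state, a])
            next_state, extrinsic_reward, done = env.step(action)
            visits[next_state] += 1
            intrinsic_reward = beta / np.sqrt(visits[next_state])   # novelty bonus
            bootstrap = 0.0 if done else gamma * max(
                Q[next_state, a] for a in env.actions(next_state))
            target = extrinsic_reward + intrinsic_reward + bootstrap
            Q[state, action] += alpha * (target - Q[state, action])
            state = next_state
    return Q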
Causal Graph Based Decomposition of Factored MDPs
Journal of Machine Learning Research, 2006
"... We present Variable Influence Structure Analysis, or VISA, an algorithm that performs hierarchical decomposition of factored Markov decision processes. VISA uses a dynamic Bayesian network model of actions, and constructs a causal graph that captures relationships between state variables. In tasks ..."
Abstract
-
Cited by 35 (5 self)
- Add to MetaCart
We present Variable Influence Structure Analysis, or VISA, an algorithm that performs hierarchical decomposition of factored Markov decision processes. VISA uses a dynamic Bayesian network model of actions, and constructs a causal graph that captures relationships between state variables. In tasks ...
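The causal graph VISA builds can be sketched directly from a dynamic Bayesian network's parent sets: variable i points to variable j whenever, under some action, the next value of j depends on the current value of i. The toy DBN below and its variable names are illustrative assumptions, not taken from the paper.

import networkx as nx

# dbn_parents[action][variable] = set of state variables its next value depends on
dbn_parents = {
    "move":   {"position": {"position"}, "has_key": {"has_key"}},
    "pickup": {"has_key": {"position", "has_key"}, "position": {"position"}},
    "unlock": {"door_open": {"has_key", "door_open", "position"}},
}

causal_graph = nx.DiGraph()
for parents_for_action in dbn_parents.values():
    for variable, parent_set in parents_for_action.items():
        for parent in parent_set:
            if parent != variable:                 # skip self-dependence
                causal_graph.add_edge(parent, variable)

# Variables that influence one another fall into the same strongly connected
# component; the components give a partial order that a hierarchical decomposition
# of the factored MDP can follow.
for component in nx.strongly_connected_components(causal_graph):
    print(sorted(component))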
State abstraction discovery from irrelevant state variables
In Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence, 2005
"... Abstraction is a powerful form of domain knowledge that allows reinforcement-learning agents to cope with complex environments, but in most cases a human must supply this knowledge. In the absence of such prior knowledge or a given model, we propose an algorithm for the automatic discovery of state ..."
Abstract
-
Cited by 33 (3 self)
- Add to MetaCart
Abstraction is a powerful form of domain knowledge that allows reinforcement-learning agents to cope with complex environments, but in most cases a human must supply this knowledge. In the absence of such prior knowledge or a given model, we propose an algorithm for the automatic discovery of state abstraction from policies learned in one domain for use in other domains that have similar structure. To this end, we introduce a novel condition for state abstraction in terms of the relevance of state features to optimal behavior, and we exhibit statistical methods that detect this condition robustly. Finally, we show how to apply temporal abstraction to benefit safely from even partial state abstraction in the presence of generalization error.
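As a loose stand-in for the statistical relevance detection described above, the sketch below checks whether the greedy action of a learned policy is statistically independent of one state variable's value, using a chi-square test. The paper develops its own relevance condition and detection methods; every name and threshold here is an assumption.

import numpy as np
from scipy.stats import chi2_contingency

def feature_seems_irrelevant(states, greedy_actions, feature_index, alpha=0.05):
    # states: array of shape (n, d) of visited states; greedy_actions: shape (n,).
    feature_values = states[:, feature_index]
    values = np.unique(feature_values)
    actions = np.unique(greedy_actions)
    # Contingency table: how often each action is chosen for each feature value.
    # (chi2_contingency requires every row and column to have a nonzero total.)
    table = np.array([[np.sum((feature_values == v) & (greedy_actions == a))
                       for a in actions] for v in values])
    _, p_value, _, _ = chi2_contingency(table)
    return p_value > alpha   # failing to reject independence marks an abstraction candidate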
The Constructivist Learning Architecture: A Model of Cognitive Development for Robust Autonomous Robots
2004