Transfer Learning for Reinforcement Learning Domains: A Survey
Cited by 27 (3 self)
The reinforcement learning paradigm is a popular way to address problems that have only limited environmental feedback, rather than correctly labeled examples, as is common in other machine learning contexts. While significant progress has been made to improve learning in a single task, the idea of transfer learning has only recently been applied to reinforcement learning tasks. The core idea of transfer is that experience gained in learning to perform one task can help improve learning performance in a related, but different, task. In this article we present a framework that classifies transfer learning methods in terms of their capabilities and goals, and then use it to survey the existing literature, as well as to suggest future directions for transfer learning work.
Causal Graph Based Decomposition of Factored MDPs
- Journal of Machine Learning Research, 2006
Cited by 18 (3 self)
We present Variable Influence Structure Analysis, or VISA, an algorithm that performs hierarchical decomposition of factored Markov decision processes. VISA uses a dynamic Bayesian network model of actions, and constructs a causal graph that captures relationships between state variables. In tasks
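A minimal sketch of the causal-graph idea mentioned above, assuming that each action's dynamic Bayesian network is given as a map from a state variable to the set of variables it depends on. The function name `causal_graph`, the input layout, and the toy coffee-delivery variables are all hypothetical illustrations, not the VISA implementation; the sketch only collects an edge from one variable to another whenever some action's model makes the second depend on the first.

```python
def causal_graph(dbn_parents):
    """Collect (parent, child) edges between distinct state variables.

    dbn_parents: {action: {variable: set of parent variables}}
    (an assumed input format, not taken from the paper).
    """
    edges = set()
    for parents_by_var in dbn_parents.values():
        for child, parents in parents_by_var.items():
            for parent in parents:
                if parent != child:  # ignore a variable's dependence on itself
                    edges.add((parent, child))
    return edges


# Hypothetical toy task: fetch coffee and deliver it to a user.
dbn_parents = {
    "move": {"location": {"location"}},
    "buy_coffee": {"has_coffee": {"has_coffee", "location"}},
    "deliver": {"user_has_coffee": {"user_has_coffee", "has_coffee", "location"}},
}

edges = causal_graph(dbn_parents)
print(sorted(edges))
```

In this toy example `location` influences both other variables and nothing influences `location`, which is the kind of one-way structure a hierarchical decomposition could exploit.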
Toward a topological theory of relational reinforcement learning for navigation tasks
- In Proceedings of the Eighteenth International Florida Artificial Intelligence Research Society Conference (FLAIRS-2005), 2005
Cited by 11 (0 self)
We examine application of relational learning methods to reinforcement learning in spatial navigation tasks. Specifically, we consider a goal-seeking agent with noisy control actions embedded in an environment with strong topological structure. While formally a Markov decision process (MDP), this task possesses special structure derived from the underlying topology that can be exploited to speed learning. We describe relational policies for such environments that are relocatable by virtue of being parameterized solely in terms of the relations (distance and direction) between the agent’s current state and the goal state. We demonstrate that this formulation yields significant learning improvements in completely homogeneous environments for which exact policy relocation is possible. We also examine the effects of non-homogeneities such as walls or obstacles and show that their effects can be neglected if they fall outside of a closed-form envelope surrounding the optimal path between the agent and the goal. To our knowledge, this is the first closed-form result for the structure of an envelope in an MDP. We demonstrate that relational reinforcement learning in an environment that obeys the envelope constraints also yields substantial learning performance improvements.
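The relocation property described above can be illustrated with a small sketch. It assumes a noise-free, obstacle-free grid and a hand-written (not learned) policy, and it uses the displacement `(dx, dy)` as a stand-in for the distance and direction relations; the names `relation`, `greedy_step`, and `rollout` are hypothetical. The point is only that a policy defined on the agent-to-goal relation behaves identically wherever the agent and goal are placed.

```python
def relation(agent, goal):
    """Displacement from agent to goal; stands in for (distance, direction)."""
    return (goal[0] - agent[0], goal[1] - agent[1])


def greedy_step(rel):
    """Policy over relations only: reduce the larger axis offset first."""
    dx, dy = rel
    if dx == 0 and dy == 0:
        return (0, 0)  # already at the goal
    if abs(dx) >= abs(dy):
        return (1 if dx > 0 else -1, 0)
    return (0, 1 if dy > 0 else -1)


def rollout(agent, goal, max_steps=100):
    """Follow the relational policy and return the visited cells."""
    path = [agent]
    for _ in range(max_steps):
        step = greedy_step(relation(agent, goal))
        if step == (0, 0):
            break
        agent = (agent[0] + step[0], agent[1] + step[1])
        path.append(agent)
    return path


# Same relation between agent and goal, two different absolute locations.
path_a = rollout((0, 0), (3, 2))
path_b = rollout((10, 10), (13, 12))
shifted_a = [(x + 10, y + 10) for x, y in path_a]
print(path_b == shifted_a)
```

Because the policy never looks at absolute coordinates, the second path is the first one translated by the same offset, which is what makes exact relocation possible in a homogeneous environment.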
Generalization and transfer in robot control
- In Epigenetic Robotics Annual Conference
Cited by 10 (8 self)
We address the generalization and transfer of sensorimotor programs in robot systems. We use a factorable control-based approach that provides a natural, discrete abstraction of the underlying continuous state/action space and thus allows for the application of learning algorithms that converge in practical amounts of time. We argue that our approach provides an efficient means for the adaptation of skills to new situations. We show the performance gains for our framework in simulation, and demonstrate results from on-line learning on a bimanual robot.
The development of hierarchical knowledge in robot systems
- 2009
Cited by 7 (0 self)
This dissertation would not have been possible without the help and support of many people. Most of all, I would like to extend my gratitude to Rod Grupen for many years of inspiring work, our discussions, and his guidance. Without his support and vision, I cannot imagine that the journey would have been as enormously enjoyable and rewarding as it turned out to be. I am very excited about what we discovered during my time at UMass, but there is much more to be done. I look forward to what comes next! In addition to providing professional inspiration, Rod was a great person to work with and for—creating a warm and encouraging laboratory atmosphere, motivating us to stay in shape for his annual half-marathons, and ensuring a sufficient amount of cake at the weekly lab meetings. Thanks for all your support, Rod! I am very grateful to my thesis committee—Andy Barto, David Jensen, and Rachel Keen—for many encouraging and inspirational discussions. Their comments and feedback significantly contributed to the form of this document. I would especially
Transfer via Soft Homomorphisms
Cited by 6 (0 self)
The field of transfer learning aims to speed up learning across multiple related tasks by transferring knowledge between source and target tasks. Past work has shown that when the tasks are specified as Markov Decision Processes (MDPs), a function that maps states in the target task to similar states in the source task can be used to transfer many types of knowledge. Current approaches for autonomously learning such functions are inefficient or require domain knowledge and lack theoretical guarantees of performance. We devise a novel approach that learns a stochastic mapping between tasks. Using this mapping, we present two algorithms for autonomous transfer learning – one that has strong convergence guarantees and another approximate method that learns online from experience. Extending existing work on MDP homomorphisms, we present theoretical guarantees for the quality of a transferred value function.
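As a rough illustration of transferring a value function through a stochastic state mapping, the sketch below initializes each target state's value as the expectation of source values under that state's mapping distribution. This is an assumed, simplified scheme for illustration only; it is not either of the algorithms the abstract refers to, and `transfer_values`, the mapping matrix, and the numbers are all hypothetical.

```python
def transfer_values(mapping, v_source):
    """Initialize target-state values from source-state values.

    mapping[t][s] is the assumed probability that target state t
    corresponds to source state s; each row must sum to 1.
    """
    v_target = []
    for row in mapping:
        if len(row) != len(v_source):
            raise ValueError("row length must match number of source states")
        if abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("each mapping row must sum to 1")
        # Expected source value under this target state's mapping distribution.
        v_target.append(sum(p * v for p, v in zip(row, v_source)))
    return v_target


# Hypothetical toy data: 3 source states, 4 target states.
v_source = [0.0, 5.0, 10.0]
mapping = [
    [1.0, 0.0, 0.0],  # target 0 maps entirely to source 0
    [0.0, 0.5, 0.5],  # target 1 is split between source 1 and 2
    [0.0, 0.0, 1.0],
    [0.5, 0.5, 0.0],
]

v_target = transfer_values(mapping, v_source)
print(v_target)
```

A deterministic mapping is the special case where every row has a single entry equal to 1; the soft version lets an uncertain correspondence blend several source values.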
Why (PO)MDPs lose for spatial tasks and what to do about it
- In Proceedings of the ICML 2005 Workshop on Rich Representations for Reinforcement Learning, 2005
Cited by 4 (0 self)
In this deliberately inflammatory paper, we claim that everything you believe about (PO)MDPs is wrong. More specifically, we claim that (PO)MDPs are so general as to be nearly useless in many cases of practical interest and that we should specialize rather than generalize. We are mostly concerned with problems involving real, physical systems operating in a real, physical world (the same real, physical world that we live in). In particular, we are interested in spatial navigation, but we believe that this claim holds for a number of other key problem areas as well. Our abstraction efforts to date have focused on extending the reach of (PO)MDP models while maintaining their basic worldview. We claim that a profitable approach for the future is to cleave RL into a number of sub-disciplines, each studying important “special cases”. By doing so, we will be able to take advantage of the properties of these cases in ways that our current (PO)MDP frameworks are unable to.
1. Provocative Claim
The (PO)MDP frameworks are fundamentally broken, not because they are insufficiently powerful representations, but because they are too powerful. We submit that, rather than generalizing these models, we should be specializing them if we want to make progress on solving real problems in the real world.
Preliminary work. Under review by the International Conference on Machine Learning (ICML). Do not distribute.
POMDP Homomorphisms
Cited by 1 (0 self)
The problem of finding hidden state in a POMDP and the problem of finding abstractions for MDPs are closely related. In this paper, we analyze the connection between existing Predictive State Representation methods [3] and homomorphic reductions of Markov Processes [5, 6]. We formally define a POMDP homomorphism, then extend PSR reduction methods to find POMDP homomorphisms when the original POMDP is known. The resulting methods find more compact abstract models in tasks for which different observations have the same meaning.

