Results 1 - 10 of 28
Transfer Learning for Reinforcement Learning Domains: A Survey
Cited by 27 (3 self)
The reinforcement learning paradigm is a popular way to address problems that have only limited environmental feedback, rather than correctly labeled examples, as is common in other machine learning contexts. While significant progress has been made to improve learning in a single task, the idea of transfer learning has only recently been applied to reinforcement learning tasks. The core idea of transfer is that experience gained in learning to perform one task can help improve learning performance in a related, but different, task. In this article we present a framework that classifies transfer learning methods in terms of their capabilities and goals, and then use it to survey the existing literature, as well as to suggest future directions for transfer learning work.
A.: Transfer of samples in batch reinforcement learning
In: Proceedings of the 25th Annual ICML, 2008
Cited by 12 (2 self)
The main objective of transfer in reinforcement learning is to reduce the complexity of learning the solution of a target task by effectively reusing the knowledge retained from solving a set of source tasks. In this paper, we introduce a novel algorithm that transfers samples (i.e., tuples 〈s, a, s′, r〉) from source to target tasks. Under the assumption that tasks have similar transition models and reward functions, we propose a method to select the samples from the source tasks that are most similar to the target task and then use them as input for batch reinforcement learning algorithms. As a result, the number of samples an agent needs to collect from the target task to learn its solution is reduced. We empirically show that, following the proposed approach, the transfer of samples is effective in reducing the learning complexity, even when some source tasks are significantly different from the target task.
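The idea of selecting source samples by task similarity can be illustrated with a minimal sketch. This is not the algorithm from the paper: the `compliance` score, the `threshold` parameter, and the discrete-state assumption are all hypothetical simplifications chosen for illustration. A source task is scored by how many target samples it reproduces exactly, and only sufficiently similar tasks contribute samples.

```python
from collections import defaultdict


def compliance(source_samples, target_samples):
    """Fraction of target samples (s, a, s', r) whose outcome for (s, a)
    was also observed in the source task. Hypothetical similarity score."""
    outcomes = defaultdict(set)
    for s, a, s_next, r in source_samples:
        outcomes[(s, a)].add((s_next, r))
    if not target_samples:
        return 0.0
    hits = sum(
        (s_next, r) in outcomes[(s, a)] for s, a, s_next, r in target_samples
    )
    return hits / len(target_samples)


def select_source_samples(source_tasks, target_samples, threshold=0.5):
    """Pool the samples of every source task whose compliance with the
    target samples reaches the (assumed) threshold."""
    selected = []
    for samples in source_tasks.values():
        if compliance(samples, target_samples) >= threshold:
            selected.extend(samples)
    return selected


# A few samples collected in the target task.
target = [(0, "right", 1, 0.0), (1, "right", 2, 1.0)]
source_tasks = {
    "similar": [(0, "right", 1, 0.0), (1, "right", 2, 1.0), (2, "left", 1, 0.0)],
    "different": [(0, "right", 0, 0.0), (1, "right", 0, -1.0)],
}
transferred = select_source_samples(source_tasks, target)
```

The pooled list `transferred` would then be combined with the target samples and passed to a batch learner. With continuous states, the exact-match test would need to be replaced by a distance or likelihood measure.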
Skill Discovery in Continuous Reinforcement Learning Domains using Skill Chaining
Cited by 11 (5 self)
We introduce skill chaining, a skill discovery method for reinforcement learning agents in continuous domains. Skill chaining produces chains of skills leading to an end-of-task reward. We demonstrate experimentally that skill chaining is able to create appropriate skills in a challenging continuous domain and that doing so results in performance gains.
Generalization and transfer in robot control
In Epigenetic Robotics Annual Conference
Cited by 10 (8 self)
We address the generalization and transfer of sensorimotor programs in robot systems. We use a factorable control-based approach that provides a natural, discrete abstraction of the underlying continuous state/action space and thus allows for the application of learning algorithms that converge in practical amounts of time. We argue that our approach provides an efficient means for the adaptation of skills to new situations. We show the performance gains for our framework in simulation, and demonstrate results from on-line learning on a bimanual robot.
Learning relational options for inductive transfer in relational reinforcement learning
In Proceedings of the Seventeenth Conference on Inductive Logic Programming, 2007
Cited by 8 (0 self)
In reinforcement learning problems, an agent has the task of learning a good or optimal strategy from interaction with its environment. At the start of the learning task, the agent usually has very little information. Therefore, when faced with complex problems that have a large state space, learning a good strategy might be infeasible or too slow to work in practice. One way to overcome this problem is the use of guidance to supply the agent with traces of “reasonable policies”. However, in many cases it will be hard for the user to supply such a policy. In this paper, we investigate the use of transfer learning in Relational Reinforcement Learning. The goal of transfer learning is to accelerate learning on a target task after training on a different, but related, source task. More specifically, we introduce an extension of the options framework to the relational setting and show how one can learn skills that can be transferred across similar, but different, domains. We present experiments showing the possible benefits of using relational options for transfer learning.
The development of hierarchical knowledge in robot systems
2009
Cited by 7 (0 self)
This dissertation would not have been possible without the help and support of many people. Most of all, I would like to extend my gratitude to Rod Grupen for many years of inspiring work, our discussions, and his guidance. Without his support and vision, I cannot imagine that the journey would have been as enormously enjoyable and rewarding as it turned out to be. I am very excited about what we discovered during my time at UMass, but there is much more to be done. I look forward to what comes next! In addition to providing professional inspiration, Rod was a great person to work with and for—creating a warm and encouraging laboratory atmosphere, motivating us to stay in shape for his annual half-marathons, and ensuring a sufficient amount of cake at the weekly lab meetings. Thanks for all your support, Rod! I am very grateful to my thesis committee—Andy Barto, David Jensen, and Rachel Keen—for many encouraging and inspirational discussions. Their comments and feedback significantly contributed to the form of this document. I would especially
Transfer via Soft Homomorphisms
Cited by 6 (0 self)
The field of transfer learning aims to speed up learning across multiple related tasks by transferring knowledge between source and target tasks. Past work has shown that when the tasks are specified as Markov Decision Processes (MDPs), a function that maps states in the target task to similar states in the source task can be used to transfer many types of knowledge. Current approaches for autonomously learning such functions are inefficient or require domain knowledge and lack theoretical guarantees of performance. We devise a novel approach that learns a stochastic mapping between tasks. Using this mapping, we present two algorithms for autonomous transfer learning – one that has strong convergence guarantees and another approximate method that learns online from experience. Extending existing work on MDP homomorphisms, we present theoretical guarantees for the quality of a transferred value function.
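One way a stochastic state mapping could be used to transfer a value function is sketched below. This is an illustrative assumption, not either of the algorithms from the paper: each target state is given the expected source value under its mapping distribution. The function name `transfer_values` and the dictionary layout are hypothetical.

```python
def transfer_values(mapping, source_values, tol=1e-9):
    """Initialise target-state values from source-state values.

    mapping[t][s] is the (assumed) probability that target state t
    corresponds to source state s; each row must sum to 1.
    """
    transferred = {}
    for target_state, dist in mapping.items():
        total = sum(dist.values())
        if abs(total - 1.0) > tol:
            raise ValueError(
                f"mapping for {target_state!r} sums to {total}, expected 1"
            )
        # Expected source value under the mapping distribution.
        transferred[target_state] = sum(
            prob * source_values[source_state]
            for source_state, prob in dist.items()
        )
    return transferred


source_values = {"s0": 0.0, "s1": 10.0}
mapping = {
    "t0": {"s0": 0.5, "s1": 0.5},  # t0 resembles both source states equally
    "t1": {"s1": 1.0},             # t1 maps deterministically to s1
}
initial_target_values = transfer_values(mapping, source_values)
```

A deterministic state mapping is the special case in which every row puts probability 1 on a single source state.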
Constructing skill trees for reinforcement learning agents from demonstration trajectories
In Advances in Neural Information Processing Systems (NIPS), 2010
Cited by 5 (5 self)
We introduce CST, an algorithm for constructing skill trees from demonstration trajectories in continuous reinforcement learning domains. CST uses a changepoint detection method to segment each trajectory into a skill chain by detecting a change of appropriate abstraction, or that a segment is too complex to model as a single skill. The skill chains from each trajectory are then merged to form a skill tree. We demonstrate that CST constructs an appropriate skill tree that can be further refined through learning in a challenging continuous domain, and that it can be used to segment demonstration trajectories on a mobile manipulator into chains of skills where each skill is assigned an appropriate abstraction.
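To make the notion of segmenting a trajectory at a changepoint concrete, here is a deliberately simple sketch. It swaps in a least-squares split with piecewise-constant segments, which is not the changepoint detector the abstract refers to. The function names and the `min_len` parameter are hypothetical.

```python
def sse(values):
    """Sum of squared errors of a constant (mean) fit to the values."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_changepoint(values, min_len=2):
    """Return the split index k that most reduces the squared error when
    values[:k] and values[k:] are each fit by their own mean, or None if
    no split improves on a single segment."""
    best_k, best_cost = None, sse(values)
    for k in range(min_len, len(values) - min_len + 1):
        cost = sse(values[:k]) + sse(values[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k


# A one-dimensional signal (e.g. per-step returns) with an abrupt change.
signal = [0.0, 0.0, 0.0, 0.0, 5.0, 5.0, 5.0, 5.0]
split = best_changepoint(signal)
segments = [signal[:split], signal[split:]] if split is not None else [signal]
```

On noisy data almost any split lowers the squared error, so a practical detector would need a penalty or a probabilistic model to decide whether a changepoint is warranted.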
Thinking as the control of imagination: a conceptual framework for goal-directed systems
Psychological Research, 2009
Cited by 4 (2 self)
This paper offers a conceptual framework which (re)integrates goal-directed control, motivational processes, and executive functions, and suggests a developmental pathway from situated action to higher level cognition. We first illustrate a basic computational (control-theoretic) model of goal-directed action that makes use of internal modeling. We then show that, by adding the problem of selection among multiple action alternatives, motivation enters the scene, and that the basic mechanisms of executive functions, such as inhibition, the monitoring of progress, and working memory, are required for this system to work. Further, we elaborate on the idea that the off-line reenactment of anticipatory mechanisms used for action control gives rise to (embodied) mental simulations, and propose that thinking consists essentially in controlling mental simulations rather than directly controlling behavior and perceptions. We conclude by sketching an evolutionary perspective of this process, proposing that anticipation leveraged cognition, and by highlighting specific predictions of our model.
Autonomous Skill Acquisition on a Mobile Manipulator
Cited by 3 (3 self)
We describe a robot system that autonomously acquires skills through interaction with its environment. The robot learns to sequence the execution of a set of innate controllers to solve a task, extracts and retains components of that solution as portable skills, and then transfers those skills to reduce the time required to learn to solve a second task.

