Reinforcement Learning in Robotics: A Survey
Cited by 39 (2 self)
Reinforcement learning offers to robotics a framework and set of tools for the design of sophisticated and hard-to-engineer behaviors. Conversely, the challenges of robotic problems provide both inspiration, impact, and validation for developments in reinforcement learning. The relationship between disciplines has sufficient promise to be likened to that between physics and mathematics. In this article, we attempt to strengthen the links between the two research communities by providing a survey of work in reinforcement learning for behavior generation in robots. We highlight both key challenges in robot reinforcement learning as well as notable successes. We discuss how contributions tamed the complexity of the domain and study the role of algorithms, representations, and prior knowledge in achieving these successes. As a result, a particular focus of our paper lies on the choice between model-based and model-free as well as between value-function-based and policy-search methods. By analyzing a simple problem in some detail we demonstrate how reinforcement learning approaches may be profitably applied, and
Skill Discovery in Continuous Reinforcement Learning Domains using Skill Chaining
Cited by 38 (8 self)
We introduce skill chaining, a skill discovery method for reinforcement learning agents in continuous domains. Skill chaining produces chains of skills leading to an end-of-task reward. We demonstrate experimentally that skill chaining is able to create appropriate skills in a challenging continuous domain and that doing so results in performance gains.
Robot Learning from Demonstration by Constructing Skill Trees
Cited by 33 (5 self)
We describe CST, an online algorithm for constructing skill trees from demonstration trajectories. CST segments a demonstration trajectory into a chain of component skills, where each skill has a goal and is assigned a suitable abstraction from an abstraction library. These properties permit skills to be improved efficiently using a policy learning algorithm. Chains from multiple demonstration trajectories are merged into a skill tree. We show that CST can be used to acquire skills from human demonstration in a dynamic continuous domain, and from both expert demonstration and learned control sequences on the uBot-5 mobile manipulator.
Constructing skill trees for reinforcement learning agents from demonstration trajectories
In Advances in Neural Information Processing Systems (NIPS), 2010
Cited by 27 (7 self)
We introduce CST, an algorithm for constructing skill trees from demonstration trajectories in continuous reinforcement learning domains. CST uses a changepoint detection method to segment each trajectory into a skill chain by detecting a change of appropriate abstraction, or that a segment is too complex to model as a single skill. The skill chains from each trajectory are then merged to form a skill tree. We demonstrate that CST constructs an appropriate skill tree that can be further refined through learning in a challenging continuous domain, and that it can be used to segment demonstration trajectories on a mobile manipulator into chains of skills where each skill is assigned an appropriate abstraction.
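To make the idea of segmenting a trajectory at a changepoint concrete, here is a minimal sketch. It is not CST's changepoint detection method, which the abstract does not specify; it is a brute-force search for a single split point, assuming each segment is modeled by a least-squares line. The function names `linear_fit_sse` and `best_changepoint` are hypothetical.

```python
def linear_fit_sse(xs, ys):
    """Sum of squared errors of the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = mean_y - slope * mean_x
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


def best_changepoint(ys, min_len=2):
    """Index k that splits ys into ys[:k] and ys[k:] with the lowest total fit error.

    Illustrative assumption: exactly one changepoint, one linear model per segment.
    """
    xs = list(range(len(ys)))
    best_k, best_err = None, float("inf")
    for k in range(min_len, len(ys) - min_len + 1):
        err = linear_fit_sse(xs[:k], ys[:k]) + linear_fit_sse(xs[k:], ys[k:])
        if err < best_err:
            best_k, best_err = k, err
    return best_k


# A ramp followed by a plateau: the split falls where the model changes.
signal = [0, 1, 2, 3, 4, 10, 10, 10, 10, 10]
split = best_changepoint(signal)
```

A full segmentation would apply such a test repeatedly (or online) and would also choose among candidate abstractions per segment, which this sketch leaves out.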
A unifying framework for computational reinforcement learning theory
2009
Cited by 23 (7 self)
Computational learning theory studies mathematical models that allow one to formally analyze and compare the performance of supervised-learning algorithms such as their sample complexity. While existing models such as PAC (Probably Approximately Correct) have played an influential role in understanding the nature of supervised learning, they have not been as successful in reinforcement learning (RL). Here, the fundamental barrier is the need for active exploration in sequential decision problems. An RL agent tries to maximize long-term utility by exploiting its knowledge about the problem, but this knowledge has to be acquired by the agent itself through exploring the problem that may reduce short-term utility. The need for active exploration is common in many problems in daily life, engineering, and sciences. For example, a Backgammon program strives to take good moves to maximize the probability of winning a game, but sometimes it may try novel and possibly harmful moves to discover how the opponent reacts in the hope of discovering a better game-playing strategy. It has been known since the early days of RL that a good tradeoff between exploration and exploitation is critical for the agent to learn fast (i.e., to reach near-optimal strategies
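The exploration-exploitation tradeoff described above can be illustrated with a small sketch. This is a standard epsilon-greedy strategy on a Bernoulli bandit, chosen here only as a simple example; it is not the framework this work proposes. The function name `epsilon_greedy_bandit` and all parameter values are assumptions made for the illustration.

```python
import random


def epsilon_greedy_bandit(arm_probs, epsilon, steps, seed=0):
    """Run epsilon-greedy on a Bernoulli bandit; return (value estimates, pull counts)."""
    rng = random.Random(seed)
    n_arms = len(arm_probs)
    estimates = [0.0] * n_arms  # running mean reward per arm
    counts = [0] * n_arms       # number of pulls per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: pick a random arm
        else:
            # exploit: pick the arm with the best estimate (first one on ties)
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental mean
    return estimates, counts


# Pure exploitation never tries the second arm, even though it always pays.
_, greedy_counts = epsilon_greedy_bandit([0.0, 1.0], epsilon=0.0, steps=100)

# A little exploration discovers the better arm and then mostly exploits it.
_, explore_counts = epsilon_greedy_bandit([0.0, 1.0], epsilon=0.1, steps=1000)
```

With `epsilon=0.0` the agent stays on the first arm it tried; a small positive `epsilon` costs some short-term reward but lets the agent find the better arm.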
Learning Graphical Model Parameters with Approximate Marginal Inference
Cited by 21 (2 self)
Likelihood-based learning of graphical models faces challenges of computational complexity and robustness to model misspecification. This paper studies methods that fit parameters directly to maximize a measure of the accuracy of predicted marginals, taking into account both model and inference approximations at training time. Experiments on imaging problems suggest marginalization-based learning performs better than likelihood-based approximations on difficult problems where the model being fit is approximate in nature.
Efficient skill learning using abstraction selection
In Proceedings of the 21st International Joint Conference on Artificial Intelligence, 2009
Cited by 17 (9 self)
We present an algorithm for selecting an appropriate abstraction when learning a new skill. We show empirically that it can consistently select an appropriate abstraction using very little sample data, and that it significantly improves skill learning performance in a reasonably large real-valued reinforcement learning domain.
Learning and Generalization of Complex Tasks from Unstructured Demonstrations
Cited by 13 (4 self)
We present a novel method for segmenting demonstrations, recognizing repeated skills, and generalizing complex tasks from unstructured demonstrations. This method combines many of the advantages of recent automatic segmentation methods for learning from demonstration into a single principled, integrated framework. Specifically, we use the Beta Process Autoregressive Hidden Markov Model and Dynamic Movement Primitives to learn and generalize a multi-step task on the PR2 mobile manipulator and to demonstrate the potential of our framework to learn a large library of skills over time.
Investigating Contingency Awareness Using Atari 2600 Games
Cited by 10 (4 self)
Contingency awareness is the recognition that some aspects of a future observation are under an agent’s control while others are solely determined by the environment. This paper explores the idea of contingency awareness in reinforcement learning using the platform of Atari 2600 games. We introduce a technique for accurately identifying contingent regions and describe how to exploit this knowledge to generate improved features for value function approximation. We evaluate the performance of our techniques empirically, using 46 unseen, diverse, and challenging games for the Atari 2600 console. Our results suggest that contingency awareness is a generally useful concept for model-free reinforcement learning agents.
Regularized Off-Policy TD-Learning
2012
Cited by 7 (4 self)
We present a novel l1-regularized off-policy convergent TD-learning method (termed RO-TD), which is able to learn sparse representations of value functions with low computational complexity. The algorithmic framework underlying RO-TD integrates two key ideas: off-policy convergent gradient TD methods, such as TDC, and a convex-concave saddle-point formulation of non-smooth convex optimization, which enables first-order solvers and feature selection using online convex regularization. A detailed theoretical and experimental analysis of RO-TD is presented. A variety of experiments are presented to illustrate the off-policy convergence, sparse feature selection capability and low computational cost of the RO-TD algorithm.
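As a rough illustration of how l1 regularization yields sparse value-function weights, the sketch below pairs a plain on-policy TD(0) update with a soft-thresholding (proximal) step. This is a deliberately simpler stand-in and not the algorithm described above, which builds on off-policy gradient TD methods and a saddle-point formulation. The names `soft_threshold` and `td0_l1_step` and all numeric values are assumptions for the example.

```python
def soft_threshold(w, tau):
    """Proximal operator of tau * ||w||_1: shrink each weight toward zero by tau."""
    return [max(abs(wi) - tau, 0.0) * (1.0 if wi > 0 else -1.0) for wi in w]


def td0_l1_step(w, phi, phi_next, reward, gamma, alpha, lam):
    """One on-policy TD(0) update with linear features, then an l1 proximal step.

    w: weight vector; phi, phi_next: feature vectors of the current and next state;
    alpha: step size; lam: l1 regularization strength (illustrative parameters).
    """
    v = sum(wi * fi for wi, fi in zip(w, phi))
    v_next = sum(wi * fi for wi, fi in zip(w, phi_next))
    delta = reward + gamma * v_next - v  # TD error
    w = [wi + alpha * delta * fi for wi, fi in zip(w, phi)]
    return soft_threshold(w, alpha * lam)


# Small weights are set to zero; large ones are shrunk by the threshold.
shrunk = soft_threshold([3.0, -0.5, 0.2, -2.0], 1.0)

# One update from zero weights: only the active feature receives weight.
w_new = td0_l1_step([0.0, 0.0], phi=[1.0, 0.0], phi_next=[0.0, 1.0],
                    reward=1.0, gamma=0.9, alpha=0.5, lam=0.2)
```

The thresholding step is what drives weights of uninformative features to exactly zero, which is the feature-selection effect the abstract refers to.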