Results 1–10 of 57
Robot Motor Skill Coordination with EM-based Reinforcement Learning
Cited by 52 (8 self)
Abstract — We present an approach allowing a robot to acquire new motor skills by learning the couplings across motor control variables. The demonstrated skill is first encoded in a compact form through a modified version of Dynamic Movement Primitives (DMP) which encapsulates correlation information. Expectation-Maximization-based Reinforcement Learning is then used to modulate the mixture of dynamical systems initialized from the user's demonstration. The approach is evaluated on a torque-controlled 7-DOF Barrett WAM robotic arm. Two skill learning experiments are conducted: a reaching task where the robot needs to adapt the learned movement to avoid an obstacle, and a dynamic pancake-flipping task.
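The DMP encoding this abstract builds on can be illustrated with a minimal one-dimensional sketch. All gains, basis functions, and names below are illustrative assumptions, not the authors' implementation (which additionally couples the DOFs through correlation information):

```python
import numpy as np

def dmp_rollout(y0, goal, weights, n_steps=100, alpha=25.0, beta=6.25, tau=1.0):
    """Integrate one DMP: a spring-damper toward `goal` plus a learned forcing term."""
    n = len(weights)
    centers = np.linspace(0.0, 1.0, n)        # basis-function centers in phase space
    widths = np.full(n, float(n) ** 2)        # basis-function widths (assumed)
    y, yd = float(y0), 0.0
    x = 1.0                                   # canonical phase variable, decays 1 -> 0
    dt = tau / n_steps
    traj = []
    for _ in range(n_steps):
        psi = np.exp(-widths * (x - centers) ** 2)          # Gaussian basis activations
        f = x * (goal - y0) * (psi @ weights) / (psi.sum() + 1e-10)
        ydd = alpha * (beta * (goal - y) - yd) + f          # transformation system
        yd += ydd * dt
        y += yd * dt
        x += -2.0 * x * dt                                  # first-order canonical system
        traj.append(y)
    return np.array(traj)
```

With zero forcing weights the rollout reduces to a critically damped reach toward the goal; learning shapes the trajectory through `weights` without disturbing convergence.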
Reinforcement Learning in Robotics: A Survey
Cited by 39 (2 self)
Reinforcement learning offers to robotics a framework and set of tools for the design of sophisticated and hard-to-engineer behaviors. Conversely, the challenges of robotic problems provide inspiration, impact, and validation for developments in reinforcement learning. The relationship between the disciplines has sufficient promise to be likened to that between physics and mathematics. In this article, we attempt to strengthen the links between the two research communities by providing a survey of work in reinforcement learning for behavior generation in robots. We highlight both key challenges in robot reinforcement learning as well as notable successes. We discuss how contributions tamed the complexity of the domain and study the role of algorithms, representations, and prior knowledge in achieving these successes. As a result, a particular focus of our paper lies on the choice between model-based and model-free as well as between value-function-based and policy search methods. By analyzing a simple problem in some detail, we demonstrate how reinforcement learning approaches may be profitably applied.
Learning variable impedance control
 International Journal of Robotics Research
, 2011
Cited by 25 (9 self)
One of the hallmarks of the performance, versatility, and robustness of biological motor control is the ability to adapt the impedance of the overall biomechanical system to different task requirements and stochastic disturbances. A transfer of this principle to robotics is desirable, for instance to enable robots to work robustly and safely in everyday human environments. It is, however, not trivial to derive variable impedance controllers for practical high degree-of-freedom (DOF) robotic tasks. In this contribution, we accomplish such variable impedance control with the reinforcement learning (RL) algorithm PI2 (Policy Improvement with Path Integrals). PI2 is a model-free, sampling-based learning method derived from first principles of stochastic optimal control. The PI2 algorithm requires no tuning of algorithmic parameters besides the exploration noise. The designer can thus fully focus on cost function design to specify the task.
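The core PI2 parameter update the abstract describes can be sketched in a few lines: perturb policy parameters with exploration noise, score each rollout with the task cost, and average the noise with softmax weights over cost. The temperature `h`, rollout counts, and the toy quadratic cost below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def pi2_update(theta, cost_fn, rng, n_rollouts=20, noise_std=0.1, h=10.0):
    """One PI2-style update: cost-weighted averaging of exploration noise."""
    eps = rng.normal(0.0, noise_std, size=(n_rollouts, theta.size))
    costs = np.array([cost_fn(theta + e) for e in eps])
    # Normalize costs to [0, 1], then exponentiate: low cost -> high weight.
    s = (costs - costs.min()) / (costs.max() - costs.min() + 1e-10)
    p = np.exp(-h * s)
    p /= p.sum()
    return theta + p @ eps  # probability-weighted average of the noise

# Toy usage: a quadratic cost stands in for an actual rollout cost.
target = np.array([0.5, -0.3])
rng = np.random.default_rng(0)
theta = np.zeros(2)
for _ in range(200):
    theta = pi2_update(theta, lambda th: float(np.sum((th - target) ** 2)), rng)
```

Note that the update never differentiates the cost, which is what makes the method model-free and leaves only the exploration noise to tune.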
Learning Sequential Motor Tasks
Cited by 19 (9 self)
Abstract — Many real robot applications require the sequential use of multiple distinct motor primitives. This requirement implies the need to learn the individual primitives as well as a strategy to select the primitives sequentially. Such hierarchical learning problems are commonly either treated as one complex monolithic problem, which is hard to learn, or as separate tasks learned in isolation. However, there exists a strong link between the robot's strategy and its motor primitives. Consequently, a consistent framework is needed that can learn jointly on the level of the individual primitives and the robot's strategy. We present a hierarchical learning method which improves individual motor primitives and, simultaneously, learns how to combine these motor primitives sequentially to solve complex motor tasks. We evaluate our method on the game of robot hockey, which is difficult to learn both in terms of the required motor primitives and in terms of its strategic elements.
Variable impedance control - a reinforcement learning approach
 in Robotics: Science and Systems Conference (RSS)
, 2010
Cited by 16 (9 self)
Abstract — One of the hallmarks of the performance, versatility, and robustness of biological motor control is the ability to adapt the impedance of the overall biomechanical system to different task requirements and stochastic disturbances. A transfer of this principle to robotics is desirable, for instance to enable robots to work robustly and safely in everyday human environments. It is, however, not trivial to derive variable impedance controllers for practical high-DOF robotic tasks. In this contribution, we accomplish such gain scheduling with the reinforcement learning algorithm PI2 (Policy Improvement with Path Integrals). PI2 is a model-free, sampling-based learning method derived from first principles of optimal control. The PI2 algorithm requires no tuning of algorithmic parameters besides the exploration noise. The designer can thus fully focus on cost function design to specify the task. From the viewpoint of robotics, a particularly useful property of PI2 is that it can scale to problems of many DOFs, so that RL on real robotic systems becomes feasible. We sketch the PI2 algorithm and its theoretical properties, and how it is applied to gain scheduling. We evaluate our approach by presenting results on two different simulated robotic systems, a 3-DOF Phantom Premium robot and a 6-DOF Kuka Lightweight robot. We investigate tasks where the optimal strategy requires both tuning of the impedance of the end-effector and tuning of a reference trajectory. The results show that we can use path-integral-based RL not only for planning but also to derive variable-gain feedback controllers in realistic scenarios. Thus, the power of variable impedance control is made available to a wide variety of robotic systems and practical applications.
Data-Efficient Generalization of Robot Skills with Contextual Policy Search
Cited by 13 (2 self)
In robotics, controllers make the robot solve a task within a specific context. The context can describe the objectives of the robot or physical properties of the environment and is always specified before task execution. To generalize the controller to multiple contexts, we follow a hierarchical approach for policy learning: a lower-level policy controls the robot for a given context and an upper-level policy generalizes among contexts. Current approaches for learning such upper-level policies are based on model-free policy search, which requires an excessive number of interactions of the robot with its environment. More data-efficient policy search approaches are model-based but, thus far, without the capability of learning hierarchical policies. We propose a new model-based policy search approach that can also learn contextual upper-level policies. Our approach is based on learning probabilistic forward models for long-term predictions. Using these predictions, we use information-theoretic insights to improve the upper-level policy. Our method achieves a substantial improvement in learning speed compared to existing methods on simulated and real robotic tasks.
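The hierarchical decomposition described here can be made concrete with a common simple choice for the upper-level policy: a linear-Gaussian map from a context vector to lower-level controller parameters, fitted by weighted regression. All shapes and names below are illustrative assumptions, not the paper's model:

```python
import numpy as np

class LinearGaussianUpperPolicy:
    """Upper-level policy: given a context s, sample lower-level parameters w ~ N(K s + b, sigma^2 I)."""

    def __init__(self, context_dim, param_dim, noise_std=0.1, rng=None):
        self.K = np.zeros((param_dim, context_dim))  # context-to-parameter map
        self.b = np.zeros(param_dim)
        self.noise_std = noise_std
        self.rng = rng if rng is not None else np.random.default_rng(0)

    def sample(self, context):
        mean = self.K @ context + self.b
        return mean + self.rng.normal(0.0, self.noise_std, size=mean.shape)

    def fit(self, contexts, params, weights):
        """Weighted least squares on (context, parameter) pairs, in the spirit of
        the weighted maximum-likelihood updates used in contextual policy search."""
        X = np.hstack([contexts, np.ones((len(contexts), 1))])  # append bias column
        W = np.diag(weights / weights.sum())
        beta = np.linalg.solve(X.T @ W @ X + 1e-8 * np.eye(X.shape[1]),
                               X.T @ W @ params)
        self.K, self.b = beta[:-1].T, beta[-1]
```

The lower-level policy (e.g. a DMP or gain schedule parameterized by `w`) is executed per context; the `fit` step is where generalization across contexts happens.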
Risk Sensitive Path Integral Control
Cited by 12 (0 self)
Recently, path integral methods have been developed for stochastic optimal control for a wide class of models with nonlinear dynamics in continuous space-time. Path integral methods find the control that minimizes the expected cost-to-go. In this paper we show that, under the same assumptions, path integral methods generalize directly to risk-sensitive stochastic optimal control. Here the method minimizes in expectation an exponentially weighted cost-to-go. Depending on the exponential weight, risk-seeking or risk-averse behaviour is obtained. We demonstrate the approach on risk-sensitive stochastic optimal control problems beyond the linear-quadratic case, showing the intricate interaction of multi-modal control with risk sensitivity.
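The exponentially weighted cost-to-go can be made concrete with a small numerical illustration (not from the paper). The risk-sensitive objective J_theta = (1/theta) log E[exp(theta * C)] reduces to the expected cost as theta -> 0, penalizes cost variance for theta > 0 (risk averse), and rewards it for theta < 0 (risk seeking):

```python
import numpy as np

def risk_sensitive_cost(costs, theta):
    """Evaluate (1/theta) * log E[exp(theta * C)] from cost samples."""
    if abs(theta) < 1e-12:
        return float(costs.mean())          # risk-neutral limit
    m = (theta * costs).max()               # log-sum-exp for numerical stability
    return float((m + np.log(np.mean(np.exp(theta * costs - m)))) / theta)

rng = np.random.default_rng(1)
safe = rng.normal(1.0, 0.1, size=100_000)   # same mean cost, low variance
risky = rng.normal(1.0, 1.0, size=100_000)  # same mean cost, high variance
# Risk neutral (theta -> 0): both options look equally good.
# Risk averse (theta = 1): the high-variance option is penalized.
# (For Gaussian costs, J_theta = mu + theta * sigma^2 / 2.)
```

The Gaussian closed form above makes the effect easy to check: at theta = 1 the risky option should cost roughly 0.5 more than the safe one despite equal means.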
Learning concurrent motor skills in versatile solution spaces
 In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
, 2012
Cited by 11 (6 self)
Abstract — Future robots need to autonomously acquire motor skills in order to reduce their reliance on human programming. Many motor skill learning methods concentrate on learning a single solution for a given task. However, discarding information about additional solutions during learning unnecessarily limits autonomy. Such favoring of single solutions often requires relearning of motor skills when the task, the environment, or the robot's body changes in a way that renders the learned solution infeasible. Future robots need to be able to adapt to such changes and, ideally, have a large repertoire of movements to cope with such problems. In contrast to current methods, our approach simultaneously learns multiple distinct solutions for the same task, such that a partial degeneration of this solution space does not prevent the successful completion of the task. In this paper, we present a complete framework that is capable of learning different solution strategies for a real-robot Tetherball task.
EP for Efficient Stochastic Control with Obstacles
Cited by 8 (1 self)
We address the problem of continuous stochastic optimal control in the presence of hard obstacles. Due to the non-smooth character of the obstacles, the traditional approach using dynamic programming in combination with function approximation tends to fail. We consider a recently introduced special class of control problems for which the optimal control computation is reformulated in terms of a path integral. The path integral is typically intractable, but amenable to techniques developed for approximate inference. We argue that the variational approach fails in this case due to the non-smooth cost function. Sampling techniques are simple to implement and converge to the exact results given enough samples. However, the infinite cost associated with hard obstacles renders the sampling procedures inefficient in practice. We suggest Expectation Propagation (EP) as a suitable approximation method, and compare the quality and efficiency of the resulting control with an MC sampler on a car steering task and a ball throwing task. We conclude that EP can solve these challenging problems much better than a sampling approach.
Robot learning of everyday object manipulations via human demonstration
 in IROS
, 2010
Cited by 7 (2 self)
Abstract — We deal with the problem of teaching a robot to manipulate everyday objects through human demonstration. We first design a task descriptor which encapsulates important elements of a task. The design originates from the observation that the manipulations involved in many everyday object tasks can be considered as a series of sequential rotations and translations, which we call manipulation primitives. We then propose a method that enables a robot to decompose a demonstrated task into sequential manipulation primitives and construct a task descriptor. We also show how to transfer a task descriptor learned from one object to similar objects. Finally, we argue that this framework is highly generic. In particular, it can be used to construct a robot task database that serves as a manipulation knowledge base for a robot to succeed in manipulating everyday objects.
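The task-descriptor idea, a task represented as a sequence of rotation and translation manipulation primitives, can be sketched as a plain data structure. The field names and the jar-opening example below are hypothetical, not taken from the paper:

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Primitive:
    kind: str           # "rotate" or "translate"
    axis: np.ndarray    # unit axis expressed in the object frame
    magnitude: float    # radians for rotations, meters for translations

def jar_opening_descriptor():
    """A demonstrated 'open jar' task decomposed into sequential primitives."""
    z = np.array([0.0, 0.0, 1.0])
    return [
        Primitive("rotate", z, -3 * np.pi),   # unscrew the lid (1.5 turns)
        Primitive("translate", z, 0.10),      # lift the lid 10 cm
    ]
```

Storing tasks this way makes the transfer step in the abstract natural: the same primitive sequence can be replayed on a similar object by re-expressing each axis in the new object's frame.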