Results 1–10 of 114
Policy gradient methods for robotics
In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2006
"... Abstract — The aquisition and improvement of motor skills and control policies for robotics from trial and error is of essential importance if robots should ever leave precisely prestructured environments. However, to date only few existing reinforcement learning methods have been scaled into the d ..."
Abstract

Cited by 84 (19 self)
Abstract — The acquisition and improvement of motor skills and control policies for robotics from trial and error is of essential importance if robots are ever to leave precisely pre-structured environments. However, to date only a few existing reinforcement learning methods have been scaled to the domains of high-dimensional robots such as manipulator, legged, or humanoid robots. Policy gradient methods remain one of the few exceptions and have found a variety of applications. Nevertheless, the application of such methods is not without peril if done in an uninformed manner. In this paper, we give an overview of learning with policy gradient methods for robotics, with a strong focus on recent advances in the field. We outline previous applications to robotics and show how the most recently developed methods can significantly improve learning performance. Finally, we evaluate our most promising algorithm in the application of hitting a baseball with an anthropomorphic arm.
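One of the simplest members of the family this paper surveys is the likelihood-ratio (REINFORCE-style) estimator. Below is a minimal, illustrative sketch of such an update for a linear-Gaussian policy; the rollout format and all names (`reinforce_update`, `env`-side details) are hypothetical stand-ins, not the authors' algorithms.

```python
# Minimal sketch of a likelihood-ratio policy gradient (REINFORCE-style)
# update for a linear-Gaussian policy: action = theta @ state + noise.
import numpy as np

def reinforce_update(theta, rollouts, sigma=0.1, alpha=1e-3):
    """One gradient step from a batch of rollouts.

    theta    : (action_dim, state_dim) policy parameters
    rollouts : list of (states, actions, rewards) arrays per episode
    """
    grad = np.zeros_like(theta)
    for states, actions, rewards in rollouts:
        R = rewards.sum()  # episode return (no baseline, for brevity)
        for s, a in zip(states, actions):
            # grad_theta log pi(a|s) for a Gaussian policy with mean theta @ s
            grad += np.outer((a - theta @ s) / sigma**2, s) * R
    return theta + alpha * grad / len(rollouts)
```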
Proto-value functions: A Laplacian framework for learning representation and control in Markov decision processes
Journal of Machine Learning Research, 2006
"... This paper introduces a novel spectral framework for solving Markov decision processes (MDPs) by jointly learning representations and optimal policies. The major components of the framework described in this paper include: (i) A general scheme for constructing representations or basis functions by d ..."
Abstract

Cited by 71 (10 self)
This paper introduces a novel spectral framework for solving Markov decision processes (MDPs) by jointly learning representations and optimal policies. The major components of the framework described in this paper include: (i) a general scheme for constructing representations or basis functions by diagonalizing symmetric diffusion operators; (ii) a specific instantiation of this approach where global basis functions called proto-value functions (PVFs) are formed using the eigenvectors of the graph Laplacian on an undirected graph formed from state transitions induced by the MDP; (iii) a three-phased procedure called representation policy iteration (RPI), comprising a sample collection phase, a representation learning phase that constructs basis functions from samples, and a final parameter estimation phase that determines an (approximately) optimal policy within the (linear) subspace spanned by the (current) basis functions; (iv) a specific instantiation of the RPI framework using least-squares policy iteration (LSPI) as the parameter estimation method; (v) several strategies for scaling the proposed approach to large discrete and continuous state spaces, including the Nyström extension for out-of-sample interpolation of eigenfunctions, and the use of Kronecker sum factorization to construct compact eigenfunctions in product spaces such as factored MDPs; and (vi) a series of illustrative discrete and continuous control tasks, which both illustrate the concepts and provide a benchmark for evaluating the proposed approach. Many challenges remain to be addressed in scaling the proposed framework to large MDPs, and several elaborations of the proposed framework are briefly summarized at the end.
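As a concrete illustration of component (ii), here is a minimal sketch of building PVFs from sampled transitions: form an undirected state graph, diagonalize its Laplacian, and keep the smoothest eigenvectors as basis functions. All names are illustrative, and the combinatorial Laplacian stands in for the paper's more general family of diffusion operators.

```python
# Minimal sketch: proto-value functions as the low-order eigenvectors
# of the graph Laplacian of an undirected state-transition graph.
import numpy as np

def proto_value_functions(transitions, n_states, k=10):
    """transitions : iterable of (s, s_next) state-index pairs from the MDP
    Returns an (n_states, k) matrix whose columns are basis functions."""
    W = np.zeros((n_states, n_states))
    for s, s_next in transitions:
        W[s, s_next] = W[s_next, s] = 1.0   # undirected adjacency
    D = np.diag(W.sum(axis=1))
    L = D - W                                # combinatorial Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)     # ascending eigenvalues
    return eigvecs[:, :k]                    # smoothest k eigenvectors
```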
Natural Actor-Critic
2007
"... In this paper, we suggest a novel reinforcement learning architecture, the Natural ActorCritic. The actor updates are achieved using stochastic policy gradients employing Amari’s natural gradient approach, while the critic obtains both the natural policy gradient and additional parameters of a valu ..."
Abstract

Cited by 68 (10 self)
In this paper, we suggest a novel reinforcement learning architecture, the Natural Actor-Critic. The actor updates are achieved using stochastic policy gradients employing Amari’s natural gradient approach, while the critic obtains both the natural policy gradient and additional parameters of a value function simultaneously by linear regression. We show that actor improvements with natural policy gradients are particularly appealing as these are independent of the coordinate frame of the chosen policy representation, and can be estimated more efficiently than regular policy gradients. The critic makes use of a special basis function parameterization motivated by the policy-gradient-compatible function approximation. We show that several well-known reinforcement learning methods, such as the original Actor-Critic and Bradtke’s Linear Quadratic Q-Learning, are in fact Natural Actor-Critic algorithms. Empirical evaluations illustrate the effectiveness of our techniques in comparison to previous methods, and also demonstrate their applicability for learning control on an anthropomorphic robot arm.
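The central trick is that fitting the advantage function with the compatible features psi = grad_theta log pi makes the fitted weights w coincide with the natural policy gradient (the Fisher information matrix cancels), so the actor simply steps along w. A minimal sketch, with an ordinary batch least-squares fit standing in for the paper's linear-regression critic and all names illustrative:

```python
# Minimal Natural Actor-Critic step: fit the advantage with compatible
# features, then step the policy parameters along the fitted weights.
import numpy as np

def natural_actor_critic_step(theta, grad_log_pi, adv_targets, alpha=0.05):
    """grad_log_pi : (T, d) compatible features grad_theta log pi(a_t|s_t)
    adv_targets : (T,) advantage targets, e.g. TD errors from a critic"""
    # Least-squares critic: fit A(s, a) ~= grad_log_pi @ w
    w, *_ = np.linalg.lstsq(grad_log_pi, adv_targets, rcond=None)
    # w is already the natural gradient: no Fisher inverse needed
    return theta + alpha * w
```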
Learning movement primitives
International Symposium on Robotics Research (ISRR 2003), 2004
"... Abstract. This paper discusses a comprehensive framework for modular motor control based on a recently developed theory of dynamic movement primitives (DMP). DMPs are a formulation of movement primitives with autonomous nonlinear differential equations, whose time evolution creates smooth kinematic ..."
Abstract

Cited by 68 (10 self)
Abstract. This paper discusses a comprehensive framework for modular motor control based on a recently developed theory of dynamic movement primitives (DMPs). DMPs are a formulation of movement primitives with autonomous nonlinear differential equations whose time evolution creates smooth kinematic control policies. Model-based control theory is used to convert the outputs of these policies into motor commands. By means of coupling terms, online modifications can be incorporated into the time evolution of the differential equations, thus providing a rather flexible and reactive framework for motor planning and execution. The linear parameterization of DMPs lends itself naturally to supervised learning from demonstration. Moreover, the temporal, scale, and translation invariance of the differential equations with respect to these parameters provides a useful means for movement recognition. A novel reinforcement learning technique based on natural stochastic policy gradients allows a general approach to improving DMPs by trial-and-error learning with respect to almost arbitrary optimization criteria. We demonstrate the different ingredients of the DMP approach in various examples, involving skill learning from demonstration on the humanoid robot DB, and learning biped walking from demonstration in simulation, including self-improvement of the movement patterns towards energy efficiency through resonance tuning.
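A minimal sketch of integrating a single-degree-of-freedom discrete DMP in the standard spring-damper formulation the abstract refers to; the gains, the canonical-system constant, and all names are illustrative defaults rather than the paper's exact equations.

```python
# Minimal discrete DMP rollout: a critically damped spring-damper system
# toward goal g, shaped by a learned forcing term driven by a canonical
# phase variable s that decays from 1 to 0.
import numpy as np

def dmp_rollout(w, centers, widths, x0, g, tau=1.0, dt=0.01, K=100.0, steps=200):
    D = 2.0 * np.sqrt(K)                # critically damped
    x, v, s = x0, 0.0, 1.0              # position, velocity, phase
    traj = []
    for _ in range(steps):
        psi = np.exp(-widths * (s - centers) ** 2)          # RBF activations
        f = (psi @ w) / (psi.sum() + 1e-10) * s * (g - x0)  # forcing term
        v += dt / tau * (K * (g - x) - D * v + f)
        x += dt / tau * v
        s += dt / tau * (-2.0 * s)       # canonical system, alpha_s = 2
        traj.append(x)
    return np.array(traj)
```

The linear dependence on the weights w is what makes supervised learning from demonstration a simple regression problem, and the phase-dependent forcing term is what the natural policy gradient method then refines by trial and error.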
Learning from Observation Using Primitives
2004
"... This paper describes the use of task primitives in robot learning from observation. A framework has been developed that uses observed data to initially learn a task and then the agent goes on to increase its performance through repeated task performance (learning from practice). Data that is collect ..."
Abstract

Cited by 62 (5 self)
This paper describes the use of task primitives in robot learning from observation. A framework has been developed that uses observed data to initially learn a task; the agent then goes on to increase its performance through repeated task performance (learning from practice). Data collected while a human performs a task is parsed into small parts of the task called primitives. Modules are created for each primitive that encode the movements required during the performance of the primitive, and when and where the primitives are performed. The feasibility of this method is currently being tested with agents that learn to play a virtual and an actual air hockey game.
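As a hypothetical illustration of the parsing step, the sketch below segments a demonstrated trajectory at near-zero-speed points, a common heuristic; the paper's actual air-hockey primitives (hits, blocks, and the like) are task-specific and richer than this.

```python
# Illustrative sketch: split a demonstrated (T, d) trajectory into
# candidate primitives wherever the hand's speed drops near zero.
import numpy as np

def segment_primitives(positions, dt=0.01, speed_thresh=0.05, min_len=10):
    """Return a list of trajectory segments between low-speed pauses."""
    speed = np.linalg.norm(np.diff(positions, axis=0), axis=1) / dt
    pauses = np.where(speed < speed_thresh)[0]
    cuts = [0]
    for i in pauses:
        if i - cuts[-1] > min_len:       # collapse consecutive pause samples
            cuts.append(int(i))
    cuts.append(len(positions) - 1)
    return [positions[a:b + 1] for a, b in zip(cuts[:-1], cuts[1:])]
```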
Development environments for autonomous mobile robots: A survey
Autonomous Robots, 2007
"... Robotic Development Environments (RDEs) have come to play an increasingly important role in robotics research in general, and for the development of architectures for mobile robots in particular. Yet, no systematic evaluation of available RDEs has been performed; establishing a comprehensive list of ..."
Abstract

Cited by 53 (1 self)
Robotic Development Environments (RDEs) have come to play an increasingly important role in robotics research in general, and in the development of architectures for mobile robots in particular. Yet no systematic evaluation of available RDEs has been performed; a comprehensive list of evaluation criteria targeted at robotics applications, which can subsequently be used to compare their strengths and weaknesses, is therefore desirable. Moreover, there are no practical evaluations of the usability and impact of a large selection of RDEs that provide researchers with the information necessary to select the RDE most suited to their needs, or that identify trends in RDE research suggesting directions for future RDE development. This survey addresses the above by selecting and describing nine open source, freely available RDEs for mobile robots, and evaluating and comparing them from various points of view. First, based on previous work concerning agent systems, a conceptual framework of four broad categories is established, encompassing the characteristics and capabilities that an RDE supports. Then, a practical evaluation of RDE usability in designing, implementing, and executing robot architectures is presented. Finally, the impact of specific RDEs on the field of robotics is addressed by providing a list of published applications and research projects that give concrete examples of areas in which the systems have been used. The comprehensive evaluation and comparison of the nine RDEs concludes with suggestions on how to use the results of this survey and a brief discussion of future trends in RDE design.
Incremental Natural Actor-Critic Algorithms
"... We present four new reinforcement learning algorithms based on actorcritic and naturalgradient ideas, and provide their convergence proofs. Actorcritic reinforcement learning methods are online approximations to policy iteration in which the valuefunction parameters are estimated using temporal ..."
Abstract

Cited by 42 (3 self)
We present four new reinforcement learning algorithms based on actor-critic and natural-gradient ideas, and provide their convergence proofs. Actor-critic reinforcement learning methods are online approximations to policy iteration in which the value-function parameters are estimated using temporal difference learning and the policy parameters are updated by stochastic gradient descent. Methods based on policy gradients in this way are of special interest because of their compatibility with function approximation methods, which are needed to handle large or infinite state spaces. The use of temporal difference learning in this way is of interest because in many applications it dramatically reduces the variance of the gradient estimates. The use of the natural gradient is of interest because it can produce better-conditioned parameterizations and has been shown to further reduce variance in some cases. Our results extend prior two-timescale convergence results for actor-critic methods by Konda and Tsitsiklis by using temporal difference learning in the actor and by incorporating natural gradients, and they extend prior empirical studies of natural actor-critic methods by Peters, Vijayakumar and Schaal by providing the first convergence proofs and the first fully incremental algorithms.
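A minimal per-time-step sketch in the spirit of the incremental algorithms described: a TD(0) critic on the fast timescale, an incremental estimate w of the natural gradient from compatible features, and an actor step along w on the slow timescale. Step sizes and all names are illustrative, not taken from the paper.

```python
# One fully incremental actor-critic update with a natural-gradient
# estimate maintained by stochastic approximation.
import numpy as np

def inac_step(theta, v, w, phi_s, phi_s_next, psi, r,
              gamma=0.99, a_v=0.1, a_w=0.01, a_theta=0.001):
    """phi_s, phi_s_next : critic features of s and s'
    psi : compatible features grad_theta log pi(a|s)"""
    delta = r + gamma * (v @ phi_s_next) - (v @ phi_s)  # TD error
    v = v + a_v * delta * phi_s                         # critic (fast)
    w = w + a_w * (delta - psi @ w) * psi               # natural-grad estimate
    theta = theta + a_theta * w                         # actor (slow)
    return theta, v, w
```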
Reinforcement learning for imitating constrained reaching movements
Advanced Robotics, 2007
"... The goal of developing algorithms for programming robots by demonstration is to create an easy way of programming robots such that it can be accomplished by anyone. When a demonstrator teaches a task to a robot, he/she shows some ways of fulfilling the task, but not all the possibilities. The robot ..."
Abstract

Cited by 38 (11 self)
The goal of developing algorithms for programming robots by demonstration is to create an easy way of programming robots that can be accomplished by anyone. When a demonstrator teaches a task to a robot, he/she shows some ways of fulfilling the task, but not all the possibilities. The robot must then be able to reproduce the task even when unexpected perturbations occur; in that case, it has to learn a new solution. In this paper, we describe a system for teaching the robot constrained reaching tasks. Our system is based on a dynamical system generator modulated by a learned speed trajectory, combined with a reinforcement learning module that allows the robot to adapt the trajectory when facing a new situation, for example in the presence of obstacles.
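A minimal sketch of the underlying idea, with illustrative names throughout: a stable point-attractor dynamical system drives the hand toward the target, and its velocity is rescaled by a learned speed value that a reinforcement learning module could then adapt when, for example, an obstacle appears.

```python
# Illustrative sketch: one integration step of a point-attractor reaching
# system whose velocity magnitude is modulated by a learned speed profile.
import numpy as np

def reach_step(x, target, speed_scale, dt=0.01, k=4.0):
    """x, target : (d,) positions; speed_scale : learned modulation value."""
    v = k * (target - x)                 # stable attractor velocity field
    return x + dt * speed_scale * v      # modulated Euler step
```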