## Learning movement primitives (2004)

Venue: International Symposium on Robotics Research (ISRR 2003)

Citations: 61 (10 self)

### BibTeX

@INPROCEEDINGS{Schaal04learningmovement,
  author    = {Stefan Schaal and Jan Peters and Jun Nakanishi and Auke Ijspeert},
  title     = {Learning movement primitives},
  booktitle = {International Symposium on Robotics Research (ISRR 2003)},
  year      = {2004},
  publisher = {Springer}
}

### Abstract

This paper discusses a comprehensive framework for modular motor control based on a recently developed theory of dynamic movement primitives (DMP). DMPs are a formulation of movement primitives with autonomous nonlinear differential equations, whose time evolution creates smooth kinematic control policies. Model-based control theory is used to convert the outputs of these policies into motor commands. By means of coupling terms, on-line modifications can be incorporated into the time evolution of the differential equations, thus providing a rather flexible and reactive framework for motor planning and execution. The linear parameterization of DMPs lends itself naturally to supervised learning from demonstration. Moreover, the temporal, scale, and translation invariance of the differential equations with respect to these parameters provides a useful means for movement recognition. A novel reinforcement learning technique based on natural stochastic policy gradients allows a general approach of improving DMPs by trial and error learning with respect to almost arbitrary optimization criteria. We demonstrate the different ingredients of the DMP approach in various examples, involving skill learning from demonstration on the humanoid robot DB, and learning biped walking from demonstration in simulation, including self-improvement of the movement patterns towards energy efficiency through resonance tuning.
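To make the framework concrete, the following is a minimal single-DOF sketch of the two coupled systems the abstract describes: a phase-decay canonical system driving a point-attractor transformation system with a learned forcing term. The function name, gain values, and basis parameterization are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def integrate_dmp(y0, g, weights, centers, widths,
                  tau=1.0, dt=0.001, alpha_x=4.0, alpha_z=25.0, beta_z=6.25):
    """Integrate one discrete DMP and return the position trajectory (sketch)."""
    x, z, y = 1.0, 0.0, y0                  # phase, scaled velocity, position
    traj = []
    for _ in range(int(tau / dt)):
        psi = np.exp(-widths * (x - centers) ** 2)       # Gaussian basis activations
        f = (psi @ weights) / psi.sum() * x * (g - y0)   # learned forcing term
        x += -alpha_x * x / tau * dt                     # canonical system: phase decay
        z += (alpha_z * (beta_z * (g - y) - z) + f) / tau * dt  # spring toward goal g
        y += z / tau * dt
        traj.append(y)
    return np.array(traj)
```

With all weights at zero the forcing term vanishes and the trajectory simply converges to the goal; learned weights shape the transient.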

### Citations

4855 | Neural Networks for Pattern Recognition - Bishop - 1994

Citation Context: ...the phase and frequency of the canonical system of the DMP at the moment of heel-strike: φ̇ = ω̂_n + δ(t − t_heel-strike)(φ^robot_heel-strike − φ), ω̂_{n+1} = ω̂_n + K(ω_measured − ω̂_n) (16), where δ is the Dirac delta function, n is the number of steps, ω = 1/τ, and φ^robot_heel-strike is the phase of the mechanical oscillator (robot) at heel-strike, defined as φ^robot_heel-strike = 0 at t...
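Read as a discrete event-triggered update, the heel-strike rule above amounts to a phase reset plus first-order frequency tracking. A minimal sketch, assuming an adaptation gain K not specified in the snippet:

```python
# Heel-strike phase resetting and frequency adaptation, as a discrete update
# (a sketch of one reading of Eq. (16); K is an assumed adaptation gain).
def heel_strike_update(omega_hat, phi_robot_heel_strike, omega_measured, K=0.5):
    phi = phi_robot_heel_strike                                # reset DMP phase to the robot's phase
    omega_hat = omega_hat + K * (omega_measured - omega_hat)   # track the measured step frequency
    return phi, omega_hat
```

Repeated heel-strikes drive the frequency estimate toward the measured gait frequency, which is what entrains the DMP to the mechanical oscillator.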

1996 | Robot Motion Planning - Latombe - 1991

Citation Context: ...high dimensional motor systems offers another challenge. While efficient planning in typical low dimensional industrial robots, usually characterized by three to six DOFs, is already a complex issue [3, 4], optimal planning in 30 to 50 DOF systems with uncertain geometric and dynamic models is quite daunting, especially in the light of the required real-time performance in a reactive robotic system. As...

1877 | Numerical Recipes in C: The Art of Scientific Computing - Press, Teukolsky, et al. - 1992

Citation Context: ...is not possible to obtain analytical gradients dJ/dw. Thus, the gradient needs to be estimated from empirical data. Levenberg-Marquardt and Gauss-Newton algorithms are a possible choice for this task [18] – however, both algorithms often take large jumps in parameter space under the assumption that an aggressive exploration of parameters is permissible, and are not very robust in stochastic settings...

622 | Planning Algorithms - LaValle - 2006

Citation Context: ...high dimensional motor systems offers another challenge. While efficient planning in typical low dimensional industrial robots, usually characterized by three to six DOFs, is already a complex issue [3, 4], optimal planning in 30 to 50 DOF systems with uncertain geometric and dynamic models is quite daunting, especially in the light of the required real-time performance in a reactive robotic system. As...

427 | Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning - Sutton, Precup, et al. - 1998

Citation Context: ...as a potential way to simplify learning control policies, research has turned towards a more macroscopic representation of policies in the form of movement primitives or, more precisely, policy primitives [11]. Movement primitives are parameterized policies that can achieve a complete movement behavior. Planning complex movement based on a small number of such movement primitives can avoid the combinatoria...

374 | Theory and Practice of Recursive Identification - Ljung, Soderstrom - 1983

Citation Context: ...takes into account the Riemannian structure of the space in which the optimization surface lies [20]. The above algorithm can also be formulated as a recursive estimation method using recursive least squares [21], and the variance σ² of the stochastic policy can be included in the parameters to be optimized. By updating w with gradient ascent (or descent) according to the natural gradient estimate, fast conv...

322 | Simple statistical gradient-following algorithms for connectionist reinforcement learning - Williams - 1992

Citation Context: ...each movement started at y = 0 and moved within 500 ms to g = 1. Figure 3 illustrates the convergence of the Natural Actor Critic algorithm in comparison to a non-natural gradient method, Episodic Reinforce [27]. The NAC algorithm converges smoothly with about one order of magnitude faster performance than Episodic Reinforce. Smooth bell-shaped velocity profiles of the DMP are reached after about 150-200 tri...

226 | Is Imitation Learning the Route to Humanoid Robots? - Schaal - 1999

Citation Context: ...e.g., 16], a variety of learning algorithms exist to find w_i. Let us assume we are given a sample trajectory y_demo(t), ẏ_demo(t), ÿ_demo(t) with duration T, e.g., as typical in imitation learning [5]. Based on this information, a supervised learning problem results with the following target for f: for the transformation system in Equation (3), using g = y_demo(T): f_target = τ ẏ_demo − z_de...
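Under the standard second-order transformation system, the supervised target for f can be computed pointwise from the demonstration, as this snippet sketches. The gains alpha_z, beta_z and the goal-from-endpoint convention are assumptions; the context above shows only the first term of the target.

```python
import numpy as np

def f_target_from_demo(y, yd, ydd, tau=1.0, alpha_z=25.0, beta_z=6.25):
    """Supervised target for the forcing term f along a demonstration (sketch)."""
    g = y[-1]            # goal taken as the demonstration's final position (assumption)
    z = tau * yd         # z = tau * ydot along the demonstration
    zd = tau * ydd       # zdot = tau * yddot
    # solve the transformation system tau*zdot = alpha_z*(beta_z*(g - y) - z) + f for f
    return tau * zd - alpha_z * (beta_z * (g - y) - z)
```

A useful sanity check: if the demonstration is exactly the unforced critically damped solution of the spring system, the target comes out near zero everywhere.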

224 | Modeling and Control of Robot Manipulators - Sciavicco, Siciliano - 1996

173 | Introduction to Robotics - Craig - 1989

Citation Context: ...ble for motor control if we conceive of this dynamic system as a kinematic planning policy, whose outputs are subsequently converted to motor commands by an appropriate standard controller (Figure 2) [12]. It should be noted, however, that a kinematic representation of movement primitives is not necessarily independent of the dynamic properties of the limb. Proprioceptive feedback can be used to on-li...

154 | Adaptive representation of dynamics during learning of a motor task - Shadmehr, Mussa-Ivaldi - 1994

Citation Context: ...rmance than Episodic Reinforce. Smooth bell-shaped velocity profiles of the DMP are reached after about 150-200 trials, which is comparable to human learning in related tasks [28]. This initial evaluation demonstrates the efficiency of the NAC algorithm for optimizing DMPs, although more thorough evaluations are needed for various reward criteria and also multi-DOF tasks. 3.3 ...

133 | Movement imitation with nonlinear dynamical systems - Ijspeert, Nakanishi, et al.

Citation Context: ...ation and spatial and temporal scaling. Thus, the w vector can serve as a classifier for DMPs, e.g., by using nearest neighbor classification or more advanced classification techniques [e.g., 16]. In [10], we demonstrated how DMPs can be used for character recognition of the Palm Pilot graffiti alphabet. 3 Evaluations: The following sections give some examples of the abilities of DMPs in the context of...

122 | Learning attractor landscapes for learning motor primitives - Ijspeert, Nakanishi, et al.

Citation Context: ...oller that requires a continuous desired acceleration signal. A Phase Oscillator DMP: By replacing the point attractor in the canonical system with a limit cycle oscillator, a rhythmic DMP is obtained [9]. Among the simplest limit cycle oscillators is a phase-amplitude representation: τ ṙ = α_r (A − r), τ φ̇ = 1 (9), where r is the amplitude of the oscillator, A the desired amplitude, and φ its pha...
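The phase-amplitude canonical system in this context can be sketched in a few lines; the linear amplitude relaxation and its gain alpha_r are assumptions reconstructed from the garbled snippet:

```python
import numpy as np

def phase_oscillator(r0, A, tau=1.0, alpha_r=10.0, dt=0.001, steps=2000):
    """Integrate the phase-amplitude canonical system of a rhythmic DMP (sketch)."""
    r, phi = r0, 0.0
    amplitudes = []
    for _ in range(steps):
        r += alpha_r * (A - r) / tau * dt   # amplitude relaxes toward the desired amplitude A
        phi += dt / tau                     # constant phase velocity: tau * dphi/dt = 1
        amplitudes.append(r)
    return np.array(amplitudes), phi
```

The phase advances at a constant rate while the amplitude settles to A, which is what lets a rhythmic transformation system anchor its basis functions on φ instead of time.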

91 | Statistical learning for humanoid robots - Vijayakumar, D'Souza, et al.

Citation Context: ...sive exploration of parameters is permissible, and are not very robust in stochastic settings. As an alternative, we developed a novel reinforcement learning algorithm, the Natural Actor-Critic (NAC) [19]. The NAC is a stochastic gradient method, i.e., it injects noise into the control policy to provide the necessary exploration for learning. We will illustrate the NAC algorithm in the context of the ac...

84 | Learning from demonstration and adaptation of biped locomotion. Robotics and Autonomous Systems - Nakanishi, Morimoto, et al. - 2004

Citation Context: ...for various reward criteria and also multi-DOF tasks. 3.3 Learning Resonance Tuning in Biped Locomotion: As a last evaluation of DMPs, we applied the phase oscillator DMP to simulated biped locomotion [29] of a planar biped (Figure 4). Motion capture data from human locomotion was employed to learn an initial trajectory pattern, which, after some modest tuning of the speed and amplitude parameters of t...

74 | Locally weighted projection regression: An O(n) algorithm for incremental real time learning in high dimensional space - Vijayakumar, Schaal

Citation Context: ...d with training samples (v, f_target) (cf. Equations (5) and (10)). For solving the function approximation problem, we chose a nonparametric regression technique from locally weighted learning (LWPR) [17], as it allows us to determine the necessary number of basis functions N, their centers c_i, and bandwidths h_i automatically. 2.3 Reinforcement Learning of DMPs: While imitation learning provides an exce...
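The per-basis fit that LWPR automates can be imitated in a few lines for a fixed-basis case. This is not LWPR itself, just a normalized-Gaussian least-squares stand-in with assumed names, fitting one slope per basis:

```python
import numpy as np

def fit_forcing_weights(x, f_target, centers, widths):
    """Fit one weight per basis by locally weighted regression of f_target on phase x."""
    # (T, N) basis activations over the T phase samples
    psi = np.exp(-widths[None, :] * (x[:, None] - centers[None, :]) ** 2)
    num = (psi * (x * f_target)[:, None]).sum(axis=0)    # sum_t psi_ti * x_t * f_t
    den = (psi * (x ** 2)[:, None]).sum(axis=0) + 1e-12  # sum_t psi_ti * x_t^2
    # weights w_i such that f_hat(x) = sum_i psi_i(x) * w_i * x / sum_i psi_i(x)
    return num / den
```

For a target that is exactly linear in the phase, every local fit recovers the same slope, which is a quick way to sanity-check the regression.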

65 | Neural control of rhythmic arm movements. Neural Networks 11 - Williamson - 1998

62 | Trajectory formation for imitation with nonlinear dynamical systems - Ijspeert, Nakanishi, et al. - 2001

Citation Context: ...and switching between different movement amplitudes and frequencies was easily accomplished by changing the appropriate parameters of the DMP. Due to space constraints, we refer the reader to [8, 9, 23] for detailed explanations and illustrations. [Figure 3 caption: Convergence of Natural Actor Critic reinforcement learning for learning the weights of a DMP.] 3.2 Reinforcement Learning: In a preliminary applic...

50 | Learning rhythmic movements by demonstration using nonlinear oscillators - Ijspeert, Nakanishi, et al. - 2002

Citation Context: ...rather straightforward. The simplest form employs a separate DMP for every DOF. Alternatively, all DMPs could share the same canonical system but have separate transformation systems, as realized in [23]. In this case, every DOF learns its own function f. By sharing the same canonical system, very complex phase relationships between individual DOFs can be realized and stabilized, for instance, as nee...

35 | Programmable pattern generators - Schaal, Sternad - 1998

Citation Context: ...ory movement remains under the ball such that the task can be accomplished [e.g., 13, 24]. Other forms of superposition are conceivable, and future work will evaluate the most promising strategies. • Movement Recognition with DMPs: The invariance properties described above render the parameters w of a DMP insensitive towards movement translation and spatial and temporal scaling. Thus, the w v...

31 | Natural gradient learning for over- and undercomplete bases - Amari - 1999

Citation Context: ...offset in the regression. The natural gradient is a more efficient version of the regular gradient that takes into account the Riemannian structure of the space in which the optimization surface lies [20]. The above algorithm can also be formulated as a recursive estimation method using recursive least squares [21], and the variance σ² of the stochastic policy can be included in the parameters to be ...

23 | Further progress in robot juggling: Solvable mirror laws - Rizzi, Koditschek - 1994

Citation Context: ...mmand ÿ = ż in the top equation of (8). In the NAC, this acceleration command is treated as the mean ÿ̄ of a Gaussian control policy π(ÿ | x, v, z, y) = (2πσ²)^(−1/2) exp(−(ÿ − ÿ̄)²/(2σ²)) (13) with variance σ². Given a reward r(ÿ, x, v, y, z) at every time step of the movement, the goal of learning is to optimize the expected accumulated reward J(w) = E{Σ_{t=0}^T r_t}, where the ex...
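The stochastic-policy setup around Eq. (13) can be illustrated with a deliberately tiny episodic policy-gradient loop. This is plain REINFORCE with a baseline, not the Natural Actor-Critic of the paper, and the one-dimensional quadratic reward is an invented stand-in:

```python
import numpy as np

# Gaussian policy with mean w and fixed variance sigma^2; reward peaks at a = 2.
# (All names and the reward function are illustrative assumptions.)
rng = np.random.default_rng(0)
w, sigma, lr, baseline = 0.0, 0.5, 0.05, 0.0
for _ in range(3000):
    a = w + sigma * rng.normal()          # sample "action" from N(w, sigma^2) for exploration
    r = -(a - 2.0) ** 2                   # invented reward, maximal at a = 2
    grad_logp = (a - w) / sigma ** 2      # d/dw log N(a; w, sigma^2)
    w += lr * (r - baseline) * grad_logp  # REINFORCE update on the policy mean
    baseline += 0.05 * (r - baseline)     # running-average baseline reduces variance
```

The injected noise plays the exploration role the context describes; the natural gradient of the NAC would additionally precondition `grad_logp` by the inverse Fisher information.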

19 | One-handed juggling: a dynamical approach to a rhythmic movement task - Schaal, Sternad, et al. - 1996

13 | Composite adaptive control with locally weighted statistical learning - Nakanishi, Farrell, et al.

Citation Context: ...der to address these questions, we suggest resorting to some of the most basic ideas of dynamic systems theory. A dynamic system can generally be written as a differential equation ẋ = f(x, θ, t) (2), which is almost identical to Equation (1), except that the left-hand side denotes a change of state, not a motor command. Such a kinematic formulation is, however, quite suitable for motor control if...

3 | Learning robot control. The Handbook of Brain Theory and Neural Networks - Schaal - 2002

Citation Context: ...tion in Equations (3), (4), and (5) that is the most important idea of DMPs, but rather its design principle. A DMP consists of two sets of differential equations: a canonical system τ ẋ = h(x) (6) and a transformation system τ ẏ = g(y, f(x)) (7). The canonical system needs to generate two quantities: a phase variable x and a phase velocity v, i.e., x = [x v]^T. The phase x is a substit...

2 | Arm and hand movement control. The Handbook of Brain Theory and Neural Networks - Schaal - 2002

Citation Context: ...portant idea of DMPs, but rather its design principle. A DMP consists of two sets of differential equations: a canonical system τ ẋ = h(x) (6) and a transformation system τ ẏ = g(y, f(x)) (7). The canonical system needs to generate two quantities: a phase variable x and a phase velocity v, i.e., x = [x v]^T. The phase x is a substitute for time and allows us to anchor our spatially loca...

2 | Perspectives of Nonlinear Dynamics, Vol. 1 - Jackson - 1991

Citation Context: ...time. From a dynamic systems point of view, we wish that the attractor landscape of a DMP does not change qualitatively after scaling, a topic addressed in the framework of "topological equivalence" [22]. It can easily be verified that if a new DMP is created by multiplying τ, g, or A in any of the DMPs above by a factor c, a simple multiplication of all state variables and change-of-state variables...

1 | Robo sapiens: Evolution of a New Species - Menzel, D'Alusio - 2000

Citation Context: ...g can be guaranteed with a proper choice of α_v and β_v, e.g., such that (4) is critically damped. Assuming that all initial conditions of the state variables x, v, y, z are zero, the quotient x/g ∈ [0, 1] can serve as a phase variable to anchor the Gaussian basis functions ψ_i (characterized by a center c_i and bandwidth h_i), and v can act as a "gating term" in the nonlinear function (5) such that the...

1 | What is optimized in muscular movements? In Human Muscle - Stein, Ogusztöreli, et al. - 1986