Learning for Control from Multiple Demonstrations

by Adam Coates , Pieter Abbeel , Andrew Y. Ng
Citations: 24 (5 self)
BibTeX

@MISC{Coates_learningfor,
    author = {Adam Coates and Pieter Abbeel and Andrew Y. Ng},
    title = {Learning for Control from Multiple Demonstrations},
    year = {}
}

Abstract

We consider the problem of learning to follow a desired trajectory when given a small number of demonstrations from a sub-optimal expert. We present an algorithm that (i) extracts the initially unknown desired trajectory from the sub-optimal expert’s demonstrations and (ii) learns a local model suitable for control along the learned trajectory. We apply our algorithm to the problem of autonomous helicopter flight. In all cases, the autonomous helicopter’s performance exceeds that of our expert helicopter pilot’s demonstrations. Moreover, our results significantly extend the state of the art in autonomous helicopter aerobatics. In particular, our results include the first autonomous tic-tocs, loops and hurricane, vastly superior performance on previously performed aerobatic maneuvers (such as in-place flips and rolls), and a complete airshow, which requires autonomous transitions between these and various other maneuvers.
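
The abstract does not spell out how the desired trajectory is extracted, so the following is only a simplified illustration of the general idea of recovering one trajectory from several imperfectly timed demonstrations. It assumes one-dimensional trajectories, uses dynamic time warping to align each demonstration to a running estimate, and averages the aligned samples. The function names `dtw_path` and `estimate_trajectory` are hypothetical, and this sketch should not be read as the authors' algorithm.

```python
def dtw_path(ref, seq):
    """Return a minimum-cost alignment between ref and seq as (i, j) index pairs."""
    n, m = len(ref), len(seq)
    inf = float("inf")
    # cost[i][j] = best cumulative squared error aligning ref[:i] with seq[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (ref[i - 1] - seq[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])

    # Backtrack from the end, always stepping to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path


def estimate_trajectory(demos, iterations=3):
    """Average time-aligned demonstrations (assumption: 1-D samples, first demo as seed)."""
    ref = list(demos[0])
    for _ in range(iterations):
        buckets = [[] for _ in ref]
        for demo in demos:
            # Every reference index appears at least once in the alignment path.
            for i, j in dtw_path(ref, demo):
                buckets[i].append(demo[j])
        ref = [sum(b) / len(b) for b in buckets]
    return ref


demos = [
    [0, 1, 2, 1, 0],
    [0, 0, 1, 2, 1, 0],   # same shape, slow start
    [0, 1, 2, 2, 1, 0],   # same shape, lingering at the peak
]
estimate = estimate_trajectory(demos)
```

In this toy example the three demonstrations differ only in timing, so the aligned average recovers the shared shape `[0, 1, 2, 1, 0]`. Real demonstrations would be multi-dimensional and noisy, which this sketch does not address.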

Citations

6236 Maximum likelihood from incomplete data via the EM algorithm - Dempster, Laird, et al. - 1977
1009 A general method applicable to the search for similarities in the amino acid sequence of two proteins - Needleman, Wunsch - 1970
287 Dynamic programming algorithm optimization for spoken word recognition - Sakoe, Chiba - 1978
248 Robot learning from demonstration - Atkeson, Schaal - 1997
240 Context-specific independence in Bayesian networks - Boutilier, Friedman, et al. - 1996
143 Apprenticeship learning via inverse reinforcement learning - Abbeel, Ng - 2004
137 Locally weighted learning for control - Atkeson, Moore, et al. - 1997
126 Algorithms for inverse reinforcement learning - Ng, Russell - 2000
81 Model-based control of a robot manipulator - An, Atkeson, et al. - 1988
78 Differential Dynamic Programming - Jacobson, Mayne - 1970
66 Maximum margin planning - Ratliff, Bagnell, et al. - 2006
61 On learning, representing and generalizing a task in a humanoid robot - Calinon, Guenter, et al. - 2007
54 An application of reinforcement learning to aerobatic helicopter flight - Abbeel, Coates, et al. - 2007
31 Bayesian inverse reinforcement learning - Ramachandran, Amir
29 Apprenticeship learning using inverse reinforcement learning and gradient methods - Neu, Szepesvári
28 Autonomous inverted helicopter flight via reinforcement learning - Ng, Coates, et al. - 2004
24 Using inaccurate models in reinforcement learning - Abbeel, Quigley, et al.
22 Multiple alignment of continuous time series - Listgarten, Neal, et al. - 2004
20 Control logic for automated aerobatic flight of miniature helicopter - Gavrilets, Martinos, et al. - 2002
19 Modeling Vehicular Dynamics, with Application to Modeling Helicopters - Abbeel, Ganapathi, et al.
3 Analysis of Sibling Time Series Data: Alignment and Difference Detection - Listgarten - 2006