A Survey of Robot Learning from Demonstration
Cited by 63 (15 self)
We present a comprehensive survey of robot Learning from Demonstration (LfD), a technique that develops policies from example state-to-action mappings. We introduce the LfD design choices in terms of demonstrator, problem space, policy derivation and performance, and contribute the foundations for a structure in which to categorize LfD research. Specifically, we analyze and categorize the multiple ways in which examples are gathered, ranging from teleoperation to imitation, as well as the various techniques for policy derivation, including matching functions, dynamics models and plans. To conclude, we discuss LfD limitations and related promising areas for future research.
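The survey lists matching functions among the techniques for policy derivation. The simplest example returns the action paired with the closest demonstrated state, and the Python below is a minimal sketch of that idea. The Euclidean distance, the tuple-based demonstration format and the action labels are illustrative assumptions. None of them are taken from the survey.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical demonstration format: (state vector, action label) pairs.
Demo = Tuple[Sequence[float], str]


def nearest_neighbor_policy(demos: List[Demo]) -> Callable[[Sequence[float]], str]:
    """Build a policy that returns the action paired with the closest demonstrated state."""
    if not demos:
        raise ValueError("at least one demonstration is required")

    def policy(state: Sequence[float]) -> str:
        # Euclidean distance serves as the (assumed) matching function.
        _, action = min(demos, key=lambda pair: math.dist(pair[0], state))
        return action

    return policy


# Toy 2-D states with made-up action labels.
demos = [((0.0, 0.0), "stop"), ((1.0, 0.0), "forward"), ((0.0, 1.0), "turn_left")]
policy = nearest_neighbor_policy(demos)
print(policy((0.9, 0.1)))  # forward
```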
Transfer Learning for Reinforcement Learning Domains: A Survey
Cited by 27 (3 self)
The reinforcement learning paradigm is a popular way to address problems that have only limited environmental feedback, rather than correctly labeled examples, as is common in other machine learning contexts. While significant progress has been made to improve learning in a single task, the idea of transfer learning has only recently been applied to reinforcement learning tasks. The core idea of transfer is that experience gained in learning to perform one task can help improve learning performance in a related, but different, task. In this article we present a framework that classifies transfer learning methods in terms of their capabilities and goals, and then use it to survey the existing literature, as well as to suggest future directions for transfer learning work.
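One simple form of transfer that such a framework covers is initializing a target task's value function from a source task's value function through an inter-task mapping. The sketch below applies this to tabular Q-learning on two hypothetical corridor tasks of different lengths. The corridors, the hand-coded state mapping and all learning parameters are assumptions chosen only for illustration.

```python
import random
from typing import Callable, Dict, Tuple

# Q-values keyed by (state, action); action 0 = move left, 1 = move right.
QTable = Dict[Tuple[int, int], float]


def zero_q(n_states: int) -> QTable:
    return {(s, a): 0.0 for s in range(n_states) for a in (0, 1)}


def q_learning(q: QTable, n_states: int, episodes: int, seed: int = 0,
               alpha: float = 0.5, gamma: float = 0.9, epsilon: float = 0.1) -> QTable:
    """Tabular Q-learning on a hypothetical 1-D corridor; reaching the last state pays 1 and ends the episode."""
    rng = random.Random(seed)
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        while s != goal:
            # epsilon-greedy; ties between the two actions are broken at random
            if rng.random() < epsilon or q[(s, 0)] == q[(s, 1)]:
                a = rng.choice((0, 1))
            else:
                a = 1 if q[(s, 1)] > q[(s, 0)] else 0
            s_next = max(0, s - 1) if a == 0 else s + 1
            reward = 1.0 if s_next == goal else 0.0
            future = 0.0 if s_next == goal else max(q[(s_next, 0)], q[(s_next, 1)])
            q[(s, a)] += alpha * (reward + gamma * future - q[(s, a)])
            s = s_next
    return q


def transfer(source_q: QTable, target_states: int, state_map: Callable[[int], int]) -> QTable:
    """Warm-start a target-task Q-table by copying source values through an inter-task state mapping."""
    return {(s, a): source_q[(state_map(s), a)] for s in range(target_states) for a in (0, 1)}


SOURCE_N, TARGET_N = 5, 8
source_q = q_learning(zero_q(SOURCE_N), SOURCE_N, episodes=200)
# Hand-coded mapping: stretch the short source corridor over the longer target corridor.
warm_q = transfer(source_q, TARGET_N, lambda s: s * (SOURCE_N - 1) // (TARGET_N - 1))
# Fine-tune on the target task, starting from a copy of the transferred values.
target_q = q_learning(dict(warm_q), TARGET_N, episodes=20, seed=1)
```

Before the target task provides any experience, `warm_q` already prefers moving right in every non-goal state of the longer corridor. That head start is what transfer aims to provide. The final call then fine-tunes the copied values on the target task.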
Learning to Search: Functional Gradient Techniques for Imitation Learning
Autonomous Robots, 2009
Cited by 26 (11 self)
Programming robot behavior remains a challenging task. While it is often easy to abstractly define or even demonstrate a desired behavior, designing a controller that embodies the same behavior is difficult, time-consuming, and ultimately expensive. The machine learning paradigm offers the promise of enabling “programming by demonstration” for developing high-performance robotic systems. Unfortunately, many “behavioral cloning” (Bain & Sammut, 1995; Pomerleau, 1989; LeCun et al., 2006) approaches that utilize classical tools of supervised learning (e.g. decision trees, neural networks, or support vector machines) do not fit the needs of modern robotic systems. These systems are often built atop sophisticated planning algorithms that efficiently reason far into the future; consequently, ignoring these planning algorithms in favor of a supervised learning approach often leads to myopic and poor-quality robot performance. While planning algorithms have shown success in many real-world applications ranging from legged locomotion (Chestnutt et al., 2003) to outdoor unstructured navigation (Kelly et al., 2004; Stentz, 2009), such algorithms rely on fully specified cost functions that map sensor readings and environment models to quantifiable costs. Such cost functions are usually manually designed and programmed. Recently, a set of techniques has been developed that explores learning these functions from expert human demonstration.
A complete control architecture for quadruped locomotion over rough terrain
In Proceedings of the International Conference on Robotics and Automation, 2008
Cited by 13 (3 self)
Legged robots have the potential to navigate a much larger variety of terrain than their wheeled counterparts. In this paper we present a hierarchical control architecture that enables a quadruped, the “LittleDog” robot, to walk over rough terrain. The controller consists of a high-level planner that plans a set of footsteps across the terrain, a low-level planner that plans trajectories for the robot’s feet and center of gravity (COG), and a low-level controller that tracks these desired trajectories using a set of closed-loop mechanisms. We conduct extensive experiments to verify that the controller is able to robustly cross a wide variety of challenging terrains, climbing over obstacles nearly as tall as the robot’s legs. In addition, we highlight several elements of the controller that we found to be particularly crucial for robust locomotion, and which are applicable to quadruped robots in general. In such cases we conduct empirical evaluations to test the usefulness of these elements.
Learning Mobile Robot Motion Control from Demonstration and Corrective Feedback
2009
Cited by 9 (5 self)
Fundamental to the successful, autonomous operation of mobile robots are robust motion control algorithms. Motion control algorithms determine an appropriate action to take based on the current state of the world. A robot observes the world through sensors and executes physical actions through actuation mechanisms. Sensors are noisy and can mislead, however, and actions are non-deterministic and thus execute with uncertainty. Furthermore, the trajectories produced by the physical motion devices of mobile robots are complex, which makes them difficult to model and treat with traditional control approaches. Thus, developing motion control algorithms for mobile robots poses a significant challenge, even for simple motion behaviors. As behaviors become more complex, generating appropriate control algorithms only becomes more challenging. Developing sophisticated motion behaviors for a dynamically balancing differential drive mobile robot is one target application for this thesis work. Not only are the desired behaviors complex, but prior experience developing motion behaviors for this robot through traditional means proved tedious and demanded a high level of expertise. One approach that mitigates many of these challenges is to develop motion control algorithms within a Learning from Demonstration (LfD) paradigm. Here, a behavior is represented as pairs ...
Learning Locomotion over Rough Terrain using Terrain Templates
Cited by 6 (1 self)
We address the problem of foothold selection in robotic legged locomotion over very rough terrain. The difficulty of the problem we address here is comparable to that of human rock-climbing, where foot/hand-hold selection is one of the most critical aspects. Previous work in this domain typically involves defining a reward function over footholds as a weighted linear combination of terrain features. However, a significant amount of effort needs to be spent in designing these features in order to model more complex decision functions, and hand-tuning their weights is not a trivial task. We propose the use of terrain templates, which are discretized height maps of the terrain under a foothold on different length scales, as an alternative to manually designed features. We describe an algorithm that can simultaneously learn a small set of templates and a foothold ranking function using these templates, from expert-demonstrated footholds. Using the LittleDog quadruped robot, we experimentally show that the use of terrain templates can produce complex ranking functions with higher performance than standard terrain features, and improved generalization to unseen terrain.
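The sketch below shows the ranking idea in simplified form. Each candidate foothold is described by its similarity to a few height-map templates. A pairwise ranking perceptron then learns weights so that the expert's chosen foothold outranks the rejected candidates. Unlike the paper's algorithm, this sketch uses fixed, hand-picked templates and does not learn them. The exp(-squared distance) similarity, the single length scale, the toy 2x2 patches and the perceptron update are also assumptions.

```python
import math
from typing import List, Sequence

# A patch is a small discretized height map (rows of heights) around a candidate foothold.
Patch = Sequence[Sequence[float]]


def template_features(patch: Patch, templates: Sequence[Patch]) -> List[float]:
    """Feature k = exp(-squared distance) between the patch and template k (an assumed similarity)."""
    feats = []
    for template in templates:
        d2 = sum((p - t) ** 2
                 for prow, trow in zip(patch, template)
                 for p, t in zip(prow, trow))
        feats.append(math.exp(-d2))
    return feats


def score(w: Sequence[float], feats: Sequence[float]) -> float:
    return sum(wi * fi for wi, fi in zip(w, feats))


def train_ranker(examples, templates, epochs: int = 20, lr: float = 1.0) -> List[float]:
    """Pairwise ranking perceptron; each example is (expert_patch, [rejected_patches]).
    Weights are updated whenever a rejected candidate scores at least as high as the expert's foothold."""
    w = [0.0] * len(templates)
    for _ in range(epochs):
        for expert, rejected in examples:
            fe = template_features(expert, templates)
            for other in rejected:
                fo = template_features(other, templates)
                if score(w, fo) >= score(w, fe):
                    w = [wi + lr * (a - b) for wi, a, b in zip(w, fe, fo)]
    return w


def best_foothold(candidates: Sequence[Patch], w, templates) -> int:
    """Index of the highest-ranked candidate patch."""
    return max(range(len(candidates)),
               key=lambda i: score(w, template_features(candidates[i], templates)))


# Hypothetical 2x2 templates: flat ground and a vertical edge.
templates = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 1.0], [0.0, 1.0]]]
# One demonstrated step: the expert picked a nearly flat patch over an edge and a ridge.
examples = [([[0.0, 0.0], [0.0, 0.1]], [[[0.0, 1.0], [0.0, 1.0]], [[1.0, 1.0], [0.0, 0.0]]])]
w = train_ranker(examples, templates)
```

After training, the flat template receives a positive weight and the edge template a negative one. As a result, a nearly flat unseen patch outranks an edge-like one.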
Learning to Search: Structured Prediction Techniques for Imitation Learning
2009
Cited by 4 (1 self)
Modern robots successfully manipulate objects, navigate rugged terrain, drive in urban settings, and play world-class chess. Unfortunately, programming these robots is challenging, time-consuming and expensive; the parameters governing their behavior are often unintuitive, even when the desired behavior is clear and easily demonstrated. Inspired by successful end-to-end learning systems such as neural network controlled driving platforms (Pomerleau, 1989), learning-based “programming by demonstration” has gained currency as a method to achieve intelligent robot behavior. Unfortunately, with highly structured algorithms at their core, modern robotic systems are hard to train using classical learning techniques. Rather than redefining robot architectures to accommodate existing learning algorithms, this thesis develops learning techniques that leverage the performance of modern robotic components. We begin with a discussion of a novel imitation learning framework we call Maximum Margin Planning, which automates finding a cost function for optimal planning and control algorithms such as A*. In the linear setting, this framework has firm theoretical backing in the form of strong generalization and regret bounds. Further, we have developed practical nonlinear generalizations that are effective and efficient for real-world problems. This framework reduces imitation learning ...
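A toy version of the Maximum Margin Planning idea can be written as subgradient descent on the weights of a linear cost function. Each iteration plans with loss-augmented costs, compares the planned path's feature counts with the expert's, and adjusts the weights so that the expert path becomes relatively cheaper. This is an illustration only and does not reproduce the thesis implementation. It makes the following assumptions:

- Dijkstra replaces A* as the planner.
- The map is a hypothetical 3x3 grid with a single "rough" feature.
- Every cell has a fixed base cost.
- The margin loss is uniform per cell.
- The step size and regularization are chosen arbitrarily.

```python
import heapq
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[int, int]


def plan(costs: Dict[Cell, float], start: Cell, goal: Cell) -> List[Cell]:
    """Dijkstra on a 4-connected grid; a path's cost is the sum of the costs of the cells it visits."""
    dist = {start: costs[start]}
    prev: Dict[Cell, Cell] = {}
    heap = [(costs[start], start)]
    while heap:
        d, cell = heapq.heappop(heap)
        if cell == goal:
            break
        if d > dist[cell]:
            continue  # stale heap entry
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if nxt in costs and d + costs[nxt] < dist.get(nxt, float("inf")):
                dist[nxt] = d + costs[nxt]
                prev[nxt] = cell
                heapq.heappush(heap, (dist[nxt], nxt))
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]


def feature_counts(path: Sequence[Cell], features: Dict[Cell, Sequence[float]]) -> List[float]:
    return [sum(features[cell][k] for cell in path) for k in range(len(features[path[0]]))]


def dot(w: Sequence[float], f: Sequence[float]) -> float:
    return sum(wi * fi for wi, fi in zip(w, f))


def mmp(features, expert: List[Cell], iters=20, lr=0.5, reg=0.01, margin=0.5, base=1.0):
    """Subgradient training of cell costs base + w.f, in the spirit of Maximum Margin Planning.
    Keeping w >= 0 with non-negative features and margin < base keeps every cost positive."""
    w = [0.0] * len(features[expert[0]])
    on_expert = set(expert)
    f_expert = feature_counts(expert, features)
    for _ in range(iters):
        # Loss-augmented costs: cells off the expert path get cheaper by `margin`,
        # so the planner searches for paths that violate the desired margin.
        costs = {cell: base + dot(w, f) - (0.0 if cell in on_expert else margin)
                 for cell, f in features.items()}
        f_plan = feature_counts(plan(costs, expert[0], expert[-1]), features)
        grad = [fe - fp + reg * wi for fe, fp, wi in zip(f_expert, f_plan, w)]
        w = [max(0.0, wi - lr * g) for wi, g in zip(w, grad)]  # project onto w >= 0
    return w


# Hypothetical 3x3 map with one feature, 1.0 on rough cells; the expert skirts them.
rough = {(0, 1), (0, 2), (1, 1), (1, 2)}
features = {(r, c): (1.0 if (r, c) in rough else 0.0,) for r in range(3) for c in range(3)}
expert = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
w = mmp(features, expert)
learned = {cell: 1.0 + dot(w, f) for cell, f in features.items()}
```

After training, planning with the learned costs (without the margin term) reproduces the expert's route around the rough cells.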
Inverse Optimal Heuristic Control for Imitation Learning
Cited by 3 (1 self)
One common approach to imitation learning is behavioral cloning (BC), which employs straightforward supervised learning (i.e., classification) to directly map observations to controls. A second approach is inverse optimal control (IOC), which formalizes the problem of learning sequential decision-making behavior over long horizons as a problem of recovering a utility function that explains observed behavior. This paper presents inverse optimal heuristic control (IOHC), a novel approach to imitation learning that capitalizes on the strengths of both paradigms. It employs long-horizon IOC-style modeling in a low-dimensional space where inference remains tractable, while incorporating an additional descriptive set of BC-style features to guide a higher-dimensional overall action selection. We provide experimental results demonstrating the capabilities of our model on a simple illustrative problem as well as on two real-world problems: turn-prediction for taxi drivers, and pedestrian prediction within an office environment.
Superhuman Performance of Surgical Tasks by Robots using Iterative Learning from Human-Guided Demonstrations
Cited by 2 (1 self)
In the future, robotic surgical assistants may assist surgeons by performing specific subtasks such as retraction and suturing to reduce surgeon tedium and reduce the duration of some operations. We propose an apprenticeship learning approach that has the potential to allow robotic surgical assistants to autonomously execute specific trajectories with superhuman performance in terms of speed and smoothness. In the first step, we record a set of trajectories using human-guided backdriven motions of the robot. These are then analyzed to extract a smooth reference trajectory, which we execute at gradually increasing speeds using a variant of iterative learning control. We evaluate this approach on two representative tasks using the Berkeley Surgical Robots: a figure-eight trajectory and a two-handed knot tie, a tedious suturing sub-task required in many surgical procedures. Results suggest that the approach enables (i) rapid learning of trajectories, (ii) smoother trajectories than the human-guided trajectories, and (iii) trajectories that are 7 to 10 times faster than the best human-guided trajectories.
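The iterative learning control step can be illustrated with the basic P-type update. In this update, the input for the next trial adds a gain times the tracking error from the previous trial. The sketch below makes three assumptions:

- The plant is a hypothetical first-order discrete system.
- The learning gain is 1.
- The reference comes from simple pointwise averaging of two demonstrations that are already time-aligned.

It does not reproduce the paper's ILC variant, its trajectory extraction or its schedule of increasing execution speeds.

```python
import math
from typing import List, Sequence, Tuple


def average_reference(demos: Sequence[Sequence[float]]) -> List[float]:
    """Pointwise mean of equally long, already time-aligned demonstrations (a simplifying assumption)."""
    return [sum(vals) / len(vals) for vals in zip(*demos)]


def simulate(u: Sequence[float], a: float = 0.2, b: float = 0.5) -> List[float]:
    """Hypothetical plant y[t+1] = a*y[t] + b*u[t], starting from rest; returns y[1..T]."""
    y, out = 0.0, []
    for ut in u:
        y = a * y + b * ut
        out.append(y)
    return out


def ilc(reference: Sequence[float], trials: int = 50, gain: float = 1.0) -> Tuple[List[float], List[float]]:
    """P-type iterative learning control: u_{k+1}[t] = u_k[t] + gain * e_k[t+1]."""
    u = [0.0] * len(reference)
    max_errors = []
    for _ in range(trials):
        y = simulate(u)
        # y[t] is the output one step after u[t], so e[t] pairs with u[t]
        e = [r - yt for r, yt in zip(reference, y)]
        max_errors.append(max(abs(x) for x in e))
        u = [ut + gain * et for ut, et in zip(u, e)]
    return u, max_errors


# Two made-up demonstrations with opposite jitter around the same smooth curve.
demos = [
    [math.sin(t / 3) + 0.05 * (-1) ** t for t in range(20)],
    [math.sin(t / 3) - 0.05 * (-1) ** t for t in range(20)],
]
reference = average_reference(demos)
u, max_errors = ilc(reference)
```

With these parameters, the map from one trial's error to the next is a contraction in the maximum norm. Its diagonal is 0.5 and its off-diagonal row sums are below 0.125, so the recorded maximum tracking error shrinks on every trial.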
A Reduction from Apprenticeship Learning to Classification
Cited by 2 (0 self)
We provide new theoretical results for apprenticeship learning, a variant of reinforcement learning in which the true reward function is unknown, and the goal is to perform well relative to an observed expert. We study a common approach to learning from expert demonstrations: using a classification algorithm to learn to imitate the expert’s behavior. Although this straightforward learning strategy is widely used in practice, it has been subject to very little formal analysis. We prove that, if the learned classifier has error rate $\epsilon$, the difference between the value of the apprentice’s policy and the expert’s policy is $O(\sqrt{\epsilon})$. Further, we prove that this difference is only $O(\epsilon)$ when the expert’s policy is close to optimal. This latter result has an important practical consequence: not only does imitating a near-optimal expert result in a better policy, but far fewer demonstrations are required to successfully imitate such an expert. This suggests an opportunity for substantial savings whenever the expert is known to be good, but demonstrations are expensive or difficult to obtain.
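The two bounds can be compared with a simple worked example. For illustration only, ignore the constants hidden in the $O(\cdot)$ notation. For a classifier with error rate $\epsilon = 0.01$:

$$
\sqrt{\epsilon} = \sqrt{0.01} = 0.1, \qquad \epsilon = 0.01, \qquad \frac{\sqrt{\epsilon}}{\epsilon} = \frac{1}{\sqrt{\epsilon}} = 10 .
$$

At this error rate, the bound for a near-optimal expert is smaller by a factor of ten. The factor $1/\sqrt{\epsilon}$ grows as the classifier improves.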

