Results 1-10 of 138
A Survey of Robot Learning from Demonstration
Cited by 274 (19 self)
We present a comprehensive survey of robot Learning from Demonstration (LfD), a technique that develops policies from example state-to-action mappings. We introduce the LfD design choices in terms of demonstrator, problem space, policy derivation and performance, and contribute the foundations for a structure in which to categorize LfD research. Specifically, we analyze and categorize the multiple ways in which examples are gathered, ranging from teleoperation to imitation, as well as the various techniques for policy derivation, including matching functions, dynamics models and plans. To conclude, we discuss LfD limitations and related promising areas for future research.
Maximum entropy inverse reinforcement learning
In Proc. AAAI, 2008
Cited by 109 (20 self)
Recent research has shown the benefit of framing problems of imitation learning as solutions to Markov Decision Problems. This approach reduces learning to the problem of recovering a utility function that makes the behavior induced by a near-optimal policy closely mimic demonstrated behavior. In this work, we develop a probabilistic approach based on the principle of maximum entropy. Our approach provides a well-defined, globally normalized distribution over decision sequences, while providing the same performance guarantees as existing methods. We develop our technique in the context of modeling real-world navigation and driving behaviors where collected data is inherently noisy and imperfect. Our probabilistic approach enables modeling of route preferences as well as a powerful new approach to inferring destinations and routes based on partial trajectories.
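The "globally normalized distribution over decision sequences" mentioned in this abstract can be illustrated with the usual maximum-entropy form. This is a sketch under assumptions not stated in the abstract: deterministic dynamics, a reward that is linear in path features, and the symbols $\zeta$ (a path), $\mathbf{f}_\zeta$ (its summed feature counts) and $\theta$ (reward weights), which are notation chosen here for illustration.

```latex
P(\zeta \mid \theta) = \frac{\exp\left(\theta^{\top} \mathbf{f}_{\zeta}\right)}{Z(\theta)},
\qquad
Z(\theta) = \sum_{\zeta'} \exp\left(\theta^{\top} \mathbf{f}_{\zeta'}\right)
```

Under this form, paths with equal reward are equally likely, and higher-reward paths are exponentially preferred; the normalizer $Z(\theta)$ sums over all candidate paths rather than over the actions at a single state.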
(Online) Subgradient Methods for Structured Prediction
Cited by 86 (15 self)
Promising approaches to structured learning problems have recently been developed in the maximum margin framework. Unfortunately, algorithms that are computationally and memory efficient enough to solve large-scale problems have lagged behind. We propose using simple subgradient-based techniques for optimizing a regularized risk formulation of these problems in both online and batch settings, and analyze the theoretical convergence, generalization, and robustness properties of the resulting techniques. These algorithms are simple, memory efficient, fast to converge, and have small regret in the online setting. We also investigate a novel convex regression formulation of structured learning. Finally, we demonstrate the benefits of the subgradient approach on three structured prediction problems.
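As a rough illustration of the subgradient idea in this abstract, the sketch below runs batch subgradient descent on an L2-regularized hinge loss. It is not the paper's structured-prediction algorithm: it assumes the simplest special case (binary labels in {-1, +1}), and the function names, the 1/(lam*t) step size, and the toy data are all choices made here for illustration.

```python
import numpy as np

def regularized_risk(w, X, y, lam=0.1):
    """L2-regularized average hinge loss: lam/2 * ||w||^2 + mean(max(0, 1 - y * Xw))."""
    return 0.5 * lam * (w @ w) + np.mean(np.maximum(0.0, 1.0 - y * (X @ w)))

def hinge_subgradient_descent(X, y, lam=0.1, steps=200):
    """Minimize the regularized risk above by batch subgradient descent (illustrative)."""
    n, d = X.shape
    w = np.zeros(d)
    for t in range(1, steps + 1):
        margins = y * (X @ w)
        active = margins < 1.0  # examples whose hinge loss is currently nonzero
        # One valid subgradient of the regularized risk at w.
        g = lam * w - (X[active].T @ y[active]) / n
        # Assumed step size 1/(lam * t), a common choice for strongly convex objectives.
        w -= g / (lam * t)
    return w

# Hypothetical toy data: two linearly separable classes.
X = np.array([[2.0, 0.0], [1.5, 0.5], [-2.0, 0.0], [-1.5, -0.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w = hinge_subgradient_descent(X, y)
print("weights:", w, "risk:", regularized_risk(w, X, y))
```

The hinge loss is not differentiable where a margin equals 1, which is why a subgradient (here, the one that treats such points as inactive) stands in for the gradient.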
Bundle Methods for Regularized Risk Minimization
Cited by 78 (4 self)
A wide variety of machine learning problems can be described as minimizing a regularized risk functional, with different algorithms using different notions of risk and different regularizers. Examples include linear Support Vector Machines (SVMs), Gaussian Processes, Logistic Regression, Conditional Random Fields (CRFs), and Lasso, amongst others. This paper describes the theory and implementation of a scalable and modular convex solver which solves all these estimation problems. It can be parallelized on a cluster of workstations, allows for data locality, and can deal with regularizers such as L1 and L2 penalties. In addition to the unified framework, we present tight convergence bounds, which show that our algorithm converges in O(1/ɛ) steps to ɛ precision for general convex problems and in O(log(1/ɛ)) steps for continuously differentiable problems. We demonstrate the performance of our general purpose solver on a variety of publicly available datasets.
Learning for Control from Multiple Demonstrations
Cited by 72 (9 self)
We consider the problem of learning to follow a desired trajectory when given a small number of demonstrations from a suboptimal expert. We present an algorithm that (i) extracts the (initially unknown) desired trajectory from the suboptimal expert's demonstrations and (ii) learns a local model suitable for control along the learned trajectory. We apply our algorithm to the problem of autonomous helicopter flight. In all cases, the autonomous helicopter's performance exceeds that of our expert helicopter pilot's demonstrations. Even stronger, our results significantly extend the state of the art in autonomous helicopter aerobatics. In particular, our results include the first autonomous tic-tocs, loops and hurricane, vastly superior performance on previously performed aerobatic maneuvers (such as in-place flips and rolls), and a complete airshow, which requires autonomous transitions between these and various other maneuvers.
Slow learners are fast
In NIPS, 2009
Cited by 70 (4 self)
Online learning algorithms have impressive convergence properties when it comes to risk minimization and convex games on very large problems. However, they are inherently sequential in their design, which prevents them from taking advantage of modern multicore architectures. In this paper we prove that online learning with delayed updates converges well, thereby facilitating parallel online learning.
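To make "online learning with delayed updates" concrete, here is a minimal single-process sketch, not the paper's algorithm or analysis: stochastic gradient descent on a least-squares loss in which every gradient is applied a fixed number of steps after it was computed, as would happen if parallel workers reported back late. The function name, the fixed delay, the learning rate, and the toy data are all assumptions made for illustration.

```python
from collections import deque
import numpy as np

def delayed_sgd(X, y, delay=4, lr=0.05, epochs=100):
    """Least-squares SGD in which each applied gradient is `delay` steps stale."""
    n, d = X.shape
    w = np.zeros(d)
    pending = deque()  # gradients that have been computed but not yet applied
    for _ in range(epochs):
        for i in range(n):
            # The gradient is computed at the current weights...
            grad = (X[i] @ w - y[i]) * X[i]
            pending.append(grad)
            # ...but is only applied once `delay` newer gradients have been queued.
            if len(pending) > delay:
                w -= lr * pending.popleft()
    return w

# Hypothetical noiseless regression problem with true weights [2, -1].
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]])
y = X @ np.array([2.0, -1.0])
print("delay=0:", delayed_sgd(X, y, delay=0))
print("delay=4:", delayed_sgd(X, y, delay=4))
```

Setting `delay=0` recovers ordinary sequential SGD, so the two calls can be compared directly; larger delays generally call for smaller learning rates to stay stable.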
Autonomous helicopter aerobatics through apprenticeship learning
International Journal of Robotics Research
Cited by 62 (2 self)
Autonomous helicopter flight is widely regarded to be a highly challenging control problem. Despite this fact, human experts can reliably fly helicopters through a wide range of maneuvers, including aerobatic maneuvers at the edge of the helicopter's capabilities. We present apprenticeship learning algorithms, which leverage expert demonstrations to efficiently learn good controllers for tasks being demonstrated by an expert. These apprenticeship learning algorithms have enabled us to significantly extend the state of the art in autonomous helicopter aerobatics. Our experimental results include the first autonomous execution of a wide range of maneuvers, including but not limited to in-place flips, in-place rolls, loops and hurricanes, and even autorotation landings, chaos and tic-tocs, which only exceptional human pilots can perform. Our results also include complete airshows, which require autonomous transitions between many of these maneuvers. Our controllers perform as well as, and often even better than, our expert pilot.
Learning to Search: Functional Gradient Techniques for Imitation Learning
Autonomous Robots, 2009
Cited by 59 (19 self)
Programming robot behavior remains a challenging task. While it is often easy to abstractly define or even demonstrate a desired behavior, designing a controller that embodies the same behavior is difficult, time consuming, and ultimately expensive. The machine learning paradigm offers the promise of enabling "programming by demonstration" for developing high-performance robotic systems. Unfortunately, many "behavioral cloning" (Bain & Sammut, 1995; Pomerleau, 1989; LeCun et al., 2006) approaches that utilize classical tools of supervised learning (e.g. decision trees, neural networks, or support vector machines) do not fit the needs of modern robotic systems. These systems are often built atop sophisticated planning algorithms that efficiently reason far into the future; consequently, ignoring these planning algorithms in favor of a supervised learning approach often leads to myopic and poor-quality robot performance. While planning algorithms have shown success in many real-world applications ranging from legged locomotion (Chestnutt et al., 2003) to outdoor unstructured navigation (Kelly et al., 2004; Stentz, 2009), such algorithms rely on fully specified cost functions that map sensor readings and environment models to quantifiable costs. Such cost functions are usually manually designed and programmed. Recently, a set of techniques has been developed that explore learning these functions from expert human demonstration.
Apprenticeship learning using inverse reinforcement learning and gradient methods
Proc. UAI, 2007
Cited by 56 (1 self)
In this paper we propose a novel gradient algorithm to learn a policy from an expert's observed behavior, assuming that the expert behaves optimally with respect to some unknown reward function of a Markovian Decision Problem. The algorithm's aim is to find a reward function such that the resulting optimal policy closely matches the expert's observed behavior. The main difficulty is that the mapping from the parameters to policies is both nonsmooth and highly redundant. Resorting to subdifferentials solves the first difficulty, while the second one is overcome by computing natural gradients. We tested the proposed method in two artificial domains and found it to be more reliable and efficient than some previous methods.