Results 1 - 10
of
30
Maximum entropy inverse reinforcement learning
- In Proc. AAAI
, 2008
"... Recent research has shown the benefit of framing problems of imitation learning as solutions to Markov Decision Problems. This approach reduces learning to the problem of recovering a utility function that makes the behavior induced by a near-optimal policy closely mimic demonstrated behavior. In th ..."
Abstract
-
Cited by 31 (10 self)
- Add to MetaCart
Recent research has shown the benefit of framing problems of imitation learning as solutions to Markov Decision Problems. This approach reduces learning to the problem of recovering a utility function that makes the behavior induced by a near-optimal policy closely mimic demonstrated behavior. In this work, we develop a probabilistic approach based on the principle of maximum entropy. Our approach provides a well-defined, globally normalized distribution over decision sequences, while providing the same performance guarantees as existing methods. We develop our technique in the context of modeling realworld navigation and driving behaviors where collected data is inherently noisy and imperfect. Our probabilistic approach enables modeling of route preferences as well as a powerful new approach to inferring destinations and routes based on partial trajectories.
Learning for Control from Multiple Demonstrations
"... We consider the problem of learning to follow a desired trajectory when given a small number of demonstrations from a sub-optimal expert. We present an algorithm that (i) extracts the—initially unknown—desired trajectory from the sub-optimal expert’s demonstrations and (ii) learns a local model suit ..."
Abstract
-
Cited by 24 (5 self)
- Add to MetaCart
We consider the problem of learning to follow a desired trajectory when given a small number of demonstrations from a sub-optimal expert. We present an algorithm that (i) extracts the—initially unknown—desired trajectory from the sub-optimal expert’s demonstrations and (ii) learns a local model suitable for control along the learned trajectory. We apply our algorithm to the problem of autonomous helicopter flight. In all cases, the autonomous helicopter’s performance exceeds that of our expert helicopter pilot’s demonstrations. Even stronger, our results significantly extend the state-of-the-art in autonomous helicopter aerobatics. In particular, our results include the first autonomous tic-tocs, loops and hurricane, vastly superior performance on previously performed aerobatic maneuvers (such as in-place flips and rolls), and a complete airshow, which requires autonomous transitions between these and various other maneuvers. 1.
Valuebased policy teaching with active indirect elicitation
- In Proc. 23rd National Conference on Artificial Intelligence
, 2008
"... Many situations arise in which an interested party’s utility is dependent on the actions of an agent; e.g., a teacher is interested in a student learning effectively and a firm is interested in a consumer’s behavior. We consider an environment in which the interested party can provide incentives to ..."
Abstract
-
Cited by 13 (10 self)
- Add to MetaCart
Many situations arise in which an interested party’s utility is dependent on the actions of an agent; e.g., a teacher is interested in a student learning effectively and a firm is interested in a consumer’s behavior. We consider an environment in which the interested party can provide incentives to affect the agent’s actions but cannot otherwise enforce actions. In value-based policy teaching, we situate this within the framework of sequential decision tasks modeled by Markov Decision Processes, and seek to associate limited rewards with states that induce the agent to follow a policy that maximizes the total expected value of the interested party. We show value-based policy teaching is NP-hard and provide a mixed integer program formulation. Focusing in particular on environments in which the agent’s reward is unknown to the interested party, we provide a method for active indirect elicitation wherein the agent’s reward function is inferred from observations about its response to incentives. Experimental results suggest that we can generally find the optimal incentive provision in a small number of elicitation rounds.
Apprenticeship learning for helicopter control
- Communications of the ACM
"... doi:10.1145/1538788.1538812 Autonomous helicopter flight is widely regarded to be a highly challenging control problem. As helicopters are highly unstable and exhibit complicated dynamical behavior, it is particularly difficult to design controllers that achieve high performance over a broad flight ..."
Abstract
-
Cited by 10 (0 self)
- Add to MetaCart
doi:10.1145/1538788.1538812 Autonomous helicopter flight is widely regarded to be a highly challenging control problem. As helicopters are highly unstable and exhibit complicated dynamical behavior, it is particularly difficult to design controllers that achieve high performance over a broad flight regime. While these aircraft are notoriously difficult to control, there are expert human pilots who are nonetheless capable of demonstrating a wide variety of maneuvers, including aerobatic maneuvers at the edge of the helicopter’s performance envelope. In this paper, we present algorithms for modeling and control that leverage these demonstrations to build high-performance control systems for autonomous helicopters. More specifically, we detail our experiences with the Stanford Autonomous Helicopter, which is now capable of extreme aerobatic flight meeting or exceeding the performance of our own expert pilot. 1.
Planning-based Prediction for Pedestrians
"... Abstract — We present a novel approach for determining robot movements that efficiently accomplish the robot’s tasks while not hindering the movements of people within the environment. Our approach models the goal-directed trajectories of pedestrians using maximum entropy inverse optimal control. Th ..."
Abstract
-
Cited by 9 (7 self)
- Add to MetaCart
Abstract — We present a novel approach for determining robot movements that efficiently accomplish the robot’s tasks while not hindering the movements of people within the environment. Our approach models the goal-directed trajectories of pedestrians using maximum entropy inverse optimal control. The advantage of this modeling approach is the generality of its learned cost function to changes in the environment and to entirely different environments. We employ the predictions of this model of pedestrian trajectories in a novel incremental planner and quantitatively show the improvement in hindrancesensitive robot trajectory planning provided by our approach. I.
Active Learning for Reward Estimation in Inverse Reinforcement Learning
, 2009
"... Inverse reinforcement learning addresses the general problem of recovering a reward function from samples of a policy provided by an expert/demonstrator. In this paper, we introduce active learning for inverse reinforcement learning. We propose an algorithm that allows the agent to query the demonst ..."
Abstract
-
Cited by 9 (3 self)
- Add to MetaCart
Inverse reinforcement learning addresses the general problem of recovering a reward function from samples of a policy provided by an expert/demonstrator. In this paper, we introduce active learning for inverse reinforcement learning. We propose an algorithm that allows the agent to query the demonstrator for samples at specific states, instead of relying only on samples provided at “arbitrary” states. The purpose of our algorithm is to estimate the reward function with similar accuracy as other methods from the literature while reducing the amount of policy samples required from the expert. We also discuss the use of our algorithm in higher dimensional problems, using both Monte Carlo and gradient methods. We present illustrative results of our algorithm in several simulated examples of different complexities.
Abstraction Levels for Robotic Imitation: Overview and Computational Approaches
, 2010
"... This chapter reviews several approaches to the problem of learning by imitation in robotics. We start by describing several cognitive processes identified in the literature as necessary for imitation. We then proceed by surveying different approaches to this problem, placing particular emphasys on m ..."
Abstract
-
Cited by 5 (2 self)
- Add to MetaCart
This chapter reviews several approaches to the problem of learning by imitation in robotics. We start by describing several cognitive processes identified in the literature as necessary for imitation. We then proceed by surveying different approaches to this problem, placing particular emphasys on methods whereby an agent first learns about its own body dynamics by means of self-exploration and then uses this knowledge about its own body to recognize the actions being performed by other agents. This general approach is related to the motor theory of perception, particularly to the mirror neurons found in primates. We distinguish three fundamental classes of methods, corresponding to three abstraction levels at which imitation can be addressed. As such, the methods surveyed herein exhibit behaviors that range from raw sensory-motor trajectory matching to high-level abstract task replication. We also discuss the impact that knowledge about the world and/or the demonstrator can have on the particular behaviors exhibited.
Learning to Search: Structured Prediction Techniques for Imitation Learning
, 2009
"... Modern robots successfully manipulate objects, navigate rugged terrain, drive in urban settings, and play world-class chess. Unfortunately, programming these robots is challenging, timeconsuming and expensive; the parameters governing their behavior are often unintuitive, even when the desired behav ..."
Abstract
-
Cited by 4 (1 self)
- Add to MetaCart
Modern robots successfully manipulate objects, navigate rugged terrain, drive in urban settings, and play world-class chess. Unfortunately, programming these robots is challenging, timeconsuming and expensive; the parameters governing their behavior are often unintuitive, even when the desired behavior is clear and easily demonstrated. Inspired by successful end-to-end learning systems such as neural network controlled driving platforms (Pomerleau, 1989), learning-based “programming by demonstration ” has gained currency as a method to achieve intelligent robot behavior. Unfortunately, with highly structured algorithms at their core, modern robotic systems are hard to train using classical learning techniques. Rather than redefining robot architectures to accommodate existing learning algorithms, this thesis develops learning techniques that leverage the performance of modern robotic components. We begin with a discussion of a novel imitation learning framework we call Maximum Margin Planning which automates finding a cost function for optimal planning and control algorithms such as A*. In the linear setting, this framework has firm theoretical backing in the form of strong generalization and regret bounds. Further, we have developed practical nonlinear generalizations that are effective and efficient for real-world problems. This framework reduces imitation learning
Inverse Optimal Heuristic Control for Imitation Learning
"... One common approach to imitation learning is behavioral cloning (BC), which employs straightforward supervised learning (i.e., classification) to directly map observations to controls. A second approach is inverse optimal control (IOC), which formalizes the problem of learning sequential decision-ma ..."
Abstract
-
Cited by 3 (1 self)
- Add to MetaCart
One common approach to imitation learning is behavioral cloning (BC), which employs straightforward supervised learning (i.e., classification) to directly map observations to controls. A second approach is inverse optimal control (IOC), which formalizes the problem of learning sequential decision-making behavior over long horizons as a problem of recovering a utility function that explains observed behavior. This paper presents inverse optimal heuristic control (IOHC), a novel approach to imitation learning that capitalizes on the strengths of both paradigms. It employs long-horizon IOC-style modeling in a low-dimensional space where inference remains tractable, while incorporating an additional descriptive set of BC-style features to guide a higher-dimensional overall action selection. We provide experimental results demonstrating the capabilities of our model on a simple illustrative problem as well as on two real world problems: turn-prediction for taxi drivers, and pedestrian prediction within an office environment. 1
Bootstrapping Apprenticeship Learning
"... We consider the problem of apprenticeship learning where the examples, demonstrated by an expert, cover only a small part of a large state space. Inverse Reinforcement Learning (IRL) provides an efficient tool for generalizing the demonstration, based on the assumption that the expert is maximizing ..."
Abstract
-
Cited by 2 (1 self)
- Add to MetaCart
We consider the problem of apprenticeship learning where the examples, demonstrated by an expert, cover only a small part of a large state space. Inverse Reinforcement Learning (IRL) provides an efficient tool for generalizing the demonstration, based on the assumption that the expert is maximizing a utility function that is a linear combination of state-action features. Most IRL algorithms use a simple Monte Carlo estimation to approximate the expected feature counts under the expert’s policy. In this paper, we show that the quality of the learned policies is highly sensitive to the error in estimating the feature counts. To reduce this error, we introduce a novel approach for bootstrapping the demonstration by assuming that: (i), the expert is (near-)optimal, and (ii), the dynamics of the system is known. Empirical results on gridworlds and car racing problems show that our approach is able to learn good policies from a small number of demonstrations. 1

