Results 11 – 20 of 304
Apprenticeship learning using inverse reinforcement learning and gradient methods
 Proc. UAI, 2007
Abstract

Cited by 56 (1 self)
In this paper we propose a novel gradient algorithm to learn a policy from an expert’s observed behavior, assuming that the expert behaves optimally with respect to some unknown reward function of a Markovian Decision Problem. The algorithm’s aim is to find a reward function such that the resulting optimal policy closely matches the expert’s observed behavior. The main difficulty is that the mapping from the parameters to policies is both non-smooth and highly redundant. Resorting to subdifferentials solves the first difficulty, while the second one is overcome by computing natural gradients. We tested the proposed method in two artificial domains and found it to be more reliable and efficient than some previous methods.
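The loop this abstract describes — search reward parameters until the induced optimal policy reproduces the expert — can be sketched on a toy chain MDP. This is a minimal feature-matching stand-in, not the paper's natural-gradient method; the MDP, one-hot features, demonstrator, and learning rate are all invented for illustration.

```python
import numpy as np

# Toy 5-state chain MDP; reward is linear in one-hot state features,
# r(s) = theta . phi(s).  Everything here is illustrative.
N, GAMMA = 5, 0.9
PHI = np.eye(N)
LEFT = np.maximum(np.arange(N) - 1, 0)       # successor under action 0
RIGHT = np.minimum(np.arange(N) + 1, N - 1)  # successor under action 1

def optimal_policy(theta):
    """Value iteration under r = PHI @ theta, then the greedy policy."""
    r, v = PHI @ theta, np.zeros(N)
    for _ in range(200):
        q = r + GAMMA * np.stack([v[LEFT], v[RIGHT]])
        v = q.max(axis=0)
    return q.argmax(axis=0)                  # 0 = left, 1 = right

def feature_counts(policy, T=30):
    """Discounted feature counts of a deterministic rollout from state 0."""
    s, mu = 0, np.zeros(N)
    for t in range(T):
        mu += GAMMA ** t * PHI[s]
        s = LEFT[s] if policy[s] == 0 else RIGHT[s]
    return mu

expert = np.ones(N, dtype=int)               # demonstrator always moves right
theta = np.zeros(N)
for _ in range(50):                          # crude (sub)gradient search on theta
    pi = optimal_policy(theta)
    if (pi == expert).all():
        break
    theta += 0.1 * (feature_counts(expert) - feature_counts(pi))
```

After the loop, the optimal policy under the learned reward matches the demonstrated one; the redundancy the abstract mentions is visible here, since many `theta` vectors induce the same greedy policy.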
Navigate like a cabbie: Probabilistic reasoning from observed context-aware behavior
 In: Proceedings of the 10th International Conference on Ubiquitous Computing, 2008
Abstract

Cited by 55 (6 self)
We present PROCAB, an efficient method for Probabilistically Reasoning from Observed Context-Aware Behavior. It models the context-dependent utilities and underlying reasons why people take different actions. The model generalizes to unseen situations and scales to incorporate rich contextual information. We train our model using the route preferences of 25 taxi drivers demonstrated in over 100,000 miles of collected data, and demonstrate its performance by inferring: (1) the decision at the next intersection, (2) the route to a known destination, and (3) the destination given a partially traveled route. Author Keywords: decision modeling, vehicle navigation, route prediction.
Pairwise Preference Learning and Ranking
 Proceedings of the 14th European Conference on Machine Learning, 2003
Abstract

Cited by 52 (11 self)
We consider supervised learning of a ranking function, which is a mapping from instances to total orders over a set of labels (options). The training information consists of examples with partial (and possibly inconsistent) information about their associated rankings. From these, we induce a ranking function by reducing the original problem to a number of binary classification problems, one for each pair of labels. The main objective of this work is to investigate the tradeoff between the quality of the induced ranking function and the computational complexity of the algorithm, both depending on the amount of preference information given for each example. To this end, we present theoretical results on the complexity of pairwise preference learning.
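The pairwise reduction this abstract describes — one binary problem per label pair, with a ranking recovered by counting wins — can be sketched as follows. The tiny dataset and the perceptron base learner are made up for illustration; the approach itself is agnostic to the choice of binary classifier.

```python
import numpy as np
from itertools import combinations

LABELS = ["a", "b", "c"]

def train_pairwise(X, prefs, epochs=50):
    """Train one linear perceptron per unordered label pair.
    prefs[i] is a set of (preferred, other) pairs for example X[i];
    partial preference information is fine — absent pairs are skipped."""
    models = {}
    for li, lj in combinations(LABELS, 2):
        w = np.zeros(X.shape[1])
        for _ in range(epochs):
            for x, p in zip(X, prefs):
                if (li, lj) in p and w @ x <= 0:    # li preferred: want w.x > 0
                    w += x
                elif (lj, li) in p and w @ x >= 0:  # lj preferred: want w.x < 0
                    w -= x
        models[(li, lj)] = w
    return models

def rank(models, x):
    """Order labels by pairwise wins (ties broken alphabetically)."""
    votes = {l: 0 for l in LABELS}
    for (li, lj), w in models.items():
        votes[li if w @ x > 0 else lj] += 1
    return sorted(LABELS, key=lambda l: -votes[l])

# Two toy examples, each with a full preference set over three labels.
X = np.array([[1.0, 0.0], [-1.0, 0.0]])
prefs = [{("a", "b"), ("a", "c"), ("b", "c")},   # x1: a > b > c
         {("c", "b"), ("c", "a"), ("b", "a")}]   # x2: c > b > a
models = train_pairwise(X, prefs)
```

The tradeoff the abstract studies shows up directly: with k labels, k(k−1)/2 classifiers are trained, but each sees only the examples for which that pair's preference is given.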
Bayesian models of human action understanding
 Advances in Neural Information Processing Systems 18, 2006
Abstract

Cited by 51 (6 self)
We present a Bayesian framework for explaining how people reason about and predict the actions of an intentional agent, based on observing its behavior. Action understanding is cast as a problem of inverting a probabilistic generative model, which assumes that agents tend to act rationally in order to achieve their goals given the constraints of their environment. Working in a simple sprite-world domain, we show how this model can be used to infer the goal of an agent and predict how the agent will act in novel situations or when environmental constraints change. The model provides a qualitative account of several kinds of inferences that preverbal infants have been shown to perform, and also fits quantitative predictions that adult observers make in a new experiment.
Active Learning for Reward Estimation in Inverse Reinforcement Learning, 2009
Abstract

Cited by 42 (14 self)
Inverse reinforcement learning addresses the general problem of recovering a reward function from samples of a policy provided by an expert/demonstrator. In this paper, we introduce active learning for inverse reinforcement learning. We propose an algorithm that allows the agent to query the demonstrator for samples at specific states, instead of relying only on samples provided at “arbitrary” states. The purpose of our algorithm is to estimate the reward function with similar accuracy as other methods from the literature while reducing the amount of policy samples required from the expert. We also discuss the use of our algorithm in higher dimensional problems, using both Monte Carlo and gradient methods. We present illustrative results of our algorithm in several simulated examples of different complexities.
Dynamic imitation in a humanoid robot through nonparametric probabilistic inference
 In Proceedings of Robotics: Science and Systems (RSS ’06), 2006
Abstract

Cited by 41 (5 self)
Abstract — We tackle the problem of learning imitative whole-body motions in a humanoid robot using probabilistic inference in Bayesian networks. Our inference-based approach affords a straightforward method to exploit rich yet uncertain prior information obtained from human motion capture data. Dynamic imitation implies that the robot must interact with its environment and account for forces such as gravity and inertia during imitation. Rather than explicitly modeling these forces and the body of the humanoid as in traditional approaches, we show that stable imitative motion can be achieved by learning a sensor-based representation of dynamic balance. Bayesian networks provide a sound theoretical framework for combining prior kinematic information (from observing a human demonstrator) with prior dynamic information (based on previous experience) to model and subsequently infer motions which, with high probability, will be dynamically stable. By posing the problem as one of inference in a Bayesian network, we show that methods developed for approximate inference can be leveraged to efficiently perform inference of actions. Additionally, by using nonparametric inference and a nonparametric (Gaussian process) forward model, our approach does not make any strong assumptions about the physical environment or the mass and inertial properties of the humanoid robot. We propose an iterative, probabilistically constrained algorithm for exploring the space of motor commands and show that the algorithm can quickly discover dynamically stable actions for whole-body imitation of human motion. Experimental results based on simulation and subsequent execution by a HOAP-2 humanoid robot demonstrate that our algorithm is able to imitate a human performing actions such as squatting and a one-legged balance.
Reinforcement Learning with Limited Reinforcement: Using Bayes Risk for Active Learning in POMDPs. ISAIM (online proceedings), 2008
Abstract

Cited by 37 (8 self)
Partially Observable Markov Decision Processes (POMDPs) have succeeded in planning domains that require balancing actions that increase an agent’s knowledge and actions that increase an agent’s reward. Unfortunately, most POMDPs are defined with a large number of parameters which are difficult to specify from domain knowledge alone. In this paper, we present an approximation approach that allows us to treat the POMDP model parameters as additional hidden state in a “model-uncertainty” POMDP. Coupled with model-directed queries, our planner actively learns good policies. We demonstrate our approach on several POMDP problems.
Relative Entropy Inverse Reinforcement Learning
Abstract

Cited by 27 (3 self)
We consider the problem of imitation learning where the examples, demonstrated by an expert, cover only a small part of a large state space. Inverse Reinforcement Learning (IRL) provides an efficient tool for generalizing the demonstration, based on the assumption that the expert is acting optimally in a Markov Decision Process (MDP). Most of the past work on IRL requires that a (near-)optimal policy can be computed for different reward functions. However, this requirement can hardly be satisfied in systems with a large, or continuous, state space. In this paper, we propose a model-free IRL algorithm, where the relative entropy between the empirical distribution of the state-action trajectories under a baseline policy and their distribution under the learned policy is minimized by stochastic gradient descent. We compare this new approach to well-known IRL algorithms using learned MDP models. Empirical results on simulated car racing, gridworld and ball-in-a-cup problems show that our approach is able to learn good policies from a small number of demonstrations.
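The model-free gradient this abstract alludes to can be sketched, in spirit, as exponential tilting of baseline trajectories: reweight trajectories sampled under a baseline policy by exp(θ·f(τ)) and ascend the gap between the expert's feature expectation and the reweighted one. The random "trajectory features," expert mean, and step size below are invented for the sketch and have nothing to do with the paper's benchmarks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins: f(tau) for 500 trajectories sampled from a baseline policy,
# and the empirical feature mean of the expert's demonstrations.
baseline_feats = rng.normal(size=(500, 3))
expert_feats = np.array([1.0, 0.0, -1.0])

theta = np.zeros(3)
for _ in range(300):
    w = np.exp(baseline_feats @ theta)
    w /= w.sum()                              # self-normalized importance weights
    grad = expert_feats - w @ baseline_feats  # ascent direction on the dual
    theta += 0.1 * grad

reweighted_mean = w @ baseline_feats          # approaches expert_feats
```

The key property matching the abstract is that only samples from the baseline policy are needed: no MDP model is built and no optimal policy is ever computed for a candidate reward.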
Apprenticeship learning for helicopter control
 Communications of the ACM
Abstract

Cited by 26 (0 self)
doi:10.1145/1538788.1538812

Autonomous helicopter flight is widely regarded to be a highly challenging control problem. As helicopters are highly unstable and exhibit complicated dynamical behavior, it is particularly difficult to design controllers that achieve high performance over a broad flight regime. While these aircraft are notoriously difficult to control, there are expert human pilots who are nonetheless capable of demonstrating a wide variety of maneuvers, including aerobatic maneuvers at the edge of the helicopter’s performance envelope. In this paper, we present algorithms for modeling and control that leverage these demonstrations to build high-performance control systems for autonomous helicopters. More specifically, we detail our experiences with the Stanford Autonomous Helicopter, which is now capable of extreme aerobatic flight, meeting or exceeding the performance of our own expert pilot.
Learning to navigate through crowded environments
 In ICRA, 2010
Abstract

Cited by 26 (0 self)
Abstract — The goal of this research is to enable mobile robots to navigate through crowded environments such as indoor shopping malls, airports, or downtown sidewalks. The key research question addressed in this paper is how to learn planners that generate human-like motion behavior. Our approach uses inverse reinforcement learning (IRL) to learn human-like navigation behavior based on example paths. Since robots have only limited sensing, we extend existing IRL methods to the case of partially observable environments. We demonstrate the capabilities of our approach using a realistic crowd flow simulator in which we modeled multiple scenarios in crowded environments. We show that our planner learned to guide the robot along the flow of people when the environment is crowded, and along the shortest path if no people are around.