Results 1-10 of 16
Active Learning for Reward Estimation in Inverse Reinforcement Learning, 2009
Cited by 9 (3 self)
Inverse reinforcement learning addresses the general problem of recovering a reward function from samples of a policy provided by an expert/demonstrator. In this paper, we introduce active learning for inverse reinforcement learning. We propose an algorithm that allows the agent to query the demonstrator for samples at specific states, instead of relying only on samples provided at “arbitrary” states. The purpose of our algorithm is to estimate the reward function with accuracy similar to that of other methods from the literature while reducing the number of policy samples required from the expert. We also discuss the use of our algorithm in higher-dimensional problems, using both Monte Carlo and gradient methods. We present illustrative results of our algorithm in several simulated examples of different complexity.
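A minimal sketch of the query step, assuming the agent already holds K sampled policies (for example, policies computed from reward samples drawn from a posterior) and scores each state by the entropy of the sample-averaged action distribution. This scoring rule and the names `policy_entropy_per_state` and `select_query_state` are hypothetical illustrations, not the paper's exact criterion.

```python
import numpy as np


def policy_entropy_per_state(policy_samples: np.ndarray) -> np.ndarray:
    """policy_samples has shape (K, S, A): K sampled stochastic policies over S states, A actions.

    Returns the Shannon entropy of the sample-averaged action distribution at each
    state (a hypothetical uncertainty score used here for illustration)."""
    mean_policy = policy_samples.mean(axis=0)          # (S, A)
    safe = np.clip(mean_policy, 1e-12, 1.0)            # avoid log(0)
    return -(mean_policy * np.log(safe)).sum(axis=1)   # (S,)


def select_query_state(policy_samples: np.ndarray, already_queried=()) -> int:
    """Pick the not-yet-queried state where the sampled policies are most uncertain."""
    scores = policy_entropy_per_state(policy_samples)
    for s in already_queried:
        scores[s] = -np.inf                            # never ask about the same state twice
    return int(np.argmax(scores))
```

In an active loop, the demonstrator's answer at the chosen state would be added to the data and the policy samples regenerated before the next query.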
Abstraction Levels for Robotic Imitation: Overview and Computational Approaches, 2010
Cited by 5 (2 self)
This chapter reviews several approaches to the problem of learning by imitation in robotics. We start by describing several cognitive processes identified in the literature as necessary for imitation. We then survey different approaches to this problem, placing particular emphasis on methods whereby an agent first learns about its own body dynamics through self-exploration and then uses this knowledge to recognize the actions being performed by other agents. This general approach is related to the motor theory of perception, and in particular to the mirror neurons found in primates. We distinguish three fundamental classes of methods, corresponding to three abstraction levels at which imitation can be addressed. As such, the methods surveyed herein exhibit behaviors that range from raw sensory-motor trajectory matching to high-level abstract task replication. We also discuss the impact that knowledge about the world and/or the demonstrator can have on the particular behaviors exhibited.
Transferring Impedance Control Strategies Between Heterogeneous Systems via Apprenticeship Learning
Cited by 4 (3 self)
We present a novel method for designing controllers for robots with variable impedance actuators. We take an imitation learning approach, whereby we learn impedance modulation strategies from observations of behaviour (for example, that of humans) and transfer these to a robotic plant with very different actuators and dynamics. In contrast to previous approaches, where impedance characteristics are directly imitated, our method uses task performance as the metric of imitation, ensuring that the learnt controllers are directly optimised for the hardware of the imitator. As a key ingredient, we use apprenticeship learning to model the optimisation criteria underlying observed behaviour, in order to frame a corresponding optimal control problem for the imitator. We then apply local optimal feedback control techniques to find an appropriate impedance modulation strategy under the imitator’s dynamics. We test our approach on systems of varying complexity, including a novel antagonistic series elastic actuator and a biologically realistic two-joint, six-muscle model of the human arm.
Inverse Optimal Control with Linearly-Solvable MDPs
Cited by 4 (0 self)
We present new algorithms for inverse optimal control (or inverse reinforcement learning, IRL) within the framework of linearly-solvable MDPs (LMDPs). Unlike most prior IRL algorithms, which recover only the control policy of the expert, we recover the policy, the value function and the cost function. This is possible because here the cost and value functions are uniquely defined given the policy. Despite these special properties, we can handle a wide variety of problems, such as the grid worlds popular in RL and most of the nonlinear problems arising in robotics and control engineering. Direct comparisons to ...
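A toy numerical check of why the policy pins down the value and cost functions, assuming the first-exit LMDP relations at non-terminal states: the optimal controlled transitions are u*(s'|s) ∝ p(s'|s) exp(-v(s')), and the state cost satisfies q(s) = v(s) + log Σ_{s'} p(s'|s) exp(-v(s')). The chain dynamics and helper names are hypothetical, and this is not the paper's estimation algorithm, which works from sampled transitions. In this example v is recovered only up to an additive constant, which cancels in q.

```python
import numpy as np


def controlled_dynamics(P: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Optimal LMDP transitions: u*(s'|s) proportional to p(s'|s) * exp(-v(s'))."""
    w = P * np.exp(-v)[None, :]
    return w / w.sum(axis=1, keepdims=True)


def recover_value_on_chain(P: np.ndarray, U: np.ndarray) -> np.ndarray:
    """Recover v up to an additive constant from the policy U and passive dynamics P.

    Assumes (hypothetically) a chain where state i can stay at i or move to i + 1
    under P. Since log(U(s'|s) / P(s'|s)) = -v(s') - log Z(s), comparing the two
    successors of state i cancels the normalizer Z(i)."""
    n = P.shape[0]
    v = np.zeros(n)                      # v[0] = 0 fixes the arbitrary constant
    for i in range(n - 1):
        stay = np.log(U[i, i] / P[i, i])
        move = np.log(U[i, i + 1] / P[i, i + 1])
        v[i + 1] = v[i] + stay - move    # stay - move = v(i+1) - v(i)
    return v


def state_cost(P: np.ndarray, v: np.ndarray) -> np.ndarray:
    """q(s) = v(s) + log sum_s' p(s'|s) exp(-v(s')); unchanged if a constant is added to v."""
    return v + np.log(P @ np.exp(-v))


# Lazy random walk on a 4-state chain; the last state is absorbing.
P = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.0, 0.5, 0.5, 0.0],
              [0.0, 0.0, 0.5, 0.5],
              [0.0, 0.0, 0.0, 1.0]])
v_true = np.array([3.0, 1.5, 2.0, 0.0])
U = controlled_dynamics(P, v_true)
v_hat = recover_value_on_chain(P, U)
```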
Bootstrapping Apprenticeship Learning
Cited by 2 (1 self)
We consider the problem of apprenticeship learning where the examples, demonstrated by an expert, cover only a small part of a large state space. Inverse Reinforcement Learning (IRL) provides an efficient tool for generalizing the demonstration, based on the assumption that the expert is maximizing a utility function that is a linear combination of state-action features. Most IRL algorithms use a simple Monte Carlo estimation to approximate the expected feature counts under the expert’s policy. In this paper, we show that the quality of the learned policies is highly sensitive to the error in estimating the feature counts. To reduce this error, we introduce a novel approach for bootstrapping the demonstration by assuming that (i) the expert is (near-)optimal, and (ii) the dynamics of the system are known. Empirical results on gridworlds and car racing problems show that our approach is able to learn good policies from a small number of demonstrations.
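The plain Monte Carlo estimate of the expert's discounted feature counts mentioned in the abstract can be sketched as an average of Σ_t γ^t φ(s_t, a_t) over the demonstrated trajectories. The feature map `phi` and the trajectory format (a list of (state, action) pairs) are assumptions for illustration; the bootstrapping method itself, which additionally exploits known dynamics and expert near-optimality, is not shown.

```python
import numpy as np


def empirical_feature_expectations(trajectories, phi, gamma):
    """Monte Carlo estimate mu_hat = (1/m) * sum_i sum_t gamma**t * phi(s_t, a_t).

    trajectories: list of m non-empty lists of (state, action) pairs.
    phi: feature map returning a 1-D array for a (state, action) pair (hypothetical)."""
    per_trajectory = []
    for traj in trajectories:
        mu = sum(gamma ** t * np.asarray(phi(s, a), dtype=float)
                 for t, (s, a) in enumerate(traj))
        per_trajectory.append(mu)
    return np.mean(per_trajectory, axis=0)
```

With few demonstrations this average has high variance, and the abstract reports that learned policies are highly sensitive to exactly this estimation error.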
A Reduction from Apprenticeship Learning to Classification
Cited by 2 (0 self)
We provide new theoretical results for apprenticeship learning, a variant of reinforcement learning in which the true reward function is unknown, and the goal is to perform well relative to an observed expert. We study a common approach to learning from expert demonstrations: using a classification algorithm to learn to imitate the expert’s behavior. Although this straightforward learning strategy is widely used in practice, it has been subject to very little formal analysis. We prove that, if the learned classifier has error rate ε, the difference between the value of the apprentice’s policy and the expert’s policy is O(√ε). Further, we prove that this difference is only O(ε) when the expert’s policy is close to optimal. This latter result has an important practical consequence: not only does imitating a near-optimal expert result in a better policy, but far fewer demonstrations are required to successfully imitate such an expert. This suggests an opportunity for substantial savings whenever the expert is known to be good but demonstrations are expensive or difficult to obtain.
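A minimal sketch of the classification approach the abstract analyzes, assuming a discrete state space: a tabular majority-vote classifier is fit to expert (state, action) pairs, and its error rate ε is measured on expert pairs. The helper names are hypothetical, and any classifier could take the place of the majority vote.

```python
from collections import Counter, defaultdict


def fit_majority_classifier(demos):
    """demos: iterable of (state, action) pairs from the expert.
    Returns a dict mapping each seen state to its most frequently demonstrated action."""
    counts = defaultdict(Counter)
    for state, action in demos:
        counts[state][action] += 1
    return {state: c.most_common(1)[0][0] for state, c in counts.items()}


def error_rate(policy, demos):
    """Fraction of expert pairs where the learned policy disagrees; unseen states count as errors."""
    demos = list(demos)
    mistakes = sum(1 for state, action in demos if policy.get(state) != action)
    return mistakes / len(demos)
```

For scale, with ε = 0.01 the general bound grows like √ε = 0.1, whereas the near-optimal-expert bound grows like ε = 0.01, up to the constants hidden in the O(·) notation.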
Relative Entropy Inverse Reinforcement Learning
Cited by 2 (0 self)
We consider the problem of imitation learning where the examples, demonstrated by an expert, cover only a small part of a large state space. Inverse Reinforcement Learning (IRL) provides an efficient tool for generalizing the demonstration, based on the assumption that the expert is acting optimally in a Markov Decision Process (MDP). Most past work on IRL requires that a (near-)optimal policy can be computed for different reward functions. However, this requirement can hardly be satisfied in systems with a large, or continuous, state space. In this paper, we propose a model-free IRL algorithm, where the relative entropy between the empirical distribution of the state-action trajectories under a baseline policy and their distribution under the learned policy is minimized by stochastic gradient descent. We compare this new approach to well-known IRL algorithms using learned MDP models. Empirical results on simulated car racing, gridworld and ball-in-a-cup problems show that our approach is able to learn good policies from a small number of demonstrations.
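The core update can be sketched under simplifying assumptions: trajectories are sampled from the baseline policy, their baseline probabilities are assumed to cancel in the importance weights (as with a uniform baseline), so each sampled trajectory i gets weight w_i ∝ exp(θ·f_i), and θ is fit by minimizing log Σ_i exp(θ·f_i) - θ·f_E, whose gradient is Σ_i w_i f_i - f_E. This sketch omits any regularization and uses full-batch gradient descent instead of stochastic updates; the function names and toy data are hypothetical.

```python
import numpy as np


def importance_weights(theta, baseline_features):
    """w_i proportional to exp(theta . f_i), computed with a stabilized softmax."""
    logits = baseline_features @ theta
    w = np.exp(logits - logits.max())
    return w / w.sum()


def relent_irl(expert_features, baseline_features, lr=1.0, iters=500):
    """Minimize log sum_i exp(theta . f_i) - theta . f_E by gradient descent.

    expert_features: (k,) empirical expert feature counts f_E.
    baseline_features: (N, k) feature counts f_i of trajectories from the baseline policy."""
    theta = np.zeros(baseline_features.shape[1])
    for _ in range(iters):
        w = importance_weights(theta, baseline_features)
        grad = w @ baseline_features - expert_features   # reweighted baseline counts minus expert counts
        theta -= lr * grad
    return theta, importance_weights(theta, baseline_features)
```

At the minimum, the reweighted baseline feature counts match the expert's, so the learned weights describe the distribution closest to the baseline (in relative entropy) that reproduces the demonstrated feature counts.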
Apprenticeship Learning via Soft Local Homomorphisms
Cited by 1 (1 self)
We consider the problem of apprenticeship learning when the expert’s demonstration covers only a small part of a large state space. Inverse Reinforcement Learning (IRL) provides an efficient solution to this problem based on the assumption that the expert is acting optimally in a Markov Decision Process (MDP). However, past work on IRL requires an accurate estimate of the frequency of encountering each feature of the states when the robot follows the expert’s policy. Given that the complete policy of the expert is unknown, the feature frequencies can only be estimated empirically from the demonstrated trajectories. In this paper, we propose to use a transfer method, known as soft homomorphism, in order to generalize the expert’s policy to unvisited regions of the state space. The generalized policy can be used either as the robot’s final policy or to calculate the feature frequencies within an IRL algorithm. Empirical results show that our approach is able to learn good policies from a small number of demonstrations.
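To illustrate generalizing a demonstrated policy to unvisited states, the following sketch swaps in a plain Gaussian-kernel similarity in place of the paper's soft local homomorphism: π(a|s) ∝ Σ_j k(s, s_j) π_E(a|s_j) over demonstrated states s_j. The kernel, the `bandwidth` parameter and the function name are assumptions; the resulting policy could then serve as a final policy or be used to compute feature frequencies, as the abstract describes.

```python
import numpy as np


def generalize_policy(query_states, demo_states, demo_action_probs, bandwidth=1.0):
    """Kernel-weighted transfer of expert action distributions to new states.

    pi(a|s) is proportional to sum_j k(s, s_j) * pi_E(a|s_j), with the (hypothetical)
    Gaussian kernel k(s, s') = exp(-||s - s'||^2 / (2 * bandwidth^2)).
    query_states: (Q, d); demo_states: (D, d); demo_action_probs: (D, A)."""
    q = np.atleast_2d(np.asarray(query_states, dtype=float))
    d = np.atleast_2d(np.asarray(demo_states, dtype=float))
    sq_dist = ((q[:, None, :] - d[None, :, :]) ** 2).sum(axis=-1)  # (Q, D)
    k = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
    pi = k @ np.asarray(demo_action_probs, dtype=float)             # (Q, A)
    return pi / pi.sum(axis=1, keepdims=True)
```

If every kernel value underflows to zero (a query far from all demonstrations with a small `bandwidth`), the normalization divides by zero, so a practical version would need a fallback such as a uniform action distribution.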
Learning from Demonstration Using MDP Induced Metrics
Cited by 1 (0 self)
In this paper we address the problem of learning a policy from demonstration. Assuming that the policy to be learned is the optimal policy for an underlying MDP, we propose a novel way of leveraging the underlying MDP structure in a kernel-based approach. Our proposed approach rests on the insight that the MDP structure can be encapsulated into an adequate state-space metric. In particular, we show that, using MDP metrics, we are able to cast the problem of learning from demonstration as a classification problem and attain generalization performance similar to that of methods based on inverse reinforcement learning at a much lower online computational cost. Our method also attains better generalization than other supervised learning methods that fail to consider the MDP structure.
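The intuition that MDP structure can be folded into a state-space metric can be illustrated with a nearest-neighbor classifier on a small grid with a wall. Here the shortest-path distance on the grid's transition graph serves as a simple, hypothetical stand-in for the MDP metrics used in the paper, and 1-nearest-neighbor stands in for the kernel-based classifier; the grid layout and names are illustrative only.

```python
import math
from collections import deque


def bfs_distances(free_cells, start):
    """Shortest-path step counts from start on a 4-connected grid of free (row, col) cells."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if nb in free_cells and nb not in dist:
                dist[nb] = dist[(r, c)] + 1
                queue.append(nb)
    return dist


def nearest_neighbor_action(query, demos, metric):
    """demos maps demonstrated states to expert actions; return the action of the closest one."""
    closest = min(demos, key=lambda s: metric(query, s))
    return demos[closest]


def euclidean(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])


# 4x3 grid; column 1 is blocked in rows 0-2, so the only passage is along row 3.
FREE = {(r, c) for r in range(4) for c in range(3)} - {(0, 1), (1, 1), (2, 1)}


def graph_metric(a, b):
    """Step distance through the grid's transitions; unreachable cells are infinitely far."""
    return bfs_distances(FREE, a).get(b, math.inf)
```

With demonstrations at (0, 2) and (3, 0), the Euclidean metric makes the query (0, 0) copy the action shown at (0, 2), just across the wall, while the graph metric picks (3, 0), which is 3 steps away instead of 8.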
H U N I V E
Acquiring new skills is an ability that would be very useful for a robot to possess. One conceivable way for a robot to learn new skills is for a teacher with expert knowledge of those skills to demonstrate them to the learning robot in a variety of different settings. The robot should not merely learn to mimic the teacher exactly, as the teacher would be unable to demonstrate the skill in every possible state of the world. Instead, the robot should be equipped with some robust method for adapting the learned skill to new situations. The research presented in this document takes a step towards solving this problem. Drawing on previous work from the fields of apprenticeship learning and reinforcement learning, an algorithm is presented for teaching an agent a new skill. This algorithm consists of two phases: the first, in which the robot learns primitive skills from an expert, and the second, a mechanism for composing these skills into strategies for successfully using the skill in unseen situations. This two-phase solution is inspired by curriculum learning and allows the agent to construct structured strategies in a hierarchical framework.

