Results 1-10 of 203
Autonomous Helicopter Control using Reinforcement Learning Policy Search Methods
In International Conference on Robotics and Automation, 2001
Cited by 89 (1 self)
Many control problems in the robotics field can be cast as Partially Observed Markovian Decision Problems (POMDPs), an optimal control formalism. Finding optimal solutions to such problems in general, however, is known to be intractable. It has often been observed that in practice, simple structured controllers suffice for good suboptimal control, and recent research in the artificial intelligence community has focused on policy search methods as techniques for finding suboptimal controllers when such structured controllers do exist. Traditional model-based reinforcement learning algorithms make a certainty equivalence assumption on their learned models and calculate optimal policies for a maximum-likelihood Markovian model. In this work, we consider algorithms that evaluate and synthesize controllers under distributions of Markovian models. Previous work has demonstrated that algorithms that maximize mean reward with respect to model uncertainty lead to safer and more robust controll...
An application of reinforcement learning to aerobatic helicopter flight
In Advances in Neural Information Processing Systems 19, 2007
Cited by 73 (8 self)
Autonomous helicopter flight is widely regarded to be a highly challenging control problem. This paper presents the first successful autonomous completion on a real RC helicopter of the following four aerobatic maneuvers: forward flip and sideways roll at low speed, tail-in funnel, and nose-in funnel. Our experimental results significantly extend the state of the art in autonomous helicopter flight. We used the following approach: First we had a pilot fly the helicopter to help us find a helicopter dynamics model and a reward (cost) function. Then we used a reinforcement learning (optimal control) algorithm to find a controller that is optimized for the resulting model and reward function. More specifically, we used differential dynamic programming (DDP), an extension of the linear quadratic regulator (LQR).
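The abstract above names LQR as the method that DDP extends. As a rough illustration of that building block only, here is a minimal sketch of finite-horizon LQR for a scalar linear system x' = a*x + b*u with stage cost q*x^2 + r*u^2. It is not the authors' helicopter controller; the function names and all numeric values are hypothetical and chosen just to make the example runnable.

```python
def scalar_lqr_gains(a, b, q, r, q_final, horizon):
    """Backward Riccati recursion for a scalar system (illustrative sketch).

    Returns feedback gains k_0 .. k_{horizon-1}; the control is u_t = -k_t * x_t.
    """
    p = q_final  # cost-to-go coefficient at the final time step
    gains = []
    for _ in range(horizon):
        k = (b * p * a) / (r + b * p * b)      # gain for this step
        p = q + a * p * a - a * p * b * k      # Riccati update of the cost-to-go
        gains.append(k)
    gains.reverse()  # computed backward in time, so flip to time order
    return gains


def rollout(a, b, gains, x0):
    """Simulate the closed loop x_{t+1} = a*x_t + b*u_t with u_t = -k_t*x_t."""
    x = x0
    states = [x]
    for k in gains:
        u = -k * x
        x = a * x + b * u
        states.append(x)
    return states


if __name__ == "__main__":
    # Hypothetical unstable open-loop system (|a| > 1).
    gains = scalar_lqr_gains(a=1.1, b=0.5, q=1.0, r=0.1, q_final=1.0, horizon=20)
    states = rollout(a=1.1, b=0.5, gains=gains, x0=1.0)
    print("final state:", states[-1])
```

DDP applies this kind of backward pass repeatedly to local linear-quadratic approximations of a nonlinear model along a trajectory; the sketch covers only the linear case.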
Using relative novelty to identify useful temporal abstractions in reinforcement learning
In Proceedings of the Twenty-First International Conference on Machine Learning, 2004
Cited by 68 (12 self)
We present a new method for automatically creating useful temporal abstractions in reinforcement learning. We argue that states that allow the agent to transition to a different region of the state space are useful subgoals, and propose a method for identifying them using the concept of relative novelty. When such a state is identified, a temporally-extended activity (e.g., an option) is generated that takes the agent efficiently to this state. We illustrate the utility of the method in a number of tasks.
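To make the idea of relative novelty concrete, the following is a simplified sketch rather than the paper's exact procedure. It assumes that a state's novelty at a visit is 1/sqrt(visit count so far), and that the relative novelty of a visit is the mean novelty of the next `lag` visits (including the current one) divided by the mean novelty of the preceding `lag` visits. The function name, the `lag` parameter, and the toy trajectory are hypothetical.

```python
from collections import Counter
from math import sqrt


def relative_novelty_scores(trajectory, lag):
    """Score each visit by (novelty of what follows) / (novelty of what preceded).

    Assumption for this sketch: novelty of a visit is 1 / sqrt(n), where n is
    the number of times that state has been visited up to and including now.
    Returns a list of (index, state, score) for visits with a full window.
    """
    counts = Counter()
    novelty = []
    for state in trajectory:
        counts[state] += 1
        novelty.append(1.0 / sqrt(counts[state]))

    scores = []
    for t in range(lag, len(trajectory) - lag + 1):
        following = novelty[t:t + lag]   # includes the current visit
        preceding = novelty[t - lag:t]
        score = (sum(following) / lag) / (sum(preceding) / lag)
        scores.append((t, trajectory[t], score))
    return scores


if __name__ == "__main__":
    # Toy trajectory: the agent loops between 'a' and 'b', then passes
    # through 'd' into previously unseen states 'x', 'y', 'z'.
    trajectory = ["a", "b", "a", "b", "a", "b", "d", "x", "y", "z"]
    for index, state, score in relative_novelty_scores(trajectory, lag=3):
        print(index, state, round(score, 3))
```

In this toy run the visit to `d` receives the highest score, since it separates a well-visited region from an unvisited one; this matches the abstract's description of subgoals as states that lead to a different region of the state space.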
Exploration and apprenticeship learning in reinforcement learning
In Proc. 21st International Conference on Machine Learning, 2005
Cited by 67 (2 self)
We consider reinforcement learning in systems with unknown dynamics. Algorithms such as E³ (Kearns and Singh, 2002) learn near-optimal policies by using “exploration policies” to drive the system towards poorly modeled states, so as to encourage exploration. But this makes these algorithms impractical for many systems; for example, on an autonomous helicopter, overly aggressive exploration may well result in a crash. In this paper, we consider the apprenticeship learning setting in which a teacher demonstration of the task is available. We show that, given the initial demonstration, no explicit exploration is necessary, and we can attain near-optimal performance (compared to the teacher) simply by repeatedly executing “exploitation policies” that try to maximize rewards. In finite-state MDPs, our algorithm scales polynomially in the number of states; in continuous-state linear dynamical systems, it scales polynomially in the dimension of the state. These results are proved using a martingale construction over relative losses.
A theoretical analysis of model-based interval estimation
Proceedings of the Twenty-Second International Conference on Machine Learning (ICML-05), 2005
Cited by 62 (9 self)
Several algorithms for learning near-optimal policies in Markov Decision Processes have been analyzed and proven efficient. Empirical results have suggested that Model-based Interval Estimation (MBIE) learns efficiently in practice, effectively balancing exploration and exploitation. This paper presents the first theoretical analysis of MBIE, proving its efficiency even under worst-case conditions. The paper also introduces a new performance metric, average loss, and relates it to its less “online” cousins from the literature.
Accelerating Reinforcement Learning through Implicit Imitation
Journal of Artificial Intelligence Research, 2003
Cited by 51 (0 self)
Imitation can be viewed as a means of enhancing learning in multiagent environments. It augments
Transfer Learning for Reinforcement Learning Domains: A Survey
Cited by 48 (7 self)
The reinforcement learning paradigm is a popular way to address problems that have only limited environmental feedback, rather than correctly labeled examples, as is common in other machine learning contexts. While significant progress has been made to improve learning in a single task, the idea of transfer learning has only recently been applied to reinforcement learning tasks. The core idea of transfer is that experience gained in learning to perform one task can help improve learning performance in a related, but different, task. In this article we present a framework that classifies transfer learning methods in terms of their capabilities and goals, and then use it to survey the existing literature, as well as to suggest future directions for transfer learning work.
Efficient structure learning in factored-state MDPs
2007
Cited by 47 (10 self)
We consider the problem of reinforcement learning in factored-state MDPs in the setting in which learning is conducted in one long trial with no resets allowed. We show how to extend existing efficient algorithms that learn the conditional probability tables of dynamic Bayesian networks (DBNs) given their structure to the case in which DBN structure is not known in advance. Our method learns the DBN structures as part of the reinforcement-learning process and provably provides an efficient learning algorithm when combined with factored Rmax.