Results 1 - 10 of 143
Autonomous Helicopter Control using Reinforcement Learning Policy Search Methods
- In International Conference on Robotics and Automation, 2001
- Cited by 71 (0 self)
Many control problems in the robotics field can be cast as Partially Observed Markovian Decision Problems (POMDPs), an optimal control formalism. Finding optimal solutions to such problems in general, however, is known to be intractable. It has often been observed that in practice, simple structured controllers suffice for good sub-optimal control, and recent research in the artificial intelligence community has focused on policy search methods as techniques for finding sub-optimal controllers when such structured controllers do exist. Traditional model-based reinforcement learning algorithms make a certainty equivalence assumption on their learned models and calculate optimal policies for a maximum-likelihood Markovian model. In this work, we consider algorithms that evaluate and synthesize controllers under distributions of Markovian models. Previous work has demonstrated that algorithms that maximize mean reward with respect to model uncertainty lead to safer and more robust controll...
An application of reinforcement learning to aerobatic helicopter flight
- In Advances in Neural Information Processing Systems 19, 2007
- Cited by 54 (8 self)
Autonomous helicopter flight is widely regarded to be a highly challenging control problem. This paper presents the first successful autonomous completion on a real RC helicopter of the following four aerobatic maneuvers: forward flip and sideways roll at low speed, tail-in funnel, and nose-in funnel. Our experimental results significantly extend the state of the art in autonomous helicopter flight. We used the following approach: First, we had a pilot fly the helicopter to help us find a helicopter dynamics model and a reward (cost) function. Then we used a reinforcement learning (optimal control) algorithm to find a controller that is optimized for the resulting model and reward function. More specifically, we used differential dynamic programming (DDP), an extension of the linear quadratic regulator (LQR).
Exploration and apprenticeship learning in reinforcement learning
- In Proc. 21st International Conference on Machine Learning, 2005
- Cited by 52 (2 self)
We consider reinforcement learning in systems with unknown dynamics. Algorithms such as E³ (Kearns and Singh, 2002) learn near-optimal policies by using “exploration policies” to drive the system towards poorly modeled states, so as to encourage exploration. But this makes these algorithms impractical for many systems; for example, on an autonomous helicopter, overly aggressive exploration may well result in a crash. In this paper, we consider the apprenticeship learning setting in which a teacher demonstration of the task is available. We show that, given the initial demonstration, no explicit exploration is necessary, and we can attain near-optimal performance (compared to the teacher) simply by repeatedly executing “exploitation policies” that try to maximize rewards. In finite-state MDPs, our algorithm scales polynomially in the number of states; in continuous-state linear dynamical systems, it scales polynomially in the dimension of the state. These results are proved using a martingale construction over relative losses.
Using relative novelty to identify useful temporal abstractions in reinforcement learning
- In Proceedings of the Twenty-First International Conference on Machine Learning, 2004
- Cited by 51 (11 self)
We present a new method for automatically creating useful temporal abstractions in reinforcement learning. We argue that states that allow the agent to transition to a different region of the state space are useful subgoals, and propose a method for identifying them using the concept of relative novelty. When such a state is identified, a temporally extended activity (e.g., an option) is generated that takes the agent efficiently to this state. We illustrate the utility of the method in a number of tasks.
If multi-agent learning is the answer, what is the question?
- ARTIFICIAL INTELLIGENCE, 2007
- Cited by 43 (0 self)
The area of learning in multi-agent systems is today one of the most fertile grounds for interaction between game theory and artificial intelligence. We focus on the foundational questions in this interdisciplinary area, and identify several distinct agendas that ought to, we argue, be separated. The goal of this article is to start a discussion in the research community that will result in firmer foundations for the area.
A theoretical analysis of model-based interval estimation
- Proceedings of the Twenty-second International Conference on Machine Learning (ICML-05), 2005
- Cited by 40 (6 self)
Several algorithms for learning near-optimal policies in Markov Decision Processes have been analyzed and proven efficient. Empirical results have suggested that Model-based Interval Estimation (MBIE) learns efficiently in practice, effectively balancing exploration and exploitation. This paper presents the first theoretical analysis of MBIE, proving its efficiency even under worst-case conditions. The paper also introduces a new performance metric, average loss, and relates it to its less “online” cousins from the literature.
Efficient learning equilibrium
- In Proceedings of NIPS, 2002
- Cited by 36 (6 self)
We introduce efficient learning equilibrium (ELE), a normative approach to learning in non-cooperative settings. In ELE, the learning algorithms themselves are required to be in equilibrium. In addition, the learning algorithms must arrive at a desired value after polynomial time, and a deviation from the prescribed ELE becomes irrational after polynomial time. We prove the existence of an ELE (where the desired value is the expected payoff in a Nash equilibrium) and of a Pareto-ELE (where the objective is the maximization of social surplus) in repeated games with perfect monitoring. We also show that an ELE does not always exist in the imperfect monitoring case. Finally, we discuss the extension of these results to general-sum stochastic games.
Accelerating Reinforcement Learning through Implicit Imitation
- JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2003
- Cited by 36 (0 self)
Imitation can be viewed as a means of enhancing learning in multiagent environments. It augments
Efficient structure learning in factored-state MDPs
- Proceedings of the Twenty-Second National Conference on Artificial Intelligence (AAAI-07), 2007
- Cited by 30 (8 self)
We consider the problem of reinforcement learning in factored-state MDPs in the setting in which learning is conducted in one long trial with no resets allowed. We show how to extend existing efficient algorithms that learn the conditional probability tables of dynamic Bayesian networks (DBNs) given their structure to the case in which DBN structure is not known in advance. Our method learns the DBN structures as part of the reinforcement-learning process and provably provides an efficient learning algorithm when combined with factored Rmax.
Combining Inductive Logic Programming, Active Learning and Robotics to Discover the Function of Genes
- Electronic Transactions in Artificial Intelligence, 2001
- Cited by 28 (7 self)
The paper is addressed to AI workers with an interest in biomolecular genetics and also to biomolecular geneticists interested in what AI tools may do for them. The authors are engaged in a collaborative enterprise aimed at partially automating some aspects of scientific work. These aspects include the processes of forming hypotheses, devising trials to discriminate between these competing hypotheses, physically performing these trials and then using the results of these trials to converge upon an accurate hypothesis. As a potential component of the reasoning carried out by an "artificial scientist", this paper describes ASE-Progol, an Active Learning system which uses Inductive Logic Programming to construct hypothesised first-order theories and uses a CART-like algorithm to select trials for eliminating ILP-derived hypotheses. In simulated yeast growth tests ASE-Progol was used to rediscover how genes participate in the aromatic amino acid pathway of Saccharomyces cerevisiae. The cost of the chemicals consumed in converging upon a hypothesis with an accuracy of around 88% was reduced by five orders of magnitude when trials were selected by ASE-Progol rather than being sampled at random. While the naive strategy of always choosing the cheapest trial from the set of candidate trials led to lower cumulative costs than ASE-Progol, both the naive strategy and the random strategy took significantly longer to converge upon a final hypothesis than ASE-Progol. For example, to reach an accuracy of 80%, ASE-Progol required 4 days while random sampling required 6 days and the naive strategy required 10 days.

