Results 1 - 7 of 7
Neuroevolutionary reinforcement learning for generalized helicopter control
, 2009
"... Helicopter hovering is an important challenge problem in the field of reinforcement learning. This paper considers several neuroevolutionary approaches to discovering robust controllers for a generalized version of the problem used in the 2008 Reinforcement Learning Competition, in which wind in the ..."
Abstract

Cited by 4 (1 self)
Helicopter hovering is an important challenge problem in the field of reinforcement learning. This paper considers several neuroevolutionary approaches to discovering robust controllers for a generalized version of the problem used in the 2008 Reinforcement Learning Competition, in which wind in the helicopter’s environment varies from run to run. We present the simple model-free strategy that won first place in the competition and also describe several more complex model-based approaches. Our empirical results demonstrate that neuroevolution is effective at optimizing the weights of multilayer perceptrons, that linear regression is faster and more effective than evolution for learning models, and that model-based approaches can outperform the simple model-free strategy, especially if prior knowledge is used to aid model learning.
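The general idea of evolving perceptron weights for a control policy can be sketched as below. This is a hedged illustration only, not the paper's code or task: the 1-D "hover" environment, the network size, and the (1+λ) evolution strategy are all stand-ins of my own.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_action(weights, state, hidden=8):
    # Unpack a flat weight vector into a 1-input, `hidden`-unit, 1-output MLP.
    w1 = weights[:hidden].reshape(1, hidden)
    b1 = weights[hidden:2 * hidden]
    w2 = weights[2 * hidden:3 * hidden].reshape(hidden, 1)
    b2 = weights[3 * hidden]
    h = np.tanh(state @ w1 + b1)
    return np.tanh(h @ w2 + b2)

def episode_return(weights, steps=100, wind=0.05):
    # Toy hover task: reward is the negative squared distance from state 0,
    # while random "wind" perturbs the state each step.
    state, total = np.zeros((1, 1)), 0.0
    for _ in range(steps):
        action = mlp_action(weights, state)
        state = state + 0.1 * action + wind * rng.normal()
        total -= float(state[0, 0]) ** 2
    return total

def evolve(generations=30, offspring=20, sigma=0.1, n_weights=25):
    # (1+lambda) evolution strategy: keep the parent unless a Gaussian
    # perturbation of its weights earns a higher episode return.
    best = rng.normal(0, 0.5, n_weights)
    best_fit = episode_return(best)
    for _ in range(generations):
        for _ in range(offspring):
            child = best + sigma * rng.normal(size=n_weights)
            fit = episode_return(child)
            if fit > best_fit:
                best, best_fit = child, fit
    return best, best_fit
```

The 25-element weight vector covers the 8 hidden weights, 8 hidden biases, 8 output weights, and 1 output bias of the toy network.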
Characterizing Reinforcement Learning Methods through Parameterized Learning Problems
, 2011
"... The field of reinforcement learning (RL) has been energized in the past few decades by elegant theoretical results indicating under what conditions, and how quickly, certain algorithms are guaranteed to converge to optimal policies. However, in practical problems, these conditions are seldom met. ..."
Abstract

Cited by 2 (0 self)
The field of reinforcement learning (RL) has been energized in the past few decades by elegant theoretical results indicating under what conditions, and how quickly, certain algorithms are guaranteed to converge to optimal policies. However, in practical problems, these conditions are seldom met. When we cannot achieve optimality, the performance of RL algorithms must be measured empirically. Consequently, in order to meaningfully differentiate learning methods, it becomes necessary to characterize their performance on different problems, taking into account factors such as state estimation, exploration, function approximation, and constraints on computation and memory. To this end, we propose parameterized learning problems, in which such factors can be controlled systematically and their effects on learning methods characterized through targeted studies. Apart from providing very precise control of the parameters that affect learning, our parameterized learning problems enable benchmarking against optimal behavior; their relatively small sizes facilitate extensive experimentation. Based on a survey of existing RL applications, in this article, we focus our attention on two predominant, “first-order” factors: partial observability and function approximation. We design ...
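A parameterized learning problem in this spirit might look like the sketch below. The environment, its size, and the single partial-observability dial are my own illustrative stand-ins, not the article's actual testbeds.

```python
import random

class NoisyChain:
    """5-state chain MDP; action 1 moves right, 0 moves left; reward 1 at the end.

    `obs_noise` is the probability that the agent observes a uniformly random
    state instead of the true one -- a single dial controlling partial
    observability, so learners can be compared as the dial moves while the
    optimal return stays known.
    """

    def __init__(self, obs_noise=0.0, n=5, seed=0):
        self.n, self.obs_noise = n, obs_noise
        self.rng = random.Random(seed)

    def reset(self):
        self.state = 0
        return self._observe()

    def _observe(self):
        # With probability obs_noise, corrupt the observation.
        if self.rng.random() < self.obs_noise:
            return self.rng.randrange(self.n)
        return self.state

    def step(self, action):
        self.state = min(self.state + 1, self.n - 1) if action == 1 else max(self.state - 1, 0)
        reward = 1.0 if self.state == self.n - 1 else 0.0
        return self._observe(), reward, self.state == self.n - 1
```

Because the chain is tiny, the optimal behavior (always move right, reaching the reward in n-1 steps) is known exactly, which is what makes benchmarking against optimality feasible.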
A COMPARATIVE STUDY OF DISCRETIZATION APPROACHES FOR STATE SPACE GENERALIZATION IN THE KEEPAWAY SOCCER TASK
"... There are two main branches of reinforcement learning: methods that search directly in the space of value functions that asses the utility of the behaviors (Temporal Difference Methods); and methods that search directly in the space of behaviors (Policy Search Methods). When applying Temporal Differ ..."
Abstract

Cited by 2 (1 self)
There are two main branches of reinforcement learning: methods that search in the space of value functions that assess the utility of the behaviors (Temporal Difference Methods), and methods that search directly in the space of behaviors (Policy Search Methods). When applying Temporal Difference (TD) methods in domains with very large or continuous state spaces, the experience obtained by the learning agent in its interaction with the environment must be generalized. The generalization can be carried out in two different ways: on the one hand, by discretizing the environment to use a tabular representation of the value functions (e.g. the Vector Quantization Q-Learning algorithm); on the other hand, by using an approximation of the value functions based on a supervised learning method (e.g. the CMAC Q-Learning algorithm). Other algorithms use both approaches to benefit from both mechanisms, allowing higher performance. This is the case of the Two Step Reinforcement Learning algorithm. In the case of Policy Search Methods, the Evolutionary Reinforcement Learning algorithm has shown promise in RL tasks. All these algorithms present different ways to tackle the problem of large or continuous state spaces. In this chapter, we organize and discuss different generalization techniques to solve this problem. Finally, we demonstrate the usefulness of the different algorithms described to improve the learning process in the Keepaway domain.
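The two generalization routes named in the abstract can be contrasted in a few lines. This is a hedged one-dimensional sketch with made-up bin counts, not the chapter's actual Keepaway configuration: a uniform discretization maps a continuous state to exactly one table cell, while CMAC-style tile coding maps it to one tile per overlapping tiling, so nearby states share features.

```python
def discretize(x, low=0.0, high=1.0, bins=10):
    # Tabular route: exactly one active cell per state.
    idx = int((x - low) / (high - low) * bins)
    return min(max(idx, 0), bins - 1)

def cmac_features(x, low=0.0, high=1.0, bins=10, tilings=4):
    # CMAC route: one active tile per tiling, each tiling shifted by a
    # fraction of a bin so that neighboring states overlap in some tilings.
    active = []
    for t in range(tilings):
        offset = t / (tilings * bins) * (high - low)
        idx = int((x - low + offset) / (high - low) * bins)
        idx = min(max(idx, 0), bins - 1)
        active.append(t * bins + idx)  # index into a weight vector of size tilings*bins
    return active

def q_value(weights, x):
    # Under tile coding, a value estimate is the sum of the active tiles' weights.
    return sum(weights[i] for i in cmac_features(x))
```

Updating only the handful of active tiles generalizes to nearby states (they share tiles) while leaving distant states untouched, which is the supervised-approximation behavior the tabular route lacks.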
APRIL: Active Preference-learning based Reinforcement Learning
 In Proceedings of ECML-PKDD 2012, European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases
, 2012
"... Abstract. This paper focuses on reinforcement learning (RL) with limited prior knowledge. In the domain of swarm robotics for instance, the expert can hardly design a reward function or demonstrate the target behavior, forbidding the use of both standard RL and inverse reinforcement learning. Althou ..."
Abstract

Cited by 1 (0 self)
This paper focuses on reinforcement learning (RL) with limited prior knowledge. In the domain of swarm robotics, for instance, the expert can hardly design a reward function or demonstrate the target behavior, forbidding the use of both standard RL and inverse reinforcement learning. Even with limited expertise, however, the human expert is still often able to emit preferences and rank the agent's demonstrations. Earlier work has presented an iterative preference-based RL framework: expert preferences are exploited to learn an approximate policy return, thus enabling the agent to achieve direct policy search. Iteratively, the agent selects a new candidate policy and demonstrates it; the expert ranks the new demonstration against the previous best one; the expert’s ranking feedback enables the agent to refine the approximate policy return, and the process is iterated. In this paper, preference-based reinforcement learning is combined with active ranking in order to decrease the number of ranking queries to the expert needed to yield a satisfactory policy. Experiments on the mountain car and the cancer treatment testbeds show that a couple of dozen rankings are enough to learn a competent policy.
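The iterative loop described above can be sketched as follows. This is a deliberately simplified stand-in, not the paper's APRIL algorithm: the one-parameter "policy", the simulated expert, and the hill-climbing search are my own assumptions; the point is only that the agent never sees a reward, just pairwise preferences, and that each preference costs one expert query.

```python
import random

random.seed(0)

def demonstrate(policy):
    # Toy "demonstration": a policy is a single number; its hidden quality
    # is how close it lands to a target only the expert knows.
    return policy

def expert_prefers_new(new_demo, best_demo, target=0.7):
    # The simulated expert only ranks the two demonstrations; it never
    # emits a numeric reward.
    return abs(new_demo - target) < abs(best_demo - target)

def preference_based_search(iterations=50, step=0.1):
    # Direct policy search driven purely by pairwise ranking feedback:
    # keep the candidate whenever the expert prefers its demonstration.
    best = random.random()
    queries = 0
    for _ in range(iterations):
        candidate = best + random.uniform(-step, step)
        queries += 1
        if expert_prefers_new(demonstrate(candidate), demonstrate(best)):
            best = candidate
    return best, queries
```

The `queries` counter is the quantity active ranking tries to minimize: in the paper's framing, the agent would choose which candidate to demonstrate so that each expert query is maximally informative, rather than perturbing blindly as this sketch does.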
On Learning with Imperfect Representations
"... Abstract—In this paper we present a perspective on the relationship between learning and representation in sequential decision making tasks. We undertake a brief survey of existing realworld applications, which demonstrates that the classical “tabular ” representation seldom applies in practice. Sp ..."
Abstract
Abstract—In this paper we present a perspective on the relationship between learning and representation in sequential decision making tasks. We undertake a brief survey of existing real-world applications, which demonstrates that the classical “tabular” representation seldom applies in practice. Specifically, several practical tasks suffer from state aliasing, and most demand some form of generalization and function approximation. Coping with these representational aspects thus becomes an important direction for furthering the advent of reinforcement learning in practice. The central thesis we present in this position paper is that in practice, learning methods specifically developed to work with imperfect representations are likely to perform better than those developed for perfect representations and then applied in imperfect-representation settings. We specify an evaluation criterion for learning methods in practice, and propose a framework for their synthesis. In particular, we highlight the degrees of “representational bias” prevalent in different learning methods. We reference a variety of relevant literature as a background for this introspective essay.
A COMPARATIVE STUDY OF DISCRETIZATION APPROACHES FOR STATE SPACE GENERALIZATION IN THE KEEPAWAY SOCCER TASK (Chapter 1)
"... There are two main branches of reinforcement learning: methods that search directly in the space of value functions that asses the utility of the behaviors (Temporal Difference Methods); and methods that search directly in the space of behaviors (Policy Search Methods). When applying Temporal Diff ..."
Abstract
There are two main branches of reinforcement learning: methods that search in the space of value functions that assess the utility of the behaviors (Temporal Difference Methods), and methods that search directly in the space of behaviors (Policy Search Methods). When applying Temporal Difference (TD) methods in domains with very large or continuous state spaces, the experience obtained by the learning agent in its interaction with the environment must be generalized. The generalization can be carried out in two different ways: on the one hand, by discretizing the environment to use a tabular representation of the value functions (e.g. the Vector Quantization Q-Learning algorithm); on the other hand, by using an approximation of the value functions based on a supervised learning method (e.g. the CMAC Q-Learning algorithm). Other algorithms use both approaches to benefit from both mechanisms, allowing higher performance. This is the case of the Two Step Reinforcement Learning algorithm. In the case of Policy Search Methods, the Evolutionary Reinforcement Learning algorithm has shown promise in RL tasks. All these algorithms present different ways to tackle the problem of large or continuous state spaces. In this chapter, we organize and discuss different generalization techniques to solve this problem. Finally, we demonstrate the usefulness of the different algorithms described to improve the learning process in the Keepaway domain.