Results 1 - 10 of 56
Reinforcement learning: a survey
 Journal of Artificial Intelligence Research
, 1996
"... This paper surveys the field of reinforcement learning from a computerscience perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem ..."
Abstract

Cited by 1693 (27 self)
This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. The work described here has a resemblance to work in psychology, but differs considerably in the details and in the use of the word "reinforcement." The paper discusses central issues of reinforcement learning, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state. It concludes with a survey of some implemented systems and an assessment of the practical utility of current methods for reinforcement learning.
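The trial-and-error loop the survey describes can be sketched as tabular Q-learning with an epsilon-greedy rule for trading off exploration and exploitation. This is a minimal illustration, not code from the paper; the `step(s, a) -> (next_state, reward, done)` environment interface is an assumption of this example.

```python
import random

def q_learning(n_states, n_actions, step, episodes=500,
               alpha=0.1, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning with epsilon-greedy exploration.

    `step(s, a)` is assumed to return (next_state, reward, done).
    """
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Trade off exploration and exploitation.
            if random.random() < epsilon:
                a = random.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda x: Q[s][x])
            s2, r, done = step(s, a)
            # One-step backup toward the Bellman optimality target.
            target = r + (0 if done else gamma * max(Q[s2]))
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q
```

On a toy two-state chain where one action terminates with reward 1 and the other loops with reward 0, the learned values favor the terminating action.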
Learning to Drive a Bicycle using Reinforcement Learning and Shaping
, 1998
"... We present and solve a realworld problem of learning to drive a bicycle. We solve the problem by online reinforcement learning using the Sarsa()algorithm. Then we solve the composite problem of learning to balance a bicycle and then drive to a goal. In our approach the reinforcement function is in ..."
Abstract

Cited by 93 (3 self)
We present and solve a real-world problem of learning to drive a bicycle. We solve the problem by online reinforcement learning using the Sarsa(λ) algorithm. Then we solve the composite problem of learning to balance a bicycle and then drive to a goal. In our approach the reinforcement function is independent of the task the agent tries to learn to solve. 1 Introduction Here we consider the problem of learning to balance on a bicycle. Having done this we want to drive the bicycle to a goal. The second problem is not as straightforward as it may seem. The learning agent has to solve two problems at the same time: balancing on the bicycle and driving to a specific place. Recently, ideas from behavioural psychology have been adapted by reinforcement learning to solve this type of problem. We will return to this in section 3. In reinforcement learning an agent interacts with an environment or a system. At each time step the agent receives information on the state of the system and chooses ...
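A minimal sketch of the on-policy Sarsa(λ) update the abstract names, with replacing eligibility traces. This is illustrative only; the paper's bicycle simulator is replaced here by an assumed `step(s, a) -> (next_state, reward, done)` interface.

```python
import random

def sarsa_lambda(n_states, n_actions, step, episodes=300,
                 alpha=0.1, gamma=0.95, lam=0.8, epsilon=0.1):
    """Tabular Sarsa(lambda) with replacing eligibility traces (a sketch)."""
    Q = [[0.0] * n_actions for _ in range(n_states)]

    def policy(s):
        if random.random() < epsilon:
            return random.randrange(n_actions)
        return max(range(n_actions), key=lambda a: Q[s][a])

    for _ in range(episodes):
        e = [[0.0] * n_actions for _ in range(n_states)]  # eligibility traces
        s, done = 0, False
        a = policy(s)
        while not done:
            s2, r, done = step(s, a)
            a2 = policy(s2) if not done else 0
            # TD error uses the action actually taken next (on-policy).
            delta = r + (0 if done else gamma * Q[s2][a2]) - Q[s][a]
            e[s][a] = 1.0  # replacing trace
            for si in range(n_states):
                for ai in range(n_actions):
                    Q[si][ai] += alpha * delta * e[si][ai]
                    e[si][ai] *= gamma * lam
            s, a = s2, a2
    return Q
```

The traces spread each TD error backward over recently visited state-action pairs, which speeds credit assignment on long balancing episodes.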
Issues in Using Function Approximation for Reinforcement Learning
 IN PROCEEDINGS OF THE FOURTH CONNECTIONIST MODELS SUMMER SCHOOL
, 1993
"... ..."
Learning to Solve Markovian Decision Processes
, 1994
"... This dissertation is about building learning control architectures for agents embedded in finite, stationary, and Markovian environments. Such architectures give embedded agents the ability to improve autonomously the efficiency with which they can achieve goals. Machine learning researchers have d ..."
Abstract

Cited by 51 (3 self)
This dissertation is about building learning control architectures for agents embedded in finite, stationary, and Markovian environments. Such architectures give embedded agents the ability to improve autonomously the efficiency with which they can achieve goals. Machine learning researchers have developed reinforcement learning (RL) algorithms based on dynamic programming (DP) that use the agent's experience in its environment to improve its decision policy incrementally. This is achieved by adapting an evaluation function in such a way that the decision policy that is "greedy" with respect to it improves with experience. This dissertation focuses on finite, stationary and Markovian environments for two reasons: it allows the develop...
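The adaptation described here, in which the decision policy that is "greedy" with respect to an evaluation function improves as that function improves, is the idea behind DP methods such as value iteration. Below is a hedged sketch for a finite MDP; the `P`/`R` layout is an assumption of this example, not the dissertation's notation.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Value iteration for a finite MDP.

    Assumed layout: P[s][a] is a list of (prob, next_state) pairs and
    R[s][a] is an expected immediate reward.
    """
    n_states = len(P)
    V = [0.0] * n_states
    while True:
        delta = 0.0
        for s in range(n_states):
            # Bellman optimality backup at state s.
            best = max(R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                       for a in range(len(P[s])))
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # The policy that is "greedy" with respect to the converged V is optimal.
    pi = [max(range(len(P[s])),
              key=lambda a: R[s][a] + gamma * sum(p * V[s2]
                                                  for p, s2 in P[s][a]))
          for s in range(n_states)]
    return V, pi
```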
Autonomous Discovery Of Temporal Abstractions From Interaction With An Environment
, 2002
"... This dissertation is dedicated to my parents, Bill and Gaye, who have always loved and believed in me and to my husband, Andy, whose love and support made it possible. ACKNOWLEDGMENTS Andrew Barto has been a great thesis advisor. He has helped me to become a better researcher by shaping my critical ..."
Abstract

Cited by 50 (2 self)
This dissertation is dedicated to my parents, Bill and Gaye, who have always loved and believed in me and to my husband, Andy, whose love and support made it possible. ACKNOWLEDGMENTS Andrew Barto has been a great thesis advisor. He has helped me to become a better researcher by shaping my critical thinking as well as by improving my expressive skills. I also benefited greatly from having Rich Sutton as my second advisor during my first two years at the University of Massachusetts. I would like to thank the members of my thesis committee, Eliot Moss, Rod Grupen, and Neil Berthier for their feedback. Doina Precup and Kiri Wagstaff have been wonderful friends and supporters of my research. It is very helpful to have such smart women friends in CS. They provided support when I needed it and they pushed me when I needed that. I feel privileged to know Doina both as a mentor and as a friend. I thank Kiri for helpful feedback on drafts of my dissertation as well as the motivation provided by exchanging and reviewing each other’s thesis
A Hybrid Architecture for Adaptive Robot Control
, 2000
"... The autonomous operation of robot systems in an uncertain environment poses many challenges to their control architecture. Such systems must be reactive with respect to local disturbances and uncertainties and have to adapt to more persistent changes in environmental conditions and task requirements ..."
Abstract

Cited by 38 (0 self)
The autonomous operation of robot systems in an uncertain environment poses many challenges to their control architecture. Such systems must be reactive with respect to local disturbances and uncertainties and have to adapt to more persistent changes in environmental conditions and task requirements. In autonomous systems, this adaptation must often occur without outside intervention and within a single trial while avoiding catastrophic failure. This dissertation
A Study of Reinforcement Learning in the Continuous Case by the Means of Viscosity Solutions
, 1999
"... . This paper proposes a study of Reinforcement Learning (RL) for continuous statespace and time control problems, based on the theoretical framework of viscosity solutions (VSs). We use the method of dynamic programming (DP) which introduces the value function (VF), expectation of the best future cu ..."
Abstract

Cited by 29 (4 self)
This paper proposes a study of Reinforcement Learning (RL) for continuous state-space and time control problems, based on the theoretical framework of viscosity solutions (VSs). We use the method of dynamic programming (DP), which introduces the value function (VF), the expectation of the best future cumulative reinforcement. In the continuous case, the value function satisfies a nonlinear first- (or second-) order differential equation (depending on whether the process is deterministic or stochastic) called the Hamilton-Jacobi-Bellman (HJB) equation. It is well known that there exist infinitely many generalized solutions (differentiable almost everywhere) to this equation other than the VF. We show that gradient-descent methods may converge to one of these generalized solutions, thus failing to find the optimal control. In order to solve the HJB equation, we use the powerful framework of viscosity solutions and state that there exists a unique viscosity solution to the HJB equation, whi...
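For orientation, one common form of the deterministic first-order case reads as follows, assuming dynamics $\dot{x} = f(x,u)$, reward $r(x,u)$, control set $U$, and discount factor $\gamma \in (0,1)$ (notation assumed here, not taken from the abstract):

```latex
V(x)\,\ln\frac{1}{\gamma} \;=\; \sup_{u \in U}\Bigl[\, r(x,u) \;+\; \nabla V(x)\cdot f(x,u) \,\Bigr]
```

The value function solves this equation only in the weak (viscosity) sense, which is why selecting among the many almost-everywhere-differentiable generalized solutions matters.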
Scaling Reinforcement Learning Algorithms by Learning Variable Temporal Resolution Models
 IN PROCEEDINGS OF THE NINTH INTERNATIONAL MACHINE LEARNING CONFERENCE
, 1992
"... The close connection between reinforcement learning (RL) algorithms and dynamic programming algorithms has fueled research on RL within the machine learning community. Yet, despite increased theoretical understanding, RL algorithms remain applicable to simple tasks only. In this paper I use t ..."
Abstract

Cited by 28 (2 self)
The close connection between reinforcement learning (RL) algorithms and dynamic programming algorithms has fueled research on RL within the machine learning community. Yet, despite increased theoretical understanding, RL algorithms remain applicable to simple tasks only. In this paper I use the abstract framework afforded by the connection to dynamic programming to discuss the scaling issues faced by RL researchers. I focus on learning agents that have to learn to solve multiple structured RL tasks in the same environment. I propose learning abstract environment models where the abstract actions represent "intentions" of achieving a particular state. Such models are variable temporal resolution models because in different parts of the state space the abstract actions span different numbers of time steps. The operational definitions of abstract actions can be learned incrementally using repeated experience at solving RL tasks. I prove that under certain conditions s...
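One way to picture a variable temporal resolution model: an abstract action that happens to run for `k` primitive steps is backed up with discount `gamma**k`, so actions spanning different durations share one update rule. This is a hypothetical sketch in the style of SMDP-type Q-learning, not code from the paper; all names are illustrative.

```python
def smdp_q_update(Q, s, o, reward_sum, k, s_next, alpha=0.1, gamma=0.9):
    """One backup for an abstract action `o` that ran k primitive steps
    from state `s` to `s_next`, accumulating the discounted return
    `reward_sum` along the way (illustrative names, assumed interface).
    """
    # gamma**k reflects the variable temporal resolution: the same
    # abstract action may span different numbers of steps elsewhere.
    target = reward_sum + (gamma ** k) * max(Q[s_next])
    Q[s][o] += alpha * (target - Q[s][o])
    return Q
```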
Incremental Dynamic Programming for On-Line Adaptive Optimal Control
, 1994
"... Reinforcement learning algorithms based on the principles of Dynamic Programming (DP) have enjoyed a great deal of recent attention both empirically and theoretically. These algorithms have been referred to generically as Incremental Dynamic Programming (IDP) algorithms. IDP algorithms are intended ..."
Abstract

Cited by 24 (2 self)
Reinforcement learning algorithms based on the principles of Dynamic Programming (DP) have enjoyed a great deal of recent attention both empirically and theoretically. These algorithms have been referred to generically as Incremental Dynamic Programming (IDP) algorithms. IDP algorithms are intended for use in situations where the information or computational resources needed by traditional dynamic programming algorithms are not available. IDP algorithms attempt to find a global solution to a DP problem by incrementally improving local constraint satisfaction properties as experience is gained through interaction with the environment. This class of algorithms is not new, going back at least as far as Samuel's adaptive checkers-playing programs,...
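The "incremental improvement of local constraint satisfaction" can be read as applying one Bellman backup at a time, at whichever state experience happens to visit, rather than sweeping the whole state set as classical DP does. A sketch of that single local operation, under assumed `P`/`R` conventions (not the paper's notation):

```python
def async_backup(V, P, R, s, gamma=0.9):
    """One asynchronous Bellman backup at state s, updating V in place.

    Assumed layout: P[s][a] is a list of (prob, next_state) pairs and
    R[s][a] is an expected immediate reward. Repeatedly applying this
    along visited states is the incremental counterpart of a full sweep.
    """
    V[s] = max(R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
               for a in range(len(P[s])))
    return V[s]
```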
Large-Scale Dynamic Optimization Using Teams of Reinforcement Learning Agents
, 1996
"... Recent algorithmic and theoretical advances in reinforcement learning (RL) are attracting widespread interest. RL algorithms have appeared that approximate dynamic programming (DP) on an incremental basis. Unlike traditional DP algorithms, these algorithms do not require knowledge of the state trans ..."
Abstract

Cited by 20 (1 self)
Recent algorithmic and theoretical advances in reinforcement learning (RL) are attracting widespread interest. RL algorithms have appeared that approximate dynamic programming (DP) on an incremental basis. Unlike traditional DP algorithms, these algorithms do not require knowledge of the state transition probabilities or reward structure of a system. This allows them to be trained using real or simulated experiences, focusing their computations on the areas of state space that are actually visited during control, making them computationally tractable on very large problems. RL algorithms can be used as components of multiagent algorithms. If each member of a team of agents employs one of these algorithms, a new collective learning algor...