Results 1-10 of 118
Reinforcement learning: a survey
Journal of Artificial Intelligence Research, 1996
Cited by 1680 (27 self)
This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. The work described here has a resemblance to work in psychology, but differs considerably in the details and in the use of the word "reinforcement." The paper discusses central issues of reinforcement learning, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state. It concludes with a survey of some implemented systems and an assessment of the practical utility of current methods for reinforcement learning.
Recent advances in hierarchical reinforcement learning
2003
Cited by 224 (25 self)
A preliminary unedited version of this paper was incorrectly published as part of Volume
Algorithms for Sequential Decision Making
1996
Cited by 213 (9 self)
Sequential decision making is a fundamental task faced by any intelligent agent in an extended interaction with its environment; it is the act of answering the question "What should I do now?" In this thesis, I show how to answer this question when "now" is one of a finite set of states, "do" is one of a finite set of actions, "should" is maximize a long-run measure of reward, and "I" is an automated planning or learning system (agent). In particular,
Machine-Learning Research: Four Current Directions
Cited by 140 (1 self)
Machine Learning research has been making great progress in many directions. This article summarizes four of these directions and discusses some current open problems. The four directions are (a) improving classification accuracy by learning ensembles of classifiers, (b) methods for scaling up supervised learning algorithms, (c) reinforcement learning, and (d) learning complex stochastic models.
Hierarchical Control and Learning for Markov Decision Processes
1998
Cited by 121 (2 self)
This dissertation investigates the use of hierarchy and problem decomposition as a means of solving large, stochastic, sequential decision problems. These problems are framed as Markov decision problems (MDPs). The new technical content of this dissertation begins with a discussion of the concept of temporal abstraction. Temporal abstraction is shown to be equivalent to the transformation of a policy defined over a region of an MDP to an action in a semi-Markov decision problem (SMDP). Several algorithms are presented for performing this transformation efficiently. This dissertation introduces the HAM method for generating hierarchical, temporally abstract actions. This method permits the partial specification of abstract actions in a way that corresponds to an abstract plan or strategy. Abstr...
Self-Improving Factory Simulation using Continuous-time Average-Reward Reinforcement Learning
Proceedings of the 14th International Conference on Machine Learning, 1997
Cited by 53 (11 self)
Many factory optimization problems, from inventory control to scheduling and reliability, can be formulated as continuous-time Markov decision processes. A primary goal in such problems is to find a gain-optimal policy that minimizes the long-run average cost. This paper describes a new average-reward algorithm called SMART for finding gain-optimal policies in continuous time semi-Markov decision processes. The paper presents a detailed experimental study of SMART on a large unreliable production inventory problem. SMART outperforms two well-known reliability heuristics from industrial engineering. A key feature of this study is the integration of the reinforcement learning algorithm directly into two commercial discrete-event simulation packages, ARENA and CSIM, paving the way for this approach to be applied to many other factory optimization problems for which there already exist simulation models. 1 Introduction Many problems in industrial design and manufacturing, such as schedulin...
A Generalized Reinforcement-Learning Model: Convergence and Applications
In Proceedings of the 13th International Conference on Machine Learning, 1996
Cited by 49 (6 self)
Reinforcement learning is the process by which an autonomous agent uses its experience interacting with an environment to improve its behavior. The Markov decision process (mdp) model is a popular way of formalizing the reinforcement-learning problem, but it is by no means the only way. In this paper, we show how many of the important theoretical results concerning reinforcement learning in mdps extend to a generalized mdp model that includes mdps, two-player games and mdps under a worst-case optimality criterion as special cases. The basis of this extension is a stochastic-approximation theorem that reduces asynchronous convergence to synchronous convergence. 1 INTRODUCTION Reinforcement learning is the process by which an agent improves its behavior in an environment via experience. A reinforcement-learning scenario is defined by the experience presented to the agent at each step, and the criterion for evaluating the agent's behavior. One particularly well-studied reinforcement-le...
Learning Algorithms for Markov Decision Processes with Average Cost
SIAM Journal on Control and Optimization, 2001
Cited by 47 (9 self)
Abstract. This paper gives the first rigorous convergence analysis of analogues of Watkins’s Q-learning algorithm, applied to average cost control of finite-state Markov chains. We discuss two algorithms which may be viewed as stochastic approximation counterparts of two existing algorithms for recursively computing the value function of the average cost problem: the traditional relative value iteration (RVI) algorithm and a recent algorithm of Bertsekas based on the stochastic shortest path (SSP) formulation of the problem. Both synchronous and asynchronous implementations are considered and analyzed using the ODE method. This involves establishing asymptotic stability of associated ODE limits. The SSP algorithm also uses ideas from two-time-scale stochastic approximation. Key words. simulation-based algorithms, Q-learning, controlled Markov chains, average cost control, stochastic approximation, dynamic programming AMS subject classification. 93E20 PII. S0363012999361974 1. Introduction. Q-learning algorithms are simulation-based reinforcement learning algorithms for learning the value function arising in the dynamic programming approach to Markov decision processes. They were first introduced for the discounted
A unified analysis of value-function-based reinforcement-learning algorithms
Neural Computation, 1997
Cited by 46 (8 self)
Reinforcement learning is the problem of generating optimal behavior in a sequential decision-making environment given the opportunity of interacting with it. Many algorithms for solving reinforcement-learning problems work by computing improved estimates of the optimal value function. We extend prior analyses of reinforcement-learning algorithms and present a powerful new theorem that can provide a unified analysis of value-function-based reinforcement-learning algorithms. The usefulness of the theorem lies in how it allows the convergence of a complex asynchronous reinforcement-learning algorithm to be proven by verifying that a simpler synchronous algorithm converges. We illustrate the application of the theorem by analyzing the convergence of Q-learning, model-based reinforcement learning, Q-learning with multistate updates, Q-learning for Markov games, and risk-sensitive reinforcement learning.
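For readers unfamiliar with the asynchronous, value-function-based updates this abstract refers to, the following is a minimal sketch of tabular Q-learning, which updates one state-action entry per observed transition. It is not taken from the paper: the function names, the four-state chain environment, the uniformly random behavior policy, and all parameter values are assumptions made for illustration only.

```python
import random


def q_learning(n_states, n_actions, step, episodes=500,
               alpha=0.1, gamma=0.9, seed=0):
    """Tabular Q-learning with a uniformly random behavior policy.

    `step(s, a)` is a hypothetical environment callback that returns
    (next_state, reward, done). Every episode starts in state 0.
    """
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Q-learning is off-policy, so random exploration suffices here.
            a = rng.randrange(n_actions)
            s2, r, done = step(s, a)
            # Asynchronous update: only the visited entry (s, a) changes.
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
    return q


def chain_step(s, a):
    """Hypothetical 4-state chain: action 1 moves right, action 0 moves left.

    Reaching state 3 yields reward 1 and ends the episode.
    """
    s2 = min(s + 1, 3) if a == 1 else max(s - 1, 0)
    done = s2 == 3
    return s2, (1.0 if done else 0.0), done


if __name__ == "__main__":
    for state, row in enumerate(q_learning(4, 2, chain_step)):
        print(state, [round(x, 3) for x in row])
```

In this toy chain the learned values should favor moving right in every non-terminal state; a synchronous variant would instead sweep over all state-action pairs at each iteration.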
Learning and Value Function Approximation in Complex Decision Processes
1998
Cited by 42 (4 self)
In principle, a wide variety of sequential decision problems, ranging from dynamic resource allocation in telecommunication networks to financial risk management, can be formulated in terms of stochastic control and solved by the algorithms of dynamic programming. Such algorithms compute and store a value function, which evaluates expected future reward as a function of current state. Unfortunately, exact computation of the value function typically requires time and storage that grow proportionately with the number of states, and consequently, the enormous state spaces that arise in practical applications render the algorithms intractable. In this thesis, we study tractable methods that approximate the value function. Our work builds on research in an area of artificial intelligence known as reinforcement learning. A point of focus of this thesis is temporal-difference learning, a stochastic algorithm inspired to some extent by phenomena observed in animal behavior. Given a selection of...
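To make concrete the exact dynamic-programming computation described in this abstract, here is a minimal sketch of value iteration on a finite, discounted problem. It is an illustration under stated assumptions, not the thesis's method: the data layout (`P`, `R`), the function name, and the two-state example are all hypothetical. The table `v` holds one entry per state, which is the storage cost the abstract says becomes prohibitive for large state spaces.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Exact value iteration for a small discounted MDP.

    P[s][a] is a list of (probability, next_state) pairs and R[s][a] is the
    immediate reward for taking action a in state s (assumed layout).
    """
    v = [0.0] * len(P)  # one stored value per state
    while True:
        new_v = [
            max(
                R[s][a] + gamma * sum(p * v[s2] for p, s2 in P[s][a])
                for a in range(len(P[s]))
            )
            for s in range(len(P))
        ]
        # Stop once a full sweep changes no value by more than `tol`.
        if max(abs(x - y) for x, y in zip(new_v, v)) < tol:
            return new_v
        v = new_v


# Hypothetical two-state example.
# State 0: action 0 stays (reward 0), action 1 moves to state 1 (reward 1).
# State 1: a single action stays in state 1 (reward 2).
P_EXAMPLE = [[[(1.0, 0)], [(1.0, 1)]], [[(1.0, 1)]]]
R_EXAMPLE = [[0.0, 1.0], [2.0]]

if __name__ == "__main__":
    print(value_iteration(P_EXAMPLE, R_EXAMPLE))
```

With a discount factor of 0.9, state 1 is worth 2 / (1 - 0.9) = 20 and state 0 is worth 1 + 0.9 * 20 = 19. Each sweep touches every state, so both time and memory scale with the size of the state space.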