
Reinforcement learning: a survey (1996)

by Leslie Pack Kaelbling, Michael L. Littman, Andrew W. Moore
Venue: Journal of Artificial Intelligence Research
Citations: 1134 - 21 self

BibTeX

@ARTICLE{Kaelbling96reinforcementlearning:,
    author = {Leslie Pack Kaelbling and Michael L. Littman and Andrew W. Moore},
    title = {Reinforcement learning: a survey},
    journal = {Journal of Artificial Intelligence Research},
    year = {1996},
    volume = {4},
    pages = {237--285}
}

Abstract

This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. The work described here has a resemblance to work in psychology, but differs considerably in the details and in the use of the word "reinforcement." The paper discusses central issues of reinforcement learning, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state. It concludes with a survey of some implemented systems and an assessment of the practical utility of current methods for reinforcement learning.

Citations

1964 Dynamic Programming - Bellman - 1957
1515 A Theory of the Learnable - Valiant - 1984
1433 Genetic algorithms in search, optimization, and machine learning - Goldberg - 1989
1232 Theory of Linear and Integer Programming - Schrijver - 1986
1135 Learning from delayed rewards - Watkins - 1989
1060 Learning to predict by the methods of temporal differences - Sutton - 1988
545 Some Studies in Machine Learning using the Game of Checkers - Samuel - 2000
477 Parallel and Distributed Computation: Numerical Methods - Bertsekas, Tsitsiklis - 1989
472 Learning to act using real-time dynamic programming - Barto, Bradtke, et al. - 1995
434 Dynamic programming and Markov processes - Howard - 1960
427 Integrated architectures for learning, planning, and reacting based on approximating dynamic programming - Sutton - 1990
423 Neuronlike adaptive elements that can solve difficult learning problems - Barto, Sutton, et al. - 1983
417 Markov games as a framework for multi-agent reinforcement learning - Littman - 1994
377 Parallel distributed processing: Explorations in the microstructure of cognition - Rumelhart, McClelland - 1986
342 Dynamic Programming and Optimal Control - Bertsekas - 2000
334 Practical issues in temporal difference learning - Tesauro - 1992
331 Dynamic Programming: Deterministic and Stochastic Models - Bertsekas - 1987
330 Temporal Difference Learning and TD-Gammon - Tesauro - 1995
316 Automatic programming of behavior-based robots using reinforcement learning - Mahadevan, Connell - 1992
300 Generalization in reinforcement learning: Successful examples using sparse coarse coding - Sutton - 1996
295 Empirical Model-Building and Response Surfaces - Box, Draper - 1987
280 Learning in embedded systems - Kaelbling - 1993
275 Prioritized sweeping: Reinforcement learning with less data and less time - Moore, Atkeson - 1993
262 Simple statistical gradient-following algorithms for connectionist reinforcement learning - Williams - 1992
250 Improving elevator performance using reinforcement learning - Crites, Barto - 1996
243 Acting optimally in partially observable stochastic domains - Cassandra, Kaelbling, et al. - 1994
243 Reinforcement Learning with Selective Perception and Hidden State - McCallum - 1996
239 Classifier fitness based on accuracy - Wilson - 1995
232 A New Approach to Manipulator control: the Cerebellar Model Articulation Controller - Albus - 1975
228 Neural Network Perception for Mobile Robot Guidance - Pomerleau - 1992
224 On-line Q-learning using connectionist systems - Rummery, Niranjan - 1994
223 Generalization in reinforcement learning: safely approximating the value function - Boyan, Moore - 1995
222 Motivated reinforcement learning - Dayan - 2002
219 Temporal Credit Assignment in Reinforcement Learning - Sutton - 1984
207 Residual Algorithms: Reinforcement Learning with Function Approximation - Baird - 1995
203 The parti-game algorithm for variable resolution reinforcement learning in multidimensional state-spaces - Moore, Atkeson - 1995
202 Learning policies for partially observable environments: Scaling up - Littman, Cassandra, et al.
190 TD-Gammon, a self-teaching backgammon program, achieves master-level play - Tesauro - 1994
190 Learning to coordinate behaviors - Maes, Brooks - 1990
186 On the convergence of stochastic iterative dynamic programming algorithms - Jaakkola, Jordan, et al. - 1994
179 Locally weighted regression – an approach to regression-analysis by local fitting - Cleveland, Devlin - 1988
178 Reinforcement Learning for Robots Using Neural Networks - Lin - 1993
173 Reinforcement learning with perceptual aliasing: The perceptual distinctions approach - Chrisman - 1992
172 Stable function approximation in dynamic programming - Gordon - 1995
168 Bandit problems: Sequential allocation of experiments - Berry, Fristedt - 1985
166 Reinforcement learning with replacing eligibility traces - Singh, Sutton - 1996
165 Survey of partially observable Markov decision processes: Theory, models, and algorithms - Monahan - 1982
154 Adaptation in Natural and Artificial Systems - Holland - 1975
151 A survey of algorithmic methods for partially observed Markov decision processes - Lovejoy - 1991
151 Reward functions for accelerated learning - Mataric - 1994