|
2829
|
Reinforcement Learning: An Introduction
– Richard S. Sutton, Andrew G. Barto
- 1998
|
|
1137
|
Learning from delayed rewards
– C J C H Watkins
- 1989
|
|
1965
|
Dynamic Programming
– R Bellman
- 1957
|
|
427
|
Dyna, an Integrated Architecture for Learning, Planning, and Reacting
– Richard S. Sutton
- 1991
|
|
1060
|
Learning to predict by the methods of temporal differences
– Richard S. Sutton
- 1988
|
|
178
|
Reinforcement learning for robots using neural networks
– L-J Lin
- 1992
|
|
85
|
Efficient Learning and Planning Within the Dyna Framework
– Jing Peng, Ronald J. Williams
- 1993
|
|
222
|
Motivated Reinforcement Learning
– Peter Dayan
- 2001
|
|
472
|
Learning to act using real-time dynamic programming
– Andrew G. Barto, Steven J. Bradtke, Satinder P. Singh
- 1993
|
|
94
|
Hierarchical Learning in Stochastic Domains: Preliminary Results
– Leslie Pack Kaelbling
- 1993
|
|
434
|
Dynamic programming and Markov processes
– R A Howard
- 1960
|
|
187
|
Convergence of Stochastic Iterative Dynamic Programming Algorithms
– Tommi Jaakkola, Michael I. Jordan, Satinder P. Singh
- 1994
|
|
173
|
Reinforcement Learning with Perceptual Aliasing: The Perceptual Distinctions Approach
– Lonnie Chrisman
- 1992
|
|
55
|
TD Models: Modeling the World at a Mixture of Time Scales
– Richard S. Sutton
- 1995
|
|
280
|
Learning in embedded systems
– L P Kaelbling
- 1993
|
|
226
|
On-Line Q-Learning Using Connectionist Systems
– G. A. Rummery, M. Niranjan
- 1994
|
|
219
|
Temporal Credit Assignment in Reinforcement Learning
– R S Sutton
- 1984
|
|
1134
|
Reinforcement learning: a survey
– Leslie Pack Kaelbling, Michael L. Littman, Andrew W. Moore
- 1996
|
|
331
|
Dynamic programming: deterministic and stochastic models
– D P Bertsekas
- 1987
|