|
1135
|
Learning from delayed rewards
– C. J. C. H. Watkins
- 1989
|
|
2827
|
Reinforcement Learning: An Introduction
– Richard S. Sutton, Andrew G. Barto
- 1998
|
|
223
|
Generalization in Reinforcement Learning: Safely Approximating the Value Function
– Justin A. Boyan, Andrew W. Moore
- 1995
|
|
224
|
On-Line Q-Learning Using Connectionist Systems
– G. A. Rummery, M. Niranjan
- 1994
|
|
1060
|
Learning to predict by the methods of temporal differences
– Richard S. Sutton
- 1988
|
|
166
|
Reinforcement Learning with Replacing Eligibility Traces
– Satinder Singh, Richard S. Sutton
- 1996
|
|
190
|
TD-Gammon, a self-teaching backgammon program, achieves master-level play
– Gerald Tesauro
- 1994
|
|
172
|
Stable Function Approximation in Dynamic Programming
– Geoffrey J. Gordon
- 1995
|
|
184
|
An analysis of temporal-difference learning with function approximation
– John N. Tsitsiklis, Benjamin Van Roy
- 1997
|
|
129
|
Asynchronous Stochastic Approximation and Q-Learning
– John N. Tsitsiklis
- 1994
|
|
207
|
Residual Algorithms: Reinforcement Learning with Function Approximation
– Leemon Baird
- 1995
|
|
186
|
Convergence of Stochastic Iterative Dynamic Programming Algorithms
– Tommi Jaakkola, Michael I. Jordan, Satinder P. Singh
- 1994
|
|
250
|
Improving Elevator Performance Using Reinforcement Learning
– Robert Crites, Andrew Barto
- 1996
|
|
203
|
The parti-game algorithm for variable resolution reinforcement learning in multidimensional state-spaces
– Andrew W. Moore, Christopher G. Atkeson
- 1995
|
|
97
|
Reinforcement Learning with Soft State Aggregation
– Satinder P. Singh, Tommi Jaakkola, Michael I. Jordan
- 1995
|
|
1134
|
Reinforcement learning: a survey
– Leslie Pack Kaelbling, Michael L. Littman, Andrew W. Moore
- 1996
|
|
334
|
Practical Issues in Temporal Difference Learning
– Gerald Tesauro
- 1992
|
|
147
|
Learning to predict by the methods of temporal differences
– Richard S. Sutton
- 1988
|
|
42
|
Problem Solving With Reinforcement Learning
– Gavin Adrian Rummery
- 1995
|