|
207
|
Residual Algorithms: Reinforcement Learning with Function Approximation
– Leemon Baird
- 1995
|
|
224
|
Generalization in Reinforcement Learning: Safely Approximating the Value Function
– Justin A. Boyan, Andrew W. Moore
- 1995
|
|
1137
|
Learning from delayed rewards
– C. J. C. H. Watkins
- 1989
|
|
1060
|
Learning to predict by the methods of temporal differences
– Richard S. Sutton
- 1988
|
|
119
|
Feature-Based Methods For Large Scale Dynamic Programming
– John N. Tsitsiklis, Benjamin Van Roy
- 1994
|
|
1965
|
Dynamic Programming
– R. Bellman
- 1957
|
|
250
|
Improving Elevator Performance Using Reinforcement Learning
– Robert Crites, Andrew Barto
- 1996
|
|
202
|
Learning policies for partially observable environments: Scaling up
– Michael L. Littman, Anthony R. Cassandra, Leslie Pack Kaelbling
- 1995
|
|
300
|
Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding
– Richard S. Sutton
- 1996
|
|
130
|
Asynchronous Stochastic Approximation and Q-Learning
– John N. Tsitsiklis
- 1994
|
|
187
|
Convergence of Stochastic Iterative Dynamic Programming Algorithms
– Tommi Jaakkola, Michael I. Jordan, Satinder P. Singh
- 1994
|
|
168
|
Reinforcement Learning with Replacing Eligibility Traces
– Satinder P. Singh, Richard S. Sutton
- 1996
|
|
97
|
Reinforcement Learning with Soft State Aggregation
– Satinder P. Singh, Tommi Jaakkola, Michael I. Jordan
- 1995
|
|
200
|
Exploiting structure in policy construction
– Craig Boutilier, Richard Dearden, Moisés Goldszmidt
- 1995
|
|
1134
|
Reinforcement learning: a survey
– Leslie Pack Kaelbling, Michael L. Littman, Andrew W. Moore
- 1996
|
|
203
|
The parti-game algorithm for variable resolution reinforcement learning in multidimensional state-spaces
– Andrew W. Moore, Christopher G. Atkeson
- 1995
|
|
478
|
Parallel and Distributed Computation: Numerical Methods (Athena Scientific)
– D. Bertsekas, J. Tsitsiklis
- 1989
|
|
472
|
Learning to act using real-time dynamic programming
– Andrew G. Barto, Steven J. Bradtke, Satinder P. Singh
- 1993
|
|
343
|
Dynamic Programming and Optimal Control (Athena Scientific)
– D. Bertsekas
- 1995
|