Stable Function Approximation in Dynamic Programming (1995)

by Geoffrey J. Gordon
Venue: In Machine Learning: Proceedings of the Twelfth International Conference
Citations: 207 - 5 self

Documents Related by Co-Citation

1226 Learning to predict by the methods of temporal differences – Richard S. Sutton - 1988
237 Residual Algorithms: Reinforcement Learning with Function Approximation – Leemon Baird - 1995
251 Generalization in Reinforcement Learning: Safely Approximating the Value Function – Justin A. Boyan, Andrew W. Moore - 1995
1309 Learning from Delayed Rewards – C Watkins - 1989
135 Feature-Based Methods For Large Scale Dynamic Programming – John N. Tsitsiklis, Benjamin Van Roy - 1994
2593 On the theory of dynamic programming – Richard E Bellman - 1952
355 Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding – Richard S. Sutton - 1996
278 Improving Elevator Performance Using Reinforcement Learning – Robert Crites, Andrew Barto - 1996
151 Asynchronous Stochastic Approximation and Q-Learning – John N. Tsitsiklis - 1994
207 Convergence of Stochastic Iterative Dynamic Programming Algorithms – Tommi Jaakkola, Michael I. Jordan, Satinder P. Singh - 1994
234 Learning policies for partially observable environments: Scaling up – Michael L. Littman, Anthony R. Cassandra, Leslie Pack Kaelbling - 1995
186 Reinforcement Learning with Replacing Eligibility Traces – Satinder Singh, Richard S. Sutton - 1996
363 Practical Issues in Temporal Difference Learning – Gerald Tesauro - 1992
373 Dynamic Programming: Deterministic and Stochastic Models – D P Bertsekas - 1987
226 Exploiting structure in policy construction – Craig Boutilier, Richard Dearden, Moisés Goldszmidt - 1995
242 Temporal credit assignment in reinforcement learning. Doctoral dissertation – R S Sutton - 1984
455 Dynamic programming and optimal control. Athena Scientific – D Bertsekas - 2001
1298 Reinforcement learning: a survey – Leslie Pack Kaelbling, Michael L. Littman, Andrew W. Moore - 1996
527 Learning to act using real-time dynamic programming – Andrew G. Barto, Steven J. Bradtke, Satinder P. Singh - 1993