Least Squares Policy Evaluation Algorithms With Linear Function Approximation (2002)

by A. Nedic, D. P. Bertsekas
Venue: Theory and Applications
Citations: 66 - 10 self

Active Bibliography

26 Improved Temporal Difference Methods with Linear Function Approximation – Dimitri P. Bertsekas, Angelia Nedich, Vivek S. Borkar
1324 Reinforcement learning: a survey – Leslie Pack Kaelbling, Michael L. Littman, Andrew W. Moore - 1996
815 Probability: Theory and Examples – Rick Durrett - 2011
624 Stochastic Perturbation Theory – G. W. Stewart - 1988
1246 Learning to predict by the methods of temporal differences – Richard S. Sutton - 1988
773 Markov chains for exploring posterior distributions – Luke Tierney - 1994
540 Learning to act using real-time dynamic programming – Andrew G. Barto, Steven J. Bradtke, Satinder P. Singh - 1993
486 Integrated architectures for learning, planning, and reacting based on approximating dynamic programming – Richard S. Sutton - 1990
440 Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning – Richard Sutton, Doina Precup, Satinder Singh - 1999