Least Squares Policy Evaluation Algorithms With Linear Function Approximation (2002)

by A. Nedic, D. P. Bertsekas
Venue: Theory and Applications
Citations: 50 - 7 self

Documents Related by Co-Citation

139 Linear least-squares algorithms for temporal difference learning – Steven J. Bradtke, Andrew G. Barto - 1996
140 Actor-Critic Algorithms – Vijay R. Konda, John N. Tsitsiklis - 2001
184 An analysis of temporal-difference learning with function approximation – John N. Tsitsiklis, Benjamin Van Roy - 1997
1060 Learning to predict by the methods of temporal differences – Richard S. Sutton - 1988
65 Technical update: Least-squares temporal difference learning – Justin A. Boyan - 2002
82 Least-Squares Temporal Difference Learning – Justin A. Boyan - 1999
262 Policy Gradient Methods for Reinforcement Learning with Function Approximation – Richard S. Sutton, David McAllester, Satinder Singh, Yishay Mansour - 1999
2827 Reinforcement Learning: An Introduction – Richard S. Sutton, Andrew G. Barto - 1998
554 Nonlinear Programming (Athena Scientific) – D Bertsekas - 1995
24 Improved Temporal Difference Methods with Linear Function Approximation – Dimitri P. Bertsekas, Angelia Nedich, Vivek S. Borkar
21 On the existence of fixed points for approximate value iteration and temporal-difference learning – D P de Farias, B Van Roy
30 Temporal differences-based policy iteration and applications in neuro-dynamic programming – Dimitri P. Bertsekas, Sergey Ioffe - 1996
41 Error Bounds for Approximate Policy Iteration – Rémi Munos
58 Optimal Stopping of Markov Processes: Hilbert Space Theory, Approximation Algorithms, and an Application to Pricing High-Dimensional Financial Derivatives – John N. Tsitsiklis, Benjamin Van Roy - 1997
24 A Generalized Kalman Filter for Fixed Point Approximation and Efficient Temporal-Difference – David Choi, Benjamin Van Roy - 2001
15 A least squares Q-learning algorithm for optimal stopping problems (LIDS) – H Yu, D P Bertsekas - 2007
262 Simple statistical gradient-following algorithms for connectionist reinforcement learning – Ronald J. Williams - 1992
85 A Natural Policy Gradient – Sham Kakade
32 The convergence of TD(λ) for general λ – P Dayan - 1992