179

Linear least-squares algorithms for temporal difference learning
– Steven J. Bradtke, Andrew G. Barto
 1996

1222

Learning to predict by the methods of temporal differences
– Richard S. Sutton
 1988

174

Actor-Critic Algorithms
– Vijay R. Konda, John N. Tsitsiklis
 2001

216

An analysis of temporal-difference learning with function approximation
– John N. Tsitsiklis, Benjamin Van Roy
 1997

88

Technical update: Least-squares temporal difference learning
– Justin A. Boyan
 2002

95

Least-Squares Temporal Difference Learning
– Justin A. Boyan
 1999

40

Temporal differences-based policy iteration and applications in neuro-dynamic programming
– Dimitri P. Bertsekas, Sergey Ioffe
 1996

75

Optimal Stopping of Markov Processes: Hilbert Space Theory, Approximation Algorithms, and an Application to Pricing High-Dimensional Financial Derivatives
– John N. Tsitsiklis, Benjamin Van Roy
 1997

25

Improved Temporal Difference Methods with Linear Function Approximation
– Dimitri P. Bertsekas, Angelia Nedich, Vivek S. Borkar

3746

Reinforcement Learning: An Introduction
– Richard S. Sutton, Andrew G. Barto
 1998

317

Policy Gradient Methods for Reinforcement Learning with Function Approximation
– Richard S. Sutton, David McAllester, Satinder Singh, Yishay Mansour
 1999

737

Nonlinear Programming. Athena Scientific
– D Bertsekas
 1995

24

On the existence of fixed points for approximate value iteration and temporal-difference learning
– D P de Farias, B Van Roy
 2000

17

A least squares Q-learning algorithm for optimal stopping problems
– H Yu, D P Bertsekas
 2007

742

Neuro-Dynamic Programming
– D Bertsekas, John N Tsitsiklis
 1996

53

Error Bounds for Approximate Policy Iteration
– Rémi Munos

318

Simple statistical gradient-following algorithms for connectionist reinforcement learning
– Ronald J. Williams
 1992

33

A Generalized Kalman Filter for Fixed Point Approximation and Efficient Temporal-Difference
– David Choi, Benjamin Van Roy
 2001

21

Projected equation methods for approximate solution of large linear systems
– Dimitri P. Bertsekas, et al.
