|
139
|
Linear least-squares algorithms for temporal difference learning
– Steven J. Bradtke, Andrew G. Barto
- 1996
|
|
140
|
Actor-Critic Algorithms
– Vijay R. Konda, John N. Tsitsiklis
- 2001
|
|
184
|
An analysis of temporal-difference learning with function approximation
– John N. Tsitsiklis, Benjamin Van Roy
- 1997
|
|
1060
|
Learning to predict by the methods of temporal differences
– Richard S. Sutton
- 1988
|
|
65
|
Technical update: Least-squares temporal difference learning
– Justin A. Boyan
- 2002
|
|
82
|
Least-Squares Temporal Difference Learning
– Justin A. Boyan
- 1999
|
|
262
|
Policy Gradient Methods for Reinforcement Learning with Function Approximation
– Richard S. Sutton, David McAllester, Satinder Singh, Yishay Mansour
- 1999
|
|
2827
|
Reinforcement Learning: An Introduction
– Richard S. Sutton, Andrew G. Barto
- 1998
|
|
554
|
Nonlinear Programming (Athena Scientific)
– D Bertsekas
- 1995
|
|
24
|
Improved Temporal Difference Methods with Linear Function Approximation
– Dimitri P. Bertsekas, Angelia Nedich, Vivek S. Borkar
|
|
21
|
On the existence of fixed points for approximate value iteration and temporal-difference learning
– D P de Farias, B Van Roy
|
|
30
|
Temporal differences-based policy iteration and applications in neuro-dynamic programming
– Dimitri P. Bertsekas, Sergey Ioffe
- 1996
|
|
41
|
Error Bounds for Approximate Policy Iteration
– Rémi Munos
|
|
58
|
Optimal Stopping of Markov Processes: Hilbert Space Theory, Approximation Algorithms, and an Application to Pricing High-Dimensional Financial Derivatives
– John N. Tsitsiklis, Benjamin Van Roy
- 1997
|
|
24
|
A Generalized Kalman Filter for Fixed Point Approximation and Efficient Temporal-Difference Learning
– David Choi, Benjamin Van Roy
- 2001
|
|
15
|
A least squares Q-learning algorithm for optimal stopping problems (LIDS)
– H Yu, D P Bertsekas
- 2007
|
|
262
|
Simple statistical gradient-following algorithms for connectionist reinforcement learning
– Ronald J. Williams
- 1992
|
|
85
|
A Natural Policy Gradient
– Sham Kakade
|
|
32
|
The convergence of TD(λ) for general λ
– P Dayan
- 1992
|