Least-Squares Temporal Difference Learning (1999) [41 citations — 0 self]

by Justin Boyan
In Proceedings of the Sixteenth International Conference on Machine Learning

Abstract:

TD(λ) is a popular family of algorithms for approximate policy evaluation in large MDPs. TD(λ) works by incrementally updating the value function after each observed transition. It has two major drawbacks: it makes inefficient use of data, and it requires the user to manually tune a stepsize schedule for good performance. For the case of linear value function approximations and λ = 0, the Least-Squares TD (LSTD) algorithm of Bradtke and Barto [5] eliminates all stepsize parameters and improves data efficiency. This paper extends Bradtke and Barto's work in three significant ways. First, it presents a simpler derivation of the LSTD algorithm. Second, it generalizes from λ = 0 to arbitrary values of λ; at the extreme of λ = 1, the resulting algorithm is shown to be a practical formulation of supervised linea...
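
To make the idea of a stepsize-free least-squares estimate concrete, here is a minimal sketch of an LSTD(λ)-style batch computation for linear value functions. It is an illustration under stated assumptions, not code from the paper: the function names (`lstd_lambda`, `solve_linear`, `one_hot_features`), the trajectory format, and the optional discount factor `gamma` are all hypothetical choices made for this example. The sketch accumulates a matrix `A` and a vector `b` from observed transitions using an eligibility trace `z`, then solves `A * beta = b` once, with no stepsize schedule to tune.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("A is singular; more data or other features needed")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def lstd_lambda(trajectories, phi, k, lam, gamma=1.0):
    """Least-squares TD(lambda) sketch for a linear value function.

    trajectories: list of episodes; each episode is a list of
        (state, reward, next_state) tuples, with next_state=None at
        termination (assumed format for this example).
    phi: function mapping a state to a length-k feature list.
    lam: the lambda parameter in [0, 1].
    gamma: discount factor (an assumption of this sketch; 1.0 means
        undiscounted episodic returns).
    Returns the coefficient vector beta with V(s) ~ beta . phi(s).
    """
    A = [[0.0] * k for _ in range(k)]
    b = [0.0] * k
    for episode in trajectories:
        z = list(phi(episode[0][0]))  # eligibility trace starts at phi(x0)
        for state, reward, next_state in episode:
            f = phi(state)
            # Terminal states are given an all-zero feature vector.
            f_next = phi(next_state) if next_state is not None else [0.0] * k
            for i in range(k):
                for j in range(k):
                    A[i][j] += z[i] * (f[j] - gamma * f_next[j])
                b[i] += z[i] * reward
            z = [gamma * lam * z[i] + f_next[i] for i in range(k)]
    return solve_linear(A, b)


def one_hot_features(state):
    """Tabular features for a hypothetical two-state chain s0 -> s1 -> end."""
    return {"s0": [1.0, 0.0], "s1": [0.0, 1.0]}[state]


if __name__ == "__main__":
    # One episode, reward 1 per step, so the true values are V(s0)=2, V(s1)=1.
    chain = [[("s0", 1.0, "s1"), ("s1", 1.0, None)]]
    print(lstd_lambda(chain, one_hot_features, 2, lam=0.0))
```

On this toy chain the fitted coefficients match the true state values for any λ, because the features are tabular and the data are deterministic. With real function approximation and noisy data, different values of λ would generally give different fits.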

Citations

1933 Reinforcement Learning: An Introduction – Sutton, Barto - 1998
1127 Numerical Recipes In C: The Art of Scientific Computing – Flannery - 1992
931 Learning to predict by the methods of temporal differences – Sutton - 1988
374 Integrated architectures for learning, planning, and reacting based on approximating dynamic programming – Sutton - 1990
245 Prioritized sweeping: Reinforcement learning with less data and less real time – Moore, Atkeson - 1993
179 Reinforcement Learning for Robots Using Neural Networks – Lin - 1993
133 An analysis of temporal-difference learning with function approximation – Tsitsiklis, Van Roy - 1997
99 Reinforcement learning for dynamic channel allocation in cellular telephone systems – Singh, Bertsekas - 1997
56 Linear least-squares algorithms for temporal difference learning – Bradtke, Barto - 1996
51 Learning evaluation functions for global optimization and boolean satisfiability – Boyan, Moore - 1998
31 Gain adaptation beats least squares – Sutton - 1992
24 Learning Evaluation Functions for Global Optimization – Boyan - 1998
20 A comparison of direct and model-based reinforcement learning – Atkeson, Santamaria - 1997