The Convergence of TD(λ) for General λ (1992)
Citations: 6 (0 self)
BibTeX
@MISC{Dayan92theconvergence,
author = {Peter Dayan},
title = {The Convergence of TD(λ) for General λ},
year = {1992}
}
Abstract
The method of temporal differences (TD) is one way of making consistent predictions about the future. This paper uses some analysis of Watkins [19] to extend a convergence theorem due to Sutton [17] from the case which only uses information from adjacent time steps to that involving information from arbitrary ones. It also considers how this version of TD behaves in the face of linearly dependent representations for states -- demonstrating that it still converges, but to a different answer from the least mean squares algorithm. Finally, it adapts Watkins' theorem that Q-learning, his closely related prediction and action learning method, converges with probability one, to demonstrate this strong form of convergence for a slightly modified version of TD.

Running head: TD(λ) for General λ

This paper is based on a chapter of my thesis [5]. I am very grateful to Andy Barto, Steve Finch, Alex Lascarides, Satinder Singh, Chris Watkins, David Willshaw, the large number of people who read drafts of the thesis, and particularly Rich Sutton and two anonymous reviewers for their helpful advice and comments. Support was from SERC.







