## Reinforcement Learning with Replacing Eligibility Traces (1996)

Venue: Machine Learning

Citations: 186 (11 self)

### BibTeX

@INPROCEEDINGS{Singh96reinforcementlearning,
    author = {Satinder Singh and Richard S. Sutton},
    title = {Reinforcement Learning with Replacing Eligibility Traces},
    booktitle = {MACHINE LEARNING},
    year = {1996},
    pages = {123--158},
    publisher = {}
}

### Abstract

The eligibility trace is one of the basic mechanisms used in reinforcement learning to handle delayed reward. In this paper we introduce a new kind of eligibility trace, the replacing trace, analyze it theoretically, and show that it results in faster, more reliable learning than the conventional trace. Both kinds of trace assign credit to prior events according to how recently they occurred, but only the conventional trace gives greater credit to repeated events. Our analysis is for conventional and replace-trace versions of the offline TD(1) algorithm applied to undiscounted absorbing Markov chains. First, we show that these methods converge under repeated presentations of the training set to the same predictions as two well-known Monte Carlo methods. We then analyze the relative efficiency of the two Monte Carlo methods. We show that the method corresponding to conventional TD is biased, whereas the method corresponding to replace-trace TD is unbiased. In addition, we show that t...
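
The difference between the two kinds of trace described in the abstract can be illustrated with a small sketch. This is not code from the paper. The function name `run_traces`, the integer state indexing, and the single `decay` parameter (standing in for the per-step decay factor, usually the product of the discount rate and the trace parameter) are assumptions made for illustration only.

```python
def run_traces(visits, n_states, decay, replacing):
    """Return the trace of every state after a sequence of state visits.

    Hypothetical helper for illustration; `decay` stands in for the
    per-step trace decay factor.
    """
    trace = [0.0] * n_states
    for s in visits:
        # Both kinds of trace fade by the same factor on every step,
        # so credit depends on how recently a state was visited.
        trace = [decay * e for e in trace]
        if replacing:
            # Replacing trace: a visit resets the trace to 1,
            # so repeated visits earn no extra credit.
            trace[s] = 1.0
        else:
            # Conventional (accumulating) trace: a visit adds 1,
            # so repeated visits build up greater credit.
            trace[s] += 1.0
    return trace


# State 0 is visited three times, state 1 once.
visits = [0, 1, 0, 0]
acc = run_traces(visits, n_states=2, decay=0.5, replacing=False)
rep = run_traces(visits, n_states=2, decay=0.5, replacing=True)
print("accumulating:", acc)  # [1.625, 0.25]
print("replacing:   ", rep)  # [1.0, 0.25]
```

In this toy sequence the once-visited state ends with the same trace under both rules, while the repeatedly visited state exceeds 1 only under the conventional rule.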