Learning to predict by the methods of temporal differences (1988)

by Richard S. Sutton
Venue: Machine Learning
Citations: 1520 (56 self)

BibTeX

@ARTICLE{Sutton88learningto,
    author = {Richard S. Sutton},
    title = {Learning to predict by the methods of temporal differences},
    journal = {Machine Learning},
    year = {1988},
    pages = {9--44},
    publisher = {Kluwer Academic Publishers}
}

Abstract

This article introduces a class of incremental learning procedures specialized for prediction – that is, for using past experience with an incompletely known system to predict its future behavior. Whereas conventional prediction-learning methods assign credit by means of the difference between predicted and actual outcomes, the new methods assign credit by means of the difference between temporally successive predictions. Although such temporal-difference methods have been used in Samuel's checker player, Holland's bucket brigade, and the author's Adaptive Heuristic Critic, they have remained poorly understood. Here we prove their convergence and optimality for special cases and relate them to supervised-learning methods. For most real-world prediction problems, temporal-difference methods require less memory and less peak computation than conventional methods and they produce more accurate predictions. We argue that most problems to which supervised learning is currently applied are really prediction problems of the sort to which temporal-difference methods can be applied to advantage.
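The core idea in the abstract, assigning credit from the difference between temporally successive predictions rather than waiting for the final outcome, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from this record: the toy bounded random walk, the function name `td0_random_walk`, the table of per-state predictions, and the step size `alpha`. It shows only the simplest one-step form of a temporal-difference update, not the full family of procedures the article analyzes.

```python
import random


def td0_random_walk(num_states=5, episodes=5000, alpha=0.02, seed=0):
    """One-step temporal-difference prediction on a toy random walk.

    Hypothetical setup (not from the source): states 0..num_states-1 lie on
    a line; each step moves left or right with equal probability. Stepping
    off the left edge ends the walk with outcome 0, off the right edge with
    outcome 1. predictions[i] estimates the expected outcome from state i.
    """
    rng = random.Random(seed)
    predictions = [0.5] * num_states  # initial guess for every state

    for _ in range(episodes):
        state = num_states // 2  # every walk starts in the middle
        while True:
            next_state = state + rng.choice((-1, 1))
            if next_state < 0:
                target, done = 0.0, True   # actual outcome: left exit
            elif next_state >= num_states:
                target, done = 1.0, True   # actual outcome: right exit
            else:
                # No outcome yet: use the next prediction as the target.
                target, done = predictions[next_state], False

            # Temporal-difference update: move the current prediction
            # toward the temporally successive prediction (or the outcome).
            predictions[state] += alpha * (target - predictions[state])

            if done:
                break
            state = next_state

    return predictions


if __name__ == "__main__":
    for i, p in enumerate(td0_random_walk()):
        print(f"state {i}: predicted P(right exit) ~ {p:.3f}")
```

For this symmetric walk, the true probability of exiting on the right from state i is (i + 1) / (num_states + 1), so the learned predictions can be checked against those values. Note that each update needs only the current and next prediction, which is the sense in which such methods are incremental.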

Keyphrases

temporal-difference method, temporal difference, prediction problem, adaptive heuristic critic, checker player, accurate prediction, past experience, bucket brigade, peak computation, new method, conventional method, conventional prediction-learning method, real-world prediction problem, incremental learning procedure, supervised-learning method, incompletely known system, successive prediction, future behavior, special case, actual outcome
