Practical Issues in Temporal Difference Learning (1992)

by Gerald Tesauro
Venue:Machine Learning
Citations:415 - 2 self

BibTeX

@ARTICLE{Tesauro92practicalissues,
    author = {Gerald Tesauro},
    title = {Practical Issues in Temporal Difference Learning},
    journal = {Machine Learning},
    year = {1992},
    pages = {257--277}
}

Abstract

This paper examines whether temporal difference methods for training connectionist networks, such as Sutton's TD(lambda) algorithm, can be successfully applied to complex real-world problems. A number of important practical issues are identified and discussed from a general theoretical perspective. These practical issues are then examined in the context of a case study in which TD(lambda) is applied to learning the game of backgammon from the outcome of self-play. This is apparently the first application of this algorithm to a complex nontrivial task. It is found that, with zero knowledge built in, the network is able to learn from scratch to play the entire game at a fairly strong intermediate level of performance, which is clearly better than conventional commercial programs and which in fact surpasses comparable networks trained on a massive human expert data set. This indicates that TD learning may work better in practice than one would expect based on current theory, and it suggests that further analysis of TD methods, as well as applications in other complex domains, may be worth investigating.
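
For readers unfamiliar with TD(lambda), the following is a minimal sketch of the update rule on a toy problem. It is not the paper's method: the paper trains a connectionist network on backgammon self-play, whereas this sketch assumes a tabular value estimate on a five-state random walk whose only feedback is the final outcome (0 or 1). The function name `td_lambda_random_walk` and the parameters `alpha`, `lam`, `episodes`, and `seed` are hypothetical, chosen for illustration only.

```python
import random


def td_lambda_random_walk(episodes=5000, alpha=0.02, lam=0.7, n_states=5, seed=0):
    """Tabular TD(lambda) with accumulating eligibility traces (illustrative only).

    States 0..n_states-1 are non-terminal. Stepping left of state 0 ends the
    episode with outcome 0; stepping right of the last state ends it with 1.
    No discounting is applied, and the only reward is the final outcome.
    """
    rng = random.Random(seed)
    values = [0.5] * n_states  # predicted final outcome from each state

    for _ in range(episodes):
        traces = [0.0] * n_states  # eligibility trace per state
        state = n_states // 2      # every episode starts in the middle

        while True:
            next_state = state + rng.choice((-1, 1))
            if next_state < 0:
                target, done = 0.0, True            # terminated on the left
            elif next_state >= n_states:
                target, done = 1.0, True            # terminated on the right
            else:
                target, done = values[next_state], False  # bootstrap

            # Temporal difference between successive predictions.
            td_error = target - values[state]

            # Decay all traces by lambda, then mark the current state.
            for s in range(n_states):
                traces[s] *= lam
            traces[state] += 1.0

            # Credit every recently visited state in proportion to its trace.
            for s in range(n_states):
                values[s] += alpha * td_error * traces[s]

            if done:
                break
            state = next_state

    return values


if __name__ == "__main__":
    for i, v in enumerate(td_lambda_random_walk()):
        print(f"state {i}: estimated outcome {v:.3f}")
```

For this symmetric walk the exact probabilities of finishing on the right are 1/6, 2/6, ..., 5/6, so the learned estimates should settle near those values. With a constant step size they fluctuate around the exact values and do not converge to them exactly.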

Keyphrases

practical issue    temporal difference learning    current theory    connectionist network    fact surpasses comparable network    case study    zero knowledge    entire game    general theoretical perspective    complex nontrivial task    td learning    complex domain    complex real-world problem    conventional commercial program    temporal difference method    first application    massive human expert data set    important practical issue    td method    strong intermediate level   
