Results 1 - 10 of 296,080

Learning to predict by the methods of temporal differences

by Richard S. Sutton - Machine Learning, 1988
"... This article introduces a class of incremental learning procedures specialized for prediction – that is, for using past experience with an incompletely known system to predict its future behavior. Whereas conventional prediction-learning methods assign credit by means of the difference between predi ..."
Abstract - Cited by 1521 (56 self) - Add to MetaCart
predicted and actual outcomes, the new methods assign credit by means of the difference between temporally successive predictions. Although such temporal-difference methods have been used in Samuel's checker player, Holland's bucket brigade, and the author's Adaptive Heuristic Critic
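As a concrete illustration of this credit assignment, below is a minimal sketch of TD(0), the simplest member of the family the article introduces: each prediction is nudged toward the immediately following prediction plus the observed reward, rather than toward a final outcome. The environment interface and the constants are illustrative assumptions, not details taken from the paper.

import random

# Tabular TD(0) prediction sketch.
# Assumption: sample_transition(s) samples one step of the (incompletely known)
# system and returns (reward, next_state, done); states is a list of hashable states.
def td0_prediction(sample_transition, states, episodes=1000, alpha=0.1, gamma=1.0):
    V = {s: 0.0 for s in states}           # predictions of the future outcome from each state
    for _ in range(episodes):
        s = random.choice(states)           # arbitrary start state for this sketch
        done = False
        while not done:
            r, s_next, done = sample_transition(s)
            target = r if done else r + gamma * V[s_next]
            # credit assigned via the difference between temporally successive predictions
            V[s] += alpha * (target - V[s])
            s = s_next
    return V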

Improved Temporal Difference Methods with Linear Function Approximation

by Dimitri P. Bertsekas, Angelia Nedich, Vivek S. Borkar
"... This chapter considers temporal difference algorithms within the context of infinite-horizon finite-state dynamic programming problems with discounted cost and linear cost function approximation. This problem arises as a subproblem in the policy iteration method of dynamic programming. Additional d ..."
Abstract - Cited by 32 (7 self) - Add to MetaCart
This chapter considers temporal difference algorithms within the context of infinite-horizon finite-state dynamic programming problems with discounted cost and linear cost function approximation. This problem arises as a subproblem in the policy iteration method of dynamic programming. Additional
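For orientation, in this setting the cost-to-go J(i) is approximated by a linear form phi(i)' r for a feature vector phi(i) and weight vector r, and the basic TD iteration adjusts r along sampled temporal differences. A minimal sketch of such an update follows; the simulator interface sample_step and the constants are assumptions for illustration, not details from the chapter.

import numpy as np

# TD(0) with a linear cost approximation J(i) ~ phi(i) @ r.
# Assumption: sample_step(i) simulates one transition from state i and returns (cost, next_state).
def linear_td0(phi, sample_step, i0, n_steps=10_000, alpha=0.01, gamma=0.95):
    r = np.zeros_like(phi(i0), dtype=float)
    i = i0
    for _ in range(n_steps):
        cost, j = sample_step(i)
        # temporal difference: sampled cost plus discounted next prediction, minus current prediction
        d = cost + gamma * (phi(j) @ r) - (phi(i) @ r)
        r += alpha * d * phi(i)
        i = j
    return r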

Pathologies of Temporal Difference Methods in Approximate Dynamic Programming

by Dimitri P. Bertsekas, 2010
"... Approximate policy iteration methods based on temporal differences are popular in practice, and have been tested extensively, dating to the early nineties, but the associated convergence behavior is complex, and not well understood at present. An important question is whether the policy iteration p ..."
Abstract - Cited by 1 (0 self) - Add to MetaCart
Approximate policy iteration methods based on temporal differences are popular in practice, and have been tested extensively, dating to the early nineties, but the associated convergence behavior is complex, and not well understood at present. An important question is whether the policy iteration

Temporal Difference Methods for General Projected Equations

by Dimitri P. Bertsekas, 2011
"... We consider projected equations for approximate solution of high-dimensional fixed point problems within lowdimensional subspaces. We introduce an analytical framework based on an equivalence with variational inequalities, and algorithms that may be implemented with low-dimensional simulation. These ..."
Abstract - Cited by 10 (4 self) - Add to MetaCart
. These algorithms originated in approximate dynamic programming (DP), where they are collectively known as temporal difference (TD) methods. Even when specialized to DP, our methods include extensions/new versions of TD methods, which offer special implementation advantages and reduced overhead over the standard
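For readers unfamiliar with the setup, the projected equation replaces a high-dimensional fixed point condition with its projection onto the approximation subspace. The sketch below uses the standard notation for this framework and is not quoted from the abstract.

% x = T(x) is a fixed point problem in R^n; the approximation subspace is
% S = { \Phi r : r \in \mathbb{R}^s } with s << n, and \Pi denotes projection
% onto S with respect to a weighted Euclidean norm. The projected equation
% solved by TD-type methods is
\Phi r^{*} \;=\; \Pi\, T(\Phi r^{*}),
% a low-dimensional equation in the weight vector r^{*}.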

Projected Equations, Variational Inequalities, and Temporal Difference Methods

by Dimitri P. Bertsekas, 2009
"... We consider projected equations for approximate solution of high-dimensional fixed point problems within lowdimensional subspaces. We introduce an analytical framework based on an equivalence with variational inequalities (VIs), and a class of iterative feasible direction methods that may be impleme ..."
Abstract - Cited by 8 (1 self) - Add to MetaCart
be implemented with low-dimensional simulation. These methods originated in approximate dynamic programming (DP), where they are collectively known as temporal difference (TD) methods. Even when specialized to DP, our methods include extensions/new versions of TD algorithms, which offer special implementation

Convergence Results for Some Temporal Difference Methods Based on Least Squares

by Huizhen Yu, Dimitri P. Bertsekas - Lab. for Information and Decision Systems Report 2697, 2008
"... We consider finite-state Markov decision processes, and prove convergence and rate of convergence results for certain least squares policy evaluation algorithms of the type known as LSPE(λ). These are temporal difference methods for constructing a linear function approximation of the cost function o ..."
Abstract - Cited by 26 (10 self) - Add to MetaCart
We consider finite-state Markov decision processes, and prove convergence and rate of convergence results for certain least squares policy evaluation algorithms of the type known as LSPE(λ). These are temporal difference methods for constructing a linear function approximation of the cost function

Temporal difference methods for the variance of the reward to go

by Dotan Di Castro, Shie Mannor - In Proceedings of the 30th International Conference on Machine Learning, 2013
"... In this paper we extend temporal difference policy evaluation algorithms to performance criteria that include the variance of the cumulative reward. Such criteria are useful for risk management, and are important in domains such as finance and process control. We propose variants of both TD(0) and L ..."
Abstract - Cited by 4 (1 self) - Add to MetaCart
In this paper we extend temporal difference policy evaluation algorithms to performance criteria that include the variance of the cumulative reward. Such criteria are useful for risk management, and are important in domains such as finance and process control. We propose variants of both TD(0
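A common route to such variants, sketched here in standard notation as the general idea rather than the authors' exact formulation, is to estimate the second moment M of the reward-to-go with a TD-style recursion alongside the value J, and recover the variance from the pair:

% For a fixed policy, with immediate reward r, next state s', and discount factor gamma:
J(s) = \mathbb{E}\big[\, r + \gamma\, J(s') \,\big], \qquad
M(s) = \mathbb{E}\big[\, r^{2} + 2\gamma\, r\, J(s') + \gamma^{2} M(s') \,\big],
% so that the variance of the reward-to-go is
\mathrm{Var}(s) = M(s) - J(s)^{2}.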

Practical Issues in Temporal Difference Learning

by Gerald Tesauro - Machine Learning, 1992
"... This paper examines whether temporal difference methods for training connectionist networks, such as Suttons's TD(lambda) algorithm can be successfully applied to complex real-world problems. A number of important practical issues are identified and discussed from a general theoretical perspect ..."
Abstract - Cited by 415 (2 self) - Add to MetaCart
This paper examines whether temporal difference methods for training connectionist networks, such as Suttons's TD(lambda) algorithm can be successfully applied to complex real-world problems. A number of important practical issues are identified and discussed from a general theoretical

Learning to Play Board Games using Temporal Difference Methods

by Marco A. Wiering, Jan Peter Patist, Henk Mannen, 2007
"... A promising approach to learn to play board games is to use reinforcement learning algorithms that can learn a game position evaluation function. In this paper we examine and compare three different methods for generating training games: (1) Learning by self-play, (2) Learning by playing against an ..."
Abstract - Add to MetaCart
compared these three methods using temporal difference methods to learn the game of backgammon. For particular games such as draughts and chess, learning from a large database containing games played by human experts has as a large advantage that during the generation of (useful) training games

Comparing evolutionary and temporal difference methods in a reinforcement learning domain

by Matthew E. Taylor, Shimon Whiteson, Peter Stone - In GECCO 2006: Proceedings of the Genetic and Evolutionary Computation Conference (pp. 1321–1328), 2006
"... Both genetic algorithms (GAs) and temporal difference (TD) methods have proven effective at solving reinforcement learning (RL) problems. However, since few rigorous empirical comparisons have been conducted, there are no general guidelines describing the methods ’ relative strengths and weaknesses. ..."
Abstract - Cited by 41 (13 self) - Add to MetaCart
Both genetic algorithms (GAs) and temporal difference (TD) methods have proven effective at solving reinforcement learning (RL) problems. However, since few rigorous empirical comparisons have been conducted, there are no general guidelines describing the methods ’ relative strengths and weaknesses