Results 1 - 10 of 296,080
Learning to predict by the methods of temporal differences
Machine Learning, 1988
"... This article introduces a class of incremental learning procedures specialized for prediction – that is, for using past experience with an incompletely known system to predict its future behavior. Whereas conventional prediction-learning methods assign credit by means of the difference between predi ..."
Abstract
-
Cited by 1521 (56 self)
- Add to MetaCart
predicted and actual outcomes, the new methods assign credit by means of the difference between temporally successive predictions. Although such temporal-difference methods have been used in Samuel's checker player, Holland's bucket brigade, and the author's Adaptive Heuristic Critic
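The snippet describes credit assignment driven by the difference between temporally successive predictions rather than between predictions and final outcomes. A minimal tabular TD(0) prediction sketch of that idea (state names, step size, and discount are illustrative, not taken from the paper) could look like:

```python
from collections import defaultdict

def td0_evaluate(episodes, alpha=0.1, gamma=0.9):
    """episodes: iterable of trajectories, each a list of (state, reward, next_state) tuples."""
    V = defaultdict(float)  # predictions default to 0, including for terminal states
    for trajectory in episodes:
        for state, reward, next_state in trajectory:
            # Credit comes from the difference between successive predictions,
            # not from the difference between a prediction and the final outcome.
            td_error = reward + gamma * V[next_state] - V[state]
            V[state] += alpha * td_error
    return V

# Hypothetical usage: one three-step trajectory ending in a terminal state "T".
values = td0_evaluate([[("A", 0.0, "B"), ("B", 0.0, "C"), ("C", 1.0, "T")]])
```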
Improved Temporal Difference Methods with Linear Function Approximation
"... This chapter considers temporal difference algorithms within the context of infinite-horizon finite-state dynamic programming problems with discounted cost and linear cost function approximation. This problem arises as a subproblem in the policy iteration method of dynamic programming. Additional d ..."
Cited by 32 (7 self)
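For the setting named in this entry (policy evaluation with discounted cost and a linear cost approximation), a hedged sketch of TD(λ) with linear features and an eligibility trace is given below; the feature map, step size, and parameter names are assumptions for the example, not the chapter's notation.

```python
import numpy as np

def td_lambda_linear(transitions, phi, n_features, alpha=0.05, gamma=0.95, lam=0.7):
    """One pass of TD(lambda) policy evaluation with a linear approximation J(s) ~= phi(s) @ theta.

    transitions: list of (state, cost, next_state) tuples sampled under the evaluated policy.
    phi: feature map returning a length-n_features numpy vector.
    """
    theta = np.zeros(n_features)
    z = np.zeros(n_features)                      # eligibility trace
    for state, cost, next_state in transitions:
        delta = cost + gamma * phi(next_state) @ theta - phi(state) @ theta
        z = gamma * lam * z + phi(state)          # decay and refresh the trace
        theta = theta + alpha * delta * z         # credit all recently visited features
    return theta
```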
Pathologies of Temporal Difference Methods in Approximate Dynamic Programming
2010
"... Approximate policy iteration methods based on temporal differences are popular in practice, and have been tested extensively, dating to the early nineties, but the associated convergence behavior is complex, and not well understood at present. An important question is whether the policy iteration p ..."
Cited by 1 (0 self)
Temporal Difference Methods for General Projected Equations
2011
"... We consider projected equations for approximate solution of high-dimensional fixed point problems within lowdimensional subspaces. We introduce an analytical framework based on an equivalence with variational inequalities, and algorithms that may be implemented with low-dimensional simulation. These ..."
Abstract
-
Cited by 10 (4 self)
- Add to MetaCart
. These algorithms originated in approximate dynamic programming (DP), where they are collectively known as temporal difference (TD) methods. Even when specialized to DP, our methods include extensions/new versions of TD methods, which offer special implementation advantages and reduced overhead over the standard
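To make "implemented with low-dimensional simulation" concrete, here is a rough sketch of the DP special case, where the projected Bellman equation reduces to a small linear system estimated from sampled transitions; the function names and sampling model are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def solve_projected_equation(samples, phi, n_features, gamma=0.95):
    """Estimate and solve the low-dimensional system C r = d from sampled transitions.

    In the DP special case, the projected Bellman equation Phi r = Pi T(Phi r)
    reduces to C r = d with
        C = E[phi(s) (phi(s) - gamma * phi(s'))^T],   d = E[phi(s) * g(s, s')],
    and both expectations are replaced by simulation averages below.
    """
    C = np.zeros((n_features, n_features))
    d = np.zeros(n_features)
    for s, g, s_next in samples:
        f, f_next = phi(s), phi(s_next)
        C += np.outer(f, f - gamma * f_next)
        d += g * f
    C /= len(samples)
    d /= len(samples)
    return np.linalg.solve(C, d)   # only feature-space (low-dimensional) quantities are formed
```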
Projected Equations, Variational Inequalities, and Temporal Difference Methods
2009
"... We consider projected equations for approximate solution of high-dimensional fixed point problems within lowdimensional subspaces. We introduce an analytical framework based on an equivalence with variational inequalities (VIs), and a class of iterative feasible direction methods that may be impleme ..."
Abstract
-
Cited by 8 (1 self)
- Add to MetaCart
be implemented with low-dimensional simulation. These methods originated in approximate dynamic programming (DP), where they are collectively known as temporal difference (TD) methods. Even when specialized to DP, our methods include extensions/new versions of TD algorithms, which offer special implementation
Convergence Results for Some Temporal Difference Methods Based on Least Squares
Lab. for Information and Decision Systems Report 2697, 2008
"... We consider finite-state Markov decision processes, and prove convergence and rate of convergence results for certain least squares policy evaluation algorithms of the type known as LSPE(λ). These are temporal difference methods for constructing a linear function approximation of the cost function o ..."
Cited by 26 (10 self)
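The entry concerns LSPE(λ). As a rough sketch of the λ = 0 case only (batch form, with illustrative names rather than the report's notation), each iteration fits the linear approximation to the one-step backup of the previous iterate by least squares:

```python
import numpy as np

def lspe0(samples, phi, n_features, gamma=0.95, n_iters=50):
    """Batch sketch of an LSPE(0)-style iteration (the lambda = 0 case only).

    samples: transitions (s, g, s') collected under the evaluated policy.
    Each iteration solves
        r_{k+1} = argmin_r  sum_t (phi(s_t)^T r - g_t - gamma * phi(s'_t)^T r_k)^2.
    """
    Phi = np.array([phi(s) for s, _, _ in samples])
    Phi_next = np.array([phi(s_next) for _, _, s_next in samples])
    g = np.array([cost for _, cost, _ in samples])
    r = np.zeros(n_features)
    for _ in range(n_iters):
        targets = g + gamma * Phi_next @ r               # one-step backup of the current iterate
        r, *_ = np.linalg.lstsq(Phi, targets, rcond=None)
    return r
```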
Temporal difference methods for the variance of the reward to go
In Proceedings of the 30th International Conference on Machine Learning, 2013
"... In this paper we extend temporal difference policy evaluation algorithms to performance criteria that include the variance of the cumulative reward. Such criteria are useful for risk management, and are important in domains such as finance and process control. We propose variants of both TD(0) and L ..."
Cited by 4 (1 self)
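One plausible tabular sketch of the idea in this snippet, written under the assumption that the variance is recovered from a learned second moment M of the reward-to-go via Var = M - J^2 (an illustration, not necessarily the paper's exact update):

```python
from collections import defaultdict

def td0_value_and_variance(episodes, alpha=0.05, gamma=0.95):
    """Tabular TD(0)-style estimation of the reward-to-go J and its variance (a sketch).

    Assumes the second moment M of the reward-to-go satisfies a Bellman-like equation
        M(s) = E[ r^2 + 2*gamma*r*J(s') + gamma^2 * M(s') ],
    so a TD update for M can run alongside the usual update for J, and the
    variance at s is recovered as M(s) - J(s)^2.
    """
    J = defaultdict(float)   # expected reward to go
    M = defaultdict(float)   # second moment of the reward to go
    for trajectory in episodes:
        for s, r, s_next in trajectory:
            J[s] += alpha * (r + gamma * J[s_next] - J[s])
            M[s] += alpha * (r**2 + 2 * gamma * r * J[s_next] + gamma**2 * M[s_next] - M[s])
    return J, {s: M[s] - J[s]**2 for s in J}
```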
Practical Issues in Temporal Difference Learning
Machine Learning, 1992
"... This paper examines whether temporal difference methods for training connectionist networks, such as Suttons's TD(lambda) algorithm can be successfully applied to complex real-world problems. A number of important practical issues are identified and discussed from a general theoretical perspect ..."
Cited by 415 (2 self)
Learning to Play Board Games using Temporal Difference Methods
2007
"... A promising approach to learn to play board games is to use reinforcement learning algorithms that can learn a game position evaluation function. In this paper we examine and compare three different methods for generating training games: (1) Learning by self-play, (2) Learning by playing against an ..."
compared these three methods using temporal difference methods to learn the game of backgammon. For particular games such as draughts and chess, learning from a large database containing games played by human experts has the advantage that during the generation of (useful) training games
Comparing evolutionary and temporal difference methods in a reinforcement learning domain
In GECCO 2006: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 1321–1328, 2006
"... Both genetic algorithms (GAs) and temporal difference (TD) methods have proven effective at solving reinforcement learning (RL) problems. However, since few rigorous empirical comparisons have been conducted, there are no general guidelines describing the methods ’ relative strengths and weaknesses. ..."
Cited by 41 (13 self)