Temporal Difference Methods for General Projected Equations
by Dimitri P. Bertsekas
Citations: 3 (all self-citations)

BibTeX

@MISC{Bertsekas_temporaldifference,
    author = {Dimitri P. Bertsekas},
    title = {Temporal Difference Methods for General Projected Equations},
    year = {}
}


Abstract

We consider projected equations for the approximate solution of high-dimensional fixed point problems within low-dimensional subspaces. We introduce an analytical framework based on an equivalence with variational inequalities, and algorithms that may be implemented with low-dimensional simulation. These algorithms originated in approximate dynamic programming (DP), where they are collectively known as temporal difference (TD) methods. Even when specialized to DP, our methods include extensions and new versions of TD methods, which offer special implementation advantages and reduced overhead over the standard LSTD and LSPE methods, and can deal with near singularity in the associated matrix inversion. We develop deterministic iterative methods and their simulation-based versions, and we discuss a sharp qualitative distinction between them: the performance of the former is greatly affected by direction and feature scaling, whereas the latter all have the same asymptotic convergence rate regardless of scaling, because of their common simulation-induced performance bottleneck.

Index Terms: Dynamic programming, Markov decision processes, approximation methods, temporal difference methods, reinforcement learning.
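As a rough illustration of how a projected equation reduces to a low-dimensional linear system that can be either computed exactly (deterministically) or estimated by simulation, the Python sketch below assumes the policy-evaluation special case of the fixed point problem x = b + αPx, where P is the transition matrix of an ergodic Markov chain, ξ is its stationary distribution, Φ is an n×s feature matrix, and Π is the ξ-weighted Euclidean projection onto the range of Φ. Under these assumptions the projected equation Φr = Π(b + αPΦr) is equivalent to Cr = d with C = Φ'Ξ(I − αP)Φ and d = Φ'Ξb, where Ξ = diag(ξ). The function names, the random problem data, and the two features are hypothetical; the simulation-based estimate is a plain LSTD(0)-style average along one trajectory, not the extended methods the abstract describes.

```python
import numpy as np


def stationary_distribution(P):
    """Stationary distribution xi of an ergodic chain: xi P = xi, sum(xi) = 1."""
    w, v = np.linalg.eig(P.T)
    k = np.argmin(np.abs(w - 1.0))      # eigenvalue closest to 1
    xi = np.real(v[:, k])
    return xi / xi.sum()


def projected_equation_matrices(P, b, Phi, xi, alpha):
    """Exact low-dimensional system C r = d for Phi r = Pi(b + alpha P Phi r)."""
    n = P.shape[0]
    Xi = np.diag(xi)
    C = Phi.T @ Xi @ (np.eye(n) - alpha * P) @ Phi
    d = Phi.T @ Xi @ b
    return C, d


def lstd_estimate(P, b, Phi, alpha, num_steps, rng, start=0):
    """Estimate C and d by averaging along one simulated trajectory of the chain."""
    n, s = Phi.shape
    C_hat = np.zeros((s, s))
    d_hat = np.zeros(s)
    i = start
    for _ in range(num_steps):
        j = rng.choice(n, p=P[i])       # sample next state from row i of P
        C_hat += np.outer(Phi[i], Phi[i] - alpha * Phi[j])
        d_hat += Phi[i] * b[i]
        i = j
    return C_hat / num_steps, d_hat / num_steps


# Hypothetical problem data: a random 6-state chain and 2 features.
rng = np.random.default_rng(0)
n, alpha = 6, 0.9
P = rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)       # make each row a probability vector
b = rng.random(n)
Phi = np.column_stack([np.ones(n), np.arange(n) / n])
xi = stationary_distribution(P)

# Deterministic: form C and d from the full model, then solve.
C, d = projected_equation_matrices(P, b, Phi, xi, alpha)
r_exact = np.linalg.solve(C, d)

# Simulation-based: estimate C and d from a trajectory, then solve.
C_hat, d_hat = lstd_estimate(P, b, Phi, alpha, 50_000, rng)
r_sim = np.linalg.solve(C_hat, d_hat)

print("exact:", r_exact)
print("simulated:", r_sim)
```

The two solutions should agree up to simulation noise, which shrinks roughly like 1/√k in the number of samples k. The deterministic route forms C and d from the full n-dimensional model, while the simulation only accumulates s-dimensional quantities along the sampled trajectory.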
