Convergence results for some temporal difference methods based on least-squares (2006)

by H Yu, D P Bertsekas

Results 1 - 10 of 10

Analyzing feature generation for value-function approximation

by Ronald Parr, Christopher Painter-Wakefield, Lihong Li, Michael Littman - In Proceedings of the 24th International Conference on Machine Learning, 2007
Cited by 30 (4 self)
We analyze a simple, Bellman-error-based approach to generating basis functions for value-function approximation. We show that it generates orthogonal basis functions that provably tighten approximation error bounds. We also illustrate the use of this approach in the presence of noise on some sample problems.
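The idea in this abstract can be illustrated with a minimal sketch, assuming a small Markov chain whose transition matrix and rewards are known exactly. The function name bebf_basis, the use of NumPy, the unweighted least-squares fixed point, and the normalization of each new column are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def bebf_basis(P, R, gamma, k):
    """Grow up to k basis functions, each one the Bellman error of the
    current linear fixed-point approximation (hypothetical helper)."""
    n = len(R)
    Phi = np.empty((n, 0))
    for _ in range(k):
        if Phi.shape[1] == 0:
            v_hat = np.zeros(n)  # with no basis yet, approximate V by 0
        else:
            # Linear fixed point: Phi^T (Phi - gamma P Phi) w = Phi^T R
            A = Phi.T @ (Phi - gamma * P @ Phi)
            b = Phi.T @ R
            w = np.linalg.solve(A, b)
            v_hat = Phi @ w
        # Bellman error of the current approximation
        err = R + gamma * P @ v_hat - v_hat
        norm = np.linalg.norm(err)
        if norm < 1e-12:  # approximation is already exact
            break
        # Add the normalized Bellman error as the next basis function
        Phi = np.hstack([Phi, (err / norm)[:, None]])
    return Phi

# Small random chain (illustrative data only)
rng = np.random.default_rng(0)
P = rng.random((6, 6))
P /= P.sum(axis=1, keepdims=True)
R = rng.random(6)
Phi = bebf_basis(P, R, gamma=0.9, k=3)
```

In this sketch each new column is orthogonal to the existing ones because the fixed-point condition forces the Bellman error to be orthogonal to the span of the current basis.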

Projected Equations, Variational Inequalities, and Temporal Difference Methods

by Dimitri P. Bertsekas, 2009
Cited by 4 (1 self)
We consider projected equations for approximate solution of high-dimensional fixed point problems within low-dimensional subspaces. We introduce an analytical framework based on an equivalence with variational inequalities (VIs), and a class of iterative feasible direction methods that may be implemented with low-dimensional simulation. These methods originated in approximate dynamic programming (DP), where they are collectively known as temporal difference (TD) methods. Even when specialized to DP, our methods include extensions/new versions of TD algorithms, which offer special implementation advantages and reduced overhead over the standard LSTD and LSPE methods. We demonstrate a sharp qualitative distinction between the deterministic and the simulation-based versions: the performance of the former is greatly affected by direction and feature scaling, yet the latter asymptotically perform identically, regardless of scaling.
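For orientation, a projected equation of the kind this abstract refers to can be written out as follows. This is a sketch of the linear case only, under assumed notation that does not appear in the listing: a fixed point problem x = Ax + b in R^n, an n-by-s feature matrix Phi with s much smaller than n, a diagonal matrix Xi of positive weights, and Pi the projection onto the range of Phi with respect to the Xi-weighted Euclidean norm.

```latex
% Assumed notation (not from the listing): x = Ax + b, features \Phi,
% weights \Xi, projection \Pi onto range(\Phi) in the \Xi-weighted norm.
\begin{align*}
\Phi r &= \Pi\,(A\Phi r + b), \\
\Phi^\top \Xi\,\bigl(\Phi r - A\Phi r - b\bigr) &= 0, \\
C r = d, \quad C &= \Phi^\top \Xi\,(I - A)\,\Phi, \quad d = \Phi^\top \Xi\, b.
\end{align*}
```

The second line is the orthogonality condition for the projection, and the third restates it as an s-dimensional linear system, which is the low-dimensional object that simulation-based methods would estimate.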

Temporal Difference Methods for General Projected Equations

by Dimitri P. Bertsekas
Cited by 3 (3 self)
We consider projected equations for approximate solution of high-dimensional fixed point problems within low-dimensional subspaces. We introduce an analytical framework based on an equivalence with variational inequalities, and algorithms that may be implemented with low-dimensional simulation. These algorithms originated in approximate dynamic programming (DP), where they are collectively known as temporal difference (TD) methods. Even when specialized to DP, our methods include extensions/new versions of TD methods, which offer special implementation advantages and reduced overhead over the standard LSTD and LSPE methods, and can deal with near singularity in the associated matrix inversion. We develop deterministic iterative methods and their simulation-based versions, and we discuss a sharp qualitative distinction between them: the performance of the former is greatly affected by direction and feature scaling, yet the latter have the same asymptotic convergence rate regardless of scaling, because of their common simulation-induced performance bottleneck.
Index Terms: Dynamic programming, Markov decision processes, approximation methods, temporal difference methods, reinforcement learning.

Lambda-Policy Iteration: A Review and a New Implementation

by Dimitri P. Bertsekas
Cited by 1 (0 self)
In this paper we discuss λ-policy iteration, a method for exact and approximate dynamic programming. It is intermediate between the classical value iteration (VI) and policy iteration (PI) methods, and it is closely related to optimistic (also known as modified) PI, whereby each policy evaluation is done approximately, using a finite number of value iterations. We review the theory of the method and associated questions of bias and exploration arising in simulation-based cost function approximation. We then discuss various implementations, which offer advantages over well-established PI methods that use LSPE(λ), LSTD(λ), or TD(λ) for policy evaluation with cost function approximation. One of these implementations is based on a new simulation scheme, called geometric sampling, which uses multiple short trajectories rather than a single infinitely long trajectory.

On the Convergence of Iterative Simulation-Based Methods for Singular Linear Systems

by Mengdi Wang, Dimitri P. Bertsekas, 2012
Cited by 1 (1 self)
We consider simulation-based algorithms for linear systems of equations, Ax = b, where A is singular. The convergence properties of iterative solution methods can be impaired when the methods are implemented with simulation, as is often done in important classes of large-scale problems. We focus on special cases of singular systems, including some arising in approximate dynamic programming, where convergence of the residual sequence may be obtained without a stabilization mechanism, while the sequence of iterates may diverge. For some of these special cases, under additional assumptions, we show that the sequence of iterates is guaranteed to converge. For situations where the sequence of iterates diverges, we propose schemes for extracting from the divergent sequence another sequence that converges to a solution of Ax = b.
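The distinction between residual convergence and iterate convergence can be seen in a toy example. The sketch below is an illustration under stated assumptions, not the paper's algorithm: it takes A = diag(1, 0) and b = (1, 0), runs the simple iteration x <- x - step * (A x - b_k), and replaces simulation noise with a deterministic perturbation b_k = b + (0, 1/k) that vanishes as k grows. The function name iterate_singular and all numerical choices are hypothetical.

```python
def iterate_singular(num_steps, step=0.5):
    """Run x <- x - step * (A x - b_k) for A = diag(1, 0), b = (1, 0).

    b_k = b + (0, 1/k) is an assumed stand-in for an estimate of b
    whose error vanishes as k grows.
    """
    x1, x2 = 0.0, 0.0
    for k in range(1, num_steps + 1):
        b1_k, b2_k = 1.0, 1.0 / k
        # Components of A x - b_k; the second row of A is zero
        r1 = 1.0 * x1 - b1_k
        r2 = 0.0 * x2 - b2_k
        x1 -= step * r1
        x2 -= step * r2
    # Residual of the true system A x - b; its second component is always 0
    residual_norm = abs(1.0 * x1 - 1.0)
    return (x1, x2), residual_norm

x_short, res_short = iterate_singular(100)
x_long, res_long = iterate_singular(10000)
```

Here the residual of the true system goes to zero, while the component of the iterate along the null space of A accumulates the perturbations as a harmonic sum and keeps growing.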

Contents

by Dimitri P. Bertsekas, Huizhen Yu
We consider fixed point equations, and approximation of the solution by projection on a low-dimensional subspace. We propose stochastic iterative algorithms, based on simulation, which converge to the approximate solution and are suitable for large-dimensional problems. We focus primarily on general linear systems and propose extensions of recent approximate dynamic programming methods, based on the use of temporal differences, which solve a projected form of Bellman's equation by using simulation-based approximations

ARTICLE IN PRESS Journal of Computational and Applied Mathematics ( ) – Contents lists available at ScienceDirect Journal of Computational and Applied

by unknown authors
journal homepage: www.elsevier.com/locate/cam Projected equation methods for approximate solution of large

Approximate Dynamic Programming Contents

by Dimitri P. Bertsekas
This is an updated version of the research-oriented Chapter 6 on Approximate Dynamic Programming. In addition to editorial revisions and rearrangements, it includes an account of new research (joint with J. Yu), which is collected mostly in the new Section 6.7. The chapter will be periodically updated as new research becomes available, and will replace the current Chapter 6 in the book's next printing.

Contents lists available at ScienceDirect Journal of Computational and Applied

by unknown authors
journal homepage: www.elsevier.com/locate/cam Projected equation methods for approximate solution of large

Contents

by unknown authors
1.1 The dynamic programming and reinforcement learning problem
1.2 Approximation in dynamic programming and reinforcement learning
1.3 About this book