Least Squares Policy Evaluation Algorithms With Linear Function Approximation - Theory and Applications, 2002
Cited by 50 (7 self)
We consider policy evaluation algorithms within the context of infinite-horizon dynamic programming problems with discounted cost. We focus on discrete-time dynamic systems with a large number of states, and we discuss two methods, which use simulation, temporal differences, and linear cost function approximation. The first method is a new gradient-like algorithm involving least-squares subproblems and a diminishing stepsize, which is based on the λ-policy iteration method of Bertsekas and Ioffe. The second method is the LSTD(λ) algorithm recently proposed by Boyan, which for λ = 0 coincides with the linear least-squares temporal-difference algorithm of Bradtke and Barto. At present, there is only a convergence result by Bradtke and Barto for the LSTD(0) algorithm. Here, we strengthen this result by showing the convergence of LSTD(λ), with probability 1, for every λ ∈ [0, 1].
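To make the LSTD(λ) idea in this abstract more concrete, here is a minimal sketch in Python. It follows the commonly described form of the method (accumulate a matrix and a vector along one simulated trajectory using an eligibility trace, then solve a linear system) and is not taken from the paper. The function name `lstd_lambda`, its arguments, and the two-state toy chain are all assumptions made for illustration.

```python
import numpy as np

def lstd_lambda(features, costs, gamma, lam):
    """Illustrative LSTD(lambda) estimate from a single trajectory.

    features: array of shape (T + 1, k), row t is the feature vector of state t
    costs:    array of shape (T,), cost incurred on transition t -> t + 1
    gamma:    discount factor in (0, 1)
    lam:      trace parameter lambda in [0, 1]
    Returns the weight vector r such that features @ r approximates the cost-to-go.
    """
    features = np.asarray(features, dtype=float)
    costs = np.asarray(costs, dtype=float)
    k = features.shape[1]
    A = np.zeros((k, k))
    b = np.zeros(k)
    z = features[0].copy()  # eligibility trace, started at the first feature vector
    for t in range(len(costs)):
        phi, phi_next = features[t], features[t + 1]
        A += np.outer(z, phi - gamma * phi_next)
        b += z * costs[t]
        z = gamma * lam * z + phi_next
    # Assumes A is invertible, which needs enough sufficiently varied samples.
    return np.linalg.solve(A, b)

# Hypothetical toy problem: a deterministic two-state cycle 0 -> 1 -> 0 -> ...
# with cost 1 when leaving state 0 and cost 0 when leaving state 1.
states = [0, 1] * 10 + [0]            # 21 visited states, 20 transitions
features = np.eye(2)[states]          # one-hot (tabular) features
costs = [1.0 if s == 0 else 0.0 for s in states[:-1]]
r = lstd_lambda(features, costs, gamma=0.5, lam=0.5)
print(r)  # close to the exact discounted costs [4/3, 2/3]
```

With one-hot features and deterministic transitions, the exact discounted costs satisfy the accumulated linear system for any λ, so the toy example recovers them. With random transitions or a compressed feature set, the estimate would only approach a fixed point as the trajectory grows.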
Convergence Proofs of Least Squares Policy Iteration Algorithm for High-Dimensional Infinite Horizon Markov Decision Process Problems
Cited by 2 (1 self)
Most of the current theory for dynamic programming algorithms focuses on finite state, finite action Markov decision problems, with a paucity of theory for the convergence of approximation algorithms with continuous states. In this paper we propose a policy iteration algorithm for infinite-horizon Markov decision problems where the state and action spaces are continuous and the expectation cannot be computed exactly. We show that an appropriately designed least squares (LS) or recursive least squares (RLS) method is provably convergent under certain problem structure assumptions on value functions. In addition, we show that the LS/RLS approximate policy iteration algorithm converges in the mean, meaning that the mean error between the approximate policy value function and the optimal value function shrinks to zero as successive approximations become more accurate. Furthermore, the convergence results are extended to the more general case of unknown basis functions.
The core concept of dynamic programming for solving a Markov decision process (MDP) is Bellman’s equation, which is often written in the standard form (Puterman (1994)) Vt(St) = max{C(St, xt) + γ

