Results 1 - 2 of 2
Least Squares Policy Evaluation Algorithms With Linear Function Approximation: Theory and Applications, 2002
Abstract

Cited by 90 (13 self)
We consider policy evaluation algorithms within the context of infinite-horizon dynamic programming problems with discounted cost. We focus on discrete-time dynamic systems with a large number of states, and we discuss two methods, which use simulation, temporal differences, and linear cost function approximation. The first method is a new gradient-like algorithm involving least-squares subproblems and a diminishing stepsize, which is based on the λ-policy iteration method of Bertsekas and Ioffe. The second method is the LSTD(λ) algorithm recently proposed by Boyan, which for λ = 0 coincides with the linear least-squares temporal-difference algorithm of Bradtke and Barto. At present, there is only a convergence result by Bradtke and Barto for the LSTD(0) algorithm. Here, we strengthen this result by showing the convergence of LSTD(λ), with probability 1, for every λ ∈ [0, 1].
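The linear least-squares temporal-difference estimator of Bradtke and Barto mentioned in the abstract (LSTD(0), the λ = 0 case) can be sketched as follows. This is a minimal illustrative implementation, not the authors' code; the function name `lstd0`, the feature map `phi`, and the `transitions` format are all assumptions made here for the example, and a small ridge term is added for numerical safety.

```python
import numpy as np

def lstd0(transitions, phi, gamma=0.95, reg=1e-6):
    """Sketch of LSTD(0): estimate weights w so that phi(s) @ w
    approximates the fixed-policy value V(s), from sampled
    transitions (state, cost, next_state).

    Solves A w = b, where A accumulates phi(s) (phi(s) - gamma*phi(s'))^T
    and b accumulates phi(s) * cost over the sample.
    """
    k = phi(transitions[0][0]).shape[0]   # number of basis functions
    A = np.zeros((k, k))
    b = np.zeros(k)
    for s, cost, s_next in transitions:
        f = phi(s)
        A += np.outer(f, f - gamma * phi(s_next))
        b += f * cost
    # Small regularizer (reg) guards against a singular A in tiny samples.
    return np.linalg.solve(A + reg * np.eye(k), b)
```

For instance, with a single self-looping state, cost 1 per step, and discount 0.5, the estimated value is close to 1/(1 - 0.5) = 2, as the discounted-cost recursion predicts.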
Convergence Proofs of Least Squares Policy Iteration Algorithm for High-Dimensional Infinite Horizon Markov Decision Process Problems
Abstract

Cited by 2 (1 self)
Most of the current theory for dynamic programming algorithms focuses on finite state, finite action Markov decision problems, with a paucity of theory for the convergence of approximation algorithms with continuous states. In this paper we propose a policy iteration algorithm for infinite-horizon Markov decision problems where the state and action spaces are continuous and the expectation cannot be computed exactly. We show that an appropriately designed least squares (LS) or recursive least squares (RLS) method is provably convergent under certain problem structure assumptions on value functions. In addition, we show that the LS/RLS approximate policy iteration algorithm converges in the mean, meaning that the mean error between the approximate policy value function and the optimal value function shrinks to zero as successive approximations become more accurate. Furthermore, the convergence results are extended to the more general case of unknown basis functions. The core concept of dynamic programming for solving a Markov decision process (MDP) is Bellman's equation, which is often written in the standard form (Puterman (1994)) Vt(St) = max_x { C(St, xt) + γ E[Vt+1(St+1) | St, xt] }.
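For a finite-state, finite-action MDP, one step of the Bellman recursion quoted above can be sketched directly. This is an illustrative sketch, not the paper's algorithm (which handles continuous spaces via least squares); the function name `bellman_backup` and the array layout for costs `C` and transition kernels `P` are assumptions made for this example.

```python
import numpy as np

def bellman_backup(V_next, C, P, gamma=0.9):
    """One Bellman backup: V_t(s) = max_x { C(s,x) + gamma * E[V_{t+1}(S_{t+1}) | s, x] }.

    V_next : (nS,) value function at stage t+1
    C      : (nS, nA) one-stage contribution C(s, x)
    P      : (nA, nS, nS) transition kernels, P[a, s, s'] = Pr(s' | s, a)
    """
    # Expected next-stage value for each (s, a) pair.
    EV = np.einsum('ast,t->sa', P, V_next)
    Q = C + gamma * EV          # state-action values
    return Q.max(axis=1)        # maximize over actions
```

Repeating this backup to a fixed point yields the infinite-horizon optimal value function; the approximate policy iteration of the abstract replaces the exact expectation and tabular V with a least-squares fit over basis functions.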