## Projected Equations, Variational Inequalities, and Temporal Difference Methods (2009)

Citations: 6 (1 self)

### BibTeX

@MISC{Bertsekas09projectedequations,
    author = {Dimitri P. Bertsekas},
    title = {Projected Equations, Variational Inequalities, and Temporal Difference Methods},
    year = {2009}
}

### Abstract

We consider projected equations for approximate solution of high-dimensional fixed point problems within low-dimensional subspaces. We introduce an analytical framework based on an equivalence with variational inequalities (VIs), and a class of iterative feasible direction methods that may be implemented with low-dimensional simulation. These methods originated in approximate dynamic programming (DP), where they are collectively known as temporal difference (TD) methods. Even when specialized to DP, our methods include extensions/new versions of TD algorithms, which offer special implementation advantages and reduced overhead over the standard LSTD and LSPE methods. We demonstrate a sharp qualitative distinction between the deterministic and the simulation-based versions: the performance of the former is greatly affected by direction and feature scaling, yet the latter asymptotically perform identically, regardless of scaling.
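As a rough illustration of the projected equation described in the abstract, the sketch below solves x = ΠT(x) for a linear map T(x) = Ax + b in the deterministic (simulation-free) setting. It assumes a basis matrix `Phi`, a weight vector `xi` defining the projection norm, and the low-dimensional system Cr = d with C = Φ′Ξ(I − A)Φ and d = Φ′Ξb; these follow from the orthogonality condition quoted in the citation contexts below, but the function name, the random test problem, and the particular scaling matrix `D` are assumptions made for illustration only, not the paper's own algorithms. The last step only checks that the solution Φr does not change when the features are re-scaled; it does not reproduce the paper's convergence-rate results.

```python
import numpy as np

def solve_projected_equation(A, b, Phi, xi):
    """Return r with Phi @ r = Pi(A @ Phi @ r + b), Pi being the xi-weighted projection onto span(Phi)."""
    n = A.shape[0]
    Xi = np.diag(xi)
    C = Phi.T @ Xi @ (np.eye(n) - A) @ Phi   # s x s matrix
    d = Phi.T @ Xi @ b                        # s-vector
    return np.linalg.solve(C, d), C, d

rng = np.random.default_rng(0)
n, s, alpha = 50, 3, 0.9

# Hypothetical discounted problem: A = alpha * P with P a random row-stochastic matrix.
P = rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)
A, b = alpha * P, rng.random(n)
Phi = rng.standard_normal((n, s))            # illustrative basis (feature) matrix

# Projection weights: stationary distribution of P (left Perron eigenvector).
evals, evecs = np.linalg.eig(P.T)
xi = np.real(evecs[:, np.argmax(np.real(evals))])
xi /= xi.sum()

r_star, C, d = solve_projected_equation(A, b, Phi, xi)
x_star = Phi @ r_star

# Check the fixed point property x = Pi T(x) directly.
Xi = np.diag(xi)
Pi = Phi @ np.linalg.solve(Phi.T @ Xi @ Phi, Phi.T @ Xi)
fixed_point_gap = np.max(np.abs(x_star - Pi @ (A @ x_star + b)))

# Scaled iteration r <- r - gamma * D (C r - d); with this D and gamma = 1 it is
# projected value iteration, Phi r_{k+1} = Pi T(Phi r_k).
D, gamma = np.linalg.inv(Phi.T @ Xi @ Phi), 1.0
r = np.zeros(s)
for _ in range(500):
    r = r - gamma * D @ (C @ r - d)
iteration_gap = np.max(np.abs(r - r_star))

# Re-scaling the features (Phi -> Phi B, B invertible) leaves Phi r* unchanged.
B = np.array([[2.0, 1.0, 0.0], [0.0, 1.0, 0.0], [1.0, 0.0, 3.0]])
r_scaled, _, _ = solve_projected_equation(A, b, Phi @ B, xi)
scaling_gap = np.max(np.abs(Phi @ B @ r_scaled - x_star))

print(fixed_point_gap, iteration_gap, scaling_gap)
```

All three printed gaps should be close to machine precision. A simulation-based variant would presumably replace `C` and `d` with sample estimates, which this sketch does not attempt.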

### Citations

1231 | Learning to predict by the method of temporal differences - Sutton - 1988
Citation Context: ...class of algorithms for solving the DP/Bellman projected equation is based on simulation and a notion of residual known as temporal difference (TD). These algorithms include TD(λ) (proposed by Sutton [Sut88]), least squares policy evaluation (LSPE; originally proposed by Bertsekas and Ioffe [BeI96], and followed up by Nedić and Bertsekas [NeB03], Bertsekas, Borkar, and Nedić [BBN04], and Yu and Bertseka...

1207 | Markov Decision Processes: Discrete Stochastic Dynamic Programming - Puterman - 1994
Citation Context: ...o be discussed in the next section. In the case of a discounted MDP, where A = αP, α ∈ (0, 1) is a discount factor, and P is the transition matrix of the associated Markov chain (see [Ber07], [Put94]), the method reduces to the projected value iteration method, and is related to the LSPE method, which is its simulation-based implementation (see [Ber07], for a textbook account, and also [YuB06], w...

625 | Parallel and Distributed Computation: Numerical Methods - Bertsekas, Tsitsiklis - 1989
Citation Context: ...ositive stepsize, D is a positive definite symmetric matrix, and P_{D,R}[·] denotes projection on R with respect to the norm ‖r‖_D = √(r′Dr). The standard convergence result for these methods (see e.g., [BeT89], Section 3.5.3, or [PaF03], Section 12.1.1) assumes that F is Lipschitz continuous and strongly monotone over R, in the sense that for some scalars L > 0 and β > 0, we have ‖F(r₁) − F(r₂)‖ ≤ L‖r₁ −...

340 | Dynamic Programming and Optimal Control - Bertsekas - 1995
Citation Context: ...roximate linear programming (Schweitzer and Seidmann [ScS86]), the Bellman equation error method (see [BeT96], Section 6.10), or methods based on low-dimensional aggregation (see [BeT96], Section 6.7, [Ber05], Section 6.3.4). We pay particular attention to a class of iterative projection methods for monotone VIs, specially adapted to the solution of projected equations of very large dimension, and impleme...

265 | Monotone operators and the proximal point algorithm - Rockafellar - 1976
Citation Context: ...Cₖ is replaced by the positive definite matrix C and dₖ is replaced by d, the algorithm is a special case of the proximal point algorithm applied to monotone VIs (see Martinet [Mar70] and Rockafellar [Roc76]). A similar iteration based on Eq. (37) can be used in the more general case where R̂ ≠ ℜˢ. Another type of regularization approach for the case R̂ = ℜˢ is to replace the system Cₖr = dₖ equi...

217 | An analysis of temporal-difference learning with function approximation (Technical Report LIDS-P-2322) - Tsitsiklis, Van Roy - 1996
Citation Context: ...rojection on the subspace S, it is sufficient that ΠT rather than T be a contraction. The origin of the following proposition can be traced to the convergence proof of TD(λ) by Tsitsiklis and Van Roy [TsV97] (Lemma 9); see also [BeY08], Prop. 5. Proposition 2. Assume that Ŝ = S and that ΠT is a contraction with respect to the norm ‖·‖_Ξ over the subspace S. Then the function f of Eq. (5) is strongly m...

181 | Linear least-squares algorithms for temporal difference learning - Bradtke, Barto - 1996
Citation Context: ... followed up by Nedić and Bertsekas [NeB03], Bertsekas, Borkar, and Nedić [BBN04], and Yu and Bertsekas [YuB06]), least squares temporal differences (LSTD; originally proposed by Bradtke and Barto [BrB96], and followed up by Boyan [Boy02], and Nedić and Bertsekas [NeB03]), and the Fixed Point Kalman Filter (FPKF; proposed by Choi and Van Roy [ChV06]). The first three of these algorithms have been rece...

177 | On actor-critic algorithms - Konda, Tsitsiklis - 2003
Citation Context: ...l scaling obviate the need for matrix inversion without loss of computational efficiency. We also briefly mention a connection with the TD(0) and FPKF algorithms, but note that the analysis of Konda ([Kon02], Ch. 6), as well as an example given in Yu and Bertsekas [YuB07] show that even with optimal scaling, TD(0) and FPKF have worse convergence rate than our methods, which all converge at the optimal co...

170 | Approximate Dynamic Programming: Solving the Curses of Dimensionality - Powell - 2007
Citation Context: ...in the literature, has been extensively tested in practice, and is one of the major methods for approximate DP (see the books by Bertsekas and Tsitsiklis [BeT96], Sutton and Barto [SuB98], and Powell [Pow07]; Bertsekas [Ber07] provides a recent textbook treatment and up-to-date references). (Footnote 1) In our notation ℜˢ is the s-dimensional Euclidean space, all vectors in ℜˢ are viewed as column vectors, and a ...

156 | Iterative Methods for Sparse Linear Systems, 2nd Edition - Saad - 2003
Citation Context: ...a least squares formulation. The vector that minimizes ‖x − Ax − b‖² is approximated by an x ∈ S such that the residual (x − Ax − b) is orthogonal to U (this is known as the Petrov-Galerkin condition [Saa03]). If U = ΞS, where Ξ is a positive definite symmetric matrix, then the orthogonality condition is written as y′Ξ(x − Ax − b) = 0 for all y ∈ S, which together with the condition x ∈ S, is equivalen...

142 | The Linear Programming Approach to Approximate Dynamic Programming - de Farias, Van Roy
Citation Context: ... (1/(1 − α))‖x̄ − Πx̄‖. An interesting research question is to develop related error bounds for cases where ΠT is not a contraction. Such bounds are available for particular cases (de Farias and Van Roy [DFV03], Yu and Bertsekas [YuB08]). We will now show that contraction properties of T or ΠT imply that f is strongly monotone over Ŝ, in the sense that for some scalar β > 0, we have (f(x₁) − f(x₂))′(x...

130 | Efficient solution algorithms for factored MDPs - Guestrin, Koller, et al.
Citation Context: ...onstraint aggregation (combining constraints), exploitation of special problem structure, and special types of basis functions that implicitly take into account the constraints; see [CaC05], [DFV04], [GKP03], [GrH91], [MoK99], [PaT00], [ScP01], [TrZ93], [TrZ97], for related methods, analysis, and discussion of this issue in the context of approximate linear programming methods in DP and beyond. A possibl...

123 | Congestion-dependent pricing of network services - Paschalidis, Tsitsiklis
Citation Context: ...ining constraints), exploitation of special problem structure, and special types of basis functions that implicitly take into account the constraints; see [CaC05], [DFV04], [GKP03], [GrH91], [MoK99], [PaT00], [ScP01], [TrZ93], [TrZ97], for related methods, analysis, and discussion of this issue in the context of approximate linear programming methods in DP and beyond. A possible alternative is to elimina...

97 | Generalized polynomial approximations in Markovian decision processes - Schweitzer, Seidmann - 1985 |

96 | On constraint sampling in the linear programming approach to approximate dynamic programming - de Farias, Van Roy - 2001
Citation Context: ...xtent to the case where R is a strict subset of ℜⁿ, but when R is of the form R = {r | Φr ∈ X} for some set X ⊂ ℜⁿ, simulation may be necessary to approximate R (i.e., constraint sampling, [CaC05], [DFV04]), and efficient ways for doing so in the context of our methodology is a subject for further research. We note that there are interesting problems where the set R is a strict subset of ℜˢ and/or T is...

89 | Technical update: Least-squares temporal difference learning - Boyan - 1999
Citation Context: ...s, [NeB03], Bertsekas, Borkar, and Nedić [BBN04], and Yu and Bertsekas [YuB06]), least squares temporal differences (LSTD; originally proposed by Bradtke and Barto [BrB96], and followed up by Boyan [Boy02], and Nedić and Bertsekas [NeB03]), and the Fixed Point Kalman Filter (FPKF; proposed by Choi and Van Roy [ChV06]). The first three of these algorithms have been recently extended for approximate solu...

75 | Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing high-dimensional financial derivatives. IEEE Transactions on Automatic Control, 44(10):1840-1851 - Tsitsiklis, Van Roy - 1999
Citation Context: ...olution of general linear fixed point problems (see Bertsekas and Yu [BeY08]). There are also extensions to some nonlinear fixed point problems arising in optimal stopping (see Tsitsiklis and Van Roy [TsV99b], and Yu and Bertsekas [YuB07]), and more general contexts (Choi and Van Roy [ChV06], and Bertsekas and Yu [BeY08]). We note that solving the projected equation is closely related to Galerkin and fini...

71 | Stochastic Approximation: A Dynamical Systems Viewpoint - Borkar
Citation Context: ... [BBN04] and proved in [YuB06] (the rate is optimal among TD methods for approximate DP, in the sense of Konda [Kon02], Ch. 6). The proof idea is based on a two-time scale argument (see, e.g., Borkar [Bor08], Ch. 6). Since for any Φ, the high-dimensional sequence ΦCₖ⁻¹dₖ does not depend on Φ (by Prop. 3), for any Dₖ and γ that lead to convergence, the simulation-based iteration (26) produces asymptoti...

63 | Least squares policy evaluation algorithms with linear function approximation - Nedić, Bertsekas - 2003
Citation Context: ...e (TD). These algorithms include TD(λ) (proposed by Sutton [Sut88]), least squares policy evaluation (LSPE; originally proposed by Bertsekas and Ioffe [BeI96], and followed up by Nedić and Bertsekas [NeB03], Bertsekas, Borkar, and Nedić [BBN04], and Yu and Bertsekas [YuB06]), least squares temporal differences (LSTD; originally proposed by Bradtke and Barto [BrB96], and followed up by Boyan [Boy02], a...

52 | Régularisation d’inéquations variationnelles par approximations successives - Martinet - 1970
Citation Context: ...tion. In the case where Cₖ is replaced by the positive definite matrix C and dₖ is replaced by d, the algorithm is a special case of the proximal point algorithm applied to monotone VIs (see Martinet [Mar70] and Rockafellar [Roc76]). A similar iteration based on Eq. (37) can be used in the more general case where R̂ ≠ ℜˢ. Another type of regularization approach for the case R̂ = ℜˢ is to replace ...

48 | Solution of large-scale symmetric travelling salesman problems - Grötschel, Holland - 1991
Citation Context: ... aggregation (combining constraints), exploitation of special problem structure, and special types of basis functions that implicitly take into account the constraints; see [CaC05], [DFV04], [GKP03], [GrH91], [MoK99], [PaT00], [ScP01], [TrZ93], [TrZ97], for related methods, analysis, and discussion of this issue in the context of approximate linear programming methods in DP and beyond. A possible alterna...

40 | Temporal differences-based policy iteration and applications in neuro-dynamic programming - Bertsekas, Ioffe - 1996
Citation Context: ... a notion of residual known as temporal difference (TD). These algorithms include TD(λ) (proposed by Sutton [Sut88]), least squares policy evaluation (LSPE; originally proposed by Bertsekas and Ioffe [BeI96], and followed up by Nedić and Bertsekas [NeB03], Bertsekas, Borkar, and Nedić [BBN04], and Yu and Bertsekas [YuB06]), least squares temporal differences (LSTD; originally proposed by Bradtke and B...

34 | Projection Methods for Variational Inequalities with Application to the Traffic Assignment Problem - Bertsekas, Gafni - 1982
Citation Context: ... despite the lack of strong monotonicity of F, it turns out that this iteration is convergent in a way similar to the case where F is strongly monotone. In particular, a paper by Bertsekas and Gafni [BeG82], devoted to the convergence analysis of iteration (11), has shown that there exists γ̄ > 0 such that rₖ → r* linearly for each γ ∈ (0, γ̄], where r* is some solution of f(Φr*)′Φ(r − r*) ≥ 0, ...

33 | A Generalized Kalman Filter for Fixed Point Approximation and Efficient Temporal-Difference Learning. Discrete Event Dynamic Systems - Choi, Van Roy - 2006
Citation Context: ...nces (LSTD; originally proposed by Bradtke and Barto [BrB96], and followed up by Boyan [Boy02], and Nedić and Bertsekas [NeB03]), and the Fixed Point Kalman Filter (FPKF; proposed by Choi and Van Roy [ChV06]). The first three of these algorithms have been recently extended for approximate solution of general linear fixed point problems (see Bertsekas and Yu [BeY08]). There are also extensions to some non...

30 | Direct value-approximation for factored MDPs - Schuurmans, Patrascu
Citation Context: ...straints), exploitation of special problem structure, and special types of basis functions that implicitly take into account the constraints; see [CaC05], [DFV04], [GKP03], [GrH91], [MoK99], [PaT00], [ScP01], [TrZ93], [TrZ97], for related methods, analysis, and discussion of this issue in the context of approximate linear programming methods in DP and beyond. A possible alternative is to eliminate the co...

29 | Spline approximations to value functions: A linear programming approach - Trick, Zin - 1997
Citation Context: ...ation of special problem structure, and special types of basis functions that implicitly take into account the constraints; see [CaC05], [DFV04], [GKP03], [GrH91], [MoK99], [PaT00], [ScP01], [TrZ93], [TrZ97], for related methods, analysis, and discussion of this issue in the context of approximate linear programming methods in DP and beyond. A possible alternative is to eliminate the constraint x ∈ X by ...

26 | New linear program performance bound for queuing networks - Morrison, Kumar - 1999
Citation Context: ...ion (combining constraints), exploitation of special problem structure, and special types of basis functions that implicitly take into account the constraints; see [CaC05], [DFV04], [GKP03], [GrH91], [MoK99], [PaT00], [ScP01], [TrZ93], [TrZ97], for related methods, analysis, and discussion of this issue in the context of approximate linear programming methods in DP and beyond. A possible alternative is t...

25 | Improved temporal difference methods with linear function approximation - Bertsekas, Borkar, et al. - 2004
Citation Context: ... (proposed by Sutton [Sut88]), least squares policy evaluation (LSPE; originally proposed by Bertsekas and Ioffe [BeI96], and followed up by Nedić and Bertsekas [NeB03], Bertsekas, Borkar, and Nedić [BBN04], and Yu and Bertsekas [YuB06]), least squares temporal differences (LSTD; originally proposed by Bradtke and Barto [BrB96], and followed up by Boyan [Boy02], and Nedić and Bertsekas [NeB03]), and t...

25 | On the existence of fixed points for approximate value iteration and temporal-difference learning - de Farias, Van Roy
Citation Context: ... the Markov chain. When these algorithms are extended to solve nonlinear versions of Bellman’s equation, they become unreliable because in the nonlinear context, ΠT need not be a contraction [BeT96], [DFV00] (a notable exception is optimal stopping problems, as shown by Tsitsiklis and Van Roy [TsV97], [TsV99b]; see also Yu and Bertsekas [YuB07]). B. Galerkin Approximation This is an older methodology, wh...

21 | Projected Equation Methods for Approximate Solution of Large Linear Systems - Bertsekas, Yu
Citation Context: ...ter (FPKF; proposed by Choi and Van Roy [ChV06]). The first three of these algorithms have been recently extended for approximate solution of general linear fixed point problems (see Bertsekas and Yu [BeY08]). There are also extensions to some nonlinear fixed point problems arising in optimal stopping (see Tsitsiklis and Van Roy [TsV99b], and Yu and Bertsekas [YuB07]), and more general contexts (Choi and...

17 | Convergence results for some temporal difference methods based on least squares - Yu, Bertsekas - 2006
Citation Context: ...east squares policy evaluation (LSPE; originally proposed by Bertsekas and Ioffe [BeI96], and followed up by Nedić and Bertsekas [NeB03], Bertsekas, Borkar, and Nedić [BBN04], and Yu and Bertsekas [YuB06]), least squares temporal differences (LSTD; originally proposed by Bradtke and Barto [BrB96], and followed up by Boyan [Boy02], and Nedić and Bertsekas [NeB03]), and the Fixed Point Kalman Filter (FP...

17 | A least squares Q-learning algorithm for optimal stopping problems - Yu, Bertsekas - 2007 |

16 | New Error Bounds for Approximations from Projected Linear Equations, Lab. for Information and Decision Systems Report LIDS-P-2797 - Yu, Bertsekas - 2008
Citation Context: ...teresting research question is to develop related error bounds for cases where ΠT is not a contraction. Such bounds are available for particular cases (de Farias and Van Roy [DFV03], Yu and Bertsekas [YuB08]). We will now show that contraction properties of T or ΠT imply that f is strongly monotone over Ŝ, in the sense that for some scalar β > 0, we have (f(x₁) − f(x₂))′(x₁ − x₂) ≥ β‖x₁ − x₂‖², ∀...

12 | Adaptive importance sampling technique for Markov chains using stochastic approximation - Ahamed, Borkar, et al. - 2006
Citation Context: ... is usually chosen to be the transition matrix of the Markov chain used for row sampling, but this need not be the case, as has been suggested by several authors, including Ahamed, Borkar, and Juneja [ABJ06], within a specialized approximate DP context, and Bertsekas and Yu [BeY08] for the general context of approximate solution of projected equations. The covariance of the simulation error (Cₖ − C, dₖ − d) ...

8 | A Linear Programming Approach to Solving Dynamic Programs. Working Paper - Trick, Zin - 1993
Citation Context: ..., exploitation of special problem structure, and special types of basis functions that implicitly take into account the constraints; see [CaC05], [DFV04], [GKP03], [GrH91], [MoK99], [PaT00], [ScP01], [TrZ93], [TrZ97], for related methods, analysis, and discussion of this issue in the context of approximate linear programming methods in DP and beyond. A possible alternative is to eliminate the constraint ...

5 | Linear Algebra and Its Applications, 4th ed. - Strang - 2006
Citation Context: ...e, the equation approximation approach still applies. In particular, the vector ΦC⁺d is still a solution of the projected equation x = ΠT(x), where C⁺ denotes the pseudoinverse of C (see e.g., Strang [Str05], Ch. 7). Furthermore, ΦCₖ⁺dₖ converges to that solution, provided Cₖ → C and dₖ → d. Let us also note a similar LSTD-type algorithm for the more general case where the projection is onto a polyhed...

2 | Approximate Solution of Operator Equations, translated by Louvish, D. Wolters-Noordhoff Pub - Krasnoselskii - 1972
Citation Context: ...ected equation is closely related to Galerkin and finite-element methods for the solution of large-scale computation problems arising in partial differential equations and related contexts (see e.g., [Kra72], [Fle84]). For fixed point problems, these methods solve a projected equation, where the projection is on a subspace of basis functions. However, the use of the Monte-Carlo simulation ideas that are ...