## New Error Bounds for Approximations from Projected Linear Equations (2008)

Citations: 16 (9 self-citations)

### BibTeX

```bibtex
@MISC{Yu08newerror,
  author = {Huizhen Yu and Dimitri P. Bertsekas},
  title = {New Error Bounds for Approximations from Projected Linear Equations},
  year = {2008}
}
```

### Abstract

We consider linear fixed point equations and their approximations by projection on a low-dimensional subspace. We derive new bounds on the approximation error of the solution, which are expressed in terms of low-dimensional matrices and can be computed by simulation. When the fixed point mapping is a contraction, as is typically the case in Markovian decision processes (MDP), one of our bounds is always sharper than the standard worst-case bounds, and another one is often sharper. Our bounds also apply to the non-contraction case, including policy evaluation in MDP with nonstandard projections that enhance exploration. There are no error …
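To make the setting concrete, here is a minimal NumPy sketch, assuming a policy-evaluation problem with $A = \alpha P$ and $b = g$, a projection $\Pi$ onto the span of the columns of a feature matrix Φ that is orthogonal with respect to the ξ-weighted Euclidean norm, and ξ the invariant distribution of $P$. The transition matrix, costs, and features are illustrative, and the helper names (`stationary_distribution`, `projected_solution`, `xi_norm`) are hypothetical. The sketch solves the projected equation $x = \Pi(Ax + b)$ through a $k \times k$ system and compares the resulting error ratio with the standard worst-case bound $1/\sqrt{1-\alpha^2}$; it does not implement the paper's new bounds.

```python
import numpy as np


def stationary_distribution(P):
    """Invariant distribution xi of an irreducible stochastic matrix P (xi @ P == xi)."""
    n = P.shape[0]
    # Stack the equations xi (P - I) = 0 with sum(xi) = 1; the system is consistent.
    M = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    rhs = np.zeros(n + 1)
    rhs[-1] = 1.0
    xi, *_ = np.linalg.lstsq(M, rhs, rcond=None)
    return xi


def projected_solution(A, b, Phi, xi):
    """Solve x = Pi(Ax + b) via the k x k system Phi' Xi (I - A) Phi r = Phi' Xi b."""
    Xi = np.diag(xi)
    C = Phi.T @ Xi @ (np.eye(len(b)) - A) @ Phi
    d = Phi.T @ Xi @ b
    return Phi @ np.linalg.solve(C, d)


def xi_norm(v, xi):
    """Weighted Euclidean norm ||v||_xi = sqrt(sum_i xi_i v_i^2)."""
    return float(np.sqrt(np.sum(xi * v**2)))


alpha = 0.5
P = np.array([[0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4],
              [0.5, 0.3, 0.2]])
g = np.array([1.0, 0.0, 2.0])
Phi = np.array([[1.0, 0.0],
                [1.0, 1.0],
                [1.0, 2.0]])

A, b = alpha * P, g
xi = stationary_distribution(P)
Xi = np.diag(xi)
Pi = Phi @ np.linalg.solve(Phi.T @ Xi @ Phi, Phi.T @ Xi)  # n x n projection onto span(Phi)

x_star = np.linalg.solve(np.eye(3) - A, b)  # exact solution of x = Ax + b
x_bar = projected_solution(A, b, Phi, xi)   # solution of the projected equation

amplification = xi_norm(x_star - x_bar, xi) / xi_norm(x_star - Pi @ x_star, xi)
standard_bound = 1.0 / np.sqrt(1.0 - alpha**2)
print(f"error ratio {amplification:.4f} <= standard bound {standard_bound:.4f}")
```

Because $\bar x$ lies in the subspace and $\Pi x^*$ is the best ξ-weighted approximation of $x^*$ there, the ratio is always at least 1; the standard bound caps it here only because $\|\Pi A\|_\xi \le \alpha$ in this contraction setting.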

### Citations

4987 | Matrix Analysis - Horn, Johnson - 1985 |

1299 | Learning to predict by the methods of temporal differences
- Sutton
- 1988
Citation Context ...while k << n. This approach is common in approximate dynamic programming (DP) for Markov decision processes (MDP), and has been central in much of the recent research on the subject (see e.g., Sutton [14], Tsitsiklis and Van Roy [17], Bertsekas and Tsitsiklis [3], Sutton and Barto [15], Bertsekas [2]). Let us give the background of two important applications in this context. For policy iteration algor... |

918 | Reinforcement Learning
- Sutton, Barto
- 1998
Citation Context ...Markov decision processes (MDP), and has been central in much of the recent research on the subject (see e.g., Sutton [14], Tsitsiklis and Van Roy [17], Bertsekas and Tsitsiklis [3], Sutton and Barto [15], Bertsekas [2]). Let us give the background of two important applications in this context. For policy iteration algorithms, the evaluation of the cost vector of a fixed policy requires solution of th... |

794 | Nonlinear Programming. Athena Scientific - Bertsekas - 1999 |

351 | Dynamic Programming and Optimal Control - Bertsekas - 1995 |

238 | An analysis of temporal-difference learning with function approximation
- Tsitsiklis, Van Roy
- 1997
Citation Context ...s common in approximate dynamic programming (DP) for Markov decision processes (MDP), and has been central in much of the recent research on the subject (see e.g., Sutton [14], Tsitsiklis and Van Roy [17], Bertsekas and Tsitsiklis [3], Sutton and Barto [15], Bertsekas [2]). Let us give the background of two important applications in this context. For policy iteration algorithms, the evaluation of the ... |

190 | Linear least-squares algorithms for temporal difference learning
- Bradtke, Barto
- 1996
Citation Context ... and 2.2. The matrix in the first bound is easy to compute for all TD-type methods, and in fact it can be readily computed as a by-product of least-squares-based TD algorithms (e.g., Bradtke and Barto [6], Boyan [5], Nedić and Bertsekas [13], Bertsekas and Yu [4]). The second bound is sharper than the first. It is in fact tight for a worst-case choice of b; see Prop. 2.1 and Remark 2.3. Computing t... |
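The snippet notes that the low-dimensional matrices come almost for free from least-squares TD methods. As an illustration only, the sketch below forms LSTD(0)-style sample averages along one simulated trajectory, assuming policy evaluation with $A = \alpha P$, $b = g$, and ξ the invariant distribution of $P$; the averages converge to $C = \Phi'\Xi(I - \alpha P)\Phi$ and $d = \Phi'\Xi g$, whose solution $r$ gives $\bar x = \Phi r$. The chain, costs, features, and helper names (`simulate_chain`, `lstd_estimates`) are hypothetical, and the specific matrix that enters the paper's first bound is not reproduced here.

```python
import bisect

import numpy as np


def simulate_chain(P, start, T, rng):
    """Sample states i_0, ..., i_T of the Markov chain with transition matrix P."""
    cum = [np.cumsum(row).tolist() for row in P]
    states = [start]
    for u in rng.random(T).tolist():
        row = cum[states[-1]]
        # Inverse-CDF sampling; min() guards against round-off in the last cumulative sum.
        states.append(min(bisect.bisect_right(row, u), len(row) - 1))
    return np.array(states)


def lstd_estimates(states, Phi, g, alpha):
    """Sample averages estimating C = Phi' Xi (I - alpha P) Phi and d = Phi' Xi g."""
    cur, nxt = states[:-1], states[1:]
    phi, phi_next = Phi[cur], Phi[nxt]
    T = len(cur)
    C_hat = phi.T @ (phi - alpha * phi_next) / T
    d_hat = phi.T @ g[cur] / T
    return C_hat, d_hat


alpha = 0.5
P = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])
xi = np.array([0.25, 0.5, 0.25])  # invariant distribution of this P
g = np.array([1.0, 0.0, 2.0])
Phi = np.array([[1.0, 0.0],
                [1.0, 1.0],
                [1.0, 2.0]])

rng = np.random.default_rng(0)
states = simulate_chain(P, start=1, T=500_000, rng=rng)
C_hat, d_hat = lstd_estimates(states, Phi, g, alpha)
r_hat = np.linalg.solve(C_hat, d_hat)

# Exact k x k quantities for comparison (available here only because P is known).
Xi = np.diag(xi)
C = Phi.T @ Xi @ (np.eye(3) - alpha * P) @ Phi
d = Phi.T @ Xi @ g
r_exact = np.linalg.solve(C, d)
print("simulated r:", r_hat, "exact r:", r_exact)
```

For this chain the exact solution is $r = (5/6, 2/3)$, and the simulated estimate should approach it as the trajectory grows, because the empirical state frequencies of a single long trajectory approach ξ.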

187 | Actor-Critic algorithms
- Konda, Tsitsiklis
- 2000
Citation Context ...In addition, basis vectors of Ŝ can also be generated from Φ by using simulation (we estimate the “mean feature,” ξ′Φ, and subtract it from the rows of Φ; see e.g., [Kon02]), along with the approximation of the matrices B and M and without incurring much computational overhead. Figure 4 illustrates the error bounds and shows how the use of Ŝ may improve them. It can be... |
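The “mean feature” step can be sketched as follows, assuming the states come from one trajectory of a Markov chain whose invariant distribution is ξ: the average of the visited feature rows estimates $\xi'\Phi$, and subtracting it from each row of Φ yields features whose ξ-weighted mean is approximately zero. The snippet does not show how Ŝ then enters the bounds, so the sketch stops at the centering step. The chain, the features (chosen without a constant column, which centering would turn into zeros), and the function names are hypothetical.

```python
import bisect

import numpy as np


def simulate_chain(P, start, T, rng):
    """Sample states i_0, ..., i_T of the Markov chain with transition matrix P."""
    cum = [np.cumsum(row).tolist() for row in P]
    states = [start]
    for u in rng.random(T).tolist():
        row = cum[states[-1]]
        states.append(min(bisect.bisect_right(row, u), len(row) - 1))
    return np.array(states)


def center_features(Phi, states):
    """Estimate the mean feature xi' Phi by a trajectory average and subtract it from each row."""
    mean_hat = Phi[states].mean(axis=0)
    return Phi - mean_hat, mean_hat


P = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])
xi = np.array([0.25, 0.5, 0.25])  # invariant distribution of this P, used only for checking
Phi = np.array([[0.0, 1.0],
                [1.0, 0.0],
                [2.0, 1.0]])

rng = np.random.default_rng(1)
states = simulate_chain(P, start=0, T=200_000, rng=rng)
Phi_centered, mean_hat = center_features(Phi, states)
print("estimated mean feature:", mean_hat, "exact:", xi @ Phi)
```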

131 | Dynamic Programming and Optimal Control - Bertsekas - 1995 |

100 | Least-squares temporal difference learning
- Boyan
- 1999
Citation Context ...the availability of computable error bounds for non-contraction mappings facilitates the design of policy evaluation algorithms with improved exploration. In particular, we can use the LSTD algorithm [Boy99] to evaluate the cost or the Q-factor of a policy using special sampling methods that enhance exploration, and use the bound of Theorem 1 to estimate the corresponding amplification ratio. 4 Alternati... |

94 | Optimal Stopping of Markov Processes: Hilbert Space Theory, Approximation Algorithms, and an Application to Pricing High-Dimensional Financial Derivatives
- Tsitsiklis, Van Roy
- 1999
Citation Context ...age costs associated with stopping, and the minimization in $\min\{c, x\}$ is component-wise. Let ξ be the invariant distribution of the Markov chain. Algorithms analogous to TD(0) (Tsitsiklis and Van Roy [19], Choi and Van Roy [7], Yu and Bertsekas [21, 22]) solve the projected Bellman equation, $x = \Pi g + \alpha \Pi P \min\{c, x\}$, which is also nonlinear and has a unique solution $\bar x$ due to the contraction property of... |
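Since the projected Bellman equation for optimal stopping has a unique solution because of a contraction property in $\|\cdot\|_\xi$, one model-based way to illustrate it is plain fixed-point iteration $x_{t+1} = \Pi g + \alpha \Pi P \min\{c, x_t\}$. This is not one of the simulation-based TD(0)-type algorithms cited above; the sketch assumes $P$, $g$, $c$, and ξ are known exactly, and all numbers and names are illustrative.

```python
import numpy as np

alpha = 0.9
P = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])
xi = np.array([0.25, 0.5, 0.25])  # invariant distribution of this P
g = np.array([1.0, 0.5, 2.0])     # illustrative cost vector g
c = np.array([5.0, 8.0, 3.0])     # illustrative stopping-cost vector c
Phi = np.array([[1.0, 0.0],
                [1.0, 1.0],
                [1.0, 2.0]])

Xi = np.diag(xi)
Pi = Phi @ np.linalg.solve(Phi.T @ Xi @ Phi, Phi.T @ Xi)  # xi-orthogonal projection


def projected_bellman(x):
    """x -> Pi g + alpha Pi P min{c, x}, with the minimum taken component-wise."""
    return Pi @ g + alpha * (Pi @ (P @ np.minimum(c, x)))


x = np.zeros(3)
for _ in range(1000):  # contraction with modulus alpha in ||.||_xi, so this converges
    x = projected_bellman(x)
x_bar = x
print("approximate stopping-problem solution:", x_bar)
```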

65 | Least squares policy evaluation algorithms with linear function approximation. Discrete Event Dynamic Systems: Theory and Applications
- Nedić, Bertsekas
- 2003
Citation Context ...und is easy to compute for all TD-type methods, and in fact it can be readily computed as a by-product of least-squares-based TD algorithms (e.g., Bradtke and Barto [6], Boyan [5], Nedić and Bertsekas [13], Bertsekas and Yu [4]). The second bound is sharper than the first. It is in fact tight for a worst-case choice of b; see Prop. 2.1 and Remark 2.3. Computing the matrix in the bound using simula... |

56 | Error bounds for approximate policy iteration
- Munos
- 2003
Citation Context ...ul in the context of MDP for designing exploration mechanisms. Furthermore, in the context of MDP, these bounds can be used in performance bounds for approximate policy iteration, such as the ones of [Mun03]. One potential use of our bounds is to suggest changes in the projected equation in order to reduce the amplification ratio. For example, extensive computational experience with TD(λ) methods suggest... |

35 | A generalized Kalman filter for fixed point approximation and efficient temporal-difference learning. Discrete Event Dynamic Systems
- Choi, Van Roy
- 2006
Citation Context ...th stopping, and the minimization in $\min\{c, x\}$ is component-wise. Let ξ be the invariant distribution of the Markov chain. Algorithms analogous to TD(0) (Tsitsiklis and Van Roy [19], Choi and Van Roy [7], Yu and Bertsekas [21, 22]) solve the projected Bellman equation, $x = \Pi g + \alpha \Pi P \min\{c, x\}$, which is also nonlinear and has a unique solution $\bar x$ due to the contraction property of the mapping $\alpha \Pi P \min\{c, \cdot\}$... |

23 | Dynamic Programming and Optimal Control, volume II. Athena Scientific, 2nd edition
- Bertsekas
- 2001
Citation Context ...first bound (see e.g., [2, 10]) holds if $\|\Pi A\| = \alpha < 1$ with respect to some norm $\|\cdot\|$, and has the form $\|x^* - \bar x\| \le \frac{1}{1-\alpha}\,\|x^* - \Pi x^*\|$ (2). The second bound (see e.g., [11, 1]) holds in the usual case where $\Pi A$ is a contraction with respect to the Euclidean norm $\|\cdot\|_\xi$, with ξ being the invariant distribution of the Markov chain underlying the problem, i.e., $\|\Pi A\|_\xi = \alpha < 1$. ... |
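A short derivation of the first bound may help. It assumes that the exact and projected solutions satisfy $x^* = Ax^* + b$ and $\bar x = \Pi(A\bar x + b)$, so that $\Pi x^* = \Pi(Ax^* + b)$, and that $\|\Pi A\| = \alpha < 1$ in the matrix norm induced by $\|\cdot\|$:

```latex
\begin{aligned}
x^* - \bar x &= (x^* - \Pi x^*) + \Pi(Ax^* + b) - \Pi(A\bar x + b)
             = (x^* - \Pi x^*) + \Pi A\,(x^* - \bar x), \\
\|x^* - \bar x\| &\le \|x^* - \Pi x^*\| + \alpha\,\|x^* - \bar x\|
\quad\Longrightarrow\quad
\|x^* - \bar x\| \le \frac{1}{1-\alpha}\,\|x^* - \Pi x^*\|.
\end{aligned}
```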

22 | Counter-Example to Temporal Differences Learning
- Bertsekas
- 1994
Citation Context ...distance” ratio, $\frac{\|x^* - \bar x\|_\xi}{\|x^* - \Pi x^*\|_\xi}$ and $\frac{\|\bar x - \Pi x^*\|_\xi}{\|x^* - \Pi x^*\|_\xi}$, respectively. We note that these two ratios can be large and a significant cause for concern, as illustrated by examples given in [Ber95] (see also [BT96, Ex. 6.5, pp. 288-289]). Figure 1 illustrates the relation between the bound, $x^*$, and $\bar x$. ... |
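When Π is the orthogonal projection with respect to $\|\cdot\|_\xi$, the two ratios are tied together: $x^* - \Pi x^*$ is ξ-orthogonal to the subspace, while $\Pi x^* - \bar x$ lies in it. The Pythagorean theorem then gives the following standard identity (a worked step, not a quotation from the paper), assuming $x^* \ne \Pi x^*$:

```latex
\|x^* - \bar x\|_\xi^2 = \|x^* - \Pi x^*\|_\xi^2 + \|\bar x - \Pi x^*\|_\xi^2
\quad\Longrightarrow\quad
\left(\frac{\|x^* - \bar x\|_\xi}{\|x^* - \Pi x^*\|_\xi}\right)^2
= 1 + \left(\frac{\|\bar x - \Pi x^*\|_\xi}{\|x^* - \Pi x^*\|_\xi}\right)^2 .
```

In particular, the first ratio is never below 1, and it is large exactly when the second one is.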

22 | Projected equation methods for approximate solution of large linear systems
- Bertsekas, Yu
Citation Context ...see e.g. Krasnosel’skii et al. [11]. For example, important finite element and other methods for solving partial differential equations belong to the Galerkin class. In our recent paper Bertsekas and Yu [4], we have extended TD-type methods to the case where A is an arbitrary matrix, subject only to the restriction that $I - \Pi A$ is invertible, using the Monte-Carlo simulation ideas that are central in app... |

19 | Average cost temporal-difference learning
- Tsitsiklis, Van Roy
- 1999
Citation Context ...lis [3], Tsitsiklis and Van Roy [17]) holds if $\|\Pi A\| = \alpha < 1$ with respect to some norm $\|\cdot\|$, and has the form $\|x^* - \bar x\| \le \frac{1}{1-\alpha}\,\|x^* - \Pi x^*\|$ (3). The second bound (see e.g., Tsitsiklis and Van Roy [18], Bertsekas [2]) holds in the case where $\Pi A$ is a contraction with respect to the Euclidean norm $\|\cdot\|_\xi$, with ξ being the invariant distribution of the Markov chain underlying the problem, i.e., $\|\Pi A\|_\xi$ ... |
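The snippet breaks off before stating the second bound. Assuming $x^* = Ax^* + b$, $\bar x = \Pi(A\bar x + b)$, Π orthogonal with respect to $\|\cdot\|_\xi$, and $\|\Pi A\|_\xi = \alpha < 1$, the bound of the form $1/\sqrt{1-\alpha^2}$ (the constant that appears in the optimal-stopping snippets on this page) follows from the orthogonality of $x^* - \Pi x^*$ to the subspace and the relation $\Pi x^* - \bar x = \Pi A\,(x^* - \bar x)$:

```latex
\begin{aligned}
\|x^* - \bar x\|_\xi^2
  &= \|x^* - \Pi x^*\|_\xi^2 + \|\Pi x^* - \bar x\|_\xi^2
   = \|x^* - \Pi x^*\|_\xi^2 + \|\Pi A\,(x^* - \bar x)\|_\xi^2 \\
  &\le \|x^* - \Pi x^*\|_\xi^2 + \alpha^2\,\|x^* - \bar x\|_\xi^2
  \quad\Longrightarrow\quad
  \|x^* - \bar x\|_\xi \le \frac{1}{\sqrt{1-\alpha^2}}\,\|x^* - \Pi x^*\|_\xi .
\end{aligned}
```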

18 | A least squares Q-learning algorithm for optimal stopping problems
- Yu, Bertsekas
- 2006
Citation Context ...rror bounds with $A = \alpha P I_{\bar x}$ to bound $\hat x - \bar x$, once $\bar x$ is computed and, consequently, the matrices and vectors in Eq. (31) are available. The matrices in the bounds can be estimated similarly to those in [YB06]. Thus the new error bounds can provide supplementary information about the approximation quality, in addition to the error bounds based on the contraction property [TV99b, Van07]. 3.2 Large General S... |

17 | Approximate Solution of Operator Equations
- Stetsenko
- 1972
Citation Context ...he projected equation approach of Eq. (1) belongs to the class of Galerkin methods, and finds broad application in the approximate solution of linear operator equations; see e.g. Krasnosel’skii et al. [11]. For example, important finite element and other methods for solving partial differential equations belong to the Galerkin class. In our recent paper Bertsekas and Yu [4], we have extended TD-type me... |

10 | The many proofs of an identity on the Norm of oblique projections. Numerical Algorithms 42
- Szyld
- 2006
Citation Context ...ere the matrix C is given in Eq. (47), exploit the idempotent property of C (i.e., $C^2 = C$) and its implication that $\|C - I\|_\xi = \|C\|_\xi$ when C is neither the identity nor the zero matrix (see e.g., Szyld [16]), and then proceed with Lemma 2.2 to obtain the bound. The matrices $\tilde E^{-1}$ and $\tilde R$ have similar forms to the matrices B and R, respectively, with the matrix LΦ in place of the matrix Φ in B and R. Th... |
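The identity $\|C - I\|_\xi = \|C\|_\xi$ for an idempotent $C$ other than $0$ and $I$ is easy to check numerically. The sketch below uses a randomly generated oblique projection $C = V(W'V)^{-1}W'$ (not the matrix of Eq. (47), which the snippet does not show) and hypothetical names; it relies on the fact that the matrix norm induced by $\|\cdot\|_\xi$ equals the spectral norm of $\Xi^{1/2} C\,\Xi^{-1/2}$.

```python
import numpy as np


def xi_matrix_norm(M, xi):
    """Norm induced by ||v||_xi = sqrt(sum_i xi_i v_i^2), i.e. ||Xi^(1/2) M Xi^(-1/2)||_2."""
    s = np.sqrt(xi)
    return np.linalg.norm(s[:, None] * M / s[None, :], 2)


rng = np.random.default_rng(1)
n, k = 6, 2
V = rng.standard_normal((n, k))
W = rng.standard_normal((n, k))
C = V @ np.linalg.solve(W.T @ V, W.T)  # oblique projection onto span(V): C @ C == C
xi = rng.random(n) + 0.1
xi /= xi.sum()                          # positive weights summing to 1

norm_C = xi_matrix_norm(C, xi)
norm_I_minus_C = xi_matrix_norm(np.eye(n) - C, xi)
print(norm_C, norm_I_minus_C)           # equal up to round-off
```

Any nonzero idempotent also has norm at least 1, so both values are at least 1; they exceed 1 whenever the projection is oblique in the ξ-geometry.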

6 | Q-Learning Algorithms for Optimal Stopping Based on Least Squares
- Yu, Bertsekas
- 2007
Citation Context ...inimization in $\min\{c, x\}$ is component-wise. Let ξ be the invariant distribution of the Markov chain. Algorithms analogous to TD(0) (Tsitsiklis and Van Roy [19], Choi and Van Roy [7], Yu and Bertsekas [21, 22]) solve the projected Bellman equation, $x = \Pi g + \alpha \Pi P \min\{c, x\}$, which is also nonlinear and has a unique solution $\bar x$ due to the contraction property of the mapping $\alpha \Pi P \min\{c, \cdot\}$ with respect to $\|\cdot\|_\xi$... |

5 | Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing financial derivatives
- Tsitsiklis, Van Roy
- 1999
Citation Context ...ear, $x = \Pi g + \alpha \Pi P \min\{c, x\}$. Based on the contraction property of the mapping $\alpha \Pi P \min\{c, \cdot\}$ with respect to $\|\cdot\|_\xi$, there is an error bound on the approximating solution $\bar x$ analogous to the bound (4) [TV99b]: $\|\bar x - x^*\|_\xi \le \frac{1}{\sqrt{1-\alpha^2}}\,\|x^* - \Pi x^*\|_\xi$, and such an error bound is also useful in bounding the performance of suboptimal policies constructed based on $\bar x$ [Van07]. To apply our error bounds, we shall form a ... |

4 | On regression-based stopping times. Discrete Event Dynam
- Van Roy
Citation Context ...: $\|\bar x - x^*\|_\xi \le \frac{1}{\sqrt{1-\alpha^2}}\,\|x^* - \Pi x^*\|_\xi$ (Tsitsiklis and Van Roy [19]), and such an error bound is also useful in bounding the performance of suboptimal policies constructed based on $\bar x$ (Van Roy [20]). To apply our error bounds, we will form a linear equation based on the approximating solution $\bar x$, which satisfies $\bar x = \Pi g + \alpha \Pi P \min\{c, \bar x\} = \Pi g + \alpha \Pi P (I - I_{\bar x}) c + \alpha \Pi P I_{\bar x} \bar x$ (54), where $I_{\bar x}$ is an ... |

1 | On regression-based stopping times - Van Roy - 2006 |

1 | On regression-based stopping times
- Van Roy
- 2007
Citation Context ...he bound (4): $\|\bar x - x^*\|_\xi \le \frac{1}{\sqrt{1-\alpha^2}}\,\|x^* - \Pi x^*\|_\xi$ (Tsitsiklis and Van Roy [19]), and such an error bound is also useful in bounding the performance of suboptimal policies constructed based on $\bar x$ (Van Roy [20]). To apply our error bounds, we will form a linear equation based on the approximating solution $\bar x$, which satisfies $\bar x = \Pi g + \alpha \Pi P \min\{c, \bar x\} = \Pi g + \alpha \Pi P (I - I_{\bar x}) c + \alpha \Pi P I_{\bar x} \bar x$ (54), where $I_{\bar x}$ is an n ... |
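Eq. (54) rests on writing the component-wise minimum as a linear expression once $\bar x$ is fixed. A small check, assuming $I_{\bar x}$ is the diagonal matrix with a 1 wherever $\bar x_i \le c_i$ and 0 elsewhere (the snippet is cut off before the definition; ties can go either way, since then $\min\{c_i, \bar x_i\} = c_i = \bar x_i$). The numbers and the helper `indicator_matrix` are hypothetical. The sketch also forms $A = \alpha P I_{\bar x}$, the choice named in the Yu and Bertsekas (2006) context, and $b = g + \alpha P (I - I_{\bar x}) c$, so that Eq. (54) reads $\bar x = \Pi(b + A\bar x)$.

```python
import numpy as np

alpha = 0.9
P = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])
g = np.array([1.0, 0.5, 2.0])
c = np.array([5.0, 8.0, 3.0])
x_bar = np.array([4.0, 9.5, 3.0])  # illustrative point, not computed from the model


def indicator_matrix(x, c):
    """Hypothetical helper: diagonal I_x with 1 where x_i <= c_i (the min selects x_i), else 0."""
    return np.diag((x <= c).astype(float))


I_x = indicator_matrix(x_bar, c)
I = np.eye(len(c))

# min{c, x_bar} = (I - I_x) c + I_x x_bar, component by component
lhs = np.minimum(c, x_bar)
rhs = (I - I_x) @ c + I_x @ x_bar

# Hence g + alpha P min{c, x_bar} = b + A x_bar with the linear data below.
A = alpha * P @ I_x
b = g + alpha * P @ (I - I_x) @ c
print(lhs, rhs)
```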

1 | Approximate Solution of Operator Equations. Wolters-Noordhoff
- Rutitskii, Stetsenko
- 1972
Citation Context ...jected equation approach of Equation (1) belongs to the class of Galerkin methods and finds broad application in the approximate solution of linear operator equations; see, e.g., Krasnosel’skii et al. [11]. For example, important finite element and other methods for solving partial differential equations belong to the Galerkin class. In our recent paper (Bertsekas and Yu [4]), we have extended TD-type ... |