## The linear programming approach to approximate dynamic programming (2001)

Venue: Operations Research

Citations: 142 (16 self)

### BibTeX

@ARTICLE{Farias01thelinear,
  author  = {D. P. de Farias and B. Van Roy},
  title   = {The linear programming approach to approximate dynamic programming},
  journal = {Operations Research},
  year    = {2003},
  volume  = {51},
  number  = {6},
  pages   = {850--865}
}

### Abstract

The curse of dimensionality gives rise to prohibitive computational requirements that render infeasible the exact solution of large-scale stochastic control problems. We study an efficient method based on linear programming for approximating solutions to such problems. The approach “fits” a linear combination of pre-selected basis functions to the dynamic programming cost-to-go function. We develop error bounds that offer performance guarantees and also guide the selection of both basis functions and “state-relevance weights” that influence quality of the approximation. Experimental results in the domain of queueing network control provide empirical support for the methodology. (Dynamic programming/optimal control: approximations/large-scale problems. Queues, algorithms: control of queueing networks.)
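The approximate LP described in the abstract can be sketched for a toy problem. The sketch below maximizes cᵀΦr subject to TΦr ≥ Φr, the formulation studied in the paper, using `scipy.optimize.linprog`; all of the data (transition kernel `P`, stage costs `g`, basis `Phi`, state-relevance weights `c`) are invented for illustration.

```python
import numpy as np
from scipy.optimize import linprog

# Tiny discounted-cost MDP: 4 states, 2 actions (illustrative random data).
rng = np.random.default_rng(0)
S, A, alpha = 4, 2, 0.9
P = rng.dirichlet(np.ones(S), size=(A, S))   # P[a, x, y]: transition probabilities
g = rng.uniform(0.0, 2.0, size=(A, S))       # g[a, x]: stage cost
c = np.full(S, 1.0 / S)                      # state-relevance weights (uniform here)

# Basis: a constant feature and a linear ramp over the state index (K = 2).
Phi = np.column_stack([np.ones(S), np.arange(S) / (S - 1)])

# Approximate LP:  max c^T Phi r  s.t.  g_a + alpha P_a Phi r >= Phi r for all a,
# i.e. (Phi - alpha P_a Phi) r <= g_a.  linprog minimizes, so negate the objective.
A_ub = np.vstack([Phi - alpha * P[a] @ Phi for a in range(A)])
b_ub = np.concatenate([g[a] for a in range(A)])
res = linprog(-Phi.T @ c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * 2)

J_approx = Phi @ res.x                       # fitted cost-to-go, one value per state
greedy = np.argmin(g + alpha * np.einsum('axy,y->ax', P, J_approx), axis=0)
```

The constraints force Φr to lie below the exact cost-to-go J* pointwise, so maximizing the c-weighted sum pushes the fit up against J*; `greedy` is the policy obtained by acting greedily with respect to the fitted values.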

### Citations

4848 | Neural Networks for Pattern Recognition - Bishop - 1995
Citation Context: ...esent approximate cost-to-go functions. It may be interesting to explore algorithms using nonlinear representations. Alternative representations encountered in the literature include neural networks (Bishop 1995, Haykin 1994) and splines (Chen et al. 1999, Trick and Zin 1997), among others. APPENDIX A: PROOFS Lemma 1. A vector r̃ solves max cᵀΦr s.t. TΦr ≥ Φr if and only if it solves min ‖J* − Φr‖₁,c s...

1231 | Learning to predict by the method of temporal differences - Sutton - 1988
Citation Context: ...te-relevance weights” for the approximate LP. An alternative to the approximate LP are temporal-difference (TD) learning methods (Bertsekas and Tsitsiklis 1996; Dayan 1992; de Farias and Van Roy 2000; Sutton 1988; Sutton and Barto 1998; Tsitsiklis and Van Roy 1997; Van Roy 1998, 2000). In such methods, one tries to find a fixed point for an “approximate dynamic programming operator” by simulating the system a...
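The "fixed point by simulation" idea in this excerpt can be illustrated with a minimal TD(0) policy-evaluation loop using a linear approximation V(x) ≈ φ(x)ᵀw. The chain, costs, features, and step-size schedule below are all invented for illustration, not taken from any of the cited papers.

```python
import numpy as np

# TD(0) policy evaluation on a small Markov chain (illustrative random data).
rng = np.random.default_rng(1)
S, alpha = 5, 0.9
P = rng.dirichlet(np.ones(S), size=S)        # P[x, y]: the policy's transition probs
g = rng.uniform(size=S)                      # per-state cost under the policy
phi = np.column_stack([np.ones(S), np.arange(S) / (S - 1)])  # 2 features per state

w = np.zeros(2)
x = 0
for t in range(100_000):
    y = rng.choice(S, p=P[x])                # simulate one transition
    td_error = g[x] + alpha * phi[y] @ w - phi[x] @ w
    w += 0.01 / (1 + t / 1000) * td_error * phi[x]   # decaying step size
    x = y

V_td = phi @ w                               # learned approximate cost-to-go
```

Each update nudges the weights toward satisfying the (projected) Bellman equation along the simulated trajectory, which is what "finding a fixed point for an approximate dynamic programming operator" amounts to in practice.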

462 | Dynamic Programming and Optimal Control (Athena Scientific) - Bertsekas - 1995
Citation Context: ...stem under the optimal policy is indeed stable—that should generally be the case if the discount factor is large. For a queue with infinite buffer the optimal service rate q(x) is nondecreasing in x (Bertsekas 1995), and stability therefore implies that q(x) ≥ q(x₀) > p for all x ≥ x₀ and some sufficiently large x₀. It is easy then to verify that the tail of the steady-state distribution has an upper bound wit...

373 | Temporal difference learning and TD-Gammon - Tesauro - 1995
Citation Context: ...ijk and Kallenberg 1979, Manne 1960). Over the years, interest in approximate dynamic programming has been fueled to a large extent by stories of empirical success in applications such as backgammon (Tesauro 1995), job shop scheduling (Zhang and Dietterich 1996), elevator scheduling (Crites and Barto 1996), and pricing of American options (Longstaff and Schwart...

293 | Valuing American options by simulation: A simple least-squares approach - Longstaff, Schwartz - 2001
Citation Context: ...kgammon (Tesauro 1995), job shop scheduling (Zhang and Dietterich 1996), elevator scheduling (Crites and Barto 1996), and pricing of American options (Longstaff and Schwartz 2001, Tsitsiklis and Van Roy 2001). These case studies point toward approximate dynamic programming as a potentially powerful tool for large-scale stochastic control. However, significant trial and error ...

279 | Improving elevator performance using reinforcement learning - Crites, Barto - 1996
Citation Context: ... programming has been fueled to a large extent by stories of empirical success in applications such as backgammon (Tesauro 1995), job shop scheduling (Zhang and Dietterich 1996), elevator scheduling (Crites and Barto 1996), and pricing of American options (Longstaff and Schwartz 2001, Tsitsiklis and Van Roy 2001). These case studies point toward approximate dynamic prog...

217 | An analysis of temporal-difference learning with function approximation (Technical Report LIDS-P-2322) - Tsitsiklis, Van Roy - 1996

130 | Efficient solution algorithms for factored MDPs - Guestrin, Koller, et al. - 2002
Citation Context: ...ing use of constraint generation methods (e.g., Grötschel and Holland 1991, Schuurmans and Patrascu 2001) or structure allowing constraints to be represented compactly (e.g., Morrison and Kumar 1999, Guestrin et al. 2002). In the next four sections, we assume that the approximate LP can be solved, and we study the quality of the solution as an approximation to the cost-to-go function. 3. THE IMPORTANCE OF STATE-RELEV...
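The constraint-generation idea mentioned in this excerpt (solve the LP over a growing subset of constraints, adding a violated one each round) can be sketched as follows. The MDP data and the box bounds on r are illustrative assumptions; a real implementation at scale would detect violated constraints through problem structure rather than enumerating every state-action pair.

```python
import numpy as np
from scipy.optimize import linprog

# Constraint generation for the approximate LP (illustrative random data).
rng = np.random.default_rng(2)
S, A, alpha = 50, 2, 0.9
P = rng.dirichlet(np.ones(S), size=(A, S))
g = rng.uniform(0.0, 2.0, size=(A, S))
c = np.full(S, 1.0 / S)
Phi = np.column_stack([np.ones(S), np.arange(S) / (S - 1)])

A_full = np.vstack([Phi - alpha * P[a] @ Phi for a in range(A)])  # all A*S rows
b_full = np.concatenate([g[a] for a in range(A)])

active = list(range(5))                      # start from an arbitrary small subset
while True:
    res = linprog(-Phi.T @ c, A_ub=A_full[active], b_ub=b_full[active],
                  bounds=[(-100, 100)] * 2)  # box keeps the relaxation bounded
    violation = A_full @ res.x - b_full      # check all constraints at the optimum
    worst = int(np.argmax(violation))
    if violation[worst] <= 1e-6 or worst in active:
        break                                # no new constraint is violated
    active.append(worst)                     # add the most violated row and re-solve
```

Since each round adds one constraint, the loop terminates after at most A·S solves, and typically far fewer: the final `res.x` satisfies the full constraint set while only a small `active` subset was ever given to the solver.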

123 | Congestion-dependent pricing of network services - Paschalidis, Tsitsiklis

100 | Dynamic instabilities and stabilization methods in distributed real-time scheduling of manufacturing systems - Kumar, Seidman - 1990

97 | Generalized polynomial approximations in Markovian decision processes - Schweitzer, Seidmann - 1985

96 | Ergodicity of stochastic processes describing the operation of open queueing networks - Rybko, Stolyar - 1992

66 | Approximate Solutions to Markov Decision Processes - Gordon - 1999

61 | The convergence of TD(λ) for general λ - Dayan - 1992
Citation Context: ...ines in the choice of the so-called “state-relevance weights” for the approximate LP. An alternative to the approximate LP are temporal-difference (TD) learning methods (Bertsekas and Tsitsiklis 1996; Dayan 1992; de Farias and Van Roy 2000; Sutton 1988; Sutton and Barto 1998; Tsitsiklis and Van Roy 1997; Van Roy 1998, 2000). In such methods, one tries to find a fixed point for an “approximate dynamic program...

50 | Reinforcement Learning: An Introduction - Sutton, Barto - 1998
Citation Context: ...r the approximate LP. An alternative to the approximate LP are temporal-difference (TD) learning methods (Bertsekas and Tsitsiklis 1996; Dayan 1992; de Farias and Van Roy 2000; Sutton 1988; Sutton and Barto 1998; Tsitsiklis and Van Roy 1997; Van Roy 1998, 2000). In such methods, one tries to find a fixed point for an “approximate dynamic programming operator” by simulating the system and learning from the ob...

48 | Solution of large-scale symmetric travelling salesman problems - Grötschel, Holland - 1991
Citation Context: ...e for alleviating the need to consider all constraints. Examples include heuristics presented in Trick and Zin (1993) and problem-specific approaches making use of constraint generation methods (e.g., Grötschel and Holland 1991, Schuurmans and Patrascu 2001) or structure allowing constraints to be represented compactly (e.g., Morrison and Kumar 1999, Guestrin et al. 2002). In the next four sections, we assume that the appro...

46 | Linear programming and sequential decisions - Manne - 1960
Citation Context: ...chweitzer and Seidman (1985), that generalizes the linear programming approach to exact dynamic programming (Borkar 1988, De Ghellinck 1960, Denardo 1970, D’Epenoux 1963, Hordijk and Kallenberg 1979, Manne 1960). Over the years, interest in approximate dynamic programming has been fueled to a large extent by stories of empirical success in applications such as backgammon (Tesauro 1995), job shop scheduling ...

36 | Learning and value function approximation in complex decision processes - Van Roy - 1998

35 | A probabilistic production and inventory problem - d’Epenoux - 1963

30 | Direct value-approximation for factored MDPs - Schuurmans, Patrascu

29 | Value iteration and optimization of multiclass queueing networks - Chen, Meyn - 1998

28 | Applying experimental design and regression splines to high-dimensional continuous-state stochastic dynamic programming - Chen, Ruppert, et al. - 1999
Citation Context: ... It may be interesting to explore algorithms using nonlinear representations. Alternative representations encountered in the literature include neural networks (Bishop 1995, Haykin 1994) and splines (Chen et al. 1999, Trick and Zin 1997), among others. APPENDIX A: PROOFS Lemma 1. A vector r̃ solves max cᵀΦr s.t. TΦr ≥ Φr if and only if it solves min ‖J* − Φr‖₁,c s.t. TΦr ≥ Φr. Proof. It is well known that t...

26 | New linear program performance bounds for queueing networks - Morrison, Kumar - 1999
Citation Context: ...em-specific approaches making use of constraint generation methods (e.g., Grötschel and Holland 1991, Schuurmans and Patrascu 2001) or structure allowing constraints to be represented compactly (e.g., Morrison and Kumar 1999, Guestrin et al. 2002). In the next four sections, we assume that the approximate LP can be solved, and we study the quality of the solution as an approximation to the cost-to-go function. 3. THE IMP...

21 | Performance of multiclass Markovian queueing networks via piecewise linear Lyapunov functions - Bertsimas, Gamarnik, et al.

14 | A convex analytic approach to Markov decision processes - Borkar - 1988
Citation Context: ...e algorithm we study is based on a linear programming formulation, originally proposed by Schweitzer and Seidman (1985), that generalizes the linear programming approach to exact dynamic programming (Borkar 1988, De Ghellinck 1960, Denardo 1970, D’Epenoux 1963, Hordijk and Kallenberg 1979, Manne 1960). Over the years, interest in approximate dynamic programming has been fueled to a large extent by stories of...

13 | Linear programming and Markov decision chains - Hordijk, Kallenberg - 1979
Citation Context: ...ion, originally proposed by Schweitzer and Seidman (1985), that generalizes the linear programming approach to exact dynamic programming (Borkar 1988, De Ghellinck 1960, Denardo 1970, D’Epenoux 1963, Hordijk and Kallenberg 1979, Manne 1960). Over the years, interest in approximate dynamic programming has been fueled to a large extent by stories of empirical success in applications such as backgammon (Tesauro 1995), job shop...

10 | On linear programming in a Markov decision problem - Denardo - 1970
Citation Context: ... a linear programming formulation, originally proposed by Schweitzer and Seidman (1985), that generalizes the linear programming approach to exact dynamic programming (Borkar 1988, De Ghellinck 1960, Denardo 1970, D’Epenoux 1963, Hordijk and Kallenberg 1979, Manne 1960). Over the years, interest in approximate dynamic programming has been fueled to a large extent by stories of empirical success in application...

8 | A Linear Programming Approach to Solving Dynamic Programs (Working Paper) - Trick, Zin - 1993

5 | Les problèmes de décision séquentielle. Cahiers du Centre d’Études de Recherche Opérationnelle - de Ghellinck - 1960

2 | On the existence of fixed points for approximate value iteration and temporal-difference learning - de Farias, Van Roy - 2000

1 | Neural Networks: A Comprehensive Foundation - Haykin - 1994
Citation Context: ...mate cost-to-go functions. It may be interesting to explore algorithms using nonlinear representations. Alternative representations encountered in the literature include neural networks (Bishop 1995, Haykin 1994) and splines (Chen et al. 1999, Trick and Zin 1997), among others. APPENDIX A: PROOFS Lemma 1. A vector r̃ solves max cᵀΦr s.t. TΦr ≥ Φr if and only if it solves min ‖J* − Φr‖₁,c s.t. TΦr ≥ Φr...