## The linear programming approach to approximate dynamic programming (2001)

Venue: | Operations Research |

Citations: | 221 - 16 self |

### Citations

6460 |
Neural networks for pattern recognition
- Bishop
- 1995
Citation Context ...esent approximate cost-to-go functions. It may be interesting to explore algorithms using nonlinear representations. Alternative representations encountered in the literature include neural networks (=-=Bishop 1995-=-, Haykin 1994) and splines (Chen et al. 1999, Trick and Zin 1997), among others. APPENDIX A: PROOFS Lemma 1. A vector $\tilde{r}$ solves $\max\, c^T \Phi r$ s.t. $T\Phi r \ge \Phi r$ if and only if it solves $\min \|J^* - \Phi r\|_{1,c}$ s... |

1488 | Learning to predict by the methods of temporal differences
- Sutton
- 1988
Citation Context ...te-relevance weights” for the approximate LP. An alternative to the approximate LP are temporal-difference (TD) learning methods (Bertsekas and Tsitsiklis 1996; Dayan 1992; de Farias and Van Roy 2000; =-=Sutton 1988-=-; Sutton and Barto 1998; Tsitsiklis and Van Roy 1997; Van Roy 1998, 2000). In such methods, one tries to find a fixed point for an “approximate dynamic programming operator” by simulating the system a... |

712 |
Dynamic Programming and Optimal Control. Athena Scientific
- Bertsekas
- 1995
Citation Context ...stem under the optimal policy is indeed stable; that should generally be the case if the discount factor is large. For a queue with infinite buffer the optimal service rate $q(x)$ is nondecreasing in $x$ (=-=Bertsekas 1995-=-), and stability therefore implies that $q(x) \ge q(x_0) > p$ for all $x \ge x_0$ and some sufficiently large $x_0$. It is easy then to verify that the tail of the steady-state distribution has an upper bound wit... |

498 | Valuing American options by simulation: a simple least-squares approach
- Longstaff, Schwartz
- 2001
Citation Context ...kgammon (Tesauro 1995), job shop scheduling (Zhang and Dietterich 1996), elevator scheduling (Crites and Barto 1996), and pricing of American options (=-=Longstaff and Schwartz 2001-=-, Tsitsiklis and Van Roy 2001). These case studies point toward approximate dynamic programming as a potentially powerful tool for large-scale stochastic control. However, significant trial and error ... |

477 |
Temporal difference learning and td-gammon
- Tesauro
- 1995
Citation Context ...ijk and Kallenberg 1979, Manne 1960). Over the years, interest in approximate dynamic programming has been fueled to a large extent by stories of empirical success in applications such as backgammon (=-=Tesauro 1995-=-), job shop scheduling (Zhang and Dietterich 1996), elevator scheduling (Crites and Barto 1996), and pricing of American options (Longstaff and Schwart... |

321 | Improving elevator performance using reinforcement learning
- Crites, Barto
- 1996
Citation Context ... programming has been fueled to a large extent by stories of empirical success in applications such as backgammon (Tesauro 1995), job shop scheduling (Zhang and Dietterich 1996), elevator scheduling (=-=Crites and Barto 1996-=-), and pricing of American options (Longstaff and Schwartz 2001, Tsitsiklis and Van Roy 2001). These case studies point toward approximate dynamic prog... |

309 | An analysis of temporal-difference learning with function approximation - Tsitsiklis, Van Roy - 1997 |

170 | Efficient solution algorithms for factored MDPs
- Guestrin, Koller, et al.
- 2003
Citation Context ...ing use of constraint generation methods (e.g., Grötschel and Holland 1991, Schuurmans and Patrascu 2001) or structure allowing constraints to be represented compactly (e.g., Morrison and Kumar 1999, =-=Guestrin et al. 2002-=-). In the next four sections, we assume that the approximate LP can be solved, and we study the quality of the solution as an approximation to the cost-to-go function. 3. THE IMPORTANCE OF STATE-RELEV... |

154 | Congestion-dependent pricing of network services - Paschalidis, Tsitsiklis - 1998 |

139 | Ergodicity of stochastic processes describing the operation of open queueing networks - Rybko, Stolyar - 1992 |

134 | Dynamic instabilities and stabilization methods in distributed real-time scheduling of manufacturing systems - Kumar, Seidman - 1990 |

134 | Generalized polynomial approximations in Markovian decision processes - Schweitzer, Seidmann - 1985 |

97 |
Reinforcement Learning: An Introduction
- Sutton, Barto
- 1998
Citation Context ...r the approximate LP. An alternative to the approximate LP are temporal-difference (TD) learning methods (Bertsekas and Tsitsiklis 1996; Dayan 1992; de Farias and Van Roy 2000; Sutton 1988; Sutton and =-=Barto 1998-=-; Tsitsiklis and Van Roy 1997; Van Roy 1998, 2000). In such methods, one tries to find a fixed point for an “approximate dynamic programming operator” by simulating the system and learning from the ob... |

77 | Approximate solutions to Markov decision processes - Gordon - 1999 |

73 |
Linear programming and sequential decisions
- Manne
- 1960
Citation Context ...chweitzer and Seidman (1985), that generalizes the linear programming approach to exact dynamic programming (Borkar 1988, De Ghellinck 1960, Denardo 1970, D’Epenoux 1963, Hordijk and Kallenberg 1979, =-=Manne 1960-=-). Over the years, interest in approximate dynamic programming has been fueled to a large extent by stories of empirical success in applications such as backgammon (Tesauro 1995), job shop scheduling ... |

62 |
The convergence of TD(λ) for general λ
- Dayan
- 1992
Citation Context ...ines in the choice of the so-called “state-relevance weights” for the approximate LP. An alternative to the approximate LP are temporal-difference (TD) learning methods (Bertsekas and Tsitsiklis 1996; =-=Dayan 1992-=-; de Farias and Van Roy 2000; Sutton 1988; Sutton and Barto 1998; Tsitsiklis and Van Roy 1997; Van Roy 1998, 2000). In such methods, one tries to find a fixed point for an “approximate dynamic program... |

55 |
Solution of large-scale symmetric travelling salesman problems
- Grötschel, Holland
- 1991
Citation Context ...e for alleviating the need to consider all constraints. Examples include heuristics presented in Trick and Zin (1993) and problem-specific approaches making use of constraint generation methods (e.g., =-=Grötschel and Holland 1991-=-, Schuurmans and Patrascu 2001) or structure allowing constraints to be represented compactly (e.g., Morrison and Kumar 1999, Guestrin et al. 2002). In the next four sections, we assume that the appro... |

47 | A probabilistic production and inventory problem (English translation) - d'Epenoux - 1963 |

41 | Value iteration and optimization of multiclass queueing networks. Queueing Systems Theory and Applications - Chen, Meyn - 1999 |

41 | Learning and Value Function Approximation in Complex Decision Processes - Van Roy - 1998 |

37 | New linear program performance bounds for queueing networks
- Morrison, Kumar
- 1999
Citation Context ...em-specific approaches making use of constraint generation methods (e.g., Grötschel and Holland 1991, Schuurmans and Patrascu 2001) or structure allowing constraints to be represented compactly (e.g., =-=Morrison and Kumar 1999-=-, Guestrin et al. 2002). In the next four sections, we assume that the approximate LP can be solved, and we study the quality of the solution as an approximation to the cost-to-go function. 3. THE IMP... |

35 | Performance of multiclass Markovian queueing networks via piecewise linear Lyapunov functions. Forthcoming - Bertsimas, Gamarnik, et al. - 2000 |

34 |
Applying experimental design and regression splines to high-dimensional continuous-state stochastic dynamic programming
- Chen, Ruppert, Shoemaker
- 1999
Citation Context ... It may be interesting to explore algorithms using nonlinear representations. Alternative representations encountered in the literature include neural networks (Bishop 1995, Haykin 1994) and splines (=-=Chen et al. 1999-=-, Trick and Zin 1997), among others. APPENDIX A: PROOFS Lemma 1. A vector $\tilde{r}$ solves $\max\, c^T \Phi r$ s.t. $T\Phi r \ge \Phi r$ if and only if it solves $\min \|J^* - \Phi r\|_{1,c}$ s.t. $T\Phi r \ge \Phi r$. Proof. It is well known that t... |

32 | Direct value-approximation for factored MDPs - Schuurmans, Patrascu - 2001 |

22 |
A Convex Analytic Approach to Markov Decision Processes. Probability Theory and Related Fields
- Borkar
- 1988
Citation Context ...e algorithm we study is based on a linear programming formulation, originally proposed by Schweitzer and Seidman (1985), that generalizes the linear programming approach to exact dynamic programming (=-=Borkar 1988-=-, De Ghellinck 1960, Denardo 1970, D’Epenoux 1963, Hordijk and Kallenberg 1979, Manne 1960). Over the years, interest in approximate dynamic programming has been fueled to a large extent by stories of... |

18 |
Linear programming and Markov decision chains
- Hordijk, Kallenberg
- 1979
Citation Context ...ion, originally proposed by Schweitzer and Seidman (1985), that generalizes the linear programming approach to exact dynamic programming (Borkar 1988, De Ghellinck 1960, Denardo 1970, D’Epenoux 1963, =-=Hordijk and Kallenberg 1979-=-, Manne 1960). Over the years, interest in approximate dynamic programming has been fueled to a large extent by stories of empirical success in applications such as backgammon (Tesauro 1995), job shop... |

14 |
On Linear Programming in a Markov Decision Problem
- Denardo
- 1970
Citation Context ... a linear programming formulation, originally proposed by Schweitzer and Seidman (1985), that generalizes the linear programming approach to exact dynamic programming (Borkar 1988, De Ghellinck 1960, =-=Denardo 1970-=-, D’Epenoux 1963, Hordijk and Kallenberg 1979, Manne 1960). Over the years, interest in approximate dynamic programming has been fueled to a large extent by stories of empirical success in application... |

9 | A linear programming approach to solving dynamic programs. Unpublished manuscript - Trick, Zin - 1993 |

6 | Les problèmes de décisions séquentielles. Cahiers du Centre d’Études de Recherche - De Ghellinck - 1960 |

2 | On the existence of fixed points for approximate value iteration and temporal-difference learning - de Farias, Van Roy - 2000 |

1 |
Neural Networks: A Comprehensive Foundation
- Haykin
- 1994
Citation Context ...mate cost-to-go functions. It may be interesting to explore algorithms using nonlinear representations. Alternative representations encountered in the literature include neural networks (Bishop 1995, =-=Haykin 1994-=-) and splines (Chen et al. 1999, Trick and Zin 1997), among others. APPENDIX A: PROOFS Lemma 1. A vector $\tilde{r}$ solves $\max\, c^T \Phi r$ s.t. $T\Phi r \ge \Phi r$ if and only if it solves $\min \|J^* - \Phi r\|_{1,c}$ s.t. $T\Phi r \ge \Phi r$... |