## Feature-Based Methods For Large Scale Dynamic Programming (1994)

Venue: Machine Learning

Citations: 134 (6 self)

### BibTeX

@INPROCEEDINGS{Tsitsiklis94feature-basedmethods,

author = {John N. Tsitsiklis and Benjamin Van Roy},

title = {Feature-Based Methods For Large Scale Dynamic Programming},

booktitle = {Machine Learning},

year = {1994},

pages = {59--94}

}

### Abstract

We develop a methodological framework and present a few different ways in which dynamic programming and compact representations can be combined to solve large scale stochastic control problems. In particular, we develop algorithms that employ two types of feature-based compact representations, that is, representations that involve an arbitrarily complex feature extraction stage and a relatively simple approximation architecture. We prove the convergence of these algorithms and provide bounds on the approximation error. We also apply one of these algorithms to produce a computer program that plays Tetris at a respectable skill level. Furthermore, we provide a counterexample illustrating the difficulties of integrating compact representations and dynamic programming, which exemplifies the shortcomings of several methods in current practice, including Q-learning and temporal-difference learning.

### Citations

1225 | Learning to predict by the method of temporal differences
- Sutton
- 1988
Citation Context ... functions using linear combinations of fixed sets of basis functions. More recently, reinforcement learning researchers have developed a number of approaches, including temporal-difference learning [Sutton, 1988] and Q-learning [Watkins and Dayan, 1992], which have been used for dynamic programming with many types of compact representation, especially artificial neural networks. Aside from the work of Whitt ...

637 | Networks for approximation and learning
- Poggio, Girosi
- 1990
Citation Context ...ations consisting of linear combinations of localized basis functions have attracted considerable interest as general architectures for function approximation. Two examples are radial basis function (Poggio and Girosi, 1990) and wavelet networks (Bakshi and Stephanopoulos, 1993). With these representations, states are typically contained in a Euclidean space ℜ^d (typically forming a finite grid). Let us continue to view ...

456 | Dynamic Programming and Optimal Control, Athena Scientific
- Bertsekas
- 1995
Citation Context ...ion making under uncertainty (stochastic control) have been studied extensively in the operations research and control theory literature for a long time, using the methodology of dynamic programming (Bertsekas, 1995). The "planning problems" studied by the artificial intelligence community are of a related nature although, until recently, this was mostly done in a deterministic setting leading to search or short...

373 | Dynamic Programming: Deterministic and Stochastic Models
- Bertsekas
- 1987
Citation Context ...ion making under uncertainty (stochastic control) have been studied extensively in the operations research and control theory literature for a long time, using the methodology of dynamic programming [Bertsekas, 1987]. The "planning problems" studied by the artificial intelligence community are of a related nature although, until recently, this was mostly done in a deterministic setting leading to search or short...

366 | Practical issues in temporal difference learning - Tesauro - 1992 |

207 | Stable Function Approximation in Dynamic Programming - Gordon - 1995 |

206 | On the convergence of stochastic iterative dynamic programming algorithms
- Jaakkola, Jordan, et al.
- 1994
Citation Context ...rithm corresponds to a stochastic approximation involving a maximum norm contraction, and then appeal to a theorem concerning asynchronous stochastic approximation due to Tsitsiklis (1994) (see also (Jaakola, Jordan, and Singh, 1994)), which is discussed in Appendix B, and a theorem concerning multi-representation contractions presented and proven in Appendix A. 5.3. The Quality of Approximations Theorem 1 establishes that the q...

201 | Planning as Search: A Quantitative Approach
- Korf
- 1987
Citation Context ...udied by the artificial intelligence community are of a related nature although, until recently, this was mostly done in a deterministic setting leading to search or shortest path problems in graphs [Korf, 1987]. In either context, realistic problems have usually proved to be very difficult mostly due to the large size of the underlying state space or of the graph to be searched. In artificial intelligence,...

152 | Asynchronous stochastic approximation and Q-learning
- Tsitsiklis
- 1994
Citation Context ... but, unfortunately, this parameter vector generates a poor estimate of the optimal cost-to-go vector (in terms of Euclidean ... [Footnote 3: There are proofs of convergence for these algorithms when a full cost-to-go vector (not a compact representation) is used [Watkins, 1992; Tsitsiklis, 1994; Jaakola, Jordan, Singh, 1994]. Problems arise due to the ways in which these methods are interfaced with compact representations.]

150 | Reinforcement learning algorithm for partially observable Markov decision process - Jaakkola, Singh, et al. - 1994 |

109 | Real-time learning and control using asynchronous dynamic programming (Technical Report 91-57) - Barto, J, et al. - 1991 |

97 | Generalized polynomial approximations in Markovian decision processes - Schweitzer, Seidmann - 1985 |

59 | Q-learning
- Watkins, Dayan
- 1992
Citation Context ...ations of fixed sets of basis functions. More recently, reinforcement learning researchers have developed a number of approaches, including temporal-difference learning (Sutton, 1988) and Q-learning (Watkins and Dayan, 1992), which have been used for dynamic programming with many types of compact representation, especially artificial neural networks. Aside from the work of Whitt (1988) and Reetz (1977), the techniques t...

29 | Approximations of Dynamic Programs I - Whitt - 1978 |

26 | Functional approximations and dynamic programming - Bellman, Dreyfus - 1959 |

26 | Adaptive Aggregation for Infinite Horizon Dynamic Programming - Bertsekas, Castañon - 1989 |

22 | Counter-Example to Temporal Differences Learning - Bertsekas, P - 1994 |

13 | LNKnet: Neural Network - Lippmann, Kukolich, et al. - 1993 |

11 | The Convergence of TD(λ) for General λ
- Dayan
- 1992
Citation Context ...xed sets of basis functions. More recently, reinforcement learning researchers have developed a number of approaches, including temporal-difference learning (Sutton, 1988) and Q-learning (Watkins and Dayan, 1992), which have been used for dynamic programming with many types of compact representation, especially artificial neural networks. Aside from the work of Whitt (1988) and Reetz (1977), the techniques t...

10 | A Multiresolution, Hierarchical Neural Network with Localized Learning
- Bakshi, Stephanopoulos
- 1993
Citation Context ...alized basis functions have attracted considerable interest as general architectures for function approximation. Two examples are radial basis function (Poggio and Girosi, 1990) and wavelet networks (Bakshi and Stephanopoulos, 1993). With these representations, states are typically contained in a Euclidean space ℜ^d (typically forming a finite grid). Let us continue to view the state space as S = {1, ..., n}. Each state index is...

9 | Approximate Solutions of a Discounted Markovian Decision Problem - Reetz - 1977 |

6 | Computational Advances in Dynamic Programming - Morin - 1987 |