Reinforcement Learning In Continuous Time and Space (2000)
| Venue: | Neural Computation |
| Citations: | 83 - 4 self |
BibTeX
@ARTICLE{Doya00reinforcementlearning,
author = {Kenji Doya},
title = {Reinforcement Learning In Continuous Time and Space},
journal = {Neural Computation},
year = {2000},
volume = {12},
pages = {219--245}
}
Abstract
This paper presents a reinforcement learning framework for continuous-time dynamical systems without a priori discretization of time, state, and action. Based on the Hamilton-Jacobi-Bellman (HJB) equation for infinite-horizon, discounted reward problems, we derive algorithms for estimating value functions and for improving policies with the use of function approximators. The process of value function estimation is formulated as the minimization of a continuous-time form of the temporal difference (TD) error. Update methods based on backward Euler approximation and exponential eligibility traces are derived and their correspondences with the conventional residual gradient, TD(0), and TD(λ) algorithms are shown. For policy improvement, two methods, namely, a continuous actor-critic method and a value-gradient based greedy policy, are formulated. As a special case of the latter, a nonlinear feedback control law using the value gradient and the model of the input gain is derived....
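
To make the continuous-time TD error mentioned in the abstract concrete, here is a minimal sketch. It assumes the value function is the exponentially discounted integral of future reward with time constant `tau`, so that the TD error takes the form δ(t) = r(t) − V(t)/τ + dV/dt, and it approximates the time derivative with a backward Euler difference. The function name, argument names, and numeric values are illustrative assumptions, not taken from this record.

```python
def td_error(r, v_now, v_prev, dt, tau):
    """Backward-Euler sketch of the continuous-time TD error.

    Assumed form: delta(t) = r(t) - V(t)/tau + dV/dt, with
    dV/dt approximated by (V(t) - V(t - dt)) / dt.
    All names here are hypothetical, chosen for illustration.
    """
    v_dot = (v_now - v_prev) / dt      # backward Euler estimate of dV/dt
    return r - v_now / tau + v_dot


if __name__ == "__main__":
    # With a constant reward r, the discounted integral gives V = tau * r,
    # so a consistent value estimate should produce a TD error near zero.
    r, tau, dt = 2.0, 0.5, 0.01
    v = tau * r
    print(td_error(r, v, v, dt, tau))
```

A value estimate that is consistent with the reward signal drives this quantity toward zero, which is why minimizing it can serve as a learning objective for the value function.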