Results 1 - 3 of 3
Adaptive Choice of Grid and Time in Reinforcement Learning
 IN NIPS ’97: PROCEEDINGS OF THE 1997 CONFERENCE ON ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 10
, 1997
"... We propose local error estimates together with algorithms for adaptive aposteriori grid and time refinement in reinforcement learning. We consider a deterministic system with continuous state and time with infinite horizon discounted cost functional. For grid refinement we follow the procedure of ..."
Abstract

Cited by 15 (1 self)
We propose local error estimates together with algorithms for adaptive a posteriori grid and time refinement in reinforcement learning. We consider a deterministic system with continuous state and time, with an infinite horizon discounted cost functional. For grid refinement we follow the procedure of numerical methods for the Bellman equation. For time refinement we propose a new criterion, based on consistency estimates of discrete solutions of the Bellman equation. We demonstrate that an optimal ratio of time to space discretization is crucial for optimal learning rates and accuracy of the approximate optimal value function.
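The discretized Bellman equation underlying this approach can be illustrated with a minimal sketch (not the paper's algorithm): value iteration for a deterministic, discounted, infinite-horizon problem on a uniform grid, where the spatial grid, time step `tau`, dynamics, and running cost are all illustrative assumptions.

```python
import numpy as np

def value_iteration(n=51, tau=0.02, beta=1.0, tol=1e-8):
    """Discounted value iteration on a uniform grid over [0, 1].

    Assumed toy problem: dynamics x' = x + tau * a, running cost
    x^2 + 0.1 * a^2, discount rate beta. The ratio of tau to the
    grid spacing h = 1/(n-1) is exactly the quantity the abstract
    says must be tuned.
    """
    xs = np.linspace(0.0, 1.0, n)          # spatial grid
    actions = np.array([-1.0, 0.0, 1.0])   # admissible controls
    gamma = np.exp(-beta * tau)            # discount over one time step
    V = np.zeros(n)
    while True:
        # One Euler step per action, then linear interpolation of V
        nxt = np.clip(xs[None, :] + tau * actions[:, None], 0.0, 1.0)
        Vnext = np.array([np.interp(row, xs, V) for row in nxt])
        cost = tau * (xs[None, :] ** 2 + 0.1 * actions[:, None] ** 2)
        Vnew = np.min(cost + gamma * Vnext, axis=0)
        if np.max(np.abs(Vnew - V)) < tol:
            return xs, Vnew
        V = Vnew
```

An adaptive scheme in the spirit of the paper would refine `n` (grid) and `tau` (time) locally where an error estimate is large, rather than fixing them globally as done here.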
Multi-Grid Methods for Reinforcement Learning in Controlled Diffusion Processes
 ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS
, 1996
"... Reinforcement learning methods for discrete and semiMarkov decision problems such as RealTime Dynamic Programming can be generalized for Controlled Diffusion Processes. The optimal control problem reduces to a boundary value problem for a fully nonlinear secondorder elliptic differential equation ..."
Abstract

Cited by 4 (2 self)
Reinforcement learning methods for discrete and semi-Markov decision problems, such as Real-Time Dynamic Programming, can be generalized to Controlled Diffusion Processes. The optimal control problem reduces to a boundary value problem for a fully nonlinear second-order elliptic differential equation of Hamilton-Jacobi-Bellman (HJB) type. Numerical analysis provides multigrid methods for this kind of equation. In the case of Learning Control, however, the systems of equations on the various grid levels are obtained using observed information (transitions and local cost). To ensure consistency, special attention needs to be directed toward the type of time and space discretization during the observation. An algorithm for multigrid observation is proposed. The multigrid algorithm is demonstrated on a simple queuing problem.
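The multigrid idea can be conveyed with a two-grid sketch (illustrative only, not the paper's observation-based algorithm): solve a cheap value iteration on a coarse grid, prolong the result by linear interpolation to a fine grid, and use it to initialize the fine-grid iteration. The toy dynamics and cost below are assumptions.

```python
import numpy as np

def bellman_sweep(xs, V, tau=0.02, beta=1.0):
    """One Bellman update on grid xs for an assumed toy problem."""
    actions = np.array([-1.0, 0.0, 1.0])
    gamma = np.exp(-beta * tau)
    nxt = np.clip(xs[None, :] + tau * actions[:, None], 0.0, 1.0)
    Vnext = np.array([np.interp(row, xs, V) for row in nxt])
    cost = tau * (xs[None, :] ** 2 + 0.1 * actions[:, None] ** 2)
    return np.min(cost + gamma * Vnext, axis=0)

def solve(xs, V0, iters):
    V = V0
    for _ in range(iters):
        V = bellman_sweep(xs, V)
    return V

def two_grid(n_fine=101, n_coarse=26):
    """Coarse solve, prolongation, then a few fine-grid sweeps."""
    xs_c = np.linspace(0.0, 1.0, n_coarse)
    xs_f = np.linspace(0.0, 1.0, n_fine)
    Vc = solve(xs_c, np.zeros(n_coarse), iters=400)  # cheap coarse solve
    Vf0 = np.interp(xs_f, xs_c, Vc)                  # prolongation
    return solve(xs_f, Vf0, iters=50)                # fine-grid smoothing
```

A full multigrid cycle would also restrict residuals back to coarser levels; the paper's contribution is building the grid-level equations from observed transitions and costs rather than from a known model.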
Numerical Schemes for the Continuous Q-function of Reinforcement Learning
"... We develop a theoretical framework for the problem of learning optimal control. We consider a discounted infinite horizon deterministic control problem in the reinforcement learning context. The main objective is to approximate the optimal value function of a fully continuous problem, using only obs ..."
Abstract
We develop a theoretical framework for the problem of learning optimal control. We consider a discounted infinite horizon deterministic control problem in the reinforcement learning context. The main objective is to approximate the optimal value function of a fully continuous problem, using only observed information such as state, control, and cost. Using results from the numerical treatment of the Bellman equation, we formulate regularity and consistency results for the optimal value function. These results help to construct algorithms for the continuous problem. We propose two approximation schemes for the optimal value function which are based on observed data. The implementation of a simple optimal control learning problem shows the effects of the two approximation schemes.
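A sample-based scheme of this flavor can be sketched as follows (an illustrative assumption, not one of the paper's two schemes): approximate a continuous Q-function on a state grid from observed (state, control, cost, next-state) tuples. The toy dynamics here are used only to generate the synthetic observations.

```python
import numpy as np

def fitted_q(n=51, tau=0.05, beta=1.0, sweeps=500, seed=0):
    """Tabular Q-function on a grid, learned from observed tuples.

    Assumed toy problem: x' = x + tau * a on [0, 1], local cost
    tau * x^2, two controls, discount rate beta.
    """
    rng = np.random.default_rng(seed)
    xs = np.linspace(0.0, 1.0, n)
    actions = np.array([-1.0, 1.0])
    gamma = np.exp(-beta * tau)
    Q = np.zeros((len(actions), n))           # one row of values per control
    for _ in range(sweeps):
        x = rng.uniform(0.0, 1.0)             # observed state
        ai = rng.integers(len(actions))       # observed control
        xn = np.clip(x + tau * actions[ai], 0.0, 1.0)  # observed next state
        cost = tau * x ** 2                   # observed local cost
        # Sampled Bellman target: cost plus discounted best next value
        target = cost + gamma * min(
            np.interp(xn, xs, Q[b]) for b in range(len(actions)))
        # Move the nearest grid value toward the target
        i = int(round(x * (n - 1)))
        Q[ai, i] += 0.5 * (target - Q[ai, i])
    return xs, Q
```

The optimal value function is then recovered as the pointwise minimum of `Q` over controls; the regularity and consistency results the abstract mentions are what justify interpolating between grid points in such a scheme.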