Adaptive Choice of Grid and Time in Reinforcement Learning
In NIPS ’97: Proceedings of the 1997 Conference on Advances in Neural Information Processing Systems 10
, 1997
Cited by 12 (1 self)
Abstract
We propose local error estimates together with algorithms for adaptive a posteriori grid and time refinement in reinforcement learning. We consider a deterministic system with continuous state and time and an infinite-horizon discounted cost functional. For grid refinement we follow the procedure of numerical methods for the Bellman equation. For time refinement we propose a new criterion, based on consistency estimates of discrete solutions of the Bellman equation. We demonstrate that an optimal ratio of time to space discretization is crucial for optimal learning rates and for the accuracy of the approximate optimal value function.
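The following is a small sketch of an adaptive grid-refinement loop of this kind. It is not the authors' algorithm. The following are all assumptions chosen for illustration:

- the 1-D test problem (x' = u with u in {-1, 0, 1}, running cost x², discount rate 1);
- the semi-Lagrangian discretization with linear interpolation;
- the residual-based error indicator at cell midpoints;
- the rule that ties each node's time step to its local cell width;
- the helper names `bellman`, `local_time_steps`, `solve` and `refine`.

The paper's own time-refinement criterion is not reproduced.

```python
import numpy as np

# Hypothetical 1-D test problem (an assumption, not taken from the paper):
# dynamics x' = u with u in {-1, 0, 1}, running cost c(x) = x**2,
# discount rate LAM, state clipped to [-1, 1].
CONTROLS = (-1.0, 0.0, 1.0)
LAM = 1.0


def bellman(V, grid, pts, h):
    """Semi-Lagrangian Bellman operator evaluated at `pts`.

    T V(x) = min_u [ h * c(x) + exp(-LAM * h) * V(x + h * u) ], where V is
    linearly interpolated between grid nodes and h is an array (one time
    step per evaluation point).
    """
    best = np.full(pts.shape, np.inf)
    for u in CONTROLS:
        nxt = np.clip(pts + h * u, grid[0], grid[-1])
        val = h * pts**2 + np.exp(-LAM * h) * np.interp(nxt, grid, V)
        best = np.minimum(best, val)
    return best


def local_time_steps(grid):
    """Heuristic: each node's time step is its smaller adjacent cell width (max |u| = 1)."""
    w = np.diff(grid)
    left = np.concatenate([w[:1], w])
    right = np.concatenate([w, w[-1:]])
    return np.minimum(left, right)


def solve(grid, tol=1e-10, max_iter=200_000):
    """Value iteration for the discrete Bellman equation on a nonuniform grid."""
    h = local_time_steps(grid)
    V = np.zeros_like(grid)
    for _ in range(max_iter):
        V_new = bellman(V, grid, grid, h)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V


def refine(grid, V):
    """Bisect roughly 30% of the cells, chosen by the Bellman residual at cell midpoints."""
    mids = 0.5 * (grid[:-1] + grid[1:])
    h_mid = 0.5 * np.diff(grid)  # step a midpoint node would get after bisecting its cell
    residual = np.abs(np.interp(mids, grid, V) - bellman(V, grid, mids, h_mid))
    k = max(1, len(mids) * 3 // 10)
    worst = np.argsort(residual)[-k:]
    return np.sort(np.concatenate([grid, mids[worst]]))


grid = np.linspace(-1.0, 1.0, 9)
for _ in range(4):
    grid = refine(grid, solve(grid))
V = solve(grid)
print(len(grid), V[-1])
```

Tying the time step to the local mesh width is one simple way to keep the time and space discretizations in a fixed ratio as the grid is refined. The abstract's point is that this ratio matters, not that this particular rule is the optimal one.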
Undiscounted Zero Sum Differential Games With Stopping Times
, 1994
Cited by 3 (2 self)
Abstract
We propose a discretization scheme for an undiscounted zero-sum differential game with stopping times. The value function of the original problem satisfies an integral inequality of Isaacs type that we can discretize using finite difference or finite element techniques. The fully discrete problem defines a stochastic game problem associated with the process, which may, in general, have multiple solutions. Among these solutions there exists one that is naturally associated with the value function of the original problem. We completely characterize the set of solutions and describe a procedure to identify the desired solution. We present accelerated algorithms to compute the discrete solution efficiently.
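The sketch below shows why a fully discrete stopping game can have several solutions. It applies the usual monotone operator V ↦ max(L, min(U, c + PV)) to a toy undiscounted problem. Everything here is an assumption for illustration:

- a 5-state reflecting random walk with zero running reward;
- the obstacles L and U;
- the helper names `game_operator` and `iterate`.

It uses plain, unaccelerated fixed-point iteration. It does not implement the paper's characterization or its accelerated algorithms. Because the operator is monotone and L ≤ U, iterating from L converges to the least solution and iterating from U converges to the greatest.

```python
import numpy as np


def game_operator(V, P, c, L, U):
    """One step of a hypothetical discrete undiscounted stopping game:
    the maximizer may stop and receive L, the minimizer may stop and pay U,
    otherwise play continues with reward c and transition matrix P."""
    return np.maximum(L, np.minimum(U, c + P @ V))


def iterate(V0, P, c, L, U, tol=1e-12, max_iter=100_000):
    """Plain (unaccelerated) fixed-point iteration starting from V0."""
    V = V0.copy()
    for _ in range(max_iter):
        V_new = game_operator(V, P, c, L, U)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V


# Toy data (assumed): reflecting random walk on 5 states, zero running reward.
n = 5
P = np.zeros((n, n))
for i in range(n):
    P[i, max(i - 1, 0)] += 0.5
    P[i, min(i + 1, n - 1)] += 0.5
c = np.zeros(n)
L = np.array([0.0, 0.0, 0.5, 0.0, 0.0])  # maximizer's stopping reward
U = np.ones(n)                            # minimizer's stopping cost

V_low = iterate(L, P, c, L, U)    # least solution (monotone iteration from below)
V_high = iterate(U, P, c, L, U)   # greatest solution (monotone iteration from above)
print(V_low, V_high)
```

In this toy problem every constant vector with entries between 0.5 and 1 is a solution. Picking the one that corresponds to the continuous value function mirrors the identification problem described in the abstract.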
Rate of Convergence of a Numerical Procedure for Impulsive Control Problems
Abstract
In this paper we consider a deterministic impulsive control problem. We discretize the Hamilton-Jacobi-Bellman equation satisfied by the optimal cost function and obtain discrete solutions of the problem. We give an explicit rate of convergence of the approximate solutions to the solution of the original problem. We consider the optimal switching problem as a special case of the impulsive control problem and apply the same discretization structure to obtain a rate of convergence in this case as well. We present a numerical example.
Key-words: Impulsive control. Discretization. Hamilton-Jacobi-Bellman equations.
Department of Mathematics, Universidad de Rosario, Pellegrini 250, 2000 Rosario, Argentina. INRIA Sophia-Antipolis Research Unit, 2004 route des Lucioles, BP 93, 06902 Sophia-Antipolis Cedex, France.
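The sketch below roughly illustrates how an impulse-control HJB equation can be discretized. It solves a discrete quasi-variational inequality, V = min(K + min_y V(y), h·c(x) + e^(-λh)·V(x + h)), by value iteration. The following are assumptions, not the report's numerical example:

- the test problem: drift x' = 1 on [0, 1], running cost x, fixed impulse cost K = 0.1, discount rate λ = 1;
- the function name `solve_impulse`.

For this toy problem, a policy that restarts at 0 on reaching a threshold s gives a closed-form continuous value at x = 0, which serves as a reference. The sketch makes no claim about the convergence rate proved in the report.

```python
import numpy as np

# Hypothetical test problem (an assumption, not the report's example):
# x' = 1 on [0, 1] (held at x = 1 once reached), running cost c(x) = x,
# discount rate LAM; an impulse moves the state to any grid node at fixed cost K.
LAM, K = 1.0, 0.1


def solve_impulse(n_cells, tol=1e-12, max_iter=1_000_000):
    """Value iteration for the discrete quasi-variational inequality
    V = min(K + min_y V(y), h * c(x) + exp(-LAM * h) * V(x + h))."""
    h = 1.0 / n_cells                  # time step = grid spacing, so x + h is the next node
    x = np.linspace(0.0, 1.0, n_cells + 1)
    V = np.zeros_like(x)
    for _ in range(max_iter):
        V_shift = np.append(V[1:], V[-1])              # V(min(x + h, 1))
        continuation = h * x + np.exp(-LAM * h) * V_shift
        impulse = K + V.min()                          # jump to the cheapest node
        V_new = np.minimum(impulse, continuation)
        if np.max(np.abs(V_new - V)) < tol:
            return x, V_new
        V = V_new
    return x, V


x, V = solve_impulse(100)

# Continuous reference, valid only for LAM = 1: restart at 0 whenever x reaches s, so
# W(0) = (int_0^s t e^{-t} dt + K e^{-s}) / (1 - e^{-s}), with the integral = 1 - e^{-s}(1 + s).
s = np.linspace(0.01, 0.99, 9801)
W0 = (1.0 - np.exp(-s) * (1.0 + s) + K * np.exp(-s)) / (1.0 - np.exp(-s))
print(V[0], W0.min())
```

A simple empirical check is to rerun `solve_impulse` with more cells, which also shrinks the time step, and compare V[0] with the reference.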
Numerical Schemes for the Continuous Q-function of Reinforcement Learning
Abstract
We develop a theoretical framework for the problem of learning optimal control. We consider a discounted infinite-horizon deterministic control problem in the reinforcement learning context. The main objective is to approximate the optimal value function of a fully continuous problem, using only observed information such as state, control, and cost. Using results from the numerical treatment of the Bellman equation, we formulate regularity and consistency results for the optimal value function. These results help to construct algorithms for the continuous problem. We propose two approximation schemes for the optimal value function that are based on observed data. The implementation of a simple optimal control learning problem illustrates the effects of the two approximation schemes.
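The sketch below approximates the optimal value function from observed state, control and cost data using Q-value iteration on recorded transitions. This is a generic scheme chosen for illustration, not either of the paper's two approximation schemes. The following names and values are hypothetical:

- the system x' = u with running cost x²;
- the step H = 0.05 and the grid;
- the helper names `observe` and `q_value_iteration`.

The solver touches only the recorded tuples (node, control, cost, next state). It uses linear interpolation in the state variable to evaluate Q at next states that need not be grid nodes.

```python
import numpy as np

# Hypothetical setup (an assumption, not one of the paper's schemes):
# a black-box system x' = u on [-1, 1], u in {-1, 0, 1}, running cost x**2,
# observed only through recorded one-step transitions.
CONTROLS = (-1.0, 0.0, 1.0)
LAM, H = 1.0, 0.05
GRID = np.linspace(-1.0, 1.0, 41)   # grid spacing equals the time step H


def observe(x, u):
    """Stand-in for the unknown system: returns (cost, next_state) for one step."""
    return H * x**2, float(np.clip(x + H * u, -1.0, 1.0))


# Record one transition per (node, control) pair: (node index, control index, cost, next state).
data = [(i, a, *observe(x, u)) for i, x in enumerate(GRID) for a, u in enumerate(CONTROLS)]
I = np.array([d[0] for d in data])
A = np.array([d[1] for d in data])
C = np.array([d[2] for d in data])
XN = np.array([d[3] for d in data])


def q_value_iteration(tol=1e-12, max_iter=100_000):
    """Q(x_i, a) <- cost + exp(-LAM*H) * min_a' Q_interp(x_next, a'), using only the data."""
    Q = np.zeros((len(GRID), len(CONTROLS)))
    gamma = np.exp(-LAM * H)
    for _ in range(max_iter):
        # Interpolate each control's Q-values at the observed next states.
        q_next = np.stack([np.interp(XN, GRID, Q[:, a]) for a in range(len(CONTROLS))])
        Q_new = Q.copy()
        Q_new[I, A] = C + gamma * q_next.min(axis=0)
        if np.max(np.abs(Q_new - Q)) < tol:
            return Q_new
        Q = Q_new
    return Q


Q = q_value_iteration()
V = Q.min(axis=1)           # approximate optimal value at the grid nodes
policy = Q.argmin(axis=1)   # greedy control index at each node
```

Here the grid spacing equals the time step, so each recorded next state falls on a grid node up to rounding. With other ratios, interpolation error enters the fixed point, which is one way the time/space ratio shows up in schemes of this kind.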
Local defect correction methods for the Bellman equation
, 1998
Abstract
We present a multilevel method for the Bellman equation of optimal control theory. The so-called local defect correction (LDC) method uses a local fine grid to correct the solution in a critical area of the coarse grid. We formulate the method assuming a deterministic state equation and prove convergence and consistency results. Numerical results show that the LDC method provides an accurate solution at low computational cost. The LDC method also gives good results when different levels of time discretization are used.
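Below is a bare-bones local defect correction iteration in the general style of LDC methods. It is applied to a hypothetical 1-D semi-Lagrangian discretization (x' = u, cost |x|, discount rate 1) and is not the paper's formulation or test case. The setup is illustrative:

- the coarse grid covers [-1, 1];
- a local grid four times finer in both space and time covers [-0.25, 0.25];
- the names `solve_coarse`, `solve_local` and `ldc` are made up for this sketch.

Each outer step does three things:

1. Solve the coarse problem.
2. Solve the local fine problem with boundary values taken from the coarse solution.
3. Add the coarse-grid defect of the composite function as a source term at coarse nodes inside the local region.

```python
import numpy as np

# Hypothetical model problem (an assumption, not the paper's test case):
# x' = u with u in {-1, 0, 1} on [-1, 1], running cost |x|, discount rate LAM.
CONTROLS = (-1.0, 0.0, 1.0)
LAM = 1.0


def bellman(V, grid, h):
    """Semi-Lagrangian Bellman operator at the nodes of `grid` with time step h."""
    best = np.full(grid.shape, np.inf)
    for u in CONTROLS:
        nxt = np.clip(grid + h * u, grid[0], grid[-1])
        best = np.minimum(best, h * np.abs(grid) + np.exp(-LAM * h) * np.interp(nxt, grid, V))
    return best


def solve_coarse(grid, h, f, tol=1e-12, max_iter=100_000):
    """Solve u - T_H(u) = f by fixed-point iteration (f holds the LDC correction)."""
    V = np.zeros_like(grid)
    for _ in range(max_iter):
        V_new = f + bellman(V, grid, h)
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    return V_new


def solve_local(grid, h, left, right, tol=1e-12, max_iter=100_000):
    """Local fine problem with Dirichlet values at its ends taken from the coarse solution."""
    V = np.zeros_like(grid)
    for _ in range(max_iter):
        V_new = bellman(V, grid, h)
        V_new[0], V_new[-1] = left, right
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    return V_new


def ldc(n_coarse=16, region=(-0.25, 0.25), ratio=4, tol=1e-10, max_outer=50):
    xc = np.linspace(-1.0, 1.0, n_coarse + 1)
    H = xc[1] - xc[0]
    a, b = region
    xf = np.linspace(a, b, int(round((b - a) / H)) * ratio + 1)
    inner = (xc > a + 1e-12) & (xc < b - 1e-12)   # coarse nodes strictly inside the region
    f = np.zeros_like(xc)
    for _ in range(max_outer):
        uH = solve_coarse(xc, H, f)
        # Finer grid and finer time step (H / ratio) on the local region.
        uh = solve_local(xf, H / ratio, np.interp(a, xc, uH), np.interp(b, xc, uH))
        w = uH.copy()
        w[inner] = np.interp(xc[inner], xf, uh)        # composite coarse-grid function
        f_new = np.zeros_like(xc)
        f_new[inner] = (w - bellman(w, xc, H))[inner]  # coarse defect of the composite
        converged = np.max(np.abs(f_new - f)) < tol
        f = f_new
        if converged:
            break
    return xc, uH, xf, uh, inner


xc, uH, xf, uh, inner = ldc()
print(np.max(np.abs(uH[inner] - np.interp(xc[inner], xf, uh))))
```

Once the outer iteration has converged, the coarse solution at coarse nodes inside the local region coincides with the fine solution there, up to the solver tolerances. This makes a convenient sanity check. Using a finer time step on the local grid mirrors the abstract's remark about different levels of time discretization.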

