Results 1–10 of 11
Reinforcement Learning In Continuous Time and Space
 Neural Computation, 2000
Abstract

Cited by 112 (5 self)
This paper presents a reinforcement learning framework for continuous-time dynamical systems without a priori discretization of time, state, and action. Based on the Hamilton-Jacobi-Bellman (HJB) equation for infinite-horizon, discounted reward problems, we derive algorithms for estimating value functions and for improving policies with the use of function approximators. The process of value function estimation is formulated as the minimization of a continuous-time form of the temporal difference (TD) error. Update methods based on backward Euler approximation and exponential eligibility traces are derived, and their correspondences with the conventional residual gradient, TD(0), and TD(λ) algorithms are shown. For policy improvement, two methods are formulated: a continuous actor-critic method and a value-gradient-based greedy policy. As a special case of the latter, a nonlinear feedback control law using the value gradient and the model of the input gain is derived.
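The value-estimation rule described in the abstract can be sketched roughly as follows. This is a hedged illustration, not the paper's exact algorithm: the linear feature map, the discount time constant `tau`, the trace time constant `kappa`, and the step sizes are all assumptions.

```python
import numpy as np

# Sketch: continuous-time TD error with a backward Euler time derivative and
# a linear function approximator V(x) = w . phi(x). All constants are assumed.

def td_error(w, phi_now, phi_prev, r, tau=1.0, dt=0.01):
    """delta = r - V/tau + dV/dt, with dV/dt by backward Euler."""
    v_now, v_prev = w @ phi_now, w @ phi_prev
    return r - v_now / tau + (v_now - v_prev) / dt

def update(w, z, phi_now, phi_prev, r, tau=1.0, dt=0.01, eta=0.1, kappa=0.5):
    """One learning step with an exponential eligibility trace z (time
    constant kappa), the continuous-time analogue of TD(lambda)."""
    delta = td_error(w, phi_now, phi_prev, r, tau, dt)
    z = (1.0 - dt / kappa) * z + dt * phi_now  # trace decays at rate 1/kappa
    return w + eta * delta * z * dt, z
```

With the eligibility time constant `kappa` driven to the sampling interval, the trace reduces to the current feature vector and the update collapses to the TD(0)-like special case mentioned in the abstract.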
A Study of Reinforcement Learning in the Continuous Case by the Means of Viscosity Solutions
, 1999
Abstract

Cited by 24 (3 self)
This paper proposes a study of Reinforcement Learning (RL) for continuous state-space and time control problems, based on the theoretical framework of viscosity solutions (VSs). We use the method of dynamic programming (DP), which introduces the value function (VF), the expectation of the best future cumulative reinforcement. In the continuous case, the value function satisfies a nonlinear first (or second) order (depending on the deterministic or stochastic aspect of the process) differential equation called the Hamilton-Jacobi-Bellman (HJB) equation. It is well known that there exists an infinity of generalized solutions (differentiable almost everywhere) to this equation, other than the VF. We show that gradient-descent methods may converge to one of these generalized solutions, thus failing to find the optimal control. In order to solve the HJB equation, we use the powerful framework of viscosity solutions and state that there exists a unique viscosity solution to the HJB equation, whi...
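For reference, the HJB equation the abstract refers to can be written, in the deterministic infinite-horizon discounted case, as follows (the notation is assumed, not taken from the paper):

```latex
% dynamics dx/dt = f(x,u), reinforcement r(x,u), discount rate gamma > 0
\gamma\, V(x) \;=\; \sup_{u \in U} \bigl[\, r(x,u) + \nabla V(x) \cdot f(x,u) \,\bigr]
```

An almost-everywhere differentiable function satisfying this equation need not be the value function; the viscosity-solution framework is what singles out the unique solution that is.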
Making a Robot Learn to Play Soccer Using Reward and Punishment
Abstract

Cited by 4 (1 self)
In this paper, we show how reinforcement learning can be applied to real robots to achieve optimal robot behavior. As an example, we enable an autonomous soccer robot to learn to intercept a rolling ball. The main focus is on how to adapt the Q-learning algorithm to the needs of learning strategies for real robots and how to transfer strategies learned in simulation onto real robots.
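For orientation, the basic Q-learning rule that the paper adapts can be sketched as below. The robot-specific adaptations discussed in the paper are not reproduced; the state/action discretization, step sizes, and ε-greedy exploration are assumptions.

```python
import random

# Minimal tabular Q-learning sketch; states and actions are assumed discrete.

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (r + gamma * best_next - old)
    return Q

def epsilon_greedy(Q, s, actions, eps=0.1):
    """Exploration policy typically used while learning."""
    if random.random() < eps:
        return random.choice(actions)
    return max(actions, key=lambda a: Q.get((s, a), 0.0))
```

Transferring a policy learned in simulation then amounts to initializing the real robot's Q-table with the simulated one rather than with zeros.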
Learning to Control at Multiple Time Scales
 Artificial Neural Networks and Neural Information Processing  ICANN/ICONIP 2003, Joint International Conference ICANN/ICONIP 2003
, 2003
Abstract

Cited by 2 (0 self)
In reinforcement learning the interaction between the agent and the environment generally takes place on a fixed time scale, which means that the control interval is set to a fixed time step. In order to determine a suitable fixed time scale one has to trade off accuracy in control against learning complexity. In this paper we present an alternative approach that enables the agent to learn a control policy by using multiple time scales simultaneously. Instead of preselecting a fixed time scale, there are several time scales available during learning and the agent can select the appropriate time scale depending on the system state. The different time scales are multiples of a finest time scale, which is denoted the primitive time scale. Actions on a coarser time scale consist of several identical actions on the primitive time scale and are called multi-step actions (MSAs). The special structure of these actions is efficiently exploited in our recent MSA-Q-learning algorithm. We use the MSAs to learn a control policy for a thermostat control problem. Our algorithm yields a fast and highly accurate control policy; in contrast, the standard Q-learning algorithm without MSAs fails to learn any useful control policy for this problem.
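A multi-step action update of the kind described can be sketched as follows. This is an illustration under assumed interfaces (the `step` environment function, discrete states, and the set of durations), not the paper's MSA-Q-learning implementation.

```python
# Sketch: an action repeated n times on the primitive time scale is treated
# as one coarse action (a, n), updated with the accumulated discounted
# reward and a gamma**n bootstrap. The environment interface is assumed.

def msa_update(Q, s, a, n, step, actions, durations, alpha=0.1, gamma=0.99):
    """Execute primitive action a for n steps, then do one Q-update
    for the multi-step action (a, n)."""
    s_t, ret, disc = s, 0.0, 1.0
    for _ in range(n):
        s_t, r = step(s_t, a)       # one primitive transition
        ret += disc * r             # accumulate discounted reward
        disc *= gamma               # disc ends up as gamma**n
    best_next = max(Q.get((s_t, a2, m), 0.0)
                    for a2 in actions for m in durations)
    old = Q.get((s, a, n), 0.0)
    Q[(s, a, n)] = old + alpha * (ret + disc * best_next - old)
    return Q, s_t
```

Because the maximization runs over all durations, the agent can prefer coarse actions where the system state changes slowly and primitive ones where fine control matters.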
AN ADAPTIVE SPARSE GRID SEMI-LAGRANGIAN SCHEME FOR FIRST ORDER HAMILTON-JACOBI-BELLMAN EQUATIONS
, 2012
Abstract

Cited by 2 (1 self)
ABSTRACT. We propose a semi-Lagrangian scheme using a spatially adaptive sparse grid to deal with nonlinear time-dependent Hamilton-Jacobi-Bellman equations. We focus in particular on front propagation models in higher dimensions which are related to control problems. We test the numerical efficiency of the method on several benchmark problems up to space dimension d = 8, and give evidence of convergence towards the exact viscosity solution. In addition, we study how the complexity and precision scale with the dimension of the problem.
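To illustrate the type of scheme involved (on a regular 1D grid rather than the paper's adaptive sparse grid), a semi-Lagrangian step for the front-propagation model u_t + c|u_x| = 0 might look like this; the grid, speed, control set, and boundary handling are assumptions.

```python
import numpy as np

# Sketch of one semi-Lagrangian step for u_t + c|u_x| = 0 on a regular grid.
# The adaptive sparse grid of the paper is NOT reproduced here.

def sl_step(u, x, c, dt):
    """u^{n+1}(x_i) = min over controls a in {-1, 0, 1} of u^n(x_i - dt*c*a),
    evaluating u^n at the characteristic feet by linear interpolation."""
    candidates = []
    for a in (-1.0, 0.0, 1.0):
        feet = np.clip(x - dt * c * a, x[0], x[-1])  # feet of characteristics
        candidates.append(np.interp(feet, x, u))
    return np.minimum.reduce(candidates)
```

Starting from a signed-distance-like profile, repeated steps make the front advance at speed c, which is the behavior the benchmark problems in the abstract test in higher dimensions.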
Satisficing, Fast Implementation, and Generalization for Online Reinforcement Learning
Abstract
Contents
1 Introduction (p. 5)
1.1 The background and objectives (p. 5)
1.2 The outline (p. 8)
2 The achievement of satisficing in online reinforcement learning (p. 9)
2.1 Introduction (p. 9)
2.2 Q-learning and conventional exploration strategies (p. 10)
2.2.1 Q-learning (p. 10)
2.2.2 Conventional exploration strategies (p. 11)
2.3 Problems to be solved (p. 13)
2.3.1 Reinforcement learning to satisfice (p. 13)
2.3.2 Learning framework ...
Speeding-up Reinforcement Learning with
 Proceedings of the Twelfth International Conference on Artificial Neural Networks (ICANN), Lecture Notes in Computer Science (LNCS) 2415
, 2001
Abstract
In recent years hierarchical concepts of temporal abstraction have been integrated into the reinforcement learning framework to improve scalability. However, existing approaches are limited to domains where a decomposition into subtasks is known a priori. In this paper we propose the concept of explicitly selecting time-scale-related actions if no subgoal-related abstract actions are available. This is realised with multi-step actions on different time scales that are combined in one single action set. The special structure of the action set is exploited in the MSA-Q-learning algorithm. By learning on different explicitly specified time scales simultaneously, a considerable improvement of learning speed can be achieved. This is demonstrated on two benchmark problems.
Numerical Schemes for the Continuous Q-function of Reinforcement Learning
Abstract
We develop a theoretical framework for the problem of learning optimal control. We consider a discounted infinite-horizon deterministic control problem in the reinforcement learning context. The main objective is to approximate the optimal value function of a fully continuous problem, using only observed information: states, controls, and costs. With results from the numerical treatment of the Bellman equation we formulate regularity and consistency results for the optimal value function. These results help to construct algorithms for the continuous problem. We propose two approximation schemes for the optimal value function which are based on observed data. The implementation of a simple optimal control learning problem shows the effects of the two approximation schemes.
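In the spirit of this abstract, approximating the optimal value from observed (state, control, cost, next-state) samples alone can be sketched with a simple fitted Q-iteration; the discrete sample set, the discount factor, and the cost-minimizing convention are assumptions, not the paper's actual schemes.

```python
# Sketch: tabular fitted Q-iteration over observed transitions only.
# Discounted deterministic problem, costs are minimized.

def q_iteration(samples, controls, gamma=0.95, iters=50):
    """samples: list of (s, u, cost, s_next) with discrete s, u.
    Repeatedly applies the sample-based Bellman operator."""
    Q = {}
    for _ in range(iters):
        Q_new = {}
        for s, u, cost, s_next in samples:
            nxt = min((Q.get((s_next, u2), 0.0) for u2 in controls),
                      default=0.0)
            Q_new[(s, u)] = cost + gamma * nxt
        Q = Q_new
    return Q
```

The optimal value function is then recovered pointwise as V(s) = min over u of Q(s, u), using only the states that appear in the data.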