Results 1–10 of 15
Reinforcement Learning In Continuous Time and Space
 Neural Computation
, 2000
"... This paper presents a reinforcement learning framework for continuoustime dynamical systems without a priori discretization of time, state, and action. Based on the HamiltonJacobiBellman (HJB) equation for infinitehorizon, discounted reward problems, we derive algorithms for estimating value f ..."
Abstract

Cited by 129 (5 self)
 Add to MetaCart
(Show Context)
This paper presents a reinforcement learning framework for continuous-time dynamical systems without a priori discretization of time, state, and action. Based on the Hamilton-Jacobi-Bellman (HJB) equation for infinite-horizon, discounted reward problems, we derive algorithms for estimating value functions and for improving policies with the use of function approximators. The process of value function estimation is formulated as the minimization of a continuous-time form of the temporal difference (TD) error. Update methods based on backward Euler approximation and exponential eligibility traces are derived, and their correspondences with the conventional residual gradient, TD(0), and TD(λ) algorithms are shown. For policy improvement, two methods, namely, a continuous actor-critic method and a value-gradient based greedy policy, are formulated. As a special case of the latter, a nonlinear feedback control law using the value gradient and the model of the input gain is derived....
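A minimal sketch of the continuous-time TD error described in this abstract, using the backward Euler approximation of dV/dt. The one-state toy process, the update rule, and all parameter names (tau, dt, alpha) are illustrative assumptions, not the paper's algorithm verbatim.

```python
# Hedged sketch: continuous-time TD error with a backward Euler
# approximation of the value derivative.

def continuous_td_error(v_now, v_prev, reward, tau, dt):
    """delta(t) = r(t) - V(t)/tau + dV/dt, with dV/dt ~ (V(t) - V(t - dt))/dt."""
    return reward - v_now / tau + (v_now - v_prev) / dt

# One-state process with constant reward r: the consistency condition
# delta = 0 gives the stationary value V = tau * r.
tau, dt, alpha, r = 1.0, 0.01, 0.5, 2.0
V = 0.0
for _ in range(2000):
    v_prev = V                     # single state, so V(t - dt) = current estimate
    delta = continuous_td_error(V, v_prev, r, tau, dt)
    V += alpha * dt * delta        # gradient-style update, scaled by dt

print(round(V, 3))   # -> 2.0 (= tau * r)
```

The update scales the learning rate by dt so that the estimate converges to the same fixed point regardless of how finely time is discretized.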
A Study of Reinforcement Learning in the Continuous Case by the Means of Viscosity Solutions
, 1999
"... . This paper proposes a study of Reinforcement Learning (RL) for continuous statespace and time control problems, based on the theoretical framework of viscosity solutions (VSs). We use the method of dynamic programming (DP) which introduces the value function (VF), expectation of the best future cu ..."
Abstract

Cited by 27 (4 self)
 Add to MetaCart
(Show Context)
This paper proposes a study of Reinforcement Learning (RL) for continuous state-space and time control problems, based on the theoretical framework of viscosity solutions (VSs). We use the method of dynamic programming (DP), which introduces the value function (VF), the expectation of the best future cumulative reinforcement. In the continuous case, the value function satisfies a nonlinear first (or second) order (depending on the deterministic or stochastic aspect of the process) differential equation called the Hamilton-Jacobi-Bellman (HJB) equation. It is well known that there exists an infinity of generalized solutions (differentiable almost everywhere) to this equation, other than the VF. We show that gradient-descent methods may converge to one of these generalized solutions, thus failing to find the optimal control. In order to solve the HJB equation, we use the powerful framework of viscosity solutions and state that there exists a unique viscosity solution to the HJB equation, whi...
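For reference, the deterministic, infinite-horizon, discounted HJB equation this abstract refers to can be sketched in its standard textbook form (generic symbols, assumed rather than quoted from the paper):

```latex
% Standard deterministic HJB equation for an infinite-horizon discounted
% problem (generic notation, not necessarily the paper's):
\[
  \gamma\, V(x) \;=\; \sup_{u \in U} \bigl[\, r(x,u) + \nabla V(x) \cdot f(x,u) \,\bigr],
\]
% with dynamics \dot{x} = f(x,u), reinforcement r(x,u), and discount rate
% \gamma > 0. Many functions differentiable almost everywhere satisfy this
% equation; the value function is singled out as its unique viscosity solution.
```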
Making a Robot Learn to Play Soccer Using Reward and Punishment
"... Abstract In this paper, we show how reinforcement learning can be applied to real robots to achieve optimal robot behavior. As example, we enable an autonomous soccer robot to learn intercepting a rolling ball. Main focus is on how to adapt the Qlearning algorithm to the needs of learning strategie ..."
Abstract

Cited by 6 (3 self)
 Add to MetaCart
(Show Context)
In this paper, we show how reinforcement learning can be applied to real robots to achieve optimal robot behavior. As an example, we enable an autonomous soccer robot to learn to intercept a rolling ball. The main focus is on how to adapt the Q-learning algorithm to the needs of learning strategies for real robots, and on how to transfer strategies learned in simulation onto real robots.
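A minimal sketch of the tabular Q-learning algorithm this abstract adapts, on a toy 1-D "intercept the ball" chain instead of a real robot. The state and action sets, reward, and all parameters are illustrative assumptions, not the paper's setup.

```python
# Hedged sketch: tabular Q-learning on a toy interception chain.
import random

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One backup: Q(s,a) += alpha * (r + gamma * max_b Q(s',b) - Q(s,a))."""
    best_next = max(Q.get((s_next, b), 0.0) for b in actions)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (r + gamma * best_next - old)

random.seed(0)
actions = (+1, -1)          # move toward / away from the ball
Q = {}
for _ in range(500):        # training episodes
    s = 0                   # robot starts at state 0; the ball sits at state 4
    while s != 4:
        if random.random() < 0.2:                    # epsilon-greedy exploration
            a = random.choice(actions)
        else:
            a = max(actions, key=lambda b: Q.get((s, b), 0.0))
        s_next = min(max(s + a, 0), 4)
        r = 1.0 if s_next == 4 else 0.0              # reward on interception
        q_update(Q, s, a, r, s_next, actions)
        s = s_next

print(round(Q[(3, +1)], 2))  # -> 1.0 (one step from the ball)
```

The sim-to-real transfer discussed in the abstract is orthogonal to this update rule: the same table (or an approximation of it) learned in simulation would seed learning on the physical robot.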
AN ADAPTIVE SPARSE GRID SEMI-LAGRANGIAN SCHEME FOR FIRST ORDER HAMILTON-JACOBI-BELLMAN EQUATIONS
, 2012
"... ABSTRACT. We propose a semiLagrangian scheme using a spatially adaptive sparse grid to deal with nonlinear timedependent HamiltonJacobi Bellman equations. We focus in particular on front propagation models in higher dimensions which are related to control problems. We test the numerical efficien ..."
Abstract

Cited by 3 (1 self)
 Add to MetaCart
(Show Context)
We propose a semi-Lagrangian scheme using a spatially adaptive sparse grid to deal with nonlinear time-dependent Hamilton-Jacobi-Bellman equations. We focus in particular on front propagation models in higher dimensions which are related to control problems. We test the numerical efficiency of the method on several benchmark problems up to space dimension d = 8, and give evidence of convergence towards the exact viscosity solution. In addition, we study how the complexity and precision scale with the dimension of the problem.
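A 1-D sketch of the semi-Lagrangian idea behind this scheme: trace the characteristic one small step, interpolate the value there, and iterate to a fixed point. The paper's spatially adaptive sparse grid is replaced here by a plain uniform grid, and the minimum-time front-propagation problem, grid size, and step sizes are illustrative assumptions.

```python
# Hedged sketch: semi-Lagrangian fixed-point iteration for a 1-D
# minimum-time HJB problem (front propagation at unit speed).

def interp(V, xs, x):
    """Piecewise-linear interpolation of grid values V at a point x (clamped)."""
    x = min(max(x, xs[0]), xs[-1])
    h = xs[1] - xs[0]
    i = min(int((x - xs[0]) / h), len(xs) - 2)
    t = (x - xs[i]) / h
    return (1 - t) * V[i] + t * V[i + 1]

n, dt = 101, 0.005
xs = [i / (n - 1) for i in range(n)]           # uniform grid on [0, 1]
target = 0.5                                   # the front starts here
V = [0.0 if x == target else 1e9 for x in xs]  # time-to-reach; large = unknown

for _ in range(400):                           # iterate the scheme to a fixed point
    V = [0.0 if x == target else
         min(dt + interp(V, xs, x + dt * a) for a in (-1.0, 1.0))
         for x in xs]

# The exact viscosity solution is the distance |x - target|:
print(round(V[0], 2))   # -> 0.5
```

On a sparse adaptive grid the same step applies; only the interpolation operator and the set of grid points change.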
Learning to Control at Multiple Time Scales
Artificial Neural Networks and Neural Information Processing - ICANN/ICONIP 2003, Joint International Conference ICANN/ICONIP 2003
, 2003
"... In reinforcement learning the interaction between the agent and the environment generally takes place on a xed time scale, which means that the control interval is set to a xed time step. In order to determine a suitable xed time scale one has to trade o accuracy in control against learning c ..."
Abstract

Cited by 2 (0 self)
 Add to MetaCart
(Show Context)
In reinforcement learning the interaction between the agent and the environment generally takes place on a fixed time scale, which means that the control interval is set to a fixed time step. In order to determine a suitable fixed time scale, one has to trade off accuracy in control against learning complexity. In this paper we present an alternative approach that enables the agent to learn a control policy by using multiple time scales simultaneously. Instead of preselecting a fixed time scale, there are several time scales available during learning, and the agent can select the appropriate time scale depending on the system state. The different time scales are multiples of a finest time scale, which is denoted as the primitive time scale. Actions on a coarser time scale consist of several identical actions on the primitive time scale and are called multi-step actions (MSAs). The special structure of these actions is efficiently exploited in our recent MSA-Q-learning algorithm. We use the MSAs to learn a control policy for a thermostat control problem. Our algorithm yields a fast and highly accurate control policy; in contrast, the standard Q-learning algorithm without MSAs fails to learn any useful control policy for this problem.
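A minimal sketch of the multi-step action (MSA) mechanism described here: an MSA repeats one primitive action n times, and its Q-value is backed up with the n-step discounted return and discount gamma**steps. The chain environment and all names and parameters are illustrative assumptions, not the paper's thermostat benchmark or its exact MSA-Q-learning code.

```python
# Hedged sketch: executing and backing up a multi-step action (MSA).

class ChainEnv:
    """States 0..7; the goal state 7 pays reward 1 and ends the episode."""
    def step(self, s, a):
        s2 = min(max(s + a, 0), 7)
        return s2, (1.0 if s2 == 7 else 0.0), s2 == 7

def execute_msa(env, s, a, n, gamma):
    """Repeat primitive action a up to n steps; return (return G, s', steps used)."""
    G, discount, steps = 0.0, 1.0, 0
    for _ in range(n):
        s, r, done = env.step(s, a)
        G += discount * r
        discount *= gamma
        steps += 1
        if done:
            break                       # MSAs end early at terminal states
    return G, s, steps

def msa_q_update(Q, s, msa, G, s_next, msas, steps, alpha=0.1, gamma=0.9):
    """Backup: Q(s,msa) += alpha * (G + gamma**steps * max_m Q(s',m) - Q(s,msa))."""
    best = max(Q.get((s_next, m), 0.0) for m in msas)
    old = Q.get((s, msa), 0.0)
    Q[(s, msa)] = old + alpha * (G + gamma ** steps * best - old)

env, Q = ChainEnv(), {}
msas = [(a, n) for a in (-1, +1) for n in (1, 2, 4)]   # primitive + coarser scales
G, s_next, steps = execute_msa(env, 4, +1, 4, 0.9)     # reaches the goal in 3 steps
msa_q_update(Q, 4, (+1, 4), G, s_next, msas, steps)
print(round(G, 2), s_next, steps)   # -> 0.81 7 3
```

Because a coarse-scale action is several identical primitive actions, each MSA execution also yields valid experience for every shorter MSA along the way, which is the structure the abstract says the algorithm exploits.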
HIERARCHICAL REINFORCEMENT LEARNING WITH FUNCTION APPROXIMATION FOR ADAPTIVE CONTROL
"... ..."
(Show Context)
Metric State Space Reinforcement Learning for a Vision-Capable Mobile Robot
, 2003
"... We address the problem of autonomously learning controllers for visioncapable mobile robots. We extend McCallum’s (1995) NearestSequence Memory algorithm to ..."
Abstract
 Add to MetaCart
(Show Context)
We address the problem of autonomously learning controllers for vision-capable mobile robots. We extend McCallum’s (1995) Nearest Sequence Memory algorithm to
Mapping the Design Space of Reinforcement Learning Problems – a Case Study
"... This paper reports on a case study motivated by a typical reinforcement learning problem in robotics: an overall goal which decomposes into several subgoals has to be reached in a discrete large sized state space. For simplicity, we model this problem in a standard gridworld setting and perform an e ..."
Abstract
 Add to MetaCart
(Show Context)
This paper reports on a case study motivated by a typical reinforcement learning problem in robotics: an overall goal which decomposes into several subgoals has to be reached in a discrete, large state space. For simplicity, we model this problem in a standard gridworld setting and perform an extensive comparison of different parameter and design choices. In doing so, we focus on the central role of the representation of the state space. We examine three fundamentally different representations with counterparts in “real life” robotics. We investigate their behaviour with respect to (i) the size and properties of the state space, (ii) different exploration strategies, including the recent proposal of multi-step actions, and (iii) the type and parameters of the reward function.
Using Multi-Step Actions for Faster Reinforcement Learning
"... Machines (HAM) (Parr, 1998) and the MAXQ approach (Dietterich, 2000). They are all based on the notion that the whole task is decomposed into subtasks each of which corresponds to a subgoal. The existing hierarchical RL approaches are able to solve problems of the following two types: 1. Abstract a ..."
Abstract
 Add to MetaCart
(Show Context)
Machines (HAM) (Parr, 1998) and the MAXQ approach (Dietterich, 2000). They are all based on the notion that the whole task is decomposed into subtasks each of which corresponds to a subgoal. The existing hierarchical RL approaches are able to solve problems of the following two types: 1. Abstract actions given: Abstract actions for achieving subgoals are given in terms of actions that are lower in the hierarchy.