Results 1–10 of 12
Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding
 Advances in Neural Information Processing Systems 8
, 1996
"... On large problems, reinforcement learning systems must use parameterized function approximators such as neural networks in order to generalize between similar situations and actions. In these cases there are no strong theoretical results on the accuracy of convergence, and computational results have ..."
Abstract

Cited by 411 (21 self)
On large problems, reinforcement learning systems must use parameterized function approximators such as neural networks in order to generalize between similar situations and actions. In these cases there are no strong theoretical results on the accuracy of convergence, and computational results have been mixed. In particular, Boyan and Moore reported at last year's meeting a series of negative results in attempting to apply dynamic programming together with function approximation to simple control problems with continuous state spaces. In this paper, we present positive results for all the control tasks they attempted, and for one that is significantly larger. The most important differences are that we used sparse coarse-coded function approximators (CMACs) whereas they used mostly global function approximators, and that we learned online whereas they learned offline. Boyan and Moore and others have suggested that the problems they encountered could be solved by using actual outcomes (...
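The sparse coarse coding (CMAC) described above can be illustrated with a minimal one-dimensional tile coder: several slightly offset tilings discretize the continuous state, and the value estimate is the sum of one weight per active tile. This is a hedged sketch under assumed constants (8 tilings, 10 tiles per dimension), not the authors' implementation:

```python
import numpy as np

def active_tiles(x, n_tilings=8, tiles_per_dim=10, low=0.0, high=1.0):
    """Return the index of the active tile in each of n_tilings
    offset tilings over a 1-D state x in [low, high)."""
    tile_w = (high - low) / tiles_per_dim
    idxs = []
    for t in range(n_tilings):
        offset = t * tile_w / n_tilings           # each tiling shifted slightly
        i = int((x - low + offset) / tile_w)
        i = min(i, tiles_per_dim)                 # clamp the edge tile
        idxs.append(t * (tiles_per_dim + 1) + i)  # unique index per tiling
    return idxs

# Linear value estimate: sum of the weights of the active tiles.
w = np.zeros(8 * 11)

def value(x):
    return sum(w[i] for i in active_tiles(x))

def update(x, target, alpha=0.1):
    """One online TD-style update: move value(x) toward target,
    spreading the correction over the active tiles."""
    err = target - value(x)
    idxs = active_tiles(x)
    for i in idxs:
        w[i] += alpha / len(idxs) * err
</imports>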
Problem Solving With Reinforcement Learning
, 1995
"... This dissertation is submitted for consideration for the dwree of Doctor' of Philosophy at the Uziver'sity of Cambr'idge Summary This thesis is concerned with practical issues surrounding the application of reinforcement lear'ning techniques to tasks that take place in high di ..."
Abstract

Cited by 53 (0 self)
This dissertation is submitted for consideration for the degree of Doctor of Philosophy at the University of Cambridge. Summary: This thesis is concerned with practical issues surrounding the application of reinforcement learning techniques to tasks that take place in high-dimensional continuous state-space environments. In particular, the extension of online updating methods is considered, where the term implies systems that learn as each experience arrives, rather than storing the experiences for use in a separate offline learning phase. Firstly, the use of alternative update rules in place of standard Q-learning (Watkins 1989) is examined to provide faster convergence rates. Secondly, the use of multilayer perceptron (MLP) neural networks (Rumelhart, Hinton and Williams 1986) is investigated to provide suitable generalising function approximators. Finally, consideration is given to the combination of Adaptive Heuristic Critic (AHC) methods and Q-learning to produce systems combining the benefits of real-valued actions and discrete switching
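The standard Q-learning rule (Watkins 1989) that the thesis takes as its baseline can be sketched in a few lines; the dictionary-based tabular representation and the constants here are illustrative, not the thesis's code:

```python
def q_update(Q, s, a, r, s2, actions, alpha=0.5, gamma=0.9):
    """One Q-learning backup:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    Q is a dict keyed by (state, action); missing entries default to 0."""
    best_next = max(Q.get((s2, a2), 0.0) for a2 in actions)
    td_error = r + gamma * best_next - Q.get((s, a), 0.0)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * td_error
```

The online character the summary emphasizes is visible here: each experience tuple (s, a, r, s') is consumed as it arrives, with no stored batch.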
Sparse temporal difference learning using lasso
 In IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning
, 2007
"... Abstract — We consider the problem of online value function estimation in reinforcement learning. We concentrate on the function approximator to use. To try to break the curse of dimensionality, we focus on non parametric function approximators. We propose to fit the use of kernels into the tempora ..."
Abstract

Cited by 27 (1 self)
Abstract — We consider the problem of online value function estimation in reinforcement learning. We concentrate on the function approximator to use. To try to break the curse of dimensionality, we focus on nonparametric function approximators. We propose to fit the use of kernels into the temporal difference algorithms by using regression via the LASSO. We introduce the equi-gradient descent algorithm (EGD), which is a direct adaptation of the one recently introduced in the LARS algorithm family for solving the LASSO. We advocate our choice of the EGD as a judicious algorithm for these tasks. We present the EGD algorithm in detail as well as some experimental results.
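The paper's equi-gradient descent solver is not reproduced here; as a stand-in, a generic coordinate-descent LASSO solver shows the L1-induced sparsity the authors exploit when fitting value-function coefficients to regression targets. All names and constants are illustrative:

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for min_w ||y - Xw||^2 / (2n) + lam * ||w||_1.
    Each pass soft-thresholds one coefficient at a time, driving
    uninformative coefficients exactly to zero."""
    n, d = X.shape
    w = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(d):
            r = y - X @ w + X[:, j] * w[j]   # residual excluding feature j
            rho = X[:, j] @ r / n
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return w
```

In a TD setting the rows of X would be feature vectors of visited states and y the bootstrapped targets r + γV(s'); the zeroed coefficients are what keeps the kernel representation sparse.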
Function approximation via tile coding: Automating parameter choice
 of Lecture Notes in Artificial Intelligence
, 2005
"... Abstract. Reinforcement learning (RL) is a powerful abstraction of sequential decision making that has an established theoretical foundation and has proven effective in a variety of small, simulated domains. The success of RL on realworld problems with large, often continuous state and action spaces ..."
Abstract

Cited by 25 (8 self)
Abstract. Reinforcement learning (RL) is a powerful abstraction of sequential decision making that has an established theoretical foundation and has proven effective in a variety of small, simulated domains. The success of RL on real-world problems with large, often continuous state and action spaces hinges on effective function approximation. Of the many function approximation schemes proposed, tile coding strikes an empirically successful balance among representational power, computational cost, and ease of use, and has been widely adopted in recent RL work. This paper demonstrates that the performance of tile coding is quite sensitive to parameterization. We present detailed experiments that isolate the effects of parameter choices and provide guidance on their setting. We further illustrate that no single parameterization achieves the best performance throughout the learning curve, and contribute an automated technique for adjusting tile-coding parameters online. Our experimental findings confirm the superiority of adaptive parameterization to fixed settings. This work aims to automate the choice of approximation scheme not only on a problem basis but also throughout the learning process, eliminating the need for a substantial tuning effort.
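The parameters this abstract studies trade off against each other in a simple way: tile width governs how broadly each update generalizes, while the number of offset tilings refines the finest distinction the approximator can represent. A hypothetical helper (the names and formulas are illustrative back-of-envelope accounting, not the paper's method) makes the trade-off concrete:

```python
def tile_coding_cost(dims, tiles_per_dim, n_tilings):
    """Back-of-envelope accounting for a tile coder over a unit hypercube.

    memory:      weights stored (one extra tile per dimension for offsets)
    tile_width:  breadth of generalization from a single update
    resolution:  finest distinction the full set of tilings can represent
    """
    memory = n_tilings * (tiles_per_dim + 1) ** dims
    tile_width = 1.0 / tiles_per_dim
    resolution = tile_width / n_tilings
    return memory, tile_width, resolution

# Two settings with the same effective resolution:
coarse = tile_coding_cost(dims=2, tiles_per_dim=10, n_tilings=8)  # broad tiles
fine = tile_coding_cost(dims=2, tiles_per_dim=80, n_tilings=1)    # narrow tiles
```

Both settings resolve 1/80 of the state range per dimension, yet the first generalizes each update over a tile eight times wider, which is exactly the kind of sensitivity the paper isolates experimentally.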
Fast Reinforcement Learning with Large Action Sets using Error-Correcting Output Codes for MDP Factorization
, 2012
"... The use of Reinforcement Learning in realworld scenarios is strongly limited by issues of scale. Most RL learning algorithms are unable to deal with problems composed of hundreds or sometimes even dozens of possible actions, and therefore cannot be applied to many realworld problems. We consider t ..."
Abstract

Cited by 5 (3 self)
The use of Reinforcement Learning in real-world scenarios is strongly limited by issues of scale. Most RL learning algorithms are unable to deal with problems composed of hundreds or sometimes even dozens of possible actions, and therefore cannot be applied to many real-world problems. We consider the RL problem in the supervised classification framework, where the optimal policy is obtained through a multiclass classifier, the set of classes being the set of actions of the problem. We introduce error-correcting output codes (ECOCs) in this setting and propose two new methods for reducing complexity when using rollout-based approaches. The first method uses an ECOC-based classifier as the multiclass classifier, reducing the learning complexity from O(A²) to O(A log(A)). We then propose a novel method that profits from the ECOC's coding dictionary to split the initial MDP into O(log(A)) separate two-action MDPs. This second method reduces learning complexity even further, from O(A²) to O(log(A)), thus rendering problems with large action sets tractable. We finish by experimentally demonstrating the advantages of our approach on a set of benchmark problems, both in speed and performance.
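The MDP-splitting idea can be illustrated with the simplest possible coding dictionary, a plain binary encoding of action indices: each of the ceil(log2(A)) bit positions defines one two-action subproblem, and a full action is recovered by decoding the predicted bits against the nearest codeword. This is an illustrative sketch only; the paper's actual coding dictionaries and classifiers are not reproduced here:

```python
import numpy as np

def ecoc_matrix(n_actions):
    """Code matrix with one row per action: row a holds the binary digits
    of a, so each column defines one two-action (binary) subproblem."""
    n_bits = max(1, int(np.ceil(np.log2(n_actions))))
    return np.array([[(a >> b) & 1 for b in range(n_bits)]
                     for a in range(n_actions)])

def decode(bits, code):
    """Recover the action whose codeword is nearest (Hamming distance)
    to the predicted bits; a redundant code would also correct errors."""
    dist = np.abs(code - np.asarray(bits)).sum(axis=1)
    return int(np.argmin(dist))
```

With A = 1024 actions only 10 binary subproblems are needed, which is the O(log(A)) reduction the abstract describes; longer, redundant codes trade extra subproblems for the error correction that gives ECOCs their name.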
On Continuous-Action Q-Learning Via Tile Coding . . .
 Under review
, 2004
"... Reinforcement learning (RL) is a powerful machinelearning methodology that has an established theoretical foundation and has proven effective in a variety of small, simulated domains. There has been considerable work on applying RL, a method originally conceived for discrete stateaction spaces ..."
Abstract

Cited by 2 (0 self)
Reinforcement learning (RL) is a powerful machine-learning methodology that has an established theoretical foundation and has proven effective in a variety of small, simulated domains. There has been considerable work on applying RL, a method originally conceived for discrete state-action spaces, to problems with continuous states. The extension of RL to allow continuous actions, on the other hand, has seen relatively little research. One proposed approach to allowing continuous actions is to represent the value function using a tile-coding function approximator. We introduce
Multi-Agent Learning for Control of Internet Traffic Routing
 Learning Systems for Control, IEE Seminar
, 2000
"... ..."
(Show Context)
Reinforcement Learning In Soccer Simulation
"... Abstract. Being of a high complexity, most multiagent systems are difficult to deal with by a handcoded approach to decision making. In such complicated environments in which decision making processes should be controlled from both the individuals points ' of view and the whole team, the comm ..."
Abstract
Abstract. Owing to their high complexity, most multiagent systems are difficult to deal with by a hand-coded approach to decision making. In such complicated environments, in which decision-making processes should be controlled from both the individual's point of view and the whole team's, the common approach to the subject is the Reinforcement Learning (RL) method, which is mainly based on learning the optimal policy through mapping this task to an episodic reinforcement learning framework. Reinforcement learning is the problem of generating optimal behavior in a sequential decision-making environment given the opportunity of interacting with it. Since the RoboCup domain is a multiagent dynamic environment, with notable features making it outstanding for multiagent simulation benchmarks, it has been largely used as a basis for international multiagent simulation competitions and research challenges. For this purpose, reinforcement learning problems should be made “understandable” for “agents”, which are RoboCup players in this case. To make the agents “aware” of what they are intended to do, this paper provides an overview of different methods and alternative approaches as a starting point for RoboCup participants who are not familiar with reinforcement learning problems in the RoboCup domain.
Sparse temporal difference learning using LASSO
, 2006