Results 1–10 of 23
Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding
Advances in Neural Information Processing Systems 8, 1996
Cited by 355 (18 self)
On large problems, reinforcement learning systems must use parameterized function approximators such as neural networks in order to generalize between similar situations and actions. In these cases there are no strong theoretical results on the accuracy of convergence, and computational results have been mixed. In particular, Boyan and Moore reported at last year's meeting a series of negative results in attempting to apply dynamic programming together with function approximation to simple control problems with continuous state spaces. In this paper, we present positive results for all the control tasks they attempted, and for one that is significantly larger. The most important differences are that we used sparse coarse-coded function approximators (CMACs) whereas they used mostly global function approximators, and that we learned online whereas they learned offline. Boyan and Moore and others have suggested that the problems they encountered could be solved by using actual outcomes (...
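The sparse coarse coding this abstract describes can be sketched as a linear approximator over binary tile features, trained online; the class below is an illustrative reconstruction (the 2-D input, tiling offsets, and step size are assumptions, not the authors' code).

```python
import math
from collections import defaultdict

class CMAC:
    """Sparse coarse coding: C overlapping tilings, each contributing
    exactly one active binary feature per input (illustrative sketch)."""

    def __init__(self, n_tilings=8, tile_width=1.0, alpha=0.1):
        self.n_tilings = n_tilings
        self.tile_width = tile_width
        self.alpha = alpha / n_tilings  # step size per active feature
        self.w = defaultdict(float)     # weights, created on demand

    def features(self, x, y):
        """One (tiling, row, col) feature per tiling; each tiling is
        shifted by a fraction of the tile width so that nearby inputs
        activate overlapping feature sets."""
        feats = []
        for t in range(self.n_tilings):
            off = t * self.tile_width / self.n_tilings
            feats.append((t,
                          int(math.floor((x + off) / self.tile_width)),
                          int(math.floor((y + off) / self.tile_width))))
        return feats

    def value(self, x, y):
        return sum(self.w[f] for f in self.features(x, y))

    def update(self, x, y, target):
        """Online update toward a (possibly bootstrapped) target."""
        error = target - self.value(x, y)
        for f in self.features(x, y):
            self.w[f] += self.alpha * error

cmac = CMAC()
for _ in range(50):
    cmac.update(0.5, 0.5, 1.0)           # train one point online
print(round(cmac.value(0.5, 0.5), 2))    # approaches 1.0
print(cmac.value(0.55, 0.55) > 0.5)      # nearby points generalize
```

Because only a handful of local features are active per input, each online update changes the value estimate only in a neighbourhood of the training point, which is the locality property the paper contrasts with global approximators.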
Reinforcement learning for RoboCup-soccer keepaway
Adaptive Behavior, 2005
Cited by 109 (32 self)
RoboCup simulated soccer presents many challenges to reinforcement learning methods, including a large state space, hidden and uncertain state, multiple independent agents learning simultaneously, and long and variable delays in the effects of actions. We describe our application of episodic SMDP Sarsa(λ) with linear tile-coding function approximation and variable λ to learning higher-level decisions in a keepaway subtask of RoboCup soccer. In keepaway, one team, “the keepers,” tries to keep control of the ball for as long as possible despite the efforts of “the takers.” The keepers learn individually when to hold the ball and when to pass to a teammate. Our agents learned policies that significantly outperform a range of benchmark policies. We demonstrate the generality of our approach by applying it to a number of task variations including different field sizes and different numbers of players on each team.
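A single Sarsa(λ) update with linear tile-coding features can be sketched roughly as below; the dictionary-based weights, replacing traces, and parameter values are illustrative assumptions, not the authors' implementation.

```python
def sarsa_lambda_step(w, z, active_features, action, reward,
                      next_active, next_action, gamma=0.95,
                      lam=0.9, alpha=0.1, terminal=False):
    """One Sarsa(lambda) update with binary tile features.

    w: dict mapping (feature, action) -> weight
    z: dict mapping (feature, action) -> eligibility trace
    With binary features, Q(s, a) is just the sum of the weights of
    the active features for that action.
    """
    q = sum(w.get((f, action), 0.0) for f in active_features)
    q_next = 0.0 if terminal else sum(
        w.get((f, next_action), 0.0) for f in next_active)
    delta = reward + gamma * q_next - q

    # Replacing traces: set the trace of each active feature to 1.
    for f in active_features:
        z[(f, action)] = 1.0

    # Update every weight with a nonzero trace, then decay the traces.
    for key, trace in list(z.items()):
        w[key] = w.get(key, 0.0) + alpha * delta * trace
        z[key] = gamma * lam * trace
    return delta
```

The traces let a delayed reward (common in keepaway, where the effect of a pass arrives many steps later) propagate credit back to the features that were active earlier in the episode.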
Function approximation via tile coding: Automating parameter choice
Lecture Notes in Artificial Intelligence, 2005
Cited by 18 (8 self)
Reinforcement learning (RL) is a powerful abstraction of sequential decision making that has an established theoretical foundation and has proven effective in a variety of small, simulated domains. The success of RL on real-world problems with large, often continuous state and action spaces hinges on effective function approximation. Of the many function approximation schemes proposed, tile coding strikes an empirically successful balance among representational power, computational cost, and ease of use and has been widely adopted in recent RL work. This paper demonstrates that the performance of tile coding is quite sensitive to parameterization. We present detailed experiments that isolate the effects of parameter choices and provide guidance to their setting. We further illustrate that no single parameterization achieves the best performance throughout the learning curve, and contribute an automated technique for adjusting tile-coding parameters online. Our experimental findings confirm the superiority of adaptive parameterization to fixed settings. This work aims to automate the choice of approximation scheme not only on a problem basis but also throughout the learning process, eliminating the need for a substantial tuning effort.
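The sensitivity to parameterization that the paper reports is easy to see in miniature: the tile width directly controls how many active tiles two nearby inputs share, i.e. how broadly an update generalizes. A minimal 1-D sketch (the tiling scheme and values are assumed for illustration):

```python
import math

def active_tiles(x, tile_width, n_tilings=8):
    """1-D tile coding: one active (tiling, tile) pair per tiling,
    with each tiling offset by a fraction of the tile width."""
    return {
        (t, int(math.floor((x + t * tile_width / n_tilings)
                           / tile_width)))
        for t in range(n_tilings)
    }

def shared(x1, x2, tile_width):
    """Number of active tiles two inputs have in common: a proxy for
    how strongly an update at x1 generalizes to x2."""
    return len(active_tiles(x1, tile_width)
               & active_tiles(x2, tile_width))

# Wider tiles -> broader generalization between nearby inputs.
print(shared(0.0, 0.3, tile_width=2.0))  # 7 of 8 tiles shared
print(shared(0.0, 0.3, tile_width=0.5))  # 4 of 8 tiles shared
```

Broad tiles speed up early learning but limit final precision, which is consistent with the paper's observation that no fixed setting is best across the whole learning curve.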
A Tutorial Survey of Reinforcement Learning
Cited by 10 (0 self)
This paper gives a compact, self-contained tutorial survey of reinforcement learning, a tool that is increasingly finding application in the development of intelligent dynamic systems. Research on reinforcement learning during the past decade has led to the development of a variety of useful algorithms. This paper surveys the literature and presents the algorithms in a cohesive framework.
A Survey of Current Techniques for Reinforcement Learning
Report LiTH-ISY-I-1391, Computer Vision Laboratory, S-581 83 Linköping, 1992
Cited by 2 (2 self)
This survey considers response-generating systems that improve their behaviour using reinforcement learning. The difference between unsupervised learning, supervised learning, and reinforcement learning is described. Two general problems concerning learning systems are presented: the credit-assignment problem and the problem of perceptual aliasing. Notation and some general issues concerning reinforcement learning systems are presented. Reinforcement learning systems are further divided into two main classes: memory-mapping and projective-mapping systems. Each of these classes is described and some examples are presented. Some other approaches are mentioned that do not fit into the two main classes. Finally, some issues not covered by the surveyed articles are discussed, and some comments on the subject are made.
A binary competition tree for reinforcement learning
Report LiTH-ISY-R-1623, Computer Vision Laboratory, S-581 83 Linköping, 1994
Cited by 2 (2 self)
A robust, general, and computationally simple reinforcement learning system is presented. It uses a channel representation, which is robust and continuous. The accumulated knowledge is represented as a reward-prediction function in the outer-product space of the input and output channel vectors. Each computational unit generates an output simply by a vector-matrix multiplication, and the response can therefore be calculated fast. The response and a prediction of the reward are calculated simultaneously by the same system, which makes TD methods easy to implement if needed. Several units can cooperate to solve more complicated problems. A dynamic tree structure of linear units is grown in order to divide the knowledge space into a sufficient number of regions in which the reward function can be properly described. The tree continuously tests split and prune criteria in order to adapt its size to the complexity of the problem.
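The outer-product reward predictor described above can be sketched as follows; the one-hot channel vectors and the delta-style learning rule are assumptions made for illustration, not the paper's exact formulation.

```python
import numpy as np

def predict_reward(a, b, W):
    """Reward prediction p = a^T W b in the outer-product space of
    the input channel vector a and the output channel vector b."""
    return a @ W @ b

def respond(a, W, candidate_outputs):
    """One vector-matrix multiplication a @ W scores every output
    channel at once; the response is the candidate output vector
    with the highest predicted reward."""
    scores = a @ W
    return max(candidate_outputs, key=lambda b: scores @ b)

def update(a, b, reward, W, lr=0.5):
    """Move the prediction toward the observed reward along the
    outer product of a and b (an assumed correction rule)."""
    W += lr * (reward - predict_reward(a, b, W)) * np.outer(a, b)

# Illustrative channel vectors (one-hot for simplicity).
W = np.zeros((5, 4))          # accumulated knowledge
a = np.eye(5)[1]              # input channel vector
good = np.eye(4)[2]           # output that earns reward 1
for _ in range(10):
    update(a, good, 1.0, W)
print(round(float(predict_reward(a, good, W)), 3))  # close to 1.0
```

Because the response and the reward prediction both come from the same product `a @ W`, the two quantities are computed simultaneously, which is what makes TD-style bootstrapping cheap to add on top.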
UNH_CMAC Version 2.1: The University of New Hampshire Implementation of the Cerebellar Model Arithmetic Computer (CMAC)
1996
Cited by 2 (0 self)
this document as "layers" (the layers represent parallel N-dimensional hyperspaces for a network with N inputs). The receptive fields in each of the layers have rectangular boundaries and are organized so as to span the input space without overlap. Any input vector excites one receptive field from each layer, for a total of C excited receptive fields for any input. Each of the layers of receptive fields is identical in organization, but each layer is offset relative to the others in the input hyperspace. The width of the receptive fields produces input generalization, while the offset of the adjacent layers of receptive fields produces input quantization. The ratio of the width of each receptive field (input generalization) to the offset between adjacent layers of receptive fields (input quantization) must be equal to C for all dimensions of the input space. The integer parameter C is referred to as the generalization parameter. This organization of the receptive fields guarantees that only a fixed number, C, of receptive fields is excited by any input. However, the total number of receptive fields required to span the input space can still be large for many practical problems. On the other hand, it is unlikely that the entire input state space of a large system would be visited in solving a specific problem. Thus it is not necessary to store unique information for each receptive field. Following this logic, most implementations of the Albus CMAC include some form of pseudo-random hashing, so that only information about receptive fields that have been excited during previous training is actually stored. Each receptive field in the Albus CMAC is assumed to be an on-off type of entity. If a receptive field is excited, its response is equal to the magnitude of a single ...
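The layer-offset and hashing scheme this abstract describes can be sketched as below; the specific hash function, memory size, and quantization step are illustrative assumptions, not the UNH code.

```python
def cmac_features(inputs, C=4, quantization=1.0, memory_size=1024):
    """Compute the C excited receptive-field indices for an input.

    Each of the C layers is offset by one quantization step relative
    to the others, and receptive fields are C quantization steps
    wide (width / offset = C), so exactly one field per layer fires.
    Indices are hashed into a fixed-size memory, as in most Albus
    CMAC implementations, so only visited fields consume storage.
    """
    indices = []
    for layer in range(C):
        coords = tuple(
            int((x / quantization + layer) // C) for x in inputs)
        # Pseudo-random hashing of (layer, field coordinates).
        indices.append(hash((layer,) + coords) % memory_size)
    return indices

# Any input excites exactly C receptive fields.
print(len(cmac_features((2.3, 7.1))))  # 4
```

The `% memory_size` step is what keeps the table small: the full grid of receptive fields over a large input space is never materialized, at the cost of occasional hash collisions between distant fields.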
New Techniques In Intelligent Information Filtering
2003
Cited by 1 (1 self)
New Techniques in Intelligent Information Filtering, by Sofus Attila Macskassy. Dissertation Director: Dr. Haym Hirsh. Intelligent Information Filtering is the process of receiving or monitoring large amounts of dynamically generated information and extracting the subset of information that would be of interest to a user based on some specified information need. Historically, this need has been based on user profiles that are directly evaluable: the information can be immediately classified as interesting or not. In this thesis I introduce a new type of user-interestingness criterion which is prospective: the criterion defines the interestingness of an information item based on events that happen subsequent to the information item appearing. Hence, the interestingness cannot be directly evaluated. A new technique is described which takes such a criterion and operationalizes it, using machine learning to generate a predictive model that can directly evaluate a piece of information. I show that this technique works statistically significantly better than the baseline of predicting based on class distribution on five information filtering case studies.