Results 1 - 10 of 18
Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding
- Advances in Neural Information Processing Systems 8, 1996
Cited by 300 (17 self)
On large problems, reinforcement learning systems must use parameterized function approximators such as neural networks in order to generalize between similar situations and actions. In these cases there are no strong theoretical results on the accuracy of convergence, and computational results have been mixed. In particular, Boyan and Moore reported at last year's meeting a series of negative results in attempting to apply dynamic programming together with function approximation to simple control problems with continuous state spaces. In this paper, we present positive results for all the control tasks they attempted, and for one that is significantly larger. The most important differences are that we used sparse-coarse-coded function approximators (CMACs) whereas they used mostly global function approximators, and that we learned online whereas they learned offline. Boyan and Moore and others have suggested that the problems they encountered could be solved by using actual outcomes (...
Reinforcement learning for RoboCup-soccer keepaway
- Adaptive Behavior, 2005
Cited by 85 (31 self)
RoboCup simulated soccer presents many challenges to reinforcement learning methods, including a large state space, hidden and uncertain state, multiple independent agents learning simultaneously, and long and variable delays in the effects of actions. We describe our application of episodic SMDP Sarsa(λ) with linear tile-coding function approximation and variable λ to learning higher-level decisions in a keepaway subtask of RoboCup soccer. In keepaway, one team, “the keepers,” tries to keep control of the ball for as long as possible despite the efforts of “the takers.” The keepers learn individually when to hold the ball and when to pass to a teammate. Our agents learned policies that significantly outperform a range of benchmark policies. We demonstrate the generality of our approach by applying it to a number of task variations including different field sizes and different numbers of players on each team.
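For readers unfamiliar with the method named in this abstract, the following is a minimal sketch of a single Sarsa(λ) update for a linear value estimate over binary (tile-coded) features. It is not the authors' implementation: the function name `sarsa_lambda_step`, the use of replacing traces, and the fixed λ are assumptions made for illustration, and the SMDP timing and variable λ mentioned in the abstract are left out.

```python
from typing import List, Optional, Sequence


def sarsa_lambda_step(
    weights: List[float],
    traces: List[float],
    active: Sequence[int],
    reward: float,
    next_active: Optional[Sequence[int]],
    alpha: float,
    gamma: float,
    lam: float,
) -> float:
    """Apply one Sarsa(lambda) update in place and return the TD error.

    `active` holds the indices of the binary features that are on for the
    state-action pair just taken; `next_active` holds those of the next
    pair, or None when the episode has ended. (Hypothetical interface.)
    """
    # Linear estimate: the value is the sum of the weights of the active features.
    q = sum(weights[i] for i in active)
    q_next = 0.0 if next_active is None else sum(weights[i] for i in next_active)
    delta = reward + gamma * q_next - q

    # Decay every eligibility trace, then reset the active ones to 1
    # (replacing traces, an assumption of this sketch).
    for i in range(len(traces)):
        traces[i] *= gamma * lam
    for i in active:
        traces[i] = 1.0

    # Move each weight in proportion to its trace.
    for i in range(len(weights)):
        weights[i] += alpha * delta * traces[i]
    return delta
```

Calling the function once per decision, with `traces` reset to zero at the start of each episode, gives the episodic form; credit for a later reward then flows back to earlier features through their decayed traces.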
Function approximation via tile coding: Automating parameter choice
- Lecture Notes in Artificial Intelligence, 2005
Cited by 16 (8 self)
Reinforcement learning (RL) is a powerful abstraction of sequential decision making that has an established theoretical foundation and has proven effective in a variety of small, simulated domains. The success of RL on real-world problems with large, often continuous state and action spaces hinges on effective function approximation. Of the many function approximation schemes proposed, tile coding strikes an empirically successful balance among representational power, computational cost, and ease of use and has been widely adopted in recent RL work. This paper demonstrates that the performance of tile coding is quite sensitive to parameterization. We present detailed experiments that isolate the effects of parameter choices and provide guidance to their setting. We further illustrate that no single parameterization achieves the best performance throughout the learning curve, and contribute an automated technique for adjusting tile-coding parameters online. Our experimental findings confirm the superiority of adaptive parameterization to fixed settings. This work aims to automate the choice of approximation scheme not only on a problem basis but also throughout the learning process, eliminating the need for a substantial tuning effort.
A Tutorial Survey of Reinforcement Learning
Cited by 9 (0 self)
This paper gives a compact, self-contained tutorial survey of reinforcement learning, a tool that is increasingly finding application in the development of intelligent dynamic systems. Research on reinforcement learning during the past decade has led to the development of a variety of useful algorithms. This paper surveys the literature and presents the algorithms in a cohesive framework.
A Survey of Current Techniques for Reinforcement Learning
- Report LiTH-ISY-I-1391, Computer Vision Laboratory, S-581 83 Linköping, 1992
Cited by 2 (2 self)
This survey considers response generating systems that improve their behaviour using reinforcement learning. The difference between unsupervised learning, supervised learning, and reinforcement learning is described. Two general problems concerning learning systems are presented: the credit assignment problem and the problem of perceptual aliasing. Notations and some general issues concerning reinforcement learning systems are presented. Reinforcement learning systems are further divided into two main classes: memory mapping and projective mapping systems. Each of these classes is described and some examples are presented. Some other approaches are mentioned that do not fit into the two main classes. Finally, some issues not covered by the surveyed articles are discussed, and some comments on the subject are made. Contents: 1 Introduction; 1.1 The Credit-Assignment Problem; 1.1.1 Temporal Difference Methods; 1.1.2 Dynami...
A binary competition tree for reinforcement learning
- Report LiTH-ISY-R-1623, Computer Vision Laboratory, S-581 83 Linköping, 1994
Cited by 2 (2 self)
A robust, general and computationally simple reinforcement learning system is presented. It uses a channel representation which is robust and continuous. The accumulated knowledge is represented as a reward prediction function in the outer product space of the input- and output channel vectors. Each computational unit generates an output simply by a vector-matrix multiplication and the response can therefore be calculated fast. The response and a prediction of the reward are calculated simultaneously by the same system, which makes TD-methods easy to implement if needed. Several units can cooperate to solve more complicated problems. A dynamic tree structure of linear units is grown in order to divide the knowledge space into a sufficient number of regions in which the reward function can be properly described. The tree continuously tests split- and prune criteria in order to adapt its size to the complexity of the problem.
UNH_CMAC Version 2.1 - The University of New Hampshire Implementation of the Cerebellar Model Arithmetic Computer - CMAC
- 1996
Cited by 2 (0 self)
this document as "layers" (the layers represent parallel N-dimensional hyperspaces for a network with N inputs). The receptive fields in each of the layers have rectangular boundaries and are organized so as to span the input space without overlap. Any input vector excites one receptive field from each layer, for a total of C excited receptive fields for any input. Each of the layers of receptive fields is identical in organization, but each layer is offset relative to the others in the input hyperspace. The width of the receptive fields produces input generalization, while the offset of the adjacent layers of receptive fields produces input quantization. The ratio of the width of each receptive field (input generalization) to the offset between adjacent layers of receptive fields (input quantization) must be equal to C for all dimensions of the input space. The integer parameter C is referred to as the generalization parameter. This organization of the receptive fields guarantees that only a fixed number, C, of receptive fields is excited by any input. However, the total number of receptive fields required to span the input space can still be large for many practical problems. On the other hand, it is unlikely that the entire input state space of a large system would be visited in solving a specific problem. Thus it is not necessary to store unique information for each receptive field. Following this logic, most implementations of the Albus CMAC include some form of pseudo-random hashing, so that only information about receptive fields that have been excited during previous training is actually stored. Each receptive field in the Albus CMAC is assumed to be an on-off type of entity. If a receptive field is excited, its response is equal to the magnitude of a single ...
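The layer arrangement described in this excerpt can be sketched in a few lines. The code below is an illustration only and is not the UNH_CMAC API: the class name `TinyCMAC`, the diagonal offset of one quantization step per layer, the summing of the excited weights, and the error-sharing training rule are all assumptions. A Python dict stands in for the pseudo-random hashing, so only fields that have been trained consume storage and no hash collisions occur.

```python
import math
from typing import Dict, List, Sequence, Tuple

# A receptive field is identified by its layer and its integer cell coordinates.
FieldKey = Tuple[int, Tuple[int, ...]]


class TinyCMAC:
    """Illustrative CMAC with C layers (hypothetical, not the UNH_CMAC code)."""

    def __init__(self, c: int, quant: float) -> None:
        self.c = c          # generalization parameter C (number of layers)
        self.quant = quant  # input quantization step; field width is c * quant
        self.weights: Dict[FieldKey, float] = {}  # stores trained fields only

    def fields(self, x: Sequence[float]) -> List[FieldKey]:
        """Return the C excited receptive fields, exactly one per layer."""
        q = [math.floor(xi / self.quant) for xi in x]
        # Layer k is shifted by k quantization steps in every dimension, and
        # each field spans c steps, so the width-to-offset ratio equals c.
        return [
            (layer, tuple((qi + layer) // self.c for qi in q))
            for layer in range(self.c)
        ]

    def predict(self, x: Sequence[float]) -> float:
        """Sum the weights of the excited fields (untrained fields count as 0)."""
        return sum(self.weights.get(key, 0.0) for key in self.fields(x))

    def train(self, x: Sequence[float], target: float, rate: float = 0.5) -> None:
        """Share the prediction error equally among the C excited fields."""
        err = target - self.predict(x)
        for key in self.fields(x):
            self.weights[key] = self.weights.get(key, 0.0) + rate * err / self.c
```

In this sketch, an input one quantization step away from a trained point still shares most of its receptive fields with that point and so inherits most of the trained response, while a distant input shares none; that is the generalization behavior the excerpt attributes to the field width.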
New Techniques In Intelligent Information Filtering
- 2003
Cited by 1 (1 self)
Abstract of the dissertation New Techniques in Intelligent Information Filtering, by Sofus Attila Macskassy. Dissertation Director: Dr. Haym Hirsh. Intelligent Information Filtering is the process of receiving or monitoring large amounts of dynamically generated information and extracting the subset of information that would be of interest to a user based on some specified information need. Historically, this need has been based on user profiles that are directly evaluable: the information can be immediately classified as interesting or not. In this thesis I introduce a new type of user interestingness criterion which is prospective: the criterion defines the interestingness of an information item based on events that happen subsequent to the information item appearing. Hence, the interestingness cannot be directly evaluated. A new technique is described which takes such a criterion and operationalizes it, using machine learning to generate a predictive model that can directly evaluate a piece of information. I show that this technique works statistically significantly better than the baseline of predicting based on class distribution on five information filtering case studies.
Design of fuzzy sliding controller based on cerebellar learning model. Multiple Approaches to Intelligent Systems
- Proceedings Lecture Notes in Artificial Intelligence 1611, 1999
Cited by 1 (0 self)
For fuzzy sliding mode control, this paper proposes a cerebellar learning model for on-line learning of the controller. Fuzzy sliding mode is highly robust to system uncertainty and immune to external noise. The cerebellar learning model offers easy and fast correction. Combining the two yields a fuzzy sliding mode controller with self-learning capability, which eases the difficulty of setting up the rules of fuzzy control. It also improves the system stability and enhances the effectiveness of the controller. This paper describes the implementation of a fuzzy sliding controller with the cerebellar learning model. After system simulation and capability testing, it is applied to the control of slew-up, stand-on and positioning of a 360° inverted pendulum.

