Results 1 - 8 of 8
Reinforcement Learning I: Introduction
, 1998
Cited by 2829 (76 self)
In which we try to give a basic intuitive sense of what reinforcement learning is and how it differs and relates to other fields, e.g., supervised learning and neural networks, genetic algorithms and artificial life, control theory. Intuitively, RL is trial and error (variation and selection, search) plus learning (association, memory). We argue that RL is the only field that seriously addresses the special features of the problem of learning from interaction to achieve long-term goals.
Reinforcement Learning with Replacing Eligibility Traces
- Machine Learning
, 1996
Cited by 168 (8 self)
The eligibility trace is one of the basic mechanisms used in reinforcement learning to handle delayed reward. In this paper we introduce a new kind of eligibility trace, the replacing trace, analyze it theoretically, and show that it results in faster, more reliable learning than the conventional trace. Both kinds of trace assign credit to prior events according to how recently they occurred, but only the conventional trace gives greater credit to repeated events. Our analysis is for conventional and replace-trace versions of the offline TD(1) algorithm applied to undiscounted absorbing Markov chains. First, we show that these methods converge under repeated presentations of the training set to the same predictions as two well known Monte Carlo methods. We then analyze the relative efficiency of the two Monte Carlo methods. We show that the method corresponding to conventional TD is biased, whereas the method corresponding to replace-trace TD is unbiased. In addition, we show that t...
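The difference between the two kinds of trace can be illustrated with a minimal sketch. It assumes a tabular setting in which every trace decays by a factor `gamma * lam` per step; the function names, the state labels, and the parameter values are illustrative assumptions and are not taken from the paper.

```python
def update_traces(traces, visited, gamma, lam, replacing):
    """Decay every trace, then mark the state that was just visited."""
    updated = {state: gamma * lam * value for state, value in traces.items()}
    if replacing:
        # Replacing trace: a revisit resets the trace to 1.
        updated[visited] = 1.0
    else:
        # Conventional (accumulating) trace: a revisit adds 1 to the decayed trace.
        updated[visited] = updated.get(visited, 0.0) + 1.0
    return updated


def run(visits, gamma=1.0, lam=0.5, replacing=False):
    """Apply the trace update along a sequence of visited states."""
    traces = {}
    for state in visits:
        traces = update_traces(traces, state, gamma, lam, replacing)
    return traces


visits = ["A", "A", "B"]  # state A is visited twice in a row (hypothetical data)
accumulating = run(visits, replacing=False)
replacing = run(visits, replacing=True)
print(accumulating["A"], replacing["A"])
```

In this sketch the repeated visit to A leaves a larger conventional trace than replacing trace, which matches the statement that only the conventional trace gives greater credit to repeated events.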
Adaptive Critics and the Basal Ganglia
- In
, 1995
Cited by 53 (5 self)
One of the most active areas of research in artificial intelligence is the study of learning methods by which “embedded agents” can improve performance while acting in complex dynamic environments. An agent, or decision maker, is embedded in an environment when it receives information from, and acts on, that environment in an ongoing closed-loop interaction. An embedded agent has to make decisions under time pressure and uncertainty and has to learn without the help of an ever-present knowledgeable teacher. Although the novelty of this emphasis may be inconspicuous to a biologist, animals being the prototypical embedded agents, this emphasis is a significant departure from the more traditional focus in artificial intelligence on reasoning within circumscribed domains removed from the flow of real-world events. One consequence of the embedded agent view is the increasing interest in the learning paradigm called reinforcement learning (RL). Unlike the more widely studied supervised learning systems, which learn from a set of examples of correct input/output behavior, RL systems adjust their behavior with the goal of maximizing the frequency and/or magnitude of the reinforcing events they encounter over time. While the core ideas of modern RL come from theories of animal classical and instrumental
Learning and Problem Solving with Multilayer Connectionist Systems
, 1986
Cited by 49 (1 self)
Learning and Problem Solving with Multilayer Connectionist Systems. September 1986. Charles William Anderson: B.S., University of Nebraska; M.S., University of Massachusetts; Ph.D., University of Massachusetts. Directed by: Professor Andrew G. Barto.
The difficulties of learning in multilayered networks of computational units have limited the use of connectionist systems in complex domains. This dissertation elucidates the issues of learning in a network's hidden units, and reviews methods for addressing these issues that have been developed through the years. Issues of learning in hidden units are shown to be analogous to learning issues for multilayer systems employing symbolic representations.
A cerebellar model of timing and prediction in the control of reaching
- Neural Computation
, 1999
Cited by 20 (6 self)
A simplified model of the cerebellum was developed to explore its potential for adaptive, predictive control based on delayed feedback information. An abstract representation of a single Purkinje cell with multistable properties was interfaced, via a formalized premotor network, with a simulated single degree-of-freedom limb. The limb actuator was a nonlinear spring-mass system based on the nonlinear velocity dependence of the stretch reflex. By including realistic mossy fiber signals, as well as realistic conduction delays in afferent and efferent pathways, the model allowed the investigation of timing and predictive processes relevant to cerebellar involvement in the control of movement. The model regulates movement by learning to react in an anticipatory fashion to sensory feedback. Learning depends on training information generated from corrective movements and uses a temporally-asymmetric form of plasticity for the parallel fiber synapses on Purkinje cells.
The convergence of TD(λ) for general λ
- Machine Learning
, 1992
Cited by 3 (1 self)
Abstract. The method of temporal differences (TD) is one way of making consistent predictions about the future.
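One way to read "consistent predictions" is that each prediction is nudged toward the reward received plus the prediction made at the next step. The sketch below shows this for tabular TD(0), the simplest temporal-difference case, rather than the general algorithm the paper analyzes. The two-state chain, the step size `alpha`, and all names are hypothetical choices made for illustration.

```python
def td0_episode(values, episode, alpha=0.5, gamma=1.0):
    """Apply tabular TD(0) once over a list of (state, reward, next_state) steps."""
    for state, reward, next_state in episode:
        # A next_state of None marks the end of the episode (value 0).
        next_value = 0.0 if next_state is None else values[next_state]
        td_error = reward + gamma * next_value - values[state]
        values[state] += alpha * td_error
    return values


# Hypothetical chain: A -> B -> end, with reward 1 on the final transition.
episode = [("A", 0.0, "B"), ("B", 1.0, None)]
values = {"A": 0.0, "B": 0.0}
for _ in range(50):
    td0_episode(values, episode)
print(values)
```

With repeated presentations of this episode, both predictions approach 1, the undiscounted return that follows each state.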
Reinforcement Learning and Artificial Intelligence
, 2003
Fundamental to artificial intelligence, as well as to the theory of systems and control, is the problem of representing knowledge about the system and about possible courses of action at a multiplicity of interrelated temporal scales. For example, a human traveler must decide which cities to go to, whether to fly, drive, or walk, and the individual muscle contractions involved in each step. We propose to develop further an approach to this problem based on the theory of options [49, 29]. Options are a generic concept of "courses of action" which includes both primitive actions such as muscle contractions and temporally extended actions such as traveling to a distant city. The theory of options is based on the theories of Markov and semi-Markov decision processes (SMDPs), but extends these in significant ways. Options can be used in place of actions in all of the planning and learning methods conventionally used in RL. Options and models of options can be learned for a wide variety of different subtasks, and then rapidly combined to solve new tasks. Options provide a bridge between the two most important existing theoretical frameworks used in reinforcement learning: MDPs and SMDPs. Options permit planning and learning simultaneously at a wide variety of time scales, and toward a wide variety of subtasks, which substantially increases the efficiency and abilities of RL methods.
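The idea that primitive and temporally extended actions share one interface can be sketched as follows. The sketch assumes the commonly used formulation of an option as an initiation set, an internal policy, and a termination condition, which the abstract itself does not spell out. The corridor environment, the deterministic termination test, and every name here are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Set


@dataclass
class Option:
    initiation_set: Set[int]            # states where the option may start
    policy: Callable[[int], int]        # maps a state to a primitive action
    terminates: Callable[[int], bool]   # deterministic termination test (assumption)


def step(state, action):
    """Hypothetical corridor: move by `action`, pay a reward of -1 per step."""
    return state + action, -1.0


def run_option(option, state, gamma=0.9):
    """Run an option to termination; return (final state, discounted reward, duration)."""
    if state not in option.initiation_set:
        raise ValueError("option cannot be started in this state")
    total, discount, duration = 0.0, 1.0, 0
    while True:
        state, reward = step(state, option.policy(state))
        total += discount * reward
        discount *= gamma
        duration += 1
        if option.terminates(state):
            return state, total, duration


# A primitive action is an option that always terminates after one step.
step_right = Option(set(range(4)), lambda s: 1, lambda s: True)
# A temporally extended option: keep moving right until state 4 is reached.
go_to_end = Option(set(range(4)), lambda s: 1, lambda s: s == 4)

one_step = run_option(step_right, 0)
multi_step = run_option(go_to_end, 0)
print(one_step, multi_step)
```

Because `run_option` reports a discounted reward and a duration, its result has the shape of a single semi-Markov transition, which is one way to see the link to SMDPs described above.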
Approximately as appeared in: Learning and Computational Neuroscience: Foundations of Adaptive Networks, M. Gabriel and J. Moore, Eds., pp. 497--537. MIT Press, 1990.
- Learning and Computational Neuroscience: Foundations of Adaptive Networks
, 1990
this paper, however, we analyze it from the point of view of animal learning theory. Our intended audience is both animal learning researchers interested in computational theories of behavior and machine learning researchers interested in how their learning algorithms relate to, and may be constrained by, animal learning studies. For an exposition of the TD model from an engineering point of view, see Chapter 13 of this volume

