Reinforcement Learning I: Introduction
, 1998
Cited by 2829 (76 self)
In which we try to give a basic intuitive sense of what reinforcement learning is and how it differs and relates to other fields, e.g., supervised learning and neural networks, genetic algorithms and artificial life, control theory. Intuitively, RL is trial and error (variation and selection, search) plus learning (association, memory). We argue that RL is the only field that seriously addresses the special features of the problem of learning from interaction to achieve long-term goals.
Connectionist Learning Procedures
- ARTIFICIAL INTELLIGENCE
, 1989
Cited by 290 (6 self)
A major goal of research on networks of neuron-like processing units is to discover efficient learning procedures that allow these networks to construct complex internal representations of their environment. The learning procedures must be capable of modifying the connection strengths in such a way that internal units which are not part of the input or output come to represent important features of the task domain. Several interesting gradient-descent procedures have recently been discovered. Each connection computes the derivative, with respect to the connection strength, of a global measure of the error in the performance of the network. The strength is then adjusted in the direction that decreases the error. These relatively simple, gradient-descent learning procedures work well for small tasks and the new challenge is to find ways of improving their convergence rate and their generalization abilities so that they can be applied to larger, more realistic tasks.
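The update rule described in this abstract (compute the derivative of the error with respect to each connection strength, then adjust the strength in the direction that decreases the error) can be illustrated with a minimal sketch. The single linear unit, the squared-error measure, the function name `train_linear_unit`, and the toy data are all assumptions made for illustration; the paper itself concerns networks with hidden units, which this sketch does not cover.

```python
def train_linear_unit(samples, lr=0.1, epochs=200):
    """Gradient descent on squared error for one linear unit y = w*x + b.

    Illustrative only: a single unit with no hidden layer (an assumption).
    """
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, target in samples:
            y = w * x + b
            err = y - target
            # For E = 0.5 * err**2: dE/dw = err * x and dE/db = err.
            # Step each strength against its derivative to reduce the error.
            w -= lr * err * x
            b -= lr * err
    return w, b


# Hypothetical toy data generated from y = 2x + 1.
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
w, b = train_linear_unit(samples)
print(round(w, 3), round(b, 3))
```

On this noiseless toy data the learned strengths approach the generating values; the convergence-rate and generalization concerns raised in the abstract only become visible on larger problems.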
Learning and Sequential Decision Making
- LEARNING AND COMPUTATIONAL NEUROSCIENCE
, 1989
Cited by 185 (10 self)
In this report we show how the class of adaptive prediction methods that Sutton called "temporal difference," or TD, methods are related to the theory of sequential decision making. TD methods have been used as "adaptive critics" in connectionist learning systems, and have been proposed as models of animal learning in classical conditioning experiments. Here we relate TD methods to decision tasks formulated in terms of a stochastic dynamical system whose behavior unfolds over time under the influence of a decision maker's actions. Strategies are sought for selecting actions so as to maximize a measure of long-term payoff gain. Mathematically, tasks such as this can be formulated as Markovian decision problems, and numerous methods have been proposed for learning how to solve such problems. We show how a TD method can be understood as a novel synthesis of concepts from the theory of stochastic dynamic programming, which comprises the standard method for solving such tasks when a model of the dynamical system is available, and the theory of parameter estimation, which provides the appropriate context for studying learning rules in the form of equations for updating associative strengths in behavioral models, or connection weights in connectionist networks. Because this report is oriented primarily toward the non-engineer interested in animal learning, it presents tutorials on stochastic sequential decision tasks, stochastic dynamic programming, and parameter estimation.
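As a rough illustration of the kind of TD prediction rule this abstract refers to, the sketch below applies tabular TD(0) to a small chain of states. The deterministic chain, the reward of 1 on reaching the terminal state, the function name `td0_chain`, and the parameter values are assumptions chosen for the example; they are not taken from the report, which treats stochastic tasks.

```python
def td0_chain(n_states=5, gamma=0.9, alpha=0.5, episodes=200):
    """Tabular TD(0) value prediction on a hypothetical deterministic chain.

    States 0..n_states-1 each step right; entering the terminal state
    (index n_states) yields reward 1, every other step yields 0.
    """
    values = [0.0] * (n_states + 1)  # terminal state's value stays 0
    for _ in range(episodes):
        s = 0
        while s < n_states:
            s_next = s + 1
            reward = 1.0 if s_next == n_states else 0.0
            # TD error: one-step bootstrapped target minus current estimate.
            td_error = reward + gamma * values[s_next] - values[s]
            values[s] += alpha * td_error
            s = s_next
    return values[:n_states]


print([round(v, 4) for v in td0_chain()])
```

Each estimate moves toward the reward plus the discounted estimate of the successor state, which is the same backup that dynamic programming performs with a known model, except that here it is driven by sampled transitions.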
Interaction and Intelligent Behavior
, 1994
Cited by 139 (20 self)
This thesis addresses situated, embodied agents interacting in complex domains. It focuses on two problems: 1) synthesis and analysis of intelligent group behavior, and 2) learning in complex group environments. Basic behaviors, control laws that cluster constraints to achieve particular goals and have the appropriate compositional properties, are proposed as effective primitives for control and learning. The thesis describes the process of selecting such basic behaviors, formally specifying them, algorithmically implementing them, and empirically evaluating them. All of the proposed ideas are validated with a group of up to 20 mobile robots using a basic behavior set consisting of: safe-wandering, following, aggregation, dispersion, and homing. The set of basic behaviors acts as a substrate for achieving more complex high-level goals and tasks. Two behavior combination operators are introduced, and verified by combining subsets of the above basic behavior set to implement collective flocking, foraging, and docking. A methodology is introduced for automatically constructing higher-level behaviors
Recent advances in hierarchical reinforcement learning
, 2003
Cited by 119 (18 self)
A preliminary unedited version of this paper was incorrectly published as part of Volume
Learning and Problem Solving with Multilayer Connectionist Systems
, 1986
Cited by 49 (1 self)
Learning and Problem Solving with Multilayer Connectionist Systems. September 1986. Charles William Anderson, B.S., University of Nebraska; M.S., University of Massachusetts; Ph.D., University of Massachusetts. Directed by: Professor Andrew G. Barto. The difficulties of learning in multilayered networks of computational units have limited the use of connectionist systems in complex domains. This dissertation elucidates the issues of learning in a network's hidden units, and reviews methods for addressing these issues that have been developed through the years. Issues of learning in hidden units are shown to be analogous to learning issues for multilayer systems employing symbolic representations.
Reinforcement Learning And Its Application To Control
, 1992
Cited by 49 (2 self)
Learning control involves modifying a controller's behavior to improve its performance as measured by some predefined index of performance (IP). If control actions that improve performance as measured by the IP are known, supervised learning methods, or methods for learning from examples, can be used to train the controller. But when such control actions are not known a priori, appropriate control behavior has to be inferred from observations of the IP. One can distinguish between two classes of methods for training controllers under such circumstances. Indirect methods involve constructing a model of the problem's IP and using the model to obtain training information for the controller. On the other hand, direct, or model-free,...
Learning in spiking neural networks by reinforcement of stochastic synaptic transmission
- Neuron
, 2003
Cited by 29 (6 self)
prising and potentially detrimental to brain function. But another possibility is that synaptic unreliability is used by the brain for the purposes of learning (Minsky, 1954; Hinton, 1989), in analogy to the way in which unreliable genetic replication is used for evolution. Here I propose a specific implementation of this idea. According to the proposal, synapses are “hedonistic,” responding to a global reward signal by increasing their probabilities of release or failure, depending on which action immediately preceded reward. Remarkably, if each synapse in a network behaves hedonistically, selfishly seeking reward, then the network as a whole behaves hedonistically, learning to increase its average reward by generating appropriate collective actions. This statement can be formulated and justified mathematically
Learning to Solve Multiple Goals
, 1997
Cited by 21 (0 self)
In many domains, the task can be decomposed into a set of independent subgoals. Often, such tasks are too complex to be learned using standard techniques such as Reinforcement Learning. The complexity is caused by the learning system having to keep track of the status of all sub-goals concurrently. Thus, if the solution to one sub-goal is known when another sub-goal is in some given state, the known solution must be relearned when the status of the other sub-goal changes. This dissertation presents a modular approach to reinforcement learning that takes advantage of task decomposition to avoid unnecessary relearning. In the modular approach, modules are created to learn each sub-goal. Each module receives only those inputs relevant to its associated sub-goal, and can therefore learn without being affected by the state of other sub-goals. Furthermore, each module searches a much smaller space than that defined by all inputs considered together, thereby greatly reducing learning time. Si...

