Simple statistical gradient-following algorithms for connectionist reinforcement learning
Machine Learning, 1992
Cited by 262 (0 self)
Abstract. This article presents a general class of associative reinforcement learning algorithms for connectionist networks containing stochastic units. These algorithms, called REINFORCE algorithms, are shown to make weight adjustments in a direction that lies along the gradient of expected reinforcement in both immediate-reinforcement tasks and certain limited forms of delayed-reinforcement tasks, and they do this without explicitly computing gradient estimates or even storing information from which such estimates could be computed. Specific examples of such algorithms are presented, some of which bear a close relationship to certain existing algorithms while others are novel but potentially interesting in their own right. Also given are results that show how such algorithms can be naturally integrated with backpropagation. We close with a brief discussion of a number of additional issues surrounding the use of such algorithms, including what is known about their limiting behaviors as well as further considerations that might be used to help develop similar but potentially more powerful reinforcement learning algorithms.
Learning and Problem Solving with Multilayer Connectionist Systems
1986
Cited by 49 (1 self)
Charles William Anderson (B.S., University of Nebraska; M.S., University of Massachusetts; Ph.D., University of Massachusetts), September 1986. Directed by: Professor Andrew G. Barto. The difficulties of learning in multilayered networks of computational units have limited the use of connectionist systems in complex domains. This dissertation elucidates the issues of learning in a network's hidden units and reviews methods for addressing these issues that have been developed through the years. Issues of learning in hidden units are shown to be analogous to learning issues for multilayer systems employing symbolic representations.
Toward a Unified Theory of Learning: Multistrategy Task-Adaptive Learning
In: Readings in Knowledge Acquisition and ..., 1993
Cited by 28 (9 self)
Any learning process can be viewed as a self-modification of the learner's current knowledge through an interaction with some information source. Such knowledge modification is guided by the learner's desire to achieve a certain outcome, and can engage any kind of inference. The type of inference involved depends on the input information, the current (background) knowledge and the learner's task at hand. Based on such a view of learning, several fundamental concepts are analyzed and clarified, in particular, analytic and synthetic learning, derivational and hypothetical explanation, constructive induction, abduction, abstraction and deductive generalization. It is shown that inductive generalization and abduction can be viewed as two basic forms of general induction, and that abstraction and deductive generalization are two related forms of constructive deduction. Using this conceptual framework, a methodology for multistrategy task-adaptive learning (MTL) is outlined, in which learning strategies are combined dynamically, depending on the current learning situation. Specifically, an MTL learner analyzes a "triad" relationship among the input information, the background knowledge and the learning task, and on that basis determines which strategy, or a combination thereof, is most appropriate at a given learning step. To implement the MTL methodology, a new knowledge representation is proposed, based on the parametric association rules (PARs). Basic ideas of MTL are illustrated by means of the well-known "cup" example, through which it is shown how an MTL learner can employ, depending on the above triad relationship, empirical learning, constructive inductive generalization, abduction, explanation-based learning and abstraction.
Learning symmetry groups with hidden units: Beyond the perceptron
Physica, 1986
Cited by 15 (5 self)
Learning to recognize mirror, rotational and translational symmetries is a difficult problem for massively-parallel network models. These symmetries cannot be learned by first-order perceptrons or Hopfield networks, which have no means for incorporating additional adaptive units that are hidden from the input and output layers. We demonstrate that the Boltzmann learning algorithm is capable of finding sets of weights which turn hidden units into useful higher-order feature detectors capable of solving symmetry problems.
Reinforcement learning through modulation of spike-timing-dependent synaptic plasticity
Neural Computation, 2007
Cited by 11 (0 self)
The persistent modification of synaptic efficacy as a function of the relative timing of pre- and postsynaptic spikes is a phenomenon known as spike-timing-dependent plasticity (STDP). Here we show that the modulation of STDP by a global reward signal leads to reinforcement learning. We first derive analytically learning rules involving reward-modulated spike-timing-dependent synaptic and intrinsic plasticity, by applying a reinforcement learning algorithm to the stochastic Spike Response Model of spiking neurons. These rules have several features common to plasticity mechanisms experimentally found in the brain. We then demonstrate in simulations of networks of integrate-and-fire neurons the efficacy of two simple learning rules involving modulated STDP. One rule is a direct extension of the standard STDP model (modulated STDP), while the other one involves an eligibility trace stored at each synapse that keeps a decaying memory of the relationships between the recent pairs of pre- and postsynaptic spike pairs (modulated STDP with eligibility trace). This latter rule permits learning even if the reward signal is delayed. The proposed rules are able to solve the XOR problem with both rate coded and temporally coded input and to learn a target output firing rate pattern. These learning rules are biologically plausible, may be used for training generic artificial spiking neural networks, regardless of the neural model used, and suggest the experimental investigation in animals of the existence of reward-modulated
Function Optimization Using Connectionist Reinforcement Learning Algorithms
Connection Science, 1991
Cited by 7 (4 self)
Any nonassociative reinforcement learning algorithm can be viewed as a method for performing function optimization through (possibly noise-corrupted) sampling of function values. We describe the results of simulations in which the optima of several deterministic functions studied by Ackley (1987) were sought using variants of REINFORCE algorithms (Williams, 1987; 1988). Some of the algorithms used here incorporated additional heuristic features resembling certain aspects of some of the algorithms used in Ackley's studies. Differing levels of performance were achieved by the various algorithms investigated, but a number of them performed at a level comparable to the best found in Ackley's studies on a number of the tasks, in spite of their simplicity. One of these variants, called REINFORCE/MENT, represents a novel but principled approach to reinforcement learning in nontrivial networks which incorporates an entropy maximization strategy. This was found to perform especially well on more hierarchically organized tasks.
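As a rough illustration of the function-optimization view described in this abstract, the sketch below treats each sampled function value as the reinforcement signal for a set of input-free Bernoulli-logistic units and adjusts each unit's parameter in proportion to (r - baseline) * (x_i - p_i). It is a minimal sketch and not the configuration used in the cited simulations: the function names, the running-average baseline, the learning rate, the step count and the `one_max` test function are all assumptions made for this example, and the entropy-based REINFORCE/MENT variant is not reproduced.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def reinforce_maximize(f, n_bits, steps=2000, lr=0.1, decay=0.9, seed=0):
    """Search for a bit string with a high value of f by sampling.

    Hypothetical helper: every default here is an assumption chosen
    for illustration, not a setting taken from the cited papers.
    """
    rng = random.Random(seed)
    theta = [0.0] * n_bits      # one parameter per unit; no inputs (nonassociative)
    baseline = 0.0              # running average of past reinforcement
    best_x, best_r = None, float("-inf")
    for _ in range(steps):
        probs = [sigmoid(t) for t in theta]
        x = [1 if rng.random() < p else 0 for p in probs]
        r = f(x)                # sampled function value used as reinforcement
        if r > best_r:
            best_x, best_r = x, r
        # For a Bernoulli-logistic unit, d ln Pr(x_i) / d theta_i = x_i - p_i.
        for i in range(n_bits):
            theta[i] += lr * (r - baseline) * (x[i] - probs[i])
        baseline = decay * baseline + (1.0 - decay) * r
    return best_x, best_r, theta


def one_max(bits):
    """Toy objective (an assumption): the number of ones in the string."""
    return sum(bits)


if __name__ == "__main__":
    x, r, theta = reinforce_maximize(one_max, n_bits=8)
    print("best string:", x, "value:", r)
```

The only feedback the units receive is the scalar value of f at the sampled point, which is what makes the procedure usable as a black-box optimizer; noise in f would enter through r without changing the update.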
Reinforcement Learning Algorithms as Function Optimizers
In International Joint Conference on Neural Networks (Washington ..., 1989
Cited by 3 (0 self)
Any nonassociative reinforcement learning algorithm can be viewed as a method for performing function optimization through (possibly noise-corrupted) sampling of function values. We describe the results of simulations in which the optima of several deterministic functions studied by Ackley [1] were sought using variants of REINFORCE algorithms [19], [20]. Results obtained for certain of these algorithms compare favorably to the best results found by Ackley.
Multistrategy Constructive Learning: Toward a Unified Theory of Learning
In: Readings in Knowledge Acquisition and ..., 1993
Cited by 2 (2 self)
Any learning process can be viewed as a self-modification of the learner's current knowledge through an interaction with some information source. Such knowledge modification is guided by the learner's desire to achieve a certain outcome, and can engage any kind of inference. The type of inference involved depends on the input information, the current (background) knowledge and the learner's task at hand. Based on such a view of learning, several fundamental concepts are analyzed and clarified, in particular, analytic and synthetic learning, derivational and hypothetical explanation, constructive induction, abduction, abstraction and deductive generalization. It is shown that inductive generalization and abduction can be viewed as two basic forms of general induction, and that abstraction and deductive generalization are two related forms of constructive deduction. Using this conceptual framework, a methodology for multistrategy task-adaptive learning (MTL) is outlined, in which learning strategies are combined dynamically, depending on the current learning situation. Specifically, an MTL learner analyzes a "triad" relationship among the input information, the background knowledge and the learning task, and on that basis determines which strategy, or a combination thereof, is most appropriate at a given learning step. To implement the MTL methodology, a new knowledge representation is proposed, based on the parametric association rules (PARs). Basic ideas of MTL are illustrated by means of the well-known "cup" example, through which it is shown how an MTL learner can employ, depending on the above triad relationship, empirical learning, constructive inductive generalization, abduction, explanation-based learning and abstraction.

