Simple statistical gradient-following algorithms for connectionist reinforcement learning
Machine Learning, 1992
Cited by 321 (0 self)
This article presents a general class of associative reinforcement learning algorithms for connectionist networks containing stochastic units. These algorithms, called REINFORCE algorithms, are shown to make weight adjustments in a direction that lies along the gradient of expected reinforcement in both immediate-reinforcement tasks and certain limited forms of delayed-reinforcement tasks, and they do this without explicitly computing gradient estimates or even storing information from which such estimates could be computed. Specific examples of such algorithms are presented, some of which bear a close relationship to certain existing algorithms while others are novel but potentially interesting in their own right. Also given are results that show how such algorithms can be naturally integrated with backpropagation. We close with a brief discussion of a number of additional issues surrounding the use of such algorithms, including what is known about their limiting behaviors as well as further considerations that might be used to help develop similar but potentially more powerful reinforcement learning algorithms.
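As an illustration of the kind of weight adjustment this abstract describes, here is a minimal Python sketch of a REINFORCE-style update for a single Bernoulli-logistic unit on an immediate-reinforcement task. The update form (learning rate times reinforcement minus baseline times the characteristic eligibility, which is (y - p) * x for this unit type) is the standard one; the toy task, the function names, and the parameter values are assumptions made for illustration and do not come from the listing.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def reinforce_step(w, x, reward_fn, alpha=0.1, baseline=0.0, rng=random):
    """One REINFORCE-style update for a single Bernoulli-logistic unit."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))  # P(y = 1 | x, w)
    y = 1 if rng.random() < p else 0                   # stochastic output
    r = reward_fn(x, y)                                # scalar reinforcement
    # Characteristic eligibility d ln g / d w_i = (y - p) * x_i for this unit.
    return [wi + alpha * (r - baseline) * (y - p) * xi for wi, xi in zip(w, x)]


def toy_reward(x, y):
    # Hypothetical task: fire (y = 1) in the first context, stay silent in the second.
    target = 1 if x[0] == 1.0 else 0
    return 1.0 if y == target else 0.0


def train(steps=4000, seed=0):
    rng = random.Random(seed)
    contexts = [[1.0, 0.0], [0.0, 1.0]]  # two one-hot input patterns
    w = [0.0, 0.0]
    for _ in range(steps):
        w = reinforce_step(w, rng.choice(contexts), toy_reward, rng=rng)
    return w
```

Note that the unit never computes or stores a gradient estimate: each update uses only the sampled output, the firing probability, and the scalar reinforcement received on that trial.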
Learning and Problem Solving with Multilayer Connectionist Systems
1986
Cited by 53 (1 self)
The difficulties of learning in multilayered networks of computational units have limited the use of connectionist systems in complex domains. This dissertation elucidates the issues of learning in a network's hidden units, and reviews methods for addressing these issues that have been developed through the years. Issues of learning in hidden units are shown to be analogous to learning issues for multilayer systems employing symbolic representations. Comparisons of
Toward a Unified Theory of Learning: Multistrategy Task-Adaptive Learning
IN: READINGS IN KNOWLEDGE ACQUISITION AND, 1993
Cited by 28 (10 self)
Any learning process can be viewed as a self-modification of the learner's current knowledge through an interaction with some information source. Such knowledge modification is guided by the learner's desire to achieve a certain outcome, and can engage any kind of inference. The type of inference involved depends on the input information, the current (background) knowledge and the learner's task at hand. Based on such a view of learning, several fundamental concepts are analyzed and clarified, in particular, analytic and synthetic learning, derivational and hypothetical explanation, constructive induction, abduction, abstraction and deductive generalization. It is shown that inductive generalization and abduction can be viewed as two basic forms of general induction, and that abstraction and deductive generalization are two related forms of constructive deduction. Using this conceptual framework, a methodology for multistrategy task-adaptive learning (MTL) is outlined, in which learning strategies are combined dynamically, depending on the current learning situation. Specifically, an MTL learner analyzes a "triad" relationship among the input information, the background knowledge and the learning task, and on that basis determines which strategy, or a combination thereof, is most appropriate at a given learning step. To implement the MTL methodology, a new knowledge representation is proposed, based on the parametric association rules (PARs). Basic ideas of MTL are illustrated by means of the well-known "cup" example, through which is shown how an MTL learner can employ, depending on the above triad relationship, empirical learning, constructive inductive generalization, abduction, explanation-based learning and abstraction.
Reinforcement learning through modulation of spike-timing-dependent synaptic plasticity
Neural Computation, 2007
Cited by 24 (0 self)
The persistent modification of synaptic efficacy as a function of the relative timing of pre- and postsynaptic spikes is a phenomenon known as spike-timing-dependent plasticity (STDP). Here we show that the modulation of STDP by a global reward signal leads to reinforcement learning. We first derive analytically learning rules involving reward-modulated spike-timing-dependent synaptic and intrinsic plasticity, by applying a reinforcement learning algorithm to the stochastic Spike Response Model of spiking neurons. These rules have several features common to plasticity mechanisms experimentally found in the brain. We then demonstrate in simulations of networks of integrate-and-fire neurons the efficacy of two simple learning rules involving modulated STDP. One rule is a direct extension of the standard STDP model (modulated STDP), while the other one involves an eligibility trace stored at each synapse that keeps a decaying memory of the relationships between the recent pairs of pre- and postsynaptic spike pairs (modulated STDP with eligibility trace). This latter rule permits learning even if the reward signal is delayed. The proposed rules are able to solve the XOR problem with both rate coded and temporally coded input and to learn a target output firing rate pattern. These learning rules are biologically plausible, may be used for training generic artificial spiking neural networks, regardless of the neural model used, and suggest the experimental investigation in animals of the existence of reward-modulated
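The eligibility-trace idea in this abstract can be sketched generically in Python: each synapse keeps exponentially decaying traces of recent pre- and postsynaptic spikes, accumulates their STDP-like pairing into a decaying eligibility trace, and changes its weight only when a reward arrives. This is a simplified discrete-time illustration, not the paper's exact equations; the function name, the time constants, the amplitudes, and the learning rate are all assumed values.

```python
import math


def run_synapse(pre_spikes, post_spikes, rewards, dt=1.0,
                tau_plus=20.0, tau_minus=20.0, tau_z=25.0,
                a_plus=1.0, a_minus=-1.0, lr=0.1, w=0.5):
    """Simulate one synapse under reward-modulated STDP with an eligibility trace.

    pre_spikes, post_spikes: sequences of 0/1 per time step.
    rewards: sequence of scalar reward values per time step.
    All default parameters are illustrative assumptions.
    """
    p_plus = p_minus = z = 0.0
    for pre, post, r in zip(pre_spikes, post_spikes, rewards):
        # Decaying memories of recent presynaptic and postsynaptic spikes.
        p_plus = p_plus * math.exp(-dt / tau_plus) + a_plus * pre
        p_minus = p_minus * math.exp(-dt / tau_minus) + a_minus * post
        # STDP term: potentiate on pre-before-post, depress on post-before-pre.
        stdp = p_plus * post + p_minus * pre
        # Eligibility trace: decaying memory of recent spike pairings.
        z = z * math.exp(-dt / tau_z) + stdp
        # The weight changes only while a reward signal is present.
        w += lr * r * z
    return w
```

For example, with a presynaptic spike at step 0, a postsynaptic spike at step 3, and a reward delivered only at step 10, the trace is still nonzero when the reward arrives, so the weight rises above its initial value; with no reward the weight stays unchanged.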
Learning symmetry groups with hidden units: Beyond the perceptron
Physica, 1986
Cited by 17 (5 self)
Learning to recognize mirror, rotational and translational symmetries is a difficult problem for massively-parallel network models. These symmetries cannot be learned by first-order perceptrons or Hopfield networks, which have no means for incorporating additional adaptive units that are hidden from the input and output layers. We demonstrate that the Boltzmann learning algorithm is capable of finding sets of weights which turn hidden units into useful higher-order feature detectors capable of solving symmetry problems.
Function Optimization Using Connectionist Reinforcement Learning Algorithms
Connection Science, 1991
Cited by 7 (4 self)
Any nonassociative reinforcement learning algorithm can be viewed as a method for performing function optimization through (possibly noise-corrupted) sampling of function values. We describe the results of simulations in which the optima of several deterministic functions studied by Ackley (1987) were sought using variants of REINFORCE algorithms (Williams, 1987; 1988). Some of the algorithms used here incorporated additional heuristic features resembling certain aspects of some of the algorithms used in Ackley's studies. Differing levels of performance were achieved by the various algorithms investigated, but a number of them performed at a level comparable to the best found in Ackley's studies on a number of the tasks, in spite of their simplicity. One of these variants, called REINFORCE/MENT, represents a novel but principled approach to reinforcement learning in nontrivial networks which incorporates an entropy maximization strategy. This was found to perform especially well on more hierarchically organized tasks.
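The view of nonassociative reinforcement learning as function optimization can be sketched in Python as follows: a team of independent Bernoulli units proposes a bit string, the function value serves as the reinforcement, and each unit nudges its firing probability with a REINFORCE-style update. The running-average baseline (reinforcement comparison), the objective `fraction_of_ones`, and all parameter values are assumptions chosen for illustration; this sketch does not implement the entropy term of REINFORCE/MENT or any of the heuristic features mentioned in the abstract.

```python
import math
import random


def optimize_bits(f, n_bits, steps=3000, alpha=0.5, gamma=0.1, seed=0):
    """Maximize f over bit strings with a team of Bernoulli-logistic units."""
    rng = random.Random(seed)
    w = [0.0] * n_bits            # one bias weight per unit
    r_bar = 0.0                   # running average of reinforcement (baseline)
    best_y, best_r = None, float("-inf")
    for _ in range(steps):
        p = [1.0 / (1.0 + math.exp(-wi)) for wi in w]
        y = [1 if rng.random() < pi else 0 for pi in p]  # sample a candidate
        r = f(y)                                         # function value as reward
        if r > best_r:
            best_y, best_r = y, r
        # REINFORCE-style update with reinforcement comparison.
        w = [wi + alpha * (r - r_bar) * (yi - pi) for wi, yi, pi in zip(w, y, p)]
        r_bar += gamma * (r - r_bar)
    final_p = [1.0 / (1.0 + math.exp(-wi)) for wi in w]
    return best_y, best_r, final_p


def fraction_of_ones(bits):
    # Hypothetical deterministic objective with a single optimum at all ones.
    return sum(bits) / len(bits)
```

Calling `optimize_bits(fraction_of_ones, n_bits=8)` returns the best sampled string, its value, and the final firing probabilities, which drift toward the optimum as sampling proceeds.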
Multistrategy Constructive Learning: Toward a Unified Theory of Learning
IN: READINGS IN KNOWLEDGE ACQUISITION AND, 1993
Cited by 3 (3 self)
Any learning process can be viewed as a self-modification of the learner's current knowledge through an interaction with some information source. Such knowledge modification is guided by the learner's desire to achieve a certain outcome, and can engage any kind of inference. The type of inference involved depends on the input information, the current (background) knowledge and the learner's task at hand. Based on such a view of learning, several fundamental concepts are analyzed and clarified, in particular, analytic and synthetic learning, derivational and hypothetical explanation, constructive induction, abduction, abstraction and deductive generalization. It is shown that inductive generalization and abduction can be viewed as two basic forms of general induction, and that abstraction and deductive generalization are two related forms of constructive deduction. Using this conceptual framework, a methodology for multistrategy task-adaptive learning (MTL) is outlined, in which learning strategies are combined dynamically, depending on the current learning situation. Specifically, an MTL learner analyzes a "triad" relationship among the input information, the background knowledge and the learning task, and on that basis determines which strategy, or a combination thereof, is most appropriate at a given learning step. To implement the MTL methodology, a new knowledge representation is proposed, based on the parametric association rules (PARs). Basic ideas of MTL are illustrated by means of the well-known "cup" example, through which is shown how an MTL learner can employ, depending on the above triad relationship, empirical learning, constructive inductive generalization, abduction, explanation-based learning and abstraction.
Reinforcement Learning Algorithms as Function Optimizers
In International Joint Conference on Neural Networks (Washington, 1989
Cited by 3 (0 self)
Any nonassociative reinforcement learning algorithm can be viewed as a method for performing function optimization through (possibly noise-corrupted) sampling of function values. We describe the results of simulations in which the optima of several deterministic functions studied by Ackley [1] were sought using variants of REINFORCE algorithms [19], [20]. Results obtained for certain of these algorithms compare favorably to the best results found by Ackley.
Application of connectionist learning methods to manufacturing process monitoring
In Proc. of IEEE International Symposium on Intelligent Control, 1988
Cited by 2 (0 self)
Laboratories involves studying quality control of a fluorescent bulb manufacturing line. All manufacturing processes are subject to incompletely understood changes due to variations in raw materials,