Results 1  10
of
81
Simple statistical gradientfollowing algorithms for connectionist reinforcement learning
 Machine Learning
, 1992
"... Abstract. This article presents a general class of associative reinforcement learning algorithms for connectionist networks containing stochastic units. These algorithms, called REINFORCE algorithms, are shown to make weight adjustments in a direction that lies along the gradient of expected reinfor ..."
Abstract

Cited by 430 (0 self)
 Add to MetaCart
Abstract. This article presents a general class of associative reinforcement learning algorithms for connectionist networks containing stochastic units. These algorithms, called REINFORCE algorithms, are shown to make weight adjustments in a direction that lies along the gradient of expected reinforcement in both immediatereinforcement tasks and certain limited forms of delayedreinforcement tasks, and they do this without explicitly computing gradient estimates or even storing information from which such estimates could be computed. Specific examples of such algorithms are presented, some of which bear a close relationship to certain existing algorithms while others are novel but potentially interesting in their own right. Also given are results that show how such algorithms can be naturally integrated with backpropagation. We close with a brief discussion of a number of additional issues surrounding the use of such algorithms, including what is known about their limiting behaviors as well as further considerations that might be used to help develop similar but potentially more powerful reinforcement learning algorithms.
Connectionist Learning Procedures
 ARTIFICIAL INTELLIGENCE
, 1989
"... A major goal of research on networks of neuronlike processing units is to discover efficient learning procedures that allow these networks to construct complex internal representations of their environment. The learning procedures must be capable of modifying the connection strengths in such a way ..."
Abstract

Cited by 388 (8 self)
 Add to MetaCart
A major goal of research on networks of neuronlike processing units is to discover efficient learning procedures that allow these networks to construct complex internal representations of their environment. The learning procedures must be capable of modifying the connection strengths in such a way that internal units which are not part of the input or output come to represent important features of the task domain. Several interesting gradientdescent procedures have recently been discovered. Each connection computes the derivative, with respect to the connection strength, of a global measure of the error in the performance of the network. The strength is then adjusted in the direction that decreases the error. These relatively simple, gradientdescent learning procedures work well for small tasks and the new challenge is to find ways of improving their convergence rate and their generalization abilities so that they can be applied to larger, more realistic tasks.
The Helmholtz Machine
, 1995
"... Discovering the structure inherent in a set of patterns is a fundamental aim of statistical inference or learning. One fruitful approach is to build a parameterized stochastic generative model, independent draws from which are likely to produce the patterns. For all but the simplest generative model ..."
Abstract

Cited by 232 (23 self)
 Add to MetaCart
Discovering the structure inherent in a set of patterns is a fundamental aim of statistical inference or learning. One fruitful approach is to build a parameterized stochastic generative model, independent draws from which are likely to produce the patterns. For all but the simplest generative models, each pattern can be generated in exponentially many ways. It is thus intractable to adjust the parameters to maximize the probability of the observed patterns. We describe a way of finessing this combinatorial explosion by maximizing an easily computed lower bound on the probability of the observations. Our method can be viewed as a form of hierarchical selfsupervised learning that may relate to the function of bottomup and topdown cortical processing pathways.
Task Decomposition Through Competition in a Modular Connectionist Architecture
 COGNITIVE SCIENCE
, 1990
"... A novel modular connectionist architecture is presented in which the networks composing the architecture compete to learn the training patterns. As a result of the competition, different networks learn different training patterns and, thus, learn to compute different functions. The architecture pe ..."
Abstract

Cited by 206 (6 self)
 Add to MetaCart
A novel modular connectionist architecture is presented in which the networks composing the architecture compete to learn the training patterns. As a result of the competition, different networks learn different training patterns and, thus, learn to compute different functions. The architecture performs task decomposition in the sense that it learns to partition a task into two or more functionally independent vii tasks and allocates distinct networks to learn each task. In addition, the architecture tends to allocate to each task the network whose topology is most appropriate to that task, and tends to allocate the same network to similar tasks and distinct networks to dissimilar tasks. Furthermore, it can be easily modified so as to...
Learning and Sequential Decision Making
 LEARNING AND COMPUTATIONAL NEUROSCIENCE
, 1989
"... In this report we show how the class of adaptive prediction methods that Sutton called "temporal difference," or TD, methods are related to the theory of squential decision making. TD methods have been used as "adaptive critics" in connectionist learning systems, and have been pr ..."
Abstract

Cited by 203 (11 self)
 Add to MetaCart
(Show Context)
In this report we show how the class of adaptive prediction methods that Sutton called "temporal difference," or TD, methods are related to the theory of squential decision making. TD methods have been used as "adaptive critics" in connectionist learning systems, and have been proposed as models of animal learning in classical conditioning experiments. Here we relate TD methods to decision tasks formulated in terms of a stochastic dynamical system whose behavior unfolds over time under the influence of a decision maker's actions. Strategies are sought for selecting actions so as to maximize a measure of longterm payoff gain. Mathematically, tasks such as this can be formulated as Markovian decision problems, and numerous methods have been proposed for learning how to solve such problems. We show how a TD method can be understood as a novel synthesis of concepts from the theory of stochastic dynamic programming, which comprises the standard method for solving such tasks when a model of the dynamical system is available, and the theory of parameter estimation, which provides the appropriate context for studying learning rules in the form of equations for updating associative strengths in behavioral models, or connection weights in connectionist networks. Because this report is oriented primarily toward the nonengineer interested in animal learning, it presents tutorials on stochastic sequential decision tasks, stochastic dynamic programming, and parameter estimation.
Learning Without StateEstimation in Partially Observable Markovian Decision Processes
 In Proceedings of the Eleventh International Conference on Machine Learning
, 1994
"... Reinforcement learning (RL) algorithms provide a sound theoretical basis for building learning control architectures for embedded agents. Unfortunately all of the theory and much of the practice (see Barto et al., 1983, for an exception) of RL is limited to Markovian decision processes (MDPs). Many ..."
Abstract

Cited by 150 (5 self)
 Add to MetaCart
Reinforcement learning (RL) algorithms provide a sound theoretical basis for building learning control architectures for embedded agents. Unfortunately all of the theory and much of the practice (see Barto et al., 1983, for an exception) of RL is limited to Markovian decision processes (MDPs). Many realworld decision tasks, however, are inherently nonMarkovian, i.e., the state of the environment is only incompletely known to the learning agent. In this paper we consider only partially observable MDPs (POMDPs), a useful class of nonMarkovian decision processes. Most previous approaches to such problems have combined computationally expensive stateestimation techniques with learning control. This paper investigates learning in POMDPs without resorting to any form of state estimation. We present results about what TD(0) and Qlearning will do when applied to POMDPs. It is shown that the conventional discounted RL framework is inadequate to deal with POMDPs. Finally we develop a new fr...
Sastry,”Varieties of Learning Automata: An Overview
 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART B: CYBERNETICS
, 2002
"... Abstract—Automata models of learning systems introduced in the 1960s were popularized as learning automata (LA) in a survey paper in 1974 [1]. Since then, there have been many fundamental advances in the theory as well as applications of these learning models. In the past few years, the structure of ..."
Abstract

Cited by 71 (0 self)
 Add to MetaCart
(Show Context)
Abstract—Automata models of learning systems introduced in the 1960s were popularized as learning automata (LA) in a survey paper in 1974 [1]. Since then, there have been many fundamental advances in the theory as well as applications of these learning models. In the past few years, the structure of LA has been modified in several directions to suit different applications. Concepts such as parameterized learning automata (PLA), generalized learning automata (GLA), and continuous actionset learning automata (CALA) have been proposed, analyzed, and applied to solve many significant learning problems. Furthermore, groups of LA forming teams and feedforward networks have been shown to converge to desired solutions under appropriate learning algorithms. Modules of LA have been used for parallel operation with consequent increase in speed of convergence. All of these concepts and results are relatively new and are scattered in technical literature. An attempt has been made in this paper to bring together the main ideas involved in a unified framework and provide pointers to relevant references. Index Terms—Continuous actionset learning automata (CALA), generalized learning automata (GLA), modules of learning automata, parameterized learning automata (PLA), teams and networks of learning automata. I.
Reinforcement learning is direct adaptive optimal control
 In Proceedings of the American control conference
, 1992
"... ..."
(Show Context)
Learning and Problem Solving with Multilayer Connectionist Systems
, 1986
"... The difficulties of learning in multilayered networks of computational units has limited the use of connectionist systems in complex domains. This dissertation elucidates the issues of learning in a network's hidden units, and reviews methods for addressing these issues that have been developed ..."
Abstract

Cited by 58 (3 self)
 Add to MetaCart
The difficulties of learning in multilayered networks of computational units has limited the use of connectionist systems in complex domains. This dissertation elucidates the issues of learning in a network's hidden units, and reviews methods for addressing these issues that have been developed through the years. Issues of learning in hidden units are shown to be analogous to learning issues for multilayer systems employing symbolic representations. Comparisons of