Results 1 - 10 of 54
Connectionist Learning Procedures
ARTIFICIAL INTELLIGENCE, 1989
Cited by 290 (6 self)
A major goal of research on networks of neuron-like processing units is to discover efficient learning procedures that allow these networks to construct complex internal representations of their environment. The learning procedures must be capable of modifying the connection strengths in such a way that internal units which are not part of the input or output come to represent important features of the task domain. Several interesting gradient-descent procedures have recently been discovered. Each connection computes the derivative, with respect to the connection strength, of a global measure of the error in the performance of the network. The strength is then adjusted in the direction that decreases the error. These relatively simple, gradient-descent learning procedures work well for small tasks and the new challenge is to find ways of improving their convergence rate and their generalization abilities so that they can be applied to larger, more realistic tasks.
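The gradient-descent procedure described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedures surveyed in the paper: it uses a single linear unit with a squared-error measure, whereas the abstract concerns networks with hidden units. The names `train_linear_unit`, `lr`, and `epochs` are hypothetical.

```python
def train_linear_unit(samples, lr=0.1, epochs=200):
    """Batch gradient descent on squared error for one linear unit.

    samples: list of (inputs, target) pairs; inputs is a list of floats.
    Assumption: global error E = 0.5 * sum over samples of (output - target)^2.
    """
    n_inputs = len(samples[0][0])
    weights = [0.0] * n_inputs
    for _ in range(epochs):
        grads = [0.0] * n_inputs
        for inputs, target in samples:
            output = sum(w * x for w, x in zip(weights, inputs))
            error = output - target
            for i, x in enumerate(inputs):
                # Derivative of E with respect to connection strength i.
                grads[i] += error * x
        # Adjust each strength in the direction that decreases the error.
        weights = [w - lr * g for w, g in zip(weights, grads)]
    return weights


# Hypothetical data generated by target = 2 * x1 - 1 * x2.
samples = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]
weights = train_linear_unit(samples)
```

On this small, consistent data set the weights approach 2 and -1; on larger tasks the convergence rate depends strongly on the step size, which is the difficulty the abstract points to.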
Simple statistical gradient-following algorithms for connectionist reinforcement learning
Machine Learning, 1992
Cited by 262 (0 self)
This article presents a general class of associative reinforcement learning algorithms for connectionist networks containing stochastic units. These algorithms, called REINFORCE algorithms, are shown to make weight adjustments in a direction that lies along the gradient of expected reinforcement in both immediate-reinforcement tasks and certain limited forms of delayed-reinforcement tasks, and they do this without explicitly computing gradient estimates or even storing information from which such estimates could be computed. Specific examples of such algorithms are presented, some of which bear a close relationship to certain existing algorithms while others are novel but potentially interesting in their own right. Also given are results that show how such algorithms can be naturally integrated with backpropagation. We close with a brief discussion of a number of additional issues surrounding the use of such algorithms, including what is known about their limiting behaviors as well as further considerations that might be used to help develop similar but potentially more powerful reinforcement learning algorithms.
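A REINFORCE-style update for one stochastic unit in an immediate-reinforcement task might look like the sketch below. It assumes a Bernoulli-logistic unit with a single bias weight, for which the update takes the form lr * (r - baseline) * (y - p). The function name, the reward function, and all parameter values are hypothetical choices for illustration, not taken from the abstract.

```python
import math
import random


def reinforce_bernoulli(reward_fn, steps=2000, lr=0.1, baseline=0.0, seed=0):
    """Train one Bernoulli-logistic unit with a REINFORCE-style rule.

    reward_fn: maps the sampled output y (0 or 1) to a scalar reinforcement.
    Returns the final probability of emitting y = 1.
    """
    rng = random.Random(seed)
    w = 0.0  # Assumption: the unit has only a bias weight and no inputs.
    for _ in range(steps):
        p = 1.0 / (1.0 + math.exp(-w))    # probability of emitting y = 1
        y = 1 if rng.random() < p else 0  # stochastic output of the unit
        r = reward_fn(y)                  # immediate reinforcement
        # (y - p) is the derivative of the log-probability of y w.r.t. w.
        w += lr * (r - baseline) * (y - p)
    return 1.0 / (1.0 + math.exp(-w))


# Hypothetical task: output 1 is rewarded, output 0 is not.
p_final = reinforce_bernoulli(lambda y: 1.0 if y == 1 else 0.0)
```

No gradient estimate is stored anywhere: each step uses only the sampled output, its probability, and the reinforcement received, yet the weight drifts toward the rewarded output.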
Learning and Sequential Decision Making
LEARNING AND COMPUTATIONAL NEUROSCIENCE, 1989
Cited by 185 (10 self)
In this report we show how the class of adaptive prediction methods that Sutton called "temporal difference," or TD, methods are related to the theory of sequential decision making. TD methods have been used as "adaptive critics" in connectionist learning systems, and have been proposed as models of animal learning in classical conditioning experiments. Here we relate TD methods to decision tasks formulated in terms of a stochastic dynamical system whose behavior unfolds over time under the influence of a decision maker's actions. Strategies are sought for selecting actions so as to maximize a measure of long-term payoff gain. Mathematically, tasks such as this can be formulated as Markovian decision problems, and numerous methods have been proposed for learning how to solve such problems. We show how a TD method can be understood as a novel synthesis of concepts from the theory of stochastic dynamic programming, which comprises the standard method for solving such tasks when a model of the dynamical system is available, and the theory of parameter estimation, which provides the appropriate context for studying learning rules in the form of equations for updating associative strengths in behavioral models, or connection weights in connectionist networks. Because this report is oriented primarily toward the non-engineer interested in animal learning, it presents tutorials on stochastic sequential decision tasks, stochastic dynamic programming, and parameter estimation.
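A tabular TD(0) prediction rule gives a feel for how a TD method combines a dynamic-programming backup with an incremental parameter-estimation step. The sketch below is an illustration under assumptions, not the report's formulation: the episode format, the three-state chain, and the values of `alpha` and `gamma` are all hypothetical.

```python
def td0_evaluate(episodes, n_states, alpha=0.5, gamma=1.0):
    """Estimate state values with tabular TD(0).

    episodes: iterable of episodes, each a list of
              (state, reward, next_state) transitions; next_state is None
              when the transition ends the episode.
    """
    values = [0.0] * n_states
    for episode in episodes:
        for state, reward, next_state in episode:
            next_value = 0.0 if next_state is None else values[next_state]
            # One-step backup target minus current estimate (the TD error).
            td_error = reward + gamma * next_value - values[state]
            # Incremental estimation step toward the backup target.
            values[state] += alpha * td_error
    return values


# Hypothetical deterministic chain 0 -> 1 -> 2 -> end, payoff 1 at the end.
chain_episode = [(0, 0.0, 1), (1, 0.0, 2), (2, 1.0, None)]
values = td0_evaluate([chain_episode] * 100, n_states=3)
```

With no discounting, every state in this chain has true value 1, and the estimates approach it: the last state learns first and earlier states follow as the prediction propagates backward.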
Task Decomposition Through Competition in a Modular Connectionist Architecture
COGNITIVE SCIENCE, 1990
Cited by 167 (4 self)
A novel modular connectionist architecture is presented in which the networks composing the architecture compete to learn the training patterns. As a result of the competition, different networks learn different training patterns and, thus, learn to compute different functions. The architecture performs task decomposition in the sense that it learns to partition a task into two or more functionally independent tasks and allocates distinct networks to learn each task. In addition, the architecture tends to allocate to each task the network whose topology is most appropriate to that task, and tends to allocate the same network to similar tasks and distinct networks to dissimilar tasks. Furthermore, it can be easily modified so as to...
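The idea of networks competing for training patterns can be illustrated with a deliberately simplified sketch. It uses a hard winner-take-all rule among one-parameter linear "experts", which is a swapped-in simplification and should not be read as the architecture of the paper. The function name, the data, and the initial slopes are hypothetical.

```python
def train_competing_experts(patterns, slopes, lr=0.1, epochs=50):
    """Winner-take-all competition among linear experts y = a * x.

    patterns: list of (x, y) training pairs.
    slopes:   initial slope of each expert.
    Only the expert with the smallest error on a pattern learns from it.
    """
    slopes = list(slopes)
    for _ in range(epochs):
        for x, y in patterns:
            errors = [abs(y - a * x) for a in slopes]
            winner = errors.index(min(errors))  # the expert that wins the pattern
            # Gradient step on squared error for the winning expert only.
            slopes[winner] += lr * (y - slopes[winner] * x) * x
    return slopes


# Hypothetical mixed task: two patterns from y = 2x, two from y = -3x.
patterns = [(1.0, 2.0), (2.0, 4.0), (1.0, -3.0), (2.0, -6.0)]
slopes = train_competing_experts(patterns, slopes=[0.5, -0.5])
```

Each expert ends up specializing in one of the two underlying functions, which is the kind of task decomposition the abstract describes, here obtained with a much cruder allocation rule.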
The Helmholtz Machine
1995
Cited by 165 (22 self)
Discovering the structure inherent in a set of patterns is a fundamental aim of statistical inference or learning. One fruitful approach is to build a parameterized stochastic generative model, independent draws from which are likely to produce the patterns. For all but the simplest generative models, each pattern can be generated in exponentially many ways. It is thus intractable to adjust the parameters to maximize the probability of the observed patterns. We describe a way of finessing this combinatorial explosion by maximizing an easily computed lower bound on the probability of the observations. Our method can be viewed as a form of hierarchical self-supervised learning that may relate to the function of bottom-up and top-down cortical processing pathways.
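The lower bound mentioned in this abstract can be written out as a standard application of Jensen's inequality. The notation is an assumption made here for illustration: $d$ is an observed pattern, $\alpha$ ranges over the ways of generating it, $\theta$ are the generative parameters, and $Q(\alpha)$ is any distribution over $\alpha$.

```latex
\log p(d \mid \theta)
  = \log \sum_{\alpha} p(\alpha, d \mid \theta)
  \;\ge\; \sum_{\alpha} Q(\alpha) \log \frac{p(\alpha, d \mid \theta)}{Q(\alpha)}
  = \log p(d \mid \theta)
    - \mathrm{KL}\bigl( Q(\alpha) \,\|\, p(\alpha \mid d, \theta) \bigr)
```

The bound is tight when $Q(\alpha)$ equals the posterior $p(\alpha \mid d, \theta)$, so raising the bound either raises the probability of the observations or brings $Q$ closer to that posterior.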
Learning Without State-Estimation in Partially Observable Markovian Decision Processes
In Proceedings of the Eleventh International Conference on Machine Learning, 1994
Cited by 115 (5 self)
Reinforcement learning (RL) algorithms provide a sound theoretical basis for building learning control architectures for embedded agents. Unfortunately, all of the theory and much of the practice (see Barto et al., 1983, for an exception) of RL is limited to Markovian decision processes (MDPs). Many real-world decision tasks, however, are inherently non-Markovian, i.e., the state of the environment is only incompletely known to the learning agent. In this paper we consider only partially observable MDPs (POMDPs), a useful class of non-Markovian decision processes. Most previous approaches to such problems have combined computationally expensive state-estimation techniques with learning control. This paper investigates learning in POMDPs without resorting to any form of state estimation. We present results about what TD(0) and Q-learning will do when applied to POMDPs. It is shown that the conventional discounted RL framework is inadequate to deal with POMDPs. Finally we develop a new fr...
Learning and Problem Solving with Multilayer Connectionist Systems
1986
Cited by 49 (1 self)
Learning and Problem Solving with Multilayer Connectionist Systems. September 1986. Charles William Anderson (B.S., University of Nebraska; M.S., University of Massachusetts; Ph.D., University of Massachusetts). Directed by: Professor Andrew G. Barto.
The difficulties of learning in multilayered networks of computational units have limited the use of connectionist systems in complex domains. This dissertation elucidates the issues of learning in a network's hidden units, and reviews methods for addressing these issues that have been developed through the years. Issues of learning in hidden units are shown to be analogous to learning issues for multilayer systems employing symbolic representations.
Reinforcement Learning And Its Application To Control
1992
Cited by 49 (2 self)
Learning control involves modifying a controller's behavior to improve its performance as measured by some predefined index of performance (IP). If control actions that improve performance as measured by the IP are known, supervised learning methods, or methods for learning from examples, can be used to train the controller. But when such control actions are not known a priori, appropriate control behavior has to be inferred from observations of the IP. One can distinguish between two classes of methods for training controllers under such circumstances. Indirect methods involve constructing a model of the problem's IP and using the model to obtain training information for the controller. On the other hand, direct, or model-free,...
Learning to Solve Markovian Decision Processes
1994
Cited by 43 (3 self)
This dissertation is about building learning control architectures for agents embedded in finite, stationary, and Markovian environments. Such architectures give embedded agents the ability to improve autonomously the efficiency with which they can achieve goals. Machine learning researchers have developed reinforcement learning (RL) algorithms based on dynamic programming (DP) that use the agent's experience in its environment to improve its decision policy incrementally. This is achieved by adapting an evaluation function in such a way that the decision policy that is "greedy" with respect to it improves with experience. This dissertation focuses on finite, stationary and Markovian environments for two reasons: it allows the develop...

