Results 1 - 10 of 33
Learning and Sequential Decision Making
- LEARNING AND COMPUTATIONAL NEUROSCIENCE, 1989
Cited by 185 (10 self)
In this report we show how the class of adaptive prediction methods that Sutton called "temporal difference," or TD, methods are related to the theory of sequential decision making. TD methods have been used as "adaptive critics" in connectionist learning systems, and have been proposed as models of animal learning in classical conditioning experiments. Here we relate TD methods to decision tasks formulated in terms of a stochastic dynamical system whose behavior unfolds over time under the influence of a decision maker's actions. Strategies are sought for selecting actions so as to maximize a measure of long-term payoff gain. Mathematically, tasks such as this can be formulated as Markovian decision problems, and numerous methods have been proposed for learning how to solve such problems. We show how a TD method can be understood as a novel synthesis of concepts from the theory of stochastic dynamic programming, which comprises the standard method for solving such tasks when a model of the dynamical system is available, and the theory of parameter estimation, which provides the appropriate context for studying learning rules in the form of equations for updating associative strengths in behavioral models, or connection weights in connectionist networks. Because this report is oriented primarily toward the non-engineer interested in animal learning, it presents tutorials on stochastic sequential decision tasks, stochastic dynamic programming, and parameter estimation.
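As a rough illustration of the kind of temporal-difference prediction update this abstract refers to, the following is a minimal sketch of a tabular TD(0) rule. The five-state random-walk task, the step size, the discount factor, and the function name are all assumptions chosen for illustration and are not taken from the report.

```python
import random


def td0_random_walk(episodes=5000, alpha=0.02, gamma=1.0, seed=0):
    """Tabular TD(0) prediction on a hypothetical 5-state random walk.

    States 1..5 are non-terminal; states 0 and 6 are terminal.
    Entering state 6 pays reward 1, every other transition pays 0.
    All of these task details are illustrative assumptions.
    """
    rng = random.Random(seed)
    values = [0.0] * 7  # terminal entries are never updated and stay at 0
    for _ in range(episodes):
        state = 3  # every episode starts in the middle state
        while state not in (0, 6):
            next_state = state + rng.choice((-1, 1))
            reward = 1.0 if next_state == 6 else 0.0
            # TD error: difference between successive predictions
            td_error = reward + gamma * values[next_state] - values[state]
            values[state] += alpha * td_error
            state = next_state
    return values[1:6]


if __name__ == "__main__":
    for index, estimate in enumerate(td0_random_walk(), start=1):
        print(f"state {index}: estimate {estimate:.3f}, true {index / 6:.3f}")
```

For this particular walk, the probability of ending at the rewarded side from state i is i/6, so the printed estimates can be compared against those values.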
A framework for mesencephalic dopamine systems based on predictive Hebbian learning
- J. Neurosci, 1996
Cited by 150 (19 self)
We develop a theoretical framework that shows how mesencephalic dopamine systems could distribute to their targets a signal that represents information about future expectations. In particular, we show how activity in the cerebral cortex can make predictions about future receipt of reward and how fluctuations in the activity levels of neurons in diffuse dopamine systems above and below baseline levels would represent errors in these predictions that are delivered to cortical and subcortical targets. We present a model for how such errors could be constructed in a real brain that is consistent with physiological results for a subset of dopaminergic neurons located in the ventral tegmental area and surrounding dopaminergic neurons. The theory also makes testable predictions about human choice behavior on a simple decision-making task. Furthermore, we show that, through a simple influence on synaptic plasticity, fluctuations in dopamine release can act to change the predictions in an appropriate manner. Key words: prediction; dopamine; diffuse ascending systems; synaptic plasticity; reinforcement learning; reward In mammals, mesencephalic dopamine neurons participate in a number of important cognitive and physiological functions including motivational processes (Wise, 1982; Fibiger and Phillips, 1986; Koob and Bloom, 1988), reward processing (Wise, 1982), working
Synthetic grammar learning: Implicit rule abstraction or explicit fragmentary knowledge
- Journal of Experimental Psychology: General, 1990
Cited by 42 (2 self)
3 experiments were designed to demonstrate that classifying new letter strings as grammatical (i.e., conforming to a set of rules called a synthetic grammar) or ungrammatical may proceed from fragmentary conscious knowledge of the bigrams constituting the grammatical strings displayed in the study phase, rather than from an unconscious structured representation of the grammar, as Reber (1989) contended. In Experiment 1, grammaticality judgments of subjects initially studying grammatical letter strings did not differ from judgments by subjects learning from a list of the bigrams making up these strings. In Experiment 2, judgments about nongrammatical strings composed of valid bigrams placed in invalid locations were extremely poor, although better than chance. In Experiment 3 the explicit knowledge of bigrams as assessed by a recognition procedure appeared sufficient to account for observed performance on a standard test of grammaticality. A widely held model of cognition endows human subjects with the ability to implicitly abstract the regularities or high-level rules embodied in richly structured stimulus domains. Over the last 20 years, this general model has received strong
Toward a unified model of attention in associative learning
- Journal of Mathematical Psychology, 2001
Cited by 37 (1 self)
Two connectionist models of attention in associative learning, previously used to model human category learning, are shown to have special cases that are essentially equivalent to N. J. Mackintosh's (1975, Psychological Review, 82, 276-298) classic model of attention in animal learning. The models unify formulas for associative weight change with formulas for attentional change, under a common goal of error reduction. Error-driven attentional shifting accelerates learning of new associations but also protects previously learned associations from retroactive interference. The models are fit to data from a recent experiment in human associative learning (J. K. Kruschke & N. J. Blair, 2000, Psychonomic Bulletin & Review, 7, 636-645), which shows that blocking of learning involves learned inattention. The approach also provides a novel and unifying theory of latent inhibition (the preexposure effect) in terms of blocking. The discussion summarizes how the approach accounts for a variety of other "irrational" phenomena in associative learning, including base rate effects, perseveration of attention through relevance
The Predictive Brain: Temporal Coincidence and Temporal Order in Synaptic . . .
- 1994
Cited by 31 (7 self)
Some forms of synaptic plasticity depend on the temporal coincidence of presynaptic activity and postsynaptic response. This requirement is consistent with the Hebbian, or correlational, type of learning rule used in many neural network models. Recent evidence suggests that synaptic plasticity may depend in part on the production of a membrane permeant-diffusible signal so that spatial volume may also be involved in correlational learning rules. This latter form of synaptic change has been called volume learning. In both Hebbian and volume learning rules, interaction among synaptic inputs depends on the degree of coincidence of the inputs and is otherwise insensitive to their exact temporal order. Conditioning experiments and psychophysical studies have shown, however, that most animals are highly sensitive to the temporal order of the sensory inputs. Although these experiments assay the behavior of the entire animal or perceptual system, they raise the possibility that nervous systems may be sensitive to temporally ordered events at many spatial and temporal scales. We suggest here the existence of a new class of learning rule, called a predictive Hebbian learning rule, that is sensitive to the temporal ordering of synaptic inputs. We show how this predictive learning rule could act at single synaptic connections and through diffuse neuromodulatory systems.
Temporal Difference Model Reproduces Anticipatory Neural Activity
- 2000
Cited by 31 (1 self)
Introduction: In a famous experiment by Pavlov (1927), a dog was trained with the ringing of a bell (stimulus) followed by food delivery (reinforcer). In the first trial, the animal salivated when food was presented. After several trials, salivation started when the bell was rung. This finding suggests that the salivation response following the bell ring reflects anticipation of food delivery. A large body of experimental evidence led to the hypothesis that Pavlovian learning is dependent upon the degree of unpredictability of the reinforcer (Rescorla & Wagner, 1972; Dickinson, 1980). According to this hypothesis, reinforcers become progressively less efficient for behavioral adaptation as their predictability grows during the course of learning. The difference between the actual occurrence and the prediction of the reinforcer is usually referred to as the "error" in the reinforcer prediction. This concept has been employed in the temporal-difference model (TD model) of Pavlovi
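The idea that a reinforcer drives less learning as it becomes better predicted can be sketched with a simple error-correcting update in the spirit of the Rescorla and Wagner hypothesis cited in this abstract. This is only an illustrative sketch for a single stimulus: the function name, the learning rate, the reinforcer magnitude, and the number of trials are assumptions, not values from the paper.

```python
def error_driven_conditioning(trials=50, learning_rate=0.2, reinforcer=1.0):
    """Single-stimulus error-correcting update (illustrative assumptions).

    On every trial the stimulus is followed by the reinforcer. The
    associative strength moves toward the reinforcer magnitude by a
    fraction of the prediction error, so the error shrinks as the
    reinforcer becomes more predictable.
    """
    strength = 0.0
    errors = []
    for _ in range(trials):
        # "Error" in the reinforcer prediction: actual minus predicted
        error = reinforcer - strength
        errors.append(error)
        strength += learning_rate * error
    return strength, errors


if __name__ == "__main__":
    final_strength, errors = error_driven_conditioning()
    print(f"first-trial error: {errors[0]:.3f}")
    print(f"last-trial error:  {errors[-1]:.6f}")
    print(f"final strength:    {final_strength:.6f}")
```

With these assumed settings the prediction error decays geometrically across trials, which mirrors the claim that a fully predicted reinforcer supports little further learning.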
Landmark Stability: Studies Exploring Whether the Perceived Stability of the Environment Influences Spatial Representation
- The Journal of Experimental Biology, 1996
Cited by 14 (0 self)
To investigate whether spatial learning complies with associative learning theories or with theories of cognitive mapping, rats were trained in three experiments exploring the effect of variations in spatial predictive relationships. In experiment 1, it was found that making one of two landmarks the sole spatial predictor of reward, by varying the spatial relationship between reward and other cues, reduced the control over search exerted by that landmark compared with that observed when the landmark and context cues were both reliable predictors of reward location. This requirement for landmark stability rather than predictive power appears to contradict results obtained in conventional conditioning paradigms. Discrimination learning was unaffected, suggesting a dissociation between discrimination and spatial learning with respect to the influence of geometric stability. Further experiments used arrays of both single and multiple landmarks. Experiment 2 revealed that the stability of a
Pavlovian conditioning: It’s not what you think it is
- American Psychologist, 1988
Cited by 13 (0 self)
Current thinking about Pavlovian conditioning differs substantially from that of 20 years ago. Yet the changes that have taken place remain poorly appreciated by psychologists generally. Traditional descriptions of conditioning as the acquired ability of one stimulus to evoke the original response to another because of their pairing are shown to be inadequate. They fail to characterize adequately the circumstances producing learning, the content of that learning, or the manner in which that learning influences performance. Instead, conditioning is now described as the learning of relations among events so as to allow the organism to represent its environment. Within this framework, the study of Pavlovian conditioning continues to be an intellectually active area, full of new discoveries and information relevant to other areas of psychology. Pavlovian conditioning is one of the oldest and most systematically studied phenomena in psychology. Outside of psychology, it is one of our best-known findings. But at the same time, within psychology it is badly misunderstood and misrepresented. In the last 20 years, knowledge of the associative processes underlying Pavlovian conditioning has expanded dramatically. The
A behavioural preparation for the study of human Pavlovian conditioning
- Q J Exp Psychol B, 1996
Cited by 11 (10 self)
Conditioned suppression is a useful technique for assessing whether subjects have learned a CS-US association, but it is difficult to use in humans because of the need for an aversive US. The purpose of this research was to develop a non-aversive procedure that would produce suppression. Subjects learned to press the space bar of a computer as part of a video game, but they had to stop pressing whenever a visual US appeared, or they would lose points. In Experiment 1, we used an A+/B- discrimination design: The US always followed Stimulus A and never followed Stimulus B. Although no information about the existence of CSs was given to the subjects, suppression ratio results showed a discrimination learning curve; that is, subjects learned to suppress responding in anticipation of the US when Stimulus A was present but not during the presentations of Stimulus B. Experiment 2 explored the potential of this preparation by using two different instruction sets and assessing post-experimental judgements of CS A and CS B in addition to suppression ratios. The results of these experiments suggest that conditioned suppression can be reliably and conveniently used in the human laboratory, providing a bridge between experiments on animal conditioning and experiments on human judgements of causality.
Risky Theories -- The Effects of Variance on Foraging Decisions
- AMER. ZOOL., 1996
Cited by 11 (2 self)
This paper concerns the response of foraging animals to variability in rate of gain, or risk. Both the empirical and theoretical literatures relevant to this issue are reviewed. The methodology and results from fifty-nine studies in which animals are required to choose between foraging options differing in the variances in the rate of gain available are tabulated. We found that when risk is generated by variability in the amount of reward, animals are most frequently risk-averse and sometimes indifferent to risk, although in some studies preference depends on energy budget. In contrast, when variability is in delay to reward, animals are universally risk-prone. A range of functional, descriptive and mechanistic accounts for these findings is described, none of which alone is capable of accommodating all aspects of the data. Risk-sensitive foraging theory provides the only currently available explanation for why energy budget should affect preference. An information-processing model that incorporates Weber's law provides the only general explanation for why animals

