Results 1–10 of 51
Reinforcement learning in the brain
Cited by 43 (6 self)
Abstract: A wealth of research focuses on the decision-making processes that animals and humans employ when selecting actions in the face of reward and punishment. Initially such work stemmed from psychological investigations of conditioned behavior, and explanations of these in terms of computational models. Increasingly, analysis at the computational level has drawn on ideas from reinforcement learning, which provides a normative framework within which decision-making can be analyzed. More recently, the fruits of these extensive lines of research have made contact with investigations into the neural basis of decision-making. Converging evidence now links reinforcement learning to specific neural substrates, assigning them precise computational roles. Specifically, electrophysiological recordings in behaving animals and functional imaging of human decision-making have revealed in the brain the existence of a key reinforcement learning signal, the temporal difference reward prediction error. Here, we first introduce the formal reinforcement learning framework. We then review the multiple lines of evidence linking reinforcement learning to the function of dopaminergic neurons in the mammalian midbrain and
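The temporal difference reward prediction error mentioned in this abstract can be illustrated with a minimal tabular TD(0) sketch; the three-state chain, reward, and learning parameters below are illustrative choices, not taken from the article.

```python
# Minimal sketch of TD(0) learning driven by the reward prediction error
# delta = r + gamma * V(s') - V(s). The chain 0 -> 1 -> 2 with reward 1.0
# on reaching state 2 is a toy example.

def td_learn(episodes, n_states=3, alpha=0.1, gamma=0.9):
    V = [0.0] * n_states
    for _ in range(episodes):
        for s, s_next, r in [(0, 1, 0.0), (1, 2, 1.0)]:
            delta = r + gamma * V[s_next] - V[s]   # prediction error
            V[s] += alpha * delta                  # value update
    return V

values = td_learn(episodes=500)
```

At convergence the values satisfy V(s) = r + gamma * V(s'), so the state just before reward approaches 1.0 and its predecessor approaches 0.9.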
Locally Bayesian Learning with Applications to Retrospective Revaluation and Highlighting
Psychological Review, 2006
Cited by 36 (7 self)
Abstract: A scheme is described for locally Bayesian parameter updating in models structured as successions of component functions. The essential idea is to backpropagate the target data to interior modules, such that an interior component’s target is the input to the next component that maximizes the probability of the next component’s target. Each layer then does locally Bayesian learning. The approach assumes online trial-by-trial learning. The resulting parameter updating is not globally Bayesian but can better capture human behavior. The approach is implemented for an associative learning model that first maps inputs to attentionally filtered inputs and then maps attentionally filtered inputs to outputs. The Bayesian updating allows the associative model to exhibit retrospective revaluation effects such as backward blocking and unovershadowing, which have been challenging for associative learning models. The backpropagation of target values to attention allows the model to show trial-order effects, including highlighting and differences in magnitude of forward and backward blocking, which have been challenging for Bayesian learning models.
Acquisition and extinction in autoshaping
Psychological Review, 2002
Cited by 33 (2 self)
Abstract: C. R. Gallistel and J. Gibbon (2000) presented quantitative data on the speed with which animals acquire behavioral responses during autoshaping, together with a statistical model of learning intended to account for them. Although this model captures the form of the dependencies among critical variables, its detailed predictions are substantially at variance with the data. In the present article, further key data on the speed of acquisition are used to motivate an alternative model of learning, in which animals can be interpreted as paying different amounts of attention to stimuli according to estimates of their differential reliabilities as predictors. In autoshaping experiments on pigeons, birds acquire a classically conditioned peck response to a lighted key associated, irrespective of their actions, with the delivery of food (Brown & Jenkins, 1968). As stressed persuasively by Gallistel and Gibbon (2000), there is substantial experimental evidence in favor of a simple quantitative relationship between the speed of acquisition in autoshaping and the three critical variables shown in Figure 1A. The first is I, the length of intertrial interval; the second is T, the time during the trial for which the conditioned stimulus (CS; a light in this case) is presented; and the third is the training schedule, 1/S, which is the fractional number of deliveries per light (for those birds that were only partially reinforced). Here, acquisition speeds are typically measured in terms of the number of trials it takes until a certain behavioral criterion is met, such as pecking during the time the light is illuminated on three out of four
Bayesian approaches to associative learning: From passive to active learning
Learning & Behavior, 2008
Cited by 27 (7 self)
Abstract: Traditional associationist models represent an organism’s knowledge state by a single strength of association on each associative link. Bayesian models instead represent knowledge by a distribution of graded degrees of belief over a range of candidate hypotheses. Many traditional associationist models assume that the learner is passive, adjusting strengths of association only in reaction to stimuli delivered by the environment. Bayesian models, on the other hand, can describe how the learner should actively probe the environment to learn optimally. The first part of this article reviews two Bayesian accounts of backward blocking, a phenomenon that is challenging for many traditional theories. The broad Bayesian framework, in which these models reside, is also selectively reviewed. The second part focuses on two formalizations of optimal active learning: maximizing either the expected information gain or the probability gain. New analyses of optimal active learning by a Kalman filter and by a noisy-logic gate show that these two Bayesian models make different predictions for some environments. The Kalman filter predictions are disconfirmed in at least one case. Bayesian formalizations of learning are a revolutionary advance over traditional approaches. Bayesian models assume that the learner maintains multiple candidate hypotheses with differing degrees of belief, unlike traditional
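The Kalman-filter account of backward blocking mentioned above can be sketched in a few lines: associative weights are a Gaussian belief, and training the compound AB induces a negative covariance between the cues, so later A-alone training drives belief in B down. The priors, trial counts, and noise variance here are illustrative, not the article's.

```python
import numpy as np

def kalman_step(w, S, x, r, obs_var=0.1):
    """One Bayesian update of the weight mean w and covariance S on trial (x, r)."""
    x = np.asarray(x, dtype=float)
    k = S @ x / (x @ S @ x + obs_var)   # Kalman gain
    w = w + k * (r - x @ w)             # mean moves on the prediction error
    S = S - np.outer(k, x @ S)          # uncertainty shrinks along x
    return w, S

w, S = np.zeros(2), np.eye(2)           # cues A and B, independent priors
for _ in range(20):                     # phase 1: compound AB -> reward
    w, S = kalman_step(w, S, [1.0, 1.0], 1.0)
w_B_after_phase1 = w[1]                 # roughly 0.5, credit shared
for _ in range(20):                     # phase 2: A alone -> reward
    w, S = kalman_step(w, S, [1.0, 0.0], 1.0)
w_B_after_phase2 = w[1]                 # backward blocking: B's weight drops
```

Because phase 1 only constrains the sum of the two weights, the posterior covariance between A and B is negative; evidence that A alone predicts reward then retrospectively discounts B, which a single-strength associative update cannot do.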
Isotropic Sequence Order Learning
2003
Cited by 24 (15 self)
Abstract: In this article, we present an isotropic unsupervised algorithm for temporal sequence learning. No special reward signal is used, so that all inputs are completely isotropic. All input signals are band-pass filtered before converging onto a linear output neuron. All synaptic weights change according to the correlation of band-pass-filtered inputs with the derivative of the output. We investigate the algorithm in an open- and a closed-loop condition, the latter being defined by embedding the learning system into a behavioral feedback loop. In the open-loop condition, we find that the linear structure of the algorithm allows analytically calculating the shape of the weight change, which is strictly heterosynaptic and follows the shape of the weight-change curves found in spike-timing-dependent plasticity. Furthermore, we show that synaptic weights stabilize automatically when no more temporal differences exist between the inputs, without additional normalizing measures. In the second part of this study, the algorithm is placed in an environment that leads to a closed sensorimotor loop. To this end, a robot is programmed with a prewired retraction reflex in response to collisions. Through isotropic sequence order (ISO) learning, the robot achieves collision avoidance by learning the correlation between its early range-finder signals and the later-occurring collision signal. Synaptic weights stabilize at the end of learning, as theoretically predicted. Finally, we discuss the relation of ISO learning to other drive-reinforcement models and to the commonly used temporal difference learning algorithm. This study is followed up by a mathematical analysis of the closed-loop situation in the companion article in this issue, “ISO Learning Approximates a Solution to the Inverse-Controller Problem in an Unsupervised Behavioral Paradigm” (pp. 865–884).
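The core ISO rule — each weight changes with the correlation between its filtered input and the derivative of the output — can be sketched in the open-loop case. For simplicity a first-order low-pass filter stands in for the paper's band-pass resonators, and the pulse times, time constant, and learning rate are illustrative.

```python
import math

# Two filtered input pulses: an "early" predictive input at t = 10 and a
# "late" reflex-like input at t = 20, each an exponentially decaying trace.
T, tau, mu = 200, 10.0, 0.1
u_early = [math.exp(-(t - 10) / tau) if t >= 10 else 0.0 for t in range(T)]
u_late  = [math.exp(-(t - 20) / tau) if t >= 20 else 0.0 for t in range(T)]

rho_early, rho_late = 0.0, 1.0   # the reflex weight rho_late stays fixed
v_prev = 0.0
for t in range(T):
    v = rho_early * u_early[t] + rho_late * u_late[t]   # linear output
    dv = v - v_prev                                     # discrete derivative
    rho_early += mu * u_early[t] * dv                   # ISO rule: u * dv/dt
    v_prev = v
```

Because the early input is still active when the later input drives the output upward, the correlation with the output derivative is net positive and the predictive weight grows; the self-correlation of a transient with its own derivative largely cancels over the pulse.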
Dynamics of Attentional Selection Under Conflict: Toward a Rational Bayesian Account
Cited by 16 (3 self)
Abstract: The brain exhibits remarkable facility in exerting attentional control in most circumstances, but it also suffers apparent limitations in others. The authors’ goal is to construct a rational account for why attentional control appears suboptimal under conditions of conflict and what this implies about the underlying computational principles. The formal framework used is based on Bayesian probability theory, which provides a convenient language for delineating the rationale and dynamics of attentional selection. The authors illustrate these issues with the Eriksen flanker task, a classical paradigm that explores the effects of competing sensory inputs on response tendencies. The authors show how 2 distinctly formulated models, based on compatibility bias and spatial uncertainty principles, can account for the behavioral data. They also suggest novel experiments that may differentiate these models. In addition, they elaborate a simplified model that approximates optimal computation and may map more directly onto the underlying neural machinery. This approximate model uses conflict monitoring, putatively mediated by the anterior cingulate cortex, as a proxy for compatibility representation. The authors also consider how this conflict information might be disseminated and used to control processing.
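The compatibility-bias idea can be sketched as a Bayesian observer whose prior assumes flankers usually match the target, so early flanker-dominated evidence transiently pulls the posterior toward the wrong response on incompatible trials. The prior strength, evidence weights, and the deterministic (average-evidence) simplification below are illustrative choices, not the authors' model.

```python
import math

def posterior_correct(target_ev, flanker_ev, t, beta=0.8, c=0.05):
    """P(target = R) after t steps of average evidence under a compatibility prior."""
    # Joint prior over (target d, flankers f): they match with probability beta.
    states = {("R", "R"): beta / 2, ("R", "L"): (1 - beta) / 2,
              ("L", "L"): beta / 2, ("L", "R"): (1 - beta) / 2}
    post = {}
    for (d, f), prior in states.items():
        s_d = 1.0 if d == "R" else -1.0
        s_f = 1.0 if f == "R" else -1.0
        # one central target, two flankers -> flanker evidence counts twice
        loglik = t * c * (target_ev * s_d + 2 * flanker_ev * s_f)
        post[(d, f)] = prior * math.exp(loglik)
    z = sum(post.values())
    return (post[("R", "R")] + post[("R", "L")]) / z

# Incompatible array: target points R, flankers point L.
early = posterior_correct(1.0, -1.0, t=1)     # flankers dominate at first
late  = posterior_correct(1.0, -1.0, t=100)   # target evidence wins out
```

Early in the trial the posterior for the correct response dips below chance, reproducing the fast-error pattern in incompatible flanker trials, before accumulated target evidence recovers it.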
Q-learning of sequential attention for visual object recognition from informative local descriptors
In Proc. of the 22nd International Conference on Machine Learning (ICML), 2005
Cited by 15 (4 self)
Abstract: This work provides a framework for learning sequential attention in real-world visual object recognition, using an architecture of three processing stages. The first stage rejects irrelevant local descriptors based on an information-theoretic saliency measure, providing candidates for foci of interest (FOI). The second stage investigates the information in the FOI using a codebook matcher, providing weak object hypotheses. The third stage integrates local information via shifts of attention, resulting in chains of descriptor-action pairs that characterize object discrimination. A Q-learner then adapts from explorative search and evaluative feedback from entropy decreases on the attention sequences, eventually prioritizing shifts that lead to a geometry of descriptor-action scanpaths that is highly discriminative with respect to object recognition. The methodology is successfully evaluated on indoor (COIL-20 database) and outdoor (TSG-20 database) imagery, demonstrating significant impact from learning and outperforming standard local-descriptor-based methods in both recognition accuracy and processing time.
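The third stage's learner uses the standard tabular Q-learning update. A minimal sketch on a toy scan path follows; the states, the two actions, and the terminal reward (standing in for the paper's entropy-decrease feedback) are illustrative.

```python
import random

def q_learning(n_states=4, n_actions=2, episodes=2000,
               alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning on a toy chain of attention shifts."""
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s < n_states - 1:
            # epsilon-greedy action selection
            a = rng.randrange(n_actions) if rng.random() < eps \
                else max(range(n_actions), key=lambda a: Q[s][a])
            # action 1 advances the scan path, action 0 stays put;
            # reaching the final state yields reward 1 (object identified)
            s_next = s + 1 if a == 1 else s
            r = 1.0 if s_next == n_states - 1 else 0.0
            best_next = max(Q[s_next]) if s_next < n_states - 1 else 0.0
            Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])
            s = s_next
    return Q

Q = q_learning()
```

After learning, the greedy policy advances the scan path at every state, with values discounted by gamma per remaining shift.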
Cue-guided search: a computational model of selective attention
IEEE Transactions on Neural Networks, 2005
Cited by 8 (2 self)
Abstract—Selective visual attention in a natural environment can be seen as the interaction between the external visual stimulus and task-specific knowledge of the required behavior. This interaction between the bottom-up stimulus and the top-down, task-related knowledge is crucial for what is selected in space and time within the scene. In this paper, we propose a computational model of selective attention for a visual search task. We go beyond simple saliency-based attention models to model selective attention guided by top-down visual cues, which are dynamically integrated with the bottom-up information. In this way, selection of a location is accomplished by interaction between bottom-up and top-down information. First, the general structure of our model is briefly introduced, followed by a description of the top-down processing of task-relevant cues. This is then followed by a description of the processing of the external images to give three feature maps that are combined into an overall bottom-up map. Second, the development of the formalism for our novel interactive spiking neural network (ISNN) is given, with the interactive activation rule that calculates the integration map. The learning rules for both bottom-up and top-down weight parameters are given, together with further analysis of the properties of the resulting ISNN. Third, the model is applied to a face-detection task to search for the location of a specific face that is cued. The results show that the trajectories of attention are dramatically changed by the interaction of information and variations of cues, giving an appropriate, task-relevant search pattern. Finally, we discuss ways in which these results can be seen as compatible with existing psychological evidence. Index Terms—Attention, bottom-up map, computer vision, cue-guided search, top-down map.
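The bottom-up/top-down integration described here can be sketched schematically: three normalized feature maps are averaged into a bottom-up map, which a cue-driven top-down map then modulates. The maps, the gain value, and the multiplicative interaction below are illustrative placeholders, not the paper's ISNN formalism.

```python
import numpy as np

rng = np.random.default_rng(0)
h, w = 8, 8
# three bottom-up feature maps (e.g. color, intensity, orientation)
color, intensity, orientation = (rng.random((h, w)) for _ in range(3))

def normalize(m):
    """Rescale a map to [0, 1]."""
    return (m - m.min()) / (m.max() - m.min() + 1e-12)

bottom_up = sum(normalize(m) for m in (color, intensity, orientation)) / 3

# top-down map: a cue predicting the target near location (2, 5)
top_down = np.zeros((h, w))
top_down[2, 5] = 1.0

# integration map: the cue multiplicatively boosts its predicted location
integration = bottom_up * (1.0 + 2.0 * top_down)
focus = np.unravel_index(np.argmax(integration), integration.shape)
```

The attended location is the maximum of the integration map, so the same scene yields different search trajectories as the cue (and hence the top-down map) varies.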
Semi-rational Models of Conditioning: The Case of Trial Order
2007
Cited by 7 (3 self)
Abstract: Bayesian treatments of animal conditioning start from a generative model that specifies precisely a set of assumptions about the structure of the learning task. Optimal rules for learning are direct mathematical consequences of these assumptions. In terms of Marr’s (1982) levels of analysis, the main task at the computational level