TD Models of Reward Predictive Responses in Dopamine Neurons
Cited by 21 (0 self)
This article focuses on recent modeling studies of dopamine neuron activity and their influence on behavior. Activity of midbrain dopamine neurons is phasically increased by stimuli that increase the animal's reward expectation and is decreased below baseline levels when the reward fails to occur. These characteristics resemble the reward prediction error signal of the temporal difference (TD) model, which is a model of reinforcement learning. Computational modeling studies show that such a dopamine-like reward prediction error can serve as a powerful teaching signal for learning with delayed reinforcement, in particular for learning of motor sequences.
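As a rough illustration of the reward prediction error mentioned in this abstract, the TD error can be written in the standard textbook form below. The symbols are assumptions taken from common reinforcement learning notation, not from the article itself: $V$ is the learned reward prediction, $r_t$ the reward at time $t$, $\gamma$ a discount factor, and $\alpha$ a learning rate.

```latex
\begin{align}
  \delta_t &= r_t + \gamma\, V(s_{t+1}) - V(s_t) \\
  V(s_t)   &\leftarrow V(s_t) + \alpha\, \delta_t
\end{align}
```

Under this reading, a stimulus that raises the reward prediction gives $\delta_t > 0$ (a phasic increase), and an omitted but predicted reward gives $\delta_t < 0$ (a dip below baseline).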
Solving the Distal Reward Problem through Linkage of STDP and Dopamine Signaling
Cerebral Cortex, 2007
Cited by 19 (0 self)
In Pavlovian and instrumental conditioning, reward typically comes seconds after reward-triggering actions, creating an explanatory conundrum known as the “distal reward problem”: How does the brain know what firing patterns of what neurons are responsible for the reward if 1) the patterns are no longer there when the reward arrives and 2) all neurons and synapses are active during the waiting period to the reward? Here, we show how the conundrum is resolved by a model network of cortical spiking neurons with spike-timing-dependent plasticity (STDP) modulated by dopamine (DA). Although STDP is triggered by nearly coincident firing patterns on a millisecond timescale, slow kinetics of subsequent synaptic plasticity is sensitive to changes in the extracellular DA concentration during the critical period of a few seconds. Random firings during the waiting period to the reward do not affect STDP and hence make the network insensitive to the ongoing activity. This is the key feature that distinguishes our approach from previous theoretical studies, which implicitly assume that the network be quiet during the waiting period or that the patterns be preserved until the reward arrives. This study emphasizes the importance of precise firing patterns in brain dynamics and suggests how a global diffusive reinforcement signal in the form of extracellular DA can selectively influence the right synapses at the right time.
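The mechanism described in this abstract can be sketched for a single synapse, assuming a slowly decaying eligibility trace `c` that is set by STDP events and a synaptic strength `s` that changes only while dopamine `d` is elevated. The function name, the time constants (`tau_c`, `tau_d`), the pulse size, and the simple Euler integration are illustrative assumptions, not the published model.

```python
def simulate_synapse(stdp_events, reward_times, t_end=5.0, dt=0.001,
                     tau_c=1.0, tau_d=0.2, da_pulse=0.5):
    """Integrate one synapse and return the net change in its strength.

    stdp_events:  list of (time_s, amount); amount > 0 for pre-before-post
                  pairings, amount < 0 for post-before-pre (hypothetical units).
    reward_times: times (s) at which extracellular dopamine jumps by da_pulse.
    All parameter values are assumptions chosen for illustration.
    """
    # Map event times to integration steps (one event per step at most).
    events = {int(round(t / dt)): amount for t, amount in stdp_events}
    rewards = {int(round(t / dt)) for t in reward_times}

    c = 0.0  # eligibility trace left behind by STDP
    d = 0.0  # extracellular dopamine above baseline
    s = 0.0  # accumulated change in synaptic strength
    for k in range(int(round(t_end / dt))):
        c += events.get(k, 0.0)   # STDP tags the synapse
        if k in rewards:
            d += da_pulse         # delayed reward releases dopamine
        s += c * d * dt           # strength changes only if tag AND dopamine
        c -= c / tau_c * dt       # tag decays over about a second
        d -= d / tau_d * dt       # dopamine is cleared quickly
    return s


if __name__ == "__main__":
    print("pairing, reward after 1.4 s:", simulate_synapse([(0.1, 1.0)], [1.5]))
    print("pairing, reward after 3.9 s:", simulate_synapse([(0.1, 1.0)], [4.0]))
    print("no pairing, reward only:   ", simulate_synapse([], [1.5]))
```

In this sketch the synapse is credited only when its tag is still present as dopamine arrives, which is one way to read how a global signal could act selectively on recently active synapses.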
Temporal sequence learning, prediction and control - a review of different models and their relation to biological mechanisms
Neural Computation, 2004
Cited by 17 (3 self)
In this article we compare methods for temporal sequence learning (TSL) across the disciplines machine-control, classical conditioning, neuronal models for TSL as well as spike-timing-dependent plasticity. This review will briefly introduce the most influential models and focus on two questions: 1) To what degree are reward-based (e.g. TD-learning) and correlation-based (Hebbian) learning related? and 2) How do the different models correspond to possibly underlying biological mechanisms of synaptic plasticity? We will first compare the different models in an open-loop condition, where behavioral feedback does not alter the learning. Here we observe that reward-based and correlation-based learning are indeed very similar. Machine-control is then used to introduce the problem of closed-loop control (e.g. “actor-critic architectures”). Here the problem of evaluative (“rewards”) versus nonevaluative (“correlations”) feedback from the environment will be discussed, showing that both learning approaches are fundamentally different in the closed-loop condition. In trying to answer the second question we will compare neuronal versions of the different learning architectures to the anatomy of the involved brain structures (basal-ganglia, thalamus and
Isotropic Sequence Order Learning
2003
Cited by 12 (8 self)
In this article, we present an isotropic unsupervised algorithm for temporal sequence learning. No special reward signal is used such that all inputs are completely isotropic. All input signals are bandpass filtered before converging onto a linear output neuron. All synaptic weights change according to the correlation of bandpass-filtered inputs with the derivative of the output. We investigate the algorithm in an open- and a closed-loop condition, the latter being defined by embedding the learning system into a behavioral feedback loop. In the open-loop condition, we find that the linear structure of the algorithm allows analytically calculating the shape of the weight change, which is strictly heterosynaptic and follows the shape of the weight change curves found in spike-time-dependent plasticity. Furthermore, we show that synaptic weights stabilize automatically when no more temporal differences exist between the inputs without additional normalizing measures. In the second part of this study, the algorithm is placed in an environment that leads to a closed sensorimotor loop. To this end, a robot is programmed with a prewired retraction reflex reaction in response to collisions. Through isotropic sequence order (ISO) learning, the robot achieves collision avoidance by learning the correlation between its early range-finder signals and the later occurring collision signal. Synaptic weights stabilize at the end of learning as theoretically predicted. Finally, we discuss the relation of ISO learning with other drive reinforcement models and with the commonly used temporal difference learning algorithm. This study is followed up by a mathematical analysis of the closed-loop situation in the companion article in this issue, “ISO Learning Approximates a Solution to the Inverse-Controller Problem in an Unsupervised Behavioral Paradigm” (pp. 865–884).
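The learning rule summarized in this abstract can be written compactly as follows. The notation is an assumption chosen for illustration: $x_j$ are the raw inputs, $h_j$ the bandpass filter kernels, $u_j$ the filtered inputs, $\rho_j$ the synaptic weights, $v$ the output of the linear neuron, and $\mu$ a small learning rate.

```latex
\begin{align}
  u_j(t) &= (h_j * x_j)(t)            && \text{bandpass-filtered input} \\
  v(t)   &= \sum_j \rho_j\, u_j(t)    && \text{linear output neuron} \\
  \frac{d\rho_j}{dt} &= \mu\, u_j(t)\, \frac{dv(t)}{dt}
                                      && \text{correlation with the output derivative}
\end{align}
```

Because every input enters the same rule and no input is singled out as a reward, the scheme is isotropic in the sense used by the abstract.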
Modeling embodied visual behaviors
ACM Trans. Appl. Percept., 2007
Cited by 9 (3 self)
To make progress in understanding human visuo-motor behavior, we will need to understand its basic components at an abstract level. One way to achieve such an understanding would be to create a model of a human that has a sufficient amount of complexity so as to be capable of generating such behaviors. Recent technological advances have been made that allow progress to be made in this direction. Graphics models that simulate extensive human capabilities can be used as platforms from which to develop synthetic models of visuo-motor behavior. Currently such models can capture only a small portion of a full behavioral repertoire, but for the behaviors that they do model, they can describe complete visuo-motor subsystems at a useful level of detail. The value in doing so is that the body’s elaborate visuo-motor structures greatly simplify the specification of the abstract behaviors that guide them. The net result is that, essentially, one is behaviors at each instant. This paper outlines one such model. A centerpiece of the model uses vision to aid the behavior that has the most to gain from taking environmental measurements. Preliminary tests of the model against human performance in realistic VR environments show that main features of the model show up in human behavior. Categories and Subject Descriptors: I.2.10 [Vision and Scene Understanding]: Perceptual reasoning
Dopamine Cells Respond to Predicted Events during Classical Conditioning: Evidence for Eligibility Traces in the Reward-Learning Network
2005
Cited by 7 (0 self)
Behavioral conditioning of cue–reward pairing results in a shift of midbrain dopamine (DA) cell activity from responding to the reward to responding to the predictive cue. However, the precise time course and mechanism underlying this shift remain unclear. Here, we report a combined single-unit recording and temporal difference (TD) modeling approach to this question. The data from recordings in conscious rats showed that DA cells retain responses to predicted reward after responses to conditioned cues have developed, at least early in training. This contrasts with previous TD models that predict a gradual stepwise shift in latency with responses to rewards lost before responses develop to the conditioned cue. By exploring the TD parameter space, we demonstrate that the persistent reward responses of DA cells during conditioning are only accurately replicated by a TD model with long-lasting eligibility traces (nonzero values for the parameter λ) and low learning rate (α). These physiological constraints for TD parameters suggest that eligibility traces and low per-trial rates of plastic modification may be essential features of neural circuits for reward learning in the brain. Such properties enable rapid but stable initiation of learning when the number of stimulus–reward pairings is limited, conferring significant adaptive advantages in real-world environments.
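The role of the parameters λ and α discussed in this abstract can be explored with a minimal TD(λ) sketch. It assumes a tapped-delay-line stimulus representation (one feature per time step after cue onset) and accumulating eligibility traces; the trial layout, constants, and function names are hypothetical choices for illustration and are not the model used in the study.

```python
import numpy as np

# Hypothetical trial layout: cue at step 5, reward at step 15, 20 steps total.
N_STEPS, T_CUE, T_REW, GAMMA = 20, 5, 15, 0.98


def features(t):
    """One-hot code for time elapsed since cue onset; all zeros before the cue."""
    x = np.zeros(N_STEPS - T_CUE)
    if T_CUE <= t < N_STEPS:
        x[t - T_CUE] = 1.0
    return x


def run_trial(w, alpha, lam):
    """Run one conditioning trial, updating w in place.

    Returns the TD error observed on arrival at each time step.
    """
    e = np.zeros_like(w)            # eligibility trace
    deltas = np.zeros(N_STEPS)
    for t in range(N_STEPS - 1):
        x, x_next = features(t), features(t + 1)
        r = 1.0 if t + 1 == T_REW else 0.0
        delta = r + GAMMA * (w @ x_next) - (w @ x)
        e = GAMMA * lam * e + x     # decay old traces, mark current state
        w += alpha * delta * e      # credit all recently visited states
        deltas[t + 1] = delta
    return deltas


def train(n_trials, alpha, lam):
    """Train from zero weights and return the TD errors of the last trial."""
    w = np.zeros(N_STEPS - T_CUE)
    deltas = np.zeros(N_STEPS)
    for _ in range(n_trials):
        deltas = run_trial(w, alpha, lam)
    return deltas


if __name__ == "__main__":
    for lam in (0.0, 0.9):
        d = train(5, alpha=0.05, lam=lam)
        print(f"lambda={lam}: cue error={d[T_CUE]:.3f}, "
              f"reward error={d[T_REW]:.3f}")
```

In this sketch, with λ = 0 the prediction moves back from the reward by one step per trial, so no cue response appears in the first few trials, whereas with a nonzero λ and a small α a cue response emerges while the reward response is still large, which is the qualitative pattern the abstract attributes to the recordings.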
A mechanism for error detection in speeded response time tasks
Journal of Experimental Psychology: General, 2005
Cited by 7 (5 self)
The concept of error detection plays a central role in theories of executive control. In this article, the authors present a mechanism that can rapidly detect errors in speeded response time tasks. This error monitor assigns values to the output of cognitive processes involved in stimulus categorization and response generation and detects errors by identifying states of the system associated with negative value. The mechanism is formalized in a computational model based on a recent theoretical framework for understanding error processing in humans (C. B. Holroyd & M. G. H. Coles, 2002). The model is used to simulate behavioral and event-related brain potential data in a speeded response time task, and the results of the simulation are compared with empirical data.

Frontal parts of the brain, including the prefrontal cortex (Luria, 1973; Stuss & Knight, 2002), the anterior cingulate cortex (Devinsky, Morrell, & Vogt, 1995; Posner & DiGirolamo, 1998), and their connections with the basal ganglia (L. L. Brown, Schneider, & Lidsky, 1997; Cummings, 1993), are thought to compose an executive system for cognitive control. The functions of this system are thought to include setting high-level goals, directing other
Actor-critic models of reinforcement learning in the basal ganglia: From natural to artificial rats
Adapt. Behav., 2005
The Psikharpax Project: Towards Building an Artificial Rat
Robotics and Autonomous Systems, 2005
Cited by 4 (3 self)
Drawing inspiration from biology, the Psikharpax project aims at endowing a robot with sensorimotor equipment and a neural control architecture that will afford some of the capacities of autonomy and adaptation that are exhibited by real rats. The paper summarizes the current state of achievement of the project. It successively describes the robot’s future sensors and actuators, and several biomimetic models of the anatomy and physiology of structures in the rat’s brain, such as the hippocampus and the basal ganglia, which have already been at work on various robots and which make navigation and action selection possible. Preliminary results on the implementation of learning mechanisms in these structures are also presented. Finally, the article discusses the potential benefits that a biologically-inspired approach affords to traditional autonomous robotics.

