Results 1 - 3 of 3
An imperfect dopaminergic error signal can drive temporal-difference learning
- PLoS Comput Biol
Abstract - Cited by 7 (1 self)
An open problem in the field of computational neuroscience is how to link synaptic plasticity to system-level learning. A promising framework in this context is temporal-difference (TD) learning. Experimental evidence that supports the hypothesis that the mammalian brain performs temporal-difference learning includes the resemblance of the phasic activity of the midbrain dopaminergic neurons to the TD error and the discovery that cortico-striatal synaptic plasticity is modulated by dopamine. However, as the phasic dopaminergic signal does not reproduce all the properties of the theoretical TD error, it is unclear whether it is capable of driving behavior adaptation in complex tasks. Here, we present a spiking temporal-difference learning model based on the actor-critic architecture. The model dynamically generates a dopaminergic signal with realistic firing rates and exploits this signal to modulate the plasticity of synapses as a third factor. The predictions of our proposed plasticity dynamics are in good agreement with experimental results with respect to dopamine, pre- and post-synaptic activity. An analytical mapping from the parameters of our proposed plasticity dynamics to those of the classical discrete-time TD algorithm reveals that the biological constraints of the dopaminergic signal entail a modified TD algorithm with self-adapting learning parameters and an adapting offset. We show that the neuronal network is able to learn a task with sparse positive rewards as fast as the corresponding classical discrete-time TD algorithm. However, the performance of the neuronal network is impaired with respect to the traditional algorithm on a task with both positive and
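For orientation, the classical discrete-time TD algorithm that the abstract above uses as its baseline can be sketched as tabular TD(0) on a toy task with a sparse positive reward. This is only an illustrative sketch, not the spiking actor-critic model of the paper: the chain task, the function name `td0_chain`, and the values of `gamma`, `alpha`, and `episodes` are all assumptions chosen for the example.

```python
def td0_chain(n_states=5, gamma=0.9, alpha=0.1, episodes=2000):
    """Tabular TD(0) value estimation on a deterministic chain (hypothetical toy task).

    The agent always steps one state to the right. Only the final
    transition, into the terminal state, yields a reward of 1.
    """
    # One extra entry for the terminal state, whose value stays at 0.
    values = [0.0] * (n_states + 1)
    for _ in range(episodes):
        for s in range(n_states):
            s_next = s + 1
            reward = 1.0 if s_next == n_states else 0.0
            # TD error: reward plus discounted next-state value minus current value.
            td_error = reward + gamma * values[s_next] - values[s]
            # Move the current estimate a fraction alpha toward the TD target.
            values[s] += alpha * td_error
    return values[:n_states]


if __name__ == "__main__":
    for state, value in enumerate(td0_chain()):
        print(f"V({state}) = {value:.4f}")
```

In this sketch the estimate for each state approaches `gamma` raised to the number of steps remaining before the rewarded transition, so states closer to the reward end up with higher values.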
Anticipation and Choice Heuristics in the Dynamic Consumption of Pain Relief
Abstract
Humans frequently need to allocate resources across multiple time-steps. Economic theory proposes that subjects do so according to a stable set of intertemporal preferences, but the computational demands of such decisions encourage the use of formally less competent heuristics. Few empirical studies have examined dynamic resource allocation decisions systematically. Here we conducted an experiment involving the dynamic consumption over approximately 15 minutes of a limited budget of relief from moderately painful stimuli. We had previously elicited the participants' time preferences for the same painful stimuli in one-off choices, allowing us to assess self-consistency. Participants exhibited three characteristic behaviors: saving relief until the end, spreading relief across time, and early spending, of which the last was markedly less prominent. The likelihood that behavior was heuristic rather than normative is suggested by the weak correspondence between one-off and dynamic choices. We show that the consumption choices are consistent with a combination of simple heuristics involving early spending, spreading or saving of relief until the end, with subjects predominantly exhibiting the last two.