Results 1 - 10 of 26
Reinforcement learning in the brain
Cited by 48 (6 self)
Abstract: A wealth of research focuses on the decision-making processes that animals and humans employ when selecting actions in the face of reward and punishment. Initially such work stemmed from psychological investigations of conditioned behavior, and explanations of these in terms of computational models. Increasingly, analysis at the computational level has drawn on ideas from reinforcement learning, which provide a normative framework within which decision-making can be analyzed. More recently, the fruits of these extensive lines of research have made contact with investigations into the neural basis of decision making. Converging evidence now links reinforcement learning to specific neural substrates, assigning them precise computational roles. Specifically, electrophysiological recordings in behaving animals and functional imaging of human decision-making have revealed in the brain the existence of a key reinforcement learning signal, the temporal difference reward prediction error. Here, we first introduce the formal reinforcement learning framework. We then review the multiple lines of evidence linking reinforcement learning to the function of dopaminergic neurons in the mammalian midbrain and
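The temporal-difference reward prediction error that this abstract describes can be sketched in a few lines. This is an illustrative TD(0) toy, not a model from the paper: the three-state chain task, the learning rate, and the discount factor are all arbitrary choices.

```python
# Minimal TD(0) value learning on a toy chain: s0 -> s1 -> s2,
# with reward 1 on entering the terminal state s2.
# alpha (learning rate) and gamma (discount) are illustrative values.
alpha, gamma = 0.1, 0.9
V = {0: 0.0, 1: 0.0, 2: 0.0}

for episode in range(500):
    for s, s_next, r in [(0, 1, 0.0), (1, 2, 1.0)]:
        target = r + gamma * V[s_next]   # bootstrap from the successor state
        delta = target - V[s]            # temporal-difference prediction error
        V[s] += alpha * delta            # move the estimate toward the target

# V[1] converges toward 1.0 and V[0] toward gamma * 1.0 = 0.9
```

The quantity `delta` is the reward prediction error: it is positive when outcomes are better than predicted and negative when they are worse, which is the signal the reviewed work links to phasic dopamine activity.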
A mechanism for error detection in speeded response time tasks
- Journal of Experimental Psychology: General
, 2005
Cited by 37 (11 self)
The concept of error detection plays a central role in theories of executive control. In this article, the authors present a mechanism that can rapidly detect errors in speeded response time tasks. This error monitor assigns values to the output of cognitive processes involved in stimulus categorization and response generation and detects errors by identifying states of the system associated with negative value. The mechanism is formalized in a computational model based on a recent theoretical framework for understanding error processing in humans (C. B. Holroyd & M. G. H. Coles, 2002). The model is used to simulate behavioral and event-related brain potential data in a speeded response time task, and the results of the simulation are compared with empirical data.

Frontal parts of the brain, including the prefrontal cortex (Luria, 1973; Stuss & Knight, 2002), the anterior cingulate cortex (Devinsky, Morrell, & Vogt, 1995; Posner & DiGirolamo, 1998), and their connections with the basal ganglia (L. L. Brown, Schneider, & Lidsky, 1997; Cummings, 1993), are thought to compose an executive system for cognitive control. The functions of this system are thought to include setting high-level goals, directing other
Neural systems implicated in delayed and probabilistic reinforcement
, 2006
Cited by 35 (0 self)
This review considers the theoretical problems facing agents that must learn and choose on the basis of reward or reinforcement that is uncertain or delayed, in implicit or procedural (stimulus–response) representational systems and in explicit or declarative (action–outcome–value) representational systems. Individual differences in sensitivity to delays and uncertainty may contribute to impulsivity and risk taking. Learning and choice with delayed and uncertain reinforcement are related but in some cases dissociable processes. The contributions to delay and uncertainty discounting of neuromodulators including serotonin, dopamine, and noradrenaline, and of specific neural structures including the nucleus accumbens core, nucleus accumbens shell, orbitofrontal cortex, basolateral amygdala, anterior cingulate cortex, medial prefrontal (prelimbic/infralimbic) cortex, insula, subthalamic nucleus, and hippocampus are examined.
Short-term gains, long-term pains: How cues about state aid learning in dynamic environments
, 2009
Cited by 31 (6 self)
Successful investors seeking returns, animals foraging for food, and pilots controlling aircraft all must take into account how their current decisions will impact their future standing. One challenge facing decision makers is that options that appear attractive in the short-term may not turn out best in the long run. In this paper, we explore human learning in a dynamic decision-making task which places short- and long-term rewards in conflict. Our goal in these studies was to evaluate how people’s mental representation of a task affects their ability to discover an optimal decision strategy. We find that perceptual cues that readily align with the underlying state of the task environment help people overcome the impulsive appeal of short-term rewards. Our experimental manipulations, predictions, and analyses are motivated by current work in reinforcement learning which details how learners value delayed outcomes in sequential tasks and the importance that “state” identification plays in effective learning.
Timing and Partial Observability in the Dopamine System
- In Advances in Neural Information Processing Systems 15
, 2003
Cited by 10 (2 self)
According to a series of influential models, dopamine (DA) neurons signal reward prediction error using a temporal-difference (TD) algorithm.
Anxiety, cortisol and attachment predict plasma oxytocin
- Psychophysiology
, 2007
Cited by 9 (2 self)
Oxytocin and attachment seem to interact in suppressing subjective anxiety and physiological stress responses. In this study we investigated the relationships between individual differences in trait attachment scores, state and trait anxiety, plasma cortisol, and plasma oxytocin levels in healthy premenopausal women. Attachment proved to be a strong positive predictor of oxytocin levels, which were also positively predicted by cortisol levels and state and trait anxiety. The relationship between oxytocin and state anxiety was modulated by attachment scores. The present results may help interpret seeming contradictions in the recent literature on oxytocin, attachment, and stress in humans, by suggesting that context effects determine which relationships are found in different studies: anxiolytic effects of oxytocin in a context of partner support versus stress- or cortisol-induced oxytocin responses in a context of distress or increased cortisol.
Hyperbolically discounted temporal difference learning
- Neural Computation
, 2010
Cited by 2 (1 self)
Hyperbolic discounting of future outcomes is widely observed to underlie choice behavior in animals. Additionally, recent studies (Kobayashi & Schultz, 2008) have reported that hyperbolic discounting is observed even in neural systems underlying choice. However, the most prevalent models of temporal discounting, such as temporal difference learning, assume that future outcomes are discounted exponentially. Exponential discounting has been preferred largely because it can be expressed recursively, whereas hyperbolic discounting has heretofore been thought not to have a recursive definition. In this letter, we define a learning algorithm, hyperbolically discounted temporal difference (HDTD) learning, which constitutes a recursive formulation of the hyperbolic model.
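The abstract's central contrast, that exponential discounting admits a recursive (one-step) definition while hyperbolic discounting does not factor this way, can be checked directly. This is a minimal numerical sketch, not the paper's HDTD algorithm; `gamma` and `k` are arbitrary illustrative parameters.

```python
# Why exponential discounting is recursive while hyperbolic is not.
gamma, k = 0.9, 1.0

def exponential(t):
    # Exponential discount factor at delay t
    return gamma ** t

def hyperbolic(t):
    # Hyperbolic discount factor at delay t
    return 1.0 / (1.0 + k * t)

# Exponential: gamma^(t+1) == gamma * gamma^t for every t.
# This constant one-step factor is what lets TD learning fold
# the entire future into a single bootstrapped value.
assert abs(exponential(5) - gamma * exponential(4)) < 1e-12

# Hyperbolic: the effective one-step discount depends on the delay,
# so no single constant factor relates successive time steps.
step0 = hyperbolic(1) / hyperbolic(0)   # 1/2
step4 = hyperbolic(5) / hyperbolic(4)   # 5/6
assert step0 != step4
```

The per-step hyperbolic discount grows toward 1 as the delay increases, which is exactly the delay-dependence that blocks the naive recursive (Bellman-style) formulation the letter works around.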
Psychological and Neuroscientific Connections with Reinforcement Learning (preprint)
Cited by 2 (2 self)
The field of Reinforcement Learning (RL) was inspired in large part by research in animal behavior and psychology. Early research showed that animals can, through trial and error, learn to execute behavior that would eventually lead to some (presumably satisfactory) outcome, and decades of subsequent research were (and still are) aimed at discovering the mechanisms of this learning process. This chapter describes behavioral and theoretical research in animal learning that is directly related to fundamental concepts used in RL. It then describes neuroscientific research that suggests that animals and many RL algorithms use very similar learning mechanisms. Along the way, I highlight ways that research in computer science contributes to and can be inspired by research in psychology and neuroscience. Please note: This is a preprint of a chapter for the book Reinforcement Learning: State of the Art, edited by Marco Wiering and Martijn van Otterlo. This document does not follow the format used in the book, but the text should be pretty much the same. If you cite this chapter, below is a bibtex you can use. Thanks.
On the Role of Dopamine in Cognitive Vision
Cited by 1 (1 self)
Although dopamine is one of the most studied neurotransmitters in the brain, its exact function is still unclear. This short review focuses on its role at different levels of cognitive vision: visual processing, visual attention, and working memory. Dopamine can influence cognitive vision either through direct modulation of visual cells or through gating of basal ganglia functioning. Even if its classically assigned role is to signal reward prediction error, we review evidence that dopamine is also involved in novelty detection and attention shifting, and discuss the possible implications for computational modeling.
Hyperbolically Discounted Temporal Difference Learning
Hyperbolic discounting of future outcomes is widely observed to underlie choice behavior in animals. Additionally, recent studies (Kobayashi & Schultz, 2008) have reported that hyperbolic discounting is observed even in neural systems underlying choice. However, the most prevalent models of temporal discounting, such as temporal difference learning, assume that future outcomes are discounted exponentially. Exponential discounting has been preferred largely because it can be expressed recursively, whereas hyperbolic discounting has heretofore been thought not to have a recursive definition. In this letter, we define a learning algorithm, hyperbolically discounted temporal difference (HDTD) learning, which constitutes a recursive formulation of the hyperbolic model.