Motivated Reinforcement Learning
, 2001
Cited by 222 (8 self)
The standard reinforcement learning view of the involvement of neuromodulatory systems in instrumental conditioning includes a rather straightforward conception of motivation as prediction of the sum of future reward. Competition between actions is based on the motivating characteristics of their consequent states in this sense. Substantial, careful experiments into the neurobiology and psychology of motivation, reviewed in Dickinson & Balleine, show that this view is incomplete. In many cases, animals are faced with the choice not between many different actions at a given state, but rather whether a single response is worth executing at all. Evidence suggests that the motivational process underlying this choice has different psychological and neural properties from that underlying action choice. We describe and model these motivational systems, and consider the way they interact.
Relations between Pavlovian-instrumental transfer and reinforcer devaluation
- Journal of Experimental Psychology: Animal Behavior Processes
, 2004
Cited by 9 (0 self)
Relations between posttraining reinforcer devaluation and Pavlovian-instrumental transfer were examined in 2 experiments. When a single reinforcer was used, extended training of the instrumental response increased transfer but reduced devaluation effects. When multiple instrumental reinforcers were used, both reinforcer-specific transfer and devaluation effects were less influenced by the amount of instrumental training. Finally, although reinforcer devaluation decreased both Pavlovian conditioned responses and baseline instrumental responding, it had no effect on either single-reinforcer or reinforcer-specific transfer. These results indicate that transfer and reinforcer devaluation can reflect different aspects of associative learning and that the nature of associative learning can be influenced by parameters such as the amount of training and the use of multiple reinforcers. Modern views of Pavlovian conditioning stress the variety of consequences of arranging relations among environmental events, including the formation of associations between memorial representations of those events and the conditioning of overt motor behavior and emotional responses (e.g., Dickinson & Balleine, 1995, 2001; Holland, 1997; Rescorla, 1988). Historically, Pavlovian
Reinforcement learning in the brain
Cited by 8 (4 self)
Abstract: A wealth of research focuses on the decision-making processes that animals and humans employ when selecting actions in the face of reward and punishment. Initially such work stemmed from psychological investigations of conditioned behavior, and explanations of these in terms of computational models. Increasingly, analysis at the computational level has drawn on ideas from reinforcement learning, which provide a normative framework within which decision-making can be analyzed. More recently, the fruits of these extensive lines of research have made contact with investigations into the neural basis of decision making. Converging evidence now links reinforcement learning to specific neural substrates, assigning them precise computational roles. Specifically, electrophysiological recordings in behaving animals and functional imaging of human decision-making have revealed in the brain the existence of a key reinforcement learning signal, the temporal difference reward prediction error. Here, we first introduce the formal reinforcement learning framework. We then review the multiple lines of evidence linking reinforcement learning to the function of dopaminergic neurons in the mammalian midbrain and
The Control of Instrumental Action Following Outcome Devaluation in Young Children Aged Between 1 and 4 Years
Cited by 2 (1 self)
To determine the role of action–outcome learning in the control of young children’s instrumental behavior, the authors trained 18- to 48-month-olds to manipulate visual icons on a touch-sensitive display to obtain different types of video clips as outcomes. Subsequently, one of the outcomes was devalued by repeated exposure, and children’s propensity to perform the trained actions was tested in extinction. On test, children with a mean age greater than 2.5 years performed the action trained with the devalued outcome less than those trained with the still-valued outcome, thereby demonstrating that their actions were mediated by action–outcome learning. By contrast, the instrumental responses of younger children (mean age approximately 2 years) were resistant to outcome devaluation and may have been elicited directly by the icons associated with each response, rather than mediated by a specific action–outcome expectation.
THE COGNITIVE NEUROSCIENCE OF MOTIVATION AND LEARNING
Recent advances in the cognitive neuroscience of motivation and learning have demonstrated a critical role for midbrain dopamine and its targets in reward prediction. Converging evidence suggests that midbrain dopamine neurons signal a reward prediction error, allowing an organism to predict, and to act to increase, the probability of reward in the future. This view has been highly successful in accounting for a wide range of reinforcement learning phenomena in animals and humans. However, while current theories of midbrain dopamine provide a good account of behavior known as habitual or stimulus-response learning, we review evidence suggesting that other neural and cognitive processes are involved in motivated, goal-directed behavior. We discuss how this distinction resembles the classic distinction in the cognitive neuroscience of memory between nondeclarative and declarative memory systems, and discuss common themes between mnemonic and motivational functions. Finally, we present data demonstrating links between mnemonic processes and reinforcement learning.
The past decade has seen a growth of interest in the cognitive neuroscience of motivation and reward. This is largely rooted in a series of neurophysiology studies of the response properties of dopamine-containing midbrain neurons in primates receiving reward (Schultz, 1998). The responses of these neurons were subsequently interpreted in terms of reinforcement learning, a computational framework for trial and error learning from reward (Houk, Adams, & Barto, 1995; Montague, Dayan, & Sejnowski, 1996; Schultz, Dayan, & Montague, 1997). Together with
Both authors contributed equally to this article. We are most grateful to Shanti Shanker for assistance with data collection, to Anthony Wagner for generously allowing us to conduct the experiment reported here in his laboratory, and to Alison Adcock, Lila Davachi, Peter Dayan, Mark
The Effects of Motivation on Extensively Trained Behavior
How motivation influences habitual behavior is unclear, since only motivational decrements have been considered. Here, in two experiments, we investigated the effects of motivational up-shifts and side-shifts on instrumental behavior which was extensively trained using a protocol known to promote habitual responding. In Experiment 1, hungry rats were trained to lever press for sucrose solution. Following a side-shift from food- to water-deprivation, rats showed less lever pressing in extinction compared to non-shifted controls, although a subsequent consumption test found no differences in sucrose consumption between thirsty and hungry groups. In Experiment 2, undeprived rats were trained to lever press for either sucrose solution or sucrose pellets. A post-training up-shift from satiety to water-deprivation did not affect lever pressing in extinction, regardless of outcome identity, although free consumption of sucrose solution, but not of pellets, was enhanced. Together, these results suggest that motivation affects extensively-trained instrumental behavior through a combination of general drive and generalization decrement, but not through determining the value of the outcome. This is in stark contrast to the known effects of motivational states on moderately-trained (goal-directed) behavior. The absence of an outcome-specific effect is in line with theories arguing for stimulus-response rather than response-outcome control of habitual behavior.
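The interaction of cognitive and stimulus–response processes in the control of behaviour
Keywords: Motivation, Fixed-action pattern, Learning, Stimulus–response, Cognition, Hierarchy, Stereotypies, Displacement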
TOATES, F. The interaction of cognitive and stimulus–response processes in the control of behaviour. NEUROSCI BIOBEHAV REV 22(1) 59–83, 1998.—It is argued that both stimulus–response (S–R) and cognitive theories of learning and behaviour capture part of the truth, in that these terms involve two different types of process that are jointly responsible for the control of behaviour. The proposal that both processes coexist is investigated in the context of the production of behaviour. Evidence is presented to show that the weighting attached to S–R and cognitive processes can change as a function of (a) development; (b) experience; and (c) pathology. A model is proposed which is designed to sketch some ideas on how S–R and cognitive processes jointly determine behaviour, and it is related to the notion of behavioural hierarchy. It is argued that the model can help to develop a synthesis between psychology, ethology and
Associative theories of goal-directed behaviour: a case for animal–human translational models
, 2009
Associative accounts of goal-directed action, developed in the fields of human ideomotor action and that of animal learning, can capture cognitive belief-desire psychology of human decision-making. Whereas outcome-response accounts can account for the fact that the thought of a goal can call to mind the action that has previously procured this goal, response-outcome accounts capture decision-making processes that start out with the consideration of possible response alternatives followed only in the second instance by evaluation of their consequences. We argue that while the outcome-response mechanism plays a crucial role in response priming effects, the response-outcome mechanism is particularly important for action selection on the basis of current needs and desires. We therefore develop an integrative account that encapsulates these two routes of action selection within the framework of the associative cybernetic model. This model has the additional benefit of providing mechanisms for the incentive modulation of goal-directed action and for the development of behavioural autonomy, and therefore provides a promising account of the multi-faceted process of animal as well as human instrumental decision-making.
Model-Based Influences on Humans’ Choices and Striatal Prediction Errors
- Neuron
The mesostriatal dopamine system is prominently implicated in model-free reinforcement learning, with fMRI BOLD signals in ventral striatum notably covarying with model-free prediction errors. However, latent learning and devaluation studies show that behavior also shows hallmarks of model-based planning, and the interaction between model-based and model-free values, prediction errors, and preferences is underexplored. We designed a multistep decision task in which model-based and model-free influences on human choice behavior could be distinguished. By showing that choices reflected both influences we could then test the purity of the ventral striatal BOLD signal as a model-free report. Contrary to expectations, the signal reflected both model-free and model-based predictions in proportions matching those that best explained choice behavior. These results challenge the notion of a separate model-free learner and suggest a more integrated computational architecture for high-level human decision-making.
In Press at Journal of Experimental Psychology: General
Recent computational theories of decision making in humans and animals have portrayed two systems locked in a battle for control of behavior. One system, variously termed “model-free” or “habitual”, favors actions that have previously led to reward, while a second called the “model-based” or “goal-directed” system favors actions that causally lead to reward according to the agent’s internal model of the environment. Some evidence suggests that control can be shifted between these systems using neural or behavioral manipulations, but other evidence suggests that the systems are more intertwined than a competitive account would imply. In four behavioral experiments, using a retrospective revaluation design and a cognitive load manipulation, we show that human decisions are more consistent with a cooperative architecture in which the model-free system controls behavior, while the model-based system trains the model-free system by replaying and simulating experience.

