Behavioral theories and the neurophysiology of reward (2006)
Venue: Annu. Rev. Psychol.
Citations: 186 (0 self)
BibTeX
@ARTICLE{Schultz06behavioraltheories,
  author  = {Wolfram Schultz},
  title   = {Behavioral theories and the neurophysiology of reward},
  journal = {Annu. Rev. Psychol.},
  year    = {2006},
  volume  = {57},
  pages   = {87--115}
}
Abstract
The functions of rewards are based primarily on their effects on behavior and are less directly governed by the physics and chemistry of input events, as in sensory systems. Therefore, the investigation of neural mechanisms underlying reward functions requires behavioral theories that can conceptualize the different effects of rewards on behavior. The scientific investigation of behavioral processes by animal learning theory and economic utility theory has produced a theoretical framework that can help to elucidate the neural correlates for reward functions in learning, goal-directed approach behavior, and decision making under uncertainty. Individual neurons can be studied in the reward systems of the brain, including dopamine neurons, orbitofrontal cortex, and striatum. The neural activity can be related to basic theoretical terms of reward and uncertainty, such as contiguity, contingency, prediction error, magnitude, probability, expected value, and variance.

INTRODUCTION

How can we understand the common denominator of Pavlov's salivating dogs, an ale named Hobgoblin, a market in southern France, and the bargaining for lock access on the Mississippi River? Pavlov's dogs were presented with pieces of delicious sausage that undoubtedly made them salivate. We know that the same animal will also salivate when it hears a bell that has repeatedly sounded a few seconds before the sausage appears, as if the bell induced the well-known, pleasant anticipation of the desired sausage. Changing the scenery slightly, imagine you are in Cambridge, walk down Mill Lane, and unfailingly end up in the Mill pub by the river Cam.
The known attraction inducing the pleasant anticipation is a pint of Hobgoblin. Hobgoblin's provocative ad reads something like "What's the matter Lager boy, afraid you might taste something?" and refers to a full-bodied, dark ale whose taste alone is a reward. Changing the scenery again, you are in the middle of a Saturday morning market in a small town in southern France and run into a nicely arranged stand of rosé and red wines. Knowing the presumably delicious contents of the differently priced bottles to varying degrees, you need to make a decision about what to get for lunch. You can do a numerical calculation and weigh the price of each bottle by the probability that its contents will please your taste, but chances are that a more automatic decision mechanism kicks in that is based on anticipation and will tell you quite quickly what to choose. However, you cannot use the same simple emotional judgment when you are in the shoes of an economist trying to optimize the access to the locks on the Mississippi River. The task is to find a pricing structure that assures the most efficient and uninterrupted use of the infrastructure over a 24-hour day, by avoiding long queues during prime daytime hours and inactive periods during the wee hours of the night. A proper pricing structure known in advance to the captains of the barges will shape their decisions to enter the locks at a moment that is economically most appropriate for the whole journey. The common denominator in these tasks appears to relate to the anticipation of outcomes of behavior in situations with varying degrees of uncertainty: the merely automatic salivation of a dog without much alternative, the choice of sophisticated but partly unknown liquids, or the well-calculated decision of a barge captain on how to get the most out of his money and time. 
The performance in these tasks is managed by the brain, which assesses the values and uncertainties of predictable outcomes (sausage, ale, wine, lock pricing, and access to resources) and directs the individuals' decisions toward the current optimum. This review describes some of the knowledge on brain mechanisms related to rewarding outcomes, without attempting to provide a complete account of all the studies done. We focus on the activity of single neurons studied by neurophysiological techniques in behaving animals, in particular monkeys, and emphasize the formative role of behavioral theories, such as animal learning theory and microeconomic utility theory, on the understanding of these brain mechanisms. Given the space limits, and given that neurophysiological studies based on game theory (Barraclough et al. 2004) are only just beginning, that line of work is not covered here.

GENERAL IDEAS ON REWARD FUNCTION, AND A CALL FOR THEORY

Homer's Odysseus proclaims, "Whatever my distress may be, I would ask you now to let me eat. There is nothing more devoid of shame than the accursed belly; it thrusts itself upon a man's mind in spite of his afflictions. . .my heart is sad but my belly keeps urging me to have food and drink. . .it says imperiously: 'eat and be filled'." (The Odyssey, Book VII, 800 BC). Despite these suggestive words, Homer's description hardly fits the commonsense perceptions of reward, which largely belong to one of two categories. People often consider a reward as a particular object or event that one receives for having done something well. You succeed in an endeavor, and you receive your reward. This reward function could be most easily accommodated within the framework of instrumental conditioning, according to which the reward serves as a positive reinforcer of a behavioral act.
The second common perception of reward relates to subjective feelings of liking and pleasure. You do something again because it produced a pleasant outcome before. We refer to this as the hedonic function of rewards. The following descriptions will show that both of these perceptions of reward fall well short of providing a complete and coherent description of reward functions. One of the earliest scientifically driven definitions of reward function characterizes rewards purely by their effects on behavior and does not consider relevant what the dog feels (notion 2). Yet we will see that this definition is a key to neurobiological studies.

Reward objects for animals are primarily vegetative in nature, such as different foodstuffs and liquids with various tastes. These rewards are necessary for survival, their motivational value can be determined by controlled access, and they can be delivered in quantifiable amounts in laboratory situations. The other main vegetative reward, sex, is impossible to deliver in neurophysiological laboratory situations requiring hundreds of daily trials. Animals are also sensitive to other, nonvegetative rewards, such as touch to the skin or fur and the presentation of novel objects and situations eliciting exploratory responses, but these again are difficult to parameterize for laboratory situations. Humans use a wide range of nonvegetative rewards, such as money, challenge, acclaim, visual and acoustic beauty, power, security, and many others, but these are not considered here, as this review deals with neural mechanisms in animals. An issue with vegetative rewards is the precise definition of the rewarding effect. Is it the seeing of an apple, its taste on the tongue, the swallowing of a bite of it, the feeling of it going down the throat, or the rise in blood sugar subsequent to its digestion that makes it a reward and has one come back for more?
Which of these events constitutes the primary rewarding effect, and do different objects draw their rewarding effects from different events (Wise 2002)? In some cases, the reward may be the taste experienced when an object activates the gustatory receptors, as with saccharin, which has no nutritional effects but increases behavioral reactions. The ultimate rewarding effect of many nutrient objects may be the specific influence on vegetative parameters, such as electrolyte, glucose, and amino acid concentrations in plasma and brain. This would explain why animals avoid foods that lack such nutrients as essential amino acids.

Although these theories provide important insights into reward function, they tend to neglect the fact that individuals usually operate in a world with limited nutritional and mating resources, and that most resources occur with different degrees of uncertainty. The animal in the wild is not certain whether it will encounter a particular fruit or prey object at a particular moment, nor is the restaurant goer certain that her preferred chef will cook that night. Making the uncertainty of outcomes tractable was the main motive that led Blaise Pascal to develop probability theory around 1650 (see Glimcher 2003 for details). He soon realized that humans make decisions by weighing the potential outcomes by their associated probabilities and then go for the largest result. Or, mathematically speaking, they sum the products of magnitude and probability of all potential outcomes of each option and then choose the option with the highest expected value. Nearly one hundred years later, Bernoulli (1738) discovered that the utility of outcomes for decision making does not increase linearly with magnitude but frequently follows a concave function, which marks the beginning of microeconomic decision theory.
The theory provides quantifiable assessments of outcomes under uncertainty and has gone a long way toward explaining human and animal decision making, even though more recent data cast doubt on its logic in some decision situations.

A Call for Behavioral Theory

Primary sensory systems have dedicated physical and chemical receptors that translate environmental energy and information into neural language. Thus, the functions of primary sensory systems are governed by the laws of mechanics, optics, acoustics, and receptor binding. By contrast, there are no dedicated receptors for reward, and the information enters the brain through the mechanical, gustatory, visual, and auditory receptors of the sensory systems. The functions of rewards cannot be derived entirely from the physics and chemistry of input events but are based primarily on behavioral effects, and the investigation of reward functions requires behavioral theories that can conceptualize the different effects of rewards on behavior. Thus, the exploration of neural reward mechanisms should not be based primarily on the physics and chemistry of reward objects but on specific behavioral theories that define reward functions. Animal learning theory and microeconomics are two prominent examples of such behavioral theories and constitute the basis for this review.

REWARD FUNCTIONS DEFINED BY ANIMAL LEARNING THEORY

This section will combine some of the central tenets of animal learning theories in an attempt to define a coherent framework for the investigation of neural reward mechanisms. The framework is based on the description of observable behavior and superficially resembles the behaviorist approach, although mental states of representation and prediction are essential. Dropping the issue of subjective feelings of pleasure will allow us to do objective behavioral measurements in controlled neurophysiological experiments on animals.
To induce subjective feelings of pleasure and positive emotion is a key function of rewards, although it is unclear whether the pleasure itself has a reinforcing, causal effect for behavior (i.e., I feel good because of the outcome I got and therefore will do again what produced the pleasant outcome) or is simply an epiphenomenon (i.e., my behavior gets reinforced and, in addition, I feel good because of the outcome).

Learning

Rewards induce changes in observable behavior and serve as positive reinforcers by increasing the frequency of the behavior that results in reward. In Pavlovian, or classical, conditioning, the outcome follows the conditioned stimulus (CS) irrespective of any behavioral reaction, and repeated pairing of stimuli with outcomes leads to a representation of the outcome that is evoked by the stimulus and elicits the behavioral reaction. Three factors govern conditioning, namely contiguity, contingency, and prediction error. Contiguity refers to the requirement of near simultaneity of stimulus and reward; contingency requires that the reward occurs more frequently in the presence of the stimulus than in its absence. The crucial role of prediction error is derived from Kamin's (1969) blocking effect, which postulates that a reward that is fully predicted does not contribute to learning, even when it occurs in a contiguous and contingent manner. This is conceptualized in the associative learning rules. Further learning rules relate the capacity to learn (associability) in certain situations to the degree of attention evoked by the CS or reward.

Approach Behavior

Rewards elicit two forms of behavioral reactions, approach and consumption. This is because the objects are labeled with appetitive value through innate mechanisms (primary rewards) or, in most cases, classical or instrumental conditioning, after which these objects constitute, strictly speaking, conditioned reinforcers (Wise 2002). Nutritional rewards can derive their value from hunger and thirst states, and satiation of the animal reduces the reward value and consequently the behavioral reactions.
Conditioned, reward-predicting stimuli also induce preparatory or approach behavior toward the reward. In Pavlovian conditioning, subjects automatically show nonconsummatory behavioral reactions that would otherwise occur after the primary reward and that increase the chance of consuming the reward, as if a part of the behavioral response had been transferred from the primary reward to the CS (Pavlovian response transfer). In instrumental conditioning, a reward can become a goal for instrumental behavior if two conditions are met. The goal needs to be represented at the time the behavior is being prepared and executed. This representation should contain a prediction of the future reward together with the contingency that associates the behavioral action to the reward.

Motivational Valence

Punishers have opposite valence to rewards, induce withdrawal behavior, and act as negative reinforcers by increasing the behavior that results in decreasing the aversive outcome. Avoidance can be passive, when subjects increasingly refrain from doing something that is associated with a punisher (don't do it); active avoidance involves increasing an instrumental response that is likely to reduce the impact of a punisher (get away from it). Punishers induce negative emotional states of anger, fear, and panic.

NEUROPHYSIOLOGY OF REWARD BASED ON ANIMAL LEARNING THEORY

Primary Reward

Neurons responding to liquid or food rewards are found in a number of brain structures, such as orbitofrontal, premotor, and prefrontal cortex, striatum, amygdala, and dopamine neurons (Amador et al. 2000, Apicella et al. 1991).

Contiguity

Procedures involving Pavlovian conditioning provide simple paradigms for learning and allow the experimenter to test the basic requirements of contiguity, contingency, and prediction error. Contiguity can be tested by presenting a reward 1.5-2.0 seconds after an untrained, arbitrary visual or auditory stimulus for several trials.
A dopamine neuron that responds initially to a liquid or food reward acquires a response to the CS after some tens of paired CS-reward trials.

Contingency

The contingency requirement postulates that, in order to be involved in reward prediction, neurons should discriminate between three kinds of stimuli, namely reward-predicting CSs (conditioned exciters), after which reward occurs more frequently than with no CS; conditioned inhibitors, after which reward occurs less frequently; and neutral stimuli that are not associated with changes in reward frequency. Dopamine neurons respond to reward-predicting CSs with phasic activations, respond to conditioned inhibitors at most by small activations, and hardly respond to neutral stimuli when response generalization is excluded. Further tests assess the specificity of information contained in CS responses. In the typical behavioral tasks used in monkey experiments, the CS may contain several different stimulus components, namely spatial position; visual object features such as color, form, and spatial frequency; and motivational features such as reward prediction. It would be necessary to establish through behavioral testing which of these features is particularly effective in evoking a neural response. For example, neurons in the orbitofrontal cortex discriminate between different CSs on the basis of their prediction of different food and liquid rewards. Reward neurons should also distinguish rewards from punishers: different neurons in orbitofrontal cortex respond to rewarding and aversive liquids. The omission of reward following a CS moves the contingency toward the diagonal line of equal reward probabilities with and without the CS.

Prediction Error

Just as with behavioral learning, the acquisition of neuronal responses to reward-predicting CSs should depend on prediction errors.
In the prediction error-defining blocking paradigm, dopamine neurons acquire a response to a CS only when the CS is associated with an unpredicted reward, but not when the CS is paired with a reward that is already predicted by another CS, so that the occurrence of the reward does not generate a prediction error. More stringent tests for the neural coding of prediction errors include formal paradigms of animal learning theory in which prediction errors occur in specific situations. In the blocking paradigm, the blocked CS does not predict a reward. Accordingly, the absence of a reward following that stimulus produces neither a prediction error nor a response in dopamine neurons, whereas the delivery of a reward does produce a positive prediction error and a dopamine response. The bidirectional dopamine response thus appears to code the discrepancy between the reward that occurs and the reward that is predicted, and may constitute a neural equivalent for the prediction error term (λ - V) of the Rescorla-Wagner learning rule. With these characteristics, the bidirectional dopamine error response would constitute an ideal teaching signal for neural plasticity. The neural prediction error signal provides an additional means to investigate the kinds of information contained in the representations evoked by CSs. Time apparently plays a major role in behavioral learning, as demonstrated by the unblocking effects of temporal variations of reinforcement. The uncertainty of reward is a major factor for generating the attention that determines learning according to the associability learning rules.

Approach Behavior and Goal Directedness

Many behavioral tasks in the laboratory involve more than a CS and a reward and comprise instrumental ocular or skeletal reactions, mnemonic delays between instruction cues and behavioral reactions, and delays between behavioral reactions and rewards during which animals can expect the reward.
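The blocking logic and the (λ - V) prediction error term discussed above can be made concrete with a minimal Rescorla-Wagner simulation. This is a sketch only: the learning rate, asymptote, and trial counts below are illustrative values, not parameters from the experiments reviewed here.

```python
# Minimal Rescorla-Wagner simulation of Kamin's blocking effect.
# V[s] is the associative strength of stimulus s; all stimuli present
# on a trial share the prediction error (lambda - sum of V), and alpha
# is a learning rate. All parameter values are illustrative.

def rescorla_wagner(trials, alpha=0.2, lam=1.0):
    """trials: sequence of (stimuli_present, reward_delivered) pairs."""
    V = {}
    for stimuli, rewarded in trials:
        prediction = sum(V.get(s, 0.0) for s in stimuli)
        error = (lam if rewarded else 0.0) - prediction  # (lambda - V)
        for s in stimuli:
            V[s] = V.get(s, 0.0) + alpha * error
    return V

# Phase 1: stimulus A alone is paired with reward.
# Phase 2: the compound A+B is paired with the same reward.
V = rescorla_wagner([({"A"}, True)] * 50 + [({"A", "B"}, True)] * 50)

# A already fully predicts the reward, so compound trials generate
# almost no prediction error and B acquires almost no associative
# strength: learning to B is "blocked" despite perfect contiguity
# and contingency.
print(V["A"], V["B"])
```

Running the simulation shows V["A"] near the asymptote and V["B"] near zero, mirroring the finding that dopamine neurons acquire a response only to CSs paired with unpredicted reward.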
Appropriately conditioned stimuli can evoke specific expectations of reward, and phasic neural responses to these CSs may reflect the process of evocation (see above). Once the representations have been evoked, their content can influence the behavior for some time. Neurons in a number of brain structures show sustained activations after an initial CS has occurred. The activations usually arise during specific epochs of well-differentiated instrumental tasks, such as during movement preparation. Reward expectation-related activity in orbitofrontal cortex and amygdala develops as the reward becomes predictable during learning.

[Figure 11: Potential neural mechanisms underlying goal-directed behavior. (a) Delay activity of a neuron in primate prefrontal cortex that encodes, while the movement is being prepared, both the behavioral reaction (left versus right targets) and the kind of outcome obtained for performing the action.]

General learning theory suggests that Pavlovian associations of reward-predicting stimuli in instrumental tasks relate either to explicit CSs or to contexts. The neural correlates of behavioral associations with explicit stimuli may involve not only the phasic responses to CSs described above but also activations at other task epochs. Further neural correlates of Pavlovian conditioning may consist of the sustained activations that occur during the different task periods preceding movements or rewards. Theories of goal-directed instrumental behavior postulate that, in order to consider rewards as goals of behavior, there should be (a) an expectation of the outcome at the time of the behavior that leads to the reward, and (b) a representation of the contingency between the instrumental action and the outcome. A reward-expecting activation could, however, merely reflect the prediction of a reward that occurs in parallel with, and irrespective of, the action.
Such a reward would not constitute a goal of the action, and the reward-expecting activation might simply reflect the upcoming reward without being involved in any goal mechanism. By contrast, reward-expecting activations might fulfill the second, more stringent criterion if they are also specific for the action necessary to obtain the reward. Such reward-expecting activations differentiate between different behavioral acts and arise only under the condition that the behavior leading to the reward is being prepared or executed.

REWARD FUNCTIONS DEFINED BY MICROECONOMIC UTILITY THEORY

How can we compare apples and pears? We need a numerical scale in order to assess the influence of different rewards on behavior. A good way to quantify the value of individual rewards is to compare them in choice behavior. Given two options, I would choose the one that at this moment has the higher value for me. Give me the choice between a one-dollar bill and an apple, and you will see which one I prefer; my action will tell you whether the value of the apple for me is higher than, lower than, or similar to one dollar. Being able to put a quantifiable, numerical value onto every reward, even when the value is short-lived, has enormous advantages for getting reward-related behavior under experimental control. To obtain a more complete picture, we need to take into account the uncertainty with which rewards frequently occur. One possibility would be to weigh the value of individual rewards by the probability with which they occur, an approach taken by Pascal ca. 1650. The sum of the products of each potential reward and its probability defines the expected value (EV) of the probability distribution and thus the theoretically expected payoff of an option, according to EV = Σi (pi · xi), for i = 1, ..., n, where n is the number of potential rewards, pi their probabilities, and xi their magnitudes. With increasing numbers of trials, the measured mean of the actually occurring distribution will approach the expected value.
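The expected value computation above is simple enough to state in a few lines of code. The "wine stand" options below are hypothetical numbers chosen purely for illustration, in the spirit of the market example from the introduction.

```python
# Expected value of an option: EV = sum_i p_i * x_i over its possible
# outcomes (p_i = probability, x_i = reward magnitude).

def expected_value(outcomes):
    """outcomes: list of (probability, magnitude) pairs summing to p = 1."""
    assert abs(sum(p for p, _ in outcomes) - 1.0) < 1e-9, "probabilities must sum to 1"
    return sum(p * x for p, x in outcomes)

# Hypothetical wine-stand options, each a small probability distribution
# over how much the bottle will please you:
options = {
    "cheap bottle":  [(0.9, 2.0), (0.1, 0.0)],  # likely, modest payoff
    "pricey bottle": [(0.5, 6.0), (0.5, 0.0)],  # unlikely, larger payoff
}

# Pascal's rule: compute each option's expected value, choose the largest.
evs = {name: expected_value(o) for name, o in options.items()}
best = max(evs, key=evs.get)
print(evs, best)
```

Here the riskier option wins on expected value (3.0 versus 1.8); the limits of this linear rule are exactly what Bernoulli's concave utility, discussed next, addresses.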
Pascal conjectured that human choice behavior could be approximated by this procedure. Despite its advantages, expected value theory has limits when comparing very small with very large rewards or when comparing values at different starting positions. Rather than following the physical size of reward value in a linear fashion, human choice behavior in many instances increases more slowly as the values get higher, and the term utility, or in some cases prospect, replaces the term value when the impact of rewards on choices is assessed (Bernoulli 1738). A separation of value and uncertainty as components of utility can be achieved mathematically by using, for example, the negative exponential utility function often employed in financial mathematics, u(x) = -e^(-ax) with risk-aversion coefficient a. Using this function for expected utility (EU) and developing it by the Laplace transform, for a Gaussian probability distribution, yields EU = -exp(-a · EV + (a²/2) · var), where EV is expected value and var is variance. Thus, EU is expressed as f(EV, var). This procedure uses variance as a measure of uncertainty. Another measure of uncertainty is the entropy of information theory, which might be appropriate when dealing with information processing in neural systems, but entropy is not commonly employed for describing decision making in microeconomics. Taken together, microeconomic utility theory has defined basic reward parameters, such as magnitude, probability, expected value, expected utility, and variance, that can be used for neurobiological experiments searching for neural correlates of decision making under uncertainty.
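A short sketch of the mean-variance decomposition above, under the stated assumptions (exponential utility with coefficient a, Gaussian outcomes); the gamble parameters are illustrative, and the entropy function demonstrates the alternative uncertainty measure mentioned in the text.

```python
import math

# Exponential utility u(x) = -exp(-a*x), risk-aversion coefficient a > 0.
# For a Gaussian outcome distribution, expected utility has the closed
# form EU = -exp(-a*EV + (a**2 / 2) * var), i.e. EU = f(EV, var):
# value (EV) and uncertainty (variance) enter as separate terms.

def expected_utility_gaussian(ev, var, a=1.0):
    return -math.exp(-a * ev + (a ** 2 / 2.0) * var)

# Two gambles with identical expected value but different variance:
safe = expected_utility_gaussian(ev=1.0, var=0.1)
risky = expected_utility_gaussian(ev=1.0, var=2.0)
print(safe > risky)  # a risk-averse agent prefers the low-variance gamble

# Entropy, the information-theoretic uncertainty measure: for a reward
# of fixed magnitude delivered with probability p, both the variance
# p*(1-p) and the entropy peak at p = 0.5 and vanish at p = 0 and p = 1.
def bernoulli_entropy(p):
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
```

The Bernoulli-entropy note anticipates the neurophysiological finding below that sustained dopamine activations are maximal for rewards delivered at p = 0.5, where both variance and entropy are largest.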
NEUROPHYSIOLOGY OF REWARD BASED ON ECONOMIC THEORY

Magnitude

The easiest quantifiable measure of reward for animals is the volume of juice, which animals can discriminate in submilliliter quantities; reward magnitude is reflected in the responses of neurons in several of these structures, including dopamine neurons.

Probability

Simple tests for reward probability involve CSs that differentially predict the probability with which a reward, as opposed to no reward, will be delivered for trial completion in Pavlovian or instrumental tasks. Dopamine neurons show increasing phasic responses to CSs that predict reward with increasing probability.

Expected Value

Parietal neurons show increasing task-related activations with both the magnitude and probability of reward that do not seem to distinguish between the two components of expected value.

Uncertainty

Graphical analysis and application of the Laplace transform to the exponential utility function would permit experimenters to separate the components of expected value and utility from the uncertainty inherent in probabilistic gambles. Would the brain be able to produce an explicit signal that reflects the level of uncertainty, similar to producing a reward signal? For both reward and uncertainty, there are no specialized sensory receptors. A proportion of dopamine neurons show a sustained activation during the CS-reward interval when tested with CSs that predict reward at increasing probabilities, as opposed to no reward. The activation is highest for reward at p = 0.5 and progressively lower for probabilities further away from p = 0.5 in either direction.

CONCLUSIONS

It is intuitively simple to understand that the use of well-established behavioral theories can only be beneficial when investigating the mechanisms underlying behavioral reactions. Indeed, these theories can very well define the different functions of rewards on behavior.
It is then a small step on firm ground to base the investigation of neural mechanisms underlying the different reward functions on the phenomena characterized by these theories. Although each theory has its own particular emphasis, they deal with the same kinds of behavioral outcome events, and it comes as confirmation rather than surprise that many neural reward mechanisms can be commonly based on, and understood with, several theories. For the experimenter, the use of different theories provides good explanations for an interesting spectrum of reward functions that may not be so easily accessible using only a single theory. For example, uncertainty appears to play a larger role in parts of microeconomic theory than in learning theory, and the investigation of neural mechanisms of uncertainty in behavioral outcomes can draw on several hundred years of thought about decision making (Pascal 1650 in Glimcher 2003, Bernoulli 1738).