Short-term gains, long-term pains: How cues about state aid learning in dynamic environments (2009)

by Todd M. Gureckis, Bradley C. Love

Results 1 - 10 of 31

Regulatory Fit and Systematic Exploration in a Dynamic Decision-Making Environment

by A. Ross Otto, Arthur B. Markman, Todd M. Gureckis, Bradley C. Love
Cited by 17 (6 self)
This work explores the influence of motivation on choice behavior in a dynamic decision-making environment, where the payoffs from each choice depend on one’s recent choice history. Previous research reveals that participants in a regulatory fit exhibit increased levels of exploratory choice and flexible use of multiple strategies over the course of an experiment. The present study placed promotion and prevention-focused participants in a dynamic environment for which optimal performance is facilitated by systematic exploration of the decision space. These participants either gained or lost points with each choice. Our experiment revealed that participants in a regulatory fit were more likely to engage in systematic exploration of the task environment than were participants in a regulatory mismatch and performed more optimally as a result. Implications for contemporary models of human reinforcement learning are discussed.

Citation Context

...fs on each trial are dependent on the proportion of selections made to each option over a 20-trial moving window. Thus, this proportion of responses defines the current state of the task environment (Gureckis & Love, 2009). If the next response changes the relative proportion of the previous 20 responses, then the state changes. In the rising optimum, there are 21 possible states. ...
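
As a rough illustration of the state definition in the excerpt above, the sketch below counts how many of the most recent 20 choices went to one option. With a 20-trial window that count can take 21 values (0 through 20), which matches the 21 states mentioned. The function name `window_state`, the 0/1 coding of choices, and the handling of histories shorter than 20 trials are assumptions made for illustration, not details taken from the cited papers.

```python
def window_state(choices, window=20):
    """Return the number of selections of option 1 in the last `window` trials.

    `choices` is a list of 0/1 values, one per trial (hypothetical coding).
    If fewer than `window` trials exist, all available trials are counted;
    that boundary behaviour is an assumption, not taken from the source.
    """
    recent = choices[-window:]
    return sum(recent)


# Dividing the count by the window length gives the proportion of responses.
history = [1] * 25 + [0] * 5
state = window_state(history)
proportion = state / 20
```

Because the state is a count over a fixed window, a new response changes the state only when it differs from the response that drops out of the window.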

You don’t want to know what you’re missing: When information about forgone rewards impedes dynamic decision making

by A. Ross Otto, Bradley C. Love
Cited by 8 (2 self)
When people learn to make decisions from experience, a reasonable intuition is that additional relevant information should improve their performance. In contrast, we find that additional information about foregone rewards (i.e., what could have been gained at each point by making a different choice) severely hinders participants’ ability to repeatedly make choices that maximize long-term gains. We conclude that foregone reward information accentuates the local superiority of short-term options (e.g., consumption) and consequently biases choice away from productive long-term options (e.g., exercise). These conclusions are consistent with a standard reinforcement-learning mechanism that processes information about experienced and forgone rewards. In contrast to related contributions using delay-of-gratification paradigms, we do not posit separate top-down and emotion-driven systems to explain performance. We find that individual and group data are well characterized by a single reinforcement-learning mechanism that combines information about experienced and foregone rewards.

Citation Context

... to maximize gains, making use of a reward signal that provides information about the “goodness” of actions. This framework has been used to model human decision-making behavior (Fu & Anderson, 2006; Gureckis & Love, 2009) as well as firing patterns in dopaminergic neurons in primates (Schultz, Dayan, & Montague, 1997). Our RL model demonstrates that weighting of a fictive (i.e., forgone) reward signal for the action ...

Taking more, now: The optimality of impulsive choice hinges on environment structure

by A. Ross Otto, Arthur B. Markman, Bradley C. Love - Social Psychological and Personality Science, 2012
Cited by 4 (4 self)
Impulsivity is a stable personality trait associated with myopic choice behavior that favors immediate rewards over larger, delayed rewards and is often characterized as maladaptive inside and outside of the laboratory. An alternative view suggests that the consequences of trait impulsivity depend on the nature of the task environment. On this view, the optimal level of impulsivity varies across task payoff structures. This hypothesis is tested in two dynamic decision-making tasks that differ in the relative payoffs of delayed and immediate rewards. In a task that favors delayed rewards over immediate rewards, high-impulsive participants perform worse than low-impulsive participants. In contrast, in a task that favors immediate rewards over delayed rewards, high-impulsive participants outperform low-impulsive participants. These results suggest a more nuanced conceptualization of trait impulsivity as it applies to rewards-related decision making that may help explain the variability observed in this trait across individuals.

Citation Context

...3.3%, Caucasian: 46.5%, Native American: <0.1%, Others: 1.5%. The ages of participants in this pool ranged from 17 to 55 (M = 19.08, SD = 1.76). Materials and procedure. Participants were administered the BIS-11 questionnaire (Patton et al., 1995) that consists of 30 statements, such as “I do things without thinking” and “I am more interested in the present than the future” with which participants stated their level of agreement on a 4-point scale. Higher summed scores indicate higher levels of impulsivity. Following the questionnaire, participants played the “Farming on Mars” game (see Gureckis & Love, 2009), an adaptation of Herrnstein et al.’s (1993) choice task. Participants read a story about National Aeronautics and Space Administration (NASA) scientists on Mars attempting to extract oxygen from its atmosphere in order to create breathable air for use in a human colony. They were informed that, as members of the project, their job was to extract as much oxygen as possible from the atmosphere. To do this, they needed to repeatedly choose between two “oxygen-extraction robots” with different properties. Beyond this information, participants were only told that the specific oxygen-extracting ...

When, What, and How Much to Reward in Reinforcement Learning‐Based Models of Cognition

by Christian P. Janssen, Wayne D. Gray - Cognitive Science, 2012
Cited by 2 (0 self)
Reinforcement learning approaches to cognitive modeling represent task acquisition as learning to choose the sequence of steps that accomplishes the task while maximizing a reward. However, an apparently unrecognized problem for modelers is choosing when, what, and how much to reward; that is, when (the moment: end of trial, subtask, or some other interval of task performance), what (the objective function: e.g., performance time or performance accuracy), and how much (the magnitude: with binary, categorical, or continuous values). In this article, we explore the problem space of these three parameters in the context of a task whose completion entails some combination of 36 state–action pairs, where all intermediate states (i.e., after the initial state and prior to the end state) represent progressive but partial completion of the task. Different choices produce profoundly different learning paths and outcomes, with the strongest effect for moment. Unfortunately, there is little discussion in the literature of the effect of such choices. This absence is disappointing, as the choice of when, what, and how much needs to be made by a modeler for every learning model.

Citation Context

...nd feedback can scale up to explain effects known as melioration of performance, where maximization of short-term gains requires different action sequences than maximization of long-term gains (e.g., Gureckis & Love, 2009; Herrnstein, 1990; Neth, Sims, & Gray, 2006; Shanks, Tunney, & McCarthy, 2002). The magnitude of rewards can vary more easily between alternative settings and will depend on the objective function at...

How Long Have I Got? Making Optimal Visit Durations in a Dual-Task Setting

by George D. Farmer, Christian P. Janssen, Duncan P. Brumby
Cited by 1 (1 self)
Can people multitask optimally? We use a dual-task paradigm in which participants had to enter digits while monitoring a randomly moving cursor. Participants earned points for entering digits correctly and were docked points if they let the cursor drift outside of a target area. The severity of the tracking penalty was varied between conditions. Participants therefore had to decide how long to leave the tracking task unattended. As expected, participants left the tracking task for longer when the penalty was less severe and also when the cursor moved less erratically. To test whether participants were adjusting their behavior in an optimal manner, observed behavior was compared to a prediction of the optimal visit duration for each condition. Overall, the degree of correspondence between the observed behavior and the predicted optimum was very good, suggesting that people can multitask in a near optimal fashion given explicit feedback on their performance.

Citation Context

...gy that maximizes the highest local reward (e.g., the reward value after a single visit) rather than finding the strategy that maximizes the cumulative reward to be had at the end of the trial (e.g., Gureckis & Love, 2009; Neth, Sims, & Gray, 2006). Method. Participants. Twenty Master’s students (seven female) from University College London participated on a voluntary basis. Participants were aged between 22 and 37 year...

The nature of belief-directed exploratory choice in human decision-making

by W. Bradley Knox, A. Ross Otto, Peter Stone, Bradley C. Love, 2012
Cited by 1 (0 self)
doi: 10.3389/fpsyg.2011.00398

Citation Context

...of their simplicity and unexamined intuitions about the “randomness” of exploratory behavior, reflexive approaches are commonly adopted to model human behavior (Daw et al., 2006; Worthy et al., 2007; Gureckis and Love, 2009; Pearson et al., 2009; Jepma and Nieuwenhuis, 2011). Reflexive approaches are also prominent in the design of artificial agents (Sutton and Barto, 1998). THE LEAPFROG VARIANT OF THE CLASSIC BANDIT TA...

Reprints and permission: sagepub.com/journalsPermissions.nav

by unknown authors
Decisions are a pervasive part of people’s lives. The importance and impact of these decisions may increase with an individual’s age. Older adults often work in prominent positions and face numerous important personal decisions, such as which retirement options to select, how to spend their life savings, and how to best live out the remaining years of their lives. Likewise, younger adults must choose which career path to take, which college to attend, and when to buy a house. The importance of decision making throughout life makes it critical to understand how age affects decision-making strategies. Decisions rarely occur without context. Often, the rewards available from each option depend on previous choices. One’s immediate job prospects or retirement investment options are dependent on the state that one has reached. Generally, one cannot apply for teaching jobs without first deciding to attend

Citation Context

... Experiment 2, participants performed one of two dynamic decision-making tasks in which reward values were dependent on the sequence of previous choices (Bogacz, McClure, Li, Cohen, & Montague, 2007; Gureckis & Love, 2009; Otto, Gureckis, Markman, & Love, 2009). Figure 3 shows the reward structures for the tasks. In each task, there were two options: a decreasing option and an increasing option. The decreasing option ...

Of matchers and maximizers: How competition shapes choice under risk and uncertainty

by Christin Schulze, Don Van Ravenzwaaij, Ben R. Newell
In a world of limited resources, scarcity and rivalry are central challenges for decision makers: animals foraging for food, corporations seeking maximal profits, and athletes training to win all strive against others competing for the same goals. In this article, we establish the role of competitive pressures for the facilitation of optimal decision making in simple sequential binary choice tasks. In two experiments, competition was introduced with a computerized opponent whose choice behavior reinforced one of two strategies: If the opponent probabilistically imitated participant choices, probability matching was optimal; if the opponent was indifferent, probability maximizing was optimal. We observed accurate asymptotic strategy use in both conditions irrespective of the provision of outcome probabilities, suggesting that participants were sensitive to the differences in opponent behavior. An analysis of reinforcement learning models established that computational conceptualizations of opponent behavior are critical to account for the observed divergence in strategy adoption. Our results provide a novel appraisal of probability matching and show how this individually 'irrational' choice phenomenon can be socially adaptive under competition.

Citation Context

...imicking opponent and probability maximizing when encountering an indifferent opponent. Learning to choose optimally in our choice paradigm requires a number of cognitive processes also vital for decisions under uncertainty and competition in many real-world situations. These include the exploration of choice profitability and learning about the motives and choice strategies of competing agents. Computational models of reinforcement learning have been shown to successfully and parsimoniously describe such cognitive mechanisms in related tasks (e.g., Busemeyer & Stout, 2002; Erev & Roth, 1998; Gureckis & Love, 2009; Rieskamp & Otto, 2006) and provide an attractive approach to illuminating the nature of learning mechanisms adopted by decision makers in our paradigm. Following the presentation of the behavioral data, we therefore discuss the applicability of a variety of computational models of reinforcement learning that differ with regard to the importance they place on (solely) maximizing profit and out-smarting opponents. 2. Experiment 1. 2.1. Method. 2.1.1. Participants. Fifty (35 female) undergraduate students from the University of New South Wales with a mean age of 18.92 years (SD = 1.19 years) partic...

Heterogeneity of strategy use in the Iowa gambling task: A comparison of win-stay/lose-shift and reinforcement learning models

by Darrell A. Worthy, Melissa J. Hawthorne, A. Ross Otto - Psychon Bull Rev
DOI 10.3758/s13423-012-0324-9

Citation Context

...ers, & Stout, 2008; Ahn, Krawitz, Kim, Busemeyer, & Brown, 2011). The EV, PVL, and other RL models have been a dominant class of models used to characterize decision-making behavior in numerous studies (Gureckis & Love, 2009a, b; Worthy, Maddox, & Markman, 2007). The basic assumption underpinning these RL models is that the outcomes of past decisions are integrated to determine expected reward values for each option, an...
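
The idea that past outcomes are integrated into an expected reward value for each option can be sketched with a generic delta-rule update. This is not the EV or PVL model named in the excerpt. The function name `update_expected_value`, the learning-rate parameter, and its default value are hypothetical choices for illustration only.

```python
def update_expected_value(expected, choice, reward, learning_rate=0.1):
    """Return a copy of `expected` with the chosen option's value nudged
    toward the observed reward (generic delta rule, illustrative only)."""
    updated = list(expected)
    prediction_error = reward - updated[choice]
    updated[choice] += learning_rate * prediction_error
    return updated


# Two options, both starting at an expected value of 0.
values = [0.0, 0.0]
values = update_expected_value(values, choice=0, reward=1.0, learning_rate=0.5)
values = update_expected_value(values, choice=0, reward=1.0, learning_rate=0.5)
```

Only the chosen option is updated here; models that also learn from forgone rewards would update the unchosen option as well.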
