Attention in learning
Current Directions in Psychological Science, 2003
Cited by 37 (9 self)
... explaining many phenomena in learning. The mechanism of selective attention in learning is also well motivated by its ability to minimize proactive interference and enhance generalization, thereby accelerating learning. Therefore, not only does the mechanism help explain behavioral phenomena, it makes sense that it should have evolved (Kruschke & Hullinger, 2010).
The phrase “learned selective attention” denotes three qualities. First, “attention” means the amplification or attenuation of the processing of stimuli. Second, “selective” refers to differentially amplifying and/or attenuating a subset of the components of the stimulus. This selectivity within a stimulus is different from attenuating or amplifying all aspects of a stimulus simultaneously (cf. Larrauri & Schmajuk, 2008). Third, “learned” denotes the idea that the allocation of selective processing is retained for future use. The allocation may be context sensitive, so that attention is allocated differently in different contexts.
There are many phenomena in human and animal learning that suggest the involvement of learned selective attention. The first part of this chapter briefly reviews some of those phenomena. The emphasis of the chapter is not the empirical phenomena, however. Instead, the focus is on a collection of models that formally express theories of learned attention. These models will be surveyed subsequently.
Phenomena suggestive of selective attention in learning
There are many phenomena in human and animal learning that suggest that learning involves allocating attention to informative cues, while ignoring uninformative cues. The following subsections indicate the benefits of selective allocation of attention, and illustrate the benefits with particular findings.
Locally Bayesian Learning with Applications to Retrospective Revaluation and Highlighting
Psychological Review, 2006
Cited by 26 (7 self)
A scheme is described for locally Bayesian parameter updating in models structured as successions of component functions. The essential idea is to backpropagate the target data to interior modules, such that an interior component’s target is the input to the next component that maximizes the probability of the next component’s target. Each layer then does locally Bayesian learning. The approach assumes online trial-by-trial learning. The resulting parameter updating is not globally Bayesian but can better capture human behavior. The approach is implemented for an associative learning model that first maps inputs to attentionally filtered inputs and then maps attentionally filtered inputs to outputs. The Bayesian updating allows the associative model to exhibit retrospective revaluation effects such as backward blocking and unovershadowing, which have been challenging for associative learning models. The backpropagation of target values to attention allows the model to show trial-order effects, including highlighting and differences in magnitude of forward and backward blocking, which have been challenging for Bayesian learning models.
Dynamical causal learning
 In
, 2003
Cited by 21 (10 self)
Current psychological theories of human causal learning and judgment focus primarily on long-run predictions: two by estimating parameters of a causal Bayes net (though for different parameterizations), and a third through structural learning. This paper focuses on people’s short-run behavior by examining dynamical versions of these three theories, and comparing their predictions to a real-world dataset.
Bayesian generic priors for causal learning
Psychological Review, 2008
Cited by 21 (0 self)
The article presents a Bayesian model of causal learning that incorporates generic priors—systematic assumptions about abstract properties of a system of cause–effect relations. The proposed generic priors for causal learning favor sparse and strong (SS) causes—causes that are few in number and high in their individual powers to produce or prevent effects. The SS power model couples these generic priors with a causal generating function based on the assumption that unobservable causal influences on an effect operate independently (P. W. Cheng, 1997). The authors tested this and other Bayesian models, as well as leading nonnormative models, by fitting multiple data sets in which several parameters were varied parametrically across multiple types of judgments. The SS power model accounted for data concerning judgments of both causal strength and causal structure (whether a causal link exists). The model explains why human judgments of causal structure (relative to a Bayesian model lacking these generic priors) are influenced more by causal power and the base rate of the effect and less by sample size. Broader implications of the Bayesian framework for human learning are discussed.
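As a reading aid, the independence assumption attributed to Cheng (1997) in this abstract is commonly written as a noisy-OR generating function. The block below is a sketch under assumed notation, not the article's own: $w_0$ and $w_1$ are hypothetical symbols for the strengths of a background cause $b$ and a candidate generative cause $c$, both binary, with the background taken to be always present.

```latex
% Noisy-OR: each present cause independently produces the effect e.
P(e = 1 \mid b, c) = 1 - (1 - w_0)^{b}\,(1 - w_1)^{c}, \qquad b, c \in \{0, 1\}

% With the background always present (b = 1):
P(e \mid \neg c) = w_0, \qquad P(e \mid c) = w_0 + w_1 - w_0 w_1

% Solving for w_1 gives generative causal power in terms of \Delta P:
\Delta P = P(e \mid c) - P(e \mid \neg c), \qquad
w_1 = \frac{\Delta P}{1 - P(e \mid \neg c)}
```

The last line shows why causal power and the base rate of the effect, $P(e \mid \neg c)$, enter strength judgments together: the same $\Delta P$ implies a larger power when the base rate is high.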
Modeling Human Performance in Statistical Word Segmentation
Cited by 20 (5 self)
What mechanisms support the ability of human infants, adults, and other primates to identify words from fluent speech using distributional regularities? In order to better characterize this ability, we collected data from adults in an artificial language segmentation task similar to Saffran, Newport, and Aslin (1996) in which the length of sentences was systematically varied between groups of participants. We then compared the fit of a variety of computational models, including simple statistical models of transitional probability and mutual information, a clustering model based on mutual information by Swingley (2005), PARSER (Perruchet & Vinter, 1998), and a Bayesian model. We found that while all models were able to successfully complete the task, fit to the human data varied considerably, with the Bayesian model achieving the highest correlation with our results.
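To make the simplest of the models named above concrete, here is a minimal sketch of segmentation by forward transitional probability. It is an illustration only, not the authors' implementation: the function names, the toy syllable stream, the three made-up words, and the 0.75 threshold are all assumptions.

```python
from collections import Counter

def transitional_probabilities(syllables):
    """Estimate forward transitional probabilities P(next | current)."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    # Count each syllable only where it has a successor.
    first_counts = Counter(syllables[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}

def segment(syllables, threshold):
    """Place a word boundary wherever the transitional probability is below threshold."""
    tps = transitional_probabilities(syllables)
    words, current = [], [syllables[0]]
    for a, b in zip(syllables, syllables[1:]):
        if tps[(a, b)] < threshold:
            words.append("".join(current))
            current = []
        current.append(b)
    words.append("".join(current))
    return words

# Hypothetical stream built from three made-up words: golabu, padoti, bidaku.
stream = "go la bu pa do ti go la bu bi da ku pa do ti bi da ku go la bu".split()
tps = transitional_probabilities(stream)
words = segment(stream, threshold=0.75)  # threshold is an arbitrary assumption
```

In this toy stream, within-word transitions such as go→la have probability 1.0 while cross-word transitions such as bu→pa have probability 0.5, so any threshold between the two recovers the word boundaries.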
The rat as particle filter
Cited by 19 (2 self)
The core tenet of Bayesian modeling is that subjects represent beliefs as distributions over possible hypotheses. Such models have fruitfully been applied to the study of learning in the context of animal conditioning experiments (and analogously designed human learning tasks), where they explain phenomena such as retrospective revaluation that seem to demonstrate that subjects entertain multiple hypotheses simultaneously. However, a recent quantitative analysis of individual subject records by Gallistel and colleagues cast doubt on a very broad family of conditioning models by showing that all of the key features the models capture about even simple learning curves are artifacts of averaging over subjects. Rather than smooth learning curves (which Bayesian models interpret as revealing the gradual tradeoff from prior to posterior as data accumulate), subjects acquire suddenly, and their predictions continue to fluctuate abruptly. These data demand revisiting the model of the individual versus the ensemble, and also raise the worry that more sophisticated behaviors thought to support Bayesian models might also emerge artifactually from averaging over the simpler behavior of individuals. We suggest that the suddenness of changes in subjects’ beliefs (as expressed in conditioned behavior) can be modeled by assuming they are conducting inference using sequential Monte Carlo sampling with a small number of samples (one, in our simulations). Ensemble behavior resembles exact Bayesian models since, as in particle filters, it averages over many samples. Further, the model is capable of exhibiting sophisticated behaviors like retrospective revaluation at the ensemble level, even given minimally sophisticated individuals that do not track uncertainty from trial to trial. These results point to the need for more sophisticated experimental analysis to test Bayesian models, and refocus theorizing on the individual, while at the same time clarifying why the ensemble may be of interest.
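The averaging artifact described in this abstract can be illustrated with a toy calculation. The sketch below is not the paper's particle filter: it only assumes hypothetical subjects whose responses jump from 0 to 1 at different, arbitrarily chosen trials, and shows that their mean rises gradually even though no individual does.

```python
def step_curve(onset, n_trials):
    """One hypothetical subject: responds 0 before `onset`, 1 from `onset` onward."""
    return [1.0 if t >= onset else 0.0 for t in range(n_trials)]

def group_average(curves):
    """Trial-by-trial mean across subjects."""
    return [sum(values) / len(values) for values in zip(*curves)]

n_trials = 10
onsets = [1, 3, 5, 7, 9]  # assumed acquisition trials, one per subject
curves = [step_curve(onset, n_trials) for onset in onsets]
average = group_average(curves)
```

Every individual curve contains only the values 0 and 1, yet the group average climbs in steps of 0.2 from 0 to 1, which is the kind of smooth-looking ensemble curve the abstract warns against reading as gradual individual learning.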
Locally Bayesian Learning
Cited by 7 (6 self)
This article is concerned with trial-by-trial, online learning of cue-outcome mappings. In models structured as successions of component functions, an external target can be backpropagated such that the lower layer’s target is the input to the higher layer that maximizes the probability of the higher layer’s target. Each layer then does locally Bayesian learning. The resulting parameter updating is not globally Bayesian, but can better capture human behavior. The approach is implemented for an associative learning model that first maps inputs to attentionally filtered inputs, and then maps attentionally filtered inputs to outputs. The model is applied to the human-learning phenomenon called highlighting, which is challenging to other extant Bayesian models, including the rational model of Anderson, the Kalman filter model of Dayan and
The Rescorla-Wagner algorithm and Maximum Likelihood estimation of causal parameters. NIPS
 In L
, 2004
Cited by 5 (4 self)
This paper analyzes generalizations of the classic Rescorla-Wagner (RW) learning algorithm and studies their relationship to Maximum Likelihood estimation of causal parameters. We prove that the parameters of two popular causal models, ΔP and PC, can be learnt by the same generalized linear Rescorla-Wagner (GLRW) algorithm provided genericity conditions apply. We characterize the fixed points of these GLRW algorithms and calculate the fluctuations about them, assuming that the input is a set of i.i.d. samples from a fixed (unknown) distribution. We describe how to determine convergence conditions and calculate convergence rates for the GLRW algorithms under these conditions.
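For orientation, the classic (not generalized) Rescorla-Wagner rule can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's GLRW algorithm: the trial list, learning rate, and epoch count are made up, and trials are cycled in a fixed order rather than sampled i.i.d. With a background cue present on every trial, the weight on the candidate cause should settle near ΔP, with small fluctuations that depend on the learning rate.

```python
def rescorla_wagner(trials, learning_rate=0.01, epochs=2000):
    """Apply the Rescorla-Wagner delta rule over repeated passes through `trials`.

    Each trial is (cues, outcome): a tuple of 0/1 cue indicators and a 0/1 outcome.
    """
    weights = [0.0] * len(trials[0][0])
    for _ in range(epochs):
        for cues, outcome in trials:
            prediction = sum(w * x for w, x in zip(weights, cues))
            error = outcome - prediction
            # Only cues present on the trial (x = 1) have their weights changed.
            weights = [w + learning_rate * error * x for w, x in zip(weights, cues)]
    return weights

# cues = (background, candidate cause); the background is present on every trial.
trials = (
    [((1, 1), 1)] * 3 + [((1, 1), 0)] * 1    # P(effect | cause) = 0.75
    + [((1, 0), 1)] * 1 + [((1, 0), 0)] * 3  # P(effect | no cause) = 0.25
)
delta_p = 0.75 - 0.25
w_background, w_cause = rescorla_wagner(trials)
```

Here `w_cause` ends close to `delta_p` and `w_background` close to P(effect | no cause); a larger learning rate would widen the fluctuations around those values.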
Semi-rational Models of Conditioning: The Case of Trial Order
2007
Cited by 4 (2 self)
Bayesian treatments of animal conditioning start from a generative model that specifies precisely a set of assumptions about the structure of the learning task. Optimal rules for learning are direct mathematical consequences of these assumptions. In terms of Marr’s (1982) levels of analyses, the main task at the computational level
Augmented Rescorla-Wagner and maximum likelihood estimation
 In B
, 2006
Cited by 2 (2 self)
We show that linear generalizations of Rescorla-Wagner can perform Maximum Likelihood estimation of the parameters of all generative models for causal reasoning. Our approach involves augmenting variables to deal with conjunctions of causes, similar to the augmented model of Rescorla. Our results involve genericity assumptions on the distributions of causes. If these assumptions are violated, for example for the Cheng causal power theory, then we show that a linear Rescorla-Wagner can estimate the parameters of the model up to a nonlinear transformation. Moreover, a nonlinear Rescorla-Wagner is able to estimate the parameters directly to within arbitrary accuracy. Previous results can be used to determine convergence and to estimate convergence rates.