A Bayesian framework for word segmentation: Exploring the effects of context
- In 46th Annual Meeting of the ACL
, 2009
Cited by 26 (7 self)
Since the experiments of Saffran et al. (1996a), there has been a great deal of interest in the question of how statistical regularities in the speech stream might be used by infants to begin to identify individual words. In this work, we use computational modeling to explore the effects of different assumptions the learner might make regarding the nature of words – in particular, how these assumptions affect the kinds of words that are segmented from a corpus of transcribed child-directed speech. We develop several models within a Bayesian ideal observer framework, and use them to examine the consequences of assuming either that words are independent units, or units that help to predict other units. We show through empirical and theoretical results that the assumption of independence causes the learner to undersegment the corpus, with many two- and three-word sequences (e.g. what’s that, do you, in the house) misidentified as individual words. In contrast, when the learner assumes that words are predictive, the resulting segmentation is far more accurate. These results indicate that taking context into account is important for a statistical word segmentation strategy to be successful, and raise the possibility that even young infants may be able to exploit more subtle statistical patterns than have usually been considered.
Learning overhypotheses with hierarchical Bayesian models
Cited by 25 (11 self)
Inductive learning is impossible without overhypotheses, or constraints on the hypotheses considered by the learner. Some of these overhypotheses must be innate, but we suggest that hierarchical Bayesian models help explain how the rest can be acquired. To illustrate this claim, we develop models that acquire two kinds of overhypotheses — overhypotheses about feature variability (e.g. the shape bias in word learning) and overhypotheses about the grouping of categories into ontological kinds like objects and substances.
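As a rough illustration of what acquiring an overhypothesis about feature variability could look like, the sketch below uses a toy Beta-Binomial hierarchy. It is not the authors' model: the bags-of-marbles setup, the parameter names alpha (how uniform each bag is) and beta (overall proportion of black marbles), and the grid values are all assumptions made for this example. Small alpha means each bag tends to be nearly all one color; large alpha means every bag resembles the overall mix.

```python
import math

def log_beta_fn(a, b):
    """Log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def log_marginal_bag(k, n, alpha, beta):
    """Log probability of one ordered sequence with k black marbles out of n,
    with the bag's own proportion integrated out under
    Beta(alpha * beta, alpha * (1 - beta))."""
    a, b = alpha * beta, alpha * (1.0 - beta)
    return log_beta_fn(k + a, n - k + b) - log_beta_fn(a, b)

def posterior_over_alpha(bags, alphas, betas):
    """Posterior over alpha on a grid, with uniform grid priors on alpha and beta.
    bags is a list of (k, n) pairs: k black marbles among n drawn."""
    log_post = []
    for alpha in alphas:
        # Log likelihood of all bags for each beta, then average over beta.
        terms = [sum(log_marginal_bag(k, n, alpha, beta) for k, n in bags)
                 for beta in betas]
        m = max(terms)
        log_post.append(m + math.log(sum(math.exp(t - m) for t in terms) / len(betas)))
    m = max(log_post)
    weights = [math.exp(lp - m) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]

ALPHAS = [0.1, 1.0, 10.0]                  # hypothetical grid
BETAS = [0.1 * i for i in range(1, 10)]    # 0.1 .. 0.9

uniform_bags = [(10, 10), (0, 10), (10, 10), (0, 10)]  # each bag is one color
mixed_bags = [(5, 10), (6, 10), (4, 10), (5, 10)]      # each bag is a mix

post_uniform = posterior_over_alpha(uniform_bags, ALPHAS, BETAS)
post_mixed = posterior_over_alpha(mixed_bags, ALPHAS, BETAS)
print("uniform bags:", post_uniform)
print("mixed bags:  ", post_mixed)
```

In this toy, seeing several single-color bags shifts belief toward small alpha, so a single marble drawn from a new bag already suggests the color of the rest. That is the sense in which a constraint on lower-level hypotheses can itself be learned from data.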
Homo Heuristicus: Why Biased Minds Make Better Inferences
, 2008
Cited by 22 (3 self)
Heuristics are efficient cognitive processes that ignore information. In contrast to the widely held view that less processing reduces accuracy, the study of heuristics shows that less information, computation, and time can in fact improve accuracy. We review the major progress made so far: (a) the discovery of less-is-more effects; (b) the study of the ecological rationality of heuristics, which examines in which environments a given strategy succeeds or fails, and why; (c) an advancement from vague labels to computational models of heuristics; (d) the development of a systematic theory of heuristics that identifies their building blocks and the evolved capacities they exploit, and views the cognitive system as relying on an “adaptive toolbox”; and (e) the development of an empirical methodology that accounts for individual differences, conducts competitive tests, and has provided evidence for people’s adaptive use of heuristics. Homo heuristicus has a biased mind and ignores part of the available information, yet a biased mind can handle uncertainty more efficiently and robustly than an unbiased mind relying on more resource-intensive and general-purpose processing strategies.
Why Are Different Features Central for Natural Kinds and Artifacts?: The Role of Causal Status in Determining Feature Centrality
, 1998
Cited by 21 (1 self)
Ahn and Lassaline [Ahn, W., Lassaline, M.E., 1995. Causal structure in categorization.
Isolated and Interrelated Concepts
Cited by 21 (7 self)
A continuum between purely isolated and purely interrelated concepts is described. A concept is interrelated to the extent that it is influenced by other concepts. Methods for manipulating and identifying a concept's degree of interrelatedness are introduced. Relatively isolated concepts are empirically identified by a relatively large use of nondiagnostic features, and by better categorization performance for a concept's prototype than for a caricature of the concept. Relatively interrelated concepts are identified by minimal use of nondiagnostic features, and by better categorization performance for a caricature than a prototype. A concept is likely to be relatively isolated when: subjects are instructed to create images for their concepts rather than find discriminating features, concepts are given unrelated labels, and the categories that are displayed alternate rarely between trials. The entire set of manipulations and measurements supports a graded distinction between isolated and interrelated concepts. The distinction is applied to current models of category learning, and a connectionist framework for interpreting the empirical results is presented. Modern research on concept representation and learning has evolved from two traditions. One tradition connects concept acquisition with language in general and word learning in specific (Lakoff, 1986; Saussure, 1915/1959). Concepts are approximately equated with single words or phrases. In this tradition, for example, evidence that a child has acquired the adult concept of dog comes from the child's use of the word "dog" to designate dogs. The other tradition connects concept acquisition with object recognition (Biederman, 1987). From this perspective, concept learning involves learning to correctly cate...
Eyetracking and selective attention in category learning
- Cognitive Psychology
, 2003
Cited by 20 (7 self)
conducted. Forty years of research has assumed that category learning often involves learning to selectively attend to only those stimulus dimensions useful for classification. We confirmed that participants learned to allocate their attention optimally. We also found that learners tend to fixate all stimulus dimensions early in learning. This result obtained despite evidence that participants were also testing one-dimensional rules during this period. Finally, the restriction of eye movements to only relevant dimensions tended to occur only after errors were largely (or completely) eliminated. We interpret these findings as consistent with multiple-systems theories of learning which maximize information input in order to maximize the number of learning modules involved, and which focus solely on relevant information only after one module has solved the learning problem.
Mixture Models of Categorization
- Journal of Mathematical Psychology
, 2002
Cited by 19 (0 self)
Many currently popular models of categorization are either strictly parametric (e.g., prototype models, decision bound models) or strictly nonparametric (e.g., exemplar models) (Ashby & Alfonso-Reese, 1995). In this article, a family of semi-parametric classifiers is investigated where categories are represented by a finite mixture distribution. The advantage of these mixture models of categorization is that they contain several parametric models and nonparametric models as a special case. Specifically, it is shown that both decision bound models (Ashby & Maddox, 1992, 1993) and the generalized context model (Nosofsky, 1986) can be interpreted as two extreme cases of a common mixture model. Furthermore, many other (semi-parametric) models of categorization can be derived from the same generic mixture framework. In this article, several examples are discussed, and a parameter estimation procedure for fitting these models is outlined. To illustrate the approach, several specific models are fitted to a data set collected by McKinley and Nosofsky (1995). The results suggest that semi-parametric models are a promising alternative for future model development. Formal models of categorization are often closely related to statistical methods of probability density estimation (Ashby & Alfonso-Reese, 1995). In statistics, a distinction is made between parametric estimators, that make strong assumptions about the distribution of the sample data, and nonparametric estimators that make only weak distributional assumptions. In accord with this distinction, Ashby and Alfonso-Reese defined parametric classifiers as those classifiers that make strong assumptions about the functional form of the category distributions, and nonparametric classifiers as classifiers that make almost no assumptions about the category form. 
Prototype models (Reed, 1972) and decision bound models (Ashby & Maddox, 1992, 1993) are parametric classifiers, because they make strong assumptions about category structure. Decision bound models, for example, assume that the category distributions are multivariate normal (see Ashby, 1992, for a motivation). Despite this strong assumption (and the fact that these models can only predict linear or quadratic decision bounds), Ashby and Maddox (1992, 1993)
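The idea that parametric and exemplar-style classifiers sit at two ends of one mixture family can be sketched in a few lines. The code below is only an illustration under stated assumptions, not the article's models or fitting procedure: stimuli are one-dimensional, components are Gaussian, category priors are equal, and the exemplar values, the `bandwidth` of 0.3, and all function names are hypothetical choices made for this example.

```python
import math

def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def mixture_density(x, components):
    """Finite mixture density; components is a list of (weight, mean, sd)."""
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in components)

def single_component(exemplars):
    """Parametric extreme: one Gaussian fitted to all exemplars of a category."""
    n = len(exemplars)
    mean = sum(exemplars) / n
    var = sum((e - mean) ** 2 for e in exemplars) / n
    return [(1.0, mean, math.sqrt(var))]

def one_component_per_exemplar(exemplars, bandwidth):
    """Nonparametric extreme: an equally weighted Gaussian on every exemplar."""
    n = len(exemplars)
    return [(1.0 / n, e, bandwidth) for e in exemplars]

def prob_category_a(x, comps_a, comps_b):
    """P(category A | x) with equal category priors."""
    da = mixture_density(x, comps_a)
    db = mixture_density(x, comps_b)
    return da / (da + db)

# Hypothetical training exemplars: A has two clusters, B lies between them.
cat_a = [0.0, 0.2, 4.0, 4.2]
cat_b = [1.9, 2.0, 2.1, 2.2]

param_a, param_b = single_component(cat_a), single_component(cat_b)
exemp_a = one_component_per_exemplar(cat_a, bandwidth=0.3)
exemp_b = one_component_per_exemplar(cat_b, bandwidth=0.3)

for x in (0.1, 2.0, 3.0):
    print(x,
          round(prob_category_a(x, param_a, param_b), 3),
          round(prob_category_a(x, exemp_a, exemp_b), 3))
```

Intermediate members of the family would use more than one but fewer than n components per category. With these made-up exemplars the two extremes agree near the training items but disagree at x = 3.0, where the single broad Gaussian for A dominates in one case and the nearby B exemplars dominate in the other.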
Knowledge and Concept Learning
, 1997
Cited by 19 (6 self)
... positive side, though, the second person might have some advantage over the first person in learning how to shift gears, because the second person would not have to overcome negative transfer from experience with automatic transmissions. As another example, imagine that you are an explorer visiting a remote island, with the purpose of writing a book about the people that you see there. You bring to this island many forms of prior knowledge that will guide you in learning about these new people. For example, based on your experiences in other places, you would expect to see males and females, younger and older people, shy people and arrogant people. You would also have certain hypotheses at a more abstract level, for example, that the clothes that someone wears may be related to the person's age and gender. (Goodman, 1955, referred to such abstract hypotheses as overhypotheses.) In a way, these biases due to previous knowledge might seem to be undesirable. After all, wouldn't it be be...
Locally Bayesian Learning with Applications to Retrospective Revaluation and Highlighting
- Psychological Review
, 2006
Cited by 16 (0 self)
A scheme is described for locally Bayesian parameter updating in models structured as successions of component functions. The essential idea is to back-propagate the target data to interior modules, such that an interior component’s target is the input to the next component that maximizes the probability of the next component’s target. Each layer then does locally Bayesian learning. The approach assumes online trial-by-trial learning. The resulting parameter updating is not globally Bayesian but can better capture human behavior. The approach is implemented for an associative learning model that first maps inputs to attentionally filtered inputs and then maps attentionally filtered inputs to outputs. The Bayesian updating allows the associative model to exhibit retrospective revaluation effects such as backward blocking and unovershadowing, which have been challenging for associative learning models. The back-propagation of target values to attention allows the model to show trial-order effects, including highlighting and differences in magnitude of forward and backward blocking, which have been challenging for Bayesian learning models.

