Results 1–10 of 20
Theory-based causal induction
2003
Abstract
Cited by 37 (15 self)
Inducing causal relationships from observations is a classic problem in scientific inference, statistics, and machine learning. It is also a central part of human learning, and a task that people perform remarkably well given its notorious difficulties. People can learn causal structure in various settings, from diverse forms of data: observations of the co-occurrence frequencies between causes and effects, interactions between physical objects, or patterns of spatial or temporal coincidence. These different modes of learning are typically thought of as distinct psychological processes and are rarely studied together, but at heart they present the same inductive challenge—identifying the unobservable mechanisms that generate observable relations between variables, objects, or events, given only sparse and limited data. We present a computational-level analysis of this inductive problem and a framework for its solution, which allows us to model all these forms of causal learning in a common language. In this framework, causal induction is the product of domain-general statistical inference guided by domain-specific prior knowledge, in the form of an abstract causal theory. We identify three key aspects of abstract prior knowledge—the ontology of entities, properties, and relations that organizes a domain; the plausibility of specific causal relationships; and the functional form of those relationships—and show how they provide the constraints that people need to induce useful causal models from sparse data.
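The statistical core of this framework can be illustrated with a small example: comparing the marginal likelihood of a causal graph that includes a cause-to-effect link against one that omits it, under a noisy-OR functional form. This is a minimal sketch in the spirit of the approach, with a crude grid approximation and invented data, not the authors' implementation.

```python
import math

def loglik(data, w0, w1):
    """Log-likelihood of (cause, effect) observations under a noisy-OR
    with an always-present background cause of strength w0."""
    ll = 0.0
    for c, e in data:
        p = 1 - (1 - w0) * (1 - w1) ** c  # P(effect) via noisy-OR
        ll += math.log(p) if e else math.log(1 - p)
    return ll

def log_marginal(data, edge_present, steps=25):
    """Grid-approximate log P(data | graph) with uniform priors on strengths."""
    grid = [(i + 0.5) / steps for i in range(steps)]
    if edge_present:
        total = sum(math.exp(loglik(data, w0, w1)) for w0 in grid for w1 in grid)
        return math.log(total / (steps * steps))
    total = sum(math.exp(loglik(data, w0, 0.0)) for w0 in grid)
    return math.log(total / steps)

# "causal support": log evidence for the graph containing the cause->effect edge
data = [(1, 1)] * 8 + [(1, 0)] * 2 + [(0, 1)] * 2 + [(0, 0)] * 8
support = log_marginal(data, True) - log_marginal(data, False)
print(support)
```

Positive support favors the edge; the prior over graphs and the choice of functional form are exactly the points where, on the paper's account, domain-specific theories enter.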
Rational approximations to rational models: Alternative algorithms for category learning
Abstract
Cited by 26 (6 self)
Rational models of cognition typically consider the abstract computational problems posed by the environment, assuming that people are capable of optimally solving those problems. This differs from more traditional formal models of cognition, which focus on the psychological processes responsible for behavior. A basic challenge for rational models is thus explaining how optimal solutions can be approximated by psychological processes. We outline a general strategy for answering this question, namely to explore the psychological plausibility of approximation algorithms developed in computer science and statistics. In particular, we argue that Monte Carlo methods provide a source of “rational process models” that connect optimal solutions to psychological processes. We support this argument through a detailed example, applying this approach to Anderson’s (1990, 1991) Rational Model of Categorization (RMC), which involves a particularly challenging computational problem. Drawing on a connection between the RMC and ideas from nonparametric Bayesian statistics, we propose two alternative algorithms for approximate inference in this model: Gibbs sampling and particle filtering.
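The connection to nonparametric Bayesian statistics can be made concrete with a toy Gibbs sampler over cluster assignments under a Chinese-restaurant-process prior, using a Beta-Bernoulli likelihood for a single binary feature. This is a sketch of the general recipe only; the paper's models, stimuli, and parameter settings differ.

```python
import random

def crp_gibbs(features, alpha=1.0, a=1.0, b=1.0, iters=50, seed=0):
    """Gibbs sampling of cluster assignments under a CRP prior with a
    Beta(a, b)-Bernoulli likelihood over one binary feature per item."""
    rng = random.Random(seed)
    n = len(features)
    z = [0] * n  # start with everyone in a single cluster
    for _ in range(iters):
        for i in range(n):
            # tabulate the other items' clusters, then resample item i
            counts, ones = {}, {}
            for j in range(n):
                if j == i:
                    continue
                counts[z[j]] = counts.get(z[j], 0) + 1
                ones[z[j]] = ones.get(z[j], 0) + features[j]
            options, weights = [], []
            for k, nk in counts.items():
                # CRP prior (n_k) times posterior-predictive likelihood
                p1 = (ones[k] + a) / (nk + a + b)
                options.append(k)
                weights.append(nk * (p1 if features[i] else 1 - p1))
            # a brand-new cluster, weighted by the concentration alpha
            p1 = a / (a + b)
            options.append(max(counts, default=-1) + 1)
            weights.append(alpha * (p1 if features[i] else 1 - p1))
            z[i] = rng.choices(options, weights=weights)[0]
    return z

z = crp_gibbs([1, 1, 1, 0, 0, 0])
print(z)
```

Each sweep resamples one item's assignment from its conditional distribution, which is all a Gibbs sampler requires; a particle filter would instead process the items sequentially.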
Modeling Human Performance in Statistical Word Segmentation
Abstract
Cited by 22 (7 self)
What mechanisms support the ability of human infants, adults, and other primates to identify words from fluent speech using distributional regularities? In order to better characterize this ability, we collected data from adults in an artificial language segmentation task similar to Saffran, Newport, and Aslin (1996) in which the length of sentences was systematically varied between groups of participants. We then compared the fit of a variety of computational models—including simple statistical models of transitional probability and mutual information, a clustering model based on mutual information by Swingley (2005), PARSER (Perruchet & Vinter, 1998), and a Bayesian model. We found that while all models were able to successfully complete the task, fit to the human data varied considerably, with the Bayesian model achieving the highest correlation with our results.
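The simplest of the compared models, forward transitional probability, is easy to state in code. The sketch below (syllables, corpus, and threshold all invented for illustration) posits a word boundary wherever P(next syllable | current syllable) dips below a cutoff.

```python
from collections import Counter

def transitional_probs(corpus):
    """Forward transitional probabilities P(b | a) between adjacent syllables."""
    pair_counts, syll_counts = Counter(), Counter()
    for utterance in corpus:
        for a, b in zip(utterance, utterance[1:]):
            pair_counts[(a, b)] += 1
            syll_counts[a] += 1
    return {(a, b): c / syll_counts[a] for (a, b), c in pair_counts.items()}

def segment(utterance, tps, threshold=0.75):
    """Insert a word boundary wherever the transitional probability dips
    below the threshold (a simple baseline, not a fitted model)."""
    words, current = [], [utterance[0]]
    for a, b in zip(utterance, utterance[1:]):
        if tps.get((a, b), 0.0) < threshold:
            words.append(current)
            current = []
        current.append(b)
    words.append(current)
    return words

# syllable streams built from the nonce words go-la-bu, tu-pi-ro, da-ko-ti
corpus = [
    ["go", "la", "bu", "tu", "pi", "ro"],
    ["go", "la", "bu", "da", "ko", "ti"],
    ["tu", "pi", "ro", "go", "la", "bu"],
    ["da", "ko", "ti", "go", "la", "bu"],
]
tps = transitional_probs(corpus)
print(segment(corpus[0], tps))  # → [['go', 'la', 'bu'], ['tu', 'pi', 'ro']]
```

Within-word transitions recur across utterances and stay near 1, while transitions that span a word boundary are diluted by the varying word order, which is what the boundary test exploits.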
One and done? Optimal decisions from very few samples
Cognitive Science Society, 2009
Abstract
Cited by 18 (6 self)
In many situations human behavior approximates that of a Bayesian ideal observer, suggesting that, at some level, cognition can be described as Bayesian inference. However, a number of findings have highlighted an intriguing mismatch between human behavior and that predicted by Bayesian inference: people often appear to make judgments based on a few samples from a probability distribution, rather than the full distribution. Although sample-based approximations are a common implementation of Bayesian inference, the very limited number of samples used by humans seems to be insufficient to approximate the required probability distributions. Here we consider this discrepancy in the broader framework of statistical decision theory, and ask: if people were making decisions based on samples, but samples were costly, how many samples should people use? We find that under reasonable assumptions about how long it takes to produce a sample, locally suboptimal decisions based on few samples are globally optimal. These results reconcile a large body of work showing sampling, or probability-matching, behavior with the hypothesis that human cognition is well described as Bayesian inference, and suggest promising future directions for studies of resource-constrained cognition.
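The cost-benefit argument can be reproduced in miniature: if an agent decides between two options by majority vote over k posterior samples, and each sample costs time, the k that maximizes correct decisions per unit time is often very small. The constants below are illustrative, not the paper's.

```python
from math import comb

def p_correct(p, k):
    """Probability that a majority vote of k samples from a posterior picks
    the option with true posterior probability p (ties split by coin flip)."""
    total = 0.0
    for m in range(k + 1):
        pm = comb(k, m) * p**m * (1 - p) ** (k - m)
        if 2 * m > k:
            total += pm
        elif 2 * m == k:
            total += 0.5 * pm
    return total

def best_k(p, sample_cost, action_cost=1.0, max_k=100):
    """The sample count k maximizing expected correct decisions per unit time."""
    rate = lambda k: p_correct(p, k) / (action_cost + k * sample_cost)
    return max(range(1, max_k + 1), key=rate)

# with moderately costly samples, a single sample is globally optimal
print(best_k(p=0.7, sample_cost=0.5))  # → 1
```

Making samples nearly free (e.g. `sample_cost=0.001`) pushes the optimum up, recovering the intuition that more samples only pay off when they are cheap relative to acting.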
A Probabilistic Model of Syntactic and Semantic Acquisition from Child-Directed Utterances and their Meanings
Abstract
Cited by 13 (2 self)
This paper presents an incremental probabilistic learner that models the acquisition of syntax and semantics from a corpus of child-directed utterances paired with possible representations of their meanings. These meaning representations approximate the contextual input available to the child; they do not specify the meanings of individual words or syntactic derivations. The learner then has to infer the meanings and syntactic properties of the words in the input along with a parsing model. We use the CCG grammatical framework and train a nonparametric Bayesian model of parse structure with online variational Bayesian expectation maximization. When tested on utterances from the CHILDES corpus, our learner outperforms a state-of-the-art semantic parser. In addition, it models such aspects of child acquisition as “fast mapping,” while also countering previous criticisms of statistical syntactic learners.
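The lexical side of this kind of inference can be caricatured with a cross-situational toy: intersecting candidate referents across the situations in which a word occurs rapidly pins down its meaning, which is the flavor of "fast mapping." This stand-in ignores the paper's CCG syntax and variational Bayesian machinery entirely; the corpus and meaning symbols below are invented.

```python
def cross_situational(pairs):
    """Map each word to the intersection of the candidate referent sets of
    every situation in which the word occurs."""
    candidates = {}
    for utterance, referents in pairs:
        for w in utterance.split():
            if w not in candidates:
                candidates[w] = set(referents)
            else:
                candidates[w] &= set(referents)
    return candidates

# utterance paired with an (ambiguous) set of candidate meanings, as a crude
# analogue of child-directed input paired with contextual meaning candidates
pairs = [
    ("the dog runs", {"DOG", "RUN"}),
    ("the dog sleeps", {"DOG", "SLEEP"}),
    ("the cat sleeps", {"CAT", "SLEEP"}),
]
meanings = cross_situational(pairs)
print(meanings["dog"], meanings["sleeps"])
```

Two exposures suffice to isolate DOG and SLEEP, while function words like "the" intersect down to the empty set; a probabilistic learner softens these hard intersections but exploits the same cross-situational signal.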
A rational model of eye movement control in reading
In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL)
Abstract
Cited by 8 (3 self)
A number of results in the study of real-time sentence comprehension have been explained by computational models as resulting from the rational use of probabilistic linguistic information. Many times, these hypotheses have been tested in reading by linking predictions about relative word difficulty to word-aggregated eye tracking measures such as go-past time. In this paper, we extend these results by asking to what extent reading is well-modeled as rational behavior at a finer level of analysis, predicting not aggregate measures, but the duration and location of each fixation. We present a new rational model of eye movement control in reading, the central assumption of which is that eye movement decisions are made to obtain noisy visual information as the reader performs Bayesian inference on the identities of the words in the sentence. As a case study, we present two simulations demonstrating that the model gives a rational explanation for between-word regressions.
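The model's central assumption, Bayesian inference on word identities from noisy visual input, can be sketched as a posterior update from noisy letter samples. The letter confusion model and the three-word lexicon below are invented for illustration; the actual model is far richer.

```python
import math

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def letter_lik(observed, true, noise=0.2):
    """P(observed letter | true letter): correct with prob 1 - noise,
    otherwise uniform over the remaining letters (an assumed confusion model)."""
    return 1 - noise if observed == true else noise / (len(ALPHABET) - 1)

def posterior(candidates, prior, samples, noise=0.2):
    """Posterior over word identities given noisy (position, letter) samples."""
    scores = []
    for w, p in zip(candidates, prior):
        ll = math.log(p)
        for pos, obs in samples:
            ll += math.log(letter_lik(obs, w[pos], noise))
        scores.append(ll)
    z = max(scores)
    exps = [math.exp(s - z) for s in scores]  # normalize in log space
    total = sum(exps)
    return [e / total for e in exps]

words = ["cat", "car", "can"]
prior = [1 / 3] * 3
# two noisy glimpses of the final letter, both reading "t"
post = posterior(words, prior, [(2, "t"), (2, "t")])
print(words[post.index(max(post))])  # → cat
```

On this view, an extra fixation is just another sample, and a regression is rational when the posterior over an earlier word's identity has become uncertain again.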
Cube Summing, Approximate Inference with Non-Local Features, and Dynamic Programming without Semirings
Abstract
Cited by 7 (3 self)
We introduce cube summing, a technique that permits dynamic programming algorithms for summing over structures (like the forward and inside algorithms) to be extended with non-local features that violate the classical structural independence assumptions. It is inspired by cube pruning (Chiang, 2007; Huang and Chiang, 2007) in its computation of non-local features dynamically using scored k-best lists, but also maintains additional residual quantities used in calculating approximate marginals. When restricted to local features, cube summing reduces to a novel semiring (k-best+residual) that generalizes many of the semirings of Goodman (1999). When non-local features are included, cube summing does not reduce to any semiring, but is compatible with generic techniques for solving dynamic programming equations.
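The semiring view that cube summing generalizes can be illustrated with a forward recursion parameterized by (plus, times, zero, one): the same code computes total path probability under the sum-product semiring and best-path weight under the max-product (Viterbi) semiring. This is background illustration of the semiring abstraction, not the paper's k-best+residual semiring.

```python
def forward(transitions, start, steps, plus, times, zero, one):
    """Generic forward algorithm over a weighted transition table,
    abstracted over the semiring operations (plus, times, zero, one)."""
    states = {s for (s, _) in transitions} | {t for (_, t) in transitions}
    value = {s: one if s == start else zero for s in states}
    for _ in range(steps):
        new = {s: zero for s in states}
        for (a, b), w in transitions.items():
            new[b] = plus(new[b], times(value[a], w))
        value = new
    return value

trans = {("A", "A"): 0.5, ("A", "B"): 0.5, ("B", "B"): 1.0}
# sum-product semiring: total probability mass at each state after 2 steps
print(forward(trans, "A", 2, lambda x, y: x + y, lambda x, y: x * y, 0.0, 1.0))
# max-product semiring: weight of the single best path to each state
print(forward(trans, "A", 2, max, lambda x, y: x * y, 0.0, 1.0))
```

Swapping in a k-best+residual value type, as the paper does, changes only what `plus` and `times` operate on; the recursion itself is untouched, which is the point of the semiring formulation.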
A Simple Sequential Algorithm for Approximating Bayesian Inference
Abstract
Cited by 4 (1 self)
People can apparently make surprisingly sophisticated inductive inferences, despite the fact that there are constraints on cognitive resources that would make performing exact Bayesian inference computationally intractable. What algorithms could they be using to make this possible? We show that a simple sequential algorithm, Win-Stay, Lose-Shift (WSLS), can be used to approximate Bayesian inference, and is consistent with human behavior on a causal learning task. This algorithm provides a new way to understand people’s judgments and a new efficient method for performing Bayesian inference.
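A generic WSLS loop over a discrete hypothesis space is only a few lines: stay with the current hypothesis while it is consistent with each observation, and on failure shift to a random hypothesis consistent with that observation. The interval-hypothesis example is invented; the paper's version is calibrated to a causal learning task.

```python
import random

def wsls(hypotheses, consistent, data, seed=0):
    """Win-Stay, Lose-Shift: keep the current hypothesis while it explains
    each observation; on failure, jump to a random consistent alternative."""
    rng = random.Random(seed)
    current = rng.choice(hypotheses)
    for obs in data:
        if not consistent(current, obs):  # "lose": shift
            survivors = [h for h in hypotheses if consistent(h, obs)]
            current = rng.choice(survivors)
    return current

# toy concept learning: hypotheses are intervals [lo, hi]; each observation
# is a number known to fall inside the true interval
hypotheses = [(0, 4), (0, 9), (3, 7), (5, 9)]
consistent = lambda h, x: h[0] <= x <= h[1]
final = wsls(hypotheses, consistent, [6, 2, 8, 1, 7])
print(final)
```

The appeal of the scheme is that each step touches only one hypothesis and the current observation, yet over a sequence of observations its choice distribution can track the Bayesian posterior.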
Seeking Confirmation Is Rational for Deterministic Hypotheses
Abstract
Cited by 3 (1 self)
The tendency to test outcomes that are predicted by our current theory (the confirmation bias) is one of the best-known biases of human decision making. We prove that the confirmation bias is an optimal strategy for testing hypotheses when those hypotheses are deterministic, each making a single prediction about the next event in a sequence. Our proof applies for two normative standards commonly used for evaluating hypothesis testing: maximizing expected information gain and maximizing the probability of falsifying the current hypothesis. This analysis rests on two assumptions: (a) that people predict the next event in a sequence in a way that is consistent with Bayesian inference; and (b) that when testing hypotheses, people test the hypothesis to which they assign highest posterior probability. We present four behavioral experiments that support these assumptions, showing that a simple Bayesian model can capture people’s predictions about numerical sequences (Experiments 1 and 2), and that we can alter the hypotheses that people choose to test by manipulating the prior probability of those hypotheses (Experiments 3 and 4).
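The information-gain side of the argument can be checked numerically: when deterministic hypotheses each predict a single next outcome, testing the prediction of the highest-probability hypothesis maximizes expected information gain. A small sketch with an invented prior (not the paper's experimental stimuli):

```python
import math

def entropy(ps):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in ps if p > 0)

def eig_of_test(prior, predictions, x):
    """Expected information gain from checking whether the next event is x,
    where prior[i] = P(h_i) and predictions[i] is the single outcome the
    deterministic hypothesis h_i predicts."""
    p_yes = sum(p for p, pred in zip(prior, predictions) if pred == x)
    eig = entropy(prior)
    for match, po in ((True, p_yes), (False, 1 - p_yes)):
        if po == 0:
            continue
        post = [p / po for p, pred in zip(prior, predictions)
                if (pred == x) == match]
        eig -= po * entropy(post)  # expected posterior entropy
    return eig

prior = [0.5, 0.3, 0.2]
predictions = ["a", "b", "c"]
for x in predictions:
    print(x, eig_of_test(prior, predictions, x))
```

Testing "a", the prediction of the most probable hypothesis, yields the largest expected entropy reduction here, matching the optimality claim for this normative standard.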
Recursion in grammar . . .
Abstract
Recursion in grammar and performance In the last 50 years of cognitive science, linguistic theory has proposed more and more articulated structures, while computer science has shown that simpler, flatter structures are more easily processed. If we are interested in adequate models of human linguistic abilities, models that explain the very rapid and accurate human recognition and production of ordinary fluent speech, it seems we need to come to some appropriate understanding of the relationship between these apparently opposing pressures for more and less structure. Here we show how the apparent conflict disappears when it is considered more carefully. Even when we regard the linguists’ project as a psychological one, there is no pressure for linguists to abandon their rather deep structures in order to account for our easy production and recognition of fluent speech. The deeper, more recursive structures reflect insights into similarities among linguistic constituents and operations, but a processor can compute exactly these structures without the extra effort that deeper analyses might seem to require. To show how this works, we