## A tutorial introduction to Bayesian models of cognitive development

Citations: 2 (0 self)

### BibTeX

@MISC{Perfors_atutorial,
  author = {Amy Perfors and Joshua B. Tenenbaum and Thomas L. Griffiths and Fei Xu},
  title = {A tutorial introduction to Bayesian models of cognitive development},
  year = {}
}

### Abstract

We present an introduction to Bayesian inference as it is used in probabilistic models of cognitive development. Our goal is to provide an intuitive and accessible guide to the what, the how, and the why of the Bayesian approach: what sorts of problems and data the framework is most relevant for, and how and why it may be useful for developmentalists. We emphasize a qualitative understanding of Bayesian inference, but also include information about additional resources for those interested in the cognitive science applications, mathematical foundations, or machine learning details in more depth. In addition, we discuss some important interpretation issues that often arise when evaluating Bayesian models in cognitive science.
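As a concrete illustration of the kind of inference the tutorial introduces, here is a minimal Python sketch of Bayes' rule applied to a hypothetical word-learning scenario. The hypotheses, priors, and hypothesis "sizes" below are invented for illustration; they are not taken from the paper.

```python
def posterior(priors, likelihoods):
    """Combine prior and likelihood for each hypothesis, then normalize (Bayes' rule)."""
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}

# Three nested hypotheses for what a novel word might mean, after hearing it
# used once for a Labrador. Priors and hypothesis sizes are made up.
priors = {"Labradors": 0.3, "dogs": 0.4, "animals": 0.3}

# "Size principle" likelihoods: a hypothesis covering fewer objects assigns
# higher probability (1 / size) to any one example it does cover.
likelihoods = {"Labradors": 1 / 10, "dogs": 1 / 100, "animals": 1 / 1000}

post = posterior(priors, likelihoods)
# The most specific hypothesis consistent with the data wins here.
print(max(post, key=post.get))  # Labradors
```

The same normalization step works for any finite hypothesis space; only the priors and likelihoods change from model to model.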

### Citations

4140 | Artificial intelligence: a modern approach - Russell, Norvig - 1995 |

1846 | The Foundations of Statistics - Savage - 1972 |

1451 | Judgment under uncertainty: Heuristics and biases - Tversky, Kahneman - 1974 |
Citation Context: ...After all, people’s everyday reasoning can be said to be many things, but few would aver that it is always optimal, subject as it is to emotions, heuristics, and biases of many different sorts (e.g., Tversky & Kahneman, 1974). However, even if humans are non-optimal thinkers in many ways – and there is no reason to think they are in every way – it is impossible to know this without being able to precisely specify what op...

1374 | Statistical Decision Theory and Bayesian Analysis - Berger - 1985 |

1280 | Information Theory, Inference and Learning Algorithms - MacKay - 2003 |
Citation Context: ...height and a width. These might generate quantitatively different prior probabilities but would still give a qualitatively similar tradeoff between complexity and fit. The “Bayesian Ockham’s razor” (MacKay, 2003) thus removes much of the subjectivity inherent in assessing simplicity of an explanation. Note that in any of these generative accounts where hypotheses are generated by a sequence of choices, ear...
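The complexity-fit tradeoff behind the Bayesian Ockham's razor mentioned in the excerpt above can be shown with a toy calculation (the outcome counts here are invented): a flexible hypothesis spreads its probability over many possible datasets, so it assigns less probability to any one of them than a simple hypothesis that concentrates on a few.

```python
# Bayesian Ockham's razor sketch with made-up numbers.
n_outcomes_simple = 4      # a simple hypothesis can generate only 4 datasets
n_outcomes_complex = 100   # a complex hypothesis can generate 100 datasets

# Each hypothesis spreads probability uniformly over the datasets it allows,
# so the likelihood of one observed dataset is 1 / (number of datasets).
p_data_given_simple = 1 / n_outcomes_simple    # 0.25
p_data_given_complex = 1 / n_outcomes_complex  # 0.01

# With equal priors, the posterior odds reduce to the likelihood ratio:
odds_simple_vs_complex = p_data_given_simple / p_data_given_complex
print(odds_simple_vs_complex)  # ≈ 25: the simpler hypothesis is strongly favored
```

If the data had fallen outside the simple hypothesis's four datasets, its likelihood would be zero and the complex hypothesis would win, which is the "fit" side of the tradeoff.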

1244 | Causality : models, reasoning, and inference - Pearl - 2000 |

1238 | Modeling by shortest data description - Rissanen - 1978 |
Citation Context: ...mation-theoretic models in which probabilities of data or hypotheses are replaced by the lengths (in bits) of messages that communicate them to a receiver. The result is known as the “MDL Principle” (Rissanen, 1978), and is related to Kolmogorov complexity (Solomonoff, 1964; Kolmogorov, 1965). The Bayesian version applies given certain assumptions about the randomness of the data relative to the hypotheses and ...
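The MDL-Bayes correspondence in the excerpt above rests on the coding-theory fact that an event of probability p can be encoded in about -log2(p) bits, so maximizing P(h)P(d|h) is the same as minimizing a total description length L(h) + L(d|h). A toy check with made-up probabilities:

```python
import math

def bits(p):
    """Optimal codeword length, in bits, for an event of probability p."""
    return -math.log2(p)

# Illustrative probabilities for a hypothesis and for data given that hypothesis.
p_h, p_d_given_h = 0.25, 0.125

total_length = bits(p_h) + bits(p_d_given_h)  # L(h) + L(d|h) = 2 + 3 = 5 bits
same = bits(p_h * p_d_given_h)                # -log2 of the joint probability
print(total_length, same)  # both are 5.0: code lengths add as probabilities multiply
```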

1127 | Vision: A computational investigation into the human representation and processing of visual information - Marr - 1982 |
Citation Context: ...t relatively easy to incorporate constraints based on memory, attention, or perception directly into one’s model. Many applications of Bayesian modelling operate on the level of computational theory (Marr, 1982), which seeks to understand cognition based on what its goal is, why that goal would be appropriate, and the constraints on achieving that goal, rather than precisely how it is implemented algorithmi...

786 | Principles of categorization - Rosch - 1978 |
Citation Context: ...rshkoff-Stowe, & Samuelson, 2002). Children also acquire other sorts of inductive constraints over the course of development, including the realization that categories may be organized taxonomically (Rosch, 1978), that some verbs occur in alternating patterns and others don’t (e.g., Pinker, 1989) or that comparative orderings should be transitive (Shultz & Vogel, 2004). How can an inductive constraint be lea...

684 | The Language of Thought - Fodor - 1975 |
Citation Context: ...glance to be captured by a model that simply does hypothesis testing within an already-specified hypothesis space. The same intuition lies at the core of Fodor’s famous puzzle of concept acquisition (Fodor, 1975, 1981). His essential point is that one cannot learn anything via hypothesis testing because one must possess it in order to test it in the first place. Therefore, except for those concepts that can ...

643 | Bayesian Learning in Neural Networks - Neal - 1996 |

636 | Connectionism and cognitive architecture: A critical analysis - Fodor, Pylyshyn - 1988 |
Citation Context: ...is both learned and structured). As a result of this flexibility, traditional critiques of connectionism that focus on their inability to adequately capture compositionality and systematicity (e.g., Fodor & Pylyshyn, 1988) do not apply to Bayesian models. In fact, there are several recent examples of Bayesian models that embrace language-like or compositional representations in domains ranging from causal induction (G...

624 | Markov Chain Monte Carlo in Practice - Gilks, Richardson, et al. - 1996 |

606 | Basic objects in natural categories - Rosch, Mervis, et al. - 1976 |
Citation Context: ...refers to dogs, mammals, Labradors, canines, or living beings. One solution would be to add another constraint – the presumption that count nouns map preferentially to the basic level in a taxonomy (Rosch, Mervis, Gray, Johnson, & Boyes-Braem, 1976). This preference would allow children to learn names for basic-level categories, but would be counterproductive for every other kind of word. Xu and Tenenbaum (2007b) present a Bayesian model of wor...

590 | Probabilistic inference using Markov chain Monte Carlo methods - Neal - 1993 |

572 | Probability Theory: The Logic of Science - Jaynes - 2003 |
Citation Context: ...ability theory is not simply a set of ad hoc rules useful for manipulating and evaluating statistical information: it is also the set of unique, consistent rules for conducting plausible inference (Jaynes, 2003). In essence, it is an extension of deductive logic to the case where propositions have degrees of truth or falsity – that is, it is identical to deductive logic if we know all the propositions with 1...

562 | The Theory of Probability - Jeffreys - 1961 |

555 | Three approaches to the quantitative definition of information - Kolmogorov - 1965 |
Citation Context: ...laced by the lengths (in bits) of messages that communicate them to a receiver. The result is known as the “MDL Principle” (Rissanen, 1978), and is related to Kolmogorov complexity (Solomonoff, 1964; Kolmogorov, 1965). The Bayesian version applies given certain assumptions about the randomness of the data relative to the hypotheses and the hypotheses relative to the prior (Vitànyi & Li, 2000). Both versions apply...

550 | Parallel distributed processing: explorations in the microstructure of cognition, vol. 1. Foundations - Rumelhart, McClelland - 1986 |
Citation Context: ...st potentially. This is one reason for the popularity of the Parallel Distributed Processing, or connectionist, approach, which was developed as a neurally inspired model of the cognitive process (Rumelhart & McClelland, 1986). Connectionist networks, like the brain, contain many highly interconnected, active processing units (like neurons) that communicate with each other by sending activation or inhibition through their...

526 | Bayesian Inference in Statistical Analysis - BOX, C - 1973 |

434 | Conceptual change in childhood - Carey - 1985 |
Citation Context: ...Consider the “theory theory” view of cognitive development. Children’s knowledge about the world is organized into intuitive theories with a structure and function analogous to scientific theories (Carey, 1985; Gopnik & Meltzoff, 1997; Karmiloff-Smith, 1988; Keil, 1989). The theory serves as an abstract framework that guides inductive generalization at more concrete levels of knowledge, by generating a spa...

421 | How Children Learn the Meanings of Words - Bloom - 2000 |
Citation Context: ...of the word, young children are surprisingly adept at acquiring the meanings of words – even when there are only a few examples, and even when there is no systematic negative evidence (Markman, 1989; Bloom, 2000). How do children learn word meanings so well, so quickly? One suggestion is that infants are born equipped with strong prior knowledge about what sort of word meanings are natural (Carey, 1978; Mark...

389 | The role of theories in conceptual coherence - Murphy, Medin - 1985 |
Citation Context: ...t framework that guides inductive generalization at more concrete levels of knowledge, by generating a space of hypotheses. Intuitive theories have been posited to underlie real-world categorization (Murphy & Medin, 1985), causal induction (Waldmann, 1996; Griffiths & Tenenbaum, 2009), biological reasoning (Atran, 1995; Inagaki & Hatano, 2002; Medin & Atran, 1999), physical reasoning (McCloskey, 1983) and social inte...

384 | Learnability and Cognition: The Acquisition of Argument Structure - Pinker - 1989 |
Citation Context: ...nstraints over the course of development, including the realization that categories may be organized taxonomically (Rosch, 1978), that some verbs occur in alternating patterns and others don’t (e.g., Pinker, 1989) or that comparative orderings should be transitive (Shultz & Vogel, 2004). How can an inductive constraint be learned, and how might a Bayesian framework explain this? Is it possible to acquire an i...

380 | On the approximate realization of continuous mapping by neural networks - Funahashi - 1989 |
Citation Context: ...lementations of Bayesian inference (e.g., Funahashi, 1998; McClelland, 1998; MacKay, 2003), corresponding to a computational-level model whose hypothesis space is a set of continuous functions (e.g., Funahashi, 1989; Stinchcombe & White, 1989). This is a large space, but Bayesian inference can entertain hypothesis spaces that are equivalently large. Does this mean that there is no difference between Bayesian mod...

333 | Concepts, kinds, and cognitive development - Keil - 1989 |
Citation Context: ...Children’s knowledge about the world is organized into intuitive theories with a structure and function analogous to scientific theories (Carey, 1985; Gopnik & Meltzoff, 1997; Karmiloff-Smith, 1988; Keil, 1989). The theory serves as an abstract framework that guides inductive generalization at more concrete levels of knowledge, by generating a space of hypotheses. Intuitive theories have been posited to un...

257 | From covariation to causation: a causal power theory - Cheng - 1997 |

249 | Fact, Fiction, and Forecast - Goodman - 1954 |

239 | The adaptive nature of human categorization - Anderson - 1991 |

233 | A Child’s Theory of Mind - Wellman - 1990 |
Citation Context: ...996; Griffiths & Tenenbaum, 2009), biological reasoning (Atran, 1995; Inagaki & Hatano, 2002; Medin & Atran, 1999), physical reasoning (McCloskey, 1983) and social interaction (Nichols & Stich, 2003; Wellman, 1990). For instance, an intuitive theory of mind generates hypotheses about how a specific agent’s behavior might be explained in particular situations – candidate explanations framed in terms of mental s...

226 | Words, thoughts and theories - Gopnik, Meltzoff - 1997 |
Citation Context: ...e “theory theory” view of cognitive development. Children’s knowledge about the world is organized into intuitive theories with a structure and function analogous to scientific theories (Carey, 1985; Gopnik & Meltzoff, 1997; Karmiloff-Smith, 1988; Keil, 1989). The theory serves as an abstract framework that guides inductive generalization at more concrete levels of knowledge, by generating a space of hypotheses. Intuiti...

215 | Data Analysis: A Bayesian Tutorial - Sivia, Skilling |

193 | Hierarchical Bayesian inference in the visual cortex - Lee, Mumford - 2003 |
Citation Context: ...w a resource-limited learner might reason. Likewise, much work in computational neuroscience focuses on the implementational level, but is Bayesian in character (e.g., Pouget, Dayan, & Zemel, 2003; T. Lee & Mumford, 2003; Zemel, Huys, Natarajan, & Dayan, 2005; Ma, Beck, Latham, & Pouget, 2006; Doya, Ishii, Pouget, & Rao, 2007; Rao, 2007). We discuss the implications of this work in the next section. 5.2 Biological pl...

192 | Formal Principles of Language Acquisition - Wexler, Culicover - 1980 |
Citation Context: ...eses as the amount of observed data increases. In language acquisition, a traditional solution to the problem of constraining generalization in the absence of negative evidence is the Subset Principle (Wexler & Culicover, 1980; Berwick, 1986): learners should choose the most specific grammar consistent with the observed data. In scientific theorizing, the classical form of Ockham’s Razor speaks similarly: entities should n...

178 | A theory of causal learning in children: Causal maps and Bayes nets - Gopnik, Glymour, et al. - 2004 |

177 | Probability, frequency and reasonable expectation - Cox - 1946 |
Citation Context: ...bility should be the same regardless of how you got there; etc. The basic axioms and theorems of probability theory, including Bayes’ Rule, emerge when these desiderata are formalized mathematically (Cox, 1946, 1961), and correspond to common-sense reasoning and the scientific method. Put another way, Bayesian probability theory is “optimal inference” in the sense that a non-Bayesian reasoner attempting to...

153 | Probable networks and plausible predictions — a review of practical Bayesian methods for supervised neural networks - MacKay - 1995 |

151 | Introduction to Monte Carlo methods - MacKay - 1998 |

144 | The importance of shape in early lexical learning - Smith, Jones - 1988 |
Citation Context: ...A wealth of research indicates that people are capable of acquiring this sort of knowledge, both rapidly in the lab (Nosofsky, 1986; Perfors & Tenenbaum, 2009) and over the course of development (Landau, Smith, & Jones, 1988; L. Smith, Jones, Landau, Gershkoff-Stowe, & Samuelson, 2002). Children also acquire other sorts of inductive constraints over the course of development, including the realization that categories may...

137 | Models of ecological rationality: The recognition heuristic - Goldstein, Gigerenzer - 2002 |
Citation Context: ...its step-by-step computations map onto anything like the algorithms used by current Bayesian models. Just as optimal decision-making can be approximated under certain conditions by simple heuristics (Goldstein & Gigerenzer, 2002), it may be possible that the optimal reasoning described by Bayesian models can be approximated by simple algorithms that look nothing like Bayesian reasoning in their mechanics. If so – in fact, ev...

130 | Overregularization in language acquisition - Marcus, Pinker, et al. - 1992 |

119 | The origins of concepts - Carey - 2000 |
Citation Context: ...nce of innate constraints, including the whole object constraint (Markman, 1990), core systems of object representation, psychology, physics, and biology (Carey & Spelke, 1996; Spelke & Kinzler, 2007; Carey, 2009), and so on. Given that they appear so early in development, it seems sensible to postulate that these constraints are innate rather than learned. However, it may be possible for inductive constraint...

119 | Word learning as Bayesian inference - Tenenbaum, Xu - 2000 |
Citation Context: ...i.e., as examples of the concept). If people think the data were generated in some other way – for instance, another learner was asking about those particular pictures – then their inferences change (Xu & Tenenbaum, 2007a). In this case, the lack of non-Labradors no longer reflects something the experimenter can control; though it is a coincidence, it is not a suspicious one. The data are the same, but the inference ...

116 | The child as word learner - Carey - 1978 |
Citation Context: ...1989; Bloom, 2000). How do children learn word meanings so well, so quickly? One suggestion is that infants are born equipped with strong prior knowledge about what sort of word meanings are natural (Carey, 1978; Markman, 1989), which constrains the possible hypotheses considered. For instance, even if a child is able to rule out part-objects as possible ...

103 | Minimization of Boolean complexity in human concept learning - Feldman - 2000 |

103 | Structure and strength in causal induction - Griffiths, Tenenbaum - 2005 |

100 | Categorization and naming in children - Markman - 1989 |
Citation Context: ...ble extensions of the word, young children are surprisingly adept at acquiring the meanings of words – even when there are only a few examples, and even when there is no systematic negative evidence (Markman, 1989; Bloom, 2000). How do children learn word meanings so well, so quickly? One suggestion is that infants are born equipped with strong prior knowledge about what sort of word meanings are natural (Care...

98 | A philosophical essay on probabilities - Laplace - 1951 |

98 | Inferring causal networks from observations and interventions - Steyvers, Tenenbaum, et al. - 2003 |

96 | Bayesian inference with probabilistic population codes - Ma, Beck, et al. - 2006 |
Citation Context: ...k in computational neuroscience focuses on the implementational level, but is Bayesian in character (e.g., Pouget, Dayan, & Zemel, 2003; T. Lee & Mumford, 2003; Zemel, Huys, Natarajan, & Dayan, 2005; Ma, Beck, Latham, & Pouget, 2006; Doya, Ishii, Pouget, & Rao, 2007; Rao, 2007). We discuss the implications of this work in the next section. 5.2 Biological plausibility Because cognitive scientists are ultimately interested in unde...