## Bayesian models of cognition

Citations: 25 (1 self)

### BibTeX

@MISC{Griffiths_bayesianmodels,
  author = {Thomas L. Griffiths and Charles Kemp and Joshua B. Tenenbaum},
  title = {Bayesian models of cognition},
  year = {}
}

### Abstract

For over 200 years, philosophers and mathematicians have been using probability theory to describe human cognition. While the theory of probabilities was first developed as a means of analyzing games of chance, it quickly took on a larger and deeper significance as a formal account of how rational agents should reason in situations of uncertainty.

### Citations

7413 | Probabilistic reasoning in intelligent systems: Networks of plausible inference - Pearl - 1988
Citation context: ...e edges indicate the direction of a dependency, the result is a directed graphical model. Our focus here will be on directed graphical models, which are also known as Bayesian networks or Bayes nets (Pearl, 1988). Bayesian networks can often be given a causal interpretation, where an edge between two nodes indicates that one node is a direct cause of the other, which makes them particularly appealing for mod...

4097 | Artificial Intelligence: A Modern Approach. Upper Saddle - Russell, Norvig - 1995
Citation context: ... statistics and machine learning. Graphical models can take on a different interpretation in artificial intelligence, when the variables of interest represent the truth value of certain propositions (Russell & Norvig, 2002). For example, imagine that a friend of yours claims to possess psychic powers – in particular, the power of psychokinesis. He proposes to demonstrate these powers by flipping a coin, and influencing...

3990 | Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images - Geman, Geman - 1984
Citation context: ...to sample from the conditional probability distribution for each variable in a set given the remaining variables, p(xj|x1,...,xj−1,xj+1,...,xn), we can use another popular algorithm, Gibbs sampling (Geman & Geman, 1984; Gilks et al., 1996), which is known in statistical physics as the heatbath algorithm (Newman & Barkema, 1999). The Gibbs sampler for a target distribution p(x) is the Markov chain defined by drawing...
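The Gibbs sampler described in the Geman & Geman snippet (resample each variable from its conditional distribution given the rest) can be sketched in a few lines. The bivariate normal target, its correlation, and all parameter values below are illustrative choices, not taken from the chapter:

```python
import random

def gibbs_bivariate_normal(rho, n_samples, burn_in=500):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Each full conditional is itself normal:
        x | y ~ N(rho * y, 1 - rho^2)
        y | x ~ N(rho * x, 1 - rho^2)
    so we alternately resample each coordinate given the other.
    """
    sd = (1 - rho ** 2) ** 0.5
    x, y = 0.0, 0.0
    samples = []
    for t in range(n_samples + burn_in):
        x = random.gauss(rho * y, sd)  # draw x from p(x | y)
        y = random.gauss(rho * x, sd)  # draw y from p(y | x)
        if t >= burn_in:
            samples.append((x, y))
    return samples

random.seed(0)
draws = gibbs_bivariate_normal(rho=0.8, n_samples=20000)
mean_x = sum(x for x, _ in draws) / len(draws)
print(round(mean_x, 1))  # close to 0.0, the true marginal mean
```

After burn-in, the retained pairs are (correlated) draws from the joint distribution, so empirical moments approximate the target's moments.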

2608 | Latent Dirichlet allocation - Blei, Ng, et al. - 2003
Citation context: ... explore alternative representations for the meaning of words. One such representation is exploited in topic models, in which words are represented in terms of the set of topics to which they belong (Blei, Ng, & Jordan, 2003; Hofmann, 1999; Griffiths & Steyvers, 2004). Each topic is a probability distribution over words, and the content of the topic is reflected in the words to which it assigns high probability. For exam...
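The generative view in the snippet above (each topic a probability distribution over words) can be illustrated with toy data. The topics, vocabulary, and probabilities below are invented for illustration, not drawn from the chapter:

```python
import random

# Two toy "topics": each is a probability distribution over words.
# Word lists and probabilities are invented for illustration.
topics = {
    "nature":  {"woods": 0.4, "stream": 0.4, "bank": 0.2},
    "finance": {"money": 0.4, "bank": 0.4, "loan": 0.2},
}

def generate_document(topic_weights, length, rng):
    """Generate a document: for each word, choose a topic, then a word from it."""
    words = []
    for _ in range(length):
        topic = rng.choices(list(topic_weights),
                            weights=list(topic_weights.values()))[0]
        dist = topics[topic]
        words.append(rng.choices(list(dist), weights=list(dist.values()))[0])
    return words

rng = random.Random(1)
doc = generate_document({"nature": 0.9, "finance": 0.1}, length=10, rng=rng)
print(doc)  # mostly nature words; "bank" is ambiguous between the two topics
```

Note how a polysemous word like "bank" gets probability under both topics, which is exactly the property the topic representation exploits.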

2083 | Pattern Classification - Duda, Hart, et al. - 2000
Citation context: ...generated from one of several probability distributions (or “categories”) over the space of possible objects, and the goal is to infer which distribution is most likely to have generated that object (Duda, Hart, & Stork, 2000). In rational probabilistic terms, these methods differ only in how these category-specific probability distributions are represented and estimated (Ashby & Alfonso-Reese, 1995; Nosofsky, 1998). Fina...

1436 | Bayesian Data Analysis - Gelman, Carlin, et al. - 1995
Citation context: ...between them. Here we provide a more comprehensive treatment of the problem of learning prior distributions, and show how this problem can be addressed using hierarchical Bayesian models (Good, 1980; Gelman, Carlin, Stern, & Rubin, 1995). Although we will focus on just two applications, the hierarchical Bayesian approach has been applied to several other cognitive problems (Lee, 2006; Tenenbaum et al., 2006; Mansinghka et al., 2006)...

1272 | Information Theory, Inference, and Learning Algorithms - MacKay - 2003
Citation context: ... of Bayesian parameter estimation with informative “smoothing” priors have been applied to a number of cognitively interesting machine-learning problems, such as Bayesian learning in neural networks (Mackay, 2003). Our analysis of coin flipping with informative priors has two features of more general interest. First, the prior and posterior are specified using distributions of the same form (both being beta d...

1261 | A solution to Plato’s problem: The Latent Semantic Analysis theory of the acquisition, induction, and representation of knowledge - Landauer, Dumais - 1997
Citation context: ...onships. (b) In a semantic space, words are represented as points, and proximity indicates semantic association. These are the first two dimensions of a solution produced by Latent Semantic Analysis (Landauer & Dumais, 1997). The black dot is the origin. (c) In the topic model, words are represented as belonging to a set of probabilistic topics. The matrix shown on the left indicates the probability of each word under e...

1235 | Causality: Models, reasoning, and inference - Pearl - 2000
Citation context: ...ecent work has explored the consequences of augmenting directed graphical models with a stronger assumption about the relationships indicated by edges: that they indicate direct causal relationships (Pearl, 2000; Spirtes et al., 1993). This assumption allows causal graphical models to represent not just the probabilities of events that one might observe, but also the probabilities of events that one can prod...

1135 | Bayesian Theory - Bernardo, Smith - 1994
Citation context: ..., as if both the real and virtual examples had been observed in the same data set. These two properties are not accidental: they are characteristic of a class of priors called conjugate priors (e.g., Bernardo & Smith, 1994). The likelihood determines whether a conjugate prior exists for a given problem, and the form that the prior will take. The results we have given in this section exploit the fact that the beta distr...

1118 | Bayes factor - Kass, Raftery - 1995
Citation context: ...f the methods discussed so far. Hypotheses that differ in their complexity can be compared directly using Bayes’ rule, once they are reduced to probability distributions over the observable data (see Kass & Raftery, 1995). To illustrate this principle, assume that we have two hypotheses: h0 is the hypothesis that θ = 0.5, and h1 is the hypothesis that θ takes a value drawn from a uniform distribution on [0,1]. If we ...
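The two hypotheses in the snippet above (h0: θ = 0.5 exactly; h1: θ uniform on [0, 1]) give a closed-form Bayes factor for a particular sequence of coin flips, since the uniform case integrates to a beta function. A sketch under those assumptions:

```python
from math import factorial

def bayes_factor(heads, tails):
    """Bayes factor P(D|h1) / P(D|h0) for the h0/h1 pair described above.

    For a particular sequence with `heads` heads in n = heads + tails flips:
      P(D | h0) = 0.5 ** n
      P(D | h1) = integral of theta^heads * (1-theta)^tails dtheta
                = heads! * tails! / (n + 1)!
    """
    n = heads + tails
    p_h0 = 0.5 ** n
    p_h1 = factorial(heads) * factorial(tails) / factorial(n + 1)
    return p_h1 / p_h0

print(round(bayes_factor(5, 5), 3))   # 0.369: 5 heads in 10 favors the fair coin h0
print(round(bayes_factor(10, 0), 1))  # 93.1: 10 straight heads strongly favors h1
```

A factor below 1 favors the simpler point hypothesis h0; well above 1, the flexible h1, illustrating how Bayes factors trade off fit against complexity automatically.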

891 | A tutorial on learning with Bayesian networks - Heckerman - 1998

888 | MRBAYES: Bayesian inference of phylogenetic trees - Huelsenbeck, Ronquist - 2001
Citation context: ...ired by biological evolution: the property is randomly chosen to be on or off at the root of the tree, and then has some small probability of switching state at each point of each branch of the tree (Huelsenbeck & Ronquist, 2001; Kemp, Perfors, & Tenenbaum, 2004). For inferences about generic biological properties, the problem of acquiring prior knowledge has now been reduced to the problem of finding an appropriate tree S. ...

863 | Probabilistic latent semantic indexing - Hofmann - 1999
Citation context: ...sentations for the meaning of words. One such representation is exploited in topic models, in which words are represented in terms of the set of topics to which they belong (Blei, Ng, & Jordan, 2003; Hofmann, 1999; Griffiths & Steyvers, 2004). Each topic is a probability distribution over words, and the content of the topic is reflected in the words to which it assigns high probability. For example, high proba...

797 | Foundations of statistical natural language processing - Manning, Schütze - 1999
Citation context: ... syntactic constraints on the structure of sentences), and φ encodes the probability that each word will be generated from a particular syntactic class (e.g., Charniak, 1993; Jurafsky & Martin, 2000; Manning & Schütze, 1999). The dependencies among the latent variables induce dependencies among the observed variables – in the case of language, the constraints on transitions between syntactic classes impose constraints o...

768 | Vision - Marr - 1982
Citation context: ...onal models of human cognition then becomes a process of considering how best to characterize the computational problems that people face and the logic by which those computations can be carried out (Marr, 1982). This focus implies certain limits on the phenomena that are valuable to study within a Bayesian paradigm. Some phenomena will surely be more satisfying to address at an algorithmic or neurocomputat...

685 | Finding scientific topics - Griffiths, Steyvers - 2004
Citation context: ...the meaning of words. One such representation is exploited in topic models, in which words are represented in terms of the set of topics to which they belong (Blei, Ng, & Jordan, 2003; Hofmann, 1999; Griffiths & Steyvers, 2004). Each topic is a probability distribution over words, and the content of the topic is reflected in the words to which it assigns high probability. For example, high probabilities for woods and strea...

612 | Markov Chain Monte Carlo in Practice - Gilks, Richardson, et al. - 1996
Citation context: ...atistical physics (Metropolis, Rosenbluth, Rosenbluth, Teller, & Teller, 1953), and are now widely used across physics, statistics, machine learning, and related fields (e.g., Newman & Barkema, 1999; Gilks, Richardson, & Spiegelhalter, 1996; Mackay, 2003; Neal, 1993). As the name suggests, Markov chain Monte Carlo is based upon the theory of Markov chains – sequences of random variables in which each variable is conditionally independen...

589 | Probabilistic inference using Markov chain Monte Carlo methods (Tech - Neal - 1993
Citation context: ...utomatically generate samples from most probability distributions. There are a number of ways to address this problem, including methods such as rejection sampling and importance sampling (see, e.g., Neal, 1993). One of the most flexible methods for generating samples from a probability distribution is Markov chain Monte Carlo (MCMC), which can be used to construct samplers for arbitrary probability distrib...
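Rejection sampling, one of the methods named in the snippet above, is easy to sketch for a density on [0, 1]. The Beta(2, 2) target and the bound are illustrative choices:

```python
import random

def rejection_sample(target_pdf, bound, n, rng):
    """Rejection sampling from an (unnormalized) density on [0, 1].

    Propose x uniformly on [0, 1]; accept x with probability target_pdf(x) / bound,
    where bound >= max of target_pdf. Accepted points follow the target density.
    """
    samples = []
    while len(samples) < n:
        x = rng.random()
        if rng.random() < target_pdf(x) / bound:
            samples.append(x)
    return samples

# Target: Beta(2, 2) density, proportional to x(1-x); its maximum on [0, 1] is 0.25.
rng = random.Random(0)
draws = rejection_sample(lambda x: x * (1 - x), bound=0.25, n=20000, rng=rng)
print(round(sum(draws) / len(draws), 2))  # near 0.5, the Beta(2, 2) mean
```

Rejection sampling works well when the bound is tight; when the target concentrates in a small region, most proposals are rejected, which is one motivation for MCMC methods.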

582 | Statistical Language Learning - Charniak - 1996
Citation context: ...ll appear after another (capturing simple syntactic constraints on the structure of sentences), and φ encodes the probability that each word will be generated from a particular syntactic class (e.g., Charniak, 1993; Jurafsky & Martin, 2000; Manning & Schütze, 1999). The dependencies among the latent variables induce dependencies among the observed variables – in the case of language, the constraints on transiti...

530 | Causation, Prediction and Search - Spirtes, Glymour, et al. - 2001
Citation context: ...endencies between the variables in a fashion consistent with the Markov condition: conditioned on its parents, each variable is independent of all other variables except its descendants (Pearl, 1988; Spirtes, Glymour, & Scheines, 1993). As a consequence of the Markov condition, any Bayesian network specifies a canonical factorization of a full joint probability distribution into the product of local conditional distributions, one ...
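The factorization described in the snippet above (the joint as a product of local conditionals, one per variable given its parents) can be made concrete with a toy three-node network. The structure and conditional probability tables below are invented for illustration:

```python
from itertools import product

# Toy network (invented): Rain -> Sprinkler, and both Rain and Sprinkler -> Wet.
p_rain = {True: 0.2, False: 0.8}                      # p(rain)
p_sprinkler = {True: {True: 0.01, False: 0.99},       # p(sprinkler | rain)
               False: {True: 0.4, False: 0.6}}        # indexed [rain][sprinkler]
p_wet = {(True, True): 0.99, (True, False): 0.9,      # p(wet=True | rain, sprinkler)
         (False, True): 0.9, (False, False): 0.0}

def joint(rain, sprinkler, wet):
    """p(rain, sprinkler, wet) via the Markov-condition factorization:
    p(rain) * p(sprinkler | rain) * p(wet | rain, sprinkler)."""
    pw = p_wet[(rain, sprinkler)]
    return (p_rain[rain]
            * p_sprinkler[rain][sprinkler]
            * (pw if wet else 1 - pw))

# Sanity check: the factorized joint sums to 1 over all eight assignments.
total = sum(joint(r, s, w) for r, s, w in product([True, False], repeat=3))
print(round(total, 10))  # 1.0
```

The factorization means only local tables need to be specified; the full joint over n binary variables never has to be written out explicitly.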

503 | Context theory of classification learning - Medin, Schaffer - 1978
Citation context: ...isting models. We will discuss some of these relationships in this chapter, but there are many other cases. For example, prototypes and exemplar models of categorization (Reed, 1972; Medin & Schaffer, 1978; Nosofsky, 1986) can both be seen as rational solutions to a standard classification task in statistical pattern recognition: an object is generated from one of several probability distributions (or ...

484 | Attention, similarity and the identification-categorization relationship - Nosofsky - 1986
Citation context: ...iscuss some of these relationships in this chapter, but there are many other cases. For example, prototypes and exemplar models of categorization (Reed, 1972; Medin & Schaffer, 1978; Nosofsky, 1986) can both be seen as rational solutions to a standard classification task in statistical pattern recognition: an object is generated from one of several probability distributions (or “categories”) ov...

481 | The adaptive character of thought - Anderson - 1990
Citation context: ...ut human processing mechanisms that are no longer needed when we assume that cognition is an approximately optimal response to the uncertainty and structure present in natural tasks and environments (Anderson, 1990). Finding effective computational models of human cognition then becomes a process of considering how best to characterize the computational problems that people face and the logic by which those com...

470 | Pattern recognition and machine learning - Bishop - 2006
Citation context: ...ncies among variables. In general, ideal Bayesian computations can only be approximated for these complex models, and many methods for approximate Bayesian inference and learning have been developed (Bishop, 2006; Mackay, 2003). In this section we introduce the Markov chain Monte Carlo approach, a general-purpose toolkit for inferring the values of latent variables, estimating parameters and learning model st...

458 | A learning algorithm for Boltzmann machines - Ackley, Hinton, et al. - 1985
Citation context: ...on, then the result is an undirected graphical model. Undirected graphical models have long been used in statistical physics, and many probabilistic neural network models, such as Boltzmann machines (Ackley, Hinton, & Sejnowski, 1985), can be interpreted as models of this kind. If the edges indicate the direction of a dependency, the result is a directed graphical model. Our focus here will be on directed graphical models, which ...

423 | Conceptual change in childhood - Carey - 1985
Citation context: ...th stronger and more structured prior distributions than were needed above to explain elemental causal induction. This prior knowledge can be usefully described in terms of intuitive domain theories (Carey, 1985; Wellman & Gelman, 1992; Gopnik & Meltzoff, 1997), systems of abstract concepts and principles that specify the kinds of entities that can exist in a domain, their properties and possible states, and...

400 | Markov chain sampling methods for Dirichlet process mixture models - Neal - 2000

347 | Markov Chains - Norris - 1997
Citation context: ... the flips could be generated in a Markov chain, a sequence of random variables in which each variable is independent of all of its predecessors given the variable that immediately precedes it (e.g., Norris, 1997). Using a Markov chain structure, we could represent a hypothesis space of coins that are particularly biased towards alternating or maintaining their last outcomes, letting th...
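The "alternating or maintaining" coins described in the snippet above can be simulated directly as a two-state Markov chain. The persistence parameter and sequence lengths below are illustrative:

```python
import random

def markov_flips(p_repeat, n, rng):
    """Simulate coin flips as a Markov chain: each flip repeats the previous
    outcome with probability p_repeat, otherwise switches.

    p_repeat > 0.5 gives a "maintaining" coin; p_repeat < 0.5 an "alternating" one;
    p_repeat = 0.5 recovers independent fair flips.
    """
    flips = [rng.choice("HT")]
    for _ in range(n - 1):
        prev = flips[-1]
        if rng.random() < p_repeat:
            flips.append(prev)                       # maintain last outcome
        else:
            flips.append("T" if prev == "H" else "H")  # alternate
    return "".join(flips)

rng = random.Random(42)
print(markov_flips(0.9, 30, rng))  # long runs of the same outcome
print(markov_flips(0.1, 30, rng))  # mostly alternating H/T
```

Each flip depends only on its immediate predecessor, which is exactly the conditional-independence structure the Norris snippet defines.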

289 | Retrieval time from semantic memory - Collins, Quillian - 1969
Citation context: ...ple: Inferring topics from text Several computational models have been proposed to account for the large-scale structure of semantic memory, including semantic networks (e.g., Collins & Loftus, 1975; Collins & Quillian, 1969) and semantic spaces (e.g., Landauer & Dumais, 1997; Lund & Burgess, 1996). These approaches embody different assumptions about the way that words are represented. In semantic networks, words are nod...

270 | Producing high-dimensional semantic spaces from lexical co-occurrence - Lund, Burgess - 1996
Citation context: ...d to account for the large-scale structure of semantic memory, including semantic networks (e.g., Collins & Loftus, 1975; Collins & Quillian, 1969) and semantic spaces (e.g., Landauer & Dumais, 1997; Lund & Burgess, 1996). These approaches embody different assumptions about the way that words are represented. In semantic networks, words are nodes in a graph where edges indicate semantic relationships, as shown in Fig...

250 | From covariation to causation: A causal power theory - Cheng - 1997

240 | A Spreading Activation Theory of Semantic Processing - Collins, Loftus - 1975
Citation context: ...text documents. 5.1 Example: Inferring topics from text Several computational models have been proposed to account for the large-scale structure of semantic memory, including semantic networks (e.g., Collins & Loftus, 1975; Collins & Quillian, 1969) and semantic spaces (e.g., Landauer & Dumais, 1997; Lund & Burgess, 1996). These approaches embody different assumptions about the way that words are represented. In semant...

223 | Words, thoughts, and theories - Gopnik, Meltzoff - 1997
Citation context: ... distributions than were needed above to explain elemental causal induction. This prior knowledge can be usefully described in terms of intuitive domain theories (Carey, 1985; Wellman & Gelman, 1992; Gopnik & Meltzoff, 1997), systems of abstract concepts and principles that specify the kinds of entities that can exist in a domain, their properties and possible states, and the kinds of causal relations that can exist bet...

217 | Being Bayesian about network structure: A Bayesian approach to structure discovery in Bayesian networks - Friedman, Koller
Citation context: ...le A causes variable B in a complex causal network of unknown structure, by computing the probability that a link A → B exists in a high-probability sample from the posterior over network structures (Friedman & Koller, 2000). Monte Carlo methods were originally developed primarily for approximating these sophisticated averages – that is, approximating a sum over all of the values taken on by a random variable with a sum...

207 | Hierarchical topic models and the nested Chinese restaurant process - Blei, Griffiths, et al. - 2004
Citation context: ...rning focuses on building more complex models that maintain the benefits of working with conjugate priors, building on the techniques for model selection that we discuss next (e.g., Neal, 1992, 1998; Blei, Griffiths, Jordan, & Tenenbaum, 2004; Griffiths & Ghahramani, 2005). 2.4 Model selection Whether there were a finite number or not, the hypotheses that we have considered so far were relatively homogeneous, each offering a single value ...

191 | Connectionist learning of belief networks - Neal - 1991
Citation context: ...cs and machine learning focuses on building more complex models that maintain the benefits of working with conjugate priors, building on the techniques for model selection that we discuss next (e.g., Neal, 1992, 1998; Blei, Griffiths, Jordan, & Tenenbaum, 2004; Griffiths & Ghahramani, 2005). 2.4 Model selection Whether there were a finite number or not, the hypotheses that we have considered so far were rel...

189 | Infinite latent feature models and the Indian buffet process - Griffiths, Ghahramani - 2005
Citation context: ...s that maintain the benefits of working with conjugate priors, building on the techniques for model selection that we discuss next (e.g., Neal, 1992, 1998; Blei, Griffiths, Jordan, & Tenenbaum, 2004; Griffiths & Ghahramani, 2005). 2.4 Model selection Whether there were a finite number or not, the hypotheses that we have considered so far were relatively homogeneous, each offering a single value for the parameter θ characteri...

185 | Category-based induction - Osherson, Wilkie, et al. - 1990
Citation context: ...y, and decide how to extend the property to the remaining members of the domain. For instance, given that gorillas carry enzyme X132, how likely is it that chimps also carry this enzyme? (Rips, 1975; Osherson, Smith, Wilkie, Lopez, & Shafir, 1990). For our purposes, inductive problems like these are interesting because they rely on relatively rich prior knowledge, and because this prior knowledge often appears to be learned. For example, huma...

167 | The University of South Florida word association, rhyme, and word fragment norms. http://w3.usf.edu/FreeAssociation - Nelson, McEvoy, et al. - 1998
Citation context: ...ted with buckle, but asteroid and buckle have little association. LSA thus has trouble representing these associations. Out of approximately 4500 words in a large-scale set of word association norms (Nelson, McEvoy, & Schreiber, 1998), LSA judges that belt is the 13th most similar word to asteroid, that buckle is...
(Footnote 5: When computing quantities such as P(w2|w1), as given by Equation 37, we need a way of finding the parameters φ that ...)

166 | Pattern recognition and categorization - Reed - 1972
Citation context: ...king with existing models. We will discuss some of these relationships in this chapter, but there are many other cases. For example, prototypes and exemplar models of categorization (Reed, 1972; Medin & Schaffer, 1978; Nosofsky, 1986) can both be seen as rational solutions to a standard classification task in statistical pattern recognition: an object is generated from one of several probab...

151 | Bayesian color constancy - Brainard, Freeman - 1997
Citation context: ...rence is possible. At best we can make a reasonable guess, based on some expectations about which values of a and b are more likely a priori. This inference can be formalized in a Bayesian framework (Brainard & Freeman, 1997), and it can be solved reasonably well given prior probability distributions for natural surface reflectances and illumination spectra. The problems of core interest in other areas of cognitive scien...

135 | Integrating topics and syntax - Griffiths, Steyvers, et al. - 2005
Citation context: ...re other aspects of language. As generative models, topic models can be modified to incorporate richer semantic representations such as hierarchies (Blei et al., 2004), as well as rudimentary syntax (Griffiths, Steyvers, Blei, & Tenenbaum, 2005), and extensions of the Markov chain Monte Carlo algorithm described in this section make it possible to sample from the posterior distributions induced by these models. 6. Conclusion Our aim in this...

121 | Expectation-propagation for the generative aspect model - Minka, Lafferty
Citation context: ...ngth of association between words (given by Equation 37) by averaging over many samples. While other inference algorithms exist that can be used with this generative model (e.g., Blei et al., 2003; Minka & Lafferty, 2002), the Gibbs sampler is an extremely simple (and reasonably efficient) way to investigate the consequences of using topics to represent semantic relationships between words. Griffiths and Steyvers (20...

100 | Structure and strength in causal induction - Griffiths, Tenenbaum - 2005
Citation context: ...essing and acquisition (Chater & Manning, 2006; Xu & Tenenbaum, in press), symbolic reasoning (Oaksford & Chater, 2001), causal learning and inference (Steyvers, Tenenbaum, Wagenmakers, & Blum, 2003; Griffiths & Tenenbaum, 2005, 2007a), and social cognition (Baker, Tenenbaum, & Saxe, 2007), among other topics. Behind these different research programs is a shared sense of which are the most compelling computational questions...

96 | Inferring causal networks from observations and interventions - Steyvers, Tenenbaum, et al. - 2003
Citation context: ...yvers, Griffiths, & Dennis, 2006), language processing and acquisition (Chater & Manning, 2006; Xu & Tenenbaum, in press), symbolic reasoning (Oaksford & Chater, 2001), causal learning and inference (Steyvers, Tenenbaum, Wagenmakers, & Blum, 2003; Griffiths & Tenenbaum, 2005, 2007a), and social cognition (Baker, Tenenbaum, & Saxe, 2007), among other topics. Behind these different research programs is a shared sense of which are the most compe...

89 | Theory-based Bayesian models of inductive learning and reasoning - Tenenbaum, Griffiths, et al. - 2006
Citation context: ... broad spectrum of the cognitive sciences. Just in the last few years, Bayesian models have addressed animal learning (Courville, Daw, & Touretzky, 2006), human inductive learning and generalization (Tenenbaum, Griffiths, & Kemp, 2006), visual scene perception (Yuille & Kersten, 2006), motor control (Kording & Wolpert, 2006), semantic memory (Steyvers, Griffiths, & Dennis, 2006), language processing and acquisition (Chater & Manni...

87 | The mind’s arrows: Bayes nets and graphical causal models in psychology - Glymour - 2001

83 | The use of statistical heuristics in everyday inductive reasoning - Nisbett, Krantz, et al. - 1983

78 | Mathematical Statistics and Data Analysis, 2nd Ed - Rice - 1995
Citation context: ... Under one classical approach, inferring θ is treated as a problem of estimating a fixed parameter of a probabilistic model, to which the standard solution is maximum-likelihood estimation (see, e.g., Rice, 1995). Maximum-likelihood estimation is simple and often sensible, but can also be problematic – particularly as a way to think about human inference. Our coin-flipping example illustrates some of these pr...