Results 1–10 of 21
Learning overhypotheses with hierarchical Bayesian models
"... Inductive learning is impossible without overhypotheses, or constraints on the hypotheses considered by the learner. Some of these overhypotheses must be innate, but we suggest that hierarchical Bayesian models help explain how the rest can be acquired. To illustrate this claim, we develop models th ..."
Abstract

Cited by 59 (23 self)
 Add to MetaCart
Inductive learning is impossible without overhypotheses, or constraints on the hypotheses considered by the learner. Some of these overhypotheses must be innate, but we suggest that hierarchical Bayesian models help explain how the rest can be acquired. To illustrate this claim, we develop models that acquire two kinds of overhypotheses — overhypotheses about feature variability (e.g. the shape bias in word learning) and overhypotheses about the grouping of categories into ontological kinds like objects and substances.
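The variability overhypothesis described above can be sketched numerically. The toy model below is my own construction, not the paper's (which uses a Dirichlet-multinomial hierarchy): candidate concentration parameters alpha are scored by beta-binomial marginal likelihood, and when every category is internally uniform, a small alpha (low within-category variability) wins, which is exactly the acquired overhypothesis.

```python
from math import lgamma

def betaln(a, b):
    # log of the Beta function B(a, b)
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_marginal(counts, alpha, beta):
    # log p(data | alpha, beta) with theta_c ~ Beta(alpha*beta, alpha*(1-beta))
    # per category; binomial coefficients dropped (constant in alpha, beta).
    a, b = alpha * beta, alpha * (1 - beta)
    return sum(betaln(k + a, n - k + b) - betaln(a, b) for k, n in counts)

# Each category is internally consistent (all 1s or all 0s), the analogue of
# a feature like shape being homogeneous within a word-learning category.
counts = [(5, 5), (0, 5), (5, 5), (0, 5)]

beta = 0.5
scores = {alpha: log_marginal(counts, alpha, beta)
          for alpha in (0.1, 0.5, 1.0, 5.0, 20.0)}
best_alpha = max(scores, key=scores.get)
print(best_alpha)  # 0.1: low within-category variability is inferred
```

Having learned a small alpha, the model would then predict a new category's feature from a single example, the signature of an acquired overhypothesis.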
Exact Bayesian structure learning from uncertain interventions
 In AI & Statistics, 2007
"... We show how to apply the dynamic programming algorithm of Koivisto and Sood [KS04, Koi06], which computes the exact posterior marginal edge probabilities p(Gij = 1D) of a DAG G given data D, to the case where the data is obtained by interventions (experiments). In particular, we consider the case w ..."
Abstract

Cited by 27 (5 self)
 Add to MetaCart
(Show Context)
We show how to apply the dynamic programming algorithm of Koivisto and Sood [KS04, Koi06], which computes the exact posterior marginal edge probabilities p(G_ij = 1 | D) of a DAG G given data D, to the case where the data is obtained by interventions (experiments). In particular, we consider the case where the targets of the interventions are a priori unknown. We show that it is possible to learn the targets of intervention at the same time as learning the causal structure. We apply our exact technique to a biological data set that had previously been analyzed using MCMC [SPP+05, EW06, WGH06].
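For intuition about what "exact posterior marginal edge probabilities" means, here is a brute-force sketch that enumerates every DAG on three nodes and computes p(G_ij = 1 | D) directly. The score function is a made-up stand-in for a real marginal likelihood; the point of the Koivisto-Sood dynamic program is to obtain the same sums without enumerating all DAGs.

```python
from itertools import product

nodes = range(3)
pairs = [(i, j) for i in nodes for j in nodes if i != j]

def is_acyclic(edges):
    # Kahn-style check: repeatedly remove nodes with no incoming edges.
    remaining = set(nodes)
    while remaining:
        sinks = [v for v in remaining
                 if not any(t == v and s in remaining for s, t in edges)]
        if not sinks:
            return False  # a cycle blocks further removal
        remaining -= set(sinks)
    return True

def score(edges):
    # Toy unnormalized posterior p(D | G) p(G): a made-up score favoring
    # the chain 0 -> 1 -> 2, standing in for a real marginal likelihood.
    s = 1.0
    if (0, 1) in edges: s *= 3.0
    if (1, 2) in edges: s *= 2.0
    return s

# Enumerate all directed graphs on 3 nodes, keep the acyclic ones.
dags = [frozenset(e for e, on in zip(pairs, bits) if on)
        for bits in product([0, 1], repeat=len(pairs))]
dags = [g for g in dags if is_acyclic(g)]

# Exact edge marginals: mass of DAGs containing the edge / total mass.
Z = sum(score(g) for g in dags)
edge_marginals = {e: sum(score(g) for g in dags if e in g) / Z for e in pairs}
print(edge_marginals[(0, 1)], edge_marginals[(1, 0)])
```

The enumeration visits 25 DAGs here but grows super-exponentially with the number of nodes, which is why exact methods need the dynamic-programming trick.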
Bayesian models of cognition
"... For over 200 years, philosophers and mathematicians have been using probability theory to describe human cognition. While the theory of probabilities was first developed as a means of analyzing games of chance, it quickly took on a larger and deeper significance as a formal account of how rational a ..."
Abstract

Cited by 26 (1 self)
 Add to MetaCart
(Show Context)
For over 200 years, philosophers and mathematicians have been using probability theory to describe human cognition. While the theory of probabilities was first developed as a means of analyzing games of chance, it quickly took on a larger and deeper significance as a formal account of how rational agents should reason in situations of uncertainty.
Learning causal schemata
 In Proceedings of the 29th Annual Conference of the Cognitive Science Society (pp. 389–394). Austin, TX: Cognitive Science Society
"... Causal inferences about sparsely observed objects are often supported by causal schemata, or systems of abstract causal knowledge. We present a hierarchical Bayesian framework that discovers simple causal schemata given only raw data as input. Given a set of objects and observations of causal events ..."
Abstract

Cited by 15 (11 self)
 Add to MetaCart
(Show Context)
Causal inferences about sparsely observed objects are often supported by causal schemata, or systems of abstract causal knowledge. We present a hierarchical Bayesian framework that discovers simple causal schemata given only raw data as input. Given a set of objects and observations of causal events involving some of these objects, our framework simultaneously discovers the causal type of each object, the causal powers of these types, the characteristic features of these types, and the nature of the interactions between these types. Several behavioral studies confirm that humans are able to discover causal schemata, and we show that our framework accounts for data collected by Lien and Cheng and by Shanks and Darby.
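The "causal type of each object" piece can be illustrated with a minimal sketch (my simplification; the paper's framework also models features and type interactions): enumerate partitions of the objects into types, score each partition by a beta-binomial marginal likelihood of the pooled cause-effect counts, and pick the best. Objects with similar causal powers get grouped into one type.

```python
from math import lgamma

def partitions(items):
    # Generate all set partitions of a list, recursively.
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part

# Observed causal events per object: (successes, trials).
data = {'A': (9, 10), 'B': (9, 10), 'C': (1, 10), 'D': (1, 10)}

def log_score(part):
    # Objects in a block share one causal power theta ~ Beta(1, 1);
    # a block's marginal likelihood is Beta(k + 1, n - k + 1).
    s = 0.0
    for block in part:
        k = sum(data[o][0] for o in block)
        n = sum(data[o][1] for o in block)
        s += lgamma(k + 1) + lgamma(n - k + 1) - lgamma(n + 2)
    return s

best = max(partitions(list(data)), key=log_score)
print(sorted(sorted(b) for b in best))  # [['A', 'B'], ['C', 'D']]
```

Pooling wins over both one big type and all-singleton types because shared counts sharpen the marginal likelihood, which is the sense in which a schema is "discovered" from raw data.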
Bayesian structure learning using dynamic programming and MCMC
 In UAI, 2007b
"... We show how to significantly speed up MCMC sampling of DAG structures by using a powerful nonlocal proposal based on Koivisto’s dynamic programming (DP) algorithm (11; 10), which computes the exact marginal posterior edge probabilities by analytically summing over orders. Furthermore, we show how s ..."
Abstract

Cited by 14 (1 self)
 Add to MetaCart
We show how to significantly speed up MCMC sampling of DAG structures by using a powerful nonlocal proposal based on Koivisto’s dynamic programming (DP) algorithm (11; 10), which computes the exact marginal posterior edge probabilities by analytically summing over orders. Furthermore, we show how sampling in DAG space can avoid subtle biases that are introduced by approaches that work only with orders, such as Koivisto’s DP algorithm and MCMC order samplers (6; 5).
Sparse Gaussian Graphical Models with Unknown Block Structure
"... Recent work has shown that one can learn the structure of Gaussian Graphical Models by imposing an L1 penalty on the precision matrix, and then using efficient convex optimization methods to find the penalized maximum likelihood estimate. This is similar to performing MAP estimation with a prior tha ..."
Abstract

Cited by 11 (1 self)
 Add to MetaCart
(Show Context)
Recent work has shown that one can learn the structure of Gaussian Graphical Models by imposing an L1 penalty on the precision matrix, and then using efficient convex optimization methods to find the penalized maximum likelihood estimate. This is similar to performing MAP estimation with a prior that prefers sparse graphs. In this paper, we use the stochastic block model as a prior. This prior prefers graphs that are blockwise sparse, but unlike previous work, it does not require that the blocks or groups be specified a priori. The resulting problem is no longer convex, but we devise an efficient variational Bayes algorithm to solve it. We show that our method has better test set likelihood on two different datasets (motion capture and gene expression) compared to independent L1, and can match the performance of group L1 using manually created groups.
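The L1-penalized precision estimate the first sentence refers to can be sketched with a plain proximal-gradient loop. This is a simplified stand-in for the convex solvers the paper builds on, and the block-structured prior itself is not shown; the toy data follow a 3-variable chain, so the true precision matrix has a zero in the (0, 2) entry, which the L1 penalty should recover.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: a Gaussian chain x0 -> x1 -> x2, so variables 0 and 2 are
# conditionally independent given 1 (precision entry (0, 2) is zero).
n = 2000
x0 = rng.normal(size=n)
x1 = 0.8 * x0 + rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)
S = np.cov(np.column_stack([x0, x1, x2]), rowvar=False)  # empirical covariance

lam, step = 0.1, 0.05  # L1 strength and gradient step size

def soft_threshold(a, t):
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)

# Proximal gradient ascent on the penalized log-likelihood
#   log det(Theta) - tr(S Theta) - lam * sum_{i != j} |Theta_ij|,
# warm-started from a regularized inverse of S.
theta = np.linalg.inv(S + lam * np.eye(3))
for _ in range(1000):
    theta = theta + step * (np.linalg.inv(theta) - S)  # smooth-part gradient
    diag = np.diag(np.diag(theta))
    theta = soft_threshold(theta - diag, step * lam) + diag  # penalize off-diagonals
    theta = (theta + theta.T) / 2                            # keep symmetric

print(np.round(theta, 2))  # (0, 2) entry shrunk to (near) zero
```

A blockwise-sparse prior would replace the uniform penalty lam with a per-pair penalty that depends on the (inferred) block memberships of i and j.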
Sparse and Locally Constant Gaussian Graphical Models
"... Locality information is crucial in datasets where each variable corresponds to a measurement in a manifold (silhouettes, motion trajectories, 2D and 3D images). Although these datasets are typically undersampled and highdimensional, they often need to be represented with lowcomplexity statistical ..."
Abstract

Cited by 4 (2 self)
 Add to MetaCart
(Show Context)
Locality information is crucial in datasets where each variable corresponds to a measurement in a manifold (silhouettes, motion trajectories, 2D and 3D images). Although these datasets are typically undersampled and high-dimensional, they often need to be represented with low-complexity statistical models, which are comprised of only the important probabilistic dependencies in the datasets. Most methods attempt to reduce model complexity by enforcing structure sparseness. However, sparseness cannot describe inherent regularities in the structure. Hence, in this paper we first propose a new class of Gaussian graphical models which, together with sparseness, imposes local constancy through ℓ1-norm penalization. Second, we propose an efficient algorithm which decomposes the strictly convex maximum likelihood estimation into a sequence of problems with closed-form solutions. Through synthetic experiments, we evaluate the closeness of the recovered models to the ground truth. We also test the generalization performance of our method in a wide range of complex real-world datasets and demonstrate that it captures useful structures such as the rotation and shrinking of a beating heart, motion correlations between body parts during walking and functional interactions of brain regions. Our method outperforms the state-of-the-art structure learning techniques for Gaussian graphical models both for small and large datasets.
Geometric Representations of Hypergraphs for Prior Specification and Posterior Sampling
2009
"... A parametrization of hypergraphs based on the geometry of points in R d is developed. Informative prior distributions on hypergraphs are induced through this parametrization by priors on point configurations via spatial processes. This prior specification is used to infer conditional independence m ..."
Abstract

Cited by 2 (2 self)
 Add to MetaCart
(Show Context)
A parametrization of hypergraphs based on the geometry of points in R^d is developed. Informative prior distributions on hypergraphs are induced through this parametrization by priors on point configurations via spatial processes. This prior specification is used to infer conditional independence models or Markov structure of multivariate distributions. Specifically, we can recover both the junction tree factorization as well as the hyper Markov law. This approach offers greater control on the distribution of graph features than Erdős–Rényi random graphs, supports inference of factorizations that cannot be retrieved by a graph alone, and leads to new Metropolis/Hastings Markov chain Monte Carlo algorithms with both local and global moves in graph space. We illustrate the utility of this parametrization and prior specification using simulations.
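A toy flavor of the point-based construction (ordinary graphs from points in R^2, not the paper's hypergraph and junction-tree machinery): connecting points that fall within a radius induces dependent edge indicators, so graph features such as triangle density are controlled by the spatial process in a way Erdős–Rényi graphs cannot match.

```python
import math
import random

random.seed(1)

def sample_geometric_graph(n, r):
    # Sample n points uniformly in the unit square; connect pairs closer
    # than r. The point configuration plays the role of the latent prior.
    pts = [(random.random(), random.random()) for _ in range(n)]
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if math.dist(pts[i], pts[j]) < r}

# Nearby pairs co-occur as edges, so among node triples with at least two
# edges, closed triangles are far more common than under Erdos-Renyi
# graphs matched for edge density.
n, r, trials = 10, 0.3, 500
tri = wedge = 0
for _ in range(trials):
    e = sample_geometric_graph(n, r)
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(j + 1, n):
                present = ((i, j) in e) + ((i, k) in e) + ((j, k) in e)
                if present == 3:
                    tri += 1
                elif present == 2:
                    wedge += 1
print(tri / (tri + wedge))
```

An Erdős–Rényi graph with edge probability p would give a closed-triple fraction of p / (3 - 2p), well below what the geometric construction produces at the same density.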
Sufficient condition for pooling data from different distributions
 In First Symposium on Philosophy, History, and Methodology of Error, 2006
"... We consider the problems arising from using sequences of experiments to discover the causal structure among a set of variables, none of whom are known ahead of time to be an “outcome”. In particular, we present various approaches to resolve conflicts in the experimental results arising from sampling ..."
Abstract

Cited by 2 (0 self)
 Add to MetaCart
(Show Context)
We consider the problems arising from using sequences of experiments to discover the causal structure among a set of variables, none of which are known ahead of time to be an “outcome”. In particular, we present various approaches to resolve conflicts in the experimental results arising from sampling variability in the experiments. We provide a sufficient condition that allows for pooling of data from experiments with different joint distributions over the variables. Satisfaction of the condition allows for more powerful independence tests that may resolve some of the conflicts in the experimental results. The pooling condition has its own problems, but should – due to its generality – be informative to techniques for meta-analysis.
Connecting Human and Machine Learning via Probabilistic Models of Cognition
 In Technical Program, 10th Annual Conference of the International Speech Communication Association, 2009
"... Human performance defines the standard that machine learning systems aspire to in many areas, including learning language. This suggests that studying human cognition may be a good way to develop better learning algorithms, as well as providing basic insights into how the human mind works. However, ..."
Abstract

Cited by 1 (0 self)
 Add to MetaCart
(Show Context)
Human performance defines the standard that machine learning systems aspire to in many areas, including learning language. This suggests that studying human cognition may be a good way to develop better learning algorithms, as well as providing basic insights into how the human mind works. However, in order for ideas to flow easily from cognitive science to computer science and vice versa, we need a common framework for describing human and machine learning. I will summarize recent work exploring the hypothesis that probabilistic models of cognition, which view learning as a form of statistical inference, provide such a framework, including results that illustrate how novel ideas from statistics can inform cognitive science. Specifically, I will talk about how probabilistic models can be used to identify the assumptions of learners, learn at different levels of abstraction, and link the inductive biases of individuals to cultural universals.

Index Terms: human learning, machine learning, probabilistic models