Results 1  10
of
81
Hierarchical Dirichlet processes
 Journal of the American Statistical Association
, 2004
"... program. The authors wish to acknowledge helpful discussions with Lancelot James and Jim Pitman and the referees for useful comments. 1 We consider problems involving groups of data, where each observation within a group is a draw from a mixture model, and where it is desirable to share mixture comp ..."
Abstract

Cited by 543 (56 self)
 Add to MetaCart
program. The authors wish to acknowledge helpful discussions with Lancelot James and Jim Pitman and the referees for useful comments. 1 We consider problems involving groups of data, where each observation within a group is a draw from a mixture model, and where it is desirable to share mixture components between groups. We assume that the number of mixture components is unknown a priori and is to be inferred from the data. In this setting it is natural to consider sets of Dirichlet processes, one for each group, where the wellknown clustering property of the Dirichlet process provides a nonparametric prior for the number of mixture components within each group. Given our desire to tie the mixture models in the various groups, we consider a hierarchical model, specifically one in which the base measure for the child Dirichlet processes is itself distributed according to a Dirichlet process. Such a base measure being discrete, the child Dirichlet processes necessarily share atoms. Thus, as desired, the mixture models in the different groups necessarily share mixture components. We discuss representations of hierarchical Dirichlet processes in terms of
Infinite Latent Feature Models and the Indian Buffet Process
, 2005
"... We define a probability distribution over equivalence classes of binary matrices with a finite number of rows and an unbounded number of columns. This distribution ..."
Abstract

Cited by 185 (38 self)
 Add to MetaCart
We define a probability distribution over equivalence classes of binary matrices with a finite number of rows and an unbounded number of columns. This distribution
Retrospective Markov chain Monte Carlo methods for Dirichlet process hierarchical models
 PROC. IEEE
, 2008
"... Inference for Dirichlet process hierarchical models is typically performed using Markov chain Monte Carlo methods, which can be roughly categorised into marginal and conditional methods. The former integrate out analytically the infinitedimensional component of the hierarchical model and sample fro ..."
Abstract

Cited by 42 (5 self)
 Add to MetaCart
Inference for Dirichlet process hierarchical models is typically performed using Markov chain Monte Carlo methods, which can be roughly categorised into marginal and conditional methods. The former integrate out analytically the infinitedimensional component of the hierarchical model and sample from the marginal distribution of the remaining variables using the Gibbs sampler. Conditional methods impute the Dirichlet process and update it as a component of the Gibbs sampler. Since this requires imputation of an infinitedimensional process, implementation of the conditional method has relied on finite approximations. In this paper we show how to avoid such approximations by designing two novel Markov chain Monte Carlo algorithms which sample from the exact posterior distribution of quantities of interest. The approximations are avoided by the new technique of retrospective sampling. We also show how the algorithms can obtain samples from functionals of the Dirichlet process. The marginal and the conditional methods are compared and a careful simulation study is included, which involves a nonconjugate model, different datasets and prior specifications.
Discovering latent classes in relational data
, 2004
"... We present a framework for learning abstract relational knowledge with the aim of explaining how people acquire intuitive theories of physical, biological, or social systems. Our approach is based on a generative relational model with latent classes, and simultaneously determines the kinds of entiti ..."
Abstract

Cited by 40 (4 self)
 Add to MetaCart
We present a framework for learning abstract relational knowledge with the aim of explaining how people acquire intuitive theories of physical, biological, or social systems. Our approach is based on a generative relational model with latent classes, and simultaneously determines the kinds of entities that exist in a domain, the number of these latent classes, and the relations between classes that are possible or likely. This model goes beyond previous psychological models of category learning, which consider attributes associated with individual categories but not relationships between categories. We apply this domaingeneral framework to two specific problems: learning the structure of kinship systems and learning causal theories. 1 1
Unsupervised learning of visual taxonomies
 In CVPR
, 2008
"... Corel dataset. Images are represented using ‘spacecolor histograms ’ (section 4.1). Each node shows a synthetically generated ‘quilt ’ – an icon that represents that node’s model of images. As can be seen, common colors (such as black) are represented at top nodes and therefore are shared among mu ..."
Abstract

Cited by 33 (1 self)
 Add to MetaCart
Corel dataset. Images are represented using ‘spacecolor histograms ’ (section 4.1). Each node shows a synthetically generated ‘quilt ’ – an icon that represents that node’s model of images. As can be seen, common colors (such as black) are represented at top nodes and therefore are shared among multiple images. Bottom: an example image from each leaf node is shown below each leaf. As more images and categories become available, organizing them becomes crucial. We present a novel statistical method for organizing a collection of images into a treeshaped hierarchy. The method employs a nonparametric Bayesian model and is completely unsupervised. Each image is associated with a path through a tree. Similar images share initial segments of their paths and therefore have a smaller distance from each other. Each internal node in the hierarchy represents information that is common to images whose paths pass through that node, thus providing a compact image representation. Our experiments show that a disorganized collection of images will be organized into an intuitive taxonomy. Furthermore, we find that the taxonomy allows good image categorization and, in this respect, is superior to the popular LDA model. 1.
A Bayesian model for supervised clustering with the Dirichlet process prior
 Journal of Machine Learning Research
, 2005
"... We develop a Bayesian framework for tackling the supervised clustering problem, the generic problem encountered in tasks such as reference matching, coreference resolution, identity uncertainty and record linkage. Our clustering model is based on the Dirichlet process prior, which enables us to defi ..."
Abstract

Cited by 26 (0 self)
 Add to MetaCart
We develop a Bayesian framework for tackling the supervised clustering problem, the generic problem encountered in tasks such as reference matching, coreference resolution, identity uncertainty and record linkage. Our clustering model is based on the Dirichlet process prior, which enables us to define distributions over the countably infinite sets that naturally arise in this problem. We add supervision to our model by positing the existence of a set of unobserved random variables (we call these “reference types”) that are generic across all clusters. Inference in our framework, which requires integrating over infinitely many parameters, is solved using Markov chain Monte Carlo techniques. We present algorithms for both conjugate and nonconjugate priors. We present a simple—but general—parameterization of our model based on a Gaussian assumption. We evaluate this model on one artificial task and three realworld tasks, comparing it against both unsupervised and stateoftheart supervised algorithms. Our results show that our model is able to outperform other models across a variety of tasks and performance metrics.
Variable selection in clustering via Dirichlet process mixture models
, 2006
"... The increased collection of highdimensional data in various fields has raised a strong interest in clustering algorithms and variable selection procedures. In this paper, we propose a modelbased method that addresses the two problems simultaneously. We introduce a latent binary vector to identify ..."
Abstract

Cited by 25 (3 self)
 Add to MetaCart
The increased collection of highdimensional data in various fields has raised a strong interest in clustering algorithms and variable selection procedures. In this paper, we propose a modelbased method that addresses the two problems simultaneously. We introduce a latent binary vector to identify discriminating variables and use Dirichlet process mixture models to define the cluster structure. We update the variable selection index using a Metropolis algorithm and obtain inference on the cluster structure via a splitmerge Markov chain Monte Carlo technique. We explore the performance of the methodology on simulated data and illustrate an application with a dna microarray study.
Generalpurpose mcmc inference over relational structures
 In Proceedings of the Proceedings of the TwentySecond Conference Annual Conference on Uncertainty in Artificial Intelligence (UAI06
"... Tasks such as record linkage and multitarget tracking, which involve reconstructing the set of objects that underlie some observed data, are particularly challenging for probabilistic inference. Recent work has achieved efficient and accurate inference on such problems using Markov chain Monte Carl ..."
Abstract

Cited by 22 (6 self)
 Add to MetaCart
Tasks such as record linkage and multitarget tracking, which involve reconstructing the set of objects that underlie some observed data, are particularly challenging for probabilistic inference. Recent work has achieved efficient and accurate inference on such problems using Markov chain Monte Carlo (MCMC) techniques with customized proposal distributions. Currently, implementing such a system requires coding MCMC state representations and acceptance probability calculations that are specific to a particular application. An alternative approach, which we pursue in this paper, is to use a generalpurpose probabilistic modeling language (such as BLOG) and a generic MetropolisHastings MCMC algorithm that supports usersupplied proposal distributions. Our algorithm gains flexibility by using MCMC states that are only partial descriptions of possible worlds; we provide conditions under which MCMC over partial worlds yields correct answers to queries. We also show how to use a contextspecific Bayes net to identify the factors in the acceptance probability that need to be computed for a given proposed move. Experimental results on a citation matching task show that our generalpurpose MCMC engine compares favorably with an applicationspecific system. 1
Modelling Relational Data using Bayesian Clustered Tensor Factorization
"... We consider the problem of learning probabilistic models for complex relational structures between various types of objects. A model can help us “understand ” a dataset of relational facts in at least two ways, by finding interpretable structure in the data, and by supporting predictions, or inferen ..."
Abstract

Cited by 19 (2 self)
 Add to MetaCart
We consider the problem of learning probabilistic models for complex relational structures between various types of objects. A model can help us “understand ” a dataset of relational facts in at least two ways, by finding interpretable structure in the data, and by supporting predictions, or inferences about whether particular unobserved relations are likely to be true. Often there is a tradeoff between these two aims: clusterbased models yield more easily interpretable representations, while factorizationbased approaches have given better predictive performance on large data sets. We introduce the Bayesian Clustered Tensor Factorization (BCTF) model, which embeds a factorized representation of relations in a nonparametric Bayesian clustering framework. Inference is fully Bayesian but scales well to large data sets. The model simultaneously discovers interpretable clusters and yields predictive performance that matches or beats previous probabilistic models for relational data. 1
Clustering Using Objective Functions and Stochastic Search
, 2007
"... Summary. A new approach to clustering multivariate data, based on a multilevel linear mixed model, is proposed. A key feature of the model is that observations from the same cluster are correlated, because they share clusterspecific random effects. The inclusion of clusterspecific random effects a ..."
Abstract

Cited by 16 (3 self)
 Add to MetaCart
Summary. A new approach to clustering multivariate data, based on a multilevel linear mixed model, is proposed. A key feature of the model is that observations from the same cluster are correlated, because they share clusterspecific random effects. The inclusion of clusterspecific random effects allows parsimonious departure from an assumed base model for cluster mean profiles. This departure is captured statistically via the posterior expectation, or best linear unbiased predictor. One of the parameters in the model is the true underlying partition of the data, and the posterior distribution of this parameter, which is known up to a normalizing constant, is used to cluster the data. The problem of finding partitions with high posterior probability is not amenable to deterministic methods such as the EM algorithm. Thus, we propose a stochastic search algorithm that is driven by a Markov chain that is a mixture of two Metropolis–Hastings algorithms—one that makes small scale changes to individual objects and another that performs large scale moves involving entire clusters. The methodology proposed is fundamentally different from the wellknown finite mixture model approach to clustering, which does not explicitly include the partition as a parameter, and involves an independent and identically distributed structure.