Results 1  10
of
633
Hierarchical Dirichlet processes.
 Journal of the American Statistical Association,
, 2006
"... We consider problems involving groups of data where each observation within a group is a draw from a mixture model and where it is desirable to share mixture components between groups. We assume that the number of mixture components is unknown a priori and is to be inferred from the data. In this s ..."
Abstract

Cited by 940 (78 self)
 Add to MetaCart
(Show Context)
We consider problems involving groups of data where each observation within a group is a draw from a mixture model and where it is desirable to share mixture components between groups. We assume that the number of mixture components is unknown a priori and is to be inferred from the data. In this setting it is natural to consider sets of Dirichlet processes, one for each group, where the wellknown clustering property of the Dirichlet process provides a nonparametric prior for the number of mixture components within each group. Given our desire to tie the mixture models in the various groups, we consider a hierarchical model, specifically one in which the base measure for the child Dirichlet processes is itself distributed according to a Dirichlet process. Such a base measure being discrete, the child Dirichlet processes necessarily share atoms. Thus, as desired, the mixture models in the different groups necessarily share mixture components. We discuss representations of hierarchical Dirichlet processes in terms of a stickbreaking process, and a generalization of the Chinese restaurant process that we refer to as the "Chinese restaurant franchise." We present Markov chain Monte Carlo algorithms for posterior inference in hierarchical Dirichlet process mixtures and describe applications to problems in information retrieval and text modeling.
Bayesian density estimation and inference using mixtures.
 J. Amer. Statist. Assoc.
, 1995
"... JSTOR is a notforprofit service that helps scholars, researchers, and students discover, use, and build upon a wide range of content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms of scholarship. For more information about J ..."
Abstract

Cited by 653 (18 self)
 Add to MetaCart
(Show Context)
JSTOR is a notforprofit service that helps scholars, researchers, and students discover, use, and build upon a wide range of content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms of scholarship. For more information about JSTOR, please contact support@jstor.org. We describe and illustrate Bayesian inference in models for density estimation using mixtures of Dirichlet processes. These models provide natural settings for density estimation and are exemplified by special cases where data are modeled as a sample from mixtures of normal distributions. Efficient simulation methods are used to approximate various prior, posterior, and predictive distributions. This allows for direct inference on a variety of practical issues, including problems of local versus global smoothing, uncertainty about density estimates, assessment of modality, and the inference on the numbers of components. Also, convergence results are established for a general class of normal mixture models. American Statistical Association
The Infinite Hidden Markov Model
 Machine Learning
, 2002
"... We show that it is possible to extend hidden Markov models to have a countably infinite number of hidden states. By using the theory of Dirichlet processes we can implicitly integrate out the infinitely many transition parameters, leaving only three hyperparameters which can be learned from data. Th ..."
Abstract

Cited by 638 (41 self)
 Add to MetaCart
We show that it is possible to extend hidden Markov models to have a countably infinite number of hidden states. By using the theory of Dirichlet processes we can implicitly integrate out the infinitely many transition parameters, leaving only three hyperparameters which can be learned from data. These three hyperparameters define a hierarchical Dirichlet process capable of capturing a rich set of transition dynamics. The three hyperparameters control the time scale of the dynamics, the sparsity of the underlying statetransition matrix, and the expected number of distinct hidden states in a finite sequence. In this framework it is also natural to allow the alphabet of emitted symbols to be infiniteconsider, for example, symbols being possible words appearing in English text.
Gibbs Sampling Methods for StickBreaking Priors
"... ... In this paper we present two general types of Gibbs samplers that can be used to fit posteriors of Bayesian hierarchical models based on stickbreaking priors. The first type of Gibbs sampler, referred to as a Polya urn Gibbs sampler, is a generalized version of a widely used Gibbs sampling meth ..."
Abstract

Cited by 387 (19 self)
 Add to MetaCart
... In this paper we present two general types of Gibbs samplers that can be used to fit posteriors of Bayesian hierarchical models based on stickbreaking priors. The first type of Gibbs sampler, referred to as a Polya urn Gibbs sampler, is a generalized version of a widely used Gibbs sampling method currently employed for Dirichlet process computing. This method applies to stickbreaking priors with a known P'olya urn characterization; that is priors with an explicit and simple prediction rule. Our second method, the blocked Gibbs sampler, is based on a entirely different approach that works by directly sampling values from the posterior of the random measure. The blocked Gibbs sampler can be viewed as a more general approach as it works without requiring an explicit prediction rule. We find that the blocked Gibbs avoids some of the limitations seen with the Polya urn approach and should be simpler for nonexperts to use.
Infinite Latent Feature Models and the Indian Buffet Process
, 2005
"... We define a probability distribution over equivalence classes of binary matrices with a finite number of rows and an unbounded number of columns. This distribution ..."
Abstract

Cited by 272 (45 self)
 Add to MetaCart
We define a probability distribution over equivalence classes of binary matrices with a finite number of rows and an unbounded number of columns. This distribution
The Infinite Gaussian Mixture Model
 In Advances in Neural Information Processing Systems 12
, 2000
"... In a Bayesian mixture model it is not necessary a priori to limit the number of components to be finite. In this paper an infinite Gaussian mixture model is presented which neatly sidesteps the difficult problem of finding the "right" number of mixture components. Inference in the model is ..."
Abstract

Cited by 253 (8 self)
 Add to MetaCart
In a Bayesian mixture model it is not necessary a priori to limit the number of components to be finite. In this paper an infinite Gaussian mixture model is presented which neatly sidesteps the difficult problem of finding the "right" number of mixture components. Inference in the model is done using an efficient parameterfree Markov Chain that relies entirely on Gibbs sampling.
Variational inference for Dirichlet process mixtures
 Bayesian Analysis
, 2005
"... Abstract. Dirichlet process (DP) mixture models are the cornerstone of nonparametric Bayesian statistics, and the development of MonteCarlo Markov chain (MCMC) sampling methods for DP mixtures has enabled the application of nonparametric Bayesian methods to a variety of practical data analysis prob ..."
Abstract

Cited by 244 (27 self)
 Add to MetaCart
(Show Context)
Abstract. Dirichlet process (DP) mixture models are the cornerstone of nonparametric Bayesian statistics, and the development of MonteCarlo Markov chain (MCMC) sampling methods for DP mixtures has enabled the application of nonparametric Bayesian methods to a variety of practical data analysis problems. However, MCMC sampling can be prohibitively slow, and it is important to explore alternatives. One class of alternatives is provided by variational methods, a class of deterministic algorithms that convert inference problems into optimization problems (Opper and Saad 2001; Wainwright and Jordan 2003). Thus far, variational methods have mainly been explored in the parametric setting, in particular within the formalism of the exponential family (Attias 2000; Ghahramani and Beal 2001; Blei et al. 2003). In this paper, we present a variational inference algorithm for DP mixtures. We present experiments that compare the algorithm to Gibbs sampling algorithms for DP mixtures of Gaussians and present an application to a largescale image analysis problem.
Computational Discovery of Gene Modules, Regulatory Networks and Expression Programs
, 2007
"... Highthroughput molecular data are revolutionizing biology by providing massive amounts of information about gene expression and regulation. Such information is applicable both to furthering our understanding of fundamental biology and to developing new diagnostic and treatment approaches for diseas ..."
Abstract

Cited by 236 (17 self)
 Add to MetaCart
Highthroughput molecular data are revolutionizing biology by providing massive amounts of information about gene expression and regulation. Such information is applicable both to furthering our understanding of fundamental biology and to developing new diagnostic and treatment approaches for diseases. However, novel mathematical methods are needed for extracting biological knowledge from highdimensional, complex and noisy data sources. In this thesis, I develop and apply three novel computational approaches for this task. The common theme of these approaches is that they seek to discover meaningful groups of genes, which confer robustness to noise and compress complex information into interpretable models. I first present the GRAM algorithm, which fuses information from genomewide expression and in vivo transcription factorDNA binding data to discover regulatory networks of
Learning systems of concepts with an infinite relational model
 In Proceedings of the 21st National Conference on Artificial Intelligence
, 2006
"... Relationships between concepts account for a large proportion of semantic knowledge. We present a nonparametric Bayesian model that discovers systems of related concepts. Given data involving several sets of entities, our model discovers the kinds of entities in each set and the relations between ki ..."
Abstract

Cited by 229 (24 self)
 Add to MetaCart
(Show Context)
Relationships between concepts account for a large proportion of semantic knowledge. We present a nonparametric Bayesian model that discovers systems of related concepts. Given data involving several sets of entities, our model discovers the kinds of entities in each set and the relations between kinds that are possible or likely. We apply our approach to four problems: clustering objects and features, learning ontologies, discovering kinship systems, and discovering structure in political data. Philosophers, psychologists and computer scientists have proposed that semantic knowledge is best understood as a system of relations. Two questions immediately arise: how can these systems be represented, and how are these representations acquired? Researchers who start with the
A SplitMerge Markov Chain Monte Carlo Procedure for the Dirichlet Process Mixture Model
 Journal of Computational and Graphical Statistics
, 2000
"... . We propose a splitmerge Markov chain algorithm to address the problem of inefficient sampling for conjugate Dirichlet process mixture models. Traditional Markov chain Monte Carlo methods for Bayesian mixture models, such as Gibbs sampling, can become trapped in isolated modes corresponding to an ..."
Abstract

Cited by 150 (0 self)
 Add to MetaCart
(Show Context)
. We propose a splitmerge Markov chain algorithm to address the problem of inefficient sampling for conjugate Dirichlet process mixture models. Traditional Markov chain Monte Carlo methods for Bayesian mixture models, such as Gibbs sampling, can become trapped in isolated modes corresponding to an inappropriate clustering of data points. This article describes a MetropolisHastings procedure that can escape such local modes by splitting or merging mixture components. Our MetropolisHastings algorithm employs a new technique in which an appropriate proposal for splitting or merging components is obtained by using a restricted Gibbs sampling scan. We demonstrate empirically that our method outperforms the Gibbs sampler in situations where two or more components are similar in structure. Key words: Dirichlet process mixture model, Markov chain Monte Carlo, MetropolisHastings algorithm, Gibbs sampler, splitmerge updates 1 Introduction Mixture models are often applied to density estim...