Results 1–10 of 45
Hierarchical Dirichlet processes
Journal of the American Statistical Association, 2004
Abstract

Cited by 927 (79 self)
We consider problems involving groups of data, where each observation within a group is a draw from a mixture model, and where it is desirable to share mixture components between groups. We assume that the number of mixture components is unknown a priori and is to be inferred from the data. In this setting it is natural to consider sets of Dirichlet processes, one for each group, where the well-known clustering property of the Dirichlet process provides a nonparametric prior for the number of mixture components within each group. Given our desire to tie the mixture models in the various groups, we consider a hierarchical model, specifically one in which the base measure for the child Dirichlet processes is itself distributed according to a Dirichlet process. Such a base measure being discrete, the child Dirichlet processes necessarily share atoms. Thus, as desired, the mixture models in the different groups necessarily share mixture components. We discuss representations of hierarchical Dirichlet processes in terms of ...
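The atom-sharing mechanism described in the abstract can be sketched with a truncated stick-breaking construction. This is an illustrative toy (truncation level, concentration parameters, and the Gaussian base measure are arbitrary choices, not the authors' settings), showing only why a discrete base measure forces the child processes to share atoms:

```python
import numpy as np

rng = np.random.default_rng(0)

def stick_breaking(concentration, n_atoms, rng):
    """Truncated stick-breaking weights for a Dirichlet process."""
    betas = rng.beta(1.0, concentration, size=n_atoms)
    leftover = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    return betas * leftover

# Global DP: atoms drawn from a continuous base measure H (here N(0, 1)),
# so the global atoms are almost surely distinct.
global_weights = stick_breaking(5.0, 50, rng)
global_weights /= global_weights.sum()   # renormalize the truncation
global_atoms = rng.normal(size=50)

# A child DP uses the discrete global DP as its base measure, so its
# atoms are resampled *from* the global atoms.
def child_dp(concentration, n_atoms, rng):
    weights = stick_breaking(concentration, n_atoms, rng)
    idx = rng.choice(len(global_atoms), size=n_atoms, p=global_weights)
    return weights, global_atoms[idx]

w1, atoms1 = child_dp(3.0, 30, rng)
w2, atoms2 = child_dp(3.0, 30, rng)
# Both groups place mass on atoms drawn from the same global set, so
# they share mixture components, exactly the property motivated above.
```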
Infinite Latent Feature Models and the Indian Buffet Process
2005
Abstract

Cited by 274 (46 self)
We define a probability distribution over equivalence classes of binary matrices with a finite number of rows and an unbounded number of columns. This distribution
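The distribution over such binary matrices has a well-known sequential description, the Indian Buffet Process: customer i takes each previously sampled dish k with probability m_k / i and then tries Poisson(alpha / i) new dishes. A minimal direct sampler (illustrative parameter values, not code from the paper):

```python
import numpy as np

def sample_ibp(n_customers, alpha, rng):
    """Draw one binary feature matrix from the Indian Buffet Process.

    Customer i takes existing dish k with probability m_k / i (m_k =
    number of earlier customers who took dish k), then samples
    Poisson(alpha / i) brand-new dishes.
    """
    dishes = []  # dishes[k] = 0/1 indicator per customer, in order
    for i in range(1, n_customers + 1):
        for col in dishes:
            m_k = sum(col)
            col.append(1 if rng.random() < m_k / i else 0)
        for _ in range(rng.poisson(alpha / i)):
            dishes.append([0] * (i - 1) + [1])
    if not dishes:
        return np.zeros((n_customers, 0), dtype=int)
    return np.array(dishes, dtype=int).T  # rows = customers, cols = dishes

rng = np.random.default_rng(1)
Z = sample_ibp(n_customers=10, alpha=2.0, rng=rng)
# Rows index observations, columns index latent features; the number of
# columns is unbounded a priori and grows roughly like alpha * log(n).
```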
A Split-Merge Markov Chain Monte Carlo Procedure for the Dirichlet Process Mixture Model
Journal of Computational and Graphical Statistics, 2000
Abstract

Cited by 151 (0 self)
We propose a split-merge Markov chain algorithm to address the problem of inefficient sampling for conjugate Dirichlet process mixture models. Traditional Markov chain Monte Carlo methods for Bayesian mixture models, such as Gibbs sampling, can become trapped in isolated modes corresponding to an inappropriate clustering of data points. This article describes a Metropolis-Hastings procedure that can escape such local modes by splitting or merging mixture components. Our Metropolis-Hastings algorithm employs a new technique in which an appropriate proposal for splitting or merging components is obtained by using a restricted Gibbs sampling scan. We demonstrate empirically that our method outperforms the Gibbs sampler in situations where two or more components are similar in structure. Key words: Dirichlet process mixture model, Markov chain Monte Carlo, Metropolis-Hastings algorithm, Gibbs sampler, split-merge updates.
A hierarchical Dirichlet language model
Natural Language Engineering, 1994
Abstract

Cited by 95 (3 self)
We discuss a hierarchical probabilistic model whose predictions are similar to those of the popular language modelling procedure known as 'smoothing'. A number of interesting differences from smoothing emerge. The insights gained from a probabilistic view of this problem point towards new directions for language modelling. The ideas of this paper are also applicable to other problems such as the modelling of triphones in speech, and DNA and protein sequences in molecular biology. The new algorithm is compared with smoothing on a two million word corpus. The methods prove to be about equally accurate, with the hierarchical model using fewer computational resources.
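The flavor of Dirichlet-prior smoothing can be sketched in a few lines: a Dirichlet prior whose mean is the unigram distribution shrinks bigram estimates toward the lower-order model. This is an illustrative toy on a six-word corpus, not the paper's actual hierarchical model:

```python
from collections import Counter

def dirichlet_smoothed_bigram(prev, word, bigrams, unigrams, total, alpha=1.0):
    """P(word | prev) with a Dirichlet prior whose mean is the unigram
    distribution: bigram counts are interpolated with unigram mass."""
    prior_mean = unigrams[word] / total
    context_count = sum(c for (p, _), c in bigrams.items() if p == prev)
    return (bigrams[(prev, word)] + alpha * prior_mean) / (context_count + alpha)

tokens = "the cat sat on the mat".split()
unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))
total = len(tokens)

p_seen = dirichlet_smoothed_bigram("the", "cat", bigrams, unigrams, total)
# An unseen bigram still receives mass, proportional to the unigram prob:
p_unseen = dirichlet_smoothed_bigram("the", "sat", bigrams, unigrams, total)
```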
Variable selection in clustering via Dirichlet process mixture models
2006
Abstract

Cited by 40 (3 self)
The increased collection of high-dimensional data in various fields has raised a strong interest in clustering algorithms and variable selection procedures. In this paper, we propose a model-based method that addresses the two problems simultaneously. We introduce a latent binary vector to identify discriminating variables and use Dirichlet process mixture models to define the cluster structure. We update the variable selection index using a Metropolis algorithm and obtain inference on the cluster structure via a split-merge Markov chain Monte Carlo technique. We explore the performance of the methodology on simulated data and illustrate an application with a DNA microarray study.
A tutorial on Bayesian nonparametric models
 Journal of Mathematical Psychology
Abstract

Cited by 39 (8 self)
A key problem in statistical modeling is model selection: how to choose a model at an appropriate level of complexity. This problem appears in many settings, most prominently in choosing the number of clusters in mixture models or the number of factors in factor analysis. In this tutorial we describe Bayesian nonparametric methods, a class of methods that sidesteps this issue by allowing the data to determine the complexity of the model. This tutorial is a high-level introduction to Bayesian nonparametric methods and contains several examples of their application.
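The adaptive-complexity idea can be illustrated with the Chinese restaurant process, a standard Bayesian nonparametric building block (an illustrative sketch with arbitrary parameter values, not code from the tutorial):

```python
import numpy as np

def chinese_restaurant_process(n, alpha, rng):
    """Seat n customers: customer i joins table k with probability
    n_k / (i + alpha) and opens a new table with prob alpha / (i + alpha)."""
    tables = []  # tables[k] = number of customers seated at table k
    for i in range(n):
        probs = np.array(tables + [alpha], dtype=float) / (i + alpha)
        k = rng.choice(len(probs), p=probs)
        if k == len(tables):
            tables.append(1)   # a new cluster is created on demand
        else:
            tables[k] += 1
    return tables

rng = np.random.default_rng(0)
tables = chinese_restaurant_process(n=500, alpha=2.0, rng=rng)
# The number of occupied tables grows roughly like alpha * log(n): the
# data, not a fixed hyperparameter, determines the number of clusters.
```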
Choice of Basis for Laplace Approximation
Machine Learning, 1998
Abstract

Cited by 35 (1 self)
Maximum a posteriori optimization of parameters and the Laplace approximation for the marginal likelihood are both basis-dependent methods. This note compares two choices of basis for models parameterized by probabilities, showing that it is possible to improve on the traditional choice, the probability simplex, by transforming to the 'softmax' basis.
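A sketch of the two coordinate systems (illustrative only): the softmax basis represents a probability vector by unconstrained log-coordinates, defined up to an additive constant, and the softmax map recovers the simplex point. Gaussian approximations such as Laplace's are then taken in the unconstrained space rather than on the bounded simplex:

```python
import numpy as np

def softmax(a):
    """Map softmax-basis coordinates back to the probability simplex."""
    a = a - a.max()              # shift for numerical stability
    e = np.exp(a)
    return e / e.sum()

def to_softmax_basis(p):
    """Unconstrained log-coordinates of a strictly positive probability
    vector; only defined up to an additive constant."""
    return np.log(p)

p = np.array([0.7, 0.2, 0.1])
a = to_softmax_basis(p)
p_back = softmax(a)              # round-trips back to p
```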
AN IMPROVED MERGE-SPLIT SAMPLER FOR CONJUGATE DIRICHLET PROCESS MIXTURE MODELS
2003
Abstract

Cited by 28 (2 self)
The Gibbs sampler is the standard Markov chain Monte Carlo sampler for drawing samples from the posterior distribution of conjugate Dirichlet process mixture models. Researchers have noticed the Gibbs sampler’s tendency to get stuck in local modes and, thus, poorly explore the posterior distribution. Jain and Neal (2004) proposed a merge-split sampler in which a naive random split is sweetened by a series of restricted Gibbs scans, where the number of Gibbs scans is a tuning parameter that must be supplied by the user. In this work, I propose an alternative merge-split sampler borrowing ideas from sequential importance sampling. My sampler proposes splits by sequentially allocating observations to one of two split components using allocation probabilities that are conditional on previously allocated data. The algorithm does not require further sweetening and is, hence, computationally efficient. In addition, no tuning parameter needs to be chosen. While the conditional allocation of observations is similar to sequential importance sampling, the output from the sampler has the correct stationary distribution due to the use of the Metropolis-Hastings ratio. The computational efficiency of my sequentially-allocated merge-split (SAMS) sampler is ...
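A toy sketch of the sequential-allocation idea, using univariate conjugate Gaussian components. This illustrates the general scheme (allocate points one at a time with probability proportional to component size times predictive density, tracking the proposal's log-probability for a Metropolis-Hastings ratio); it is not the author's SAMS implementation, and the model and parameters are arbitrary:

```python
import numpy as np
from math import exp, log, pi, sqrt

def normal_predictive(x, members, sigma2=1.0, tau2=1.0):
    """Posterior predictive density of x under a N(mu, sigma2) component
    with conjugate prior mu ~ N(0, tau2), given the points already in it."""
    n = len(members)
    post_var = 1.0 / (1.0 / tau2 + n / sigma2)
    post_mean = post_var * sum(members) / sigma2
    var = sigma2 + post_var
    return exp(-(x - post_mean) ** 2 / (2.0 * var)) / sqrt(2.0 * pi * var)

def sequential_split(data, rng):
    """Seed two components with data[0] and data[1], then allocate the
    rest one at a time with probability proportional to (component size)
    times (predictive density), accumulating the proposal's
    log-probability, which would enter an M-H acceptance ratio."""
    comp = [[data[0]], [data[1]]]
    log_q = 0.0
    for x in data[2:]:
        w0 = len(comp[0]) * normal_predictive(x, comp[0])
        w1 = len(comp[1]) * normal_predictive(x, comp[1])
        p0 = w0 / (w0 + w1)
        k = 0 if rng.random() < p0 else 1
        comp[k].append(x)
        log_q += log(p0) if k == 0 else log(1.0 - p0)
    return comp, log_q

rng = np.random.default_rng(0)
data = rng.permutation(np.concatenate([rng.normal(-3, 1, 20),
                                       rng.normal(3, 1, 20)]))
comp, log_q = sequential_split(data, rng)
# No restricted Gibbs "sweetening" passes and no tuning parameter are
# needed: the allocation probabilities already condition on earlier data.
```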