Results 1  10
of
26
Hierarchical Dirichlet processes
 Journal of the American Statistical Association
, 2004
"... program. The authors wish to acknowledge helpful discussions with Lancelot James and Jim Pitman and the referees for useful comments. 1 We consider problems involving groups of data, where each observation within a group is a draw from a mixture model, and where it is desirable to share mixture comp ..."
Abstract

Cited by 543 (56 self)
 Add to MetaCart
program. The authors wish to acknowledge helpful discussions with Lancelot James and Jim Pitman and the referees for useful comments. 1 We consider problems involving groups of data, where each observation within a group is a draw from a mixture model, and where it is desirable to share mixture components between groups. We assume that the number of mixture components is unknown a priori and is to be inferred from the data. In this setting it is natural to consider sets of Dirichlet processes, one for each group, where the wellknown clustering property of the Dirichlet process provides a nonparametric prior for the number of mixture components within each group. Given our desire to tie the mixture models in the various groups, we consider a hierarchical model, specifically one in which the base measure for the child Dirichlet processes is itself distributed according to a Dirichlet process. Such a base measure being discrete, the child Dirichlet processes necessarily share atoms. Thus, as desired, the mixture models in the different groups necessarily share mixture components. We discuss representations of hierarchical Dirichlet processes in terms of
Unsupervised learning of finite mixture models
 IEEE Transactions on pattern analysis and machine intelligence
, 2002
"... AbstractÐThis paper proposes an unsupervised algorithm for learning a finite mixture model from multivariate data. The adjective ªunsupervisedº is justified by two properties of the algorithm: 1) it is capable of selecting the number of components and 2) unlike the standard expectationmaximization ..."
Abstract

Cited by 271 (20 self)
 Add to MetaCart
AbstractÐThis paper proposes an unsupervised algorithm for learning a finite mixture model from multivariate data. The adjective ªunsupervisedº is justified by two properties of the algorithm: 1) it is capable of selecting the number of components and 2) unlike the standard expectationmaximization (EM) algorithm, it does not require careful initialization. The proposed method also avoids another drawback of EM for mixture fitting: the possibility of convergence toward a singular estimate at the boundary of the parameter space. The novelty of our approach is that we do not use a model selection criterion to choose one among a set of preestimated candidate models; instead, we seamlessly integrate estimation and model selection in a single algorithm. Our technique can be applied to any type of parametric mixture model for which it is possible to write an EM algorithm; in this paper, we illustrate it with experiments involving Gaussian mixtures. These experiments testify for the good performance of our approach. Index TermsÐFinite mixtures, unsupervised learning, model selection, minimum message length criterion, Bayesian methods, expectationmaximization algorithm, clustering. æ 1
Infinite Latent Feature Models and the Indian Buffet Process
, 2005
"... We define a probability distribution over equivalence classes of binary matrices with a finite number of rows and an unbounded number of columns. This distribution ..."
Abstract

Cited by 185 (38 self)
 Add to MetaCart
We define a probability distribution over equivalence classes of binary matrices with a finite number of rows and an unbounded number of columns. This distribution
A SplitMerge Markov Chain Monte Carlo Procedure for the Dirichlet Process Mixture Model
 Journal of Computational and Graphical Statistics
, 2000
"... . We propose a splitmerge Markov chain algorithm to address the problem of inefficient sampling for conjugate Dirichlet process mixture models. Traditional Markov chain Monte Carlo methods for Bayesian mixture models, such as Gibbs sampling, can become trapped in isolated modes corresponding to an ..."
Abstract

Cited by 93 (0 self)
 Add to MetaCart
. We propose a splitmerge Markov chain algorithm to address the problem of inefficient sampling for conjugate Dirichlet process mixture models. Traditional Markov chain Monte Carlo methods for Bayesian mixture models, such as Gibbs sampling, can become trapped in isolated modes corresponding to an inappropriate clustering of data points. This article describes a MetropolisHastings procedure that can escape such local modes by splitting or merging mixture components. Our MetropolisHastings algorithm employs a new technique in which an appropriate proposal for splitting or merging components is obtained by using a restricted Gibbs sampling scan. We demonstrate empirically that our method outperforms the Gibbs sampler in situations where two or more components are similar in structure. Key words: Dirichlet process mixture model, Markov chain Monte Carlo, MetropolisHastings algorithm, Gibbs sampler, splitmerge updates 1 Introduction Mixture models are often applied to density estim...
A hierarchical dirichlet language model
 Natural Language Engineering
, 1994
"... We discuss a hierarchical probabilistic model whose predictions are similar to those of the popular language modelling procedure known as 'smoothing'. A number of interesting differences from smoothing emerge. The insights gained from a probabilistic view of this problem point towards new directions ..."
Abstract

Cited by 80 (3 self)
 Add to MetaCart
We discuss a hierarchical probabilistic model whose predictions are similar to those of the popular language modelling procedure known as 'smoothing'. A number of interesting differences from smoothing emerge. The insights gained from a probabilistic view of this problem point towards new directions for language modelling. The ideas of this paper are also applicable to other problems such as the modelling of triphomes in speech, and DNA and protein sequences in molecular biology. The new algorithm is compared with smoothing on a two million word corpus. The methods prove to be about equally accurate, with the hierarchical model using fewer computational resources. 1
Variable selection in clustering via Dirichlet process mixture models
, 2006
"... The increased collection of highdimensional data in various fields has raised a strong interest in clustering algorithms and variable selection procedures. In this paper, we propose a modelbased method that addresses the two problems simultaneously. We introduce a latent binary vector to identify ..."
Abstract

Cited by 25 (3 self)
 Add to MetaCart
The increased collection of highdimensional data in various fields has raised a strong interest in clustering algorithms and variable selection procedures. In this paper, we propose a modelbased method that addresses the two problems simultaneously. We introduce a latent binary vector to identify discriminating variables and use Dirichlet process mixture models to define the cluster structure. We update the variable selection index using a Metropolis algorithm and obtain inference on the cluster structure via a splitmerge Markov chain Monte Carlo technique. We explore the performance of the methodology on simulated data and illustrate an application with a dna microarray study.
Choice of Basis for Laplace Approximation
 Machine Learning
, 1998
"... Maximum a posterJori optimization of parameters and the Laplace approximation for the marginal likelihood are both basisdependent methods. This note compares two choices of basis for models parameterized by probabilities, showing that it is possible to improve on the traditional choice, the prob ..."
Abstract

Cited by 23 (1 self)
 Add to MetaCart
Maximum a posterJori optimization of parameters and the Laplace approximation for the marginal likelihood are both basisdependent methods. This note compares two choices of basis for models parameterized by probabilities, showing that it is possible to improve on the traditional choice, the probability simplex, by transforming to the softmax' basis.
On Fitting Mixture Models
, 1999
"... Consider the problem of fitting a finite Gaussian mixture, with an unknown number of components, to observed data. This paper proposes a new minimum description length (MDL) type criterion, termed MMDL (for mixture MDL), to select the number of components of the model. MMDL is based on the ident ..."
Abstract

Cited by 22 (4 self)
 Add to MetaCart
Consider the problem of fitting a finite Gaussian mixture, with an unknown number of components, to observed data. This paper proposes a new minimum description length (MDL) type criterion, termed MMDL (for mixture MDL), to select the number of components of the model. MMDL is based on the identification of an "equivalent sample size", for each component, which does not coincide with the full sample size. We also introduce an algorithm based on the standard expectationmaximization (EM) approach together with a new agglomerative step, called agglomerative EM (AEM). The experiments here reported have shown that MMDL outperforms existing criteria of comparable computational cost. The good behavior of AEM, namely its good robustness with respect to initialization, is also illustrated experimentally.
AN IMPROVED MERGESPLIT SAMPLER FOR CONJUGATE DIRICHLET PROCESS MIXTURE MODELS
, 2003
"... The Gibbs sampler is the standard Markov chain Monte Carlo sampler for drawing samples from the posterior distribution of conjugate Dirichlet process mixture models. Researchers have noticed the Gibbs sampler’s tendency to get stuck in local modes and, thus, poorly explore the posterior distribution ..."
Abstract

Cited by 15 (2 self)
 Add to MetaCart
The Gibbs sampler is the standard Markov chain Monte Carlo sampler for drawing samples from the posterior distribution of conjugate Dirichlet process mixture models. Researchers have noticed the Gibbs sampler’s tendency to get stuck in local modes and, thus, poorly explore the posterior distribution. Jain and Neal (2004) proposed a mergesplit sampler in which a naive random split is sweetened by a series of restricted Gibbs scans, where the number of Gibbs scans is a tuning parameter that must be supplied by the user. In this work, I propose an alternative mergesplit sampler borrowing ideas from sequential importance sampling. My sampler proposes splits by sequentially allocating observations to one of two split components using allocation probabilities that are conditional on previously allocated data. The algorithm does not require further sweetening and is, hence, computationally efficient. In addition, no tuning parameter needs to be chosen. While the conditional allocation of observations is similar to sequential importance sampling, the output from the sampler has the correct stationary distribution due to the use of the MetropolisHastings ratio. The computational efficiency of my sequentiallyallocated mergesplit (SAMS) sampler is