Results 1 - 10 of 71
Dealing with label switching in mixture models
Journal of the Royal Statistical Society, Series B, 2000
Cited by 72 (0 self)
In a Bayesian analysis of finite mixture models, parameter estimation and clustering are sometimes less straightforward than might be expected. In particular, the common practice of estimating parameters by their posterior mean, and summarising joint posterior distributions by marginal distributions, often leads to nonsensical answers. This is due to the so-called “label-switching” problem, which is caused by symmetry in the likelihood of the model parameters. A frequent response to this problem is to remove the symmetry using artificial identifiability constraints. We demonstrate that this fails in general to solve the problem, and describe an alternative class of approaches, relabelling algorithms, which arise from attempting to minimise the posterior expected loss under a class of loss functions. We describe in detail one particularly simple and general relabelling algorithm, and illustrate its success in dealing with the label-switching problem on two examples.
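The relabelling idea can be illustrated with a small sketch. It assumes each MCMC draw t supplies an n × k matrix of classification probabilities, and alternates between (a) averaging the relabelled matrices into an estimate Q and (b) choosing, for each draw, the permutation of component labels that minimises the summed Kullback-Leibler divergence to Q. The function name `relabel`, the brute-force search over all k! permutations, and the stopping rule are illustrative assumptions, not the paper's exact algorithm.

```python
import itertools

import numpy as np


def relabel(P, max_iter=50, eps=1e-12):
    """Relabel MCMC output to undo label switching (illustrative sketch).

    P: array of shape (T, n, k); P[t, i, j] is the probability, under draw t,
    that observation i belongs to component j.
    Returns one label permutation per draw and the final estimate Q (n x k).
    """
    T, n, k = P.shape
    perms = list(itertools.permutations(range(k)))  # brute force: k! candidates
    labels = [tuple(range(k))] * T                  # start from the identity

    def permuted(t, perm):
        # Column j of the result is column perm[j] of draw t.
        return P[t][:, list(perm)]

    for _ in range(max_iter):
        # Step 1: estimate Q by averaging the currently relabelled draws.
        Q = np.mean([permuted(t, labels[t]) for t in range(T)], axis=0)
        log_Q = np.log(Q + eps)
        # Step 2: for each draw, pick the permutation whose relabelled
        # probabilities are closest to Q in summed row-wise KL divergence.
        new_labels = []
        for t in range(T):
            costs = []
            for perm in perms:
                Pp = permuted(t, perm)
                costs.append(np.sum(Pp * (np.log(Pp + eps) - log_Q)))
            new_labels.append(perms[int(np.argmin(costs))])
        if new_labels == labels:  # no draw changed its labelling: converged
            break
        labels = new_labels

    # Recompute Q so it matches the returned labelling.
    Q = np.mean([permuted(t, labels[t]) for t in range(T)], axis=0)
    return labels, Q
```

For example, with two draws holding a matrix `A` and a third holding `A[:, ::-1]` (the same clustering with labels swapped), the sketch keeps the first two draws as they are and swaps the labels of the third, so the averaged `Q` recovers `A` instead of a meaningless 50/50 blend.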
A Split-Merge Markov Chain Monte Carlo Procedure for the Dirichlet Process Mixture Model
Journal of Computational and Graphical Statistics, 2000
Cited by 64 (0 self)
We propose a split-merge Markov chain algorithm to address the problem of inefficient sampling for conjugate Dirichlet process mixture models. Traditional Markov chain Monte Carlo methods for Bayesian mixture models, such as Gibbs sampling, can become trapped in isolated modes corresponding to an inappropriate clustering of data points. This article describes a Metropolis-Hastings procedure that can escape such local modes by splitting or merging mixture components. Our Metropolis-Hastings algorithm employs a new technique in which an appropriate proposal for splitting or merging components is obtained by using a restricted Gibbs sampling scan. We demonstrate empirically that our method outperforms the Gibbs sampler in situations where two or more components are similar in structure.
Key words: Dirichlet process mixture model, Markov chain Monte Carlo, Metropolis-Hastings algorithm, Gibbs sampler, split-merge updates.
1 Introduction
Mixture models are often applied to density estim...
Bayesian Methods for Hidden Markov Models: Recursive Computing in the 21st Century
Journal of the American Statistical Association, 2002
Cited by 52 (8 self)
Markov chain Monte Carlo (MCMC) sampling strategies can be used to simulate hidden Markov model (HMM) parameters from their posterior distribution given observed data. Some MCMC methods (for computing likelihood, conditional probabilities of hidden states, and the most likely sequence of states) used in practice can be improved by incorporating established recursive algorithms. The most important is a set of forward-backward recursions calculating conditional distributions of the hidden states given observed data and model parameters. We show how to use the recursive algorithms in an MCMC context and present mathematical and empirical results showing that a Gibbs sampler using the forward-backward recursions mixes more rapidly than another sampler often used for HMMs. We introduce an augmented variables technique for obtaining unique state labels in HMMs and finite mixture models. We show how recursive computing allows statistically efficient use of MCMC output when estimating the hidden states. We directly calculate the posterior distribution of the size of the hidden chain's state space by MCMC, circumventing the asymptotic arguments underlying the Bayesian information criterion, which is shown to be inappropriate for a frequently analyzed data set in the HMM literature. The use of the log-likelihood for assessing MCMC convergence is illustrated, and posterior predictive checks are used to investigate application-specific questions of model adequacy.
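The forward-backward recursions referred to above can be sketched for a discrete-state HMM as follows. The forward pass computes filtered state probabilities and the log-likelihood, normalising at each step to avoid underflow; the backward pass then draws the whole hidden state sequence jointly given the parameters, which is the kind of joint state update a Gibbs sampler can use. The function names and argument layout are assumptions for illustration, not the authors' code.

```python
import numpy as np


def forward_filter(lik, A, pi0):
    """Scaled forward recursion for a discrete-state HMM.

    lik: (T, K) array, lik[t, k] = p(y_t | s_t = k)
    A:   (K, K) transition matrix, A[i, j] = p(s_t = j | s_{t-1} = i)
    pi0: (K,) distribution of the first hidden state
    Returns filtered probabilities alpha[t, k] = p(s_t = k | y_0..y_t)
    and the log-likelihood log p(y_0..y_{T-1}).
    """
    T, K = lik.shape
    alpha = np.zeros((T, K))
    loglik = 0.0
    pred = pi0                          # p(s_t | y_0..y_{t-1})
    for t in range(T):
        a = pred * lik[t]               # joint with the new observation
        c = a.sum()                     # c = p(y_t | y_0..y_{t-1})
        alpha[t] = a / c
        loglik += np.log(c)
        pred = alpha[t] @ A             # one-step-ahead prediction
    return alpha, loglik


def backward_sample(alpha, A, rng):
    """Draw s_0..s_{T-1} jointly from p(s | y, A, pi0) given filtered probabilities."""
    T, K = alpha.shape
    s = np.empty(T, dtype=int)
    s[-1] = rng.choice(K, p=alpha[-1])
    for t in range(T - 2, -1, -1):
        # p(s_t | s_{t+1}, y_0..y_t) is proportional to alpha[t, s_t] * A[s_t, s_{t+1}]
        w = alpha[t] * A[:, s[t + 1]]
        s[t] = rng.choice(K, p=w / w.sum())
    return s
```

On a short sequence the log-likelihood returned by `forward_filter` can be checked against brute-force summation over all K^T state paths; the recursion reaches the same value in O(T K^2) operations.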
Dirichlet Prior Sieves in Finite Normal Mixtures
Statistica Sinica, 2002
Cited by 24 (1 self)
Abstract: The use of a finite-dimensional Dirichlet prior in the finite normal mixture model acts like a Bayesian method of sieves. Posterior consistency is directly related to the dimension of the sieve and the choice of the Dirichlet parameters in the prior. We find that naive use of the popular uniform Dirichlet prior leads to an inconsistent posterior. However, a simple adjustment to the parameters in the prior induces a random probability measure that approximates the Dirichlet process and yields a posterior that is strongly consistent for the density and weakly consistent for the unknown mixing distribution. The dimension of the resulting sieve can be selected easily in practice, and a simple and efficient Gibbs sampler can be used to sample the posterior of the mixing distribution. Key words and phrases: Bose-Einstein distribution, Dirichlet process, identification, method of sieves, random probability measure, relative entropy, weak convergence.
Deviance information criteria for missing data models
Bayesian Analysis, 2006
Cited by 22 (3 self)
The deviance information criterion (DIC) introduced by Spiegelhalter et al. (2002) for model assessment and model comparison is directly inspired by linear and generalised linear models, but it is open to different possible variations in the setting of missing data models, depending in particular on whether or not the missing variables are treated as parameters. In this paper, we reassess the criterion for such models and compare different DIC constructions, testing the behaviour of these various extensions in the cases of mixtures of distributions and random effect models.
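For reference, the original criterion of Spiegelhalter et al. (2002) that the paper starts from is computed from posterior draws as DIC = D̄ + p_D, where D(θ) = −2 log p(y | θ) is the deviance, D̄ is its posterior mean, and p_D = D̄ − D(θ̄). The sketch below assumes a normal model with known unit variance and a flat prior, and uses a hypothetical `dic` helper; the missing-data variants compared in the paper are not reproduced.

```python
import numpy as np


def dic(loglik, draws):
    """DIC = Dbar + pD, with pD = Dbar - D(theta_bar) (Spiegelhalter et al., 2002).

    loglik: function mapping a parameter value to log p(y | theta).
    draws:  1-D array of posterior draws of a scalar parameter.
    """
    deviances = np.array([-2.0 * loglik(th) for th in draws])
    d_bar = deviances.mean()                    # posterior mean deviance
    d_at_mean = -2.0 * loglik(draws.mean())     # deviance at the posterior mean
    p_d = d_bar - d_at_mean                     # effective number of parameters
    return d_bar + p_d, p_d


rng = np.random.default_rng(1)
y = rng.normal(3.0, 1.0, size=50)


def normal_loglik(mu):
    # Normal likelihood with known unit variance.
    return -0.5 * np.sum((y - mu) ** 2) - 0.5 * y.size * np.log(2 * np.pi)


# Under a flat prior the posterior of mu is N(ybar, 1/n).
draws = rng.normal(y.mean(), 1.0 / np.sqrt(y.size), size=20000)
value, p_d = dic(normal_loglik, draws)
```

Here p_D comes out close to 1, matching the single free parameter; with latent variables the choice of what counts as "the parameter" in D(θ̄) is exactly where the variants discussed in the paper diverge.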
Estimating the integrated likelihood via posterior simulation using the harmonic mean identity
Bayesian Statistics, 2007
Cited by 13 (2 self)
The integrated likelihood (also called the marginal likelihood or the normalizing constant) is a central quantity in Bayesian model selection and model averaging. It is defined as the integral over the parameter space of the likelihood times the prior density. The Bayes factor for model comparison and Bayesian testing is a ratio of integrated likelihoods, and the model weights in Bayesian model averaging are proportional to the integrated likelihoods. We consider the estimation of the integrated likelihood from posterior simulation output, aiming at a generic method that uses only the likelihoods from the posterior simulation iterations. The key is the harmonic mean identity, which says that the reciprocal of the integrated likelihood is equal to the posterior harmonic mean of the likelihood. The simplest estimator based on the identity is thus the harmonic mean of the likelihoods. While this is an unbiased and simulation-consistent estimator, its reciprocal can have infinite variance and so it is unstable in general. We describe two methods for stabilizing the harmonic mean estimator. In the first one, the parameter space is reduced in such a way that the modified estimator involves a harmonic mean of heavier-tailed densities, thus resulting in a finite variance estimator. The resulting
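The simplest estimator named above, the harmonic mean of the likelihoods evaluated at the posterior draws, can be computed in log space to avoid overflow. This is only the basic, unstabilised estimator, so the infinite-variance problem described in the abstract still applies; the stabilised versions the paper describes are not shown, and the function name is hypothetical.

```python
import numpy as np


def log_harmonic_mean(logliks):
    """Log of the harmonic mean of exp(logliks), computed stably in log space.

    Estimates log p(y) from log-likelihoods l_t at posterior draws, using
    1 / p_hat = mean_t exp(-l_t)  =>  log p_hat = log T - logsumexp(-l).
    """
    neg = -np.asarray(logliks, dtype=float)
    m = neg.max()
    log_sum = m + np.log(np.sum(np.exp(neg - m)))  # logsumexp(-l)
    return np.log(neg.size) - log_sum
```

Working with log-likelihoods matters in practice because likelihoods of realistic data sets underflow double precision long before the estimator's variance becomes the limiting issue.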
A Constrained Semi-Supervised Learning Approach to Data Association
In European Conference on Computer Vision (ECCV), 2004
Cited by 11 (3 self)
Data association (obtaining correspondences) is a ubiquitous problem in computer vision. It appears when matching image features across multiple images, matching image features to object recognition models and matching image features to semantic concepts. In this paper, we show how a wide class of data association tasks arising in computer vision can be interpreted as a constrained semi-supervised learning problem. This interpretation opens up room for the development of new, more efficient data association methods. In particular, it leads to the formulation of a new principled probabilistic model for constrained semi-supervised learning that accounts for uncertainty in the parameters and missing data. By adopting an ingenious data augmentation strategy, it becomes possible to develop an efficient MCMC algorithm where the high-dimensional variables in the model can be sampled efficiently and directly from their posterior distributions. We demonstrate the new model and algorithm on synthetic data and the complex problem of matching image features to words in the image captions.
Iterated Importance Sampling in Missing Data Problems
2005
Cited by 11 (2 self)
Missing variable models are typical benchmarks for new computational techniques in that the ill-posed nature of missing variable models offers a challenging testing ground for these techniques. This was the case for the EM algorithm and the Gibbs sampler, and this is also true for importance sampling schemes. A population Monte Carlo scheme taking advantage of the latent structure of the problem is proposed. The potential of this approach and its specifics in missing data problems are illustrated in settings of increasing difficulty, in comparison with existing approaches. The improvement brought by a general Rao-Blackwellisation technique is also discussed.
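As a generic illustration of the population Monte Carlo idea (iterated importance sampling in which each round's proposal is adapted from the previous weighted population), the sketch below repeatedly refits a Gaussian proposal to the self-normalised weighted sample. It is not the paper's scheme, which exploits the latent structure of missing data models and Rao-Blackwellisation; the target, the function name `adaptive_is`, and its parameters are assumptions.

```python
import numpy as np


def adaptive_is(log_target, mu0, sigma0, n=5000, iters=10, seed=0):
    """Iterated importance sampling with a Gaussian proposal refitted each round."""
    rng = np.random.default_rng(seed)
    mu, sigma = mu0, sigma0
    for _ in range(iters):
        x = rng.normal(mu, sigma, size=n)          # population from current proposal
        # Proposal log-density up to a constant (cancels after normalisation).
        log_q = -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma)
        log_w = log_target(x) - log_q              # unnormalised log-weights
        w = np.exp(log_w - log_w.max())
        w /= w.sum()                               # self-normalised weights
        # Refit the proposal to the weighted population.
        mu = np.sum(w * x)
        sigma = np.sqrt(np.sum(w * (x - mu) ** 2))
    return mu, sigma


# Unnormalised N(2, 1) target, started from a deliberately poor proposal.
mu_hat, sigma_hat = adaptive_is(lambda x: -0.5 * (x - 2.0) ** 2, mu0=0.0, sigma0=3.0)
```

Only the unnormalised target density is needed, since the weights are renormalised each round; the returned mean and scale estimate those of the target.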
On the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods
Journal of Computational and Graphical Statistics, 2010
Cited by 10 (2 self)
We present a case study on the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods. Graphics cards, containing multiple Graphics Processing Units (GPUs), are self-contained parallel computational devices that can be housed in conventional desktop and laptop computers. For certain classes of Monte Carlo algorithms they offer massively parallel simulation, with the added advantage over conventional distributed multi-core processors that they are cheap, easily accessible, easy to maintain, easy to code, dedicated local devices with low power consumption. On a canonical set of stochastic simulation examples including population-based Markov chain Monte Carlo methods and Sequential Monte Carlo methods, we find speedups from 35- to 500-fold over conventional single-threaded computer code. Our findings suggest that GPUs have the potential to facilitate the growth of statistical modelling into complex data-rich domains through the availability of cheap and accessible many-core computation. We believe the speedup we observe should motivate wider

