## MCMC and the label switching problem in Bayesian mixture models (2005)

Venue: Statistical Science

Citations: 1 (0 self-citations)

### BibTeX

@ARTICLE{Jasra05mcmcand,

author = {A. Jasra and C. C. Holmes and D. A. Stephens},

title = {MCMC and the label switching problem in Bayesian mixture models},

journal = {Statistical Science},

year = {2005},

volume = {20},

pages = {50--67}

}


### Abstract

In the past ten years there has been a dramatic increase in interest in the Bayesian analysis of finite mixture models. This is primarily because of the emergence of Markov chain Monte Carlo (MCMC) methods. Whilst MCMC provides a convenient way to draw inference from complicated statistical models, there are many, perhaps under-appreciated, problems associated with the MCMC analysis of mixtures. The problems are mainly caused by the nonidentifiability of the components under symmetric priors, which leads to so-called label switching in the MCMC output. As a result, ergodic averages of component-specific quantities are identical across components and thus useless for inference. We review the solutions to the label switching problem, such as artificial identifiability constraints (e.g. Diebolt & Robert (1994)), relabelling algorithms (Stephens 1997a) and label invariant loss functions (Celeux, Hurn & Robert 2000). We also review various MCMC sampling schemes that have been suggested for mixture models and discuss posterior sensitivity to prior specification.
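The claim that label switching makes ergodic averages of component-specific quantities identical can be reproduced with a toy simulation. The sketch below is not an MCMC sampler: `simulate_switching_draws` is a hypothetical stand-in that fakes posterior draws of three component means (the values -2, 0 and 3 are invented for illustration) and randomly permutes their labels, which is what a well-mixing sampler does under an exchangeable prior. It then imposes an ordering of the means, one example of an artificial identifiability constraint; ordering by the means is an assumption here, and other constraints are possible.

```python
import random
import statistics


def simulate_switching_draws(mus, n_draws, seed=0):
    """Hypothetical stand-in for MCMC output (not a real sampler)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        draw = [m + rng.gauss(0.0, 0.1) for m in mus]  # small posterior spread
        rng.shuffle(draw)  # label switching: labels carry no information
        draws.append(draw)
    return draws


def ergodic_means(draws):
    """Average of each labelled component's mean across all draws."""
    k = len(draws[0])
    return [statistics.fmean(d[j] for d in draws) for j in range(k)]


def order_constraint(draws):
    """Artificial identifiability constraint mu_1 < mu_2 < ... < mu_k."""
    return [sorted(d) for d in draws]


draws = simulate_switching_draws([-2.0, 0.0, 3.0], n_draws=20000)
print(ergodic_means(draws))                    # each entry near 1/3
print(ergodic_means(order_constraint(draws)))  # near [-2.0, 0.0, 3.0]
```

Without the constraint every label's average collapses to the overall mean (-2 + 0 + 3) / 3 = 1/3. With components this well separated, ordering recovers them cleanly; when components overlap, the chosen constraint can distort the estimates.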

### Citations

- Dempster, Laird, et al. (1977). *Maximum likelihood from incomplete data via the EM algorithm*. (8919 citations)

- McLachlan, Peel (2000). *Finite Mixture Models*. (1074 citations)

- Robert, Casella (1999). *Monte Carlo Statistical Methods*. (1007 citations)

- Green (1995). *Reversible jump Markov chain Monte Carlo computation and Bayesian model determination*. (921 citations)

  Citation context: ...In this subsection we discuss variable dimension samplers. Following Richardson & Green (1997) the standard way to simulate from a mixture with an unknown number of components is reversible jump MCMC (Green 1995) (for an up-to-date review see Green (2003)). Reversible jump is simply an extension of the Metropolis-Hastings method, with the measure theoretic construction necessary because of the lack of common...

- Baum, Petrie, et al. (1972). *A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains*. (837 citations)

- Titterington, Smith, et al. (1985). *Statistical Analysis of Finite Mixture Distributions*. (698 citations)

- Escobar, West (1995). *Bayesian density estimation and inference using mixtures*. (445 citations)

- Baum, Petrie (1966). *Statistical Inference for Probabilistic Functions of Finite State Markov Chains*. (391 citations)

- Liu (2001). *Monte Carlo Strategies*. (192 citations)

- Diebolt, Robert (1994). *Estimation of finite mixture distributions through Bayesian sampling*. (163 citations)

- Stephens (2000). *Dealing with label-switching in mixture models*. (123 citations)

- Celeux et al. (2000). *Computational and inferential difficulties with mixture posterior distributions*. (116 citations)

  Citation context: ...the Gibbs sampler cannot always visit the k! symmetric modes of a posterior mixture distribution easily. We note that: From a statistical viewpoint, exploration of the k! modal regions is redundant (Celeux et al. 2000). Indeed if we wish to explore all of the k! symmetric modes we could randomly permute the output from the sampler. That is, a si...
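The random permutation of sampler output mentioned in the citation context above can be sketched as follows. This assumes each stored draw is a list of per-component parameter tuples (weight, mean, standard deviation); the function name `random_permute_output` and the toy `stuck` chain are hypothetical. Each draw receives an independent, uniformly random relabelling. Because a component's parameters move together, the mixture likelihood and a symmetric prior are unchanged.

```python
import random


def random_permute_output(draws, seed=0):
    """Give each draw an independent, uniformly random relabelling.

    Each draw is a list of per-component tuples (weight, mean, sd); the
    tuples move as units, so each component's parameters stay together.
    """
    rng = random.Random(seed)
    permuted = []
    for draw in draws:
        order = list(range(len(draw)))
        rng.shuffle(order)  # uniform over the k! orderings
        permuted.append([draw[i] for i in order])
    return permuted


# Hypothetical chain stuck in one labelling: the mean -2 component is always label 0.
stuck = [[(0.3, -2.0, 1.0), (0.7, 3.0, 1.0)]] * 10000
mixed = random_permute_output(stuck)
frac = sum(d[0][1] == -2.0 for d in mixed) / len(mixed)
print(frac)  # roughly 0.5
```

A chain that never left one labelling ends up assigning the mean -2 component to label 0 in about half of the draws. This visits all k! symmetric modes without any further sampling, which is why exploring them is redundant for label-invariant inference.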

- Lindley (1957). *A statistical paradox*. (76 citations)

  Citation context: ...mixtures (see Gruet, Philippe & Robert (1999) for an example of improper priors in the mixture context). A problem with the above prior, when k is unknown, arises due to the Lindley-Bartlett paradox (Lindley 1957, Bartlett 1957). Jennison (1997) noted that, in the limit as κ → 0 and β → ∞, the posterior distribution for k favours models with fewer components. We illustrate this phenomenon in section 7. Quanti...
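The Lindley-Bartlett effect cited above can be seen in a much simpler setting than the mixture prior it refers to. The sketch below swaps in the textbook normal-mean test, so it is an illustration of the paradox rather than the paper's model. It assumes data x_i ~ N(mu, 1), compares H0: mu = 0 with H1: mu ~ N(0, tau^2), and uses the sample mean, which has variance 1/n. As the prior under the larger model becomes more diffuse (tau grows), the Bayes factor swings towards the simpler model, even for data with z = 2.5, which a two-sided 5% test would reject.

```python
import math


def normal_pdf(x, var):
    """Density of N(0, var) evaluated at x."""
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def bayes_factor_01(xbar, n, tau):
    """Bayes factor of H0: mu = 0 against H1: mu ~ N(0, tau^2).

    Under H0 the sample mean is N(0, 1/n); under H1, marginalising mu
    gives N(0, 1/n + tau^2).
    """
    v0 = 1.0 / n
    return normal_pdf(xbar, v0) / normal_pdf(xbar, v0 + tau * tau)


n = 100
xbar = 2.5 / math.sqrt(n)  # z = 2.5, "significant" at the 5% level
for tau in (0.1, 1.0, 10.0, 100.0, 1000.0):
    print(tau, bayes_factor_01(xbar, n, tau))
```

For large tau the Bayes factor grows roughly in proportion to tau, so a sufficiently vague prior makes the simpler hypothesis look arbitrarily well supported. This is the same qualitative behaviour as the posterior for k favouring fewer components in the limit described in the citation context.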

- Neal (1996). *Sampling from Multimodal Distributions Using Tempered Transitions*. (75 citations)

  Citation context: ...modal distributions. We emphasise that we can simulate from a mixture posterior using Metropolis-Hastings updates without completion (simulation of the missing class labels), and that tempering MCMC (Neal 1996) may be used. We also consider reparameterisations, as discussed by Celeux et al. (2000), and variable dimension samplers. Next, we examine the existing solutions to the label switching problem. We b...

- Stephens (2000). *Bayesian analysis of mixture models with an unknown number of components – an alternative to reversible jump methods*. (69 citations)

- Roeder (1990). *Density estimation with confidence sets exemplified by superclusters and voids in the galaxies*. (65 citations)

- Rao (1948). *The utilization of multiple measurements in problems of biological classification*. (61 citations)

- Green, Richardson (2002). *Hidden Markov models and disease mapping*. (60 citations)

  Citation context: ...as biological sequence analysis (Boys & Henderson 2004), econometrics (Frühwirth-Schnatter 2001), (Hurn, Justel & Robert 2003), machine learning (Beal, Ghahramani & Rasmussen 2002) and epidemiology (Green & Richardson 2002). One of the main challenges of a Bayesian analysis using mixtures is the nonidentifiability of the components. That is, if exchangeable priors are placed upon the parameters of a mixture model, then...

- Robert, Rydén, et al. (2000). *Bayesian Inference in Hidden Markov Models through the Reversible Jump Markov Chain Monte Carlo Method*. (55 citations)

- Stephens (1997). *Bayesian methods for mixtures of normal distributions*. DPhil thesis. (54 citations)

  Citation context: ...dentical and thus useless for inference. We review the solutions to the label switching problem, such as artificial identifiability constraints (e.g. Diebolt & Robert (1994)), relabelling algorithms (Stephens 1997a) and label invariant loss functions (Celeux, Hurn & Robert 2000). We also review various MCMC sampling schemes that have been suggested for mixture models and discuss posterior sensitivity to prior...
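A relabelling algorithm post-processes sampler output so that labels agree across draws. The sketch below is a simplified variant and not the algorithm of Stephens (1997a). It stores only component means, measures agreement by squared Euclidean distance to a reference vector, and uses the first draw as the initial reference (an assumption). It then alternates between choosing, for each draw, the best of the k! label permutations and re-estimating the reference as the average of the relabelled draws. The names `relabel` and `cost` are hypothetical.

```python
import random
from itertools import permutations


def cost(draw, perm, ref):
    """Squared distance between the relabelled draw and the reference."""
    return sum((draw[perm[j]] - ref[j]) ** 2 for j in range(len(ref)))


def relabel(draws, max_iter=20):
    """Relabel draws (lists of component means) to agree with a running reference."""
    k = len(draws[0])
    perms = list(permutations(range(k)))  # brute force over all k! relabellings
    ref = list(draws[0])                  # assumption: first draw as initial pivot
    chosen = None
    relabelled = [list(d) for d in draws]
    for _ in range(max_iter):
        # Step 1: for each draw, pick the permutation closest to the reference.
        new_chosen = [min(perms, key=lambda p: cost(d, p, ref)) for d in draws]
        relabelled = [[d[p[j]] for j in range(k)] for d, p in zip(draws, new_chosen)]
        # Step 2: re-estimate the reference from the relabelled draws.
        ref = [sum(r[j] for r in relabelled) / len(relabelled) for j in range(k)]
        if new_chosen == chosen:
            break  # labels stopped changing
        chosen = new_chosen
    return relabelled


# Toy output with switched labels (invented values, not from the paper).
rng = random.Random(1)
draws = []
for _ in range(500):
    d = [m + rng.gauss(0.0, 0.1) for m in (-2.0, 0.0, 3.0)]
    rng.shuffle(d)
    draws.append(d)
out = relabel(draws)
means = [sum(r[j] for r in out) / len(out) for j in range(3)]
print(means)
```

Unlike the ordering constraint, the labels that come out follow whatever order the first draw had, so only the set of means is meaningful. Brute-force search over k! permutations is practical only for small k. With this squared-distance cost, the per-draw step is a linear assignment problem and could be solved more efficiently for larger k.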

- Fearnhead (2006). *Exact and efficient Bayesian inference for multiple changepoint problems*. (49 citations)

- Liang, Wong (2001). *Real-Parameter Evolutionary Monte Carlo With Applications to Bayesian Mixture Models*. (46 citations)

  Citation context: ...We will now introduce this method and apply it to the simulated data set of the previous section. We note that more advanced methods exist, for example Population or Evolutionary Monte Carlo (EMC) (Liang & Wong 2001). We shall not review these methods here, other than to note that population based MCMC works by embedding the target distribution of interest into a sequence of related distributions and sampling fr...

- Marin, Mengersen, et al. (2005). *Bayesian modelling and inference on mixtures of distributions*. In *Bayesian Thinking*. (46 citations)

- Frühwirth-Schnatter (2001). *Markov Chain Monte Carlo Estimation of Classical and Dynamic Switching and Mixture Models*. (41 citations)

  Citation context: ...to the above developments, implementation of Bayesian mixtures has become increasingly popular in many academic disciplines, such as biological sequence analysis (Boys & Henderson 2004), econometrics (Frühwirth-Schnatter 2001), (Hurn, Justel & Robert 2003), machine learning (Beal, Ghahramani & Rasmussen 2002) and epidemiology (Green & Richardson 2002). One of the main challenges of a Bayesian analysis using mixtures is th...

- Hurn, Justel, et al. (2003). *Estimating mixtures of regressions*. (37 citations)

- Mengersen, Robert (1994). *Testing for mixtures: a Bayesian entropic approach*. In *Bayesian Statistics 5*. (35 citations)

- Newcomb (1886). *A Generalized Theory of the Combination of Observations so as to Obtain the Best Result*. (29 citations)

- Cappé, Robert, et al. (2003). *Reversible jump, birth-and-death and more general continuous time Markov chain Monte Carlo samplers*. (23 citations)

  Citation context: ...The main differences appear that, firstly, the continuous time sampler can visit unlikely regions in the support of the posterior, thus yielding a sort of springboard between different modal regions (Cappé et al. 2003). Secondly that for the continuous time sampler there is a ‘free’ Rao-Blackwellisation to reduce the variance of the MC estimates of integrals. We have found that, in practice, these latter differenc...

- Boys, Henderson (2004). *A Bayesian approach to DNA sequence segmentation*. (22 citations)

  Citation context: ...s (2000a) (Birth-and-Death MCMC). Due to the above developments, implementation of Bayesian mixtures has become increasingly popular in many academic disciplines, such as biological sequence analysis (Boys & Henderson 2004), econometrics (Frühwirth-Schnatter 2001), (Hurn, Justel & Robert 2003), machine learning (Beal, Ghahramani & Rasmussen 2002) and epidemiology (Green & Richardson 2002). One of the main challenges of...

- Bernardo, Giron (1988). *A Bayesian analysis of simple mixture problems*. (19 citations)

- Celeux (1998). *Bayesian inference for mixture: The label switching problem*. (19 citations)

- Cappé, Robert, et al. (2001). *Reversible jump MCMC converging to birth-and-death MCMC and more general continuous time samplers*. (13 citations)

- Richardson, Green (1997). *On Bayesian Analysis of Mixture Models with Unknown Number of*. (10 citations)

- Aitkin (2001). *Likelihood and Bayesian analysis of mixtures*. (8 citations)

- Green (2003). *Trans-dimensional Markov chain Monte*. (8 citations)

- Gruet, Philippe, et al. (1999). *MCMC control spreadsheets for exponential mixture estimation*. (7 citations)

- Casella, Mengersen, et al. (2002). *Perfect samplers for mixtures of distributions*. (6 citations)

- Bartlett (1957). *A comment on D*. (5 citations)

  Citation context: ...Gruet, Philippe & Robert (1999) for an example of improper priors in the mixture context). A problem with the above prior, when k is unknown, arises due to the Lindley-Bartlett paradox (Lindley 1957, Bartlett 1957). Jennison (1997) noted that, in the limit as κ → 0 and β → ∞, the posterior distribution for k favours models with fewer components. We illustrate this phenomenon in section 7. Quantities in which w...

- Ciuperca, Ridolfi, et al. (2003). *Penalized maximum likelihood estimator for normal mixtures*. (5 citations)

- Postman, Huchra, et al. (1986). *Probes of large scale structures in the Corona Borealis region*. (4 citations)

- Jennison (1997). *Discussion of “On Bayesian analysis of mixtures with an unknown number of components,” by*. (3 citations)

- Boys, Henderson (2003). *Data augmentation and marginal updating schemes for inference in hidden Markov models*. (2 citations)

- Dellaportas, Stephens, et al. (1996). *A comparative study of perinatal mortality using a two-component mixture model*. (2 citations)


- Celeux (1997). *Discussion of ‘On Bayesian analysis of mixture models with an unknown number of components’*. (1 citation)

- Jasra, Holmes, et al. (2004). *On population based reversible jump Markov chain Monte Carlo*. (1 citation)

- Robert (1997). *Discussion of ‘On Bayesian analysis of mixture models with an unknown number of components’*. (1 citation)

- West (1997). *Discussion of ‘On Bayesian analysis of mixture models with an unknown number of components’*. (1 citation)