Results 1  10
of
49
Markov Chain Monte Carlo methods and the label switching problem in Bayesian mixture modelling
 Statistical Science
"... Abstract. In the past ten years there has been a dramatic increase of interest in the Bayesian analysis of finite mixture models. This is primarily because of the emergence of Markov chain Monte Carlo (MCMC) methods. While MCMC provides a convenient way to draw inference from complicated statistical ..."
Abstract

Cited by 51 (4 self)
 Add to MetaCart
Abstract. In the past ten years there has been a dramatic increase of interest in the Bayesian analysis of finite mixture models. This is primarily because of the emergence of Markov chain Monte Carlo (MCMC) methods. While MCMC provides a convenient way to draw inference from complicated statistical models, there are many, perhaps underappreciated, problems associated with the MCMC analysis of mixtures. The problems are mainly caused by the nonidentifiability of the components under symmetric priors, which leads to socalled label switching in the MCMC output. This means that ergodic averages of component specific quantities will be identical and thus useless for inference. We review the solutions to the label switching problem, such as artificial identifiability constraints, relabelling algorithms and label invariant loss functions. We also review various MCMC sampling schemes that have been suggested for mixture models and discuss posterior sensitivity to prior specification.
An efficient Markov chain Monte Carlo method for distributions with intractable normalising constants
 Biometrika
, 2006
"... Maximum likelihood parameter estimation and sampling from Bayesian posterior distributions are problematic when the probability density for the parameter of interest involves an intractable normalising constant which is also a function of that parameter. In this paper, an auxiliary variable method i ..."
Abstract

Cited by 49 (2 self)
 Add to MetaCart
Maximum likelihood parameter estimation and sampling from Bayesian posterior distributions are problematic when the probability density for the parameter of interest involves an intractable normalising constant which is also a function of that parameter. In this paper, an auxiliary variable method is presented which requires only that independent samples can be drawn from the unnormalised density at any particular parameter value. The proposal distribution is constructed so that the normalising constant cancels from the Metropolis–Hastings ratio. The method is illustrated by producing posterior samples for parameters of the Ising model given a particular lattice realisation.
Modelling spatially correlated data via mixtures: a Bayesian approach
 Journal of the Royal Statistical Society, Series B
, 2002
"... This paper develops mixture models for spatially indexed data. We confine attention to the case of finite, typically irregular, patterns of points or regions with prescribed spatial relationships, and to problems where it is only the weights in the mixture that vary from one location to another. Our ..."
Abstract

Cited by 31 (2 self)
 Add to MetaCart
This paper develops mixture models for spatially indexed data. We confine attention to the case of finite, typically irregular, patterns of points or regions with prescribed spatial relationships, and to problems where it is only the weights in the mixture that vary from one location to another. Our specific focus is on Poisson distributed data, and applications in disease mapping. We work in a Bayesian framework, with the Poisson parameters drawn from gamma priors, and an unknown number of components. We propose two alternative models for spatiallydependent weights, based on transformations of autoregressive gaussian processes: in one (the Logistic normal model), the mixture component labels are exchangeable, in the other (the Grouped continuous model), they are ordered. Reversible jump Markov chain Monte Carlo algorithms for posterior inference are developed. Finally, the performance of both of these formulations is examined on synthetic data and real data on mortality from rare disease.
Stochastic Approximation in Monte Carlo Computation
, 2006
"... The WangLandau algorithm is an adaptive Markov chain Monte Carlo algorithm to calculate the spectral density for a physical system. A remarkable feature of the algorithm is that it is not trapped by local energy minima, which is very important for systems with rugged energy landscapes. This feature ..."
Abstract

Cited by 23 (13 self)
 Add to MetaCart
The WangLandau algorithm is an adaptive Markov chain Monte Carlo algorithm to calculate the spectral density for a physical system. A remarkable feature of the algorithm is that it is not trapped by local energy minima, which is very important for systems with rugged energy landscapes. This feature has led to many successful applications of the algorithm in statistical physics and biophysics. However, there does not exist rigorous theory to support its convergence, and the estimates produced by the algorithm can only reach a limited statistical accuracy. In this paper, we propose the stochastic approximation Monte Carlo (SAMC) algorithm, which overcomes the shortcomings of the WangLandau algorithm. We establish a theorem concerning its convergence. The estimates produced by SAMC can be improved continuously as the simulation goes on. SAMC also extends applications of the WangLandau algorithm to continuum systems. The potential uses of SAMC in statistics are discussed through two classes of applications, importance sampling and model selection. The results show that SAMC can work as a general importance
Computational Methods for Multiplicative Intensity Models using Weighted Gamma . . .
 PROCESSES: PROPORTIONAL HAZARDS, MARKED POINT PROCESSES AND PANEL COUNT DATA
, 2004
"... We develop computational procedures for a class of Bayesian nonparametric and semiparametric multiplicative intensity models incorporating kernel mixtures of spatial weighted gamma measures. A key feature of our approach is that explicit expressions for posterior distributions of these models share ..."
Abstract

Cited by 16 (4 self)
 Add to MetaCart
We develop computational procedures for a class of Bayesian nonparametric and semiparametric multiplicative intensity models incorporating kernel mixtures of spatial weighted gamma measures. A key feature of our approach is that explicit expressions for posterior distributions of these models share many common structural features with the posterior distributions of Bayesian hierarchical models using the Dirichlet process. Using this fact, along with an approximation for the weighted gamma process, we show that with some care, one can adapt efficient algorithms used for the Dirichlet process to this setting. We discuss blocked Gibbs sampling procedures and Pólya urn Gibbs samplers. We illustrate our methods with applications to proportional hazard models, Poisson spatial regression models, recurrent events, and panel count data.
Bayesian random fields: The BetheLaplace approximation
 In ICML
, 2006
"... While learning the maximum likelihood value of parameters of an undirected graphical model is hard, modelling the posterior distribution over parameters given data is harder. Yet, undirected models are ubiquitous in computer vision and text modelling (e.g. conditional random fields). But where Bayes ..."
Abstract

Cited by 12 (5 self)
 Add to MetaCart
While learning the maximum likelihood value of parameters of an undirected graphical model is hard, modelling the posterior distribution over parameters given data is harder. Yet, undirected models are ubiquitous in computer vision and text modelling (e.g. conditional random fields). But where Bayesian approaches for directed models have been very successful, a proper Bayesian treatment of undirected models in still in its infant stages. We propose a new method for approximating the posterior of the parameters given data based on the Laplace approximation. This approximation requires the computation of the covariance matrix over features which we compute using the linear response approximation based in turn on loopy belief propagation. We develop the theory for conditional and “unconditional ” random fields with or without hidden variables. In the conditional setting we introduce a new variant of bagging suitable for structured domains. Here we run the loopy maxproduct algorithm on a “supergraph ” composed of graphs for individual models sampled from the posterior and connected by constraints. Experiments on real world data validate the proposed methods. 1
Variational Approximations in Bayesian Model Selection for Finite Mixture Distributions
 COMPUTATIONAL STATISTICS AND DATA ANALYSIS
, 2006
"... Variational methods for model comparison have become popular in the neural computing/machine learning literature. In this paper we explore their application to the Bayesian analysis of mixtures of Gaussians. We also consider how the Deviance Information Criterion, or DIC, devised by Spiegelhalter e ..."
Abstract

Cited by 12 (3 self)
 Add to MetaCart
Variational methods for model comparison have become popular in the neural computing/machine learning literature. In this paper we explore their application to the Bayesian analysis of mixtures of Gaussians. We also consider how the Deviance Information Criterion, or DIC, devised by Spiegelhalter et al. (2002), can be extended to these types of model by exploiting the use of variational approximations. We illustrate the results of using variational methods for model selection and the calculation of a DIC using real and simulated data. Using the variational approximation, one can simultaneously estimate component parameters and the model complexity. It turns out that, if one starts o# with a large number of components, superfluous components are eliminated as the method converges to a solution, thereby leading to an automatic choice of model complexity, the appropriateness of which is reflected in the DIC values.
Interpretation and inference in mixture models: Simple MCMC works
 Journal of Econometrics
, 2007
"... The mixture model likelihood function is invariant with respect to permutation of the components of the mixture. If functions of interest are permutation sensitive, as in classification applications, then interpretation of the likelihood function requires valid inequality constraints and a very larg ..."
Abstract

Cited by 8 (0 self)
 Add to MetaCart
The mixture model likelihood function is invariant with respect to permutation of the components of the mixture. If functions of interest are permutation sensitive, as in classification applications, then interpretation of the likelihood function requires valid inequality constraints and a very large sample may be required to resolve ambiguities. If functions of interest are permutation invariant, as in prediction applications, then there are no such problems of interpretation. Contrary to assessments in some recent publications, simple and widely used Markov chain Monte Carlo (MCMC) algorithms with data augmentation reliably recover the entire posterior distribution. 1 1
Hot Coupling: A Particle Approach to Inference and Normalization on Pairwise
 ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS
, 2005
"... This paper presents a new sampling algorithm for approximating functions of variables representable as undirected graphical models of arbitrary connectivity with pairwise potentials, as well as for estimating the notoriously difficult partition function of the graph. The algorithm fits into the ..."
Abstract

Cited by 8 (4 self)
 Add to MetaCart
This paper presents a new sampling algorithm for approximating functions of variables representable as undirected graphical models of arbitrary connectivity with pairwise potentials, as well as for estimating the notoriously difficult partition function of the graph. The algorithm fits into the framework of sequential Monte Carlo methods rather than the more widely used MCMC, and relies on constructing a sequence of intermediate distributions which get closer to the desired one. While the idea of using "tempered" proposals is known, we construct a novel sequence of target distributions where, rather than dropping a global temperature parameter, we sequentially couple individual pairs of variables that are, initially, sampled exactly from a spanning tree of the variables. We present
Hybrid Dirichlet mixture models for functional data
"... Summary. In functional data analysis, curves or surfaces are observed, up to measurement error, at a finite set of locations, for, say, a sample of n individuals. Often, the curves are homogeneous, except perhaps for individualspecific regions that provide heterogeneous behaviour (e.g. ‘damaged ’ a ..."
Abstract

Cited by 8 (0 self)
 Add to MetaCart
Summary. In functional data analysis, curves or surfaces are observed, up to measurement error, at a finite set of locations, for, say, a sample of n individuals. Often, the curves are homogeneous, except perhaps for individualspecific regions that provide heterogeneous behaviour (e.g. ‘damaged ’ areas of irregular shape on an otherwise smooth surface). Motivated by applications with functional data of this nature, we propose a Bayesian mixture model, with the aim of dimension reduction, by representing the sample of n curves through a smaller set of canonical curves. We propose a novel prior on the space of probability measures for a random curve which extends the popular Dirichlet priors by allowing local clustering: nonhomogeneous portions of a curve can be allocated to different clusters and the n individual curves can be represented as recombinations (hybrids) of a few canonical curves. More precisely, the prior proposed envisions a conceptual hidden factor with klevels that acts locally on each curve. We discuss several models incorporating this prior and illustrate its performance with simulated and real data sets. We examine theoretical properties of the proposed finite hybrid Dirichlet mixtures, specifically, their behaviour as the number of the mixture components goes to 1 and their connection with Dirichlet process mixtures.