Results 1–10 of 233
Sequential Monte Carlo Samplers
, 2002
Cited by 303 (44 self)
In this paper, we propose a general algorithm to sample sequentially from a sequence of probability distributions known up to a normalizing constant and defined on a common space. A sequence of increasingly large artificial joint distributions is built; each of these distributions admits a marginal which is a distribution of interest. To sample from these distributions, we use sequential Monte Carlo methods. We show that these methods can be interpreted as interacting particle approximations of a nonlinear Feynman-Kac flow in distribution space. One interpretation of the Feynman-Kac flow corresponds to a nonlinear Markov kernel admitting a specified invariant distribution and is a natural nonlinear extension of the standard Metropolis-Hastings algorithm. Many theoretical results have already been established for such flows and their particle approximations. We demonstrate the use of these algorithms through simulation.
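The tempering idea behind this abstract can be illustrated with a toy sketch (not the paper's general framework): a particle population is moved from a tractable N(0,1) through a geometric bridge to a second unnormalised Gaussian, with resampling and random-walk Metropolis moves at each temperature. Both densities, the schedule, and all tuning constants below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_f0(x): return -0.5 * x**2                      # unnormalised N(0, 1)
def log_f1(x): return -0.5 * ((x - 3.0) / 0.5) ** 2    # unnormalised N(3, 0.25)
def log_f(x, b): return (1 - b) * log_f0(x) + b * log_f1(x)

N = 2000
betas = np.linspace(0.0, 1.0, 50)
x = rng.normal(size=N)                       # exact draws from p_0
log_Z = 0.0                                  # accumulates log(Z_1 / Z_0)
for b_prev, b in zip(betas[:-1], betas[1:]):
    logw = log_f(x, b) - log_f(x, b_prev)    # incremental importance weights
    log_Z += np.log(np.mean(np.exp(logw)))
    p = np.exp(logw - logw.max())
    x = x[rng.choice(N, size=N, p=p / p.sum())]   # multinomial resampling
    for _ in range(3):                            # Metropolis moves leaving f_b invariant
        prop = x + 0.5 * rng.normal(size=N)
        acc = np.log(rng.random(size=N)) < log_f(prop, b) - log_f(x, b)
        x = np.where(acc, prop, x)

Z_ratio = float(np.exp(log_Z))               # true ratio is 0.5 for this pair
```

Averaging the incremental weights at each temperature gives an estimate of the ratio of normalising constants, which for this Gaussian pair is 0.5.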
Annealed importance sampling
 In Statistics and Computing
, 2001
Cited by 262 (5 self)
Simulated annealing — moving from a tractable distribution to a distribution of interest via a sequence of intermediate distributions — has traditionally been used as an inexact method of handling isolated modes in Markov chain samplers. Here, it is shown how one can use the Markov chain transitions for such an annealing sequence to define an importance sampler. The Markov chain aspect allows this method to perform acceptably even for high-dimensional problems, where finding good importance sampling distributions would otherwise be very difficult, while the use of importance weights ensures that the estimates found converge to the correct values as the number of annealing runs increases. This annealed importance sampling procedure resembles the second half of the previously studied tempered transitions, and can be seen as a generalization of a recently proposed variant of sequential importance sampling. It is also related to thermodynamic integration methods for estimating ratios of normalizing constants. Annealed importance sampling is most attractive when isolated modes are present, or when estimates of normalizing constants are required, but it may also be more generally useful, since its independent sampling allows one to bypass some of the problems of assessing convergence and autocorrelation in Markov chain samplers.
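A minimal single-run version of the procedure described above can be sketched as follows; the two unnormalised Gaussians, the geometric bridge, and the schedule and step sizes are all toy assumptions, and averaging the weights over independent runs estimates a ratio of normalising constants (0.5 here):

```python
import numpy as np

rng = np.random.default_rng(1)

def log_f0(x): return -0.5 * x**2                      # unnormalised N(0, 1)
def log_f1(x): return -0.5 * ((x - 3.0) / 0.5) ** 2    # unnormalised N(3, 0.25)
def log_f(x, b): return (1 - b) * log_f0(x) + b * log_f1(x)

def ais_run(n_temps=200, n_mh=5, step=0.5):
    """One annealing run: returns the log importance weight."""
    x = rng.normal()                         # exact draw from p_0
    log_w = 0.0
    betas = np.linspace(0.0, 1.0, n_temps)
    for b_prev, b in zip(betas[:-1], betas[1:]):
        log_w += log_f(x, b) - log_f(x, b_prev)
        for _ in range(n_mh):                # Metropolis moves leaving f_b invariant
            prop = x + step * rng.normal()
            if np.log(rng.random()) < log_f(prop, b) - log_f(x, b):
                x = prop
    return log_w

log_ws = np.array([ais_run() for _ in range(300)])
m = log_ws.max()
ratio = float(np.exp(m) * np.mean(np.exp(log_ws - m)))   # estimate of Z_1/Z_0
```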
A Split-Merge Markov Chain Monte Carlo Procedure for the Dirichlet Process Mixture Model
 Journal of Computational and Graphical Statistics
, 2000
Cited by 150 (0 self)
We propose a split-merge Markov chain algorithm to address the problem of inefficient sampling for conjugate Dirichlet process mixture models. Traditional Markov chain Monte Carlo methods for Bayesian mixture models, such as Gibbs sampling, can become trapped in isolated modes corresponding to an inappropriate clustering of data points. This article describes a Metropolis-Hastings procedure that can escape such local modes by splitting or merging mixture components. Our Metropolis-Hastings algorithm employs a new technique in which an appropriate proposal for splitting or merging components is obtained by using a restricted Gibbs sampling scan. We demonstrate empirically that our method outperforms the Gibbs sampler in situations where two or more components are similar in structure. Key words: Dirichlet process mixture model, Markov chain Monte Carlo, Metropolis-Hastings algorithm, Gibbs sampler, split-merge updates.
Computing Bayes factors using thermodynamic integration
 Syst Biol
Cited by 112 (7 self)
In the Bayesian paradigm, a common method for comparing two models is to compute the Bayes factor, defined as the ratio of their respective marginal likelihoods. In recent phylogenetic works, the numerical evaluation of marginal likelihoods has often been performed using the harmonic mean estimation procedure. In the present article, we propose to employ another method, based on an analogy with statistical physics, called thermodynamic integration. We describe the method, propose an implementation, and show on two analytical examples that this numerical method yields reliable estimates. In contrast, the harmonic mean estimator leads to a strong overestimation of the marginal likelihood, which is all the more pronounced as the model is higher dimensional. As a result, the harmonic mean estimator systematically favors more parameter-rich models, an artefact that might explain some recent puzzling observations, based on harmonic mean estimates, suggesting that Bayes factors tend to overscore complex models. Finally, we apply our method to the comparison of several alternative models of amino-acid replacement. We confirm our previous observations, indicating that modeling pattern heterogeneity across sites tends to yield better models than standard empirical matrices. [Bayes factor; harmonic mean; mixture model; path sampling; phylogeny; thermodynamic integration.]
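The path-sampling identity underlying thermodynamic integration, d/dβ log Z_β = E_β[log f₁ − log f₀] along a geometric path, can be checked on a toy pair of unnormalised Gaussians. Here the geometric path itself stays Gaussian, so each expectation is estimated from exact draws rather than MCMC — an assumption made purely to keep the sketch short; the densities and grid are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

def log_f0(x): return -0.5 * x**2                      # unnormalised N(0, 1)
def log_f1(x): return -0.5 * ((x - 3.0) / 0.5) ** 2    # unnormalised N(3, 0.25)

def sample_path(beta, n=4000):
    """Exact draws from the geometric path f0^(1-b) f1^b, which stays Gaussian
    here: precision tau = (1 - b) + 4b, mean = 12b / tau."""
    tau = 1.0 + 3.0 * beta
    return rng.normal(12.0 * beta / tau, 1.0 / np.sqrt(tau), size=n)

betas = np.linspace(0.0, 1.0, 41)
# Path-sampling identity:  d/db log Z_b = E_b[ log f1 - log f0 ]
u = np.array([np.mean(log_f1(s) - log_f0(s)) for s in map(sample_path, betas)])
# trapezoidal rule over the temperature grid
log_ratio = float(np.sum(0.5 * (u[1:] + u[:-1]) * np.diff(betas)))
```

For this pair the true value is log(Z₁/Z₀) = log(0.5) ≈ −0.693.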
Hidden Markov models and disease mapping
 Journal of the American Statistical Association
, 2001
Cited by 93 (7 self)
We present new methodology to extend Hidden Markov models to the spatial domain, and use this class of models to analyse spatial heterogeneity of count data on a rare phenomenon. This situation occurs commonly in many domains of application, particularly in disease mapping. We assume that the counts follow a Poisson model at the lowest level of the hierarchy, and introduce a finite mixture model for the Poisson rates at the next level. The novelty lies in the model for allocation to the mixture components, which follows a spatially correlated process, the Potts model, and in treating the number of components of the spatial mixture as unknown. Inference is performed in a Bayesian framework using reversible jump MCMC. The model introduced can be viewed as a Bayesian semiparametric approach to specifying flexible spatial distribution in hierarchical models. Performance of the model and comparison with an alternative well-known Markov random field specification for the Poisson rates are demonstrated on synthetic data sets. We show that our allocation model avoids the problem of oversmoothing in cases where the underlying rates exhibit discontinuities, while giving equally good results in cases of smooth gradient-like or highly autocorrelated rates. The methodology is illustrated on an epidemiological application to data on a rare cancer in France.
An efficient Markov chain Monte Carlo method for distributions with intractable normalising constants
 Biometrika
, 2006
Cited by 89 (4 self)
Maximum likelihood parameter estimation and sampling from Bayesian posterior distributions are problematic when the probability density for the parameter of interest involves an intractable normalising constant which is also a function of that parameter. In this paper, an auxiliary variable method is presented which requires only that independent samples can be drawn from the unnormalised density at any particular parameter value. The proposal distribution is constructed so that the normalising constant cancels from the Metropolis–Hastings ratio. The method is illustrated by producing posterior samples for parameters of the Ising model given a particular lattice realisation.
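This auxiliary-variable construction is closely related to the exchange algorithm of Murray, Ghahramani and MacKay (2006); the sketch below implements that variant, not the paper's exact scheme, for a deliberately tractable toy model: an exponential likelihood whose normalising constant Z(t) = 1/t is in fact known, so the conjugate Gamma posterior serves as a check. The prior, step size and run lengths are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy "doubly intractable" model: unnormalised likelihood q(y|t) = exp(-t*y), y > 0.
def log_q(y, t):
    return float(np.sum(-t * y))             # unnormalised log-likelihood

y = rng.exponential(1.0 / 2.0, size=50)      # data simulated with rate t = 2

def log_prior(t):                            # Gamma(2, 1) prior on the rate t
    return np.log(t) - t

t, chain = 1.0, []
for _ in range(20000):
    t_prop = t + 0.3 * rng.normal()
    if t_prop > 0:
        # Exact auxiliary draw from the *normalised* model at t_prop;
        # the normalising constants cancel from the acceptance ratio below.
        w = rng.exponential(1.0 / t_prop, size=len(y))
        log_a = (log_q(y, t_prop) + log_prior(t_prop) + log_q(w, t)
                 - log_q(y, t) - log_prior(t) - log_q(w, t_prop))
        if np.log(rng.random()) < log_a:
            t = t_prop
    chain.append(t)

post_mean = float(np.mean(chain[5000:]))
# Conjugate check: the exact posterior is Gamma(2 + 50, 1 + sum(y))
exact_mean = (2 + len(y)) / (1 + float(y.sum()))
```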
Auxiliary Variable Methods for Markov Chain Monte Carlo with Applications
 Journal of the American Statistical Association
, 1997
Cited by 84 (1 self)
Suppose one wishes to sample from the density π(x) using Markov chain Monte Carlo (MCMC). An auxiliary variable u and its conditional distribution π(u|x) can be defined, giving the joint distribution π(x, u) = π(x)π(u|x). An MCMC scheme which samples over this joint distribution can lead to substantial gains in efficiency compared to standard approaches. The revolutionary algorithm of Swendsen and Wang (1987) is one such example. In addition to reviewing the Swendsen-Wang algorithm and its generalizations, this paper introduces a new auxiliary variable method called partial decoupling. Two applications in Bayesian image analysis are considered. The first is a binary classification problem in which partial decoupling outperforms SW and single site Metropolis. The second is a PET reconstruction which uses the gray level prior of Geman and McClure (1987). A generalized Swendsen-Wang algorithm is developed for this problem, which reduces the computing time to the point that MCMC is a viabl...
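Another canonical auxiliary-variable scheme (not one of the algorithms studied in this paper) is the slice sampler: with u | x ~ Uniform(0, π(x)) the joint is uniform under the density curve, and both Gibbs conditionals are exact uniforms. For a Laplace target the slice is an explicit interval, so no Metropolis correction is needed; the target and run length are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(4)

# Target pi(x) proportional to exp(-|x|) (Laplace). The slice
# {x : exp(-|x|) > u} is just the interval (log u, -log u).
x, xs = 0.0, []
for _ in range(50000):
    u = rng.uniform(0.0, np.exp(-abs(x)))   # vertical step: u | x
    half = -np.log(u)                        # slice half-width
    x = rng.uniform(-half, half)             # horizontal step: x | u
    xs.append(x)
xs = np.array(xs)
# Laplace(0, 1) has mean 0 and variance 2
```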
Inference in Curved Exponential Family Models for Networks
 Journal of Computational and Graphical Statistics
, 2006
Cited by 80 (11 self)
Network data arise in a wide variety of applications. Although descriptive statistics for networks abound in the literature, the science of fitting statistical models to complex network data is still in its infancy. The models considered in this article are based on exponential families; therefore, we refer to them as exponential random graph models (ERGMs). Although ERGMs are easy to postulate, maximum likelihood estimation of parameters in these models is very difficult. In this article, we first review the method of maximum likelihood estimation using Markov chain Monte Carlo in the context of fitting linear ERGMs. We then extend this methodology to the situation where the model comes from a curved exponential family. The curved exponential family methodology is applied to new specifications of ERGMs, proposed by Snijders et al. (2004), having nonlinear parameters to represent structural properties of networks such as transitivity and heterogeneity of degrees. We review the difficult topic of implementing likelihood ratio tests for these models, then apply all these model-fitting and testing techniques to the estimation of linear and nonlinear parameters for a collaboration network between partners in a New England law firm.
Assessing approximate inference for binary Gaussian process classification
 Journal of Machine Learning Research
, 2005
Cited by 76 (3 self)
Gaussian process priors can be used to define flexible, probabilistic classification models. Unfortunately exact Bayesian inference is analytically intractable and various approximation techniques have been proposed. In this work we review and compare Laplace’s method and Expectation Propagation for approximate Bayesian inference in the binary Gaussian process classification model. We present a comprehensive comparison of the approximations, their predictive performance and marginal likelihood estimates to results obtained by MCMC sampling. We explain theoretically and corroborate empirically the advantages of Expectation Propagation compared to Laplace’s method. Keywords: Gaussian process priors, probabilistic classification, Laplace’s approximation, expectation propagation, marginal likelihood, evidence, MCMC
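Laplace's method for this model amounts to Newton iterations for the posterior mode over the latent function. The sketch below follows the standard numerically stable recursion (Rasmussen and Williams, Algorithm 3.1) with a logistic likelihood; the RBF kernel, its length-scale, and the toy labels are assumptions for illustration.

```python
import numpy as np

def rbf(X, ell=1.0):
    """Squared-exponential kernel matrix on 1-D inputs (assumed length-scale)."""
    d = X[:, None] - X[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

def laplace_gp_mode(K, y, iters=25):
    """Newton iterations for the mode of the GP posterior over latent f,
    with logistic likelihood p(y_i|f_i) = sigmoid(y_i * f_i), y_i in {-1, +1}."""
    n = len(y)
    f = np.zeros(n)
    t = (y + 1) / 2.0                        # labels mapped to {0, 1}
    for _ in range(iters):
        pi = 1.0 / (1.0 + np.exp(-f))        # sigmoid(f)
        W = pi * (1 - pi)                    # negative Hessian of log-lik (diagonal)
        sqrtW = np.sqrt(W)
        B = np.eye(n) + sqrtW[:, None] * K * sqrtW[None, :]
        L = np.linalg.cholesky(B)
        b = W * f + (t - pi)                 # gradient-scaled Newton quantity
        a = b - sqrtW * np.linalg.solve(L.T, np.linalg.solve(L, sqrtW * (K @ b)))
        f = K @ a                            # updated mode estimate
    return f

X = np.linspace(-3.0, 3.0, 40)
y = np.where(X > 0, 1.0, -1.0)               # separable toy labels
f_hat = laplace_gp_mode(rbf(X), y)           # latent mode: negative left, positive right
```

The Cholesky-based solve keeps the iteration stable even when K is nearly singular, which is why this form is preferred over directly inverting K + W⁻¹.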
Distance dependent Chinese restaurant processes
Cited by 56 (5 self)
We develop the distance dependent Chinese restaurant process (CRP), a flexible class of distributions over partitions that allows for non-exchangeability. This class can be used to model dependencies between data in infinite clustering models, including dependencies across time or space. We examine the properties of the distance dependent CRP, discuss its connections to Bayesian nonparametric mixture models, and derive a Gibbs sampler for both observed and mixture settings. We study its performance with time-dependent models and three text corpora. We show that relaxing the assumption of exchangeability with distance dependent CRPs can provide a better fit to sequential data. We also show that its alternative formulation of the traditional CRP leads to a faster-mixing Gibbs sampling algorithm than the one based on the original formulation.
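Sampling a partition from the ddCRP prior itself is straightforward: each customer links to another with probability proportional to a decay function of their distance (or to itself with probability proportional to α), and clusters are the connected components of the link graph. The exponential decay function, the 1-D positions, and the constants below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)

def ddcrp_partition(positions, alpha=1.0, a=1.0):
    """Draw one partition from a distance dependent CRP prior.
    Assumed decay function f(d) = exp(-d / a): customer i links to j with
    probability proportional to f(d_ij), or to itself with probability
    proportional to alpha. Clusters = connected components of the links."""
    n = len(positions)
    links = np.empty(n, dtype=int)
    for i in range(n):
        w = np.exp(-np.abs(positions - positions[i]) / a)
        w[i] = alpha                         # self-link weight
        links[i] = rng.choice(n, p=w / w.sum())
    # label connected components by following link chains
    labels = -np.ones(n, dtype=int)
    for i in range(n):
        path, j = [], i
        while labels[j] == -1 and j not in path:
            path.append(j)
            j = links[j]
        lab = labels[j] if labels[j] != -1 else min(path)
        for k in path:
            labels[k] = lab
    return labels

positions = np.linspace(0.0, 9.0, 10)
parts = ddcrp_partition(positions, alpha=1.0, a=1.0)
```

With a very large α every customer self-links, recovering the all-singletons partition; shrinking α and widening the decay merges nearby customers, which is the spatial dependence the abstract describes.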