Results 1-10 of 64
An Introduction to MCMC for Machine Learning
2003
Cited by 235 (2 self)
Abstract: The purpose of this introductory paper is threefold. First, it introduces the Monte Carlo method, with emphasis on probabilistic machine learning. Second, it reviews the main building blocks of modern Markov chain Monte Carlo simulation, thereby providing an introduction to the remaining papers of this special issue. Lastly, it discusses interesting new research horizons.
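As a concrete taste of the building blocks such a survey covers, here is a minimal random-walk Metropolis sampler in Python. The bimodal target, step size, and run length are illustrative choices of ours, not anything taken from the paper.

    import numpy as np

    rng = np.random.default_rng(0)

    def log_target(x):
        # Unnormalised log-density of an illustrative two-component normal mixture.
        return np.logaddexp(-0.5 * (x - 2.0) ** 2, -0.5 * (x + 2.0) ** 2)

    def metropolis(n_steps, x0=0.0, step=1.0):
        # Random-walk proposal; accept with probability min(1, pi(x') / pi(x)).
        x, samples = x0, np.empty(n_steps)
        for t in range(n_steps):
            prop = x + step * rng.normal()
            if np.log(rng.uniform()) < log_target(prop) - log_target(x):
                x = prop                          # accept; otherwise keep the current state
            samples[t] = x
        return samples

    draws = metropolis(50_000)
    print("sample mean (target is symmetric about 0):", draws.mean())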
Slice sampling
Annals of Statistics, 2000
Cited by 159 (5 self)
Abstract: Markov chain sampling methods that automatically adapt to characteristics of the distribution being sampled can be constructed by exploiting the principle that one can sample from a distribution by sampling uniformly from the region under the plot of its density function. A Markov chain that converges to this uniform distribution can be constructed by alternating uniform sampling in the vertical direction with uniform sampling from the horizontal ‘slice’ defined by the current vertical position, or more generally, with some update that leaves the uniform distribution over this slice invariant. Variations on such ‘slice sampling’ methods are easily implemented for univariate distributions, and can be used to sample from a multivariate distribution by updating each variable in turn. This approach is often easier to implement than Gibbs sampling, and more efficient than simple Metropolis updates, due to the ability of slice sampling to adaptively choose the magnitude of changes made. It is therefore attractive for routine and automated use. Slice sampling methods that update all variables simultaneously are also possible. These methods can adaptively choose the magnitudes of changes made to each variable, based on the local properties of the density function. More ambitiously, such methods could potentially allow the sampling to adapt to dependencies between variables by constructing local quadratic approximations. Another approach is to improve sampling efficiency by suppressing random walks. This can be done using ‘overrelaxed’ versions of univariate slice sampling procedures, or by using ‘reflective’ multivariate slice sampling methods, which bounce off the edges of the slice.
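The univariate procedure described in this abstract is short enough to state in code. Below is a minimal Python sketch of single-variable slice sampling with the stepping-out and shrinkage procedures; the target density, interval width w, and step budget are demo assumptions, not values from the paper.

    import numpy as np

    rng = np.random.default_rng(1)

    def slice_sample(logf, x0, w=1.0, max_steps=100):
        # One update of univariate slice sampling with stepping out and shrinkage.
        log_y = logf(x0) + np.log(rng.uniform())   # vertical level defining the slice
        # Stepping out: randomly place an interval of width w, then grow each end
        # until it leaves the slice (or the step budget runs out).
        left = x0 - w * rng.uniform()
        right = left + w
        j = int(max_steps * rng.uniform())
        k = max_steps - 1 - j
        while j > 0 and logf(left) > log_y:
            left -= w
            j -= 1
        while k > 0 and logf(right) > log_y:
            right += w
            k -= 1
        # Shrinkage: sample uniformly from the interval, shrinking it toward x0
        # whenever the candidate falls outside the slice.
        while True:
            x1 = left + (right - left) * rng.uniform()
            if logf(x1) > log_y:
                return x1
            if x1 < x0:
                left = x1
            else:
                right = x1

    logf = lambda z: -0.5 * z * z                  # standard normal, up to a constant
    x, draws = 0.0, []
    for _ in range(10_000):
        x = slice_sample(logf, x)
        draws.append(x)
    print("sample variance (should be near 1):", np.var(draws))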
Bayesian methods for hidden Markov models
Journal of the American Statistical Association
Parameter Expansion for Data Augmentation
Journal of the American Statistical Association, 1999
Cited by 65 (2 self)
Abstract: Viewing the observed data of a statistical model as incomplete and augmenting its missing parts are useful for clarifying concepts and central to the invention of two well-known statistical algorithms: expectation-maximization (EM) and data augmentation. Recently, Liu, Rubin, and Wu (1998) demonstrated that expanding the parameter space along with augmenting the missing data is useful for accelerating iterative computation in an EM algorithm. The main purpose of this article is to rigorously define a parameter-expanded data augmentation (PX-DA) algorithm and to study its theoretical properties. The PX-DA algorithm is a special way of using auxiliary variables to accelerate Gibbs sampling algorithms and is closely related to reparameterization techniques. Theoretical results concerning the convergence rate of the PX-DA algorithm and the choice of prior for the expansion parameter are obtained. In order to understand the role of the expansion parameter, we establish a new theory for iterative condi...
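One compact way to summarize the construction, in notation of our own rather than the abstract's: write the ordinary data-augmentation (DA) kernel as an average over the imputed missing data w; PX-DA inserts one extra transition R on w, indexed by the expansion parameter, between the imputation and the parameter draw:

    K_{\mathrm{DA}}(\theta, \theta') = \int p(\theta' \mid w)\, p(w \mid \theta)\, dw,
    \qquad
    K_{\mathrm{PX}}(\theta, \theta') = \iint p(\theta' \mid w')\, R(w, dw')\, p(w \mid \theta)\, dw.

Provided R leaves the marginal distribution of w invariant, the chain retains the correct stationary distribution, and the extra averaging over the expansion parameter tends to weaken the dependence between successive draws of \theta.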
Generalizing Swendsen-Wang to sampling arbitrary posterior probabilities
2007
Cited by 53 (13 self)
Abstract: Many vision tasks can be formulated as graph partition problems that minimize energy functions. For such problems, the Gibbs sampler [9] provides a general solution but is very slow, while other methods, such as Ncut [24] and graph cuts [4], [22], are computationally effective but only work for specific energy forms [17] and are not generally applicable. In this paper, we present a new inference algorithm that generalizes the Swendsen-Wang method [25] to arbitrary probabilities defined on graph partitions. We begin by computing graph edge weights, based on local image features. Then, the algorithm iterates two steps. 1) Graph clustering: it forms connected components by cutting the edges probabilistically based on their weights. 2) Graph relabeling: it selects one connected component and probabilistically flips the coloring of all vertices in the component simultaneously. Thus, it realizes the split, merge, and regrouping of a "chunk" of the graph, in contrast to the Gibbs sampler, which flips a single vertex. We prove that this algorithm simulates ergodic and reversible Markov chain jumps in the space of graph partitions and is applicable to arbitrary posterior probabilities or energy functions defined on graphs. We demonstrate the algorithm on two typical problems in computer vision: image segmentation and stereo vision. Experimentally, we show that it is 100-400 times faster in CPU time than the classical Gibbs sampler and 20-40 times faster than the DDMCMC segmentation algorithm [27]. For stereo, we compare performance with graph cuts and belief propagation. We also show that our algorithm can automatically infer generative models and obtain satisfactory results (better than graph cuts or belief propagation) in the same amount of time.
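For orientation, the special case being generalized, classic Swendsen-Wang on an Ising model, fits in a short Python sketch. The lattice size, coupling, and sweep count below are arbitrary demo values of ours; the paper's algorithm replaces the fixed bond probability with data-driven edge weights.

    import numpy as np

    rng = np.random.default_rng(2)

    def sw_sweep(spins, beta):
        # One Swendsen-Wang update for a periodic 2-D Ising model at inverse temperature beta.
        n = spins.shape[0]
        p_freeze = 1.0 - np.exp(-2.0 * beta)      # bond probability between aligned neighbours
        parent = np.arange(n * n)                 # union-find forest over the lattice sites

        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]     # path compression
                i = parent[i]
            return i

        # Step 1 (clustering): freeze bonds between equal neighbouring spins.
        for x in range(n):
            for y in range(n):
                for x2, y2 in ((x, (y + 1) % n), ((x + 1) % n, y)):
                    if spins[x, y] == spins[x2, y2] and rng.uniform() < p_freeze:
                        parent[find(x * n + y)] = find(x2 * n + y2)

        # Step 2 (relabeling): flip each connected component with probability 1/2.
        flip = rng.uniform(size=n * n) < 0.5
        for x in range(n):
            for y in range(n):
                if flip[find(x * n + y)]:
                    spins[x, y] *= -1
        return spins

    spins = rng.choice([-1, 1], size=(16, 16))
    for _ in range(200):
        spins = sw_sweep(spins, beta=0.44)        # close to the critical coupling
    print("magnetisation per site:", spins.mean())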
Convergence of slice sampler Markov chains
1998
Cited by 52 (9 self)
Abstract: In this paper, we analyse theoretical properties of the slice sampler. We find that the algorithm has extremely robust geometric ergodicity properties. For the case of just one auxiliary variable, we demonstrate that the algorithm is stochastically monotone, and deduce analytic bounds on the total variation distance from stationarity of the method using Foster-Lyapunov drift condition methodology.
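For readers unfamiliar with the methodology named in the last sentence, the standard drift condition (stated here in the usual notation of the Markov chain literature, not taken from this abstract) asks for a function V \ge 1, a small set C, and constants \lambda < 1, b < \infty such that

    (PV)(x) = \mathbb{E}\bigl[ V(X_{t+1}) \mid X_t = x \bigr] \le \lambda\, V(x) + b\, \mathbf{1}_C(x);

together with a minorization condition on C, this yields computable bounds of the form \| P^t(x, \cdot) - \pi \|_{\mathrm{TV}} \le M(x)\, \rho^t with explicit M(x) and \rho < 1.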
An efficient Markov chain Monte Carlo method for distributions with intractable normalising constants
Biometrika, 2006
Cited by 51 (2 self)
Abstract: Maximum likelihood parameter estimation and sampling from Bayesian posterior distributions are problematic when the probability density for the parameter of interest involves an intractable normalising constant which is also a function of that parameter. In this paper, an auxiliary variable method is presented which requires only that independent samples can be drawn from the unnormalised density at any particular parameter value. The proposal distribution is constructed so that the normalising constant cancels from the Metropolis–Hastings ratio. The method is illustrated by producing posterior samples for parameters of the Ising model given a particular lattice realisation.
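The cancellation is easy to see in code. The following Python sketch applies the single-auxiliary-variable idea to a toy exponential model whose normalising constant is actually tractable (we deliberately never use it); the Exp(1) prior, the fixed auxiliary parameter, and the log-scale random walk are our own demo choices, not the paper's Ising application.

    import numpy as np

    rng = np.random.default_rng(3)

    y = rng.exponential(scale=1.0 / 2.0, size=50)  # synthetic data, true rate 2.0

    def log_q(rate, v):
        # Unnormalised kernel of the exponential model, prod_i exp(-rate * v_i);
        # the normaliser rate**len(v) is deliberately never evaluated.
        return -rate * v.sum()

    theta_tilde = 1.0 / y.mean()                   # fixed parameter of the auxiliary density
    theta = 1.0
    x = rng.exponential(scale=1.0 / theta, size=y.size)  # initial auxiliary dataset
    draws = []
    for _ in range(20_000):
        theta_new = theta * np.exp(0.2 * rng.normal())          # random walk on log(theta)
        x_new = rng.exponential(scale=1.0 / theta_new, size=y.size)  # exact draw from the model
        log_r = (log_q(theta_new, y) - log_q(theta, y)          # likelihood kernels
                 + log_q(theta_tilde, x_new) - log_q(theta_tilde, x)  # auxiliary density kernels
                 + log_q(theta, x) - log_q(theta_new, x_new)    # proposal kernels for x
                 + theta - theta_new                            # Exp(1) prior on theta
                 + np.log(theta_new / theta))                   # Jacobian of the log walk
        if np.log(rng.uniform()) < log_r:
            theta, x = theta_new, x_new
        draws.append(theta)
    print("posterior mean of the rate:", np.mean(draws[5_000:]))

Every normalising constant, for theta, theta_new, and the auxiliary density, cancels from log_r, so only the unnormalised kernels appear; that is exactly the point of the construction.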
Markov Chain Monte Carlo Methods Based on ‘Slicing’ the Density Function
1997
Cited by 47 (0 self)
Abstract: One way to sample from a distribution is to sample uniformly from the region under the plot of its density function. A Markov chain that converges to this uniform distribution can be constructed by alternating uniform sampling in the vertical direction with uniform sampling from the horizontal ‘slice’ defined by the current vertical position. Variations on such ‘slice sampling’ methods can easily be implemented for univariate distributions, and can be used to sample from a multivariate distribution by updating each variable in turn. This approach is often easier to implement than Gibbs sampling, and may be more efficient than easily constructed versions of the Metropolis algorithm. Slice sampling is therefore attractive in routine Markov chain Monte Carlo applications, and for use by software that automatically generates a Markov chain sampler from a model specification. One can also easily devise overrelaxed versions of slice sampling, which sometimes greatly improve sampling effici...
Equi-Energy Sampler with Applications in Statistical Inference and Statistical Mechanics
2006
Cited by 28 (4 self)
Abstract: We introduce a new sampling algorithm, the equi-energy sampler, for efficient statistical sampling and estimation. Complementary to the widely used temperature-domain methods, the equi-energy sampler, utilizing the temperature–energy duality, targets the energy directly. The focus on the energy function not only facilitates efficient sampling, but also provides a powerful means for statistical estimation, for example, the calculation of the density of states and microcanonical averages in statistical mechanics. The equi-energy sampler is applied to a variety of problems, including exponential regression in statistics, motif sampling in computational biology and protein folding in biophysics.
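A faithful implementation maintains a ladder of chains, energy rings, and stored histories; the stripped-down two-chain Python sketch below (a double-well target, hand-picked ring boundaries and temperatures, and a sequential rather than interleaved schedule) is only meant to show the shape of the equi-energy jump, not to reproduce the paper's algorithm.

    import numpy as np

    rng = np.random.default_rng(4)

    h = lambda x: 4.0 * (x * x - 1.0) ** 2        # double-well energy, modes at x = +/-1
    H = np.array([0.0, 1.0, 4.0])                 # lower edges of the energy rings
    T = [1.0, 5.0]                                # chain 0 = target, chain 1 = flattened

    def log_pi(x, i):
        # Chain i targets exp(-max(h(x), H[i]) / T[i]): energy truncation plus tempering.
        return -max(h(x), H[i]) / T[i]

    def ring(x):
        # Index of the energy ring containing x.
        return int(np.searchsorted(H, h(x), side="right")) - 1

    def mh_step(x, i, step=1.0):
        prop = x + step * rng.normal()
        return prop if np.log(rng.uniform()) < log_pi(prop, i) - log_pi(x, i) else x

    # Stage 1: run the flattened chain and bucket its history by energy ring.
    hist = {r: [] for r in range(len(H))}
    x = 1.0
    for _ in range(30_000):
        x = mh_step(x, 1)
        hist[ring(x)].append(x)

    # Stage 2: run the target chain, occasionally jumping to a stored state
    # from the same energy ring (the equi-energy move).
    x, draws = 1.0, []
    for _ in range(30_000):
        bucket = hist[ring(x)]
        if rng.uniform() < 0.1 and bucket:
            y = bucket[rng.integers(len(bucket))]
            log_r = (log_pi(y, 0) + log_pi(x, 1)) - (log_pi(x, 0) + log_pi(y, 1))
            if np.log(rng.uniform()) < log_r:
                x = y
        else:
            x = mh_step(x, 0)
        draws.append(x)
    print("fraction of draws in the right-hand well:", np.mean(np.array(draws) > 0))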
Fully Bayesian Estimation of Gibbs Hyperparameters for Emission Computed Tomography Data
IEEE Transactions on Medical Imaging, 1997
Cited by 20 (3 self)
Abstract: In recent years, many investigators have proposed Gibbs prior models to regularize images reconstructed from emission computed tomography data. Unfortunately, hyperparameters used to specify Gibbs priors can greatly influence the degree of regularity imposed by such priors, and as a result, numerous procedures have been proposed to estimate hyperparameter values from observed image data. Many of these procedures attempt to maximize the joint posterior distribution on the image scene. To implement these methods, approximations to the joint posterior densities are required, because the dependence of the Gibbs partition function on the hyperparameter values is unknown. In this paper, we use recent results in Markov Chain Monte Carlo sampling to estimate the relative values of Gibbs partition functions, and using these values, sample from joint posterior distributions on image scenes. This allows for a fully Bayesian procedure which does not fix the hyperparameters at some estimated or spe...
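The underlying computational trick, estimating ratios of partition functions by sampling, can be demonstrated on a model small enough to enumerate. The Python sketch below is our own toy (a 3x3 Ising prior, not the paper's emission tomography setup): it estimates Z(b1)/Z(b0) via the importance-sampling identity Z(b1)/Z(b0) = E_b0[exp(-(b1 - b0) U(x))] and checks the answer by brute force.

    import numpy as np
    from itertools import product

    rng = np.random.default_rng(5)
    n = 3                                          # 3x3 lattice: only 512 states, enumerable

    def U(s):
        # Ising energy: minus the sum of spin products over periodic neighbour pairs.
        return -(s * np.roll(s, 1, 0)).sum() - (s * np.roll(s, 1, 1)).sum()

    def gibbs_sweep(s, beta):
        # Single-site heat-bath updates: P(s_ij = +1 | rest) = 1 / (1 + exp(-2*beta*field)).
        for i in range(n):
            for j in range(n):
                field = (s[(i + 1) % n, j] + s[(i - 1) % n, j]
                         + s[i, (j + 1) % n] + s[i, (j - 1) % n])
                s[i, j] = 1 if rng.uniform() < 1.0 / (1.0 + np.exp(-2.0 * beta * field)) else -1
        return s

    b0, b1 = 0.2, 0.3
    s = rng.choice([-1, 1], size=(n, n))
    vals = []
    for sweep in range(20_000):
        s = gibbs_sweep(s, b0)
        if sweep >= 1_000:                         # discard burn-in
            vals.append(np.exp(-(b1 - b0) * U(s)))
    print("MCMC estimate of Z(b1)/Z(b0):", np.mean(vals))

    # Brute-force check by enumerating all 2**(n*n) spin configurations.
    Z = {b: 0.0 for b in (b0, b1)}
    for bits in product([-1, 1], repeat=n * n):
        u = U(np.array(bits).reshape(n, n))
        for b in Z:
            Z[b] += np.exp(-b * u)
    print("exact Z(b1)/Z(b0):", Z[b1] / Z[b0])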