Results 1 - 6 of 6
Probabilistic Inference Using Markov Chain Monte Carlo Methods
, 1993
"... Probabilistic inference is an attractive approach to uncertain reasoning and empirical learning in artificial intelligence. Computational difficulties arise, however, because probabilistic models with the necessary realism and flexibility lead to complex distributions over highdimensional spaces. R ..."
Abstract

Cited by 562 (20 self)
Probabilistic inference is an attractive approach to uncertain reasoning and empirical learning in artificial intelligence. Computational difficulties arise, however, because probabilistic models with the necessary realism and flexibility lead to complex distributions over high-dimensional spaces. Related problems in other fields have been tackled using Monte Carlo methods based on sampling using Markov chains, providing a rich array of techniques that can be applied to problems in artificial intelligence. The "Metropolis algorithm" has been used to solve difficult problems in statistical physics for over forty years, and, in the last few years, the related method of "Gibbs sampling" has been applied to problems of statistical inference. Concurrently, an alternative method for solving problems in statistical physics by means of dynamical simulation has been developed as well, and has recently been unified with the Metropolis algorithm to produce the "hybrid Monte Carlo" method. In computer science, Markov chain sampling is the basis of the heuristic optimization technique of "simulated annealing", and has recently been used in randomized algorithms for approximate counting of large sets. In this review, I outline the role of probabilistic inference in artificial intelligence, present the theory of Markov chains, and describe various Markov chain Monte Carlo algorithms, along with a number of supporting techniques. I try to present a comprehensive picture of the range of methods that have been developed, including techniques from the varied literature that have not yet seen wide application in artificial intelligence, but which appear relevant. As illustrative examples, I use the problems of probabilistic inference in expert systems, discovery of latent classes from data, and Bayesian learning for neural networks.
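The "Metropolis algorithm" at the center of this review can be sketched in a few lines. The following is a generic random-walk Metropolis sampler, not Neal's own implementation; the function names, the Gaussian proposal, and the step size are illustrative choices only. It targets a standard normal distribution, for which the sample mean and variance should approach 0 and 1.

```python
import math
import random

def metropolis(log_p, x0, n_steps, step=1.0, seed=0):
    """Random-walk Metropolis: propose x' = x + N(0, step^2) and accept
    with probability min(1, p(x')/p(x)), computed on the log scale."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_steps):
        x_prop = x + rng.gauss(0.0, step)
        # accept/reject using log u < log p(x') - log p(x)
        if math.log(rng.random()) < log_p(x_prop) - log_p(x):
            x = x_prop
        samples.append(x)  # the chain stays at x on rejection
    return samples

# Target: standard normal, known only up to a constant (log p = -x^2/2).
samples = metropolis(lambda x: -0.5 * x * x, x0=0.0, n_steps=20000)
```

Note that only the unnormalized density is needed, which is exactly why such samplers suit the complex posterior distributions discussed above.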
Subregion-Adaptive Integration of Functions Having a Dominant Peak
 JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS
, 1993
"... Many statistical multiple integration problems involve integrands that have a dominant peak. In applying numerical methods to solve these problems, statisticians have paid relatively little attention to existing quadrature methods and available software developed in the numerical analysis literature ..."
Abstract

Cited by 20 (5 self)
Many statistical multiple integration problems involve integrands that have a dominant peak. In applying numerical methods to solve these problems, statisticians have paid relatively little attention to existing quadrature methods and available software developed in the numerical analysis literature. One reason these methods have been largely overlooked, even though they are known to be more efficient than Monte Carlo for well-behaved problems of low dimensionality, may be that when applied naively they are poorly suited for peaked-integrand problems. In this paper we use transformations based on "split-t" distributions to allow the integrals to be efficiently computed using a subregion-adaptive numerical integration algorithm. Our split-t distributions are modifications of those suggested by Geweke (1989) and may also be used to define Monte Carlo importance functions. We then compare our approach to Monte Carlo. In the several examples we examine here, we find subregion-adaptive inte...
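The abstract notes that split-t distributions can also serve as Monte Carlo importance functions. The sketch below shows the general idea with an ordinary Student-t proposal centered on the peak (not Geweke's split-t construction, which the paper modifies); all names, the test integrand, and the proposal parameters are illustrative. A heavy-tailed proposal matched to the peak keeps the importance weights bounded.

```python
import math
import random

def t_pdf(x, nu, loc, scale):
    """Density of a Student-t with nu degrees of freedom, location and scale."""
    z = (x - loc) / scale
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return c / scale * (1.0 + z * z / nu) ** (-(nu + 1) / 2)

def t_sample(rng, nu, loc, scale):
    """Draw from the same Student-t via the normal / chi-square construction."""
    g = rng.gauss(0.0, 1.0)
    chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu))
    return loc + scale * g / math.sqrt(chi2 / nu)

# Peaked integrand: a narrow Gaussian bump at x = 3 with sd 0.1.
f = lambda x: math.exp(-((x - 3.0) ** 2) / (2.0 * 0.01))
exact = math.sqrt(2.0 * math.pi * 0.01)      # closed form, ~0.2507

rng = random.Random(0)
nu, loc, scale = 5, 3.0, 0.1                 # proposal centered on the peak
n = 20000
# Importance-sampling estimate: average of f(x) / q(x) over draws x ~ q.
est = sum(f(x) / t_pdf(x, nu, loc, scale)
          for x in (t_sample(rng, nu, loc, scale) for _ in range(n))) / n
```

Had the proposal ignored the peak (say, a t centered at 0 with unit scale), most draws would miss the bump entirely, which is the failure mode the paper's transformations are designed to avoid.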
Spherical-Radial Integration Rules for Bayesian Computation
 JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
, 1997
"... The common numerical problem in Bayesian analysis is the numerical integration of the posterior. In high dimensions, this problem becomes too formidable for fixed quadrature methods, and Monte Carlo integration is the usual approach. Through the use of modal standardization and a sphericalradial tr ..."
Abstract

Cited by 11 (5 self)
The common numerical problem in Bayesian analysis is the numerical integration of the posterior. In high dimensions, this problem becomes too formidable for fixed quadrature methods, and Monte Carlo integration is the usual approach. Through the use of modal standardization and a spherical-radial transformation, we reparameterize in terms of a radius r and point z on the surface of the sphere in d dimensions. We propose two types of methods for spherical-radial integration. A completely randomized method uses randomly placed abscissas for the radial integration and for the sphere surface. A mixed method uses fixed quadrature (Simpson's rule) on the radius and randomized spherical integration. The mixed methods show superior accuracy in comparisons, require few or no assumptions, and provide diagnostics to detect difficult problems. Moreover, if the posterior is close to the multivariate normal, the mixed methods can give remarkable accuracy.
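A minimal sketch of the mixed method in d = 2, under the transformation this abstract describes: the integral over the plane becomes an integral of r^(d-1) f(rz) over the radius r and the unit circle of directions z, with Simpson's rule on r and uniformly random directions. This is an illustrative toy (no modal standardization, and the function names and parameters are mine, not the paper's).

```python
import math
import random

def mixed_spherical_radial(f, n_dirs=1500, r_max=8.0, n_panels=200, seed=0):
    """Mixed spherical-radial rule in d = 2: Simpson's rule on the radius,
    randomized directions on the unit circle. Estimates the integral of f
    over R^2, assuming f decays fast enough that [0, r_max] suffices."""
    rng = random.Random(seed)
    h = r_max / n_panels                 # n_panels must be even for Simpson
    acc = 0.0
    for _ in range(n_dirs):
        theta = rng.uniform(0.0, 2.0 * math.pi)
        cz, sz = math.cos(theta), math.sin(theta)
        # Simpson's rule on g(r) = r^(d-1) * f(r*z) over [0, r_max], d = 2
        s = 0.0
        for i in range(n_panels + 1):
            r = i * h
            w = 1 if i in (0, n_panels) else (4 if i % 2 else 2)
            s += w * r * f(r * cz, r * sz)
        acc += s * h / 3.0
    # average over random directions, times the circle's total measure 2*pi
    return 2.0 * math.pi * acc / n_dirs

# Test integrand: exp(-(x^2+y^2)/2) * (1 + 0.5*x^2); exact integral = 3*pi.
est = mixed_spherical_radial(
    lambda x, y: math.exp(-0.5 * (x * x + y * y)) * (1.0 + 0.5 * x * x))
```

The remaining error is purely Monte Carlo error over directions; the radial quadrature is essentially exact here, which reflects the abstract's point that near-normal posteriors are handled remarkably well.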
Alternatives to the Gibbs Sampling Scheme
, 1992
"... A variation of the Gibbs sampling scheme is defined by driving the simulated Markov chain by the conditional distributions of an approximation to the posterior rather than the posterior distribution itself. Choosing a multivariate normal mixture form for the approximation enables reparametrization w ..."
Abstract

Cited by 6 (1 self)
A variation of the Gibbs sampling scheme is defined by driving the simulated Markov chain by the conditional distributions of an approximation to the posterior rather than the posterior distribution itself. Choosing a multivariate normal mixture form for the approximation enables a reparametrization which is crucial to improving convergence in the Gibbs sampler. Using an approximation to the posterior density also opens the possibility of including in the algorithm a learning process about the posterior density, which is unknown in the operational sense of evaluating posterior integrals. While ideally this should be done using available pointwise evaluations of the posterior density, this is too difficult in a general framework and we use instead the currently available Monte Carlo sample to adjust the approximating density. This is done using a simple multivariate implementation of the mixture of Dirichlet density estimation algorithm. Keywords: Markov chain Monte Carlo, Bayesian sampling, stocha...
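For reference, the baseline scheme this paper varies is the standard Gibbs sampler, which draws each coordinate from its exact full conditional in turn. A minimal sketch for a bivariate normal target (illustrative names and parameters; the paper's variant would substitute the conditionals of an approximating mixture):

```python
import random

def gibbs_bivariate_normal(rho, n_steps, seed=0):
    """Standard Gibbs sampler for a standard bivariate normal with
    correlation rho, alternating the exact full conditionals
    x | y ~ N(rho*y, 1 - rho^2) and y | x ~ N(rho*x, 1 - rho^2)."""
    rng = random.Random(seed)
    sd = (1.0 - rho * rho) ** 0.5
    x = y = 0.0
    draws = []
    for _ in range(n_steps):
        x = rng.gauss(rho * y, sd)   # update x given current y
        y = rng.gauss(rho * x, sd)   # update y given the new x
        draws.append((x, y))
    return draws

# High correlation makes the chain mix slowly along coordinate axes,
# which is exactly why a convergence-improving reparametrization matters.
draws = gibbs_bivariate_normal(rho=0.9, n_steps=20000)[500:]
```

With rho near 1 the coordinatewise moves become tiny relative to the target's long axis; a rotation or reparametrization decorrelates the components and restores fast mixing, which is the motivation the abstract cites.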
Variance Reduction
"... elation among x i1 ,K , x iM in such a way that s *2 < M 1 var p g x ij ( ) [ ] . If in addition the cost of generating the M tuple is insignificantly greater than the cost of generating M independent variables from p x ( ) , then I N M provides a computationally more efficient approximatio ..."
Abstract
correlation among $x_{i1}, \ldots, x_{iM}$ in such a way that $\sigma^{*2} < M^{-1} \operatorname{var}_p[g(x_{ij})]$. If in addition the cost of generating the $M$-tuple is insignificantly greater than the cost of generating $M$ independent variables from $p(x)$, then $I_{N,M}$ provides a computationally more efficient approximation of $I$ than does $I_N$. There are numerous variants on this technique. This section takes up four that account for most use of the method: antithetic variables, systematic sampling, conditional expectations, and control variables. The scope for combining these variance reduction techniques with the methods of Section 4 or Section 6 is enormous. Rather than list all the pos...
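The first of the four techniques listed, antithetic variables, is easy to demonstrate: generate the $M$-tuple (here $M = 2$) as the pair $(u, 1-u)$, so that for a monotone integrand the two members are negatively correlated and the pair average beats two independent draws. The sketch below is a generic illustration with my own names, not the chapter's notation.

```python
import math
import random

def mc_plain(g, n, rng):
    """Plain Monte Carlo estimate of E[g(U)] for U ~ Uniform(0, 1)."""
    return sum(g(rng.random()) for _ in range(n)) / n

def mc_antithetic(g, n, rng):
    """Antithetic variables: average g over pairs (u, 1-u). For monotone g
    the pair members are negatively correlated, so the pair average has
    lower variance than two independent evaluations."""
    total = 0.0
    for _ in range(n // 2):
        u = rng.random()
        total += 0.5 * (g(u) + g(1.0 - u))
    return total / (n // 2)

# Example: E[e^U] = e - 1. Repeat both estimators to compare their spread
# at equal cost (n function evaluations each).
rng = random.Random(0)
plain = [mc_plain(math.exp, 100, rng) for _ in range(200)]
anti = [mc_antithetic(math.exp, 100, rng) for _ in range(200)]
```

For $g = \exp$ the negative correlation between $e^u$ and $e^{1-u}$ nearly cancels the sampling variability, so the antithetic estimator's variance is far below the independent-sampling rate, which is precisely the condition $\sigma^{*2} < M^{-1}\operatorname{var}_p[g(x_{ij})]$ above.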
Variance Reduction
"... .46> ,K , x iM in such a way that s *2 < M 1 var p g x ij () [] . If in addition the cost of generating the M tuple is insignificantly greater than the cost of generating M independent variables from p x () , then I N M provides a computationally more efficient approximation of I than does ..."
Abstract
correlation among $x_{i1}, \ldots, x_{iM}$ in such a way that $\sigma^{*2} < M^{-1} \operatorname{var}_p[g(x_{ij})]$. If in addition the cost of generating the $M$-tuple is insignificantly greater than the cost of generating $M$ independent variables from $p(x)$, then $I_{N,M}$ provides a computationally more efficient approximation of $I$ than does $I_N$. There are numerous variants on this technique. This section takes up four that account for most use of the method: antithetic variables, systematic sampling, conditional expectations, and control variables. The scope for combining these variance reduction techniques with the methods of Section 4 or Section 6 is enormous. Rather than list all the possibilities, the purpose here is to provide some...