## Stochastic approximation in Monte Carlo computation (2007)

Venue: | Journal of the American Statistical Association |

Citations: | 24 - 13 self |

### BibTeX

@ARTICLE{Liang07stochasticapproximation,

author = {Faming Liang and Chuanhai Liu and Raymond J. Carroll},

title = {Stochastic approximation in Monte Carlo computation},

journal = {Journal of the American Statistical Association},

year = {2007},

volume = {102},

pages = {305--320}

}

### OpenURL

### Abstract

The Wang-Landau algorithm is an adaptive Markov chain Monte Carlo algorithm to calculate the spectral density for a physical system. A remarkable feature of the algorithm is that it is not trapped by local energy minima, which is very important for systems with rugged energy landscapes. This feature has led to many successful applications of the algo-rithm in statistical physics and biophysics. However, there does not exist rigorous theory to support its convergence, and the estimates produced by the algorithm can only reach a limited statistical accuracy. In this paper, we propose the stochastic approximation Monte Carlo (SAMC) algorithm, which overcomes the shortcomings of the Wang-Landau algorithm. We establish a theorem concerning its convergence. The estimates produced by SAMC can be improved continuously as the simulation goes on. SAMC also extends applications of the Wang-Landau algorithm to continuum systems. The potential uses of SAMC in statistics are discussed through two classes of applications, importance sampling and model selection. The results show that SAMC can work as a general importance sampling algorithm and a

### Citations

4206 |
Stochastic Relaxation, Gibbs distribution, and the Bayesian Restoration of Images
- Geman, Geman
- 1984
(Show Context)
Citation Context ...4) is reduced to 1. This note is also applicable to the SAMC-model-selection algorithm. The MCMC algorithm employed in step (b) of the above two algorithms can be the MH algorithm, the Gibbs sampler (=-=Geman and Geman, 1984-=-) or any other advanced MCMC algorithms, such as simulated tempering, parallel tempering, evolutionary Monte Carlo and SAMC-importance-resampling (discussed in Section 3). When the distribution f(M, ϑ... |

3635 | An Introduction to the Bootstrap
- Efron, Tibshirani
- 1993
(Show Context)
Citation Context ...sed on a single-point trial distribution. To compare the accuracy of the mix-MCMLEs and single-MCMLEs, we conducted the following experiment based on the principle of the parametric bootstrap method (=-=Efron and Tibshirani, 1993-=-). Let T 1 = ∑ i∈D si and T 2 = 1 ∑ 2 i∈D si ( ∑ j∈N(i) sj ) . It is easy to see that T = (T 1, T 2) forms a sufficient statistic of (α, β). Given an estimate (α, β), we can reversely estimate the ... |

2637 |
Equation of state calculations by fast computing machines
- Metropolis, Rosenbluth, et al.
(Show Context)
Citation Context ...ich, for convenience, is written in the following form, p(x) = cp0(x), x ∈ X , (1) where c is a constant and X is the sample space. As known by many researchers, the Metropolis-Hastings (MH) sampler (=-=Metropolis et al, 1953-=-; Hastings, 1970) is prone to becoming trapped in local energy minima when the energy landscape of the distribution is rugged (in terms of physics, − log{p0(x)} is called the energy function of the di... |

1432 |
Monte Carlo sampling methods using Markov chains and their applications
- Hastings
- 1970
(Show Context)
Citation Context ... written in the following form, p(x) = cp0(x), x ∈ X , (1) where c is a constant and X is the sample space. As known by many researchers, the Metropolis-Hastings (MH) sampler (Metropolis et al, 1953; =-=Hastings, 1970-=-) is prone to becoming trapped in local energy minima when the energy landscape of the distribution is rugged (in terms of physics, − log{p0(x)} is called the energy function of the distribution). Ove... |

1290 |
Spatial interaction and the statistical analysis of lattice systems (with Discussion
- Besag
- 1974
(Show Context)
Citation Context ... complex, for example, when the distribution f(M|D) has well separated multiple modes, or when there are tiny probability models, but, of interest to us. 5.2 Numerical Results The autologistic model (=-=Besag, 1974-=-) has been widely used for spatial data analysis, see, e.g., Preisler (1993) or Augustin et al (1996). Let s = {si : i ∈ D} denote a configuration of the model, where the binary response si ∈ {−1, +1}... |

985 | Reversible jump Markov chain Monte Carlo computation and Bayesian model determination,” Biometrika
- Green
- 1995
(Show Context)
Citation Context ... f(M, ϑM|D), it follows from (7) that �g (t) i /�g(t) j = eθti−θtj forms a consistent estimator for the Bayes factor of the models Mi and Mj, 1 ≤ i, j ≤ m. We note that reversible jump MCMC (RJMCMC) (=-=Green, 1995-=-) can also estimate the Bayes factors of m models simultaneously. For comparison, in the following we give explicitly the iterative procedures of the two methods for Bayesian model selection. Let Q(Mi... |

879 | Markov chains for exploring posterior distributions
- Tierney
- 1994
(Show Context)
Citation Context ...ccepted, set (M (t+1) , ϑ (t+1) ) = (M ∗ , ϑ ∗ ); otherwise, set (M (t+1) , ϑ (t+1) ) = (M (t) , ϑ (t) ). (d) Set Ξ (t+1) i = Ξ (t) i + I(M (t+1) = Mi) for i = 1, . . . , m. The standard MCMC theory (=-=Tierney, 1994-=-) implies that as t → ∞, Ξ (t) i /Ξ(t) j consistent estimator for the Bayes factor of model Mi and model Mj. forms a We note that the form of the RJMCMC algorithm described above is not the most gener... |

874 | Inference from iterative simulation using multiple sequences - Gelman, Rubin - 1992 |

639 | A stochastic approximation method - Robbins, Monro - 1951 |

405 | Adaptive Algorithms and Stochastic Approximations - Benveniste, Métivier, et al. - 1990 |

295 |
Statistical Analysis of Non-Lattice Data
- Besag
- 1975
(Show Context)
Citation Context ... be close to the true parameter point, otherwise, a large value of n would be required for the convergence of (19). Sherman et al (2006) set (α ∗ , β ∗ ) to be the maximum pseudo-likelihood estimate (=-=Besag, 1975-=-) of (α, β), which is the MLE of the pseudo-likelihood function P L(α, β|s) = � exp{si(α + β � j∈N(i) sj)} exp{α + β � j∈N(i) sj} + exp{−α − β � . (21) j∈N(i) sj} i∈D We repeated Sherman et al’s proce... |

279 |
Nonuniversal critical dynamics in monte carlo simulations
- Swendsen, Wang
- 1987
(Show Context)
Citation Context ...ithms have been proposed to overcome this problem, mainly based on the following two ideas. The first idea is the use of auxiliary variables. Included in this category are the SwendsenWang algorithm (=-=Swendsen and Wang, 1987-=-), simulated tempering (Marinari and Parisi, 1992; Geyer and Thompson, 1995), parallel tempering (Geyer, 1991, Hukushima and Nemoto, 1996), evolutionary Monte Carlo (Liang and Wong, 2001), etc. In the... |

229 |
Constrained Monte Carlo Maximum Likelihood for Dependent Data.” [Online]. Available: http://www.jstor.org/stable/2345852
- Geyer, Thompson
(Show Context)
Citation Context ...he importance samples collected above, we also estimated the parameters (α, β) for the cancer data shown in Figure 4(a). The estimation can be done using the Monte 21sCarlo maximum likelihood method (=-=Geyer and Thompson, 1992-=-; Geyer, 1994) as follows. Let p ∗ (s) = c ∗ p ∗ 0(s) denote an arbitrary trial distribution for (15), where p ∗ 0(s) is completely specified and c ∗ is an unknown constant. Let ψ(α, β, s) = ϕ(α, β)p(... |

222 |
Markov chain Monte Carlo maximum likelihood
- Geyer
- 1991
(Show Context)
Citation Context ...auxiliary variables. Included in this category are the SwendsenWang algorithm (Swendsen and Wang, 1987), simulated tempering (Marinari and Parisi, 1992; Geyer and Thompson, 1995), parallel tempering (=-=Geyer, 1991-=-, Hukushima and Nemoto, 1996), evolutionary Monte Carlo (Liang and Wong, 2001), etc. In these algorithms, the temperature is typically treated as an auxiliary variable. Simulations at high temperature... |

217 | Monte Carlo Strategies - Liu - 2001 |

192 | JS:Minorization Conditions and Convergence Rates for Markov Chain Monte Carlo - Rosenthal - 1995 |

175 |
Annealing Markov chain Monte Carlo with applications to ancestral inference
- Geyer, Thompson
- 1995
(Show Context)
Citation Context ...lowing two ideas. The first idea is the use of auxiliary variables. Included in this category are the SwendsenWang algorithm (Swendsen and Wang, 1987), simulated tempering (Marinari and Parisi, 1992; =-=Geyer and Thompson, 1995-=-), parallel tempering (Geyer, 1991, Hukushima and Nemoto, 1996), evolutionary Monte Carlo (Liang and Wong, 2001), etc. In these algorithms, the temperature is typically treated as an auxiliary variabl... |

169 | Simulated tempering: A new Monte Carlo scheme
- Marinari, Parisi
- 1992
(Show Context)
Citation Context ...em, mainly based on the following two ideas. The first idea is the use of auxiliary variables. Included in this category are the SwendsenWang algorithm (Swendsen and Wang, 1987), simulated tempering (=-=Marinari and Parisi, 1992-=-; Geyer and Thompson, 1995), parallel tempering (Geyer, 1991, Hukushima and Nemoto, 1996), evolutionary Monte Carlo (Liang and Wong, 2001), etc. In these algorithms, the temperature is typically treat... |

169 | Rates of convergence of the Hastings and Metropolis algorithms. The Annals of Statistics
- Mengersen, Tweedie
- 1996
(Show Context)
Citation Context ...numerical experience indicates that SAMC should have some type of convergence 25 (22) (23)seven when the minorisation condition does not hold, in a manner similar to the MetropolisHastings algorithm (=-=Mengersen and Tweedie, 1996-=-). A further study in this direction is of some interest. Appendix: Theoretical Results on SAMC The appendix is organized as follows. In part I, we describe a theorem for the convergence of the SAMC a... |

147 | Efficient Metropolis jumping rules - Gelman, Roberts, et al. - 1994 |

131 | Geometric convergence and central limit theorems for multidimensional Hastings and Metropolis algorithms
- Roberts, Tweedie
- 1996
(Show Context)
Citation Context ...istribution q(x, y) satisfy the following condition. For every x ∈ X , there exist ɛ1 > 0 and ɛ2 > 0 such that |x − y| ≤ ɛ1 =⇒ q(x, y) ≥ ɛ2. (4) This is a natural condition in a study of MCMC theory (=-=Roberts and Tweedie, 1996-=-). In practice, this kind of proposal can be designed easily for both continuum and discrete systems. For a continuum system, q(x, y) can be set to the random walk Gaussian proposal y ∼ N(x, σ 2 I) wi... |

119 | General state space Markov chains and MCMC algorithms. Probability Surveys - Roberts, Rosenthal - 2004 |

109 | Convergence of a stochastic approximation version of the em algorithm. The Annals of Statistics - Delyon, Lavielle, et al. - 1999 |

86 |
Nonphysical sampling distributions in Monte Carlo free-energy estimation: Umbrella sampling
- Torrie, Valleau
(Show Context)
Citation Context ... The vertical bars show the ±one-standard-deviation of the average of the estimates. (b) Box-plots of {ɛf(Ei)} obtained in 100 runs of SAMC. Example 2 As pointed out by Liu (2001), umbrella sampling (=-=Torrie and Valleau, 1977-=-) can be seen as a precursor of many advanced Monte Carlo algorithms, including simulated tempering, multicanonical and thus WL, GWL and SAMC. Although umbrella sampling was proposed originally for es... |

74 |
Efficient, multiple-range random walk algorithm to calculate the density of states
- Wang, Landau
(Show Context)
Citation Context ...ng algorithm, where the trial distribution is learned dynamically from past samples. Related works include the 1/k-ensemble algorithm (Hesselbo and Stinchcombe, 1995), the Wang-Landau (WL) algorithm (=-=Wang and Landau, 2001-=-), and the generalized Wang-Landau (GWL) algorithm (Liang, 2004, 2005). They are different from the multicanonical algorithm only in the specification and/or the learning scheme for the trial distribu... |

73 | Adaptive Markov chain Monte Carlo through regeneration - Gilks, Roberts, et al. - 1998 |

71 | Exchange Monte Carlo method and application to spin glass simulations
- Hukushima, Nemoto
- 1996
(Show Context)
Citation Context ...iables. Included in this category are the SwendsenWang algorithm (Swendsen and Wang, 1987), simulated tempering (Marinari and Parisi, 1992; Geyer and Thompson, 1995), parallel tempering (Geyer, 1991, =-=Hukushima and Nemoto, 1996-=-), evolutionary Monte Carlo (Liang and Wong, 2001), etc. In these algorithms, the temperature is typically treated as an auxiliary variable. Simulations at high temperatures broaden sampling of the sa... |

69 | An autologistic model for the spatial distribution of wildlife - Augustin, Mugglestone, et al. - 1996 |

69 | On the convergence of Monte Carlo maximum likelihood calculations
- Geyer
- 1994
(Show Context)
Citation Context ...ected above, we also estimated the parameters (α, β) for the cancer data shown in Figure 4(a). The estimation can be done using the Monte 21sCarlo maximum likelihood method (Geyer and Thompson, 1992; =-=Geyer, 1994-=-) as follows. Let p ∗ (s) = c ∗ p ∗ 0(s) denote an arbitrary trial distribution for (15), where p ∗ 0(s) is completely specified and c ∗ is an unknown constant. Let ψ(α, β, s) = ϕ(α, β)p(s|α, β), and ... |

64 | Hidden markov models and disease mapping - Green, Richardson - 2002 |

63 |
Weighted average importance sampling and defensive mixture distributions
- Hesterberg
- 1995
(Show Context)
Citation Context ...ting importance weights. How to specify an appropriate trial density for a general distribution has, of course, been a long standing and difficult problem in statistics. The defensive mixture method (=-=Hesterberg, 1995-=-) suggests the following trial density p ∗ (x) = λp(x) + (1 − λ)�p(x), (10) where 0 < λ < 1 and �p(x) is another density. However, in practice, p ∗ (x) is rather poor. Although the resulting importanc... |

56 | Monte Carlo methods of inference for implicit statistical models - Diggle, Gratton - 1984 |

52 | Stability of stochastic approximation under verifiable conditions
- Andrieu, Moulines, et al.
(Show Context)
Citation Context ...e probability measure of the Markov chain {(xt, θt)}, started in (x0, θ0), and implicitly defined by the sequences {γt}. Define D(x, A) = infy∈A |x − y|. Theorem 6.2 (Theorem 5.5 and Proposition 6.1, =-=Andrieu et al, 2005-=-) Assume the conditions (A1) and (A4) hold, and there exists a drift function V (x) such that supx∈X V (x) < ∞ and the drift condition holds (refer to Andrieu et al (2005) for the description of the c... |

47 |
Real-parameter evolutionary Monte Carlo with applications to Bayesian Mixture models
- Liang, Wong
(Show Context)
Citation Context ...algorithm (Swendsen and Wang, 1987), simulated tempering (Marinari and Parisi, 1992; Geyer and Thompson, 1995), parallel tempering (Geyer, 1991, Hukushima and Nemoto, 1996), evolutionary Monte Carlo (=-=Liang and Wong, 2001-=-), etc. In these algorithms, the temperature is typically treated as an auxiliary variable. Simulations at high temperatures broaden sampling of the sample space, and thus are able to help the system ... |

43 | Monte Carlo Statistical Methods, 2nd Edition - Robert, Casella - 2004 |

37 |
Slice Sampling (with discussion
- Neal
- 2003
(Show Context)
Citation Context ... are discussed. SAMC can work as a general importance sampling method and a model selection sampler when the model space is complex. As with many other Monte Carlo algorithms, such as slice sampling (=-=Neal, 2003-=-), SAMC also suffers from the curse of dimensionality. For example, consider the modified witch’s hat 24sdistribution studied in Geyer and Thompson (1995), ⎧ ⎪⎨ 1 + β, x ∈ [0, α] p(x) = ⎪⎩ k , 1, x ∈ ... |

34 | Population Monte Carlo - Cappé, Guillin, et al. - 2004 |

30 |
Maximum likelihood estimation for spatial models by Markov chain Monte Carlo stochastic approximation
- Gu, Zhu
- 2001
(Show Context)
Citation Context ...hain Monte Carlo for solving maximum likelihood estimation problems (Younes, 1988, 1999; Moyeed and Baddeley, 1991; Gu and Kong, 1998; Gelfand and Banerjee, 1998; Delyon, Lavielle and Moulines, 1999; =-=Gu and Zhu, 2001-=-). The critical difference between SAMC and other stochastic approximation MCMC algorithms is at sample space partitioning. With our use 6sof partitioning, many new applications can be established in ... |

29 |
A stochastic approximation algorithm with MCMC method for incomplete data estimation problems
- Gu
- 1998
(Show Context)
Citation Context ... Lai (2003) for an overview on the subject. Recently, it has been used with Markov chain Monte Carlo for solving maximum likelihood estimation problems (Younes, 1988, 1999; Moyeed and Baddeley, 1991; =-=Gu and Kong, 1998-=-; Gelfand and Banerjee, 1998; Delyon, Lavielle and Moulines, 1999; Gu and Zhu, 2001). The critical difference between SAMC and other stochastic approximation MCMC algorithms is at sample space partiti... |

23 |
Metastability of the folded states of globular proteins
- Honeycutt, Thirumalai
- 1990
(Show Context)
Citation Context ...ons. As shown by Hesselbo and Stinchcombe (1995) and Liang (2005), biasing sampling to low energy regions often improves the ergodicity of the simulation. Our numerical results on BLN protein models (=-=Honeycutt and Thirumalai, 1990-=-) also strongly support this point. Due to space limitations, these results will be reported elsewhere. 6• The choice of t0 and the number of iterations. To estimate g, γt should be very close to 0 a... |

21 | Stochastic approximation - Lai - 2003 |

19 |
Stochastic approximation of the MLE for a spatial point pattern
- Moyeed, Baddeley
- 1991
(Show Context)
Citation Context ...tochastic systems. Refer to Lai (2003) for an overview on the subject. Recently, it has been used with Markov chain Monte Carlo for solving maximum likelihood estimation problems (Younes, 1988, 1999; =-=Moyeed and Baddeley, 1991-=-; Gu and Kong, 1998; Gelfand and Banerjee, 1998; Delyon, Lavielle and Moulines, 1999; Gu and Zhu, 2001). The critical difference between SAMC and other stochastic approximation MCMC algorithms is at s... |

18 |
Dynamically weighted importance sampling in Monte Carlo computation
- Liang
- 2002
(Show Context)
Citation Context ... algorithm only in the specification and/or the learning scheme for the trial distribution. Other work included in this category is dynamic weighting (Wong and Liang, 1997; Liu, Liang and Wong, 2001, =-=Liang, 2002-=-), where the acceptance rate of the MH moves is adjusted dynamically with an importance weight which carries the information of past samples. Among the algorithms described above, the WL algorithm has... |

18 |
Dynamic weighting in Monte Carlo and optimization
- Wong, Liang
- 1997
(Show Context)
Citation Context ...2005). They are different from the multicanonical algorithm only in the specification and/or the learning scheme for the trial distribution. Other work included in this category is dynamic weighting (=-=Wong and Liang, 1997-=-; Liu, Liang and Wong, 2001, Liang, 2002), where the acceptance rate of the MH moves is adjusted dynamically with an importance weight which carries the information of past samples. Among the algorith... |

15 | Modelling spatial patterns of trees attacked by bark-beetles - Preisler - 1993 |

14 |
Multicanonical Algorithms for 1st Order Phase-Transitions. Phys
- Berg, Neuhaus
- 1991
(Show Context)
Citation Context ...high temperatures broaden sampling of the sample space, and thus are able to help the system escape from local energy minima. The second idea is the use of past samples. The multicanonical algorithm (=-=Berg and Neuhaus, 1991-=-) is apparently the first work in this direction. This algorithm is essentially a dynamic importance sampling algorithm, where the trial distribution is learned dynamically from past samples. Related ... |

14 | The normal kernel coupler: An adaptive markov chain monte carlo method for efficiently sampling from multi-modal distributions - Warnes - 2001 |

14 |
Estimation and Annealing for Gibbsian Fields
- Younes
- 1988
(Show Context)
Citation Context ...ion and control of stochastic systems. Refer to Lai (2003) for an overview on the subject. Recently, it has been used with Markov chain Monte Carlo for solving maximum likelihood estimation problems (=-=Younes, 1988-=-, 1999; Moyeed and Baddeley, 1991; Gu and Kong, 1998; Gelfand and Banerjee, 1998; Delyon, Lavielle and Moulines, 1999; Gu and Zhu, 2001). The critical difference between SAMC and other stochastic appr... |

12 | Monte carlo simulation and global optimization without parameters - Hesselbo, Stinchcombe - 1995 |

8 | Generalized Wang-Landau Algorithm for Monte Carlo Computation - Liang - 2005 |