## Stable Limit Laws for Harmonic Mean Estimators of Marginal Likelihoods (2010)

### BibTeX

@MISC{Wolpert10stablelimit,

author = {Robert L. Wolpert and Scott C. Schmidler},

title = {Stable Limit Laws for Harmonic Mean Estimators of Marginal Likelihoods},

year = {2010}

}

### OpenURL

### Abstract

The task of calculating marginal likelihoods arises in a wide array of statistical inference problems, including the evaluation of Bayes factors for model selection and hypothesis testing. Although Markov chain Monte Carlo methods have simplified many posterior calculations needed for practical Bayesian analysis, the evaluation of marginal likelihoods remains difficult. We consider the behavior of the wellknown harmonic mean estimator (Newton and Raftery, 1994) of the marginal likelihood, which converges almost-surely but may have infinite variance and so may not obey a central limit theorem. We give examples illustrating the convergence in distribution of the harmonic mean estimator to a one-sided stable law with characteristic exponent 1 < α < 2. While the harmonic mean estimator does converge almost surely, we show that it does so at rate n −ǫ where ǫ = 1 − α −1 is often as small as 0.10 or 0.01. In such a case, the reduction of Monte Carlo sampling error by a factor of two requires increasing the Monte Carlo sample size by a factor of 2 1/ǫ, or in excess of 2.5 ·10 30 when ǫ = 0.01. We explore the possibility of estimating the parameters of the limiting stable distribution to provide accelerated convergence.

### Citations

1454 |
An Introduction to Probability Theory and
- Feller
- 1968
(Show Context)
Citation Context ...for Sn = ∑ j≤n Yj will be normal, the special case α = 2 of the stable. The limit will be stable of index α ∈ (0, 2) if instead P[|Yj| > y] = k(y) y−α as y → ∞ for a slowly-varying function k(·) > 0 (=-=Feller 1971-=-, IX.8, or Gnedenko and Kolmogorov 1968, Chap. 7, §35); it will be one-sided (or “fully skewed”) stable if also yα P[Yj < −y] → 0, whereupon β = 1. That will be the case of interest to us below. 2.1 S... |

979 | Bayes factors
- Kass, Raftery
- 1995
(Show Context)
Citation Context ... of (often iterative, numerical) likelihood maximization. In the Bayesian paradigm, hypotheses are tested and models selected by evaluating their posterior probabilities or calculating Bayes factors (=-=Kass and Raftery, 1995-=-), and predictions made by averaging over models weighted by their associated posterior probabilities (Clyde and George, 2004). In each case the key quantity is the marginal probability density functi... |

680 |
MrBayes: Bayesian inference of phylogenetic trees
- Huelsenbeck, Ronquist
- 2001
(Show Context)
Citation Context ...his situation almost never arises in practice, nevertheless, the HME remains popular due to its attractive simplicity, and it is used widely in some applications areas such as Bayesian phylogenetics (=-=Huelsenbeck and Ronquist, 2001-=-; Nylander et al., 22004; Drummond and Rambaut, 2007) although awareness of alternatives is increasing (Lartillot and Philippe, 2006). The HME is also described favorably in some popular introductory... |

382 | Asymptotic Statistics - Vaart - 1998 |

324 | Marginal Likelihood from the Gibbs Output
- Chib
- 1995
(Show Context)
Citation Context ...lems that can be solved numerically using Bayesian methods, the problem of evaluating fm(x) remains difficult and has received considerable attention (Gelfand and Dey, 1994; Newton and Raftery, 1994; =-=Chib, 1995-=-; Meng and Wong, 1996; Raftery, 1996; Gelman and Meng, 1998; Chib and Jeliazkov, 2001; Han and Carlin, 2001; Meng and Schilling, 2002; Sinharay and Stern, 2005). Newton and Raftery (1994) note that th... |

307 |
Limit distributions for sums of independent random variables
- Gnedenko, Kolmogorov
- 1968
(Show Context)
Citation Context ... be normal, the special case α = 2 of the stable. The limit will be stable of index α ∈ (0, 2) if instead P[|Yj| > y] = k(y) y−α as y → ∞ for a slowly-varying function k(·) > 0 (Feller 1971, IX.8, or =-=Gnedenko and Kolmogorov 1968-=-, Chap. 7, §35); it will be one-sided (or “fully skewed”) stable if also yα P[Yj < −y] → 0, whereupon β = 1. That will be the case of interest to us below. 2.1 Stable laws for Markovian sequences When... |

267 |
A simple general approach to inference about the tail of a distribution
- Hill
- 1975
(Show Context)
Citation Context ... the ch.f.), regression methods based on the empirical ch.f. estimate (or “ece”) (Kogon and Williams, 1998), wavelet approaches (Antoniadis et al., 2006), and tail-behavior methods based on extremes (=-=Hill, 1975-=-). Many of these consider only the symmetric (β = 0, δ = 0) case and most break down when α is close to one. A promising approach pioneered by Press (1972), improved by Kogon and Williams (1998), desc... |

203 |
Constrained Monte Carlo maximum likelihood for dependent data
- Geyer, Thompson
- 1992
(Show Context)
Citation Context ...E as a special case (Meng and Wong, 1996). Other methods where similar effects are likely to arise include methods for handling intractable normalizing constants in likelihood functions (Geyer, 1991; =-=Geyer and Thompson, 1992-=-; Geyer, 1994), including the recent algorithm of Møller et al. (2006) involving exact sampling and its generalizations (Andrieu et al., 2010). All of the methods mentioned have in common the use of i... |

182 |
Bayesian model choice: asymptotics and exact calculations
- Gelfand, Dey
- 1994
(Show Context)
Citation Context ...ods have broadened dramatically the class of problems that can be solved numerically using Bayesian methods, the problem of evaluating fm(x) remains difficult and has received considerable attention (=-=Gelfand and Dey, 1994-=-; Newton and Raftery, 1994; Chib, 1995; Meng and Wong, 1996; Raftery, 1996; Gelman and Meng, 1998; Chib and Jeliazkov, 2001; Han and Carlin, 2001; Meng and Schilling, 2002; Sinharay and Stern, 2005). ... |

175 |
Central limit theorem for additive functionals of reversible Markov processes and applications to simple exclusions
- Kipnis, Varadhan
- 1986
(Show Context)
Citation Context ...) show that the partial sums of any strongly mixing sequence can converge only to a stable distribution with 0 < α ≤ 2. When the Yj’s have finite variance, CLTs hold under standard mixing conditions (=-=Kipnis and Varadhan, 1986-=-; Chan and Geyer, 1994; Roberts and Rosenthal, 2004). In particular if Eπ(|Yj| 2+ǫ ) < ∞ for some ǫ > 0, or ǫ ≥ 0 when the Markov chain is reversible, it suffices that the chain be geometrically ergod... |

169 |
Markov chain Monte Carlo : stochastic simulation for Bayesian inference
- Gamerman
(Show Context)
Citation Context ...aut, 2007) although awareness of alternatives is increasing (Lartillot and Philippe, 2006). The HME is also described favorably in some popular introductory texts on MCMC (Gilks et al., 1996, Ch. 10; =-=Gamerman and Lopes, 2006-=-, §7.2.1) and continues to be the subject of research (Raftery et al., 2007). The behavior of the HME is also relevant to other methods aimed at related problems. For example bridge sampling estimates... |

147 |
Fundamental of statistical exponential families with applications in statistical decision theory
- Brown
- 1986
(Show Context)
Citation Context ...ed by some Euclidean subset Θ ∈ R p , that can be expressed in the form f(x | θ) = e θ·T(x)−A(θ) h(x) for some p-dimensional statistic T : X → R p , real-valued A : Θ → R, and nonnegative h : X → R+ (=-=Brown, 1986-=-, Ch. 1). It is well-known that T+ ≡ ∑ T(xi) is a sufficient statistic for an iid random sample ⃗x = (x1, · · · , xr), that ∇A( ˆ θ) = ¯ T ≡ T+/r at the maximum likelihood estimator (MLE) ˆ θ(⃗x), and... |

145 | Simulating normalizing constants: From importance sampling to bridge sampling to path sampling
- Gelman, Meng
- 1997
(Show Context)
Citation Context ...an methods, the problem of evaluating fm(x) remains difficult and has received considerable attention (Gelfand and Dey, 1994; Newton and Raftery, 1994; Chib, 1995; Meng and Wong, 1996; Raftery, 1996; =-=Gelman and Meng, 1998-=-; Chib and Jeliazkov, 2001; Han and Carlin, 2001; Meng and Schilling, 2002; Sinharay and Stern, 2005). Newton and Raftery (1994) note that the harmonic mean identity f −1 m = Eπfm(x | θ) −1 (where π(θ... |

133 |
Approximate Bayesian Inference with the Weighted Likelihood Bootstrap
- Newton, Raftery
- 1994
(Show Context)
Citation Context ...lified many posterior calculations needed for practical Bayesian analysis, the evaluation of marginal likelihoods remains difficult. We consider the behavior of the wellknown harmonic mean estimator (=-=Newton and Raftery, 1994-=-) of the marginal likelihood, which converges almost-surely but may have infinite variance and so may not obey a central limit theorem. We give examples illustrating the convergence in distribution of... |

109 | Simulating ratios of normalizing constants via a simple identity: A theoretical exploration
- Meng, Wong
- 1996
(Show Context)
Citation Context ...n be solved numerically using Bayesian methods, the problem of evaluating fm(x) remains difficult and has received considerable attention (Gelfand and Dey, 1994; Newton and Raftery, 1994; Chib, 1995; =-=Meng and Wong, 1996-=-; Raftery, 1996; Gelman and Meng, 1998; Chib and Jeliazkov, 2001; Han and Carlin, 2001; Meng and Schilling, 2002; Sinharay and Stern, 2005). Newton and Raftery (1994) note that the harmonic mean ident... |

106 | General state space markov chains and mcmc algorithms. Probability Surveys
- Roberts, Rosenthal
(Show Context)
Citation Context ...ng sequence can converge only to a stable distribution with 0 < α ≤ 2. When the Yj’s have finite variance, CLTs hold under standard mixing conditions (Kipnis and Varadhan, 1986; Chan and Geyer, 1994; =-=Roberts and Rosenthal, 2004-=-). In particular if Eπ(|Yj| 2+ǫ ) < ∞ for some ǫ > 0, or ǫ ≥ 0 when the Markov chain is reversible, it suffices that the chain be geometrically ergodic, i.e., that for some number ρ ∈ (0, 1) and funct... |

101 |
Marginal Likelihood from Metropolis-Hastings Output
- Chib, Jeliazkov
- 2001
(Show Context)
Citation Context ... of evaluating fm(x) remains difficult and has received considerable attention (Gelfand and Dey, 1994; Newton and Raftery, 1994; Chib, 1995; Meng and Wong, 1996; Raftery, 1996; Gelman and Meng, 1998; =-=Chib and Jeliazkov, 2001-=-; Han and Carlin, 2001; Meng and Schilling, 2002; Sinharay and Stern, 2005). Newton and Raftery (1994) note that the harmonic mean identity f −1 m = Eπfm(x | θ) −1 (where π(θ) = fm(x | θ)π0(θ)/fm(x) d... |

100 | BEAST: Bayesian evolutionary analysis by sampling trees
- Drummond, Rambaut
- 2007
(Show Context)
Citation Context ...ss, the HME remains popular due to its attractive simplicity, and it is used widely in some applications areas such as Bayesian phylogenetics (Huelsenbeck and Ronquist, 2001; Nylander et al., 22004; =-=Drummond and Rambaut, 2007-=-) although awareness of alternatives is increasing (Lartillot and Philippe, 2006). The HME is also described favorably in some popular introductory texts on MCMC (Gilks et al., 1996, Ch. 10; Gamerman ... |

72 | Calcul des probabilités - Lévy - 1925 |

68 | Simple consistent estimators of stable distribution parameters - McCulloch - 1986 |

58 | On the convergence of Monte Carlo maximum likelihood calculations
- Geyer
- 1994
(Show Context)
Citation Context ...and Wong, 1996). Other methods where similar effects are likely to arise include methods for handling intractable normalizing constants in likelihood functions (Geyer, 1991; Geyer and Thompson, 1992; =-=Geyer, 1994-=-), including the recent algorithm of Møller et al. (2006) involving exact sampling and its generalizations (Andrieu et al., 2010). All of the methods mentioned have in common the use of importance rat... |

54 | Bayesian phylogenetic analysis of combined data - Nylander, Ronquist, et al. - 2004 |

49 | An efficient Markov chain Monte Carlo method for distributions with intractable normalising constants - Moller, Pettitt, et al. - 2006 |

44 |
Hypothesis testing and model selection
- Raftery
- 1996
(Show Context)
Citation Context ...ly using Bayesian methods, the problem of evaluating fm(x) remains difficult and has received considerable attention (Gelfand and Dey, 1994; Newton and Raftery, 1994; Chib, 1995; Meng and Wong, 1996; =-=Raftery, 1996-=-; Gelman and Meng, 1998; Chib and Jeliazkov, 2001; Han and Carlin, 2001; Meng and Schilling, 2002; Sinharay and Stern, 2005). Newton and Raftery (1994) note that the harmonic mean identity f −1 m = Eπ... |

36 |
Markov chain Monte Carlo methods for computing Bayes factors
- Han, Carlin
- 2001
(Show Context)
Citation Context ...ns difficult and has received considerable attention (Gelfand and Dey, 1994; Newton and Raftery, 1994; Chib, 1995; Meng and Wong, 1996; Raftery, 1996; Gelman and Meng, 1998; Chib and Jeliazkov, 2001; =-=Han and Carlin, 2001-=-; Meng and Schilling, 2002; Sinharay and Stern, 2005). Newton and Raftery (1994) note that the harmonic mean identity f −1 m = Eπfm(x | θ) −1 (where π(θ) = fm(x | θ)π0(θ)/fm(x) denotes the posterior d... |

33 | Computing Bayes factors using thermodynamic integration. Systematic Biology 55
- Lartillot, Philippe
- 2006
(Show Context)
Citation Context ...idely in some applications areas such as Bayesian phylogenetics (Huelsenbeck and Ronquist, 2001; Nylander et al., 22004; Drummond and Rambaut, 2007) although awareness of alternatives is increasing (=-=Lartillot and Philippe, 2006-=-). The HME is also described favorably in some popular introductory texts on MCMC (Gilks et al., 1996, Ch. 10; Gamerman and Lopes, 2006, §7.2.1) and continues to be the subject of research (Raftery et... |

33 | Some limit theorems for stationary Markov chains. Theory Probab - Nagaev - 1957 |

31 |
Comment on “Markov chains for exploring posterior distributions
- Chan, Geyer
- 1994
(Show Context)
Citation Context ...s of any strongly mixing sequence can converge only to a stable distribution with 0 < α ≤ 2. When the Yj’s have finite variance, CLTs hold under standard mixing conditions (Kipnis and Varadhan, 1986; =-=Chan and Geyer, 1994-=-; Roberts and Rosenthal, 2004). In particular if Eπ(|Yj| 2+ǫ ) < ∞ for some ǫ > 0, or ǫ ≥ 0 when the Markov chain is reversible, it suffices that the chain be geometrically ergodic, i.e., that for som... |

28 | Maximum likelihood estimation of stable parameters - Nolan - 2001 |

25 | Characteristic function based estimation of stable parameters - Kogon, Williams - 1998 |

24 | Estimation in univariate and multivariate stable distributions - Press - 1972 |

23 | Estimating the Integrated Likelihood Via Posterior Simulation Using the Harmonic Mean Identity
- Raftery, Newton, et al.
- 2006
(Show Context)
Citation Context ...ippe, 2006). The HME is also described favorably in some popular introductory texts on MCMC (Gilks et al., 1996, Ch. 10; Gamerman and Lopes, 2006, §7.2.1) and continues to be the subject of research (=-=Raftery et al., 2007-=-). The behavior of the HME is also relevant to other methods aimed at related problems. For example bridge sampling estimates the ratio of normalizing constants for two unnormalized densities, and con... |

20 | Model uncertainty
- Clyde, George
- 2004
(Show Context)
Citation Context ... by evaluating their posterior probabilities or calculating Bayes factors (Kass and Raftery, 1995), and predictions made by averaging over models weighted by their associated posterior probabilities (=-=Clyde and George, 2004-=-). In each case the key quantity is the marginal probability density function fm(x) at the observed data vector x for each model m under consideration, and only in simple special cases can this be obt... |

19 | Limit theorems for additive functionals of a Markov Chain. Available at arXiv:0809.0177v1 - Jara, Komorowski, et al. - 2009 |

18 | On the asymptotic theory of estimation and testing hypotheses - Cam - 1956 |

15 | Wrap Bridge Sampling
- Meng, Schilling
- 2002
(Show Context)
Citation Context ...eceived considerable attention (Gelfand and Dey, 1994; Newton and Raftery, 1994; Chib, 1995; Meng and Wong, 1996; Raftery, 1996; Gelman and Meng, 1998; Chib and Jeliazkov, 2001; Han and Carlin, 2001; =-=Meng and Schilling, 2002-=-; Sinharay and Stern, 2005). Newton and Raftery (1994) note that the harmonic mean identity f −1 m = Eπfm(x | θ) −1 (where π(θ) = fm(x | θ)π0(θ)/fm(x) denotes the posterior density for prior π0(θ)) im... |

14 | Stable limits for partial sums of dependent random variables - Davis - 1983 |

8 |
PLU robust Bayesian decision theory: Point estimation
- Ramsay, Novick
- 1980
(Show Context)
Citation Context ...e distributions in exponential families, where the marginal likelihood f(⃗x) = ec(τ+T+, β+r)−c(τ, β) is available explicitly, but the prior distributions commonly recommended for such problems (e.g., =-=Ramsay and Novick, 1980-=-) have heavier tails than conjugate priors, and so may be expected to lead to even slower convergence. 3.6 Example 6: Bernstein-von Mises Under suitable regularity conditions every posterior distribut... |

7 | Stable Distributions - Borak, Härdle, et al. - 2005 |

7 |
A continuous representation of the family of stable law distributions
- Cheng, Liu
- 1997
(Show Context)
Citation Context ...iδγω − γ|ω| {1 + iβ sgn ω 2 π log |ω|}) (2) α = 1 for some index α ∈ (0, 2], skewness −1 ≤ β ≤ 1, rate γ > 0, and location −∞ < δ < ∞. For non-zero β this family has a sharp discontinuity 3at α = 1 (=-=Cheng and Liu, 1997-=-) because of the tangent term. The (M) parametrization StM(α, β, γ, δ) = StA(α, β, γ, δ∗ ) of Zolotarev (1986, p.11) overcomes this by shifting the location to δ∗ = δ − β tan πα leading to a 2 continu... |

4 | An empirical comparison of methods for computing Bayes factors in generalized linear mixed models
- Sinharay, Stern
- 2005
(Show Context)
Citation Context ...tion (Gelfand and Dey, 1994; Newton and Raftery, 1994; Chib, 1995; Meng and Wong, 1996; Raftery, 1996; Gelman and Meng, 1998; Chib and Jeliazkov, 2001; Han and Carlin, 2001; Meng and Schilling, 2002; =-=Sinharay and Stern, 2005-=-). Newton and Raftery (1994) note that the harmonic mean identity f −1 m = Eπfm(x | θ) −1 (where π(θ) = fm(x | θ)π0(θ)/fm(x) denotes the posterior density for prior π0(θ)) implies that for (possibly d... |

3 | on-line at http://www.math.sfu.ca/~cbm/aands - Abramowitz, Stegun, et al. - 1964 |

3 |
Posterior sampling in the presence of unknown normalising constants: An adaptive pseudo-marginal approach
- Andrieu, Berthelsen, et al.
- 2010
(Show Context)
Citation Context ...lizing constants in likelihood functions (Geyer, 1991; Geyer and Thompson, 1992; Geyer, 1994), including the recent algorithm of Møller et al. (2006) involving exact sampling and its generalizations (=-=Andrieu et al., 2010-=-). All of the methods mentioned have in common the use of importance ratios which, if unbounded, can lead to infinite variance and inapplicability of the central limit theorem. In this paper condition... |

3 | Markov chain Monte Carlo Maximum likelihood - J - 1991 |

2 |
Wavelet-based estimation for univariate stable laws
- Antoniadis, Feuerverger, et al.
- 2006
(Show Context)
Citation Context ... numerical evaluation of the pdf using the inverse Fourier transform of the ch.f.), regression methods based on the empirical ch.f. estimate (or “ece”) (Kogon and Williams, 1998), wavelet approaches (=-=Antoniadis et al., 2006-=-), and tail-behavior methods based on extremes (Hill, 1975). Many of these consider only the symmetric (β = 0, δ = 0) case and most break down when α is close to one. A promising approach pioneered by... |

2 | Improved estimation of the stable laws - Besbeas, Morgan - 2008 |

2 |
Sur deux problèmes de M. Kolmogoroff concernant les chaînes dénombrables. Bulletin de la Société Mathématique de France
- Döblin
- 1938
(Show Context)
Citation Context ...|Yj| 2 ) = ∞ one might similarly expect convergence to an α-stable law if dependence is not too strong. Nagaev (1957) proved that when Yj form a stationary Markov chain satisfying Döblin’s condition (=-=Döblin, 1938-=-) with marginal distribution belonging to a stable domain of attraction with 0 < α < 2, then Sn will also have stable limit with index α. Davis (1983) shows this for any stationary sequence satisfying... |

2 | Independent and stationay sequences of random variables - Ibragimov, Linnik - 1971 |