## Estimating the integrated likelihood via posterior simulation using the harmonic mean identity (2007)

Venue: | Bayesian Statistics |

Citations: | 23 - 2 self |

### BibTeX

@INPROCEEDINGS{Raftery07estimatingthe,

author = {Adrian E. Raftery and Michael A. Newton and Jaya M. Satagopan and Pavel N. Krivitsky},

title = {Estimating the integrated likelihood via posterior simulation using the harmonic mean identity},

booktitle = {Bayesian Statistics},

year = {2007},

pages = {1--45}

}

### Abstract

The integrated likelihood (also called the marginal likelihood or the normalizing constant) is a central quantity in Bayesian model selection and model averaging. It is defined as the integral over the parameter space of the likelihood times the prior density. The Bayes factor for model comparison and Bayesian testing is a ratio of integrated likelihoods, and the model weights in Bayesian model averaging are proportional to the integrated likelihoods. We consider the estimation of the integrated likelihood from posterior simulation output, aiming at a generic method that uses only the likelihoods from the posterior simulation iterations. The key is the harmonic mean identity, which says that the reciprocal of the integrated likelihood is equal to the posterior harmonic mean of the likelihood. The simplest estimator based on the identity is thus the harmonic mean of the likelihoods. While this is an unbiased and simulation-consistent estimator, its reciprocal can have infinite variance and so it is unstable in general. We describe two methods for stabilizing the harmonic mean estimator. In the first one, the parameter space is reduced in such a way that the modified estimator involves a harmonic mean of heavier-tailed densities, thus resulting in a finite variance estimator. The resulting
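The harmonic mean identity described in the abstract, $1/\pi(y) = E[\,1/\pi(y \mid \theta) \mid y\,]$, suggests a very short computation. The following is a minimal sketch of the plain (unstabilized) harmonic mean estimator, not the stabilized versions the paper proposes. It assumes that the log-likelihoods from the posterior simulation iterations are already available as a list; the function name `log_harmonic_mean_estimate` is hypothetical, and working on the log scale with the log-sum-exp trick is an implementation choice made here to avoid numerical underflow.

```python
import math


def log_harmonic_mean_estimate(log_likelihoods):
    """Log of the harmonic mean of the likelihoods.

    Given log-likelihoods l_1, ..., l_B evaluated at posterior draws,
    returns log( [ (1/B) * sum_t exp(-l_t) ]^(-1) )
          = log(B) - logsumexp(-l_1, ..., -l_B).
    """
    if not log_likelihoods:
        raise ValueError("need at least one log-likelihood value")
    negated = [-value for value in log_likelihoods]
    # Subtract the maximum before exponentiating (log-sum-exp trick)
    # so that exp() neither overflows nor underflows to zero everywhere.
    shift = max(negated)
    log_sum = shift + math.log(sum(math.exp(v - shift) for v in negated))
    return math.log(len(negated)) - log_sum


# Illustrative call with made-up log-likelihood values (not real data).
example_estimate = log_harmonic_mean_estimate([-1000.0, -1001.5, -999.2])
```

Because the sum is dominated by the smallest likelihoods, a single draw with a very low likelihood can move this estimate substantially, which is the instability the abstract refers to.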

### Citations

2320 |
Estimating the dimension of a model
- Schwarz
- 1978
Citation Context ...), (15) where $\hat{\theta}$ is the maximum likelihood estimator, so that $\ell(\hat{\theta}) = \ell_{\max}$, the maximum achievable loglikelihood. In general, under regularity conditions, $\log \pi(y) = \log \hat{\pi}_{\mathrm{BIC}}(y) + O_P(1)$ (16) (Schwarz 1978), so that the relative error in $\log \hat{\pi}_{\mathrm{BIC}}(y)$ tends to zero asymptotically. If the prior $\pi(\theta)$ is a normal unit information prior, then the approximation is more accurate and the $O_P(1)$ term in (16) is... |

1250 | Bayesian Data Analysis - Gelman, Carlin, et al. - 1995 |

1242 |
Information theory and an extension of the maximum likelihood principle
- Akaike
- 1973
Citation Context ...entially has a different “n” associated with it, corresponding to the number of data points involved in estimating it. In a similar way, we can write down a posterior simulation-based version of AIC (Akaike 1973). AIC can be defined as $\mathrm{AIC} = 2\ell_{\max} - 2d$ (19), which we can estimate by $\mathrm{AICM} = 2\hat{\ell}_{\max} - 2\hat{d}$ (20) $= 2\hat{\ell}_{\max} - 4s_{\ell}^{2}$ (21) $= 2(\bar{\ell} - s_{\ell}^{2})$ (22). Thus AICM is seen to be a very simply computed pe... |

986 | Bayes factors
- Kass, Raftery
- 1995
Citation Context ...integrated likelihood, also called the marginal likelihood or the normalizing constant, is an important quantity in Bayesian model comparison and testing: it is the key component of the Bayes factor (Kass and Raftery 1995; Chipman, George, and McCulloch 2001). The Bayes factor is the ratio of the integrated likelihoods for the two models being compared. When taking account of model uncertainty using Bayesian model ave... |

416 | Inference of population structure using multilocus genotype data: dominant markers and null alleles - Falush, Stephens, et al. - 2007 |

326 | Marginal likelihood from the Gibbs output
- Chib
- 1995
Citation Context ...M(y) is consistent as the simulation size B increases, its precision is not guaranteed. The simplicity of the harmonic mean estimator (2) is its main advantage over other more specialized techniques (Chib 1995; Green 1995; Meng and Wong 1996; Raftery 1996; Lewis and Raftery 1997; DiCiccio, Kass, Raftery, and Wasserman 1997; Chib and Jeliazkov 2001). It uses only within-model posterior samples and likelihoo... |

259 | Bayesian model selection in social research
- Raftery
- 1995
Citation Context ...d likelihood, also called the marginal likelihood or the normalizing constant, is an important quantity in Bayesian model comparison and testing: it is the key component of the Bayes factor (Kass and Raftery 1995; Chipman, George, and McCulloch 2001). The Bayes factor is the ratio of the integrated likelihoods for the two models being compared. When taking account of model uncertainty using Bayesian model ave... |

157 | Latent Space Approaches to Social Network Analysis - Hoff, Raftery, et al. - 2001 |

126 | A reference Bayesian test for nested hypotheses and its relationship to the Schwarz criterion
- Kass, Wasserman
- 1995
Citation Context ...or in $\log \hat{\pi}_{\mathrm{BIC}}(y)$ tends to zero asymptotically. If the prior $\pi(\theta)$ is a normal unit information prior, then the approximation is more accurate and the $O_P(1)$ term in (16) is replaced by $O_P(n^{-1/2})$ (Kass and Wasserman 1995; Raftery 1995). We have that $\alpha = d/2$, and so $-\log(1 - \lambda)$ in (14) corresponds to $\log(n)$ in (15). We already have estimates of $\ell_{\max}$ and $\alpha$ in (14), and so to obtain an estimate of the integrated likeli... |

112 | Computational and inferential difficulties with mixture posterior distributions - Celeux, Hurn, et al. - 2000 |

111 | Dealing with label-switching in mixture models
- Stephens
- 2000
Citation Context ...). A similar problem arises when there is near posterior nonidentifiability such as label-switching in mixture models or random effects without identifying constraints (Celeux, Hurn, and Robert 2000; Stephens 2000). One way around this is to use a posterior mode of $\theta$ instead of $\bar{\theta}$, but Richardson (2002) gave several examples of mixture models where $p_D$ with this definition inadequately penalizes model complexi... |

101 |
Marginal Likelihood from Metropolis-Hastings Output
- Chib, Jeliazkov
- 2001
Citation Context ...timator (2) is its main advantage over other more specialized techniques (Chib 1995; Green 1995; Meng and Wong 1996; Raftery 1996; Lewis and Raftery 1997; DiCiccio, Kass, Raftery, and Wasserman 1997; Chib and Jeliazkov 2001). It uses only within-model posterior samples and likelihood evaluations which are often available anyway as part of posterior sampling. A major drawback of the harmonic mean estimator is its compu... |

94 | Bayesian Model Averaging: A Tutorial (with Discussion) - Hoeting, Madigan, et al. - 1999 |

84 | The practical implementation of Bayesian model selection - Chipman, George, et al. - 2001 |

81 | Markov Chain Monte Carlo Methods for Stochastic Volatility Models - Chib, Nardari, et al. - 2002 |

80 |
Approximate Bayesian inference by the weighted likelihood bootstrap (with discussion)
- Newton, Raftery
- 1994
Citation Context ...most obvious estimator from this, the sample posterior harmonic mean of the likelihoods, is unbiased and simulation-consistent, but does not have finite variance in general and so is often unstable (Newton and Raftery 1994). We have investigated two approaches to more stable estimation of the integrated likelihood using the harmonic mean identity. The first is to reduce the parameter space and then use the sample poste... |

66 | Computing Bayes factors by combining simulation and asymptotic approximations - DiCiccio, Kass, et al. - 1995 |

59 |
Computing Bayes factors using a generalization of the Savage-Dickey density ratio
- Verdinelli, Wasserman
- 1995
Citation Context ...ve been proposed for estimating Bayes factors, or ratios of integrated likelihoods, but not the integrated likelihoods themselves. These include the Savage-Dickey ratio and a generalization of it (Verdinelli and Wasserman 1995), and bridge sampling (Meng and Wong 1996; Mira and Nicholls 2004). Johnson (1999) has proposed a method for estimating the integrated likelihood that involves simulating from a second density as wel... |

57 | Bayesian Statistics and Marketing - Rossi, Allenby, et al. - 2005 |

44 |
Hypothesis testing and model selection
- Raftery
- 1996
Citation Context ... increases, its precision is not guaranteed. The simplicity of the harmonic mean estimator (2) is its main advantage over other more specialized techniques (Chib 1995; Green 1995; Meng and Wong 1996; Raftery 1996; Lewis and Raftery 1997; DiCiccio, Kass, Raftery, and Wasserman 1997; Chib and Jeliazkov 2001). It uses only within-model posterior samples and likelihood evaluations which are often available anyw... |

44 | Bayesian measures of model complexity and fit - Spiegelhalter, Best, et al. - 2002 |

36 | Markov Chain Monte Carlo methods for computing Bayes factors: A comparative review - Han, Carlin - 2001 |

35 | Latent space approaches to social network analysis - Hoff, Raftery, et al. - 2002 |

34 | R2WinBUGS: A package for running WinBUGS from R - Sturtz, Ligges, et al. - 2005 |

33 | Estimating Bayes factors via posterior simulation with the Laplace-Metropolis estimator
- Lewis, Raftery
- 1997
Citation Context ...s precision is not guaranteed. The simplicity of the harmonic mean estimator (2) is its main advantage over other more specialized techniques (Chib 1995; Green 1995; Meng and Wong 1996; Raftery 1996; Lewis and Raftery 1997; DiCiccio, Kass, Raftery, and Wasserman 1997; Chib and Jeliazkov 2001). It uses only within-model posterior samples and likelihood evaluations which are often available anyway as part of posterior ... |

31 | A Novitiate in a period of change: An experimental and case study of social relationships - Sampson - 1968 |

30 | On the Relationship Between Markov Chain Monte Carlo Methods for Model Uncertainty - Godsill - 2001 |

26 | Approximation and consistency of Bayes factors as model dimension grows - Berger, Ghosh, et al. - 2003 |

22 | Marginal likelihood and Bayes factors for Dirichlet process mixture models - Basu, Chib - 2003 |

22 |
Fisherian inference in likelihood and prequential frames of reference
- Dawid, A
- 1991
Citation Context ...$(\alpha, 1)$, (10) where $\ell_{\max}$ is the maximum achievable loglikelihood, and $\alpha = d/2$ where $d$ is the dimension of the parameter $\theta$, i.e. the number of parameters in the underlying model (Bickel and Ghosh 1990; Dawid 1991). In (10), a $\mathrm{Gamma}(\alpha, \lambda^{-1})$ distribution with shape parameter $\alpha$ and scale parameter $\lambda$ has the density $f_X(x) = \frac{x^{\alpha-1} \exp(-x/\lambda)}{\Gamma(\alpha)\lambda^{\alpha}}$. (11) With this definition, $E(X) = \alpha\lambda$, and $\mathrm{Var}(X) = \alpha\lambda^{2}$. Th... |

21 | Efficient Bayes factor estimation from the reversible jump output - Bartolucci, Scaccia, et al. - 2006 |

21 |
A decomposition for the likelihood ratio statistic and the Bartlett correction--A Bayesian argument, Ann
- Bickel, Ghosh
- 1990
Citation Context ...en by $\ell_{\max} - \ell_t \sim \mathrm{Gamma}(\alpha, 1)$, (10) where $\ell_{\max}$ is the maximum achievable loglikelihood, and $\alpha = d/2$ where $d$ is the dimension of the parameter $\theta$, i.e. the number of parameters in the underlying model (Bickel and Ghosh 1990; Dawid 1991). In (10), a $\mathrm{Gamma}(\alpha, \lambda^{-1})$ distribution with shape parameter $\alpha$ and scale parameter $\lambda$ has the density $f_X(x) = \frac{x^{\alpha-1} \exp(-x/\lambda)}{\Gamma(\alpha)\lambda^{\alpha}}$. (11) With this definition, $E(X) = \alpha\lambda$, and Var(X)... |

20 | Importance-Weighted Marginal Bayesian Posterior Density Estimation - Chen - 1994 |

20 |
Bayesian estimation of finite mixture distributions
- Diebolt, Robert
- 1994
Citation Context ...t be well defined in situations where the meaning of $\bar{\theta}$ is not clear, such as multinomial parameters, or finite mixture models where the unobserved group memberships are included in the MCMC scheme (Diebolt and Robert 1994). A similar problem arises when there is near posterior nonidentifiability such as label-switching in mixture models or random effects without identifying constraints (Celeux, Hurn, and Robert 2000; ... |

20 |
The Schwarz criterion and related methods for normal linear models
- Pauler
- 1998
Citation Context ...when there are no data relevant to that parameter, assigning no penalty in that case, which seems appropriate. In general, determining $n_k$ involves assessing the Fisher or observed information for $\theta_k$ (Pauler 1998), but we will take as a rough approximation the number of data points that participate in the estimation of $\theta_k$. This leads to a modified definition of BICM. Parameters are divided into classes accord... |

18 |
Reversible-Jump Markov Chain Monte Carlo Computation and Bayesian Model Determination
- Green
- 1995
Citation Context ...sistent as the simulation size B increases, its precision is not guaranteed. The simplicity of the harmonic mean estimator (2) is its main advantage over other more specialized techniques (Chib 1995; Green 1995; Meng and Wong 1996; Raftery 1996; Lewis and Raftery 1997; DiCiccio, Kass, Raftery, and Wasserman 1997; Chib and Jeliazkov 2001). It uses only within-model posterior samples and likelihood evaluation... |

18 | Transdimensional Markov chains: A decade of progress and future perspectives - Sisson - 2005 |

18 | Bayesian information criterion for censored survival models - Volinsky, Raftery - 2000 |

16 | Asymptotics and the theory of inference - Reid |

16 | A Bayesian approach to detect quantitative trait loci using Markov chain Monte Carlo - Satagopan, Yandell, et al. - 1996 |

15 | Erroneous results in “Marginal likelihood from the Gibbs output” - Neal - 1999 |

14 | Sequential ordinal modeling with applications to survival data - Albert, Chib - 2001 |

14 | Computing normalizing constants for finite mixture models via incremental mixture importance sampling (IMIS) - Steele, Raftery, et al. - 2006 |

13 |
Scale mixtures of normality
- Andrews, Mallows
- 1974
Citation Context ...ilize the harmonic mean estimator and obtain estimates that are much more accurate, but still easy to calculate. Another application of our first stabilization approach includes robust linear models (Andrews and Mallows 1974; Carlin and Louis 1996). The robust linear model has an error term distributed as $Z/\sqrt{U}$, where $Z$ and $U$ are independent, $Z$ has a centered normal distribution, and $U$ has a $\chi^{2}$ distribution. The standa... |

13 | Statistical issues in the search for genes affecting quantitative traits in experimental populations - Doerge, et al. |

7 | Bridge Estimation of the Probability Density at a Point
- Mira, Nicholls
- 2004
Citation Context ...kelihoods, but not the integrated likelihoods themselves. These include the Savage-Dickey ratio and a generalization of it (Verdinelli and Wasserman 1995), and bridge sampling (Meng and Wong 1996; Mira and Nicholls 2004). Johnson (1999) has proposed a method for estimating the integrated likelihood that involves simulating from a second density as well as the posterior; it seems that for its performance to be good t... |

7 | Bayesian data analysis (2nd ed.). Boca Raton, FL - Gelman, Carlin, et al. - 2004 |

6 | A Comparison of Marginal Likelihood Computation Methods - Bos - 2002 |

5 | Posterior Distributions on Normalizing Constants - Johnson - 1999 |

5 | Estimation of posterior density functions from a posterior sample - Oh - 1999 |