## Mixtures of g-priors for Bayesian variable selection (2008)

Venue: Journal of the American Statistical Association

Citations: 36 (4 self)

### BibTeX

@ARTICLE{Liang08mixturesof,

author = {Feng Liang and Rui Paulo and German Molina and Merlise A. Clyde and Jim O. Berger},

title = {Mixtures of g-priors for Bayesian variable selection},

journal = {Journal of the American Statistical Association},

year = {2008},

pages = {423}

}


### Abstract

Zellner’s g-prior remains a popular conventional prior for use in Bayesian variable selection, despite several undesirable consistency issues. In this paper, we study mixtures of g-priors as an alternative to default g-priors that resolve many of the problems with the original formulation, while maintaining the computational tractability that has made the g-prior so popular. We present theoretical properties of the mixture g-priors and provide real and simulated examples to compare the mixture formulation with fixed g-priors, Empirical Bayes approaches and other default procedures.
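One of the consistency issues with a fixed g can be made concrete with a short sketch. It uses the standard closed-form Bayes factor of a model with p predictors against the intercept-only model under a fixed-g Zellner prior, BF = (1+g)^((n-1-p)/2) [1 + g(1-R²)]^(-(n-1)/2). The abstract does not quote this formula; the function name and all numbers below are illustrative assumptions.

```python
import math

def log_bf_fixed_g(r2, n, p, g):
    """Log Bayes factor of a p-predictor model vs. the null model for fixed g.

    Assumes the standard fixed-g expression
    BF = (1+g)^((n-1-p)/2) * (1 + g*(1-R^2))^(-(n-1)/2).
    """
    return 0.5 * (n - 1 - p) * math.log1p(g) - 0.5 * (n - 1) * math.log1p(g * (1.0 - r2))

# Hypothetical sample size, model size and choice of g.
n, p, g = 20, 2, 20.0
upper_bound = 0.5 * (n - 1 - p) * math.log1p(g)

for r2 in (0.5, 0.9, 0.99, 1.0):
    # Even a perfect fit (R^2 = 1) cannot push the log Bayes factor past the bound.
    print(f"R^2 = {r2:4.2f}  log BF = {log_bf_fixed_g(r2, n, p, g):8.3f}  bound = {upper_bound:8.3f}")
```

With g held fixed, the Bayes factor stays bounded however strong the evidence for the larger model becomes. Placing a prior on g is one way to remove that ceiling.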

### Citations

2057 |
Handbook of mathematical functions
- Abramowitz, Stegun
- 1972
Citation Context ... $p(g \mid Y, M_\gamma) = \dfrac{p_\gamma + a - 2}{2\,{}_2F_1\!\left(\frac{n-1}{2}, 1; \frac{p_\gamma + a}{2}; R_\gamma^2\right)} (1+g)^{(n-1-p_\gamma-a)/2} \left[1 + (1 - R_\gamma^2)\, g\right]^{-(n-1)/2}$ where ${}_2F_1(a, b; c; z)$ in the normalizing constant is the Gaussian hypergeometric function (Abramowitz and Stegun 1970, Section 15). The integral representing ${}_2F_1(a, b; c; z)$ is convergent for real $|z| < 1$ with $c > b > 0$ and for $z = \pm 1$ only if $c > a + b$ and $b > 0$. As the normalizing constant in the prior on g is also... |
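As a rough check of the normalizing constant in this excerpt, the sketch below evaluates the posterior density of g and integrates it numerically. The helper names, the Gauss-series evaluation of the hypergeometric function (valid only for |z| < 1), and the values of n, p, a and R² are all assumptions chosen for illustration; they are not taken from the paper.

```python
import math

def hyp2f1_b1(a, c, z, tol=1e-15, max_terms=100000):
    """Gauss series for 2F1(a, 1; c; z), |z| < 1: sum over k of (a)_k / (c)_k * z**k."""
    term, total = 1.0, 1.0
    for k in range(max_terms):
        term *= (a + k) / (c + k) * z
        total += term
        if abs(term) < tol * abs(total):
            break
    return total

def posterior_g_density(g, r2, n, p, a):
    """Posterior density of g given R^2, following the expression in the excerpt."""
    norm = (p + a - 2) / (2.0 * hyp2f1_b1((n - 1) / 2.0, (p + a) / 2.0, r2))
    return norm * (1 + g) ** ((n - 1 - p - a) / 2.0) * (1 + (1 - r2) * g) ** (-(n - 1) / 2.0)

def integrate_over_g(f, m=20000):
    """Midpoint rule on u in (0, 1) after the substitution g = u / (1 - u)."""
    total = 0.0
    for i in range(m):
        u = (i + 0.5) / m
        total += f(u / (1.0 - u)) / (1.0 - u) ** 2
    return total / m

# Hypothetical inputs: 30 observations, 3 predictors, hyperparameter a = 3, R^2 = 0.6.
n, p, a, r2 = 30, 3, 3.0, 0.6
mass = integrate_over_g(lambda g: posterior_g_density(g, r2, n, p, a))
print(mass)  # should be close to 1 if the normalizing constant is right
```

The series is used here only to keep the example dependency-free; a library routine for the Gaussian hypergeometric function would normally be preferable, especially when R² is close to 1.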

984 | Bayes factors - Kass, Raftery - 1995 |

526 |
Theory of probability
- Jeffreys
- 1961
Citation Context ...onstants cancel in the posterior distribution of the model-specific parameters. However, these constants remain in marginal likelihoods leading to indeterminate model probabilities and Bayes factors (Jeffreys 1961; Berger and Pericchi 2001). To avoid indeterminacies in posterior model probabilities, proper priors for $\beta_\gamma$ under each model are usually required. Conventional proper priors for variable selection in... |

331 |
Variable Selection via Gibbs Sampling
- George, McCulloch
- 1993
Citation Context ...ng history (Leamer 1978a,b; Mitchell and Beauchamp 1988; Zellner 1971, Sec 10.4), the advent of Markov chain Monte Carlo methods catalyzed Bayesian model selection and averaging in regression models (George and McCulloch 1993, 1997; Geweke 1996; Raftery, Madigan and Hoeting 1997; Smith and Kohn 1996; Clyde and George 2004; Hoeting, Madigan, Raftery and Volinsky 1999). Prior density choice for Bayesian model selection and ... |

235 | Estimating optimal transformations for multiple regression and correlation (with discussion) - Breiman, Friedman - 1985 |

234 | Subset selection in regression - Miller - 2002 |

205 |
An Introduction to Bayesian Inference in Econometrics
- Zellner
- 1971
Citation Context ...ntegrating the likelihood with respect to the prior distribution for model specific parameters $\theta_\gamma$. Whereas Bayesian variable selection has a long history (Leamer 1978a,b; Mitchell and Beauchamp 1988; Zellner 1971, Sec 10.4), the advent of Markov chain Monte Carlo methods catalyzed Bayesian model selection and averaging in regression models (George and McCulloch 1993, 1997; Geweke 1996; Raftery, Madigan and Ho... |

197 |
Accurate Approximations for Posterior Moments and Marginal Densities
- Tierney, Kadane
- 1986
Citation Context ...mpirical Bayes solution) and leads to an improved normal approximation to the integral as the variable of integration is no longer restricted. Details of the fully exponential Laplace approximations (Tierney and Kadane 1986) of order $O(n^{-1})$ to the expression (17), and of order $O(n^{-2})$ for the ratios in (18) and (19) are given in Appendix A. [Section 4: Consistency] So far in this paper, we have considered several alternatives t... |

186 | Bayesian model averaging for linear regression models
- Raftery, Madigan, et al.
- 1997
Citation Context ... in the normal linear model have been based on the conjugate Normal-Gamma family for $\theta_\gamma$ or limiting versions, allowing closed form calculations of all marginal likelihoods (George and McCulloch 1997; Raftery et al. 1997; Berger and Pericchi 2001). Zellner’s (1986) g-prior for $\beta_\gamma$, $\beta_\gamma \mid \phi, M_\gamma \sim N\!\left(0, \frac{g}{\phi} (X_\gamma^T X_\gamma)^{-1}\right)$, has been widely adopted because of its computational efficiency in evaluating marginal likelihoods a... |

183 |
Specification Searches: Ad Hoc Inference with Nonexperimental Data
- Leamer
- 1978
Citation Context ...$= \int_{\Theta_\gamma} p(Y \mid \theta_\gamma, M_\gamma)\, p(\theta_\gamma \mid M_\gamma)\, d\theta_\gamma$ obtained by integrating the likelihood with respect to the prior distribution for model specific parameters $\theta_\gamma$. Whereas Bayesian variable selection has a long history (Leamer 1978a,b; Mitchell and Beauchamp 1988; Zellner 1971, Sec 10.4), the advent of Markov chain Monte Carlo methods catalyzed Bayesian model selection and averaging in regression models (George and McCulloch 19... |

162 |
The selection of prior distributions by formal rules
- Kass, Wasserman
- 1996
Citation Context ... model spaces, such as in nonparametric regression using spline and wavelet bases. Thus, it is often necessary to resort to specification of priors using some formal method (Berger and Pericchi 2001; Kass and Wasserman 1996). In general, the use of improper priors for model specific parameters is not permitted in the context of model selection, as improper priors are determined only up to an arbitrary multiplicative con... |

146 | Model Selection and the Principle of Minimum Description Length
- Hansen, Yu
- 2001
Citation Context ...ication for $\alpha$, $\beta_\gamma$ and $\phi$ under $M_\gamma$. Most references to g-priors in the variable selection literature refer to the above version (Berger and Pericchi 2001; George and Foster 2000; Clyde and George 2000; Hansen and Yu 2001; Fernández et al. 2001). Continuing with this tradition, we will also refer to the priors in (3-4) simply as Zellner’s g-prior. A major advantage of Zellner’s g-prior is the computational efficiency ... |

136 | Nonparametric regression using Bayesian variable selection
- Smith, Kohn
- 1996
Citation Context ... the advent of Markov chain Monte Carlo methods catalyzed Bayesian model selection and averaging in regression models (George and McCulloch 1993, 1997; Geweke 1996; Raftery, Madigan and Hoeting 1997; Smith and Kohn 1996; Clyde and George 2004; Hoeting, Madigan, Raftery and Volinsky 1999). Prior density choice for Bayesian model selection and model averaging, however, remains an open area (Clyde and George 2004; Be... |

126 | The risk inflation criterion for multiple regression - Foster, George - 1994 |

126 | Approaches for Bayesian variable selection
- George, McCulloch
- 1997
Citation Context ...iors for variable selection in the normal linear model have been based on the conjugate Normal-Gamma family for $\theta_\gamma$ or limiting versions, allowing closed form calculations of all marginal likelihoods (George and McCulloch 1997; Raftery et al. 1997; Berger and Pericchi 2001). Zellner’s (1986) g-prior for $\beta_\gamma$, $\beta_\gamma \mid \phi, M_\gamma \sim N\!\left(0, \frac{g}{\phi} (X_\gamma^T X_\gamma)^{-1}\right)$, has been widely adopted because of its computational efficiency in evaluating m... |

126 | A Reference Bayesian Test for Nested Hypotheses and its Relationship to the Schwarz - Kass, Wasserman - 1995 |

125 | Finding the observed information matrix when using the EM algorithm - Louis - 1982 |

117 | Calibration and empirical Bayes Variable Selection
- George, Foster
- 2000
Citation Context ...$\left(0, \frac{g}{\phi} (X_\gamma^T X_\gamma)^{-1}\right)$ as a default prior specification for $\alpha$, $\beta_\gamma$ and $\phi$ under $M_\gamma$. Most references to g-priors in the variable selection literature refer to the above version (Berger and Pericchi 2001; George and Foster 2000; Clyde and George 2000; Hansen and Yu 2001; Fernández et al. 2001). Continuing with this tradition, we will also refer to the priors in (3-4) simply as Zellner... |

102 |
Bayesian variable selection in linear regression
- Mitchell, Beauchamp
- 1988
Citation Context ...γ) $p(\theta_\gamma \mid M_\gamma)\, d\theta_\gamma$ obtained by integrating the likelihood with respect to the prior distribution for model specific parameters $\theta_\gamma$. Whereas Bayesian variable selection has a long history (Leamer 1978a,b; Mitchell and Beauchamp 1988; Zellner 1971, Sec 10.4), the advent of Markov chain Monte Carlo methods catalyzed Bayesian model selection and averaging in regression models (George and McCulloch 1993, 1997; Geweke 1996; Raftery, ... |

95 | Benchmark priors for Bayesian model averaging - Fernández, Ley, et al. - 2001 |

93 | Bayesian model averaging: A tutorial (with discussion). Statistical Science 14, 382–401. [A corrected version is available online at www.stat.washington.edu/www/research/online/hoeting1999.pdf] - Hoeting, Madigan, et al. - 1999 |

86 | Empirical Bayes selection of wavelet thresholds
- Johnstone, Silverman
- 2005
Citation Context ...etting, this is not an unreasonable restriction. For the large p small n setting, independent priors on regression coefficients have become popular for inducing shrinkage (Wolfe, Godsill and Ng 2004; Johnstone and Silverman 2005). Many of these priors are represented as scale mixtures of normals and therefore may be thought of as scale mixtures of independent g-priors. In conjunction with point masses at zero, these independ... |

73 |
On assessing prior distributions and Bayesian regression analysis with g-prior distributions. Bayesian Inference and Decision
- Zellner
- 1986
Citation Context ... most importantly, because of its simple, understandable interpretation as arising from the analysis of a conceptual sample generated using the same design matrix X as employed in the current sample (Zellner 1986; George and McCulloch 1997; Smith and Kohn 1996; Fernández, Ley and Steel 2001). George and Foster (2000) showed how g could be calibrated based on many popular model selection criteria, such as AIC,... |

69 | Flexible empirical Bayes estimation for wavelets
- Clyde, George
- 2000
Citation Context ... a default prior specification for $\alpha$, $\beta_\gamma$ and $\phi$ under $M_\gamma$. Most references to g-priors in the variable selection literature refer to the above version (Berger and Pericchi 2001; George and Foster 2000; Clyde and George 2000; Hansen and Yu 2001; Fernández et al. 2001). Continuing with this tradition, we will also refer to the priors in (3-4) simply as Zellner’s g-prior. A major advantage of Zellner’s g-prior is the compu... |

63 | Variable Selection and Model Comparison in Regression
- Geweke
- 1996
Citation Context ...ell and Beauchamp 1988; Zellner 1971, Sec 10.4), the advent of Markov chain Monte Carlo methods catalyzed Bayesian model selection and averaging in regression models (George and McCulloch 1993, 1997; Geweke 1996; Raftery, Madigan and Hoeting 1997; Smith and Kohn 1996; Clyde and George 2004; Hoeting, Madigan, Raftery and Volinsky 1999). Prior density choice for Bayesian model selection and model averaging, ... |

48 | Optimal predictive model selection
- Barbieri, Berger
Citation Context ...sterior probability model (HPM), selection of the median probability model (MPM) which is defined as the model where a variable is included if the marginal inclusion probability $p(\beta_j \neq 0 \mid Y) > 1/2$ (Barbieri and Berger 2004), and Bayesian Model Averaging (BMA). In both HPM and MPM, the point estimate is the posterior mean of $\beta_\gamma$ under the selected model. For BIC, the log marginal for model $M_\gamma$ is defined as $\log p(Y \mid M_\gamma)$ ... |
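The difference between the highest posterior probability model and the median probability model described in this excerpt can be seen on a toy model space. In the sketch below, models are keyed by the tuple of included variable indices; the posterior probabilities are made-up numbers and the function names are hypothetical.

```python
def marginal_inclusion_probs(post, num_vars):
    """Sum posterior model probabilities over all models that include each variable."""
    probs = [0.0] * num_vars
    for model, prob in post.items():
        for j in model:
            probs[j] += prob
    return probs

def highest_probability_model(post):
    """HPM: the single model with the largest posterior probability."""
    return max(post, key=post.get)

def median_probability_model(post, num_vars):
    """MPM: include variable j when its marginal inclusion probability exceeds 1/2."""
    incl = marginal_inclusion_probs(post, num_vars)
    return tuple(j for j in range(num_vars) if incl[j] > 0.5)

# Hypothetical posterior over three candidate models built from variables 0, 1, 2.
post = {(0,): 0.4, (1,): 0.3, (1, 2): 0.3}

print(marginal_inclusion_probs(post, 3))   # variable 1 is the only one above 1/2
print(highest_probability_model(post))     # (0,)
print(median_probability_model(post, 3))   # (1,)
```

Here the two rules disagree: the most probable single model uses variable 0, while variable 1 is the only one whose inclusion probability, accumulated across models, exceeds one half.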

39 | The Variable Selection Problem
- George
- 2000
Citation Context ... prior specification for $\alpha$, $\beta_\gamma$ and $\phi$ under $M_\gamma$. Most references to g-priors in the variable selection literature refer to the above version (Berger and Pericchi 2001; George and Foster 2000; Clyde and George 2000; Hansen and Yu 2001; Fernández et al. 2001). Continuing with this tradition, we will also refer to the priors in (3-4) simply as Zellner’s g-prior. A major advantage of Zellner’s g-prior is the compu... |

33 | Proper Bayes minimax estimators of multivariate normal mean - Strawderman - 1971 |

32 |
Objective Bayesian methods for model selection: introduction and comparison (with discussion)
- Berger, Pericchi
- 2001
Citation Context ...96; Clyde and George 2004; Hoeting, Madigan, Raftery and Volinsky 1999). Prior density choice for Bayesian model selection and model averaging, however, remains an open area (Clyde and George 2004; Berger and Pericchi 2001). Subjective elicitation of priors for model-specific coefficients is often precluded, particularly in high-dimensional model spaces, such as in nonparametric regression using spline and wavelet base... |

30 | Bayes factors and marginal distributions in invariant situations - Berger, Pericchi, et al. - 1998 |

23 | Bayesian variable selection and regularisation for time–frequency surface estimation - Wolfe, Godsill, et al. - 2004 |

21 |
Posterior odds ratios for selected regression hypotheses
- Zellner, Siow
- 1980
Citation Context ...ity) that the full model has been parameterized in a block orthogonal fashion such that $\mathbf{1}^T [X_\gamma, X_{-\gamma}] = 0$ and $X_\gamma^T X_{-\gamma} = 0$, in order to justify treating $\alpha$ and $\beta_\gamma$ as common parameters to both models (Zellner and Siow 1980). This leads to the following g-priors for the full-based Bayes factors, $M_\gamma: p(\alpha, \phi, \beta_\gamma) \propto 1/\phi$; $M_F: p(\alpha, \phi, \beta_\gamma) \propto 1/\phi$, $\beta_{-\gamma} \mid \phi \sim N\!\left(0, \frac{g}{\phi} (X_{-\gamma}^T X_{-\gamma})^{-1}\right)$, (7) with the resulting Bayes factor for... |

20 | Model uncertainty
- Clyde, George
- 2004
Citation Context ... chain Monte Carlo methods catalyzed Bayesian model selection and averaging in regression models (George and McCulloch 1993, 1997; Geweke 1996; Raftery, Madigan and Hoeting 1997; Smith and Kohn 1996; Clyde and George 2004; Hoeting, Madigan, Raftery and Volinsky 1999). Prior density choice for Bayesian model selection and model averaging, however, remains an open area (Clyde and George 2004; Berger and Pericchi 2001)... |

18 | Laplace approximations for hypergeometric functions with matrix argument - Butler, Wood - 2002 |

14 | A comment on D. V. Lindley’s statistical paradox - Bartlett - 1957 |

14 | Participation in illegitimate activities: Ehrlich revisited - Vandaele - 1978 |

12 | Empirical Bayes vs. Fully Bayes variable selection - Cui, George - 2008 |

9 |
Discussion of “Model averaging and model search strategies” by M. Clyde
- George
- 1999
Citation Context ...ior on $\omega$ which induces a uniform prior over the model size and therefore favors models with small or large sizes and contrast this to EB estimates of $\omega$. Other types of priors include dilution priors (George 1999) that “dilute” probabilities across neighborhoods of similar models, and priors that correct the so-called “selection effect” in choice among many models (Jeffreys 1961; Zellner and Min 1997). While w... |

8 | Bayes factors and approximations for variance component models - Pauler, Wakefield, et al. - 1999 |

6 |
Regression selection strategies and revealed priors
- Leamer
- 1978
Citation Context ...$= \int_{\Theta_\gamma} p(Y \mid \theta_\gamma, M_\gamma)\, p(\theta_\gamma \mid M_\gamma)\, d\theta_\gamma$ obtained by integrating the likelihood with respect to the prior distribution for model specific parameters $\theta_\gamma$. Whereas Bayesian variable selection has a long history (Leamer 1978a,b; Mitchell and Beauchamp 1988; Zellner 1971, Sec 10.4), the advent of Markov chain Monte Carlo methods catalyzed Bayesian model selection and averaging in regression models (George and McCulloch 19... |

6 |
Bayesian analysis, model selection and prediction
- Zellner, Min
- 1997
Citation Context ...e dilution priors (George 1999) that “dilute” probabilities across neighborhoods of similar models, and priors that correct the so-called “selection effect” in choice among many models (Jeffreys 1961; Zellner and Min 1997). While we have assumed that $X_\gamma$ is full rank, the g-prior formulation may be extended to the non-full rank setting such as in ANOVA models by replacing the inverse of $X_\gamma^T X_\gamma$ in the g-prior with a ge... |

4 | Objective Bayes Variable Selection - Casella, Moreno - 2006 |

1 | based on expanding a smooth unimodal function $h(\theta)$ in a Taylor series expansion about $\hat\theta$, the mode of $h$. The Laplace approximation leads to an $O(n^{-1})$ approximation to the integral, $\int_\Theta \exp(h(\theta))\, d\theta \approx \sqrt{2\pi}\, \hat\sigma_h \exp(h(\hat\theta))$ (26), where $\hat\sigma_h = \big[ -\,d^2 h(\theta)/d\theta^2 \,\big|\ \ldots$ - approximation - 1986 |
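A small worked example may help with the Laplace approximation quoted in this excerpt. The excerpt is cut off before it finishes defining σ̂_h, so the sketch below assumes the usual definition, σ̂_h = (−h''(θ̂))^(−1/2). The test integrand h(θ) = n log θ − θ is an illustrative choice, not the paper's; its integral over (0, ∞) equals n!, and the Laplace approximation reduces to Stirling's formula.

```python
import math

def laplace_integral(h, h_second, mode):
    """Approximate the integral of exp(h) by sqrt(2*pi) * sigma_hat * exp(h(mode)).

    Assumes sigma_hat = (-h''(mode)) ** -0.5, the standard choice; the excerpt
    is truncated before it states the exponent.
    """
    sigma_hat = (-h_second(mode)) ** -0.5
    return math.sqrt(2.0 * math.pi) * sigma_hat * math.exp(h(mode))

n = 20                                   # hypothetical value playing the role of sample size
h = lambda t: n * math.log(t) - t        # log-integrand, maximised at t = n
h_second = lambda t: -n / t ** 2         # second derivative of h

approx = laplace_integral(h, h_second, float(n))
exact = math.factorial(n)                # the integral equals Gamma(n + 1) = n!
rel_err = abs(approx - exact) / exact
print(approx, exact, rel_err)            # relative error is roughly 1 / (12 n)
```

The relative error shrinks in proportion to 1/n, which matches the O(n^-1) order stated for the standard approximation.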

1 | ... are several alternatives to the standard Laplace approximation. One approach when the mode is on the boundary is to use a Laplace approximation over the expanded parameter space as in - Pauler, Kass - 1999 |

1 | $M_{\gamma'}$: By the result in Fernández et al. (2001), $(\mathrm{RSS}_\gamma / \mathrm{RSS}_{\gamma'})^{n/2}$ converges in distribution to $\exp(\chi^2_{p_{\gamma'} - p_\gamma}/2)$. Combining this result with the fact that the first term goes to zero (since $p_{\gamma'} > p_\gamma$), we have that the Bayes factor converges to zero. (c) - $M_\gamma$ |