## Bayesian Adaptive Sampling for Variable Selection and Model Averaging

Citations: 9 (4 self)

### BibTeX

```bibtex
@misc{Clyde_bayesianadaptive,
  author = {Merlise Clyde and Joyee Ghosh and Michael Littman},
  title  = {Bayesian Adaptive Sampling for Variable Selection and Model Averaging},
  year   = {}
}
```

### Abstract

For the problem of model choice in linear regression, we introduce a Bayesian adaptive sampling algorithm (BAS) that samples models without replacement from the space of models. For problems that permit enumeration of all models, BAS is guaranteed to enumerate the model space in 2^p iterations, where p is the number of potential variables under consideration. For larger problems where sampling is required, we provide conditions under which BAS provides perfect samples without replacement. When the sampling probabilities in the algorithm are the marginal variable inclusion probabilities, BAS may be viewed as sampling models “near” the median probability model of Barbieri and Berger. As marginal inclusion probabilities are not known in advance, we discuss several strategies to adaptively estimate the marginal inclusion probabilities within BAS. We illustrate the performance of the algorithm using simulated and real data and show that BAS can outperform Markov chain Monte Carlo methods. The algorithm is implemented in the R package BAS, available on CRAN.
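
The abstract's core idea, sampling distinct models with per-variable inclusion probabilities, can be sketched in a few lines of Python. This is only an illustration under stated assumptions (the real BAS algorithm prunes a binary tree of models so that no duplicate is ever proposed; here duplicates are simply discarded), and all names are hypothetical:

```python
import random

def sample_without_replacement(rho, n_models, seed=1):
    """Toy sketch: sample models (0/1 inclusion vectors) without
    replacement, including variable j with probability rho[j].
    Real BAS deflates the sampling tree so no model is proposed
    twice; here repeated draws are just rejected for clarity."""
    random.seed(seed)
    p = len(rho)
    seen = set()
    target = min(n_models, 2 ** p)
    while len(seen) < target:
        gamma = tuple(int(random.random() < r) for r in rho)
        seen.add(gamma)  # adding a repeated model is a no-op
    return sorted(seen)

# With enough draws requested, the sampler enumerates all 2^p models.
models = sample_without_replacement([0.5, 0.5, 0.5], n_models=8)
print(len(models))  # 8 distinct models = 2^3
```

Note the hedge in the docstring: rejection of duplicates terminates with probability one but does not reproduce BAS's guarantee of enumeration in exactly 2^p iterations, which relies on the tree-based bookkeeping described in the paper.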

### Citations

239 | On the Mathematical Foundations of Theoretical Statistics - Fisher - 1922
Citation Context: ...tion size, i.e. (9) T = 2^p, these estimates recover the true marginal posterior inclusion probabilities, π_j. This is similar in spirit to a desirable finite-sample property called Fisher-consistency (Fisher 1922), where the estimator will recover the population quantities when applied to the entire population. Adaptive updating does come with a price. If we update ρ to π̂^(t), we also have to re-normalize ...

226 | Bayesian graphical models for discrete data - Madigan, York - 1995

209 | A generalization of sampling without replacement from a finite universe - Horvitz, Thompson - 1952
Citation Context: ...qual to 1/2 corresponds to equal-probability sampling, or SRSWOR, initially. The estimated marginal inclusion probabilities using (11) after the first U draws are a ratio of Horvitz-Thompson estimates (Horvitz and Thompson 1952) and are approximately unbiased (Thompson 1992, Ch. 6). 3.2.2 P-value Calibration A simple strategy is to calibrate p-values to Bayes factors and then probabilities using the results of Sellke et al. ...

186 | Bayesian model averaging for linear regression models - Raftery, Madigan, et al. - 1997
Citation Context: ...n dimension. Because leaps considers all dimensions, this can be inefficient in large problems. Stochastic search variable selection (SSVS) (George and McCulloch 1997) and the related MC^3 algorithm (Raftery et al. 1997) are popular Markov chain Monte Carlo (MCMC) algorithms that can be viewed as providing a (dependent) stochastic sample of models from the posterior distribution on Γ. Historically, the conjugate Nor...

136 | Nonparametric regression using Bayesian variable selection - Smith, Kohn - 1996
Citation Context: ...from the posterior distribution on Γ. Historically, the conjugate Normal-Gamma family of prior distributions has received widespread attention for model choice in linear models (Raftery et al. 1997; Smith and Kohn 1996; George and McCulloch 1997), as marginal likelihoods can be evaluated analytically (5). Of these, Zellner’s g-prior (Zellner 1986) remains perhaps the most popular conventional prior distribution wi...

126 | Approaches for Bayesian variable selection - George, McCulloch - 1997
Citation Context: ...1999), which attempts to find the q “best” models of a given dimension. Because leaps considers all dimensions, this can be inefficient in large problems. Stochastic search variable selection (SSVS) (George and McCulloch 1997) and the related MC^3 algorithm (Raftery et al. 1997) are popular Markov chain Monte Carlo (MCMC) algorithms that can be viewed as providing a (dependent) stochastic sample of models from the posteri...

108 | Regression by Leaps and Bounds - Furnival, Wilson - 1974

93 | Bayesian model averaging: A tutorial (with discussion). Statistical Science 14, 382–401. [A corrected version is available online at www.stat.washington.edu/www/research/online/hoeting1999.pdf] - Hoeting, Madigan, et al. - 1999
Citation Context: ...to determine an optimal model, or to make inferences and predictions based on BMA. The BMA package in R utilizes deterministic sampling using the leaps and bounds algorithm (Furnival and Wilson 1974; Hoeting et al. 1999), which attempts to find the q “best” models of a given dimension. Because leaps considers all dimensions, this can be inefficient in large problems. Stochastic search variable selection (Georg...

75 | Automatic Bayesian Curve Fitting - Denison, Mallick, et al. - 1998

73 | On assessing prior distributions and Bayesian regression analysis with g-prior distributions. Bayesian Inference and Decision - Zellner - 1986
Citation Context: ...ntion for model choice in linear models (Raftery et al. 1997; Smith and Kohn 1996; George and McCulloch 1997), as marginal likelihoods can be evaluated analytically (5). Of these, Zellner’s g-prior (Zellner 1986) remains perhaps the most popular conventional prior distribution, with marginal likelihoods that may be expressed as a simple function of the model R^2. Mixtures of Zellner’s g-prior, such as the Zel...

50 | Prediction via orthogonalized model mixing - Clyde, Desimone, et al. - 1996
Citation Context: ...algorithms on several criteria. Sections 5 and 6 illustrate the method in two real data sets: the U.S. crime data, where enumeration is feasible, and the moderate-dimension protein construct data (Clyde et al. 1996), where exhaustive search is not possible. In Section 7 we conclude with recommendations and a discussion of possible extensions. 2 SAMPLING WITHOUT REPLACEMENT Sampling without replacement from a fin...

48 | Optimal predictive model selection - Barbieri, Berger

43 | “Bayesian Model Averaging and Model Search Strategies,” in Bayesian Statistics 6 - Clyde - 1999
Citation Context: ...” samples without replacement from Γ. In practice, the sequence of conditional probabilities is generally unknown unless there is additional structure in the problem, such as posterior independence (Clyde 1999), as in the case of design matrices with orthogonal columns, or limited dependence such as a Markov property. Otherwise, in the general case, the computational complexity of finding all conditional prob...

36 | Mixtures of g-priors for Bayesian variable selection - Liang, Paulo, et al. - 2008
Citation Context: ...egression parameters were chosen as α = 2, β = (−0.48, 8.72, −1.76, −1.87, 0, 0, 0, 0, 4.00, 0, 0, 0, 0, 0, 0)′ and φ = 1. For the parameters in each model (1), we use Zellner’s g-prior (Zellner 1986; Liang et al. 2008) with g = n: p(α, φ | γ) ∝ 1/φ, β_γ | γ, φ ∼ N_{p_γ}(0, g(X_γ′X_γ)^{−1}/φ) (14), where p_γ is the rank of X_γ, which leads to a marginal likelihood of a model proportional to p(Y | M_γ) ∝ (1 + g)^{(n−p_γ−1)/2} ...
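
The g-prior marginal likelihood quoted in this excerpt is truncated; a minimal sketch, assuming the standard R^2 form given in Liang et al. (2008), p(Y | M_γ) ∝ (1+g)^{(n−p_γ−1)/2} (1 + g(1−R²_γ))^{−(n−1)/2}, looks like this in Python (function name illustrative):

```python
import math

def g_prior_log_marginal(R2, p_gamma, n, g):
    """Log marginal likelihood of a model under Zellner's g-prior, up to
    a constant shared by all models, as a function of the model's R^2
    and rank p_gamma (assumed R^2 form from Liang et al. 2008):
        p(Y | M) ∝ (1+g)^((n-p_gamma-1)/2) * (1 + g*(1-R^2))^(-(n-1)/2)
    log1p is used for numerical stability when g*(1-R^2) is small."""
    return (0.5 * (n - p_gamma - 1) * math.log1p(g)
            - 0.5 * (n - 1) * math.log1p(g * (1.0 - R2)))

# Sanity check: the null model (p_gamma = 0, R^2 = 0) has log marginal 0,
# so every model's value is directly its log Bayes factor vs. the null.
print(g_prior_log_marginal(0.0, 0, 100, 100.0))  # 0.0
```

Working on the log scale is the usual design choice here: posterior model probabilities are then recovered by exponentiating after subtracting the maximum, which avoids overflow when n is large.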

31 | Centroid estimation in discrete high-dimensional spaces with applications in biology - Carvalho, Lawrence - 2008

24 | Evolutionary Monte Carlo: Applications to Cp model sampling and change point problem - Liang, Wong - 2000
Citation Context: ...advanced algorithms for the variable selection problem that utilize other proposals include adaptive MCMC (Nott and Kohn 2005), Swendsen-Wang (Nott and Green 2004) and Evolutionary Monte Carlo (Liang and Wong 2000; Wilson et al. 2010; Bottolo and Richardson 2008). Historically, the conjugate Normal-Gamma family of prior distributions has received widespread attention for model choice in linear models (Raftery ...

21 | Adaptive Sampling for Bayesian Variable Selection - Nott, Kohn - 2005

16 | Bayesian variable selection and the Swendsen-Wang algorithm - Nott, Green - 2004
Citation Context: ...poorly when covariates are highly correlated; more advanced algorithms for the variable selection problem that utilize other proposals include adaptive MCMC (Nott and Kohn 2005), Swendsen-Wang (Nott and Green 2004) and Evolutionary Monte Carlo (Liang and Wong 2000; Wilson et al. 2010; Bottolo and Richardson 2008). Historically, the conjugate Normal-Gamma family of prior distributions has received widespread at...

14 | Participation in illegitimate activities: Ehrlich revisited - Vandaele - 1978

9 | Discussion of “Model averaging and model search strategies” by M. Clyde - George - 1999
Citation Context: ...ion probabilities converge almost surely to p(M_γ | Y) and π_j, respectively. The second approach is based on the estimates of model probabilities normalized over a subset of models (Clyde et al. 1996; George 1999). In (10–11), the probability of any unsampled model is estimated as zero, while the Monte Carlo frequencies for models in U are noisy versions of the conditional probabilities of models restricted t...

5 | Calibration of p-values for testing precise null hypotheses, The American Statistician 55 - Sellke, Bayarri, et al. - 2001

3 | Protein construct storage: Bayesian variable selection and prediction with mixtures - Clyde, Parmigiani - 1998

3 | Evolutionary stochastic search - Bottolo, Richardson - 2008
Citation Context: ...ction problem that utilize other proposals include adaptive MCMC (Nott and Kohn 2005), Swendsen-Wang (Nott and Green 2004) and Evolutionary Monte Carlo (Liang and Wong 2000; Wilson et al. 2010; Bottolo and Richardson 2008). Historically, the conjugate Normal-Gamma family of prior distributions has received widespread attention for model choice in linear models (Raftery et al. 1997; Smith and Kohn 1996; George and McCu...

3 | A note on the bias in estimating posterior probabilities in variable selection, Discussion Paper 2010-11 - Clyde, Ghosh - 2010
Citation Context: ...p̂^RM(M_γ | Y) = p(Y | M_γ)p(M_γ) I(γ ∈ U) / ∑_{γ∈U} p(Y | M_γ)p(M_γ) (15) and π̂_j^RM = ∑_{γ∈U} γ_j p̂^RM(M_γ | Y) (16). While biased under repeated sampling, the re-normalized estimates are both Fisher consistent and asymptotically consistent (Clyde and Ghosh 2010). 4 SIMULATED DATA We compare BAS to SRSWOR and MCMC methods using simulated data with p = 15 and n = 100 so that the exact posterior model probabilities may be obtained by enumeration of the model s...
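
The re-normalized estimates in this excerpt translate directly into code: divide each sampled model's unnormalized weight by the total over the sampled set U, then sum those probabilities over models that include each variable. A minimal sketch (function and variable names are illustrative, and `weight` stands in for p(Y | M)p(M)):

```python
def renormalized_estimates(sampled, weight):
    """Sketch of re-normalized estimates over a sampled set U of models:
    p_hat(M | Y) = w(M) / sum_{M' in U} w(M'), with unsampled models
    implicitly assigned probability 0, and marginal inclusion
    probabilities pi_hat_j = sum over U of gamma_j * p_hat(M | Y).
    sampled: distinct models, each a 0/1 inclusion tuple
    weight:  dict model -> unnormalized posterior weight p(Y|M)p(M)"""
    total = sum(weight[m] for m in sampled)
    probs = {m: weight[m] / total for m in sampled}
    p = len(sampled[0])
    incl = [sum(probs[m] * m[j] for m in sampled) for j in range(p)]
    return probs, incl

# Fisher consistency, as the excerpt notes: when U is the entire model
# space, the estimates equal the exact posterior quantities.
U = [(0, 0), (1, 0), (0, 1), (1, 1)]
w = {(0, 0): 1.0, (1, 0): 1.0, (0, 1): 1.0, (1, 1): 1.0}
probs, incl = renormalized_estimates(U, w)
print(incl)  # [0.5, 0.5]
```

The bias under repeated sampling that the excerpt mentions arises because the normalizing total itself is random when U is a proper subset of the model space; only in the full-enumeration limit does it become the true normalizing constant.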

1 | A note on the bias in estimating posterior marginal inclusion probabilities - Clyde, Ghosh - 2009
Citation Context: ...p̂^RM(M_γ | Y) = p(Y | M_γ)p(M_γ) I(γ ∈ U) / ∑_{γ∈U} p(Y | M_γ)p(M_γ) (12) and π̂_j^RM = ∑_{γ∈U} γ_j p̂^RM(M_γ | Y) (13). While biased under repeated sampling, the re-normalized estimates are both Fisher consistent and asymptotically consistent (Clyde and Ghosh 2009). 4 SIMULATED DATA We compare BAS to SRSWOR and MCMC methods using simulated data with p = 15 and n = 100 so that the exact posterior model probabilities may be obtained by enumeration of the model s...

1 | “Participation in illegitimate activities: Ehrlich revisited,” in Deterrence and Incapacitation - unknown authors - 1978

1 | Bayesian Model Search and Multilevel Inference for SNP Association Studies - Wilson, Iversen, et al. - 2010
Citation Context: ...or the variable selection problem that utilize other proposals include adaptive MCMC (Nott and Kohn 2005), Swendsen-Wang (Nott and Green 2004) and Evolutionary Monte Carlo (Liang and Wong 2000; Wilson et al. 2010; Bottolo and Richardson 2008). Historically, the conjugate Normal-Gamma family of prior distributions has received widespread attention for model choice in linear models (Raftery et al. 1997; Smith a...