Calibration and Empirical Bayes Variable Selection
Biometrika, 1997
Cited by 146 (21 self)
... this paper, is that with F = 2 log p. This choice was proposed by Foster & George (1994), where it was called the Risk Inflation Criterion (RIC) because it asymptotically minimises the maximum predictive risk inflation due to selection when X is orthogonal. This choice and its minimax property were also discovered independently by Donoho & Johnstone (1994) in the wavelet regression context, where they refer to it as the universal hard thresholding rule.
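When X is orthogonal, a criterion of the form "fit minus F times model size" includes predictor j exactly when its squared z-statistic exceeds F. The RIC choice F = 2 log p therefore acts as a hard threshold on |z_j| at sqrt(2 log p), which is the same cut-off as the universal threshold σ·sqrt(2 log n) applied to standardized coefficients. The sketch below assumes the z-statistics are already standardized by a known noise scale; the function names are our own and purely illustrative.

```python
import numpy as np

def ric_threshold(p):
    """RIC penalty F = 2 log p for p candidate predictors."""
    return 2.0 * np.log(p)

def ric_select(z, p=None):
    """Indices j with z_j**2 > 2 log p (orthogonal design, known noise scale)."""
    z = np.asarray(z, dtype=float)
    p = z.size if p is None else p
    return np.flatnonzero(z ** 2 > ric_threshold(p))

def hard_threshold(w, sigma):
    """Universal hard thresholding: zero out |w_i| <= sigma * sqrt(2 log n)."""
    w = np.asarray(w, dtype=float)
    lam = sigma * np.sqrt(2.0 * np.log(w.size))
    return np.where(np.abs(w) > lam, w, 0.0)

z = [0.5, 4.0, -3.9, 2.0]
print(ric_select(z, p=1000))          # [1 2]  (threshold 2 log 1000 is about 13.8)
print(hard_threshold(z, sigma=1.0))   # n = 4, so the threshold is about 1.67
```

With p = 1000 candidates, the penalty grows and only the two largest statistics survive. With the same four values treated as the whole problem (p = n = 4), both rules keep the same three coefficients.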
Benchmark Priors for Bayesian Model Averaging
Forthcoming in the Journal of Econometrics, 2001
Cited by 131 (5 self)
In contrast to a posterior analysis given a particular sampling model, posterior model probabilities in the context of model uncertainty are typically rather sensitive to the specification of the prior. In particular, "diffuse" priors on model-specific parameters can lead to quite unexpected consequences. Here we focus on the practically relevant situation where we need to entertain a (large) number of sampling models and we have (or wish to use) little or no subjective prior information. We aim at providing an "automatic" or "benchmark" prior structure that can be used in such cases. We focus on the Normal linear regression model with uncertainty in the choice of regressors. We propose a partly noninformative prior structure related to a Natural Conjugate g-prior specification, where the amount of subjective information requested from the user is limited to the choice of a single scalar hyperparameter g_{0j}. The consequences of different choices for g_{0j} are examined. We investigate theoretical properties, such as consistency of the implied Bayesian procedure. Links with classical information criteria are provided. More importantly, we examine the finite sample implications of several choices of g_{0j} in a simulation study. The use of the MC³ algorithm of Madigan and York (1995), combined with efficient coding in Fortran, makes it feasible to conduct large simulations. In addition to posterior criteria, we shall also compare the predictive performance of different priors. A classic example concerning the economics of crime will also be provided and contrasted with results in the literature. The main findings of the paper will lead us to propose a "benchmark" prior specification in a linear regression context with model uncertainty.
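Under a g-prior of this kind, the Bayes factor of a model with k_j regressors against the intercept-only model has a closed form that depends on the data only through the model's R²: log BF = ((n − 1 − k_j)/2) log(1 + g) − ((n − 1)/2) log(1 + g(1 − R²)). The sketch below rests on several assumptions of ours rather than on a transcription of the paper: Zellner's g equals 1/g_{0j}, regressors are centred with a flat prior on the intercept, and the "benchmark" choice is g_{0j} = 1/max(n, k²), with k the number of candidate regressors. The helper names are hypothetical.

```python
import numpy as np

def r_squared(y, X):
    """R^2 of an OLS fit with intercept (handled by centring y and X)."""
    yc = y - y.mean()
    Xc = X - X.mean(axis=0)
    beta, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    resid = yc - Xc @ beta
    return 1.0 - (resid @ resid) / (yc @ yc)

def log_bf_gprior(y, X, g):
    """log Bayes factor of the model with columns X versus the null model
    under Zellner's g-prior with fixed g (g = 1/g0j in the g0j parameterisation)."""
    n, k = X.shape
    r2 = r_squared(y, X)
    return 0.5 * (n - 1 - k) * np.log1p(g) - 0.5 * (n - 1) * np.log1p(g * (1.0 - r2))

def benchmark_g(n, k_total):
    """Assumed benchmark: g = max(n, k_total**2), i.e. g0j = 1/max(n, k^2)."""
    return max(n, k_total ** 2)

rng = np.random.default_rng(0)
n, k_total = 50, 10
X_all = rng.normal(size=(n, k_total))
y = 1.5 * X_all[:, 0] + rng.normal(size=n)   # only the first regressor matters
g = benchmark_g(n, k_total)                  # max(50, 100) = 100
print(log_bf_gprior(y, X_all[:, [0]], g))        # true regressor only
print(log_bf_gprior(y, X_all[:, [0, 1, 2]], g))  # plus two noise regressors
```

The (1 + g)^{(n − 1 − k_j)/2} factor is what penalizes model size. A larger g (smaller g_{0j}) penalizes extra regressors more heavily, which is one way to see why the choice of this single scalar matters so much.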
Predictive Model Selection
Journal of the Royal Statistical Society, Series B, 1995
Cited by 81 (5 self)
... this article we propose three criteria that can be used to address model selection. These emphasize observables rather than parameters and are based on a certain Bayesian predictive density. They have a unifying basis that is simple and interpretable, are free of asymptotic definitions, and allow the incorporation of prior information. Moreover, two of these criteria are readily calibrated.
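The abstract does not state the three criteria. As an illustration of a criterion built on the posterior predictive density, the sketch below computes a squared-error predictive measure. For each observation, it adds the variance of replicate predictions to the squared gap between their mean and the observed value, and smaller totals are better. The specific form, the function name `l_criterion`, and the assumption that predictive draws are already available are ours.

```python
import numpy as np

def l_criterion(y_obs, z_draws):
    """Sum over observations of predictive variance plus squared distance
    between predictive mean and observed value (smaller is better).
    z_draws: array of shape (S, n) holding S posterior predictive replicates."""
    z = np.asarray(z_draws, dtype=float)
    y = np.asarray(y_obs, dtype=float)
    mean = z.mean(axis=0)
    var = z.var(axis=0)          # population variance (ddof=0) across draws
    return float(np.sum(var + (mean - y) ** 2))

print(l_criterion([1.0, 1.0], [[0.0, 2.0], [2.0, 4.0]]))   # 6.0
```

To compare candidate models, one would compute this quantity from each model's predictive draws for the same observed data. The criterion works entirely on observables, so models with different parameterizations can be compared directly.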
Joint Bayesian Model Selection and Estimation of Noisy Sinusoids via Reversible Jump MCMC
1999
Cited by 59 (3 self)
In this paper, the problem of joint Bayesian model selection and parameter estimation for sinusoids in white Gaussian noise is addressed. An original Bayesian model is proposed that allows us to define a posterior distribution on the parameter space. All Bayesian inference is then based on this distribution. Unfortunately, a direct evaluation of this distribution and of its features, including posterior model probabilities, requires evaluation of some complicated high-dimensional integrals. We develop an efficient stochastic algorithm based on reversible jump Markov chain Monte Carlo methods to perform the Bayesian computation. A convergence result for this algorithm is established. In simulations, detection based on posterior model probabilities appears to outperform conventional detection schemes.
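The sinusoid sampler itself is beyond a short example, but the mechanics of a reversible jump move can be shown on a toy problem of our own choosing. In this toy, M0 says y_i ~ N(0, 1) and M1 says y_i ~ N(μ, 1) with μ ~ N(0, τ²), and the two models have equal prior probability. A "birth" move draws the new parameter μ from a proposal q and is accepted with the usual Metropolis-Hastings-Green ratio. The Jacobian is 1 because μ is copied directly. The "death" move is its exact reverse. Because this toy model's marginal likelihood has a closed form, the sampler's estimate of P(M1 | y) can be checked. Everything here, including the proposal q = N(ȳ, 1/n), is an illustrative assumption.

```python
import numpy as np

def log_norm_pdf(x, mean, sd):
    """Log density of N(mean, sd^2), elementwise."""
    return -0.5 * np.log(2.0 * np.pi * sd ** 2) - 0.5 * ((x - mean) / sd) ** 2

def log_lik(y, mu):
    """Log-likelihood of y_i ~ N(mu, 1), i.i.d."""
    return float(np.sum(log_norm_pdf(y, mu, 1.0)))

def rjmcmc_two_models(y, tau=1.0, n_iter=20_000, seed=0):
    """Toy reversible jump sampler between M0 (mu = 0) and M1 (mu ~ N(0, tau^2)).
    Equal prior model probabilities cancel in the acceptance ratios.
    Returns the fraction of iterations spent in M1, an estimate of P(M1 | y)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    q_mean, q_sd = float(np.mean(y)), 1.0 / np.sqrt(n)  # proposal q for the new parameter
    k, mu, visits_m1 = 0, 0.0, 0
    for _ in range(n_iter):
        if k == 0:
            # Birth: draw mu from q; the identity map has Jacobian 1.
            mu_new = rng.normal(q_mean, q_sd)
            log_a = (log_lik(y, mu_new) + log_norm_pdf(mu_new, 0.0, tau)
                     - log_lik(y, 0.0) - log_norm_pdf(mu_new, q_mean, q_sd))
            if np.log(rng.uniform()) < log_a:
                k, mu = 1, mu_new
        else:
            # Death: drop mu; the acceptance ratio is the reciprocal of the birth ratio.
            log_a = (log_lik(y, 0.0) + log_norm_pdf(mu, q_mean, q_sd)
                     - log_lik(y, mu) - log_norm_pdf(mu, 0.0, tau))
            if np.log(rng.uniform()) < log_a:
                k, mu = 0, 0.0
        if k == 1:
            # Within-model random-walk Metropolis update of mu.
            prop = mu + rng.normal(0.0, q_sd)
            log_a = (log_lik(y, prop) + log_norm_pdf(prop, 0.0, tau)
                     - log_lik(y, mu) - log_norm_pdf(mu, 0.0, tau))
            if np.log(rng.uniform()) < log_a:
                mu = prop
        visits_m1 += k
    return visits_m1 / n_iter

def exact_post_m1(y, tau=1.0):
    """Closed-form P(M1 | y) for the same toy problem (equal prior odds)."""
    n, s = len(y), float(np.sum(y))
    log_bf = -0.5 * np.log1p(n * tau ** 2) + s ** 2 * tau ** 2 / (2.0 * (1.0 + n * tau ** 2))
    return 1.0 / (1.0 + np.exp(-log_bf))

y = np.random.default_rng(1).normal(0.3, 1.0, size=20)
est, exact = rjmcmc_two_models(y), exact_post_m1(y)
print(f"RJMCMC estimate {est:.3f}, exact {exact:.3f}")
```

The quality of the birth proposal q drives how often the sampler jumps between dimensions. Proposals that roughly match the posterior of the new parameter give high acceptance. Designing such proposals for sinusoid frequencies is the hard part of the real problem.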
Model Selection and Accounting for Model Uncertainty in Linear Regression Models
1993
Cited by 50 (6 self)
We consider the problems of variable selection and accounting for model uncertainty in linear regression models. Conditioning on a single selected model ignores model uncertainty, and thus leads to the underestimation of uncertainty when making inferences about quantities of interest. The complete Bayesian solution to this problem involves averaging over all possible models when making inferences about quantities of interest. This approach is often not practical. In this paper we offer two alternative approaches. First, we describe a Bayesian model selection algorithm called "Occam's Window", which involves averaging over a reduced set of models. Second, we describe a Markov chain Monte Carlo approach that directly approximates the exact solution. Both of these model-averaging procedures provide better predictive performance than any single model that might reasonably have been selected. In the extreme case where there are many candidate predictors but there is no relationship between any of them and the response, standard variable selection procedures often choose some subset of variables that yields a high R² and a highly significant overall F value. We refer to this unfortunate phenomenon as "Freedman's Paradox" (Freedman, 1983). In this situation, Occam's Window usually indicates the null model as the only one to be considered, or else a small number of models including the null model, thus largely resolving the paradox.
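The sketch below covers only the first Occam's Window rule, in the form in which it is often stated. Models whose posterior probability is less than 1/C times that of the best model are discarded, the remaining probabilities are renormalized, and predictions are averaged over the retained set. The cutoff C = 20, the helper names, and the toy log-probabilities are our assumptions. The paper's procedure may include further rules, for example for nested models, that are not shown here.

```python
import math

def occams_window(log_post, c=20.0):
    """Keep models whose posterior probability is at least 1/c of the best model's,
    then renormalise the retained posterior probabilities to sum to one."""
    best = max(log_post.values())
    kept = {m: lp for m, lp in log_post.items() if lp >= best - math.log(c)}
    z = sum(math.exp(lp - best) for lp in kept.values())
    return {m: math.exp(lp - best) / z for m, lp in kept.items()}

def model_average(weights, predictions):
    """Posterior-weighted average of per-model predictions."""
    return sum(w * predictions[m] for m, w in weights.items())

# Hypothetical unnormalised log posterior model probabilities.
log_post = {"null": 0.0, "x1": -1.0, "x1+x2": -4.0}
weights = occams_window(log_post, c=20.0)
print(sorted(weights))   # ['null', 'x1']  (x1+x2 is more than 20 times less probable)
print(model_average(weights, {"null": 0.0, "x1": 2.0, "x1+x2": 5.0}))
```

Working with log-probabilities relative to the best model avoids overflow when posterior probabilities differ by many orders of magnitude.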
Mixtures of g-priors for Bayesian variable selection
Journal of the American Statistical Association, 2008
Cited by 46 (4 self)
Zellner's g-prior remains a popular conventional prior for use in Bayesian variable selection, despite several undesirable consistency issues. In this paper, we study mixtures of g-priors as an alternative to default g-priors that resolve many of the problems with the original formulation, while maintaining the computational tractability that has made the g-prior so popular. We present theoretical properties of the mixture g-priors and provide real and simulated examples to compare the mixture formulation with fixed g-priors, empirical Bayes approaches, and other default procedures.
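One member of the mixture family is the hyper-g prior, π(g) = ((a − 2)/2)(1 + g)^{−a/2} for g > 0. Averaging the fixed-g Bayes factor over this prior reduces the computation to a one-dimensional integral, which is part of why mixtures retain the g-prior's tractability. The numerical sketch below is ours. It writes the fixed-g Bayes factor in terms of R², substitutes w = g/(1 + g), which has a Beta(1, a/2 − 1) distribution under this prior, and applies a midpoint rule. The values a = 3 and the grid size are illustrative assumptions. A closed form in terms of the Gaussian hypergeometric function also exists for this prior.

```python
import numpy as np

def log_bf_fixed_g(g, n, k, r2):
    """log BF of a k-regressor model versus the null under Zellner's g-prior."""
    return 0.5 * (n - 1 - k) * np.log1p(g) - 0.5 * (n - 1) * np.log1p(g * (1.0 - r2))

def bf_hyper_g(n, k, r2, a=3.0, grid=200_000):
    """BF under the hyper-g prior pi(g) = (a-2)/2 * (1+g)^(-a/2), computed by
    integrating the fixed-g BF over w = g/(1+g) ~ Beta(1, a/2 - 1) (midpoint rule)."""
    w = (np.arange(grid) + 0.5) / grid                   # midpoints on (0, 1)
    g = w / (1.0 - w)
    log_dens = np.log(a / 2.0 - 1.0) + (a / 2.0 - 2.0) * np.log1p(-w)  # Beta(1, a/2-1)
    log_terms = log_dens + log_bf_fixed_g(g, n, k, r2)
    m = log_terms.max()                                  # rescale to avoid overflow
    return float(np.exp(m) * np.mean(np.exp(log_terms - m)))

print(bf_hyper_g(n=30, k=3, r2=0.3))
```

Unlike a single fixed g, the mixture lets the data inform how much shrinkage to apply. The Bayes factor is then a weighted average over all g rather than a value tied to one arbitrary choice.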
On the effect of prior assumptions in Bayesian model averaging with applications to growth regression
2008
Cited by 44 (3 self)
We consider the problem of variable selection in linear regression models. Bayesian model averaging has become an important tool in empirical settings with large numbers of potential regressors and relatively limited numbers of observations. We examine the effect of a variety of prior assumptions on the inference concerning model size, posterior inclusion probabilities of regressors, and predictive performance. We illustrate these issues in the context of cross-country growth regressions using three datasets with 41 to 67 potential drivers of growth and 72 to 93 observations. Finally, we recommend priors for use in this and related contexts.
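The abstract does not spell out which priors are compared. As an assumption on our part, the sketch uses one common contrast in this literature. A fixed inclusion probability θ = m/K gives a binomial prior on model size. A hierarchical prior θ ~ Beta(1, b) with b = (K − m)/m gives a beta-binomial prior that keeps the same prior mean size m but is far more dispersed. Here K = 67 matches the largest number of potential regressors mentioned above, and m = 7 is purely illustrative.

```python
import math

def beta_binomial_size_prior(K, m):
    """Prior on model size W when the inclusion probability theta ~ Beta(1, b),
    b = (K - m)/m, so that E[W] = m. Returns P(W = w) for w = 0..K."""
    a, b = 1.0, (K - m) / m
    log_beta_ab = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    pmf = []
    for w in range(K + 1):
        log_choose = math.lgamma(K + 1) - math.lgamma(w + 1) - math.lgamma(K - w + 1)
        log_beta = math.lgamma(w + a) + math.lgamma(K - w + b) - math.lgamma(K + a + b)
        pmf.append(math.exp(log_choose + log_beta - log_beta_ab))
    return pmf

def binomial_size_prior(K, m):
    """Prior on model size with a fixed inclusion probability theta = m/K."""
    theta = m / K
    return [math.comb(K, w) * theta ** w * (1 - theta) ** (K - w) for w in range(K + 1)]

K, m = 67, 7   # 67 candidate regressors; prior mean model size 7 is an assumption
for name, pmf in [("binomial", binomial_size_prior(K, m)),
                  ("beta-binomial", beta_binomial_size_prior(K, m))]:
    mean = sum(w * p for w, p in enumerate(pmf))
    sd = math.sqrt(sum((w - mean) ** 2 * p for w, p in enumerate(pmf)))
    print(f"{name:14s} mean={mean:.2f} sd={sd:.2f}")
```

Both priors have the same mean, but the binomial version concentrates tightly around m. The hierarchical version is much less informative about model size, which is one reason prior choices of this kind can change posterior inclusion probabilities noticeably.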
Robust Full Bayesian Learning for Radial Basis Networks
2001
Cited by 24 (3 self)
We propose a hierarchical full Bayesian model for radial basis networks. This model treats the model dimension (number of neurons), model parameters,...
Understanding the use of unlabelled data in predictive modelling
Statistical Science, 2006
Cited by 21 (10 self)
The incorporation of unlabelled data in statistical machine learning methods for prediction, including regression and classification, has demonstrated the potential for improved accuracy in prediction in a number of recent examples. The statistical basis for this semi-supervised analysis does not, however, appear to have been well delineated in the literature to date. Nor, perhaps, are statisticians as fully engaged in the vigorous research in this area of machine learning as might be desired. Much of the theoretical work in the literature has focused, for example, on geometric and structural properties of the unlabelled data in the context of particular algorithms, rather than probabilistic and statistical questions. This paper overviews the fundamental statistical foundations for predictive modelling and the general questions associated with unlabelled data, highlighting the relevance of venerable concepts of sampling design and prior specification. This theory, illustrated with a series of simple but central examples, shows precisely when, why and how unlabelled data matter.