Results 11–20 of 81
On Bayesian Model and Variable Selection Using MCMC
, 1997
Abstract

Cited by 51 (5 self)
Introduction. A Bayesian approach to model selection proceeds as follows. Suppose that the data y are considered to have been generated by a model m, one of a set M of competing models. Each model specifies the distribution of Y, f(y | m; β_m), apart from an unknown parameter vector β_m ∈ B_m, where B_m is the set of all possible values for the coefficients of model m. If f(m) is the prior probability of model m, then the posterior probability is given by f(m | y) = f(m) f(y | m) / Σ_{m′ ∈ M} f(m′) f(y | m′).
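The posterior formula in this abstract is easy to sketch numerically. A minimal Python illustration, where the priors and marginal likelihoods are made-up numbers (in practice f(y | m) comes from integrating the likelihood over the prior on β_m):

```python
# Hypothetical example: three competing models with a uniform prior f(m)
# and invented marginal likelihoods f(y|m).
prior = [1 / 3, 1 / 3, 1 / 3]
marginal_lik = [2.0e-5, 8.0e-5, 1.0e-5]

# f(m|y) = f(m) f(y|m) / sum over m' of f(m') f(y|m')
unnorm = [p * l for p, l in zip(prior, marginal_lik)]
total = sum(unnorm)
posterior = [u / total for u in unnorm]
print(posterior)  # [2/11, 8/11, 1/11] -- the second model dominates
```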
Model Selection and Accounting for Model Uncertainty in Linear Regression Models
, 1993
Abstract

Cited by 47 (6 self)
We consider the problems of variable selection and accounting for model uncertainty in linear regression models. Conditioning on a single selected model ignores model uncertainty, and thus leads to the underestimation of uncertainty when making inferences about quantities of interest. The complete Bayesian solution to this problem involves averaging over all possible models when making inferences about quantities of interest. This approach is often not practical. In this paper we offer two alternative approaches. First we describe a Bayesian model selection algorithm called "Occam's Window" which involves averaging over a reduced set of models. Second, we describe a Markov chain Monte Carlo approach which directly approximates the exact solution. Both these model averaging procedures provide better predictive performance than any single model which might reasonably have been selected. In the extreme case where there are many candidate predictors but there is no relationship between any of them and the response, standard variable selection procedures often choose some subset of variables that yields a high R² and a highly significant overall F value. We refer to this unfortunate phenomenon as "Freedman's Paradox" (Freedman, 1983). In this situation, Occam's Window usually indicates the null model as the only one to be considered, or else a small number of models including the null model, thus largely resolving the paradox.
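The Occam's Window idea can be sketched in a few lines. This is a simplified, symmetric variant under assumed posterior probabilities; the full procedure in the paper also excludes models that have a more probable nested submodel, a step omitted here:

```python
def occams_window(post_probs, c=20.0):
    """Simplified symmetric Occam's Window: keep models whose posterior
    probability is within a factor c of the best model's, renormalize
    over the survivors. (The nested-submodel exclusion is omitted.)"""
    best = max(post_probs)
    keep = [p >= best / c for p in post_probs]
    total = sum(p for p, k in zip(post_probs, keep) if k)
    weights = [p / total if k else 0.0 for p, k in zip(post_probs, keep)]
    return keep, weights

# Made-up posterior model probabilities for five candidate models.
keep, weights = occams_window([0.50, 0.30, 0.15, 0.04, 0.01], c=20.0)
print(keep)     # the 0.01 model falls below 0.50 / 20 = 0.025 and is dropped
print(weights)  # renormalized weights over the four kept models
```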
Bayesian model averaging
 STAT.SCI
, 1999
Abstract

Cited by 42 (0 self)
Standard statistical practice ignores model uncertainty. Data analysts typically select a model from some class of models and then proceed as if the selected model had generated the data. This approach ignores the uncertainty in model selection, leading to overconfident inferences and decisions that are more risky than one thinks they are. Bayesian model averaging (BMA) provides a coherent mechanism for accounting for this model uncertainty. Several methods for implementing BMA have recently emerged. We discuss these methods and present a number of examples. In these examples, BMA provides improved out-of-sample predictive performance. We also provide a catalogue of ...
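The BMA mechanism itself reduces to probability-weighted averaging. A hypothetical sketch with made-up posterior model probabilities and per-model predictions, showing the between-model variance term that conditioning on a single model drops:

```python
# Hypothetical: three models, their posterior probabilities f(m|y),
# each model's point prediction, and each model's within-model variance.
post = [0.5, 0.3, 0.2]
pred = [10.0, 12.0, 9.0]
var_within = [1.0, 1.5, 2.0]

# BMA predictive mean: weight each model's prediction by f(m|y).
bma_mean = sum(p * mu for p, mu in zip(post, pred))

# Total variance = mean within-model variance + between-model spread;
# the second piece is the uncertainty single-model inference ignores.
bma_var = (sum(p * (v + mu ** 2) for p, v, mu in zip(post, var_within, pred))
           - bma_mean ** 2)
print(bma_mean, bma_var)
```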
The variable selection problem
 Journal of the American Statistical Association
, 2000
Abstract

Cited by 39 (2 self)
The problem of variable selection is one of the most pervasive model selection problems in statistical applications. Often referred to as the problem of subset selection, it arises when one wants to model the relationship between a variable of interest and a subset of potential explanatory variables or predictors, but there is uncertainty about which subset to use. This vignette reviews some of the key developments which have led to the wide variety of approaches for this problem.
Accounting for Model Uncertainty in Survival Analysis Improves Predictive Performance
 In Bayesian Statistics 5
, 1995
Abstract

Cited by 39 (12 self)
Survival analysis is concerned with finding models to predict the survival of patients or to assess the efficacy of a clinical treatment. A key part of the model-building process is the selection of the predictor variables. It is standard to use a stepwise procedure guided by a series of significance tests to select a single model, and then to make inference conditionally on the selected model. However, this ignores model uncertainty, which can be substantial. We review the standard Bayesian model averaging solution to this problem and extend it to survival analysis, introducing partial Bayes factors to do so for the Cox proportional hazards model. In two examples, taking account of model uncertainty enhances predictive performance, to an extent that could be clinically useful. 1. Introduction. From 1974 to 1984 the Mayo Clinic conducted a double-blinded randomized clinical trial involving 312 patients to compare the drug DPCA with a placebo in the treatment of primary biliary cirrhosis...
Estimating Bayes Factors via Posterior Simulation with the Laplace-Metropolis Estimator
 Journal of the American Statistical Association
, 1994
Abstract

Cited by 33 (11 self)
The key quantity needed for Bayesian hypothesis testing and model selection is the marginal likelihood for a model, also known as the integrated likelihood, or the marginal probability of the data. In this paper we describe a way to use posterior simulation output to estimate marginal likelihoods. We describe the basic Laplace-Metropolis estimator for models without random effects. For models with random effects the compound Laplace-Metropolis estimator is introduced. This estimator is applied to data from the World Fertility Survey and shown to give accurate results. Batching of simulation output is used to assess the uncertainty involved in using the compound Laplace-Metropolis estimator. The method allows us to test for the effects of independent variables in a random effects model, and also to test for the presence of the random effects. KEY WORDS: Laplace-Metropolis estimator; Random effects models; Marginal likelihoods; Posterior simulation; World Fertility Survey. 1. Introduction...
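A one-dimensional sketch of the basic (non-compound) Laplace-Metropolis estimator, checked against a conjugate normal example where the marginal likelihood is known in closed form. The function name and setup are mine, not the paper's:

```python
import math
import random

def laplace_metropolis_1d(samples, log_post):
    """Laplace-Metropolis estimate of the log marginal likelihood (d = 1).
    samples: posterior draws; log_post(t) = log f(y|t) + log f(t).
    The draw maximizing the unnormalized log posterior stands in for the
    posterior mode, and the sample variance for the inverse Hessian there."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / (n - 1)
    mode = max(samples, key=log_post)
    return 0.5 * math.log(2 * math.pi) + 0.5 * math.log(var) + log_post(mode)

# Conjugate check: theta ~ N(0,1), one observation y ~ N(theta,1), so
# marginally y ~ N(0,2) and the posterior is N(y/2, 1/2).
random.seed(1)
y = 1.0
draws = [random.gauss(y / 2, math.sqrt(0.5)) for _ in range(20000)]
lp = lambda t: (-0.5 * math.log(2 * math.pi) - (y - t) ** 2 / 2
                - 0.5 * math.log(2 * math.pi) - t ** 2 / 2)
true_log_ml = -0.5 * math.log(2 * math.pi * 2) - y ** 2 / 4
print(laplace_metropolis_1d(draws, lp), true_log_ml)  # should agree closely
```

Because everything here is Gaussian, the Laplace approximation is exact and the only error comes from using simulation output in place of the true mode and curvature.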
Bayesian Model Averaging in proportional hazard models: Assessing the risk of a stroke
 Applied Statistics
, 1997
Abstract

Cited by 29 (5 self)
Evaluating the risk of stroke is important in reducing the incidence of this devastating disease. Here, we apply Bayesian model averaging to variable selection in Cox proportional hazard models in the context of the Cardiovascular Health Study, a comprehensive investigation into the risk factors for stroke. We introduce a technique based on the leaps and bounds algorithm which efficiently locates and fits the best models in the very large model space and thereby extends all subsets regression to Cox models. For each independent variable considered, the method provides the posterior probability that it belongs in the model. This is more directly interpretable than the corresponding P-values, and also more valid in that it takes account of model uncertainty. P-values from models preferred by stepwise methods tend to overstate the evidence for the predictive value of a variable. In our data Bayesian model averaging predictively outperforms standard model selection methods for assessing ...
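The posterior inclusion probabilities this abstract describes are simple to compute once model posteriors are in hand. A toy enumeration with three predictors and made-up posterior model probabilities (the paper's leaps-and-bounds step exists precisely to avoid this brute-force enumeration in large model spaces):

```python
from itertools import product

# Hypothetical: three candidate predictors, all 2^3 = 8 subsets enumerated.
models = list(product([0, 1], repeat=3))   # inclusion indicator per predictor
post = [0.02, 0.05, 0.10, 0.08, 0.25, 0.20, 0.18, 0.12]  # f(m|y), sums to 1

# Posterior inclusion probability of predictor j: total posterior probability
# of the models that contain it -- the quantity contrasted with P-values above.
incl = [sum(p for m, p in zip(models, post) if m[j]) for j in range(3)]
print(incl)  # approximately [0.75, 0.48, 0.45]
```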
A method for simultaneous variable selection and outlier identification in linear regression
 COMPUTATIONAL STATISTICS & DATA ANALYSIS
, 1996
Hypothesis Testing and Model Selection Via Posterior Simulation
 In Practical Markov Chain
, 1995
Abstract

Cited by 24 (1 self)
Introduction. To motivate the methods described in this chapter, consider the following inference problem in astronomy (Soubiran, 1993). Until fairly recently, it was believed that the Galaxy consists of two stellar populations, the disk and the halo. More recently, it has been hypothesized that there are in fact three stellar populations: the old (or thin) disk, the thick disk, and the halo, distinguished by their spatial distributions, their velocities, and their metallicities. These hypotheses have different implications for theories of the formation of the Galaxy. Some of the evidence for deciding whether there are two or three populations is shown in Figure 1, which shows radial and rotational velocities for n = 2,370 stars. A natural model for this situation is a mixture model with J components, namely y_i = Σ_{j=1}^{J} ρ_j ...
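The truncated mixture formula above can be illustrated by simulating from such a model. All weights and component parameters below are invented; only the component count J = 3 and the sample size n = 2,370 echo the abstract:

```python
import math
import random

# Invented J = 3 one-dimensional Gaussian mixture, standing in for the
# three hypothesized stellar populations.
weights = [0.6, 0.3, 0.1]        # mixing proportions rho_j, sum to 1
means = [0.0, -40.0, -120.0]     # e.g. rotational-velocity centres
sds = [15.0, 30.0, 60.0]

def mixture_density(y):
    """f(y) = sum_j rho_j * Normal(y; mu_j, sigma_j^2)."""
    return sum(w * math.exp(-0.5 * ((y - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
               for w, m, s in zip(weights, means, sds))

def sample():
    """Draw one y_i: pick a component j with probability rho_j, then draw
    from that component's normal distribution."""
    u, acc = random.random(), 0.0
    for w, m, s in zip(weights, means, sds):
        acc += w
        if u <= acc:
            return random.gauss(m, s)
    return random.gauss(means[-1], sds[-1])

random.seed(0)
ys = [sample() for _ in range(2370)]   # same n as the stellar data set
```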
Inference in modelbased cluster analysis
, 1995
Abstract

Cited by 23 (7 self)
A new approach to cluster analysis has been introduced based on parsimonious geometric modelling of the within-group covariance matrices in a mixture of multivariate normal distributions, using hierarchical agglomeration and iterative relocation. It works well and is widely used via the MCLUST software available in S-PLUS and StatLib. However, it has several limitations: there is no assessment of the uncertainty about the classification, the partition can be suboptimal, parameter estimates are biased, the shape matrix has to be specified by the user, prior group probabilities are assumed to be equal, the method for choosing the number of groups is based on a crude approximation, and no formal way of choosing between the various possible models is included. Here, we propose a new approach which overcomes all these difficulties. It consists of exact Bayesian inference via Gibbs sampling, and the calculation of Bayes factors (for choosing the model and the number of groups) from the output using the Laplace-Metropolis estimator. It works well in several real and simulated examples.