Approximate Bayes Factors and Accounting for Model Uncertainty in Generalized Linear Models (1993)

by Adrian E. Raftery

Model Selection and Accounting for Model Uncertainty in Linear Regression Models

by Adrian Raftery, David Madigan, Jennifer Hoeting, 1993
Cited by 40 (6 self)
We consider the problems of variable selection and accounting for model uncertainty in linear regression models. Conditioning on a single selected model ignores model uncertainty, and thus leads to the underestimation of uncertainty when making inferences about quantities of interest. The complete Bayesian solution to this problem involves averaging over all possible models when making inferences about quantities of interest. This approach is often not practical. In this paper we offer two alternative approaches. First, we describe a Bayesian model selection algorithm called "Occam's Window", which involves averaging over a reduced set of models. Second, we describe a Markov chain Monte Carlo approach that directly approximates the exact solution. Both model averaging procedures provide better predictive performance than any single model that might reasonably have been selected. In the extreme case where there are many candidate predictors but no relationship between any of them and the response, standard variable selection procedures often choose some subset of variables that yields a high R² and a highly significant overall F value. We refer to this unfortunate phenomenon as "Freedman's Paradox" (Freedman, 1983). In this situation, Occam's Window usually indicates the null model as the only one to be considered, or else a small number of models including the null model, thus largely resolving the paradox.
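The two rules by which Occam's Window is commonly described can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes each model is represented as a frozenset of predictor names with a precomputed unnormalized log posterior probability, and the cutoff `C = 20` and the function name `occams_window` are illustrative choices. A model is dropped if it is more than C times less probable than the best model, or if a smaller nested model that survives the first rule is more probable.

```python
import math

def occams_window(log_post, C=20.0):
    """Apply the two Occam's Window rules to a set of candidate models.

    log_post maps each model (a frozenset of predictor names) to its
    unnormalized log posterior probability. Returns the accepted models
    with their posterior probabilities renormalized over the window.
    """
    best = max(log_post.values())
    # Rule 1: discard models whose posterior probability is less than
    # 1/C times that of the best model.
    kept = {m: lp for m, lp in log_post.items() if lp >= best - math.log(C)}
    # Rule 2: discard a model if a strictly smaller nested model that
    # survived rule 1 has higher posterior probability.
    accepted = {
        m: lp for m, lp in kept.items()
        if not any(sub < m and lsub > lp for sub, lsub in kept.items())
    }
    # Renormalize on the log scale to avoid underflow.
    top = max(accepted.values())
    total = sum(math.exp(lp - top) for lp in accepted.values())
    return {m: math.exp(lp - top) / total for m, lp in accepted.items()}

# Hypothetical log posteriors where no predictor carries real signal.
models = {
    frozenset(): 0.0,
    frozenset({"x1"}): -0.5,
    frozenset({"x1", "x2"}): -5.0,
}
print(occams_window(models))  # {frozenset(): 1.0}
```

With these hypothetical inputs only the null model remains in the window, which mirrors the Freedman's Paradox behavior described in the abstract.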

Accounting for Model Uncertainty in Survival Analysis Improves Predictive Performance

by Adrian Raftery, David Madigan, Chris T. Volinsky - In Bayesian Statistics 5, 1995
Cited by 37 (12 self)
Survival analysis is concerned with finding models to predict the survival of patients or to assess the efficacy of a clinical treatment. A key part of the model-building process is the selection of the predictor variables. It is standard to use a stepwise procedure guided by a series of significance tests to select a single model, and then to make inference conditionally on the selected model. However, this ignores model uncertainty, which can be substantial. We review the standard Bayesian model averaging solution to this problem and extend it to survival analysis, introducing partial Bayes factors to do so for the Cox proportional hazards model. In two examples, taking account of model uncertainty enhances predictive performance, to an extent that could be clinically useful.

Introduction: From 1974 to 1984 the Mayo Clinic conducted a double-blind randomized clinical trial involving 312 patients to compare the drug DPCA with a placebo in the treatment of primary biliary cirrhosis...

On Bayesian Model and Variable Selection Using MCMC

by Petros Dellaportas, Jonathan J. Forster, Ioannis Ntzoufras, 1997
Cited by 34 (2 self)
Introduction: A Bayesian approach to model selection proceeds as follows. Suppose that the data $y$ are considered to have been generated by a model $m$, one of a set $M$ of competing models. Each model specifies the distribution of $Y$, $f(y \mid m, \beta_m)$, apart from an unknown parameter vector $\beta_m \in B_m$, where $B_m$ is the set of all possible values for the coefficients of model $m$. If $f(m)$ is the prior probability of model $m$, then the posterior probability is given by

$$f(m \mid y) = \frac{f(m)\, f(y \mid m)}{\sum_{m' \in M} f(m')\, f(y \mid m')}$$
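When the marginal likelihoods f(y | m) are available, the posterior model probability formula can be evaluated directly. The sketch below works on the log scale so that very small marginal likelihoods do not underflow. The function name, model labels, and numbers are hypothetical, and the MCMC methods the paper is actually about are not shown.

```python
import math

def posterior_model_probs(log_prior, log_marglik):
    """Compute f(m | y) = f(m) f(y | m) / sum_m' f(m') f(y | m').

    Both arguments map a model label to log f(m) and log f(y | m).
    Uses the log-sum-exp trick so that tiny marginal likelihoods
    do not underflow to zero.
    """
    log_joint = {m: log_prior[m] + log_marglik[m] for m in log_prior}
    top = max(log_joint.values())
    norm = sum(math.exp(v - top) for v in log_joint.values())
    return {m: math.exp(v - top) / norm for m, v in log_joint.items()}

# Two hypothetical models with equal prior probability, where the data
# are three times as likely under m2 as under m1 (a Bayes factor of 3).
probs = posterior_model_probs(
    {"m1": math.log(0.5), "m2": math.log(0.5)},
    {"m1": -100.0, "m2": -100.0 + math.log(3.0)},
)
print(probs)  # approximately {'m1': 0.25, 'm2': 0.75}
```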

Bayesian model averaging

by Jennifer A. Hoeting, David Madigan, Adrian E. Raftery, Chris T. Volinsky - Statistical Science, 1999
Cited by 29 (0 self)
Standard statistical practice ignores model uncertainty. Data analysts typically select a model from some class of models and then proceed as if the selected model had generated the data. This approach ignores the uncertainty in model selection, leading to overconfident inferences and decisions that are more risky than one thinks they are. Bayesian model averaging (BMA) provides a coherent mechanism for accounting for this model uncertainty. Several methods for implementing BMA have recently emerged. We discuss these methods and present a number of examples. In these examples, BMA provides improved out-of-sample predictive performance. We also provide a catalogue of
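In BMA, a quantity of interest Δ is summarized by weighting each model's posterior summary by its posterior model probability: E[Δ | D] = Σ_k E[Δ | M_k, D] p(M_k | D), with variance Var[Δ | D] = Σ_k (Var[Δ | M_k, D] + E[Δ | M_k, D]²) p(M_k | D) - E[Δ | D]². A minimal Python sketch, assuming the per-model means, variances, and weights have already been computed; the function name and inputs are hypothetical.

```python
def bma_mean_and_variance(weights, means, variances):
    """Combine per-model posterior summaries of a quantity Delta.

    weights[k]   = p(M_k | D), posterior model probabilities (sum to 1)
    means[k]     = E[Delta | M_k, D]
    variances[k] = Var[Delta | M_k, D]
    """
    mean = sum(w * mu for w, mu in zip(weights, means))
    # Var[Delta | D] = sum_k (Var_k + E_k^2) p(M_k | D) - E[Delta | D]^2
    second_moment = sum(w * (v + mu ** 2) for w, mu, v in zip(weights, means, variances))
    return mean, second_moment - mean ** 2

# Two hypothetical, equally probable models that disagree about Delta.
print(bma_mean_and_variance([0.5, 0.5], [0.0, 2.0], [1.0, 1.0]))  # (1.0, 2.0)
```

Here the averaged variance (2.0) is twice the variance under either model alone (1.0); the extra spread reflects disagreement between the models, which is what conditioning on a single selected model leaves out.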

Estimating Bayes Factors via Posterior Simulation with the Laplace-Metropolis Estimator

by Steven M. Lewis, Adrian E. Raftery - Journal of the American Statistical Association, 1994
Cited by 26 (10 self)
The key quantity needed for Bayesian hypothesis testing and model selection is the marginal likelihood for a model, also known as the integrated likelihood, or the marginal probability of the data. In this paper we describe a way to use posterior simulation output to estimate marginal likelihoods. We describe the basic Laplace-Metropolis estimator for models without random effects. For models with random effects the compound Laplace-Metropolis estimator is introduced. This estimator is applied to data from the World Fertility Survey and shown to give accurate results. Batching of simulation output is used to assess the uncertainty involved in using the compound Laplace-Metropolis estimator. The method allows us to test for the effects of independent variables in a random effects model, and also to test for the presence of the random effects. KEY WORDS: Laplace-Metropolis estimator; Random effects models; Marginal likelihoods; Posterior simulation; World Fertility Survey.
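For a model without random effects, the basic Laplace-Metropolis estimator combines the Laplace approximation log f(y) ≈ (d/2) log(2π) + (1/2) log|Ψ| + log f(y | θ*) + log f(θ*) with posterior simulation output: θ* is taken to be the simulated draw with the largest log f(y | θ) + log f(θ), and the posterior covariance Ψ of the d-dimensional θ is estimated from the draws. This sketch uses the plain sample covariance for Ψ (a simplifying assumption; a more robust estimate could be substituted) and does not cover the compound estimator for random effects models. The function names and the toy normal model are hypothetical, chosen because the exact marginal likelihood is known.

```python
import numpy as np

def laplace_metropolis(draws, log_post_unnorm):
    """Laplace-Metropolis estimate of log f(y) from posterior draws.

    draws: array of shape (n, d) simulated from the posterior f(theta | y).
    log_post_unnorm: function returning log f(y | theta) + log f(theta).
    """
    draws = np.asarray(draws, dtype=float)
    if draws.ndim == 1:
        draws = draws[:, None]
    d = draws.shape[1]
    h = np.array([log_post_unnorm(theta) for theta in draws])
    # Approximate the posterior mode by the draw with the largest h(theta).
    best = int(np.argmax(h))
    # Approximate the posterior covariance by the sample covariance.
    cov = np.atleast_2d(np.cov(draws, rowvar=False))
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * d * np.log(2 * np.pi) + 0.5 * logdet + h[best]

# Toy check: theta ~ N(0, 1), y | theta ~ N(theta, 1), observed y = 1.
# The posterior is N(0.5, 0.5); the exact log f(y) is -0.5*log(4*pi) - 0.25.
def log_post_unnorm(theta):
    t = theta[0]
    return -np.log(2 * np.pi) - 0.5 * (1.0 - t) ** 2 - 0.5 * t ** 2

rng = np.random.default_rng(0)
draws = rng.normal(0.5, np.sqrt(0.5), size=(20000, 1))
print(laplace_metropolis(draws, log_post_unnorm), -0.5 * np.log(4 * np.pi) - 0.25)
```

Because the toy posterior is exactly normal, the two printed values should agree closely; for non-normal posteriors the Laplace approximation adds error of its own.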

The variable selection problem

by Edward I. George - Journal of the American Statistical Association, 2000
Cited by 25 (1 self)
The problem of variable selection is one of the most pervasive model selection problems in statistical applications. Often referred to as the problem of subset selection, it arises when one wants to model the relationship between a variable of interest and a subset of potential explanatory variables or predictors, but there is uncertainty about which subset to use. This vignette reviews some of the key developments which have led to the wide variety of approaches for this problem.

Hypothesis Testing and Model Selection Via Posterior Simulation

by Adrian E. Raftery - In Practical Markov Chain, 1995
Cited by 21 (1 self)
Introduction: To motivate the methods described in this chapter, consider the following inference problem in astronomy (Soubiran, 1993). Until fairly recently, it was believed that the Galaxy consists of two stellar populations, the disk and the halo. More recently, it has been hypothesized that there are in fact three stellar populations, the old (or thin) disk, the thick disk, and the halo, distinguished by their spatial distributions, their velocities, and their metallicities. These hypotheses have different implications for theories of the formation of the Galaxy. Some of the evidence for deciding whether there are two or three populations is shown in Figure 1, which shows radial and rotational velocities for $n = 2{,}370$ stars. A natural model for this situation is a mixture model with $J$ components, namely $y_i = \sum_{j=1}^{J} \rho_j \ldots$
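The mixture model can be made concrete by evaluating its log-likelihood for a given J. This sketch assumes bivariate normal components (one coordinate each for radial and rotational velocity), which the truncated text does not state; the function name and test values are hypothetical. Deciding between J = 2 and J = 3 would further require a marginal likelihood or Bayes factor for each J, which is not computed here.

```python
import numpy as np

def mixture_loglik(y, weights, means, covs):
    """Log-likelihood of data y (n x p) under a J-component normal mixture.

    weights[j] = rho_j (mixing proportions summing to 1); means[j] and
    covs[j] are the mean vector and covariance matrix of component j.
    """
    y = np.asarray(y, dtype=float)
    p = y.shape[1]
    log_terms = []
    for rho, mu, cov in zip(weights, means, covs):
        diff = y - np.asarray(mu, dtype=float)
        _, logdet = np.linalg.slogdet(cov)
        # Squared Mahalanobis distance of each row of y from mu.
        maha = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
        log_terms.append(np.log(rho) - 0.5 * (p * np.log(2 * np.pi) + logdet + maha))
    log_terms = np.stack(log_terms, axis=1)  # shape (n, J)
    # log sum_j rho_j N(y_i; mu_j, Sigma_j), computed stably for each i.
    top = log_terms.max(axis=1, keepdims=True)
    per_obs = top[:, 0] + np.log(np.exp(log_terms - top).sum(axis=1))
    return float(per_obs.sum())

# Hypothetical check: a one-component "mixture" is a single bivariate normal.
print(mixture_loglik([[0.0, 0.0]], [1.0], [[0.0, 0.0]], [np.eye(2)]), -np.log(2 * np.pi))
```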

Inference in model-based cluster analysis

by Halima Bensmail, Gilles Celeux, Adrian E. Raftery, Christian P. Robert, 1995
Cited by 21 (7 self)
A new approach to cluster analysis has been introduced based on parsimonious geometric modelling of the within-group covariance matrices in a mixture of multivariate normal distributions, using hierarchical agglomeration and iterative relocation. It works well and is widely used via the MCLUST software available in S-PLUS and StatLib. However, it has several limitations: there is no assessment of the uncertainty about the classification, the partition can be suboptimal, parameter estimates are biased, the shape matrix has to be specified by the user, prior group probabilities are assumed to be equal, the method for choosing the number of groups is based on a crude approximation, and no formal way of choosing between the various possible models is included. Here, we propose a new approach which overcomes all these difficulties. It consists of exact Bayesian inference via Gibbs sampling, and the calculation of Bayes factors (for choosing the model and the number of groups) from the output using the Laplace-Metropolis estimator. It works well in several real and simulated examples.

Bayesian Model Averaging in proportional hazard models: Assessing the risk of a stroke

by Chris T. Volinsky, David Madigan, Adrian E. Raftery, Richard A. Kronmal - Applied Statistics, 1997
Cited by 20 (5 self)
Evaluating the risk of stroke is important in reducing the incidence of this devastating disease. Here, we apply Bayesian model averaging to variable selection in Cox proportional hazard models in the context of the Cardiovascular Health Study, a comprehensive investigation into the risk factors for stroke. We introduce a technique based on the leaps and bounds algorithm which efficiently locates and fits the best models in the very large model space and thereby extends all subsets regression to Cox models. For each independent variable considered, the method provides the posterior probability that it belongs in the model. This is more directly interpretable than the corresponding P-values, and also more valid in that it takes account of model uncertainty. P-values from models preferred by stepwise methods tend to overstate the evidence for the predictive value of a variable. In our data Bayesian model averaging predictively outperforms standard model selection methods for assessing
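The posterior probability that a variable belongs in the model is the sum of the posterior probabilities of all models that include it. A minimal sketch, assuming the posterior model probabilities have already been computed (for example, over the models located by the leaps-and-bounds search); the function name, variable names, and probabilities are hypothetical.

```python
def inclusion_probabilities(model_probs):
    """Posterior probability that each variable belongs in the model.

    model_probs maps each model (a frozenset of variable names) to its
    posterior model probability; P(x in model | D) is the total
    probability of the models that contain x.
    """
    incl = {}
    for model, prob in model_probs.items():
        for var in model:
            incl[var] = incl.get(var, 0.0) + prob
    return incl

# Hypothetical posterior model probabilities for two risk factors.
result = inclusion_probabilities({
    frozenset({"age", "bp"}): 0.6,
    frozenset({"age"}): 0.3,
    frozenset(): 0.1,
})
print(result)
```

With these inputs, age has inclusion probability of about 0.9 and bp about 0.6. A variable that appears in no model is absent from the result, meaning its inclusion probability is 0.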

A method for simultaneous variable selection and outlier identification in linear regression

by Jennifer Hoeting, Adrian E. Raftery, David Madigan - Computational Statistics & Data Analysis, 1996
Cited by 18 (6 self)
Abstract not found