Results 11-20 of 95
The variable selection problem
Journal of the American Statistical Association, 2000
Cited by 39 (2 self)
The problem of variable selection is one of the most pervasive model selection problems in statistical applications. Often referred to as the problem of subset selection, it arises when one wants to model the relationship between a variable of interest and a subset of potential explanatory variables or predictors, but there is uncertainty about which subset to use. This vignette reviews some of the key developments which have led to the wide variety of approaches for this problem.
Accounting for Model Uncertainty in Survival Analysis Improves Predictive Performance
In Bayesian Statistics 5, 1995
Cited by 39 (12 self)
Survival analysis is concerned with finding models to predict the survival of patients or to assess the efficacy of a clinical treatment. A key part of the model-building process is the selection of the predictor variables. It is standard to use a stepwise procedure guided by a series of significance tests to select a single model, and then to make inference conditionally on the selected model. However, this ignores model uncertainty, which can be substantial. We review the standard Bayesian model averaging solution to this problem and extend it to survival analysis, introducing partial Bayes factors to do so for the Cox proportional hazards model. In two examples, taking account of model uncertainty enhances predictive performance, to an extent that could be clinically useful. 1 Introduction From 1974 to 1984 the Mayo Clinic conducted a double-blinded randomized clinical trial involving 312 patients to compare the drug DPCA with a placebo in the treatment of primary biliary cirrhosis...
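As a rough illustration of the model averaging idea in the abstract above, the sketch below turns log marginal likelihoods into posterior model probabilities and uses them to weight per-model predictions. It is not the paper's procedure (no partial Bayes factors or Cox models are involved); the function names and all numbers are hypothetical, and equal prior model probabilities are assumed unless others are supplied.

```python
import math


def posterior_model_probs(log_marginals, priors=None):
    """Posterior model probabilities from log marginal likelihoods.

    Assumes equal prior model probabilities when `priors` is None.
    """
    k = len(log_marginals)
    if priors is None:
        priors = [1.0 / k] * k
    log_post = [lm + math.log(p) for lm, p in zip(log_marginals, priors)]
    # Subtract the maximum before exponentiating to avoid underflow.
    top = max(log_post)
    unnorm = [math.exp(lp - top) for lp in log_post]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def bma_predict(model_predictions, probs):
    """Average per-model predictions, weighted by posterior model probability."""
    return sum(w * p for w, p in zip(probs, model_predictions))


# Hypothetical values for three candidate models (illustration only).
log_marginals = [-120.3, -121.0, -125.4]
predictions = [0.62, 0.55, 0.40]  # e.g. predicted survival probabilities

probs = posterior_model_probs(log_marginals)
averaged = bma_predict(predictions, probs)
print(probs, averaged)
```

Conditioning on a single selected model corresponds to putting all weight on one entry of `probs`; the averaged prediction instead reflects how strongly the data support each candidate.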
Bayes model averaging with selection of regressors
Journal of the Royal Statistical Society. Series B, Statistical Methodology, 2002
Cited by 33 (8 self)
Summary. When a number of distinct models contend for use in prediction, the choice of a single model can offer rather unstable predictions. In regression, stochastic search variable selection with Bayesian model averaging offers a cure for this robustness issue but at the expense of requiring very many predictors. Here we look at Bayes model averaging incorporating variable selection for prediction. This offers similar mean-square errors of prediction but with a vastly reduced predictor space. This can greatly aid the interpretation of the model. It also reduces the cost if measured variables have costs. The development here uses decision theory in the context of the multivariate general linear model. In passing, this reduced predictor space Bayes model averaging is contrasted with single-model approximations. A fast algorithm for updating regressions in the Markov chain Monte Carlo searches for posterior inference is developed, allowing many more variables than observations to be contemplated. We discuss the merits of absolute rather than proportionate shrinkage in regression, especially when there are more variables than observations. The methodology is illustrated on a set of spectroscopic data used for measuring the amounts of different sugars in an aqueous solution.
Statistical Themes and Lessons for Data Mining
1997
Cited by 32 (3 self)
Data mining is on the interface of Computer Science and Statistics, utilizing advances in both disciplines to make progress in extracting information from large databases. It is an emerging field that has attracted much attention in a very short period of time. This article highlights some statistical themes and lessons that are directly relevant to data mining and attempts to identify opportunities where close cooperation between the statistical and computational communities might reasonably provide synergy for further progress in data analysis.
MCMC Methods for Computing Bayes Factors: A Comparative Review
Journal of the American Statistical Association, 2000
Cited by 31 (1 self)
this paper we review several of these methods, and subsequently compare them in the context of two examples, the first a simple regression example, and the second a much more challenging hierarchical longitudinal model of the kind often encountered in biostatistical practice. We find that the joint model-parameter space search methods perform adequately but can be difficult to program and tune, while the marginal likelihood methods are often less troublesome and require less in the way of additional coding. Our results suggest that the latter methods may be most appropriate for practitioners working in many standard model choice settings, while the former remain important for comparing large numbers of models, or models whose parameters cannot be easily updated in relatively few blocks. We caution however that all of the methods we compare require significant human and computer effort, suggesting that less formal Bayesian model choice methods may offer a more realistic alternative in many cases.
Concerning Bayesian Motion Segmentation, Model Averaging, Matching and the Trifocal Tensor
In European Conference on Computer Vision, 1998
Cited by 26 (2 self)
Motion segmentation involves identifying regions of the image that correspond to independently moving objects. The number of independently moving objects, and the type of motion model for each of the objects, are unknown a priori. In order to perform motion segmentation, the problems of model selection, robust estimation and clustering must all be addressed simultaneously. Here we place the three problems into a common Bayesian framework, investigating the use of model averaging (representing a motion by a combination of models) as a principled way for motion segmentation of images. The final result is a fully automatic algorithm for clustering that works in the presence of noise and outliers. 1 Introduction Detection of independently moving objects is an essential but often neglected precursor to problems in computer vision e.g. efficient video compression [3], video editing, surveillance, smart tracking of objects etc. The work in this paper stems from the desire to develop a g...
A method for simultaneous variable selection and outlier identification in linear regression
Computational Statistics & Data Analysis, 1996
Inference for Deterministic Simulation Models: The Bayesian Melding Approach
Journal of the American Statistical Association, 2000
Cited by 25 (4 self)
Deterministic simulation models are used in many areas of science, engineering and policymaking. Typically, they are complex models that attempt to capture underlying mechanisms in considerable detail, and they have many user-specified inputs. The inputs are often specified by some form of trial-and-error approach in which plausible values are postulated, the corresponding outputs inspected, and the inputs modified until plausible outputs are obtained. Here we address the issue of more formal inference for such models. Raftery et al. (1995a) proposed the Bayesian synthesis approach in which the available information about both inputs and outputs was encoded in a probability distribution and inference was made by restricting this distribution to the submanifold specified by the model. Wolpert (1995) showed that this is subject to the Borel paradox, according to which the results can depend on the parameterization of the model. We show that this dependence is due to the presence of a prior on the outputs. We propose a modified approach, called Bayesian melding, which takes full account of information and uncertainty about both inputs and outputs to the model, while avoiding the Borel paradox. This is done by recognizing the existence of two priors, one implicit and one explicit, on each input and output; these are combined via logarithmic pooling. Bayesian melding is then
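The logarithmic pooling step mentioned in the abstract above can be sketched on a discrete grid: the pooled density is proportional to a weighted geometric mean of the two priors. This covers only the pooling operation, not the full melding procedure; the grid representation, the pooling weight `alpha`, the function name `log_pool`, and the example values are all assumptions made for illustration.

```python
def log_pool(p1, p2, alpha=0.5):
    """Logarithmic pool of two discrete densities on the same grid.

    The pooled density is proportional to p1**alpha * p2**(1 - alpha),
    renormalized to sum to one. `alpha` is an assumed pooling weight.
    """
    if len(p1) != len(p2):
        raise ValueError("both densities must be defined on the same grid")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    unnorm = [(a ** alpha) * (b ** (1.0 - alpha)) for a, b in zip(p1, p2)]
    total = sum(unnorm)
    if total == 0.0:
        raise ValueError("the two densities have no common support")
    return [u / total for u in unnorm]


# Hypothetical priors on a three-point grid (illustration only).
implicit_prior = [0.1, 0.6, 0.3]
explicit_prior = [0.3, 0.4, 0.3]
print(log_pool(implicit_prior, explicit_prior, alpha=0.5))
```

With `alpha=1.0` the pool returns the first density unchanged, and with `alpha=0.0` the second, so intermediate values trade off the two sources of prior information.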
Hypothesis Testing and Model Selection Via Posterior Simulation
In Practical Markov Chain, 1995
Cited by 24 (1 self)
Introduction To motivate the methods described in this chapter, consider the following inference problem in astronomy (Soubiran, 1993). Until fairly recently, it has been believed that the Galaxy consists of two stellar populations, the disk and the halo. More recently, it has been hypothesized that there are in fact three stellar populations, the old (or thin) disk, the thick disk, and the halo, distinguished by their spatial distributions, their velocities, and their metallicities. These hypotheses have different implications for theories of the formation of the Galaxy. Some of the evidence for deciding whether there are two or three populations is shown in Figure 1, which shows radial and rotational velocities for n = 2,370 stars. A natural model for this situation is a mixture model with J components, namely $y_i = \sum_{j=1}^{J} \rho_j \cdots$
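To make the idea of a J-component mixture concrete, here is a minimal sketch of evaluating a mixture density with weights rho_j. The excerpt breaks off before the component densities are specified, so univariate normal components are an assumption here, as are the function names and the parameter values; the actual data are bivariate velocities and are not reproduced.

```python
import math


def normal_pdf(y, mean, sd):
    """Density of a univariate normal distribution at y."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_pdf(y, weights, means, sds):
    """Density of a J-component mixture: sum over j of rho_j * f(y | theta_j).

    Normal components are assumed for illustration only.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to one")
    return sum(w * normal_pdf(y, m, s) for w, m, s in zip(weights, means, sds))


# Hypothetical two- and three-component mixtures (illustration only).
two = mixture_pdf(0.0, [0.7, 0.3], [0.0, 5.0], [1.0, 3.0])
three = mixture_pdf(0.0, [0.6, 0.3, 0.1], [0.0, 2.0, 5.0], [1.0, 1.5, 3.0])
print(two, three)
```

Comparing the two-population and three-population hypotheses then amounts to comparing models that differ in J.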
A Discussion of Parameter and Model Uncertainty in Insurance
Insurance: Mathematics and Economics, 2000
Cited by 22 (7 self)
In this paper we consider the process of modelling uncertainty. In particular we are concerned with making inferences about some quantity of interest which, at present, has been unobserved. Examples of such a quantity include the probability of ruin of a surplus process, the accumulation of an investment, the level of surplus or deficit in a pension fund and the future volume of new business in an insurance company. Uncertainty in this quantity of interest, y, arises from three sources: uncertainty due to the stochastic nature of a given model; uncertainty in the values of the parameters in a given model; and uncertainty in the model underlying what we are able to observe and determining the quantity of interest. It is common in actuarial science to find that the first source of uncertainty is the only one which receives rigorous attention. A limited amount of research in recent years has considered the effect of parameter uncertainty, while there is still considerable scope ...