Results 1–10 of 45
Spike and slab variable selection: frequentist and Bayesian strategies
 The Annals of Statistics
"... Variable selection in the linear regression model takes many apparent faces from both frequentist and Bayesian standpoints. In this paper we introduce a variable selection method referred to as a rescaled spike and slab model. We study the importance of prior hierarchical specifications and draw con ..."
Abstract

Cited by 44 (7 self)
 Add to MetaCart
Variable selection in the linear regression model takes many apparent faces from both frequentist and Bayesian standpoints. In this paper we introduce a variable selection method referred to as a rescaled spike and slab model. We study the importance of prior hierarchical specifications and draw connections to frequentist generalized ridge regression estimation. Specifically, we study the usefulness of continuous bimodal priors to model hypervariance parameters, and the effect scaling has on the posterior mean through its relationship to penalization. Several model selection strategies, some frequentist and some Bayesian in nature, are developed and studied theoretically. We demonstrate the importance of selective shrinkage for effective variable selection in terms of risk misclassification, and show this is achieved using the posterior from a rescaled spike and slab model. We also show how to verify a procedure’s ability to reduce model uncertainty in finite samples using a specialized forward selection strategy. Using this tool, we illustrate the effectiveness of rescaled spike and slab models in reducing model uncertainty.
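The mixture prior at the heart of this line of work can be sketched in a few lines. The snippet below is a minimal illustration of a classical (non-rescaled) spike-and-slab prior, not the paper's model: a coefficient is drawn either from a narrow "spike" at zero or from a wide "slab", and for a single coefficient observed with Gaussian noise the posterior slab probability has a closed form. All names and numbers (`w`, `v0`, `v1`, `sigma2`) are illustrative assumptions.

```python
import math

def normal_pdf(x, var):
    """Density of N(0, var) at x."""
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def slab_probability(beta_hat, w=0.5, v0=1e-4, v1=10.0, sigma2=1.0):
    """Posterior probability that beta_hat came from the slab component.

    Prior: beta ~ (1 - w) N(0, v0) + w N(0, v1), with observation
    beta_hat ~ N(beta, sigma2).  Marginally, beta_hat is a mixture of
    N(0, v0 + sigma2) and N(0, v1 + sigma2), so Bayes' rule is one line.
    """
    slab = w * normal_pdf(beta_hat, v1 + sigma2)
    spike = (1.0 - w) * normal_pdf(beta_hat, v0 + sigma2)
    return slab / (slab + spike)
```

Estimates near zero are attributed to the spike and shrunk hard, while large estimates are attributed to the slab and left nearly untouched; this is the selective shrinkage the abstract refers to.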
The variable selection problem
 Journal of the American Statistical Association
, 2000
"... The problem of variable selection is one of the most pervasive model selection problems in statistical applications. Often referred to as the problem of subset selection, it arises when one wants to model the relationship between a variable of interest and a subset of potential explanatory variables ..."
Abstract

Cited by 44 (3 self)
 Add to MetaCart
The problem of variable selection is one of the most pervasive model selection problems in statistical applications. Often referred to as the problem of subset selection, it arises when one wants to model the relationship between a variable of interest and a subset of potential explanatory variables or predictors, but there is uncertainty about which subset to use. This vignette reviews some of the key developments which have led to the wide variety of approaches for this problem.
Model selection by resampling penalization
, 2007
"... We present a new family of model selection algorithms based on the resampling heuristics. It can be used in several frameworks, do not require any knowledge about the unknown law of the data, and may be seen as a generalization of local Rademacher complexities and Vfold crossvalidation. In the cas ..."
Abstract

Cited by 20 (11 self)
 Add to MetaCart
We present a new family of model selection algorithms based on the resampling heuristic. It can be used in several frameworks, does not require any knowledge of the unknown distribution of the data, and may be seen as a generalization of both local Rademacher complexities and V-fold cross-validation. In the case example of least-squares regression on histograms, we prove oracle inequalities and show that these algorithms adapt naturally to both the smoothness of the regression function and the variability of the noise level. Then, interpreting V-fold cross-validation in terms of penalization, we shed light on the question of choosing V. Finally, a simulation study illustrates the strength of resampling penalization algorithms against some classical ones, in particular with heteroscedastic data.
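For reference, the V-fold cross-validation that this family generalizes fits in a few lines. The sketch below (a toy comparison of a constant fit and a least-squares line; the data and names are our own) selects the candidate with the smaller held-out squared error:

```python
import random

def fit_constant(xs, ys):
    """Degree-0 fit: predict the training mean everywhere."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Ordinary least-squares line through (xs, ys)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return lambda x: a + b * x

def vfold_cv(xs, ys, fitter, V=5):
    """Mean squared held-out error over V round-robin folds."""
    total, n = 0.0, len(xs)
    for j in range(V):
        train = [i for i in range(n) if i % V != j]
        valid = [i for i in range(n) if i % V == j]
        f = fitter([xs[i] for i in train], [ys[i] for i in train])
        total += sum((ys[i] - f(xs[i])) ** 2 for i in valid)
    return total / n

rng = random.Random(0)
xs = [i / 20.0 for i in range(40)]
ys = [2.0 * x + rng.gauss(0.0, 0.1) for x in xs]
scores = {"constant": vfold_cv(xs, ys, fit_constant),
          "line": vfold_cv(xs, ys, fit_line)}
best = min(scores, key=scores.get)
```

On this linear data the line wins; the paper's point is that the implicit penalty behind such a procedure can be replaced by an explicit resampling-based one.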
Consistency of cross-validation for comparing regression procedures
 The Annals of Statistics (accepted)
"... Theoretical developments on cross validation (CV) have mainly focused on selecting one among a list of finitedimensional models (e.g., subset or order selection in linear regression) or selecting a smoothing parameter (e.g., bandwidth for kernel smoothing). However, little is known about consistenc ..."
Abstract

Cited by 17 (2 self)
 Add to MetaCart
(Show Context)
Theoretical developments on cross-validation (CV) have mainly focused on selecting one among a list of finite-dimensional models (e.g., subset or order selection in linear regression) or selecting a smoothing parameter (e.g., the bandwidth for kernel smoothing). However, little is known about the consistency of cross-validation when it is applied to comparisons between parametric and nonparametric methods, or within nonparametric methods. We show that under some conditions, with an appropriate choice of data splitting ratio, cross-validation is consistent in the sense of selecting the better procedure with probability approaching 1. Our results reveal interesting behavior of cross-validation. When comparing two models (procedures) converging at the same nonparametric rate, in contrast to the parametric case, it turns out that the proportion of data used for evaluation in CV does not need to be dominant in size. Furthermore, it can even be of a smaller order than the proportion used for estimation without affecting the consistency property.
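A minimal sketch of the phenomenon (our own toy construction, not the paper's setting): a parametric linear fit is compared with a k-nearest-neighbor smoother using a single data split in which the evaluation portion is the smaller one. On nonlinear data the nonparametric procedure is favored even though evaluation does not dominate.

```python
import math, random

def fit_linear(xs, ys):
    """Ordinary least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return lambda x: a + b * x

def knn_predict(xs_tr, ys_tr, x, k=3):
    """Mean response of the k nearest training points."""
    nbrs = sorted(range(len(xs_tr)), key=lambda i: abs(xs_tr[i] - x))[:k]
    return sum(ys_tr[i] for i in nbrs) / k

rng = random.Random(1)
n = 200
xs = [rng.uniform(0.0, 2.0) for _ in range(n)]
ys = [math.sin(3.0 * x) + rng.gauss(0.0, 0.05) for x in xs]

idx = list(range(n))
rng.shuffle(idx)
tr, ev = idx[:150], idx[150:]   # evaluation gets the *smaller* portion
xt = [xs[i] for i in tr]
yt = [ys[i] for i in tr]
lin = fit_linear(xt, yt)
mse_lin = sum((ys[i] - lin(xs[i])) ** 2 for i in ev) / len(ev)
mse_knn = sum((ys[i] - knn_predict(xt, yt, xs[i])) ** 2 for i in ev) / len(ev)
```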
Model Selection
 In The Handbook Of Financial Time Series
, 2008
"... Model selection has become an ubiquitous statistical activity in the last decades, none the least due to the computational ease with which many statistical models can be fitted to data with the help of modern computing equipment. In this article we provide an introduction to the statistical aspect ..."
Abstract

Cited by 11 (0 self)
 Add to MetaCart
Model selection has become a ubiquitous statistical activity in recent decades, not least because of the computational ease with which many statistical models can be fitted to data on modern computing equipment. In this article we provide an introduction to the statistical aspects and implications of model selection and we review the relevant literature.

1.1 A General Formulation. When modeling data Y, a researcher often has available a menu of competing candidate models which could be used to describe the data. Let M denote the collection of these candidate models. Each model M, i.e., each element of M, can, from a mathematical point of view, be viewed as a collection of probability distributions for Y implied by the model. That is, M is given by M = {Pη : η ∈ H}, where Pη denotes a probability distribution for Y and H represents the 'parameter' space (which can differ across models M). The 'parameter' space H need not be finite-dimensional. Often, the 'parameter' η will be partitioned into (η1, η2), where η1 is a finite-dimensional parameter and η2 is infinite-dimensional. In case the parameterization is identified, i.e., the map η → Pη is injective on H, we will often not distinguish between M and H and will use them synonymously. The model selection problem is now to select, based on the data Y, a model M̂ = M̂(Y) in M such that M̂ is a 'good' model for the data Y. Of course, the sense in which the selected model should be a 'good' model needs to be made precise and is a crucial point in the analysis. This is particularly important if, as is usually the case, selecting the model M̂ is not the final
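To make the general formulation concrete, here is a schematic sketch (ours, not the chapter's): two candidate models for a sample Y, each a family of distributions indexed by its own parameter space, compared by maximized log-likelihood with a BIC-style dimension penalty standing in for one possible notion of a 'good' model.

```python
import math

def gaussian_loglik(ys, mu, var=1.0):
    """Log-likelihood of ys under N(mu, var)."""
    n = len(ys)
    return (-0.5 * n * math.log(2.0 * math.pi * var)
            - sum((y - mu) ** 2 for y in ys) / (2.0 * var))

def select_model(ys):
    """BIC-style choice between M1 = {N(0,1)} and M2 = {N(mu,1): mu real}.

    M1 has an empty parameter space (0 free parameters), M2 has one.
    """
    n = len(ys)
    mu_hat = sum(ys) / n  # maximum-likelihood eta for M2
    bic = {
        "M1": -2.0 * gaussian_loglik(ys, 0.0) + 0 * math.log(n),
        "M2": -2.0 * gaussian_loglik(ys, mu_hat) + 1 * math.log(n),
    }
    return min(bic, key=bic.get), bic

ys = [1.8, 2.2, 1.9, 2.1, 2.0, 2.3, 1.7, 2.0]
chosen, bic = select_model(ys)
```

Here the data are centered near 2, so the richer model M2 is selected despite its penalty.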
Selecting Hidden Markov Model State Number with Cross-Validated Likelihood
 Computational Statistics
"... Abstract: The problem of estimating the number of hidden states in a hidden Markov model is considered. Emphasis is placed on crossvalidated likelihood criteria. Using crossvalidation to assess the number of hidden states allows to circumvent the well documented ..."
Abstract

Cited by 9 (1 self)
 Add to MetaCart
The problem of estimating the number of hidden states in a hidden Markov model is considered. Emphasis is placed on cross-validated likelihood criteria. Using cross-validation to assess the number of hidden states allows one to circumvent the well documented
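As a simplified stand-in for the HMM setting (the state-number machinery is in the paper itself), the snippet below uses cross-validated log-likelihood to pick the complexity of a much simpler density model, a smoothed histogram on [0, 1). The add-one smoothing, bin range, and data are our illustrative assumptions.

```python
import math, random

def cv_loglik(data, n_bins, V=5):
    """V-fold cross-validated log-likelihood of a smoothed histogram on [0, 1)."""
    n = len(data)
    total = 0.0
    for j in range(V):
        train = [x for i, x in enumerate(data) if i % V != j]
        valid = [x for i, x in enumerate(data) if i % V == j]
        counts = [1.0] * n_bins  # add-one smoothing: no zero-density bins
        for x in train:
            counts[min(int(x * n_bins), n_bins - 1)] += 1.0
        mass = sum(counts)
        width = 1.0 / n_bins
        for x in valid:
            dens = counts[min(int(x * n_bins), n_bins - 1)] / (mass * width)
            total += math.log(dens)
    return total

rng = random.Random(2)
data = ([rng.gauss(0.2, 0.03) % 1.0 for _ in range(100)]
        + [rng.gauss(0.8, 0.03) % 1.0 for _ in range(100)])
best = max(range(1, 31), key=lambda k: cv_loglik(data, k))
```

On this bimodal sample the criterion rejects the one-bin (uniform) model, mirroring how held-out likelihood rejects too few hidden states.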
Comparing Learning Methods for Classification
, 2006
"... We address the consistency property of cross validation (CV) for classification. Sufficient conditions are obtained on the data splitting ratio to ensure that the better classifier between two candidates will be favored by CV with probability approaching 1. Interestingly, it turns out that for compa ..."
Abstract

Cited by 8 (0 self)
 Add to MetaCart
We address the consistency property of cross-validation (CV) for classification. Sufficient conditions are obtained on the data splitting ratio to ensure that the better classifier between two candidates will be favored by CV with probability approaching 1. Interestingly, it turns out that for comparing two general learning methods, the ratio of the training sample size to the evaluation size does not have to approach 0 for consistency in selection, as is required for comparing parametric regression models (Shao (1993)). In fact, the ratio may be allowed to converge to infinity or any positive constant, depending on the situation. In addition, we also discuss confidence intervals and sequential instability in selection for comparing classifiers. Key words and phrases: Classification, comparing learning methods, consistency in selection, cross-validation paradox, sequential instability.
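A minimal illustration of the setup (our own toy, not the paper's experiments): a 1-nearest-neighbor classifier is compared with a trivial majority-class rule on a single held-out split, here with evaluation taking half the data, a ratio the parametric theory would not permit.

```python
import random

def knn1(train_pts, train_labels, p):
    """1-nearest-neighbor label by squared Euclidean distance."""
    best = min(range(len(train_pts)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(train_pts[i], p)))
    return train_labels[best]

def majority(train_labels):
    """Predict the most frequent training label everywhere."""
    return max(set(train_labels), key=train_labels.count)

rng = random.Random(3)
pts, labels = [], []
for _ in range(100):
    c = rng.randint(0, 1)  # class 0 near (0,0), class 1 near (1,1)
    pts.append((c + rng.gauss(0.0, 0.2), c + rng.gauss(0.0, 0.2)))
    labels.append(c)

idx = list(range(100))
rng.shuffle(idx)
tr, ev = idx[:50], idx[50:]
tp = [pts[i] for i in tr]
tl = [labels[i] for i in tr]
acc_knn = sum(knn1(tp, tl, pts[i]) == labels[i] for i in ev) / len(ev)
maj = majority(tl)
acc_maj = sum(maj == labels[i] for i in ev) / len(ev)
```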
Density estimation via cross-validation: Model selection point of view
, 2009
"... The problem of model selection by crossvalidation is addressed in the density estimation framework. Extensively used in practice, crossvalidation (CV) remains poorly understood, especially in the nonasymptotic setting which is the main concern of this work. A recurrent problem with CV is the comp ..."
Abstract

Cited by 4 (2 self)
 Add to MetaCart
(Show Context)
The problem of model selection by cross-validation is addressed in the density estimation framework. Extensively used in practice, cross-validation (CV) remains poorly understood, especially in the non-asymptotic setting, which is the main concern of this work. A recurrent problem with CV is the computation time it involves. This drawback is overcome here thanks to closed-form expressions for the CV estimator of the risk for a broad class of widespread estimators: projection estimators. In order to shed new light on CV procedures with respect to the cardinality p of the test set, the CV estimator is interpreted as a penalized criterion with a random penalty. For instance, the amount of penalization is shown to increase with p. A theoretical assessment of CV performance is carried out thanks to two oracle inequalities, applying to bounded and square-integrable densities respectively. For several collections of models, adaptivity results with respect to Hölder and Besov spaces are derived as well.
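The closed-form phenomenon is easy to check for the simplest projection estimator, the histogram. Below, the leave-one-out least-squares risk estimate computed by explicitly removing each point agrees with a classical closed-form expression; this is the textbook histogram case, not the paper's general formula.

```python
import random

def loo_risk_brute(data, n_bins):
    """Leave-one-out least-squares risk of a histogram density on [0, 1):
    integral of fhat^2 minus (2/n) * sum_i fhat_{-i}(x_i), refit for each i."""
    n, h = len(data), 1.0 / n_bins
    counts = [0] * n_bins
    for x in data:
        counts[min(int(x * n_bins), n_bins - 1)] += 1
    int_f2 = sum(c * c for c in counts) / (n * n * h)
    cross = 0.0
    for x in data:
        j = min(int(x * n_bins), n_bins - 1)
        # Removing x only lowers its own bin count: this IS the refit.
        cross += (counts[j] - 1) / ((n - 1) * h)
    return int_f2 - 2.0 * cross / n

def loo_risk_closed(data, n_bins):
    """Closed form: 2/((n-1)h) - ((n+1)/((n-1)h)) * sum_j phat_j^2."""
    n, h = len(data), 1.0 / n_bins
    counts = [0] * n_bins
    for x in data:
        counts[min(int(x * n_bins), n_bins - 1)] += 1
    s = sum((c / n) ** 2 for c in counts)
    return 2.0 / ((n - 1) * h) - (n + 1) / ((n - 1) * h) * s

rng = random.Random(4)
data = [rng.random() for _ in range(200)]
```

The two functions coincide up to floating-point error, so the n refits never have to be carried out explicitly.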
A new approach to fitting linear models in high dimensional spaces
, 2000
"... This thesis presents a new approach to fitting linear models, called “pace regression”, which also overcomes the dimensionality determination problem. Its optimality in minimizing the expected prediction loss is theoretically established, when the number of free parameters is infinitely large. In th ..."
Abstract

Cited by 4 (0 self)
 Add to MetaCart
This thesis presents a new approach to fitting linear models, called "pace regression", which also overcomes the dimensionality determination problem. Its optimality in minimizing the expected prediction loss is established theoretically when the number of free parameters is infinitely large. In this sense, pace regression outperforms existing procedures for fitting linear models. Dimensionality determination, a special case of fitting linear models, turns out to be a natural by-product. A range of simulation studies is conducted; the results support the theoretical analysis. Throughout the thesis, a deeper understanding is gained of the problem of fitting linear models, and many key issues are discussed. Existing procedures, namely OLS, AIC, BIC, RIC, CIC, CV(d), BS(m), RIDGE, NN-GAROTTE and LASSO, are reviewed and compared, both theoretically and empirically, with the new methods. Estimating a mixing distribution is an indispensable part of pace regression. A measure-based minimum distance approach, including probability measures and nonnegative measures, is proposed, and strongly consistent estimators are produced. Of all minimum distance methods for estimating a mixing distribution, only the
V-FOLD CROSS-VALIDATION IMPROVED: V-FOLD PENALIZATION
 SUBMITTED TO THE ANNALS OF STATISTICS
, 2008
"... We study the efficiency of Vfold crossvalidation (VFCV) for model selection from the nonasymptotic viewpoint, and suggest an improvement on it, which we call “Vfold penalization”. Considering a particular (though simple) regression problem, we prove that VFCV with a bounded V is suboptimal for m ..."
Abstract

Cited by 4 (2 self)
 Add to MetaCart
We study the efficiency of V-fold cross-validation (VFCV) for model selection from the non-asymptotic viewpoint, and suggest an improvement on it, which we call "V-fold penalization". Considering a particular (though simple) regression problem, we prove that VFCV with a bounded V is suboptimal for model selection, because it "overpenalizes", all the more so when V is small. Hence, asymptotic optimality requires V to go to infinity. However, when the signal-to-noise ratio is low, it appears that overpenalizing is necessary, so that the optimal V is not always the largest one, despite the variability issue. This is confirmed by some simulated data. In order to improve on the prediction performance of VFCV, we define a new model selection procedure, called "V-fold penalization" (penVF). It is a V-fold subsampling version of Efron's bootstrap penalties, so it has the same computational cost as VFCV while being more flexible. In a heteroscedastic regression framework, assuming the models to have a particular structure, we prove that penVF satisfies a non-asymptotic oracle inequality with a leading constant that tends to 1 as the sample size goes to infinity. In particular, this implies adaptivity to the smoothness of the regression function, even with highly heteroscedastic noise. Moreover, it is easy to overpenalize with penVF, independently of the V parameter. A simulation study shows that this results in a significant improvement on VFCV in non-asymptotic situations.
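A schematic of the penalization idea as we read the abstract (a regressogram toy of our own; the paper's constants and framework differ): each model's full-sample empirical risk is penalized by (V-1)/V times the summed gap, over the V training blocks, between the fold-fitted estimator's risk on the full sample and on its own training block.

```python
import math, random

def regressogram(xs, ys, n_bins):
    """Piecewise-constant least-squares fit on [0, 1): bin-wise means."""
    sums = [0.0] * n_bins
    cnts = [0] * n_bins
    for x, y in zip(xs, ys):
        j = min(int(x * n_bins), n_bins - 1)
        sums[j] += y
        cnts[j] += 1
    means = [s / c if c else 0.0 for s, c in zip(sums, cnts)]
    return lambda x: means[min(int(x * n_bins), n_bins - 1)]

def mse(f, xs, ys):
    return sum((y - f(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

def penVF_criterion(xs, ys, n_bins, V=5):
    """Empirical risk plus a V-fold resampling penalty (our sketch of penVF)."""
    n = len(xs)
    pen = 0.0
    for j in range(V):
        tr = [i for i in range(n) if i % V != j]
        xt = [xs[i] for i in tr]
        yt = [ys[i] for i in tr]
        f = regressogram(xt, yt, n_bins)
        pen += mse(f, xs, ys) - mse(f, xt, yt)
    pen *= (V - 1) / V
    f_full = regressogram(xs, ys, n_bins)
    return mse(f_full, xs, ys) + pen

rng = random.Random(5)
xs = [rng.random() for _ in range(200)]
ys = [math.sin(6.28 * x) + rng.gauss(0.0, 0.3) for x in xs]
crit = {m: penVF_criterion(xs, ys, m) for m in (1, 2, 4, 8, 16, 32)}
m_hat = min(crit, key=crit.get)
```

Unlike VFCV, the penalty constant (V - 1 here) can be inflated at will to overpenalize without changing V, which is the flexibility the abstract emphasizes.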