A note on the LASSO and related procedures in model selection
- STATISTICA SINICA, 2004
Cited by 28 (5 self)
The Lasso, the Forward Stagewise regression and the Lars are closely related procedures recently proposed for linear regression problems. Each of them can produce sparse models and can be used both for estimation and variable selection. In practical implementations these algorithms are typically tuned to achieve optimal prediction accuracy. We show that, when the prediction accuracy is used as the criterion to choose the tuning parameter, in general these procedures are not consistent in terms of variable selection. That is, the sets of variables selected are not consistent at finding the true set of important variables. In particular, we show that for any sample size n, when there are superfluous variables in the linear regression model and the design matrix is orthogonal, the probability of the procedures correctly identifying the true set of important variables is less than a constant (smaller than one) not depending on n. This result is also shown to hold for two dimensional problems with general correlated design matrices. The results indicate that in problems where
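As an illustration of the orthogonal-design setting mentioned in this abstract (not the paper's own argument), the sketch below assumes an orthonormal design and the objective (1/2)*||y - X b||^2 + lam*||b||_1, for which the Lasso solution is the soft-thresholded least-squares coefficient. The function names and the example coefficients are hypothetical. It shows how the tuning parameter alone decides which variables are selected.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_orthonormal(ols_coefs, lam):
    """Lasso coefficients under an ASSUMED orthonormal design.

    ols_coefs are the least-squares coefficients (x_j' y for each column x_j).
    """
    return [soft_threshold(z, lam) for z in ols_coefs]


def selected(coefs):
    """Indices of the variables with a nonzero coefficient."""
    return [j for j, b in enumerate(coefs) if b != 0.0]


# Hypothetical least-squares coefficients: two strong effects, two near zero.
ols = [3.0, -2.0, 0.3, -0.1]
small_lam = selected(lasso_orthonormal(ols, 0.2))
large_lam = selected(lasso_orthonormal(ols, 0.5))
```

With the smaller tuning parameter the third variable, whose coefficient is small, stays in the model; with the larger one only the first two variables remain.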
The variable selection problem
- Journal of the American Statistical Association, 2000
Cited by 25 (1 self)
The problem of variable selection is one of the most pervasive model selection problems in statistical applications. Often referred to as the problem of subset selection, it arises when one wants to model the relationship between a variable of interest and a subset of potential explanatory variables or predictors, but there is uncertainty about which subset to use. This vignette reviews some of the key developments which have led to the wide variety of approaches for this problem.
Spike and slab variable selection: frequentist and bayesian strategies
- The Annals of Statistics
Cited by 24 (5 self)
Variable selection in the linear regression model takes many apparent faces from both frequentist and Bayesian standpoints. In this paper we introduce a variable selection method referred to as a rescaled spike and slab model. We study the importance of prior hierarchical specifications and draw connections to frequentist generalized ridge regression estimation. Specifically, we study the usefulness of continuous bimodal priors to model hypervariance parameters, and the effect scaling has on the posterior mean through its relationship to penalization. Several model selection strategies, some frequentist and some Bayesian in nature, are developed and studied theoretically. We demonstrate the importance of selective shrinkage for effective variable selection in terms of risk misclassification, and show this is achieved using the posterior from a rescaled spike and slab model. We also show how to verify a procedure’s ability to reduce model uncertainty in finite samples using a specialized forward selection strategy. Using this tool, we illustrate the effectiveness of rescaled spike and slab models in reducing model uncertainty.
Bayesian Statistics
- in WWW', Computing Science and Statistics, 1989
Cited by 13 (0 self)
This dissertation presents two topics from opposite disciplines: one is from a parametric realm and the other is based on nonparametric methods. The first topic is a jackknife maximum likelihood approach to statistical model selection and the second one is a convex hull peeling depth approach to nonparametric massive multivariate data analysis. The second topic includes simulations and applications on massive astronomical data. First, we present a model selection criterion, minimizing the Kullback-Leibler distance by using the jackknife method. Various model selection methods have been developed to choose a model of minimum Kullback-Leibler distance to the true model, such as Akaike information criterion (AIC), Bayesian information criterion (BIC), minimum description length (MDL), and bootstrap information criterion. Likewise, the jackknife method chooses a model of minimum Kullback-Leibler distance through bias reduction. This bias, which is inevitable in model
Identification of nonlinear additive autoregressive models
- B, 2004
Cited by 10 (6 self)
Summary. We propose a lag selection method for non-linear additive autoregressive models that is based on spline estimation and the Bayes information criterion. The additive structure of the autoregression function is used to overcome the ‘curse of dimensionality’, whereas the spline estimators effectively take into account such a structure in estimation. A stepwise procedure is suggested to implement the method proposed. A comprehensive Monte Carlo study demonstrates good performance of the method proposed and a substantial computational advantage over existing local-polynomial-based methods. Consistency of the lag selection method based on the Bayes information criterion is established under the assumption that the observations are from a stochastic process that is strictly stationary and strongly mixing, which provides the first theoretical result of this kind for spline smoothing of weakly dependent data.
Identifying Quantitative Trait Loci in Experimental Crosses
- 1997
Cited by 7 (1 self)
Identifying quantitative trait loci in experimental crosses, by Karl William Broman. Doctor of Philosophy in Statistics, University of California, Berkeley. Professor Terence P. Speed, Chair. Identifying the genetic loci responsible for variation in traits which are quantitative in nature (such as the yield from an agricultural crop or the number of abdominal bristles on a fruit fly) is a problem of great importance to biologists. The number and effects of such loci help us to understand the biochemical basis of these traits, and of their evolution in populations over time. Moreover, knowledge of these loci may aid in designing selection experiments to improve the traits. We focus on data from a large experimental cross. The usual methods for analyzing such data use multiple tests of hypotheses. We feel the problem is best viewed as one of model selection. After a brief review of the major methods in this area, we discuss the use of model selection to identify quantitative trait loci. Forwa...
Model Selection With Data-Oriented Penalty
- 1999
Cited by 2 (0 self)
We consider the problem of model (or variable) selection in the classical regression model using the GIC (general information criterion). In this method the maximum likelihood is used with a penalty function denoted by C_n, depending on the sample size n and chosen to ensure consistency in the selection of the true model. There are various choices of C_n suggested in the literature on model selection. In this paper we show that a particular choice of C_n based on observed data, which makes it random, preserves the consistency property and provides improved performance over a fixed choice of C_n. © 1999 Elsevier Science B.V. All rights reserved.
Prediction/estimation With Simple Linear Models: Is It Really That Simple?
- 2004
Cited by 2 (0 self)
Consider the simple normal linear regression model for estimation/prediction at a new design point. When the slope parameter is not obviously nonzero, hypothesis testing and model selection methods can be used for identifying the right model. We compare performance of such methods both theoretically and empirically from different perspectives for more insight. The testing approach, in spite of being the "standard approach", performs poorly. We also found that the frequently told story "BIC is good when the true model is finite-dimensional and AIC is good when the true model is infinite-dimensional" is far from being accurate. In addition, despite some successes in the effort to go beyond the debate between AIC and BIC by adaptive model selection, it turns out that it is not possible to share the most essential properties of them by any model selection method. When model selection methods have difficulty in selection, model combining is seen to be a better alternative.
Pace Regression
- 1999
Cited by 1 (0 self)
This paper articulates a new method of linear regression, "pace regression," that addresses many drawbacks of standard regression reported in the literature, particularly the subset selection problem. Pace regression improves on classical ordinary least squares (OLS) regression by evaluating the effect of each variable and using a clustering analysis to improve the statistical basis for estimating their contribution to the overall regression. As well as outperforming OLS, it also outperforms (in a remarkably general sense) other linear modeling techniques in the literature, including subset selection procedures, which seek a reduction in dimensionality that falls out as a natural byproduct of pace regression. The paper defines six procedures that share the fundamental idea of pace regression, all of which are theoretically justified in terms of asymptotic performance. Experiments confirm the performance improvement over other techniques. Keywords: Linear regression; subset model sele...
Asymptotics For The Gic In Model Selection
It is known that the C_p method, which selects a model by minimizing the sum of squared residuals plus 2 times the model dimension, is asymptotically valid only when there is no fixed-dimension correct model, and that the GIC method, which selects a model by minimizing the sum of squared residuals plus λ times the model dimension, is asymptotically valid when there are fixed-dimension correct models in the class of models to be selected. However, the behavior of the GIC is not clear when there is no fixed-dimension correct model. Also, when there are fixed-dimension correct models, how to choose λ is still an unsolved problem. In the first part of this paper we provide an asymptotic justification for the GIC in the case where there is no fixed-dimension correct model, using a loss function different from the customary squared error loss. The second part of this paper contains a result showing that λ in the GIC should be chosen as a function of the signal-noise ratio if the signal-noise ra...
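The selection rule described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's procedure: it takes the criterion literally as the residual sum of squares plus a penalty factor times the model dimension (the noise variance is assumed to be scaled to 1), and the function name, the `lam` parameter name, and the example numbers are hypothetical. Setting the penalty factor to 2 gives the C_p rule as worded in the abstract.

```python
def gic_select(rss_by_dim, lam):
    """Return the model dimension minimizing RSS + lam * dimension.

    rss_by_dim maps each candidate model's dimension to its residual sum
    of squares; lam is the penalty factor (lam = 2 gives the C_p rule as
    stated in the abstract, under the unit-variance assumption).
    """
    scores = {k: rss + lam * k for k, rss in rss_by_dim.items()}
    best_dim = min(scores, key=scores.get)
    return best_dim, scores


# Hypothetical residual sums of squares for four nested candidate models.
rss = {1: 4.26, 2: 0.26, 3: 0.01, 4: 0.0}
dim_cp, _ = gic_select(rss, 2.0)       # C_p-style penalty
dim_unpen, _ = gic_select(rss, 0.0)    # no penalty: the largest model wins
```

A larger penalty factor favors smaller models, which is why the choice of the factor matters for whether the selected model is the correct one.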

