Results 1–10 of 14
A note on the LASSO and related procedures in model selection
 Statistica Sinica
, 2004
Abstract

Cited by 47 (6 self)
The Lasso, the Forward Stagewise regression and the Lars are closely related procedures recently proposed for linear regression problems. Each of them can produce sparse models and can be used both for estimation and variable selection. In practical implementations these algorithms are typically tuned to achieve optimal prediction accuracy. We show that, when the prediction accuracy is used as the criterion to choose the tuning parameter, in general these procedures are not consistent in terms of variable selection. That is, the sets of variables selected are not consistent at finding the true set of important variables. In particular, we show that for any sample size n, when there are superfluous variables in the linear regression model and the design matrix is orthogonal, the probability of the procedures correctly identifying the true set of important variables is less than a constant (smaller than one) not depending on n. This result is also shown to hold for two dimensional problems with general correlated design matrices. The results indicate that in problems where
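The orthogonal-design case described above is easy to reproduce numerically: for an orthonormal design the lasso has a closed-form soft-thresholding solution, so one can watch the selected variable set change with the tuning parameter. A minimal sketch of that mechanism (the design, coefficients, and threshold grid are illustrative choices, not taken from the paper):

```python
import numpy as np

def soft_threshold(b, lam):
    """Lasso solution for orthonormal designs: shrink OLS coefficients toward 0."""
    return np.sign(b) * np.maximum(np.abs(b) - lam, 0.0)

rng = np.random.default_rng(0)
n, p = 100, 6
beta_true = np.array([3.0, 1.5, 0.0, 0.0, 0.0, 0.0])  # last four are superfluous

# Orthonormal design columns (Q from a QR decomposition), scaled so X'X = n*I
Q, _ = np.linalg.qr(rng.normal(size=(n, p)))
X = np.sqrt(n) * Q
y = X @ beta_true + rng.normal(size=n)

b_ols = X.T @ y / n  # OLS coefficients under orthogonality
for lam in [0.0, 0.1, 0.5, 1.0]:
    b = soft_threshold(b_ols, lam)
    print(lam, np.nonzero(b)[0])  # selected set varies with the tuning parameter
```

Tuning `lam` for held-out prediction error tends to keep it small, which is exactly the regime in which noise coefficients survive the threshold.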
Spike and slab variable selection: frequentist and Bayesian strategies
 The Annals of Statistics
Abstract

Cited by 40 (7 self)
Variable selection in the linear regression model takes many apparent faces from both frequentist and Bayesian standpoints. In this paper we introduce a variable selection method referred to as a rescaled spike and slab model. We study the importance of prior hierarchical specifications and draw connections to frequentist generalized ridge regression estimation. Specifically, we study the usefulness of continuous bimodal priors to model hypervariance parameters, and the effect scaling has on the posterior mean through its relationship to penalization. Several model selection strategies, some frequentist and some Bayesian in nature, are developed and studied theoretically. We demonstrate the importance of selective shrinkage for effective variable selection in terms of risk misclassification, and show this is achieved using the posterior from a rescaled spike and slab model. We also show how to verify a procedure’s ability to reduce model uncertainty in finite samples using a specialized forward selection strategy. Using this tool, we illustrate the effectiveness of rescaled spike and slab models in reducing model uncertainty.
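For readers unfamiliar with the prior, a toy draw from a basic (unrescaled) spike-and-slab mixture shows the selective-shrinkage idea: an inclusion indicator places each coefficient either exactly at zero or under a diffuse slab. The weight and slab scale below are arbitrary illustrations, not the paper's rescaled specification:

```python
import numpy as np

def spike_slab_draw(p, w=0.5, slab_sd=5.0, rng=None):
    """Draw p coefficients from a spike-and-slab mixture prior:
    with prob. 1-w the coefficient is exactly 0 (spike),
    with prob. w it comes from a wide normal (slab)."""
    rng = rng if rng is not None else np.random.default_rng()
    gamma = rng.random(p) < w                               # inclusion indicators
    beta = np.where(gamma, rng.normal(0.0, slab_sd, p), 0.0)
    return gamma, beta

rng = np.random.default_rng(1)
gamma, beta = spike_slab_draw(10, w=0.3, rng=rng)
print(gamma.sum(), beta)  # only the included coordinates are nonzero
```

A full analysis would place priors on `w` and the slab variance and sample the posterior; this sketch only illustrates the mixture structure.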
The variable selection problem
 Journal of the American Statistical Association
, 2000
Abstract

Cited by 39 (2 self)
The problem of variable selection is one of the most pervasive model selection problems in statistical applications. Often referred to as the problem of subset selection, it arises when one wants to model the relationship between a variable of interest and a subset of potential explanatory variables or predictors, but there is uncertainty about which subset to use. This vignette reviews some of the key developments which have led to the wide variety of approaches for this problem.
Bayesian Statistics
 in WWW', Computing Science and Statistics
, 1989
Abstract

Cited by 20 (0 self)
This dissertation presents two topics from opposite disciplines: one is from a parametric realm and the other is based on nonparametric methods. The first topic is a jackknife maximum likelihood approach to statistical model selection and the second one is a convex hull peeling depth approach to nonparametric massive multivariate data analysis. The second topic includes simulations and applications on massive astronomical data. First, we present a model selection criterion, minimizing the Kullback–Leibler distance by using the jackknife method. Various model selection methods have been developed to choose a model of minimum Kullback–Leibler distance to the true model, such as the Akaike information criterion (AIC), Bayesian information criterion (BIC), minimum description length (MDL), and bootstrap information criterion. Likewise, the jackknife method chooses a model of minimum Kullback–Leibler distance through bias reduction. This bias, which is inevitable in model
Identification of nonlinear additive autoregressive models
 B
, 2004
Abstract

Cited by 16 (10 self)
Summary. We propose a lag selection method for nonlinear additive autoregressive models that is based on spline estimation and the Bayes information criterion. The additive structure of the autoregression function is used to overcome the ‘curse of dimensionality’, whereas the spline estimators effectively take into account such a structure in estimation. A stepwise procedure is suggested to implement the proposed method. A comprehensive Monte Carlo study demonstrates good performance of the proposed method and a substantial computational advantage over existing local-polynomial-based methods. Consistency of the lag selection method based on the Bayes information criterion is established under the assumption that the observations are from a stochastic process that is strictly stationary and strongly mixing, which provides the first theoretical result of this kind for spline smoothing of weakly dependent data.
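As a simplified stand-in for the spline procedure, the same BIC lag-selection idea can be sketched with a linear autoregression (the linear fit, the lag grid, and the simulated AR(2) process are assumptions of this sketch, not the paper's spline estimator):

```python
import numpy as np

def bic_lag_selection(x, max_lag=5):
    """Select the AR order by BIC: m*log(RSS/m) + k*log(m), where m is the
    number of usable observations and k counts the fitted parameters.
    A linear AR is used here as a simplified stand-in for the paper's
    spline-based additive autoregression."""
    n = len(x)
    best = None
    for k in range(1, max_lag + 1):
        # Lagged design matrix with intercept; rows aligned so that
        # Y[t] = x[max_lag + t] is regressed on x at lags 1..k
        Y = x[max_lag:]
        Z = np.column_stack([np.ones(n - max_lag)] +
                            [x[max_lag - j:n - j] for j in range(1, k + 1)])
        coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
        rss = np.sum((Y - Z @ coef) ** 2)
        m = len(Y)
        bic = m * np.log(rss / m) + (k + 1) * np.log(m)
        if best is None or bic < best[0]:
            best = (bic, k)
    return best[1]

rng = np.random.default_rng(2)
n = 500
x = np.zeros(n)
for t in range(2, n):  # AR(2): x_t = 0.5 x_{t-1} - 0.3 x_{t-2} + noise
    x[t] = 0.5 * x[t - 1] - 0.3 * x[t - 2] + rng.normal()
print(bic_lag_selection(x))  # with a clear AR(2) signal, BIC usually picks lag 2
```

The stepwise refinement and the additive spline basis of the actual method replace the plain least-squares fit above.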
Identifying Quantitative Trait Loci in Experimental Crosses
, 1997
Abstract

Cited by 9 (2 self)
Identifying quantitative trait loci in experimental crosses by Karl William Broman Doctor of Philosophy in Statistics University of California, Berkeley Professor Terence P. Speed, Chair Identifying the genetic loci responsible for variation in traits which are quantitative in nature (such as the yield from an agricultural crop or the number of abdominal bristles on a fruit fly) is a problem of great importance to biologists. The number and effects of such loci help us to understand the biochemical basis of these traits, and of their evolution in populations over time. Moreover, knowledge of these loci may aid in designing selection experiments to improve the traits. We focus on data from a large experimental cross. The usual methods for analyzing such data use multiple tests of hypotheses. We feel the problem is best viewed as one of model selection. After a brief review of the major methods in this area, we discuss the use of model selection to identify quantitative trait loci. Forwa...
Prediction/estimation With Simple Linear Models: Is It Really That Simple?
, 2004
Abstract

Cited by 5 (0 self)
Consider the simple normal linear regression model for estimation/prediction at a new design point. When the slope parameter is not obviously nonzero, hypothesis testing and model selection methods can be used for identifying the right model. We compare performance of such methods both theoretically and empirically from different perspectives for more insight. The testing approach, in spite of being the "standard approach", performs poorly. We also found that the frequently told story "BIC is good when the true model is finite-dimensional and AIC is good when the true model is infinite-dimensional" is far from being accurate. In addition, despite some successes in the effort to go beyond the debate between AIC and BIC by adaptive model selection, it turns out that no model selection method can share the most essential properties of both. When model selection methods have difficulty in selection, model combining is seen to be a better alternative.
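The AIC/BIC comparison in this setting can be made concrete with the Gaussian-likelihood forms AIC = n log(RSS/n) + 2k and BIC = n log(RSS/n) + k log n, applied to the null model versus the slope model. The simulated slope of 0.2 and the sample size are illustrative choices, not values from the paper:

```python
import numpy as np

def gaussian_ic(y, yhat, k, penalty):
    """Information criterion n*log(RSS/n) + penalty*k
    (penalty = 2 gives AIC, penalty = log n gives BIC)."""
    n = len(y)
    rss = np.sum((y - yhat) ** 2)
    return n * np.log(rss / n) + penalty * k

rng = np.random.default_rng(3)
n = 50
x = rng.normal(size=n)
y = 0.2 * x + rng.normal(size=n)   # weak but nonzero slope

# Model 0: intercept only; Model 1: intercept + slope
yhat0 = np.full(n, y.mean())
b1, b0 = np.polyfit(x, y, 1)
yhat1 = b0 + b1 * x

for name, pen in [("AIC", 2.0), ("BIC", np.log(n))]:
    ic0 = gaussian_ic(y, yhat0, 1, pen)
    ic1 = gaussian_ic(y, yhat1, 2, pen)
    print(name, "selects", "slope model" if ic1 < ic0 else "null model")
```

With a weak slope the two criteria can disagree across replications, which is the regime where the abstract argues model combining helps.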
Model Selection With Data-Oriented Penalty
, 1999
Abstract

Cited by 3 (0 self)
We consider the problem of model (or variable) selection in the classical regression model using the GIC (general information criterion). In this method the maximum likelihood is used with a penalty function denoted by C_n, depending on the sample size n and chosen to ensure consistency in the selection of the true model. There are various choices of C_n suggested in the literature on model selection. In this paper we show that a particular choice of C_n based on observed data, which makes it random, preserves the consistency property and provides improved performance over a fixed choice of C_n.
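The role of C_n is easy to see numerically: C_n = 2 gives AIC-like selection and C_n = log n gives BIC-like selection, with heavier penalties preferring smaller models. The RSS values below are made up for illustration; the paper's data-oriented C_n would itself be computed from the sample:

```python
import numpy as np

def gic(rss, n, k, c_n):
    """General information criterion: n*log(RSS/n) + C_n * k.
    C_n = 2 recovers AIC-type selection, C_n = log(n) recovers BIC-type."""
    return n * np.log(rss / n) + c_n * k

n = 100
# Hypothetical RSS values for nested models with k = 1..4 parameters
rss = {1: 140.0, 2: 95.0, 3: 89.0, 4: 88.5}
for c_n in [2.0, np.log(n), 2 * np.log(n)]:
    scores = {k: gic(r, n, k, c_n) for k, r in rss.items()}
    print(round(float(c_n), 2), "selects k =", min(scores, key=scores.get))
```

A heavier C_n trades a worse in-sample fit for a smaller model, which is why a fixed choice can under- or over-penalize depending on the data.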
A new approach to fitting linear models in high dimensional spaces
, 2000
Abstract

Cited by 2 (0 self)
This thesis presents a new approach to fitting linear models, called “pace regression”, which also overcomes the dimensionality determination problem. Its optimality in minimizing the expected prediction loss is theoretically established when the number of free parameters is infinitely large. In this sense, pace regression outperforms existing procedures for fitting linear models. Dimensionality determination, a special case of fitting linear models, turns out to be a natural byproduct. A range of simulation studies is conducted; the results support the theoretical analysis. Throughout the thesis, a deeper understanding is gained of the problem of fitting linear models. Many key issues are discussed. Existing procedures, namely OLS, AIC, BIC, RIC, CIC, CV(d), BS(m), RIDGE, NNGAROTTE and LASSO, are reviewed and compared, both theoretically and empirically, with the new methods. Estimating a mixing distribution is an indispensable part of pace regression. A measure-based minimum distance approach, including probability measures and nonnegative measures, is proposed, and strongly consistent estimators are produced. Of all minimum distance methods for estimating a mixing distribution, only the
Pace Regression
, 1999
Abstract

Cited by 2 (0 self)
This paper articulates a new method of linear regression, "pace regression," that addresses many drawbacks of standard regression reported in the literature, particularly the subset selection problem. Pace regression improves on classical ordinary least squares (OLS) regression by evaluating the effect of each variable and using a clustering analysis to improve the statistical basis for estimating their contribution to the overall regression. As well as outperforming OLS, it also outperforms, in a remarkably general sense, other linear modeling techniques in the literature, including subset selection procedures; the reduction in dimensionality that these procedures seek falls out as a natural byproduct of pace regression. The paper defines six procedures that share the fundamental idea of pace regression, all of which are theoretically justified in terms of asymptotic performance. Experiments confirm the performance improvement over other techniques. Keywords: Linear regression; subset model sele...