Results 1  10
of
96
Least angle regression
, 2004
"... The purpose of model selection algorithms such as All Subsets, Forward Selection and Backward Elimination is to choose a linear model on the basis of the same set of data to which the model will be applied. Typically we have available a large collection of possible covariates from which we hope to s ..."
Abstract

Cited by 1326 (37 self)
 Add to MetaCart
The purpose of model selection algorithms such as All Subsets, Forward Selection and Backward Elimination is to choose a linear model on the basis of the same set of data to which the model will be applied. Typically we have available a large collection of possible covariates from which we hope to select a parsimonious set for the efficient prediction of a response variable. Least Angle Regression (LARS), a new model selection algorithm, is a useful and less greedy version of traditional forward selection methods. Three main properties are derived: (1) A simple modification of the LARS algorithm implements the Lasso, an attractive version of ordinary least squares that constrains the sum of the absolute regression coefficients; the LARS modification calculates all possible Lasso estimates for a given problem, using an order of magnitude less computer time than previous methods. (2) A different LARS modification efficiently implements Forward Stagewise linear regression, another promising new model selection method; this connection explains the similar numerical results previously observed for the Lasso and Stagewise, and helps us understand the properties of both methods, which are seen as constrained versions of the simpler LARS algorithm. (3) A simple approximation for the degrees of freedom of a LARS estimate is available, from which we derive a Cp estimate of prediction error; this allows a principled choice among the range of possible LARS estimates. LARS and its variants are computationally efficient: the paper describes a publicly available algorithm that requires only the same order of magnitude of computational effort as ordinary least squares applied to the full set of covariates.
The composite absolute penalties family for grouped and hierarchical variable selection
 Ann. Statist
"... Extracting useful information from highdimensional data is an important focus of today’s statistical research and practice. Penalized loss function minimization has been shown to be effective for this task both theoretically and empirically. With the virtues of both regularization and sparsity, the ..."
Abstract

Cited by 146 (3 self)
 Add to MetaCart
(Show Context)
Extracting useful information from highdimensional data is an important focus of today’s statistical research and practice. Penalized loss function minimization has been shown to be effective for this task both theoretically and empirically. With the virtues of both regularization and sparsity, the L1penalized squared error minimization method Lasso has been popular in regression models and beyond. In this paper, we combine different norms including L1 to form an intelligent penalty in order to add side information to the fitting of a regression or classification model to obtain reasonable estimates. Specifically, we introduce the Composite Absolute Penalties (CAP) family, which allows given grouping and hierarchical relationships between the predictors to be expressed. CAP penalties are built by defining groups and combining the properties of norm penalties at the acrossgroup and withingroup levels. Grouped selection occurs for nonoverlapping groups. Hierarchical variable selection is reached
R: Prediction error estimation: a comparison of resampling methods
 Bioinformatics
"... In genomic studies, thousands of features are collected on relatively few samples. One of the goals of these studies is to build classifiers to predict the outcome of future observations. There are three inherent steps to this process: feature selection, model selection, and prediction assessment. W ..."
Abstract

Cited by 84 (12 self)
 Add to MetaCart
In genomic studies, thousands of features are collected on relatively few samples. One of the goals of these studies is to build classifiers to predict the outcome of future observations. There are three inherent steps to this process: feature selection, model selection, and prediction assessment. With a focus on prediction assessment, we compare several methods for estimating the ’true ’ prediction error of a prediction model in the presence of feature selection. For small studies where features are selected from thousands of candidates, the resubstitution and simple splitsample estimates are seriously biased. In these small samples, leaveoneout (LOOCV), 10fold crossvalidation (CV), and the.632+ bootstrap have the smallest bias for diagonal discriminant analysis, nearest neighbor, and classification trees. LOOCV and 10fold CV have the smallest bias for linear discriminant analysis. Additionally, LOOCV, 5 and 10fold CV, and the.632+ bootstrap have the lowest mean square error. The.632+ bootstrap is quite biased in small sample sizes with strong signal to noise ratios. The differences in performance among resampling methods are reduced as the number of specimens available increases. Supplementary Information: R code for simulations and analyses is available from the authors. Tables and figures for all analyses are available at
Selfconcordant analysis for logistic regression
"... Most of the nonasymptotic theoretical work in regression is carried out for the square loss, where estimators can be obtained through closedform expressions. In this paper, we use and extend tools from the convex optimization literature, namely selfconcordant functions, to provide simple extensio ..."
Abstract

Cited by 46 (14 self)
 Add to MetaCart
(Show Context)
Most of the nonasymptotic theoretical work in regression is carried out for the square loss, where estimators can be obtained through closedform expressions. In this paper, we use and extend tools from the convex optimization literature, namely selfconcordant functions, to provide simple extensions of theoretical results for the square loss to the logistic loss. We apply the extension techniques to logistic regression with regularization by the ℓ2norm and regularization by the ℓ1norm, showing that new results for binary classification through logistic regression can be easily derived from corresponding results for leastsquares regression. 1
Penalized modelbased clustering with application to variable selection
 Journal of Machine Learning Research
, 2007
"... Variable selection in clustering analysis is both challenging and important. In the context of modelbased clustering analysis with a common diagonal covariance matrix, which is especially suitable for “high dimension, low sample size ” settings, we propose a penalized likelihood approach with an L1 ..."
Abstract

Cited by 45 (4 self)
 Add to MetaCart
(Show Context)
Variable selection in clustering analysis is both challenging and important. In the context of modelbased clustering analysis with a common diagonal covariance matrix, which is especially suitable for “high dimension, low sample size ” settings, we propose a penalized likelihood approach with an L1 penalty function, automatically realizing variable selection via thresholding and delivering a sparse solution. We derive an EM algorithm to fit our proposed model, and propose a modified BIC as a model selection criterion to choose the number of components and the penalization parameter. A simulation study and an application to gene function prediction with gene expression profiles demonstrate the utility of our method.
A proximal iteration for deconvolving Poisson noisy images using sparse representations
"... ..."
(Show Context)
Simultaneous Inference: When Should Hypothesis Testing Problems Be Combined?
"... Modern statisticians are often presented with hundreds or thousands of hypothesis testing problems to evaluate at the same time, generated from new scientific technologies such as microarrays, medical and satellite imaging devices, or flow cytometry counters. The relevant statistical literature ten ..."
Abstract

Cited by 24 (3 self)
 Add to MetaCart
Modern statisticians are often presented with hundreds or thousands of hypothesis testing problems to evaluate at the same time, generated from new scientific technologies such as microarrays, medical and satellite imaging devices, or flow cytometry counters. The relevant statistical literature tends to begin with the tacit assumption that a single combined analysis, for instance a False Discovery Rate assessment, should be applied to the entire set of problems at hand. This can be a dangerous assumption, as the examples in the paper show, leading to overly conservative or overly liberal conclusions within any particular subclass of the cases. A simple Bayesian theory yields a succinct description of the effects of separation or combination on false discovery rate analyses. The theory allows efficient testing within small subclasses, and has applications to “enrichment”, the detection of multicase effects. Key Words: false discovery rates, Twoclass model, enrichment 1. Introduction Modern scientific devices such as microarrays routinely provide the statistician with thousands of hypothesis testing problems to consider at the same time. A