Results 1 - 8 of 8
Regression Shrinkage and Selection Via the Lasso
Journal of the Royal Statistical Society, Series B, 1994
Cited by 1828 (36 self)
We propose a new method for estimation in linear models. The "lasso" minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant. Because of the nature of this constraint it tends to produce some coefficients that are exactly zero and hence gives interpretable models. Our simulation studies suggest that the lasso enjoys some of the favourable properties of both subset selection and ridge regression. It produces interpretable models like subset selection and exhibits the stability of ridge regression. There is also an interesting relationship with recent work in adaptive function estimation by Donoho and Johnstone. The lasso idea is quite general and can be applied in a variety of statistical models: extensions to generalized regression models and tree-based models are briefly described. Keywords: regression, subset selection, shrinkage, quadratic programming. 1 Introduction Consider the usual regression situation: we h...
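The constrained problem described in this abstract can be written out as a short formula. The notation below (responses y_i, predictors x_ij, intercept \beta_0, coefficients \beta_j, bound t) is assumed for illustration and does not appear in the listing itself; t stands for the "constant" mentioned in the abstract.

```latex
\[
(\hat{\beta}_0, \hat{\beta})
  = \arg\min_{\beta_0,\,\beta}
    \sum_{i=1}^{N} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\,\beta_j \Bigr)^{2}
  \quad \text{subject to} \quad
  \sum_{j=1}^{p} \lvert \beta_j \rvert \le t .
\]
```

Because the constraint region has corners on the coordinate axes, the minimizer often lands where some \beta_j are exactly zero, which is the source of the variable selection the abstract describes.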
An equivalence between sparse approximation and Support Vector Machines
A.I. Memo 1606, MIT Artificial Intelligence Laboratory, 1997
Cited by 205 (7 self)
This publication can be retrieved by anonymous ftp to publications.ai.mit.edu. The pathname for this publication is: aipublications/15001999/AIM1606.ps.Z This paper shows a relationship between two different approximation techniques: the Support Vector Machines (SVM), proposed by V. Vapnik (1995), and a sparse approximation scheme that resembles the Basis Pursuit De-Noising algorithm (Chen, 1995; Chen, Donoho and Saunders, 1995). SVM is a technique which can be derived from the Structural Risk Minimization Principle (Vapnik, 1982) and can be used to estimate the parameters of several different approximation schemes, including Radial Basis Functions, algebraic/trigonometric polynomials, B-splines, and some forms of Multilayer Perceptrons. Basis Pursuit De-Noising is a sparse approximation technique, in which a function is reconstructed by using a small number of basis functions chosen from a large set (the dictionary). We show that, if the data are noiseless, the modified version of Basis Pursuit De-Noising proposed in this paper is equivalent to SVM in the following sense: if applied to the same data set the two techniques give the same solution, which is obtained by solving the same quadratic programming problem. In the appendix we also present a derivation of the SVM technique in the framework of regularization theory, rather than statistical learning theory, establishing a connection between SVM, sparse approximation and regularization theory.
Pathwise coordinate optimization
2007
Cited by 166 (19 self)
We consider “one-at-a-time” coordinate-wise descent algorithms for a class of convex optimization problems. An algorithm of this kind has been proposed for the L1-penalized regression (lasso) in the literature, but it seems to have been largely ignored. Indeed, it seems that coordinate-wise algorithms are not often used in convex optimization. We show that this algorithm is very competitive with the well-known LARS (or homotopy) procedure in large lasso problems, and that it can be applied to related methods such as the garotte and elastic net. It turns out that coordinate-wise descent does not work in the “fused lasso”, however, so we derive a generalized algorithm that yields the solution in much less time than a standard convex optimizer. Finally we generalize the procedure to the two-dimensional fused lasso, and demonstrate its performance on some image smoothing problems.
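To make the "one-at-a-time" idea concrete, here is a minimal NumPy sketch of cyclic coordinate descent for the plain lasso. It is an illustration under stated assumptions, not the authors' implementation: the objective scaling (0.5 times the residual sum of squares plus lam times the L1 norm), the function names, and the stopping rule are all hypothetical choices, and the pathwise warm starts and the fused-lasso generalization mentioned in the abstract are not covered.

```python
import numpy as np


def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; values within [-gamma, gamma] become 0."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)


def lasso_coordinate_descent(X, y, lam, n_sweeps=200, tol=1e-10):
    """Minimize 0.5 * ||y - X @ beta||^2 + lam * ||beta||_1 (assumed scaling).

    Each inner step solves the one-dimensional problem in beta[j] exactly
    while all other coefficients are held fixed.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    p = X.shape[1]
    beta = np.zeros(p)
    residual = y.copy()               # equals y - X @ beta, since beta starts at 0
    col_sq = (X ** 2).sum(axis=0)     # squared norm of each column

    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue              # an all-zero column cannot affect the fit
            # Inner product of column j with the partial residual
            # (the residual with coordinate j's contribution added back).
            rho = X[:, j] @ residual + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                residual -= delta * X[:, j]   # keep the residual in sync
                beta[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break                     # a full sweep changed nothing noticeable
    return beta
```

With an orthonormal design the sweep converges in one pass and reduces to soft-thresholding each least-squares coefficient, which is a convenient sanity check for the sketch.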
LESS: a Model-Based Classifier for Sparse Subspaces
Cited by 1 (0 self)
In this paper we specifically focus on high dimensional data sets for which the number of dimensions is an order of magnitude higher than the number of objects. From a classifier design standpoint, such small sample size problems have some interesting challenges. The first challenge is to find, from all hyperplanes that separate the classes, a separating hyperplane which generalizes well for future data. A second important task is to determine which features are required to distinguish the classes. To attack these problems, we propose the LESS (Lowest Error in a Sparse Subspace) classifier that efficiently finds linear discriminants in a sparse subspace. In contrast with most classifiers for high dimensional data sets, the LESS classifier incorporates a (simple) data model. Further, by means of a regularization parameter the classifier establishes a suitable trade-off between subspace sparseness and classification accuracy. In the experiments we show how LESS performs on several high dimensional data sets and compare its performance to related state-of-the-art classifiers such as, among others, linear ridge regression with the LASSO and the Support Vector Machine. It turns out that LESS performs competitively while using fewer dimensions.
Statistical Learning for Analyzing Functional Genomic Data
2006
signatures single biomarkers Prognostic Factor Studies response to treatment toxicity survival Custom Drug Selection predictive factors for response/resistance to certain therapy indicators of adverse events
Sparse Penalized Forward Selection for Support Vector Classification
We propose a new binary classification and variable selection technique especially designed for high dimensional predictors. Among many predictors, typically, only a small fraction of them have significant impact on prediction. In such a situation, more interpretable models with better prediction accuracy can be obtained by variable selection along with classification. By adding an ℓ1-type penalty to the loss function, common classification methods such as logistic regression or support vector machines (SVM) can perform variable selection. Existing penalized SVM methods all attempt to solve jointly for all the parameters involved in the penalization problem. When data dimension is very high, the joint optimization problem is very complex and involves a lot of memory allocation. In this article, we propose a new penalized forward search technique which can reduce high-dimensional optimization problems to one-dimensional optimization by iterating the selection steps. The new algorithm can be regarded as a forward selection version of the penalized SVM and its variants. The advantage of optimizing in one dimension is that the location of the optimum solution can be obtained with intelligent search by exploiting convexity and a piecewise polynomial structure of the criterion function. In each step, the predictor which is most able to predict the outcome is chosen in the model. The search is then repeatedly used in an iterative fashion until convergence occurs. Comparison of our new classification rule with commonly used SVM-based techniques shows its promising performance, leading to much leaner models without compromising misclassification rates, particularly for high dimensional predictors.
Iterative Selection using Orthogonal Regression Techniques
2012
High dimensional data are nowadays encountered in various branches of science. Variable selection techniques play a key role in analyzing high dimensional data. Generally two approaches for variable selection in the high dimensional data setting are considered: forward selection methods and penalization methods. In the former, variables are introduced in the model one at a time depending on their ability to explain variation and the procedure is terminated at some stage following some stopping rule. For ultra-high dimensional data, [Wang 2011] studied forward regression for variable screening. In penalization techniques such as the LASSO, an optimization procedure is carried out with an added carefully chosen penalty function, so that the solutions have a sparse structure. Recently, the idea of penalized forward selection has been introduced by [Hwang, Zhang and Ghosal, 2009]. The motivation comes from the fact that penalization techniques like the LASSO give rise to a closed form expression when used in one dimension, just like the least squares estimator. Hence one can repeat such a procedure in a forward selection setting until it converges. The resulting procedure selects sparser models than comparable methods without compromising on predictive power. However, when the regressor is high dimensional, it is typical that many predictors are highly correlated. We show that in such situations, it is possible to improve stability and computational efficiency of the procedure further by introducing an orthogonalization step. At each selection step, variables potentially available to be selected in the model are screened on the basis of their correlation with variables already in the model, thus preventing unnecessary duplication. The new strategy, called the Selection Technique in Orthogonalized Regression Models (STORM), turns out to be extremely successful in reducing the model dimension further and also leads to improved predicting power. We carry out a detailed simulation study to compare the newly proposed method with existing ones and analyze a real dataset.
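The closed form that this abstract refers to can be stated briefly. The symbols below are assumed for illustration (a single predictor vector x, a current residual vector r, a penalty level \lambda \ge 0, and a 1/2 scaling on the squared error); the paper's own notation and scaling may differ.

```latex
\[
\hat{b}
  = \arg\min_{b \in \mathbb{R}}
    \Bigl\{ \tfrac{1}{2} \sum_{i=1}^{n} (r_i - b\,x_i)^{2} + \lambda \lvert b \rvert \Bigr\}
  = \frac{\operatorname{sign}(x^{\top} r)\,\bigl( \lvert x^{\top} r \rvert - \lambda \bigr)_{+}}{x^{\top} x},
\qquad (u)_{+} = \max(u, 0).
\]
```

Setting \lambda = 0 recovers the one-dimensional least squares estimate x^{\top} r / x^{\top} x, which is the parallel the abstract draws; for \lambda \ge \lvert x^{\top} r \rvert the coefficient is exactly zero.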