
## Regularization paths for generalized linear models via coordinate descent (2009)

Citations: 722 (15 self)

### Citations

4200 | Regression shrinkage and selection via the lasso
- Tibshirani
- 1996
Citation Context: ...dle large problems and can also deal efficiently with sparse features. In comparative timings we find that the new algorithms are considerably faster than competing methods. 1 Introduction The lasso [Tibshirani, 1996] is a popular method for regression that uses an ℓ1 penalty to achieve a sparse solution. In the signal processing literature, the lasso is also known as basis pursuit [Chen et al., 1998]. This idea ...

2712 | Atomic decomposition by basis pursuit
- Chen, Donoho, et al.
- 1998
Citation Context: ...on The lasso [Tibshirani, 1996] is a popular method for regression that uses an ℓ1 penalty to achieve a sparse solution. In the signal processing literature, the lasso is also known as basis pursuit [Chen et al., 1998]. This idea has been broadly applied, for example to generalized linear models [Tibshirani, 1996] and Cox’s proportional hazard models for survival data [Tibshirani, 1997]. In recent years, there has...

1774 | Molecular classification of cancer: class discovery and class prediction by gene expression monitoring
- Golub, Slonim, et al.
- 1999
Citation Context: ...e sparsity of the solution to (1) (i.e. the number of coefficients equal to zero) increases monotonically from 0 to the sparsity of the lasso solution. Figure 1 shows an example. The dataset is from [Golub et al., 1999b], consisting of 72 observations on 3571 genes measured with DNA microarrays. The observations fall in two classes: we treat this as a regression problem for illustration. The coefficient profiles fr...

1325 | Least angle regression - Efron, Hastie, et al. - 2004

1266 | Ideal spatial adaptation by wavelet shrinkage
- Donoho, Johnstone
- 1994
Citation Context: ... If β̃j > 0, then ∂R/∂βj |β=β̃ = −(1/N) Σᵢ₌₁ᴺ xij(yi − β̃o − xiᵀβ̃) + λ(1 − α)βj + λα. (4) A similar expression exists if β̃j < 0, and β̃j = 0 is treated separately. Simple calculus shows [Donoho and Johnstone, 1994] that the coordinate...

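The coordinate-wise derivative quoted above leads to a soft-thresholding update. A minimal Python sketch of that operator and of the resulting elastic-net coordinate update for a standardized predictor (function names are illustrative, and this is a sketch of the technique, not the paper's Fortran implementation):

```python
import numpy as np

def soft_threshold(z, gamma):
    """Soft-threshold operator: S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def coordinate_update(z, lam, alpha):
    """Elastic-net coordinate update for a standardized predictor:
    beta_j <- S(z, lam * alpha) / (1 + lam * (1 - alpha)),
    where z is the partial-residual correlation (1/N) * sum_i x_ij * r_i^(j)."""
    return soft_threshold(z, lam * alpha) / (1.0 + lam * (1.0 - alpha))
```

With alpha = 1 this reduces to the plain lasso update; with alpha = 0 it is pure ridge-style shrinkage.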
1156 | Model selection and estimation in regression with grouped variables
- Yuan, Lin
- 2006
Citation Context: ...roportional hazard models for survival data (Tibshirani 1997). In recent years, there has been an enormous amount of research activity devoted to related regularization methods: 1. The grouped lasso (Yuan and Lin 2007; Meier et al. 2008), where variables are included or excluded in groups. 2. The Dantzig selector (Candes and Tao 2007, and discussion), a slightly modified version of the lasso. 3. The elastic net (Z...

1107 | R: A Language and Environment for Statistical Computing - R Development Core Team - 2009

972 | Regularization and variable selection via the elastic net
- Zou, Hastie
- 2005
Citation Context: ...7; Meier et al. 2008), where variables are included or excluded in groups. 2. The Dantzig selector (Candes and Tao 2007, and discussion), a slightly modified version of the lasso. 3. The elastic net (Zou and Hastie 2005) for correlated variables, which uses a penalty that is part ℓ1, part ℓ2. 4. ℓ1 regularization paths for generalized linear models (Park and Has...

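The mixed penalty described in this context interpolates between ridge and lasso. A hedged sketch of the elastic-net penalty term (the function name is illustrative):

```python
import numpy as np

def elastic_net_penalty(beta, alpha):
    """P_alpha(beta) = sum_j [ (1 - alpha) * beta_j**2 / 2 + alpha * |beta_j| ].
    alpha = 1 gives the lasso (l1) penalty; alpha = 0 gives ridge (l2)."""
    beta = np.asarray(beta, dtype=float)
    return float(np.sum((1.0 - alpha) * beta**2 / 2.0 + alpha * np.abs(beta)))
```

Intermediate alpha values keep correlated variables in or out of the model together while still producing sparse solutions.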
941 | Variable selection via nonconcave penalized likelihood and its oracle properties
- Fan, Li
- 2001
Citation Context: ...part ℓ2. 4. ℓ1 regularization paths for generalized linear models (Park and Hastie 2007a). 5. Methods using non-concave penalties, such as SCAD (Fan and Li 2005) and Friedman’s generalized elastic net (Friedman 2008), enforce more severe variable selection than the lasso. 6. Regularization paths for the support-vector machine (Hastie et al. 2004). 7. The gra...

868 | The Dantzig selector: Statistical estimation when p ≫ n, Annals of Statistics - Candès, Tao - 2007

745 | An iterative thresholding algorithm for linear inverse problems with a sparsity constraint - Daubechies, Defrise, et al. - 2004

681 | The adaptive lasso and its oracle properties
- Zou
- 2006
Citation Context: ...nalized, and always enters the model unrestricted at the first step and remains in the model. Penalty rescaling would also allow, for example, our software to be used to implement the adaptive lasso (Zou 2006). Considerable speedup is obtained by organizing the iterations around the active set of features — those with nonzero coefficients. After a complete cycle through all the variables, we iterate on onl...

595 | Sparse inverse covariance estimation with the graphical lasso - Friedman, Hastie, et al.

559 | Newsweeder: Learning to filter netnews
- LANG
- 1995
Citation Context: ...ment classification problem with mostly binary features. The response is binary, and indicates whether the document is an advertisement. Only 1.2% nonzero values in the predictor matrix. • Newsgroup [Lang, 1995]: document classification problem. We used the training set cultured from these data by Koh et al. [2007]. The response is binary, and indicates a subclass of topics; the predictors are binary, and i...

331 | Multiclass cancer diagnosis using tumor gene expression signatures - Ramaswamy, Tamayo, et al.

324 | Pathwise coordinate optimization - Friedman, Hastie, et al.

298 | Convergence of a block coordinate descent method for nondifferentiable minimization
- TSENG
- 2001
Citation Context: ...t (Friedman et al. 2009) implemented in the R programming system (R Development Core Team 2009). We do not revisit the well-established convergence properties of coordinate descent in convex problems (Tseng 2001) in this article. Lasso procedures are frequently used in domains with very large datasets, such as genomics and web analysis. Consequently a focus of our research has been algorithmic efficiency and...

290 | An interior-point method for large-scale l1-regularized logistic regression - Koh, Kim, et al. - 2007

276 | The group lasso for logistic regression - Meier, van de Geer, et al. - 2008

242 | A new approach to variable selection in least squares problems - Osborne, Presnell, et al. - 2000

202 | The entire regularization path for the support vector machine
- Hastie, Rosset, et al.
- 2004
Citation Context: ..., such as SCAD (Fan and Li 2005) and Friedman’s generalized elastic net (Friedman 2008), enforce more severe variable selection than the lasso. 6. Regularization paths for the support-vector machine (Hastie et al. 2004). 7. The graphical lasso (Friedman et al. 2008) for sparse covariance estimation and undirected graphs. Efron et al. (2004) developed an efficient algorithm for computing the entire regularization pa...

193 | Boosting for Tumor Classification with Gene Expression - Dettling, Bühlmann - 2003

190 | Large-scale Bayesian logistic regression for text categorization
- Genkin, Lewis, et al.
Citation Context: ...s outcome z with Pr(z = 1) = p, Pr(z = 0) = 1 − p. We compared the speed of glmnet to the interior point method l1logreg (Koh et al. 2007b,a), Bayesian binary regression (BBR, Madigan and Lewis 2007; Genkin et al. 2007), and the lasso penalized logistic program LPL supplied by Ken Lange (see Wu and Lange 2008). The latter two methods also use a coordinate descent approach. The BBR software automatically performs te...

190 | Sparse multinomial logistic regression: Fast algorithms and generalization bounds - Krishnapuram, Carin, et al. - 2005

178 | Scalable training of L1-regularized log-linear models - Andrew, Gao

139 | Piecewise linear regularized solution paths, The Annals of Statistics 35(3) - Rosset, Zhu - 2007

131 | Molecular classification of cancer: class discovery and class prediction by gene expression monitoring
- Golub, Slonim, et al.
- 1999
Citation Context: ...e number of coefficients equal to zero) increases monotonically from 0 to the sparsity of the lasso solution. Figure 1 shows an example that demonstrates the effect of varying α. The dataset is from (Golub et al. 1999), consisting of 72 observations on 3571 genes measured with DNA microarrays. The observations fall in two classes, so we use the penalties in conjunction with the ... [footnote 1: Zou and Hastie (2005) called this ...]

131 | Regularization and variable selection via the elastic net
- Zou, Hastie
Citation Context: ...email: hastie@stanford.edu. Sequoia Hall, Stanford University, CA 94305. ... 2. The Dantzig selector [Candes and Tao, 2007, and discussion], a slightly modified version of the lasso; 3. The elastic net [Zhou and Hastie, 2005] for correlated variables, which uses a penalty that is part ℓ1, part ℓ2; 4. ℓ1 regularization paths for generalized linear models [Park and Hastie, 2006]; 5. Regularization paths for the support-vec...

121 | L1-regularization path algorithm for generalized linear models
- Park, Hastie
Citation Context: ...fied version of the lasso; 3. The elastic net [Zou and Hastie, 2005] for correlated variables, which uses a penalty that is part ℓ1, part ℓ2; 4. ℓ1 regularization paths for generalized linear models [Park and Hastie, 2007]; 5. Regularization paths for the support-vector machine [Hastie et al., 2004]. 6. The graphical lasso [Friedman et al., 2008] for sparse covariance estimation and undirected graphs Efron et al. [200...

119 | The LASSO Method for Variable Selection in the Cox Model
- Tibshirani
- 1997
Citation Context: ... also known as basis pursuit (Chen et al. 1998). This idea has been broadly applied, for example to generalized linear models (Tibshirani 1996) and Cox’s proportional hazard models for survival data (Tibshirani 1997). In recent years, there has been an enormous amount of research activity devoted to related regularization methods: 1. The grouped lasso (Yuan and Lin 2007; Meier et al. 2008), where variables are i...

109 | Coordinate descent algorithms for lasso penalized regression - Wu, Lange

107 | A simple and efficient algorithm for gene selection using sparse logistic regression, Bioinformatics - Shevade, Keerthi - 2003

77 | Prediction by supervised principal components - Bair, Hastie, et al. - 2006

70 | Classification of gene microarrays by penalized logistic regression - Zhu, Hastie - 2004

62 | Learning to remove internet advertisement.
- Kushmerick
- 1999
Citation Context: ... cancer classes. • Leukemia [Golub et al., 1999a]: gene-expression data with a binary response indicating type of leukemia — AML vs ALL. We used the preprocessed data of Dettling [2004]. • Internet-Ad [Kushmerick, 1999]: document classification problem with mostly binary features. The response is binary, and indicates whether the document is an advertisement. Only 1.2% nonzero values in the predictor matrix. • News...

51 | An L1 Regularization-path Algorithm for Generalized Linear Models
- Park, Hastie
- 2007
Citation Context: ...ied version of the lasso; 3. The elastic net [Zhou and Hastie, 2005] for correlated variables, which uses a penalty that is part ℓ1, part ℓ2; 4. ℓ1 regularization paths for generalized linear models [Park and Hastie, 2006]; 5. Regularization paths for the support-vector machine [Hastie et al., 2004]. 6. The graphical lasso [Friedman et al., 2007b] for sparse covariance estimation and undirected graphs Efron et al. [20...

39 | Fast sparse regression and classification
- Friedman
- 2008
Citation Context: ...escent 4. ℓ1 regularization paths for generalized linear models (Park and Hastie 2007a). 5. Methods using non-concave penalties, such as SCAD (Fan and Li 2005) and Friedman’s generalized elastic net (Friedman 2008), enforce more severe variable selection than the lasso. 6. Regularization paths for the support-vector machine (Hastie et al. 2004). 7. The graphical lasso (Friedman et al. 2008) for sparse covarian...

34 | Sparse inverse covariance estimation with the graphical lasso - Friedman, Hastie, Tibshirani - 2008

31 | L1-regularization path algorithm for generalized linear models - Park, Hastie

27 | An iterative thresholding algorithm for linear inverse problems with a sparsity constraint - Daubechies, Defrise, et al. - 2004

23 | Penalized regressions: the bridge vs. the lasso - Fu - 1998

23 | Atomic decomposition by basis pursuit - Chen, Donoho, et al. - 1998

12 | glmpath: L1 Regularization Path for Generalized Linear Models and Proportional Hazards Model. R package version 0.91, URL http://cran.r-project.org/src/contrib/Descriptions/glmpath.html - Park, Hastie - 2006

11 | Coordinate descent procedures for lasso penalized regression
- Wu, Lange
- 2007
Citation Context: ...erior point method l1logreg (Koh et al. 2007b,a), Bayesian binary regression (BBR, Madigan and Lewis 2007; Genkin et al. 2007), and the lasso penalized logistic program LPL supplied by Ken Lange (see Wu and Lange 2008). The latter two methods also use a coordinate descent approach. The BBR software automatically performs ten-fold cross-validation when given a set of λ values. Hence we report the total time for ten...

8 | The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd edn
- Hastie, Tibshirani, et al.
- 2009
Citation Context: ... “one-standard-error” rule. The top of each plot is annotated with the size of the models. Alternatively, they can use K-fold cross-validation (Hastie et al. 2009, for example), where the training data is used both for training and testing in an unbiased way. Figure 2 illustrates cross-validation on a simulated dataset. For logistic regression, we sometimes us...

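The K-fold cross-validation mentioned in this context can be sketched generically: split the observations into folds, fit on each training portion over a grid of λ values, and accumulate held-out prediction error. The function names and `fit`/`predict` callback interface are illustrative:

```python
import numpy as np

def kfold_cv_lambda(X, y, lambdas, fit, predict, k=10, seed=0):
    """Generic K-fold cross-validation over a grid of lambda values.
    `fit(X, y, lam)` returns a fitted model; `predict(model, X)` returns
    predictions. Returns the mean squared prediction error per lambda."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    errs = np.zeros(len(lambdas))
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        for i, lam in enumerate(lambdas):
            model = fit(X[train_idx], y[train_idx], lam)
            pred = predict(model, X[test_idx])
            errs[i] += np.sum((y[test_idx] - pred) ** 2)
    return errs / len(y)
```

Each observation appears in exactly one held-out fold, so summed errors divided by N give an unbiased-style estimate of prediction error at each λ.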
6 | Sparse multinomial logistic regression: fast algorithms and generalization bounds - Krishnapuram, Carin, et al. - 2005

5 | The group lasso for logistic regression - Meier, van de Geer, et al. - 2008

5 | A new approach to variable selection in least squares problems - Osborne, Presnell, Turlach - 2000

4 | Adaptable, efficient and robust methods for regression and classification via piecewise linear regularized coefficient paths - Rosset, Zhu - 2003

4 | elasticnet: Elastic Net Regularization and Variable Selection. R package version
- Zou, Hastie
- 2005
Citation Context: ...not much software available for elastic net. Comparisons of our glmnet code with the R package elasticnet will mimic the comparisons with lars (Hastie and Efron 2007) for the lasso, since elasticnet (Zou and Hastie 2004) is built on the lars package. 5.1. Regression with the lasso We generated Gaussian data with N observations and p predictors, with each pair of predictors Xj, Xj′ having the same population correla...

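A design in which every pair of predictors shares the same population correlation, as described in this context, can be generated with a shared latent factor. This is one standard construction, not necessarily the exact generator used in the paper's simulations:

```python
import numpy as np

def equicorrelated_gaussian(N, p, rho, seed=0):
    """N x p Gaussian design in which every pair of columns has population
    correlation rho, built from a shared latent factor:
    X_j = sqrt(rho) * z0 + sqrt(1 - rho) * Z_j,
    so Var(X_j) = 1 and Cov(X_j, X_k) = rho for j != k."""
    rng = np.random.default_rng(seed)
    z0 = rng.standard_normal((N, 1))       # shared factor
    Z = rng.standard_normal((N, p))        # independent noise per column
    return np.sqrt(rho) * z0 + np.sqrt(1.0 - rho) * Z
```

The construction works for 0 ≤ rho < 1; the single latent factor induces exactly the equicorrelation structure used in such timing benchmarks.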
3 | glmnet: Lasso and elastic-net regularized generalized linear models. Version 1, URL http://www-stat.stanford.edu/~tibs/glmnet-matlab - Friedman, Hastie, Tibshirani - 2009

3 | Classification of Expression Arrays by Penalized Logistic Regression - Zhu, Hastie - 2004

2 | Prediction Accuracy and Stability of Regression with Optimal Scaling Transformations - van der Kooij - 2007

2 | Genome-wide association analysis by penalized logistic regression - Wu, Chen, et al. - 2009

2 | Penalized Regressions: the Bridge vs. the Lasso - Fu - 1998

2 | Prediction Accuracy and Stability of Regression with Optimal Scaling Transformations - van der Kooij - 2007

2 | Prediction accuracy and stability of regression with optimal scaling transformations - van der Kooij - 2007

1 | Efficient l1 logistic regression
- Lee, Lee, et al.
- 2006
Citation Context: ...o-column matrix of counts, sometimes referred to as grouped data. We discuss this in more detail in Section 4.2. • The Newton algorithm is not guaranteed to converge without stepsize optimization [Lee et al., 2006]. Our code does not implement any checks for divergence; this would slow it down, and when used as recommended we do not feel it is necessary. We have a closed form expression for the starting solut...

1 | OWL-QN: Orthant-Wise Limited-Memory Quasi-Newton Optimizer for L1-Regularized Objectives - Andrew, Gao

1 | Matrix: R package version 0.999375-30, URL http://CRAN.R-project.org/package=Matrix - unknown authors

1 | A MATLAB Implementation of glmnet
- Jiang
- 2009
Citation Context: ...able under general public licence (GPL-2) from the Comprehensive R Archive Network at http://CRAN.R-project.org/package=glmnet. Sparse data inputs are handled by the Matrix package. MATLAB functions (Jiang 2009) are available from http://www-stat.stanford.edu/~tibs/glmnet-matlab/. Acknowledgments We would like to thank Holger Hoefling for helpful discussions, and Hui Jiang for writing the MATLAB interface t...

1 | l1logreg: A Solver for L1-Regularized Logistic Regression. R package version 0.1-1. Available from Kwangmoo Koh (deneb1@stanford.edu) - Koh, Kim, et al. - 2007

1 | Efficient L1 Logistic Regression - Lee, Lee, et al. - 2006

1 | BBR, BMR: Bayesian Logistic Regression. Open-source standalone software, URL http://www.bayesianregression.org
- Madigan, Lewis
- 2007
Citation Context: ...s to generate a two-class outcome z with Pr(z = 1) = p, Pr(z = 0) = 1 − p. We compared the speed of glmnet to the interior point method l1logreg (Koh et al. 2007b,a), Bayesian binary regression (BBR, Madigan and Lewis 2007; Genkin et al. 2007), and the lasso penalized logistic program LPL supplied by Ken Lange (see Wu and Lange 2008). The latter two methods also use a coordinate descent approach. The BBR software autom...

2 | The lasso method for variable selection in the Cox model - Tibshirani - 1997