## Regularization and variable selection via the Elastic Net (2005)

Venue: Journal of the Royal Statistical Society, Series B

Citations: 362 (8 self)

### BibTeX

@ARTICLE{Zou05regularizationand,
  author  = {Hui Zou and Trevor Hastie},
  title   = {Regularization and variable selection via the Elastic Net},
  journal = {Journal of the Royal Statistical Society, Series B},
  year    = {2005},
  volume  = {67},
  pages   = {301--320}
}

### Abstract

Summary. We propose the elastic net, a new regularization and variable selection method. Real-world data and a simulation study show that the elastic net often outperforms the lasso, while enjoying a similar sparsity of representation. In addition, the elastic net encourages a grouping effect, where strongly correlated predictors tend to be in or out of the model together. The elastic net is particularly useful when the number of predictors (p) is much bigger than the number of observations (n). By contrast, the lasso is not a very satisfactory variable selection method in the p ≫ n case. An algorithm called LARS-EN is proposed for computing elastic net regularization paths efficiently, much as the LARS algorithm does for the lasso.
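The penalty the abstract refers to combines the lasso's L1 term with ridge regression's squared L2 term. A minimal sketch of that criterion (the function name and the λ values below are ours, purely for illustration):

```python
import numpy as np

def elastic_net_penalty(beta, lam1, lam2):
    """Elastic net penalty: lam1 * ||beta||_1 + lam2 * ||beta||_2^2.

    lam2 = 0 recovers the lasso penalty; lam1 = 0 recovers ridge.
    """
    beta = np.asarray(beta, dtype=float)
    return lam1 * np.abs(beta).sum() + lam2 * np.square(beta).sum()

# 1.0 * (1 + 2 + 0) + 0.5 * (1 + 4 + 0) = 5.5
print(elastic_net_penalty([1.0, -2.0, 0.0], lam1=1.0, lam2=0.5))
```

The L1 part produces exact zeros (sparsity); the L2 part is what induces the grouping effect among correlated predictors.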

### Citations

1958 | Matrix computations - Golub, Van Loan - 1996

1832 | Regression shrinkage and selection via the lasso - Tibshirani - 1996

Citation Context: ...n 2.3. (c) For usual n > p situations, if there are high correlations between predictors, it has been empirically observed that the prediction performance of the lasso is dominated by ridge regression (Tibshirani, 1996). Scenarios (a) and (b) make the lasso an inappropriate variable selection method in some situations. We illustrate our points by considering the gene selection problem in microarray data analysis. A...

1227 | Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring - Golub, Slonim, et al. - 1999

Citation Context: ...comes the difficulty of p ≫ n and has the ability to do grouped selection. We use the leukaemia data to illustrate the elastic net classifier. The leukaemia data consist of 7129 genes and 72 samples (Golub et al., 1999). In the training data set, there are 38 samples, among which 27 are type 1 leukaemia (acute lymphoblastic leukaemia) and 11 are type 2 leukaemia (acute myeloid leukaemia). The goal is to construct a...

1139 | Significance analysis of microarrays applied to the ionizing radiation response - Tusher, Tibshirani, et al. - 2001

Citation Context: ...between predictors and treats them as independent variables. Although this may be considered illegitimate, UST and its variants are used in other methods such as significance analysis of microarrays (Tusher et al., 2001) and the nearest shrunken centroids classifier (Tibshirani et al., 2002), and have shown good empirical performance. The elastic net naturally bridges the lasso and UST. 3.4. Computation: the algorit...
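The univariate soft thresholding (UST) mentioned in this context applies the soft-threshold operator to each univariate coefficient independently, ignoring correlations between predictors. A minimal sketch of the operator (names are ours):

```python
import numpy as np

def soft_threshold(z, lam):
    """Soft-threshold operator: sign(z) * max(|z| - lam, 0)."""
    z = np.asarray(z, dtype=float)
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

# Coefficients inside [-lam, lam] are set exactly to zero;
# the rest are shrunk toward zero by lam.
print(soft_threshold([2.0, -0.3, 0.8], lam=0.5))
```

The same operator is the elementary step in coordinate-wise lasso and elastic net solvers, which is why the elastic net can be said to bridge the lasso and UST.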

863 | The Elements of Statistical Learning - Hastie, Tibshirani, et al. - 2009

Citation Context: ...uivalent to replacing Σ̂ with its shrunken version in the lasso. In linear discriminant analysis, the prediction accuracy can often be improved by replacing Σ̂ by a shrunken estimate (Friedman, 1989; Hastie et al., 2001). Likewise we improve the lasso by regularizing Σ̂ in equation (15). 3.3. Connections with univariate soft thresholding The lasso is a special case of the elastic net with λ2 = 0. The other interesti...

752 | Least angle regression - Efron, Hastie, et al. - 2004

Citation Context: ...them is selected (‘grouped selection’). For this kind of p ≫ n and grouped variables situation, the lasso is not the ideal method, because it can only select at most n variables out of p candidates (Efron et al., 2004), and it lacks the ability to reveal the grouping information. As for prediction performance, scenario (c) is not rare in regression problems. So it is possible to strengthen further the prediction p...

681 | Gene Selection for Cancer Classification Using Support Vector Machines - Guyon, Weston, et al. - 2002

Citation Context: ...enes in a satisfactory way. Most of the popular classifiers fail with respect to at least one of the above properties. The lasso is good at (a) but fails both (b) and (c). The support vector machine (Guyon et al., 2002) and penalized logistic regression (Zhu and Hastie, 2004) are very successful classifiers, but they cannot do gene selection automatically and both use either univariate ranking (Golub et al., ... Standa...

362 | Diagnosis of Multiple Cancer Types by Shrunken Centroids of Gene Expression - Tibshirani, Hastie, et al. - 2002

Citation Context: ...h this may be considered illegitimate, UST and its variants are used in other methods such as significance analysis of microarrays (Tusher et al., 2001) and the nearest shrunken centroids classifier (Tibshirani et al., 2002), and have shown good empirical performance. The elastic net naturally bridges the lasso and UST. 3.4. Computation: the algorithm LARS-EN We propose an efficient algorithm called LARS-EN to solve the...

343 | Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties - Fan, Li - 2001

303 | Regularized discriminant analysis - Friedman - 1989

Citation Context: ...athematically equivalent to replacing Σ̂ with its shrunken version in the lasso. In linear discriminant analysis, the prediction accuracy can often be improved by replacing Σ̂ by a shrunken estimate (Friedman, 1989; Hastie et al., 2001). Likewise we improve the lasso by regularizing Σ̂ in equation (15). 3.3. Connections with univariate soft thresholding The lasso is a special case of the elastic net with λ2 = 0...

232 | A Statistical View of Some Chemometrics Regression Tools (with discussion), Technometrics - Frank, Friedman - 1993

Citation Context: ...he lasso does both continuous shrinkage and automatic variable selection simultaneously. Tibshirani (1996) and Fu (1998) compared the prediction performance of the lasso, ridge and bridge regression (Frank and Friedman, 1993) and found that none of them uniformly dominates the other two. However, as variable selection becomes increasingly important in modern data analysis, the lasso is much more appealing owing to its sp...

193 | Predicting the clinical status of human breast cancer by using gene expression profiles - West, Blanchette, et al. - 2001

Citation Context: ...n, where the naïve elastic net can be viewed as a two-stage procedure: a ridge-type direct shrinkage followed by a lasso-type thresholding. 2.3. The grouping effect In the ‘large p, small n’ problem (West et al., 2001), the ‘grouped variables’ situation is a particularly important concern, which has been addressed many times in the literature. For example, principal component analysis has been used to construct me...

115 | ‘Gene shaving’ as a method for identifying distinct sets of genes with similar expression patterns - Hastie, Tibshirani, et al. - 2000

113 | Statistical behavior and consistency of classification methods based on convex risk minimization - Zhang - 2003

Citation Context: ...es automatic gene selection. Although our methodology is motivated by regression problems, the elastic net penalty can be used in classification problems with any consistent (Zhang, 2004) loss functions, including the L2-loss which we have considered here and binomial deviance. Some nice properties of the elastic net are better understood in the classification paradigm. For example, ...

105 | Penalized regressions: the bridge versus the lasso - Fu - 1998

Citation Context: ...en y and x1 − x2. It is easy to construct examples such that ρ = corr(x1, x2) → 1 but cos(θ) does not vanish. 2.4. Bayesian connections and the Lq-penalty Bridge regression (Frank and Friedman, 1993; Fu, 1998) has J(β) = ‖β‖_q^q = Σ_{j=1}^p |β_j|^q in equation (7), which is a generalization of both the lasso (q = 1) and ridge regression (q = 2). The bridge estimator can be viewed as the Bayes posterior mode u...
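The Lq penalty quoted in this context is simple enough to compute directly. A small sketch (the helper name is ours), showing how q interpolates between the lasso and ridge penalties:

```python
import numpy as np

def bridge_penalty(beta, q):
    """Bridge penalty J(beta) = sum_j |beta_j|^q.

    q = 1 gives the lasso penalty; q = 2 gives the ridge penalty.
    """
    return float(np.sum(np.abs(np.asarray(beta, dtype=float)) ** q))

beta = [1.0, -2.0]
print(bridge_penalty(beta, 1))  # 3.0 (lasso)
print(bridge_penalty(beta, 2))  # 5.0 (ridge)
```

For q ≤ 1 the penalty is sparsity-inducing but non-convex for q < 1; the lasso (q = 1) is the only member that is both convex and sparse, which is part of the elastic net's motivation for mixing q = 1 and q = 2 terms instead of using an intermediate q.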

69 | Boosting as a Regularized Path to a Maximum Margin Classifier - Rosset, Zhu, et al. - 2004

Citation Context: ...t region and then slightly increases (Hastie et al., 2001). This is no coincidence. In fact we have discovered that the elastic net penalty has a close connection with the maximum margin explanation (Rosset et al., 2004) to the success of the support vector machine and boosting. Thus Fig. 6 has a nice margin-based explanation. We have made some progress in using the elastic net penalty in classification, which will ...

34 | Supervised harvesting of expression trees - Hastie, Tibshirani, et al. - 2001

Citation Context: ...literature. For example, principal component analysis has been used to construct methods for finding a set of highly correlated genes in Hastie et al. (2000) and Díaz-Uriarte (2003). Tree harvesting (Hastie et al., 2003) uses supervised learning methods to select groups of predictive genes found by hierarchical clustering. Using an algorithmic approach, Dettling and Bühlmann (2004) performed the clustering and super...

26 | Finding predictive gene groups from microarray data - Dettling, Bühlmann

26 | Regression approaches for microarray data analysis - Segal, Dahlquist, et al.

18 | Prostate specific antigen in the diagnosis and treatment of adenocarcinoma of the prostate, II: Radical prostatectomy treated patients - Stamey, Kabalin, et al. - 1989

Citation Context: ...onsider only the best k within 500. From now on we drop the subscript of λ2 if s or k is the other parameter. 4. Prostate cancer example The data in this example come from a study of prostate cancer (Stamey et al., 1989). The predictors are eight clinical measures: log(cancer volume) (lcavol), log(prostate weight) (lweight), age, the logarithm of the amount of benign prostatic hyperplasia (lbph), seminal vesicle inv...

11 | Ridge regression - Hoerl, Kennard - 1970

Citation Context: ...er of predictors is large. It is well known that OLS often does poorly in both prediction and interpretation. Penalization techniques have been proposed to improve OLS. For example, ridge regression (Hoerl and Kennard, 1988) minimizes the residual sum of squares subject to a bound on the L2-norm of the coefficients. As a continuous shrinkage method, ridge regression achieves its better prediction performance through a b...
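The ridge regression described in this context, in its Lagrangian form minimizing ‖y − Xβ‖² + λ‖β‖², has a well-known closed-form solution. A minimal numpy sketch (the function name and toy data are ours):

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge estimate: (X'X + lam * I)^{-1} X'y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = X @ np.array([1.0, 2.0])  # noiseless response with true beta = (1, 2)

print(ridge(X, y, lam=0.0))   # lam = 0 recovers OLS: [1. 2.]
print(ridge(X, y, lam=10.0))  # larger lam shrinks coefficients toward zero
```

Adding λI to X'X is also what makes the system solvable when p > n, which is the motivation the context gives for continuous shrinkage.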

7 | Discussion of boosting papers - Friedman, Hastie, et al. - 2004

Citation Context: ...ing and feature extraction. Recently the lasso was used to explain the success of boosting: boosting performs a high dimensional lasso without explicitly using the lasso penalty (Hastie et al., 2001; Friedman et al., 2004). Our results offer other insights into the lasso, and ways to improve it. Acknowledgements We thank Rob Tibshirani and Ji Zhu for helpful comments, and an Associate Editor and referee for their usef...

2 | A simple method for finding molecular signatures from gene expression data. Technical Report, Spanish National Cancer Center. (Available from http://www.arxiv.org/abs/q-bio.QM/0401043) - Díaz-Uriarte - 2003