## The group Lasso for logistic regression (2008)

Venue: Journal of the Royal Statistical Society, Series B

Citations: 140 (7 self)

### BibTeX

@ARTICLE{Meier08thegroup,

author = {Lukas Meier and Sara van de Geer and Peter Bühlmann},

title = {The group Lasso for logistic regression},

journal = {Journal of the Royal Statistical Society, Series B},

year = {2008}

}

### Abstract

Summary. The group lasso is an extension of the lasso that performs variable selection on (predefined) groups of variables in linear regression models. The estimates have the attractive property of being invariant under groupwise orthogonal reparameterizations. We extend the group lasso to logistic regression models and present an efficient algorithm for solving the corresponding convex optimization problem; it is especially suitable for high-dimensional problems and can also be applied to generalized linear models. The group lasso estimator for logistic regression is shown to be statistically consistent even if the number of predictors is much larger than the sample size, provided the true underlying structure is sparse. We further use a two-stage procedure that aims for sparser models than the group lasso, leading to improved prediction performance in some cases. Moreover, owing to the two-stage nature, the estimates can be constructed to be hierarchical. The methods are applied to simulated and real data sets on splice site detection in DNA sequences.
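As a concrete illustration of the objective described in the abstract, the following is a minimal sketch (not the paper's own implementation; the solver itself is omitted, and the function name is mine) of the group-lasso-penalized negative log-likelihood for logistic regression, with the usual √(df_g) rescaling of each group:

```python
import numpy as np

def group_lasso_logistic_objective(beta, X, y, groups, lam):
    """Penalized negative log-likelihood for logistic regression (sketch).

    beta   : coefficient vector (any intercept is assumed unpenalized
             and handled outside this sketch)
    groups : list of index arrays defining the predefined groups
    lam    : penalty parameter lambda
    """
    eta = X @ beta                                   # linear predictor
    nll = np.sum(np.log1p(np.exp(eta)) - y * eta)    # -log-likelihood
    # group lasso penalty: lam * sum_g sqrt(df_g) * ||beta_g||_2
    pen = lam * sum(np.sqrt(len(g)) * np.linalg.norm(beta[g]) for g in groups)
    return nll + pen
```

Because the penalty is the unsquared Euclidean norm per group, whole groups are set to zero at once, which is what makes the estimator invariant under groupwise orthogonal reparameterizations.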

### Citations

1832 | Regression shrinkage and selection via the lasso
- Tibshirani
- 1996
Citation Context: ...o the two-stage nature the estimates can be constructed to be hierarchical. The methods are used on simulated and real datasets about splice site detection in DNA sequences. 1 Introduction The Lasso (Tibshirani, 1996), originally proposed for linear regression models, has become a popular model selection and shrinkage estimation method. In the usual linear regression setup we have a continuous response Y ∈ R^n, ...

752 | Least angle regression
- Efron, Hastie, et al.
- 2004
Citation Context: ...es this problem by working with an updated local bound on the second derivative and by restricting the change of the current parameter to a local neighbourhood. For linear models, the LARS-algorithm (Efron et al., 2004) is very efficient for computing the path of Lasso solutions {β̂_λ}_{λ≥0}. For logistic regression, approximate path following algorithms have been proposed (Rosset, 2005; Zhao and Yu, 2004; Park and ...

742 | Prediction of complete gene structures in human genomic DNA
- Burge, Karlin
- 1997
Citation Context: ...Application to Splice Site Detection. The prediction of short DNA motifs plays an important role in many areas of computational biology. Gene finding algorithms such as GENIE (Burge and Karlin, 1997) often rely on the prediction of splice sites. Splice sites are the regions between coding (exons) and non-coding (introns) DNA segments. The 5' end of an intron is called a donor splice site and the...

504 | Model selection and estimation in regression with grouped variables
- Yuan, Lin
- 2006
Citation Context: ...The Group Lasso for Logistic Regression. Lukas Meier, Sara van de Geer and Peter Bühlmann, Seminar für Statistik, ETH Zentrum, CH-8092 Zürich, Switzerland, April 2006. Abstract: The Group Lasso (Yuan and Lin, 2006) is an extension of the Lasso to do variable selection on (predefined) groups of variables in linear regression models. The estimates have the attractive property of being invariant under groupwise o...

162 | A new approach to variable selection in least squares problems
- Osborne, Presnell, et al.
- 2000
Citation Context: ...local bound on the second derivative and by restricting the change in the current parameter to a local neighbourhood. For linear models, the least angle regression algorithm lars (Efron et al., 2004; Osborne et al., 2000) is very efficient for computing the path of lasso solutions {β̂_λ}_{λ≥0}. For logistic regression, approximate path following algorithms have been proposed (Rosset, 2005; Zhao and Yu, 2007; Park and H...

113 | Large-scale Bayesian logistic regression for text categorization
- Genkin, Madigan
- 2006
Citation Context: ...ameter λ, some components of β̂_λ are set exactly to zero. The ℓ1-type penalty of the Lasso can also be applied to other models as for example Cox regression (Tibshirani, 1997), logistic regression (Genkin et al., 2004) or multinomial logistic regression (Krishnapuram et al., 2005) by replacing the residual sum of squares by the corresponding negative log-likelihood function. Already for the special case in linear ...

113 | Sparse multinomial logistic regression: fast algorithms and generalization bounds. IEEE Transactions on Pattern Analysis and Machine Intelligence
- Krishnapuram, Carin, et al.
- 2005
Citation Context: ...o. The ℓ1-type penalty of the Lasso can also be applied to other models as for example Cox regression (Tibshirani, 1997), logistic regression (Genkin et al., 2004) or multinomial logistic regression (Krishnapuram et al., 2005) by replacing the residual sum of squares by the corresponding negative log-likelihood function. Already for the special case in linear regression when not only continuous but also categorical predic...

113 | Convergence of a block coordinate descent method for nondifferentiable minimization
- Tseng
Citation Context: ...ion problem which is inherently more difficult than ℓ1-penalized logistic regression. The algorithms rely on recent theory and developments for block coordinate and block coordinate gradient descent (Tseng, 2001; Tseng and Yun, 2006) and we give rigorous proofs that our algorithms yield a solution of the corresponding convex optimization problem. Moreover, we present a statistical consistency theory for the ...

61 | The lasso method for variable selection
- Tibshirani
- 1997
Citation Context: ...R^n. For large values of the penalty parameter λ, some components of β̂_λ are set exactly to zero. The ℓ1-type penalty of the Lasso can also be applied to other models as for example Cox regression (Tibshirani, 1997), logistic regression (Genkin et al., 2004) or multinomial logistic regression (Krishnapuram et al., 2005) by replacing the residual sum of squares by the corresponding negative log-likelihood functi...

61 | Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals
- Yeo, Burge
- 2004
Citation Context: ...other (possibly continuous) predictor variables as for example global sequence information could be naturally included in the Group Lasso approach to improve the rather low correlation coefficients (Yeo and Burge, 2004). 6 Discussion. We study the Group Lasso for logistic regression. We present efficient algorithms, especially suitable for high-dimensional problems, for solving the corresponding convex optimization ...

60 | A Simple and Efficient Algorithm for Gene Selection using Sparse Logistic Regression
- Shevade, Keerthi
- 2003
Citation Context: ...s of β̂_λ are set exactly to 0. The ℓ1-type penalty of the lasso can also be applied to other models as for example Cox regression (Tibshirani, 1997), logistic regression (Lokhorst, 1999; Roth, 2004; Shevade and Keerthi, 2003; Genkin et al., 2007) or multinomial logistic regression (Krishnapuram et al., 2005) by replacing the residual sum of squares by the corresponding negative log-likelihood function. Already for the sp...

57 | Logistic regression in rare events data
- King, Zeng
- 2001
Citation Context: ...s are fitted on the balanced training dataset. As the ratio of true to false splice sites strongly differs from the training to the validation and the test set, the intercept is corrected as follows (King and Zeng, 2001): β̂₀^corr = β̂₀ − log(ȳ / (1 − ȳ)) + log(π / (1 − π)), where π is the proportion of true sites in the validation set. Penalty parameters λ and κ are selected according to the (unpenalized) log-li...
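The intercept correction quoted in this snippet can be written as a one-liner; the following is a small sketch (the function name is mine, not from the paper):

```python
import numpy as np

def corrected_intercept(beta0, ybar, pi):
    """Rare-events intercept correction in the style of King and Zeng (2001).

    beta0 : intercept fitted on the balanced training set
    ybar  : proportion of true splice sites in the (balanced) training data
    pi    : proportion of true sites in the validation set
    """
    return beta0 - np.log(ybar / (1.0 - ybar)) + np.log(pi / (1.0 - pi))
```

With a balanced training set (ȳ = 1/2) the first log term vanishes, so the correction simply shifts the fitted intercept by the log-odds of the true class proportion.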

51 | A Coordinate Gradient Descent Method for Nonsmooth Separable
- Tseng, Yun
Citation Context: ...is ideally chosen to be close to the corresponding submatrix of the Hessian of the log-likelihood function. Restricting ourselves to diagonal matrices H^(t)_gg = h^(t)_g · I_{df_g}, a possible choice is (Tseng and Yun, 2006): h^(t)_g = max{diag(−∇²ℓ(β̂^(t)))_gg, c*} (2.4), where c* > 0 is a lower bound to ensure convergence (see Proposition 2.2). In Tseng and Yun (2006) c* = 1 is used, which also worked well...
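The choice of h^(t)_g quoted in this snippet amounts to taking the largest diagonal entry of the group's block of the negative-log-likelihood Hessian, floored at c*. A minimal sketch (assuming that Hessian is available as a dense matrix; names are mine):

```python
import numpy as np

def block_scaling(hess_nll, group_idx, c_star=1.0):
    """h_g = max{ diag entries of (hess_nll)_gg, c_star }.

    hess_nll  : Hessian of the negative log-likelihood, i.e. -grad^2 l(beta)
    group_idx : indices of the coefficients belonging to group g
    c_star    : lower bound c* > 0 that guarantees convergence
    """
    diag_block = np.diag(hess_nll)[group_idx]
    return max(diag_block.max(), c_star)
```

The scalar h_g then defines the diagonal surrogate H_gg = h_g · I for the block update, avoiding a full matrix inversion per group.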

47 | The generalized lasso
- Roth
Citation Context: ...me components of β̂_λ are set exactly to 0. The ℓ1-type penalty of the lasso can also be applied to other models as for example Cox regression (Tibshirani, 1997), logistic regression (Lokhorst, 1999; Roth, 2004; Shevade and Keerthi, 2003; Genkin et al., 2007) or multinomial logistic regression (Krishnapuram et al., 2005) by replacing the residual sum of squares by the corresponding negative log-likelihood f...

46 | Regularization of wavelets approximations (with discussion)
- Antoniadis, Fan
Citation Context: ...how the dummy variables are encoded. Choosing different contrasts for a categorical predictor will produce different solutions in general. The group lasso (Yuan and Lin, 2006; Bakin, 1999; Cai, 2001; Antoniadis and Fan, 2001) overcomes these problems by introducing a suitable extension. Address for correspondence: Lukas Meier, Seminar für Statistik, Eidgenössische Technische Hochschule Zürich, Leonhardstrasse 27, CH-8092 ...

45 | Blockwise sparse regression - Kim, Kim, et al. - 2006

39 | Adaptive regression and model selection in data mining problems
- Bakin
- 1999
Citation Context: ...sso solution depends on how the dummy variables are encoded. Choosing different contrasts for a categorical predictor will produce different solutions in general. The group lasso (Yuan and Lin, 2006; Bakin, 1999; Cai, 2001; Antoniadis and Fan, 2001) overcomes these problems by introducing a suitable extension. Address for correspondence: Lukas Meier, Seminar für Statistik, Eidgenössische Technische Hochschule...

34 | An ℓ1 regularization-path algorithm for generalized linear models
- Park, Hastie
Citation Context: ...thods which can easily deal with predictor dimensions in the thousands, in contrast to Kim et al. (2006). We do not aim for an (approximate) path-following algorithm (Rosset, 2005; Zhao and Yu, 2004; Park and Hastie, 2006) but our approaches are very fast (also in case of the Lasso penalty) for computing a whole range of solutions for varying penalty parameters on a (fixed) grid. Our approach is related to Genkin et a...

25 | Modeling dependencies in pre-mRNA splicing signals - Burge - 1998

19 | High-dimensional generalized linear models and the lasso - Geer - 2008

19 | The LASSO method for variable selection in the Cox model
- Tibshirani
- 1997
Citation Context: ...u ∈ R^n. For large values of the penalty parameter λ, some components of β̂_λ are set exactly to 0. The ℓ1-type penalty of the lasso can also be applied to other models as for example Cox regression (Tibshirani, 1997), logistic regression (Lokhorst, 1999; Roth, 2004; Shevade and Keerthi, 2003; Genkin et al., 2007) or multinomial logistic regression (Krishnapuram et al., 2005) by replacing the residual sum of squa...

18 | Following curved regularized optimization solution paths
- Rosset
- 2005
Citation Context: ...hms. We present very efficient methods which can easily deal with predictor dimensions in the thousands, in contrast to Kim et al. (2006). We do not aim for an (approximate) path-following algorithm (Rosset, 2005; Zhao and Yu, 2004; Park and Hastie, 2006) but our approaches are very fast (also in case of the Lasso penalty) for computing a whole range of solutions for varying penalty parameters on a (fixed) gr...

16 | Algorithms for sparse linear classifiers in the massive data setting - Balakrishnan, Madigan

13 | Lasso with relaxation
- Meinshausen
- 2005
Citation Context: ...cted by the group lasso are large compared with the underlying true models. For the ordinary lasso, smaller models with good prediction performance can be obtained by using the lasso with relaxation (Meinshausen, 2007). This idea can also be incorporated in the (logistic) group lasso approach and our proposal will also allow us to fit hierarchical models. Denote by Î_λ ⊆ {0, ..., G} the index set of predictors that a...
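The two-stage idea in this snippet (select groups at penalty λ, then refit only the selected groups at a second penalty κ) can be sketched as follows; `fit` stands in for any group-lasso solver and is an assumption of this sketch, not an interface from the paper's software:

```python
import numpy as np

def two_stage_group_lasso(fit, X, y, groups, lam, kappa):
    """Two-stage ('relaxed') group lasso sketch.

    fit(X, y, groups, penalty) is any solver returning a coefficient
    vector of length X.shape[1]; only its zero pattern is used here.
    """
    beta = fit(X, y, groups, lam)                 # stage 1: group selection
    selected = [g for g in groups if np.any(beta[list(g)] != 0)]
    return fit(X, y, selected, kappa), selected   # stage 2: refit survivors
```

Since stage 2 only ever refits groups that survived stage 1, this two-stage structure is also what allows the hierarchical construction of estimates mentioned in the abstract.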

6 | Classifiers of support vector machine type, with ℓ1 penalty - Geer, Tarigan - 2006

4 | Adaptive quantile regression - Geer - 2003

3 | Discussion of "Regularization of Wavelets Approximations" by A. Antoniadis and J. Fan
- Cai
Citation Context: ...depends on how the dummy variables are encoded. Choosing different contrasts for a categorical predictor will produce different solutions in general. The group lasso (Yuan and Lin, 2006; Bakin, 1999; Cai, 2001; Antoniadis and Fan, 2001) overcomes these problems by introducing a suitable extension. Address for correspondence: Lukas Meier, Seminar für Statistik, Eidgenössische Technische Hochschule Zürich, Le...

3 | The LASSO and Generalised Linear Models. Honors Project, The
- Lokhorst
- 1999
Citation Context: ...parameter λ, some components of β̂_λ are set exactly to 0. The ℓ1-type penalty of the lasso can also be applied to other models as for example Cox regression (Tibshirani, 1997), logistic regression (Lokhorst, 1999; Roth, 2004; Shevade and Keerthi, 2003; Genkin et al., 2007) or multinomial logistic regression (Krishnapuram et al., 2005) by replacing the residual sum of squares by the corresponding negative log-...

1 | Lasso with Relaxation. Tech. rep., Seminar für Statistik
- Meinshausen
- 2005
Citation Context: ...dels selected by the Group Lasso are large compared to the underlying true models. For the ordinary Lasso, smaller models with good prediction performance can be obtained using Lasso with relaxation (Meinshausen, 2005). This idea can also be incorporated into the (logistic) Group Lasso approach and our proposal will also allow us to fit hierarchical models. Denote by Î_λ ⊆ {0, . . . , G} the index set of predictors ...
