## Bolasso: model consistent lasso estimation through the bootstrap (2008)

### Download Links

- [icml2008.cs.helsinki.fi]
- [www.di.ens.fr]
- [eprints.pascal-network.org]
- [arxiv.org]
- DBLP

### Other Repositories/Bibliography

Venue: Proceedings of the Twenty-Fifth International Conference on Machine Learning (ICML), 2008

Citations: 39 (14 self)

### BibTeX

```bibtex
@inproceedings{Bach08bolasso:model,
  author    = {Francis R. Bach},
  title     = {Bolasso: model consistent lasso estimation through the bootstrap},
  booktitle = {Proceedings of the Twenty-Fifth International Conference on Machine Learning (ICML)},
  year      = {2008}
}
```

### Abstract

We consider the least-square linear regression problem with regularization by the ℓ1-norm, a problem usually referred to as the Lasso. In this paper, we present a detailed asymptotic analysis of model consistency of the Lasso. For various decays of the regularization parameter, we compute asymptotic equivalents of the probability of correct model selection (i.e., variable selection). For a specific rate decay, we show that the Lasso selects all the variables that should enter the model with probability tending to one exponentially fast, while it selects all other variables with strictly positive probability. We show that this property implies that if we run the Lasso for several bootstrapped replications of a given sample, then intersecting the supports of the Lasso bootstrap estimates leads to consistent model selection. This novel variable selection algorithm, referred to as the Bolasso, compares favorably with other linear regression methods on synthetic data and datasets from the UCI machine learning repository.
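The procedure the abstract describes — run the Lasso on several bootstrap replications of the sample and keep only the variables selected in every replication — can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's Lars-based implementation: the coordinate-descent solver and the names and parameter choices (`lam`, `m`, `tol`) are our own assumptions.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate-descent Lasso for (1/(2n))||y - Xw||^2 + lam * ||w||_1.

    Stand-in for the Lars solver used in the paper.
    """
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual with variable j removed, then 1-D update.
            r = y - X @ w + X[:, j] * w[j]
            w[j] = soft_threshold(X[:, j] @ r, n * lam) / col_sq[j]
    return w

def bolasso_support(X, y, lam, m=32, tol=1e-6, rng=None):
    """Intersect the Lasso supports over m bootstrap replications."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    support = np.ones(X.shape[1], dtype=bool)
    for _ in range(m):
        idx = rng.integers(0, n, size=n)  # bootstrap: sample n rows with replacement
        w = lasso_cd(X[idx], y[idx], lam)
        support &= np.abs(w) > tol       # keep only variables selected every time
    return support
```

The intersection is the point of the method: relevant variables survive every replication (their selection probability tends to one exponentially fast), while each irrelevant variable, selected only with some probability bounded away from one, is eventually dropped by some replication.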

### Citations

3389 | An Introduction to the Bootstrap
- Efron, Tibshirani
- 1993
Citation Context: ... in practice, only one dataset is given; but resampling methods such as the bootstrap are exactly dedicated to mimic the availability of several datasets by resampling from the same unique dataset (Efron & Tibshirani, 1998). In this paper, we show that when using the bootstrap and intersecting the supports, we actually get a consistent model estimate ...

2765 | Bagging predictors
- Breiman
- 1996
Citation Context: ... variable selection error is computed as the square distance between sparsity pattern indicator vectors. Note in particular that we compare with bagging of least-square regressions (Breiman, 1996a) followed by a thresholding of the loading vector, which is another simple way of using bootstrap samples: the Bolasso provides a more efficient way to use the extra information, not for usual stabilization ...

816 | Least angle regression
- Efron, Hastie, et al.
- 2004
Citation Context: ... because of the efficiency of the Lars algorithm (which we use in simulations), that allows to find the entire regularization path for the Lasso at the (empirical) cost of a single matrix inversion (Efron et al., 2004). Thus the computational complexity of the Bolasso is O(m(p³ + p²n)). The following proposition (proved in Appendix A) shows that the previous algorithm leads to consistent model selection ...

714 | UCI Machine Learning Repository
- Asuncion, Newman
- 2007
Citation Context: ... in this section, we illustrate the consistency results obtained in this paper with a few simple simulations on synthetic examples and some medium-scale datasets from the UCI machine learning repository (Asuncion & Newman, 2007). 4.1. Synthetic examples: For a given dimension p, we sampled X ∈ R^p from a normal distribution with zero mean and covariance matrix generated as follows: (a) sample a p×p matrix G with independent s...

361 | Regression shrinkage and selection via the Lasso
- Tibshirani
- 1996
Citation Context: ... has attracted a lot of interest in recent years in machine learning, statistics and signal processing. In the context of least-square linear regression, the problem is usually referred to as the Lasso (Tibshirani, 1994). Much of the early effort has been dedicated to algorithms to solve the optimization problem efficiently. In particular, the Lars algorithm of Efron et al. (2004) allows to find the entire regularization path ...

299 | Arcing classifiers
- Breiman
- 1998
Citation Context: ... the Lasso may be easily enhanced thanks to a simple parameter-free resampling procedure. Our contribution also suggests that the use of bootstrap samples by L. Breiman in Bagging/Arcing/Random Forests (Breiman, 1998) may have been so far slightly overlooked and considered a minor feature, while using bootstrap samples may actually be a key computational feature in such algorithms for good model selection performance ...

289 | The adaptive Lasso and its oracle properties
- Zou
- 2006
Citation Context: ... loading vectors with many zeros, and thus performs model selection. Recent works (Zhao & Yu, 2006; Yuan & Lin, 2007; Zou, 2006; Wainwright, 2006) have looked precisely at the model consistency of the Lasso, i.e., if we know that the data were generated from a sparse loading vector, does the Lasso actually recover the sparsity ...

168 | Heuristics of instability and stabilization in model selection
- Breiman
- 1996
Citation Context: ... variable selection error is computed as the square distance between sparsity pattern indicator vectors. Note in particular that we compare with bagging of least-square regressions (Breiman, 1996a) followed by a thresholding of the loading vector, which is another simple way of using bootstrap samples: the Bolasso provides a more efficient way to use the extra information, not for usual stabilization ...

160 | Consistency of the group lasso and multiple kernel learning
- Bach
Citation Context: ... possible situations which explain various portions of the regularization path (we assume (A1-3)); many of these results appear elsewhere (Yuan & Lin, 2007; Zhao & Yu, 2006; Fu & Knight, 2000; Zou, 2006; Bach, 2008; Lounici, 2008) but some of the finer results presented below are new (see Section 2.4). ¹Throughout this paper, we use boldface fonts for population quantities ...

154 | Asymptotics for Lasso-type estimators
- Fu, Knight
- 2000
Citation Context: ... five mutually exclusive possible situations which explain various portions of the regularization path (we assume (A1-3)); many of these results appear elsewhere (Yuan & Lin, 2007; Zhao & Yu, 2006; Fu & Knight, 2000; Zou, 2006; Bach, 2008; Lounici, 2008) but some of the finer results presented below are new (see Section 2.4). ¹Throughout this paper, we use boldface fonts for population quantities ...

126 | Lasso-type recovery of sparse representations for high-dimensional data
- Meinshausen, Yu
- 2006
Citation Context: ... The current work could be extended in various ways: first, we have focused on a fixed total number of variables, and allowing the number of variables to grow is important in theory and in practice (Meinshausen & Yu, 2008). Second, the same technique can be applied to settings similar to least-square regression with the ℓ1-norm, namely regularization by block ℓ1-norms (Bach, 2008) and other losses such as general convex classification losses ...

76 | On model selection consistency of Lasso
- Zhao, Yu
Citation Context: ... loading vectors with many zeros, and thus performs model selection. Recent works (Zhao & Yu, 2006; Yuan & Lin, 2007; Zou, 2006; Wainwright, 2006) have looked precisely at the model consistency of the Lasso, i.e., if we know that the data were generated from a sparse loading vector, does the Lasso ...

60 | Sharp thresholds for noisy and high-dimensional recovery of sparsity using ℓ1-constrained quadratic programming
- Wainwright
- 2006
Citation Context: ... loading vectors with many zeros, and thus performs model selection. Recent works (Zhao & Yu, 2006; Yuan & Lin, 2007; Zou, 2006; Wainwright, 2006) have looked precisely at the model consistency of the Lasso, i.e., if we know that the data were generated from a sparse loading vector, does the Lasso actually recover the sparsity pattern when the ...

46 | Boosting for high-dimensional linear models
- Bühlmann
- 2006
Citation Context: ... block ℓ1-norms (Bach, 2008) and other losses such as general convex classification losses. Finally, theoretical and practical connections could be made with other work on resampling methods and boosting (Bühlmann, 2006). A. Proof of Model Consistency Results: In this appendix, we give sketches of proofs for the asymptotic results presented in Section 2 and Section 3. The proofs rely on the well-known property of the ...

42 | Sup-norm convergence rate and sign concentration property of Lasso and Dantzig estimators
- Lounici
Citation Context: ... situations which explain various portions of the regularization path (we assume (A1-3)); many of these results appear elsewhere (Yuan & Lin, 2007; Zhao & Yu, 2006; Fu & Knight, 2000; Zou, 2006; Bach, 2008; Lounici, 2008) but some of the finer results presented below are new (see Section 2.4). ¹Throughout this paper, we use boldface fonts for population quantities ...

38 | Concentration inequalities
- Boucheron, Lugosi, et al.
- 2004
Citation Context: ... A.2. Concentration Inequalities: Throughout the proofs, we need to provide upper bounds on the quantities P(‖Q^{-1/2} q‖₂ > α) and P(‖Q̂ − Q‖₂ > η). We obtain, following standard arguments (Boucheron et al., 2004): if α < C₉ and η < C₁₀ (where C₉, C₁₀ > 0 are constants), P(‖Q^{-1/2} q‖₂ > α) ≤ 4p exp(−nα²/(2pC₉)) and P(‖Q̂ − Q‖₂ > η) ≤ 4p² exp(−nη²/(2p²C₁₀)). We also consider multivariate Berry-Esseen inequali...

26 | On the non-negative garrotte estimator
- Yuan, Lin
Citation Context: ... loading vectors with many zeros, and thus performs model selection. Recent works (Zhao & Yu, 2006; Yuan & Lin, 2007; Zou, 2006; Wainwright, 2006) have looked precisely at the model consistency of the Lasso, i.e., if we know that the data were generated from a sparse loading vector, does the Lasso actually recover ...

11 | On the Dependence of the Berry–Esséen Bound on Dimension
- Bentkus
- 2003
Citation Context: ... if α < C₉ and η < C₁₀ (where C₉, C₁₀ > 0 are constants), P(‖Q^{-1/2} q‖₂ > α) ≤ 4p exp(−nα²/(2pC₉)) and P(‖Q̂ − Q‖₂ > η) ≤ 4p² exp(−nη²/(2p²C₁₀)). We also consider multivariate Berry-Esseen inequalities (Bentkus, 2003); the probability P(n^{1/2} q ∈ C) can be estimated as P(t ∈ C), where t is normal with mean zero and covariance matrix σ²Q. The error |P(n^{1/2} q ∈ C) − P(t ∈ C)| is then uniformly (for all convex sets C) upper-bounded by ...