## High dimensional graphs and variable selection with the Lasso (2006)

Venue: Annals of Statistics

Citations: 407 (21 self)

### BibTeX

```bibtex
@article{Meinshausen06highdimensional,
  author  = {Nicolai Meinshausen and Peter B\"uhlmann},
  title   = {High dimensional graphs and variable selection with the Lasso},
  journal = {Annals of Statistics},
  year    = {2006},
  volume  = {34},
  number  = {3},
  pages   = {1436--1462}
}
```

### Abstract

The pattern of zero entries in the inverse covariance matrix of a multivariate normal distribution corresponds to conditional independence restrictions between variables. Covariance selection aims at estimating those structural zeros from data. We show that neighborhood selection with the Lasso is a computationally attractive alternative to standard covariance selection for sparse high-dimensional graphs. Neighborhood selection estimates the conditional independence restrictions separately for each node in the graph and is hence equivalent to variable selection for Gaussian linear models. We show that the proposed neighborhood selection scheme is consistent for sparse high-dimensional graphs. Consistency hinges on the choice of the penalty parameter. The oracle value for optimal prediction does not lead to a consistent neighborhood estimate. Controlling instead the probability of falsely joining some distinct connectivity components of the graph, consistent estimation for sparse graphs is achieved (with exponential rates), even when the number of variables grows as the number of observations raised to an arbitrary power.
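The node-wise regression scheme described in the abstract can be sketched in a few lines: Lasso-regress each variable on all the others, read an edge off every nonzero coefficient, and symmetrize. The coordinate-descent solver, the fixed penalty `lam`, and the OR/AND combination rules below are illustrative choices for a minimal sketch, not the paper's implementation (the paper derives a specific penalty that controls the probability of falsely joining distinct connectivity components).

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate-descent Lasso: minimize (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n  # (1/n) x_j^T x_j for each column
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual with coordinate j removed, then soft-threshold.
            r = y - X @ b + X[:, j] * b[j]
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return b

def neighborhood_graph(X, lam, rule="or"):
    """Estimate the edge set: Lasso-regress each node on all remaining nodes
    and declare an edge wherever the coefficient estimate is nonzero."""
    n, p = X.shape
    nbr = np.zeros((p, p), dtype=bool)
    for a in range(p):
        others = [k for k in range(p) if k != a]
        coef = lasso_cd(X[:, others], X[:, a], lam)
        nbr[a, others] = coef != 0
    # The two directed estimates need not agree; combine with AND or OR.
    return (nbr & nbr.T) if rule == "and" else (nbr | nbr.T)
```

On standardized data from a Markov chain X1 → X2 → X3, a suitable λ recovers the edges (1,2) and (2,3) while leaving (1,3) empty; the paper's key point is that the λ achieving this graph consistency is larger than the prediction-optimal (oracle or cross-validated) one.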

### Citations

1777 | Atomic decomposition by basis pursuit - Chen, Donoho, et al. - 1998 |

1154 | Graphical Models - Lauritzen - 1996 |

Citation Context: ...} = {X_k; k ∈ Γ \ {a, b}}. Every pair of variables not contained in the edge set is conditionally independent, given all remaining variables, and corresponds to a zero entry in the inverse covariance matrix [12]. Covariance selection was introduced by Dempster [3] and aims at discovering the conditional independence restrictions (the graph) from a set of i.i.d. observations. Covariance selection traditionally r...

359 | Regression shrinkage and selection via the Lasso - Tibshirani - 1996 |

285 | Weak Convergence and Empirical Processes: With Applications to Statistics - van der Vaart, Wellner - 1996 |

Citation Context: ...⊥_b⟩|, is stochastically smaller than |2n⁻¹⟨V_a, W_b⟩| (this can be derived by conditioning on {X_k; k ∈ ne_a}). Due to independence of V_a and W_b, E(V_a W_b) = 0. Using Bernstein's inequality (Lemma 2.2.11 in [17]), and λ ∼ d n^(−(1−ε)/2) with ε > 0, there exists for every g > 0 some c > 0 so that (A.14) P(|2n⁻¹⟨V_a, W⊥_b⟩| ≥ gλ) ≤ P(|2n⁻¹⟨V_a, W_b⟩| ≥ gλ) = O(exp(−c n^ε)). Instead of (A.11), it is sufficient by (A.13) an...

270 | A statistical view of some chemometrics regression tools - Frank, Friedman - 1993 |

201 | Covariance selection - Dempster - 1972 |

160 | Dependency networks for inference, collaborative filtering, and data visualization - Heckerman, Chickering, et al. - 2001 |

Citation Context: ...uced substantially at the price of potential inconsistencies between neighborhood estimates. Graph estimates that apply this strategy for complexity reduction are sometimes called dependency networks [9]. The complexity of the proposed neighborhood selection for one node with the Lasso is reduced further to O(np min{n, p}), as the Lars procedure of Efron, Hastie, Johnstone and Tibshirani [6] requires ...

155 | Asymptotics for lasso-type estimators - Knight, Fu - 2000 |

Citation Context: ...Neighborhood selection with the Lasso. It is well known that the Lasso, introduced by Tibshirani [16], and known as Basis Pursuit in the context of wavelet regression [2], has a parsimonious property [11]. When predicting a variable X_a with all remaining variables {X_k; k ∈ Γ(n) \ {a}}, the vanishing Lasso coefficient estimates identify asymptotically the neighborhood of node a in the graph, as shown in th...

153 | On the LASSO and its dual - Osborne, Presnell, et al. - 2000 |

143 | Linear model selection by cross-validation - Shao - 1993 |

Citation Context: ...ediction-oracle penalty minimizes the predictive risk among all Lasso estimates. An estimate of λ_oracle is obtained by the cross-validated choice λ_cv. For l0-penalized regression it was shown by Shao [14] that the cross-validated choice of the penalty parameter is consistent for model selection under certain conditions on the size of the validation set. The prediction-oracle solution does not lead to co...

100 | Least angle regression (with discussion) - Efron, Hastie, et al. - 2004 |

77 | Persistence in high-dimensional linear predictor selection and the virtue of overparameterization - Greenshtein, Ritov - 2004 |

72 | Gaussian Markov distributions over finite graphs - Speed, Kiiveri - 1986 |

39 | Model selection for Gaussian concentration graphs (Biometrika 91) - Drton, Perlman - 2004 |

32 | Functional aggregation for nonparametric regression - Juditsky, Nemirovski |

18 | Introduction to Graphical Modelling (2nd ed.) - Edwards - 2000 |

Citation Context: ...s at discovering the conditional independence restrictions (the graph) from a set of i.i.d. observations. Covariance selection traditionally relies on the discrete optimization of an objective function [5, 12]. Exhaustive search is computationally infeasible for all ...

6 | On the existence of maximum likelihood estimators for graphical Gaussian models - Buhl - 1993 |

2 | with discussion - Hastie, R - 1993 |

Citation Context: ... networks [9]. The complexity of the proposed neighborhood selection for one node with the Lasso is reduced further to O(np min{n, p}), as the Lars procedure of Efron, Hastie, Johnstone and Tibshirani [6] requires O(min{n, p}) steps, each of complexity O(np). For high-dimensional problems as in Theorems 1 and 2, where the number of variables grows as p(n) ∼ c n^γ for some c > 0 and γ > 1, this is equivalent to O(p^(2+2...

1 | A statistical view of some chemometrics regression tools (with discussion) - Frank - 1993 |

Citation Context: ...ut neighborhoods (in particular Theorems 1 and 2) hold for any solution of (3). Other regression estimates have been proposed, which are based on the l_p-norm, where p is typically in the range [0, 2] (see [7]). A value of p = 2 leads to the ridge estimate, while p = 0 corresponds to traditional model selection. It is well known that the estimates have a parsimonious property (with some components being exactly zero)...

1 | Persistence in high-dimensional linear predictor selection and the virtue of over-parametrization - Greenshtein, Ritov - 2004 |

Citation Context: ...servations (sparsity). A number of studies have examined the case of regression with a growing number of parameters as sample size increases. The closest to our setting is the recent work of Greenshtein and Ritov [8], who study consistent prediction in a triangular setup very similar to ours (see also [10]). However, the problem of consistent estimation of the model structure, which is the relevant concept for gr...