## Adaptive Lasso for sparse high-dimensional regression (2006)

Venue: University of Iowa

Citations: 38 (4 self)

### BibTeX

```bibtex
@TECHREPORT{Huang06adaptivelasso,
  author      = {Jian Huang and Shuangge Ma and Cun-Hui Zhang},
  title       = {Adaptive Lasso for Sparse High-Dimensional Regression},
  institution = {University of Iowa},
  year        = {2006}
}
```



### Abstract

Summary. We study the asymptotic properties of adaptive LASSO estimators in sparse, high-dimensional, linear regression models when the number of covariates may increase with the sample size. We consider variable selection using the adaptive LASSO, where the L1 norms in the penalty are re-weighted by data-dependent weights. We show that, if a reasonable initial estimator is available, then under appropriate conditions the adaptive LASSO correctly selects the covariates with nonzero coefficients with probability converging to one, and the estimators of the nonzero coefficients have the same asymptotic distribution that they would have if the zero coefficients were known in advance. Thus, the adaptive LASSO has an oracle property in the sense of Fan and Li (2001) and Fan and Peng (2004). In addition, under a partial orthogonality condition, in which the covariates with zero coefficients are only weakly correlated with the covariates with nonzero coefficients, univariate regression can be used to obtain the initial estimator. With this initial estimator, the adaptive LASSO has the oracle property even when the number of covariates is greater than the sample size.

Key words and phrases: penalized regression, high-dimensional data, variable selection, asymptotic normality, oracle property, zero-consistency.

Short title: Sparse high-dimensional regression.

### Citations

1836 | Robust statistics - Huber - 1981

Citation Context: ...ate regression coefficient $\tilde\beta_j = \sum_{i=1}^n x_{ij}Y_i \big/ \sum_{i=1}^n x_{ij}^2 = n^{-1}\sum_{i=1}^n x_{ij}Y_i$. Let $\xi_{nj} = E\tilde\beta_j$. Since $EY_i = x_{(1)i}'\beta_{10}$, we have $\xi_{nj} = n^{-1}\sum_{i=1}^n x_{ij}x_{(1)i}'\beta_{10}$. (13) We make the following assumptions: (B1) (a) $\varepsilon_1, \varepsilon_2, \ldots$ are independent and identically distributed random variables with mean zero and variance $\sigma^2$, where $0 < \sigma^2 < \infty$; (b) for $1 \le d \le 2$, the tail probabilities of $\varepsilon_i$ satisfy $P$...
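The componentwise (univariate) initial estimator described here is one line of linear algebra. Below is a minimal numpy sketch, not the authors' code; the function name `marginal_coefficients` is ours:

```python
import numpy as np

def marginal_coefficients(X, y):
    """Componentwise regression: beta_tilde_j = (sum_i x_ij y_i) / (sum_i x_ij^2).

    When the columns of X are standardized so that sum_i x_ij^2 = n, this
    equals n**-1 * sum_i x_ij * y_i, the initial estimator the paper uses
    under the partial orthogonality condition.
    """
    return (X * y[:, None]).sum(axis=0) / (X ** 2).sum(axis=0)
```

Each coefficient involves only one covariate, so the estimator stays well defined even when the number of covariates exceeds the sample size.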

758 | Least angle regression - Efron, Hastie, et al. - 2004

Citation Context: ...oposition 1 can be proved following the proof of Proposition 1 of Zhao and Yu (2005). Let $J_{0n} = \{j : \beta_{0j} = 0\}$ and $J_{1n} = \{j : \beta_{0j} \ne 0\}$. Let $b_{n1} = \min\{|\beta_{0j}| : j \in J_{1n}\}$ and $b_{n2} = \max\{|\beta_{0j}| : j \in J_{1n}\}$. (6) Definition 1. We say that $\tilde\beta_n$ is zero-consistent if (a) $\max_{j \in J_{0n}} |\tilde\beta_{nj}| = o_p(1)$ and (b) there exists a constant $\xi_b > 0$ such that, for any $\varepsilon > 0$, $P\bigl(\min_{j \in J_{1n}} |\tilde\beta_{nj}| \ge \xi_b b_{n1}\bigr) > 1 - \varepsilon$ for all $n$...

498 | Ridge regression: Biased estimation for nonorthogonal problems - Hoerl, Kennard, et al. - 1970

Citation Context: ...(a) for $1 < d \le 2$, $\frac{(\log k_n)^{1/d}}{\sqrt{n}\,b_{n1}} \to 0$ and $\frac{\lambda_n k_n}{n b_{n1}^2} \to 0$, (8) and $\frac{\sqrt{n}(\log m_n)^{1/d}}{\lambda_n r_n} \to 0$ and $\frac{k_n^2}{r_n b_{n1}} \to 0$; (9) (b) for $d = 1$, $\frac{(\log n)(\log k_n)}{\sqrt{n}\,b_{n1}} \to 0$ and $\frac{\lambda_n k_n}{n b_{n1}^2} \to 0$, (10) and $\frac{\sqrt{n}(\log n)(\log m_n)}{\lambda_n r_n} \to 0$ and $\frac{k_n^2}{r_n b_{n1}} \to 0$. (11) (A4) There exist constants $0 < \tau_1 < \tau_2 < \infty$ such that $\tau_1 \le \tau_{1n} \le \tau_{2n} \le \tau_2$ for all $n$. Condition (A1a) is standard in linear regression. Condition (A1b) allows a range of tail behaviors of the error ter...

426 | The Dantzig selector: Statistical estimation when p is much larger than n. Annals of Statistics 35:2313–2351 - Candès, Tao - 2007

Citation Context: ...iates of the $i$th observation, $i = 1, \ldots, n$. We assume that the $Y_i$'s are centered and the covariates are standardized, i.e., $\sum_{i=1}^n Y_i = 0$, $\sum_{i=1}^n x_{ij} = 0$ and $\frac{1}{n}\sum_{i=1}^n x_{ij}^2 = 1$, $j = 1, \ldots, p_n$. (4) We also write $x_i = (x_{i1}', x_{i2}')'$, where $x_{i1}$ consists of the first $k_n$ covariates with nonzero coefficients and $x_{i2}$ consists of the remaining $m_n$ covariates with zero coefficients. Let $X_n$, $X_{1n}$, and...
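The centering and scaling in (4) are easy to get subtly wrong: the paper normalizes so that $n^{-1}\sum_i x_{ij}^2 = 1$, not by the usual unbiased standard deviation. A hedged numpy sketch (our illustration, assuming no column of X is constant):

```python
import numpy as np

def center_and_standardize(X, y):
    """Center y and each column of X, then scale columns so that
    n**-1 * sum_i x_ij**2 = 1 (population-style scaling, as in the
    paper's display (4)). Assumes no column of X is constant.
    """
    yc = y - y.mean()
    Xc = X - X.mean(axis=0)
    scale = np.sqrt((Xc ** 2).mean(axis=0))  # sqrt(n^-1 * sum_i x_ij^2)
    return Xc / scale, yc
```

With this scaling, the sum of squares of each column is exactly $n$, which is what makes the componentwise estimator equal $n^{-1}\sum_i x_{ij}Y_i$.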

384 | High-dimensional graphs and variable selection with the Lasso - Meinshausen, Bühlmann

Citation Context: ...Let $\mu_{nj}^0 = n^{-1}\sum_{i=1}^n \sum_{k=1}^{k_n} x_{ij}x_{ik}\beta_{10k}$ (19) and $\mu_n^0 = \max_{j \in J_{n0}} |\mu_{nj}^0|$. Under condition (B2), $\mu_n^0 = O(k_n n^{-1/2})$. For $j \in J_{n1}$, let $\mu_{nj}^1 = E\tilde\beta_{nj}$. Then $\mu_{nj}^1 = \xi_{nj} = n^{-1}\sum_{i=1}^n x_{ij}x_{(1)i}'\beta_{10}$. (20) Let $\mu_n^1 = \min_{j \in J_{n1}} |\xi_{nj}|$. By condition (B3), $\mu_n^1 > 2\xi_r b_{n1}$. We first show that $P\bigl(r_n \max_{j \in J_{n0}} |\tilde\beta_{nj}| > C\bigr) \to 0$. (21) We have $P\bigl(r_n \max_{j \in J_{n0}} |\tilde\beta_{nj}| > C\bigr) = P$...

346 | A comparison of normalization methods for high density oligonucleotide array data based on variance and bias - Bolstad, Irizarry, et al. - 2003

Citation Context: ...thod similar to ridge regression but uses the $L_1$ penalty $\sum_{j=1}^{p_n} |\beta_j|$ instead of the $L_2$ penalty $\sum_{j=1}^{p_n} \beta_j^2$. So the LASSO estimator is the value that minimizes $\sum_{i=1}^n (Y_i - x_i'\beta)^2 + \lambda \sum_{j=1}^{p_n} |\beta_j|$, (2) where $\lambda$ is the penalty parameter. An important feature of the LASSO is that it can be used for variable selection. Compared to the classical variable selection methods such as subset selection, the L...
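The criterion (2) has no closed form for a general design, but cyclic coordinate descent solves it: with the other coefficients held fixed, each update is a soft-thresholded univariate regression on the partial residual. A minimal numpy sketch (ours, not the paper's; a fixed number of sweeps stands in for a proper convergence check):

```python
import numpy as np

def soft_threshold(z, t):
    """S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for sum_i (y_i - x_i'beta)^2 + lam * sum_j |beta_j|."""
    n, p = X.shape
    beta = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0)   # sum_i x_ij^2 per column
    r = y.copy()                    # current residual y - X @ beta
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * beta[j]  # remove coordinate j's contribution
            zj = X[:, j] @ r        # univariate regression numerator
            beta[j] = soft_threshold(zj, lam / 2.0) / col_ss[j]
            r -= X[:, j] * beta[j]
    return beta
```

On an orthogonal design each coefficient is just the soft-thresholded marginal estimate, which gives a quick sanity check of the update rule.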

343 | Variable selection via nonconcave penalized likelihood and its oracle properties - Fan, Li

Citation Context: ...$b_{n1}\bigr) > 1 - \varepsilon$ for all $n$ sufficiently large. In addition, $\tilde\beta_n$ is zero-consistent with rate $r_n$ if (a) is strengthened to $r_n \max_{j \in J_{0n}} |\tilde\beta_{nj}| = O_p(1)$, (7) where $r_n \to \infty$. We assume the following conditions: (A1) (a) $\varepsilon_1, \varepsilon_2, \ldots$ are independent and identically distributed random variables with mean zero and variance $\sigma^2$, where $0 < \sigma^2 < \infty$; (b) for $1 \le d \le 2$, the tail probabilities of $\varepsilon_i$ satisfy $P(|\varepsilon_i$...

330 | Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 4:249 - Irizarry, Hobbs, et al.

Citation Context: ...s smaller models with better prediction performance. However, due to the very large number of covariates, the number of covariates identified by the adaptive Lasso is still larger than the true value (15). When the partial orthogonality condition is not satisfied (Examples 7 and 8), the adaptive Lasso still yields smaller models with satisfactory prediction performance (comparable to Lasso). Extensive...

329 | Regression shrinkage and selection via the lasso - Tibshirani - 1996

Citation Context: ...$P\bigl(r_n \max_{j \in J_{n0}} |\tilde\beta_{nj}| > C\bigr) = O\bigl(n^{-1/2} r_n (\log n)(\log m_n)\bigr)$. Next, we show that $P\bigl(\min_{j \in J_{n1}} |\tilde\beta_{nj}| \ge \xi_r b_{n1}\bigr) \to 1$, or equivalently, consider $P\bigl(\min_{j \in J_{n1}} |\tilde\beta_{nj}| < \xi_r b_{n1}\bigr)$ (22) and show that $P\bigl(\min_{j \in J_{n1}} |\tilde\beta_{nj}| < \xi_r b_{n1}\bigr) \to 0$. (23) We have $P\bigl(\min_{j \in J_{n1}} |\tilde\beta_{nj}| < \xi_r b_{n1}\bigr) = P\bigl(\bigcup_{j \in J_{n1}} \{|\tilde\beta_{nj}| < \xi_r b_{n1}\}\bigr) \le \sum_{j \in J_{n1}} P\bigl(|\tilde\beta_{nj}| < \xi_r b_{n1}\bigr) = \sum_{j \in J_{n1}} P\bigl(|\tilde\beta_{nj} - \mu_{nj}^1 + \mu_{nj}^1| < \xi_r b_{n1}\bigr) \le \sum_{j \in J_{n1}} P\bigl(n^{1/2}|\mu_{nj}^1| - n^{1/2}|\tilde\beta_{nj} - \mu_{nj}^1| < \xi_r n$...

253 | The adaptive lasso and its oracle properties - Zou

245 | Weak Convergence and Empirical Processes: With Applications to Statistics - Vaart, Wellner - 1996

232 | A Statistical View of Some Chemometrics Regression Tools (with discussion), Technometrics - Friedman - 1993

Citation Context: ...(a) for $1 < d \le 2$, $\frac{(\log k_n)^{1/d}}{\sqrt{n}\,b_{n1}} \to 0$ and $\frac{\lambda_n k_n}{n b_{n1}^2} \to 0$, (8) and $\frac{\sqrt{n}(\log m_n)^{1/d}}{\lambda_n r_n} \to 0$ and $\frac{k_n^2}{r_n b_{n1}} \to 0$; (9) (b) for $d = 1$, $\frac{(\log n)(\log k_n)}{\sqrt{n}\,b_{n1}} \to 0$ and $\frac{\lambda_n k_n}{n b_{n1}^2} \to 0$, (10) and $\frac{\sqrt{n}(\log n)(\log m_n)}{\lambda_n r_n} \to 0$ and $\frac{k_n^2}{r_n b_{n1}} \to 0$. (11) (A4) There exist constants $0 < \tau_1 < \tau_2 < \infty$ such that $\tau_1 \le \tau_{1n} \le \tau_{2n} \le \tau_2$ for all $n$. Condition (A1a) is standard in linear regression. Condition (A1b) allows a range of tail behavi...

222 | On model selection consistency of Lasso - Zhao, Yu

138 | Asymptotics for lasso-type estimators - Knight, Fu - 2000

Citation Context: ...$\gamma_j\gamma_j's_n$. By the Cauchy–Schwarz inequality, $|u_l|^2 \le \tau_{n1}^{-2}\sum_{j=1}^{k_n}(\gamma_j's_n)^2 \sum_{j=1}^{k_n}(\gamma_{jl})^2 \le \tau_{n1}^{-2}\,k_n\|s_n\|^2$. (15) By the definition of $s_n$, $\|s_n\|^2 = \|w_{n1}\|^2$, so on $\{|w_{n1}| \le c_1 b_{n1}^{-1}\}$, $\|s_n\|^2 \le c_1 k_n b_{n1}^{-2}$. (16) From (15) and (16), we have $|u_l| \le \tau_1^{-1}\sqrt{c_1}\,k_n b_{n1}^{-1}$. Let $c_1' = c_1\tau_1^{-1}$, $\nu_n = 2\sqrt{n}\,b_{n1} - c_1'\,n^{-1/2}\lambda_n k_n b_{n1}^{-1}$, and, by the definition of $A$...

92 | Sure independence screening for ultrahigh dimensional feature space (with discussion) - Fan, Lv - 2008

Citation Context: ...fied. Let $s_n^2 = \sigma^2\alpha_n'\Sigma_{n11}^{-1}\alpha_n$ for any $k_n \times 1$ vector $\alpha_n$ satisfying $\|\alpha_n\|_2 \le 1$. If $M_{n1}\lambda_n/n^{1/2} \to 0$, then $n^{1/2}s_n^{-1}\alpha_n'(\hat\beta_{n1} - \beta_0) = n^{-1/2}s_n^{-1}\sum_{i=1}^n \varepsilon_i\alpha_n'\Sigma_{n11}^{-1}x_{1i} + o_p(1) \to_D N(0, 1)$, (8) where $o_p(1)$ is a term that converges to zero in probability uniformly with respect to $\alpha_n$. This theorem can be proved by verifying the Lindeberg conditions in the same way as in the proof of Theorem 2 of...

79 | Nonconcave penalized likelihood with a diverging number of parameters - Fan, Peng - 2004

Citation Context: ...(a) for $1 < d \le 2$, $\frac{(\log k_n)^{1/d}}{\sqrt{n}\,b_{n1}} \to 0$ and $\frac{\lambda_n k_n}{n b_{n1}^2} \to 0$, (8) and $\frac{\sqrt{n}(\log m_n)^{1/d}}{\lambda_n r_n} \to 0$ and $\frac{k_n^2}{r_n b_{n1}} \to 0$; (9) (b) for $d = 1$, $\frac{(\log n)(\log k_n)}{\sqrt{n}\,b_{n1}} \to 0$ and $\frac{\lambda_n k_n}{n b_{n1}^2} \to 0$, (10) and $\frac{\sqrt{n}(\log n)(\log m_n)}{\lambda_n r_n} \to 0$ and $\frac{k_n^2}{r_n b_{n1}} \to 0$. (11) (A4) There exist constants $0 < \tau_1 < \tau_2 < \infty$ such that $\tau_1 \le \tau_{1n} \le \tau_{2n} \le \tau_2$ for all $n$. Condition (A1a) is standard in linear regression. Condition (A1b) allows a...

74 | Persistence in high-dimensional linear predictor selection and the virtue of over-parametrization - Greenshtein, Ritov

Citation Context: ...(a) for $1 < d \le 2$, $\frac{(\log k_n)^{1/d}}{\sqrt{n}\,b_{n1}} \to 0$ and $\frac{\lambda_n k_n}{n b_{n1}^2} \to 0$, (8) and $\frac{\sqrt{n}(\log m_n)^{1/d}}{\lambda_n r_n} \to 0$ and $\frac{k_n^2}{r_n b_{n1}} \to 0$; (9) (b) for $d = 1$, $\frac{(\log n)(\log k_n)}{\sqrt{n}\,b_{n1}} \to 0$ and $\frac{\lambda_n k_n}{n b_{n1}^2} \to 0$, (10) and $\frac{\sqrt{n}(\log n)(\log m_n)}{\lambda_n r_n} \to 0$ and $\frac{k_n^2}{r_n b_{n1}} \to 0$. (11) (A4) There exist constants $0 < \tau_1 < \tau_2 < \infty$ such that $\tau_1 \le \tau_{1n} \le \tau_{2n} \le \tau_2$ for all $n$. Condition (A1a) is standard in linear regression. Condition (A1b) allows a range of tail behaviors of the...

72 | Regularization and variable selection via the elastic net - Zou, Hastie

54 | Prediction by supervised principal components - Bair, Hastie, et al. - 2006

Citation Context: ...parse high-dimensional regression. AMS 2000 subject classifications: primary 62J05, 62J07; secondary 62E20, 60F05. 1 Introduction. Consider the linear regression model $Y_i = x_i'\beta + \varepsilon_i$, $i = 1, \ldots, n$, (1) where $\beta$ is a $p_n \times 1$ vector and $\varepsilon_1, \ldots, \varepsilon_n$ are i.i.d. random variables with mean zero and finite variance $\sigma^2$. We note that $p_n$, the length of $\beta$, may depend on the sample size $n$. We assume that the r...

38 | Asymptotic properties of bridge estimators in sparse high-dimensional regression models, Annals of Statistics - Huang, Horowitz, et al. - 2008

Citation Context: ...(A4) are satisfied. Let $s_n^2 = \sigma^2\alpha_n'\Sigma_{n11}^{-1}\alpha_n$ for any $k_n \times 1$ vector $\alpha_n$ satisfying $\|\alpha_n\|_2 \le 1$. Then $n^{1/2}s_n^{-1}\alpha_n'(\hat\beta_{n1} - \beta_0) = n^{-1/2}s_n^{-1}\sum_{i=1}^n \varepsilon_i\alpha_n'\Sigma_{n11}^{-1}x_{1i} + o_p(1) \to_D N(0, 1)$, (12) where $o_p(1)$ is a term that converges to zero in probability uniformly with respect to $\alpha_n$. This theorem can be proved by verifying the Lindeberg conditions in the same way as in the proof of Theorem 2 of...

36 | Variable selection using MM algorithms - Hunter, Li

Citation Context: ...We can write $A_n^c = \{\eta_n \ge 2\sqrt{n}\,|\beta_{n1}| - n^{-1/2}\lambda_n|u_n|\}$. Then $P(A_n^c) = P(A_n^c \cap \{|w_{n1}| \le c_1 b_{n1}^{-1}\}) + P(A_n^c \cap \{|w_{n1}| > c_1 b_{n1}^{-1}\}) \le P(A_n^c \cap \{|w_{n1}| \le c_1 b_{n1}^{-1}\}) + P(|w_{n1}| > c_1 b_{n1}^{-1})$, (14) where $\{|w_{n1}| \le c_1 b_{n1}^{-1}\} = \{|w_{n1j}| \le c_1 b_{n1}^{-1}, 1 \le j \le k_n\}$. By condition (A2), $P(|w_{n1}| > c_1 b_{n1}^{-1}) \to 0$. So it suffices to show that the first term on the right-hand side of (14) converges to ze...

24 | Gene expression analysis with the parametric bootstrap - Laan, Bryan

20 | Asymptotic Behavior of M Estimators of p Regression Parameters When p²/n is Large - Portnoy - 1985

Citation Context: ...$\mu_n^0 = \max_{j \in J_{n0}} |\mu_{nj}^0|$. Under condition (B2), $\mu_n^0 = O(k_n n^{-1/2})$. For $j \in J_{n1}$, let $\mu_{nj}^1 = E\tilde\beta_{nj}$. Then $\mu_{nj}^1 = \xi_{nj} = n^{-1}\sum_{i=1}^n x_{ij}x_{(1)i}'\beta_{10}$. (20) Let $\mu_n^1 = \min_{j \in J_{n1}} |\xi_{nj}|$. By condition (B3), $\mu_n^1 > 2\xi_r b_{n1}$. We first show that $P\bigl(r_n \max_{j \in J_{n0}} |\tilde\beta_{nj}| > C\bigr) \to 0$. (21) We have $P\bigl(r_n \max_{j \in J_{n0}} |\tilde\beta_{nj}| > C\bigr) = P\bigl(r_n \max_{j \in J_{n0}} |\tilde\beta_{nj}$...

12 | Marginal asymptotics for the "large p, small n" paradigm: With applications to microarray data - Kosorok, Ma

Citation Context: ...mma 1, for $1 < d \le 2$, $P(C_{n1}) \le K'(\log k_n)^{1/d}/\nu_n$; for $d = 1$, $P(C_{n1}) \le (\log 2)^{1/2}K\log(k_n)/\nu_n$. Write $C_{n1} = \{|\eta_j| \ge \nu_n \text{ for some } j, 1 \le j \le k_n\} = \{\max_{1 \le j \le k_n} |\eta_j| \ge \nu_n\}$. Then $A_n^c \cap \{|w_{n1}| \le c_1 b_{n1}^{-1}\} \subseteq C_{n1}$. (17) Here $\nu_n = \sqrt{n}\,b_{n1}\bigl[2 - (\lambda_n k_n/n b_{n1}^2)\bigr]$, so under condition (A3a), $(\log k_n)^{1/d}/(\sqrt{n}\,b_{n1}) \to 0$ and $\lambda_n k_n/(n b_{n1}^2) \to 0$ imply $(\log k_n)^{1/d}/\nu_n \to 0$...

11 | Model-selection consistency of the Lasso in high-dimensional linear regression - Zhang, Huang - 2006

6 | Homozygosity mapping with SNP arrays identifies TRIM32, an E3 ubiquitin ligase, as a Bardet-Biedl syndrome gene (BBS11) - Chiang, Beck, et al. - 2006

Citation Context: ...$\tilde\beta_n) = \mathrm{sgn}(\beta)$. Proposition 1. Let $W_{n1} = \mathrm{diag}(w_{n1}, \ldots, w_{nk_n})$, $W_{n2} = \mathrm{diag}(w_{n,k_n+1}, \ldots, w_{np_n})$, and $w_{n2} = (w_{n,k_n+1}, \ldots, w_{np_n})'$. Then $P(\hat\beta_n =_s \beta_0) \ge P(A_n \cap B_n)$, (5) where $A_n = \bigl\{2n^{-1/2}|\Sigma_{n11}^{-1}X_1'\varepsilon_n| < 2\sqrt{n}\,|\beta_{n1}| - n^{-1/2}\lambda_n|\Sigma_{n11}^{-1}W_{n1}\mathrm{sgn}(\beta_{10})|\bigr\}$ and $B_n = \bigl\{2n^{-1/2}|X_{n2}'(I - H_n)\varepsilon_n| \le n^{-1/2}\lambda_n w_{n2} - n^{-1/2}\lambda_n|\Sigma_{n21}\Sigma_{n11}^{-1}W_{n1}\mathrm{sgn}(\beta_{10})|\bigr\}$, where the inequalities in $A_n$ and $B$...

2 | Boosting for high-dimensional linear models - Bühlmann - 2004

Citation Context: ...weights determined by an initial estimator (Zou, 2006). Suppose that an initial estimator $\tilde\beta_n$ is available. Let $w_{nj} = |\tilde\beta_{nj}|^{-1}$, $j = 1, \ldots, p_n$, and denote $L_n(\beta) = \sum_{i=1}^n (Y_i - x_i'\beta)^2 + \lambda_n\sum_{j=1}^{p_n} w_{nj}|\beta_j|$. (3) The value $\hat\beta_n$ that minimizes $L_n$ is called the adaptive LASSO estimator (Zou, 2006). If the initial estimator $\tilde\beta_n$ is zero-consistent in the sense that the estimators of zero coefficients converge t...
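Because the penalty in (3) is separable, the adaptive LASSO reduces to an ordinary LASSO on a rescaled design with columns $x_j / w_{nj}$, after which the solution is mapped back by dividing by $w_{nj}$. The sketch below (our code, not the authors') wires the two stages together: the componentwise initial estimator follows the paper's partial-orthogonality suggestion, and `eps` is an ad-hoc guard against exactly-zero initial estimates:

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for sum_i (y_i - x_i'beta)^2 + lam * sum_j |beta_j|."""
    n, p = X.shape
    beta = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0)
    r = y.copy()
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * beta[j]
            beta[j] = soft_threshold(X[:, j] @ r, lam / 2.0) / col_ss[j]
            r -= X[:, j] * beta[j]
    return beta

def adaptive_lasso(X, y, lam, eps=1e-8):
    """Two-stage adaptive LASSO sketch:
    1. componentwise (univariate) regression as initial estimator;
    2. weighted-L1 lasso, solved by rescaling columns x_j -> x_j / w_j,
       running a plain lasso, and mapping back beta_j = gamma_j / w_j.
    """
    beta_init = (X * y[:, None]).sum(axis=0) / (X ** 2).sum(axis=0)
    w = 1.0 / (np.abs(beta_init) + eps)  # w_nj = |beta_tilde_nj|^-1
    gamma = lasso_cd(X / w, y, lam)      # plain lasso on rescaled design
    return gamma / w                     # undo the reparametrization
```

Covariates with small initial estimates receive large weights and are shrunk hard toward zero, which is exactly the mechanism behind the selection-consistency results.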

2 | Regulation of gene expression in the mammalian eye and its relevance to eye disease - Scheetz, Kim, et al. - 2006 |