## Sure independence screening for ultra-high dimensional feature space (2006)

Citations: 104 (12 self)

### BibTeX

```bibtex
@techreport{Fan06sureindependence,
  author = {Jianqing Fan and Jinchi Lv},
  title  = {Sure independence screening for ultra-high dimensional feature space},
  year   = {2006}
}
```


### Abstract

Variable selection plays an important role in high dimensional statistical modeling, which nowadays appears in many areas and is key to various scientific discoveries. For problems of large scale or dimensionality p, estimation accuracy and computational cost are two top concerns. In a recent paper, Candes and Tao (2007) propose the Dantzig selector using L1 regularization and show that it achieves the ideal risk up to a logarithmic factor log p. Their innovative procedure and remarkable result are challenged when the dimensionality is ultra high, as the factor log p can be large and their uniform uncertainty principle can fail. Motivated by these concerns, we introduce the concept of sure screening and propose a sure screening method based on correlation learning, called Sure Independence Screening (SIS), to reduce dimensionality from high to a moderate scale that is below the sample size. In a fairly general asymptotic framework, SIS is shown to have the sure screening property even for exponentially growing dimensionality. As a methodological extension, an iterative SIS (ISIS) is also proposed to enhance its finite sample performance. With dimension reduced accurately from high to below the sample size, variable selection can be improved in both speed and accuracy, and can then be accomplished.
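
The screening step the abstract describes — rank predictors by their marginal correlation with the response and keep only the top d, with d below the sample size — can be sketched as below. This is a minimal illustration, not the authors' code; the toy data and the choice d ≈ n/log n are assumptions for the example.

```python
import numpy as np

def sis(X, y, d):
    """Sure Independence Screening sketch: rank features by absolute
    marginal correlation with y and return the indices of the top d."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize each column
    ys = (y - y.mean()) / y.std()
    omega = Xs.T @ ys / len(y)                  # componentwise regression coefficients
    return np.argsort(-np.abs(omega))[:d]

# toy example (assumed data): p = 1000 features, n = 100 samples, 3 truly active
rng = np.random.default_rng(0)
n, p = 100, 1000
X = rng.standard_normal((n, p))
y = 3 * X[:, 0] - 2 * X[:, 1] + 2 * X[:, 2] + rng.standard_normal(n)

selected = sis(X, y, d=int(n / np.log(n)))      # d ≈ n / log n, one choice from the paper
```

With strong signals, the truly active features typically survive the screen, after which a refined method (SCAD, Lasso, the Dantzig selector) can be run on the reduced set.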

### Citations

2492 | A decision-theoretic generalization of on-line learning and an application to boosting - Freund, Schapire - 1995 |

1799 | Atomic Decomposition by Basis Pursuit - Chen, Donoho, et al. - 1999 |

1356 | Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring - Golub - 1999 |

923 | Ideal spatial adaptation by wavelet shrinkage - Donoho, Johnstone - 1994 |

647 | Convergence of Stochastic Processes - Pollard - 1984 |
Context: "...implies that the moment generating function of the random variable $\xi_1 - 1$ is $\mathbb{E}\,e^{t(\xi_1 - 1)} = (1 - 2t)^{-1/2} e^{-t}$, $t \in (-\infty, 1/2)$. Thus, for any $\varepsilon > 0$ and $0 < t < 1/2$, by Chebyshev's inequality (see, e.g., Pollard, 1984, or van der Vaart and Wellner, 1996) we have $P\!\left(\frac{\xi_1 + \cdots + \xi_n}{n} > 1 + \varepsilon\right) \le e^{-tn\varepsilon}\,\mathbb{E}\exp\{t(\xi_1 - 1) + \cdots + t(\xi_n - 1)\} = \exp(-n f_\varepsilon(t))$, where $f_\varepsilon(t) = \frac{1}{2}\log(1 - 2t) + (1 + \varepsilon)t$. Setting the d..."
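
The tail bound quoted in this context can be sanity-checked numerically: for chi-square(1) variables, the bound is minimized at $t^* = \varepsilon / (2(1+\varepsilon))$, and a Monte Carlo estimate of the tail probability should fall below it. A small check under assumed values of n and ε:

```python
import numpy as np

# Numeric check of the Chernoff-type bound quoted above:
#   P((xi_1 + ... + xi_n)/n > 1 + eps) <= exp(-n * f_eps(t)),
#   f_eps(t) = 0.5*log(1 - 2t) + (1 + eps)*t,
# with xi_i ~ chi-square(1); the optimizer is t* = eps / (2*(1 + eps)).
n, eps = 50, 0.5                      # assumed values for the check
t_star = eps / (2 * (1 + eps))
f = 0.5 * np.log(1 - 2 * t_star) + (1 + eps) * t_star
bound = np.exp(-n * f)

rng = np.random.default_rng(1)
xi = rng.chisquare(1, size=(200_000, n))
mc = (xi.mean(axis=1) > 1 + eps).mean()  # Monte Carlo tail probability
```

The Monte Carlo estimate sits well below the analytic bound, as it must.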

449 | The Dantzig selector: statistical estimation when p is much larger than n - Candes, Tao - 2007 |

413 | Statistical significance for genomewide studies - Storey, Tibshirani - 2003 |
Context: "...on with componentwise regression is using the two-sample t-test statistic to select features. This has been widely used in the significance analysis of gene selection in microarray data analysis (see Storey and Tibshirani, 2003; Fan and Ren, 2006), including the nearest shrunken centroids method of Tibshirani et al. (2002). In other words, the componentwise regression technique is an insightful and natural extension of a tw..."
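
The connection this context draws is the classical identity between the point-biserial correlation r and the pooled two-sample t-statistic, t = r·sqrt((n−2)/(1−r²)): the map is strictly monotone in |r|, so ranking features by absolute correlation with a 0/1 class label reproduces the t-test ranking. A small demonstration on assumed random data:

```python
import numpy as np

# Ranking by |correlation| with a binary label vs. ranking by |t-statistic|.
# t = r * sqrt((n-2)/(1-r^2)) is strictly increasing in |r|, so the two
# orderings coincide exactly.
rng = np.random.default_rng(2)
n, p = 60, 30
y = rng.integers(0, 2, n)                    # binary class labels (assumed data)
X = rng.standard_normal((n, p))

r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(p)])
t = r * np.sqrt((n - 2) / (1 - r**2))

order_r = np.argsort(-np.abs(r))
order_t = np.argsort(-np.abs(t))
assert (order_r == order_t).all()            # identical feature rankings
```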

406 | High-dimensional graphs and variable selection with the Lasso - Meinshausen, Bühlmann - 2006 |

405 | Diagnosis of multiple cancer types by shrunken centroids of gene expression - Tibshirani, Hastie, et al. - 2002 |

393 | Variable selection via nonconcave penalized likelihood and its oracle properties - Fan, Li - 2001 |

383 | Weak convergence and empirical processes - Vaart, Wellner - 1996 |

382 | Uncertainty principles and ideal atomic decomposition - Donoho, Huo |

368 | Toeplitz forms and their applications - Grenander, Szegö - 1958 |

363 | The concentration of measure phenomenon - Ledoux - 2001 |

361 | Regression shrinkage and selection via the Lasso - Tibshirani - 1996 |
Context: "...agram of the approach. When it is desired to reduce the model size further, we can further single out $d'_n$ variables with $d'_n < d_n$ using the Dantzig selector along with hard thresholding or Lasso (Tibshirani, 1996) with a suitable choice of the penalty parameter. From there, one can apply a more refined but computationally more intensive method such as the SCAD or adaptive Lasso. See Figure 2. These two method..."

289 | The adaptive Lasso and its oracle properties - Zou - 2006 |

273 | A statistical view of some chemometrics regression tools (with discussion) - Frank, Friedman - 1993 |

240 | On model selection consistency of Lasso - Zhao, Yu - 2007 |

216 | On the distribution of the largest eigenvalue in principal component analysis - Johnstone - 2001 |

211 | Methodologies in spectral analysis of large dimensional random matrices: a review - Bai - 1999 |

189 | Simultaneous analysis of Lasso and Dantzig selector - Bickel, Ritov, et al. - 2007 |

180 | Pathwise coordinate optimization - Friedman, Hastie, et al. - 2007 |

168 | Heuristics of instability and stabilization in model selection - Breiman - 1996 |
Context: "...For example, we may want to keep certain important predictors in the model and choose not to penalize their coefficients. The regularization parameters $\lambda_j$ can be chosen by cross-validation (see, e.g., Breiman, 1996, and Tibshirani, 1996). A unified and efficient algorithm for optimizing penalized likelihood, called local quadratic approximation (LQA), was proposed in Fan and Li (2001) and well studied in Hunter ..."

154 | Asymptotics for Lasso-type estimators - Fu, Knight - 2000 |

152 | The Group Lasso for logistic regression - Meier, Geer, et al. - 2008 |

137 | Approaches for Bayesian Variable Selection - George, McCulloch - 1997 |
Context: "...iable selection drastically. It also makes the model selection problem efficient and modular. SIS can be used in conjunction with any model selection techniques, including Bayesian methods (see, e.g., George and McCulloch, 1997). 1.5. Outline of the paper. In Section 2, we present a simple and fast sure screening method, and study its accuracy. We discuss applications of SIS to classification problems in Section 3. In Sectio..."

115 | Better subset regression using the nonnegative garrote - Breiman - 1995 |

102 | Least angle regression (with discussion) - Efron, Hastie, et al. - 2004 |

101 | High-dimensional data analysis: the curses and blessings of dimensionality - Donoho - 2000 |

99 | A limit theorem for the norm of random matrices - Geman - 1980 |

95 | Regularized estimation of large covariance matrices - Bickel, Levina - 2008 |

91 | An introduction to compressive sensing - Candès |

89 | On non-concave penalized likelihood with diverging number of parameters - Fan, Peng - 2004 |

83 | The sparsity and bias of the Lasso selection in high-dimensional linear regression - Zhang, Huang - 2008 |

80 | Sparse additive models - Ravikumar, Lafferty, et al. - 2009 |

77 | Some theory for Fisher's linear discriminant, 'naive Bayes', and some alternatives when there are many more variables than observations - Bickel, Levina - 2004 |
Context: "...permutation of the predictors. Therefore, the largest eigenvalue of $\Sigma_n$ usually does not grow too fast with $n$. In addition, Condition 4 holds for the covariance matrix of a stationary time series (see Bickel and Levina, 2004, 2006). See Grenander and Szegö (1984) for more details about the characterization of extreme eigenvalues of the covariance matrix of a stationary process in terms of its spectral density. It is in..."

77 | Persistency in high dimensional linear predictor-selection and the virtue of over-parametrization - Greenshtein, Ritov - 2004 |

72 | Limit of the smallest eigenvalue of a large dimensional sample covariance matrix - Bai, Yin - 1993 |

70 | Multivariate Statistics: A Vector Space Approach - Eaton - 1983 |

62 | One-step sparse estimates in nonconcave penalized likelihood models - Zou, Li - 2008 |

60 | Local strong homogeneity of a regularized estimator - Nikolova - 2000 |

52 | Variable selection for Cox's proportional hazards model and frailty model - Fan, Li - 2002 |

48 | Geometric Representation of High Dimension, Low Sample Size Data - Hall, Marron, et al. - 2005 |

47 | Regularization of Wavelet Approximations (with discussion) - Antoniadis, Fan - 2001 |

43 | Asymptotic properties of bridge estimators in sparse high-dimensional regression models - Huang, Horowitz, et al. - 2007 |

43 | Sparsistency and rates of convergence in large covariance matrix estimation - Lam, Fan |

42 | The smallest eigenvalue of a large dimensional Wishart matrix - Silverstein - 1985 |

41 | Statistical challenges with high dimensionality: Feature selection in knowledge discovery - Fan, Li - 2006 |

40 | Variable selection using MM algorithms - Hunter, Li - 2005 |

32 | Maximal sparsity representation via L1 minimization - Donoho, Elad - 2003 |