The Dantzig Selector: Statistical Estimation When p Is Much Larger Than n
2007
Cited by 417 (12 self)
In many important statistical applications, the number of variables or parameters $p$ is much larger than the number of observations $n$. Suppose then that we have observations $y = X\beta + z$, where $\beta \in \mathbb{R}^p$ is a parameter vector of interest, $X$ is a data matrix with possibly far fewer rows than columns, $n \ll p$, and the $z_i$'s are i.i.d. $N(0, \sigma^2)$. Is it possible to estimate $\beta$ reliably based on the noisy data $y$? To estimate $\beta$, we introduce a new estimator, which we call the Dantzig selector, that is a solution to the $\ell_1$-regularization problem $\min_{\tilde\beta \in \mathbb{R}^p} \|\tilde\beta\|_{\ell_1}$ subject to $\|X^* r\|_{\ell_\infty} \le (1 + t^{-1}) \sqrt{2 \log p} \cdot \sigma$, where $r$ is the residual vector $y - X\tilde\beta$ and $t$ is a positive scalar. We show that if $X$ obeys a uniform uncertainty principle (with unit-normed columns) and if the true parameter vector $\beta$ is sufficiently sparse (which here roughly guarantees that the model is identifiable), then with very large probability,
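The optimization problem in the Dantzig selector abstract above is a linear program, so it can be sketched with a general-purpose LP solver. The following is a minimal illustration, not the authors' implementation. It assumes a real-valued design matrix (so that X* is the transpose), a known noise level `sigma`, and uses SciPy's `linprog`; the function name `dantzig_selector` and the auxiliary variables `u` are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def dantzig_selector(X, y, sigma, t=1.0):
    """Sketch: min ||b||_1 subject to ||X^T (y - X b)||_inf <= lam, as an LP."""
    n, p = X.shape
    # Threshold taken from the abstract: (1 + 1/t) * sqrt(2 log p) * sigma.
    lam = (1.0 + 1.0 / t) * np.sqrt(2.0 * np.log(p)) * sigma
    G = X.T @ X
    c = X.T @ y
    I = np.eye(p)
    Z = np.zeros((p, p))
    # Variables are (b, u) with |b_j| <= u_j, and the objective is sum(u).
    A_ub = np.block([
        [I, -I],    #  b - u <= 0
        [-I, -I],   # -b - u <= 0
        [G, Z],     #  G b <= X^T y + lam
        [-G, Z],    # -G b <= lam - X^T y
    ])
    b_ub = np.concatenate([np.zeros(2 * p), c + lam, lam - c])
    cost = np.concatenate([np.zeros(p), np.ones(p)])
    bounds = [(None, None)] * p + [(0, None)] * p
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p], lam
```

The dense constraint matrix has 4p rows and 2p columns, so this form is only practical for moderate p.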
Stability selection
Cited by 60 (2 self)
Asymptotic properties of bridge estimators in sparse high-dimensional regression models
 Ann. Statist
2007
Cited by 38 (9 self)
We study the asymptotic properties of bridge estimators in sparse, high-dimensional, linear regression models when the number of covariates may increase to infinity with the sample size. We are particularly interested in the use of bridge estimators to distinguish between covariates whose coefficients are zero and covariates whose coefficients are nonzero. We show that under appropriate conditions, bridge estimators correctly select covariates with nonzero coefficients with probability converging to one and that the estimators of nonzero coefficients have the same asymptotic distribution that they would have if the zero coefficients were known in advance. Thus, bridge estimators have an oracle property in the sense of Fan and Li [J. Amer. Statist. Assoc. 96 (2001) 1348–1360] and Fan and Peng [Ann. Statist. 32 (2004) 928–961]. In general, the oracle property holds only if the number of covariates is smaller than the sample size. However, under a partial orthogonality condition in which the covariates of the zero coefficients are uncorrelated or weakly correlated with the covariates of nonzero coefficients, we show that marginal bridge estimators can correctly distinguish between covariates with nonzero and zero coefficients with probability converging to one even when the number of covariates is greater than the sample size.
Adaptive Lasso for sparse high-dimensional regression
 University of Iowa
2006
Cited by 36 (4 self)
We study the asymptotic properties of adaptive LASSO estimators in sparse, high-dimensional, linear regression models when the number of covariates may increase with the sample size. We consider variable selection using the adaptive LASSO, where the L1 norms in the penalty are reweighted by data-dependent weights. We show that, if a reasonable initial estimator is available, then under appropriate conditions, the adaptive LASSO correctly selects covariates with nonzero coefficients with probability converging to one and that the estimators of nonzero coefficients have the same asymptotic distribution that they would have if the zero coefficients were known in advance. Thus, the adaptive LASSO has an oracle property in the sense of Fan and Li (2001) and Fan and Peng (2004). In addition, under a partial orthogonality condition in which the covariates with zero coefficients are weakly correlated with the covariates with nonzero coefficients, univariate regression can be used to obtain the initial estimator. With this initial estimator, the adaptive LASSO has the oracle property even when the number of covariates is greater than the sample size. Key words and phrases: penalized regression, high-dimensional data, variable selection, asymptotic normality, oracle property, zero-consistency. Short title: Sparse high-dimensional regression.
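The reweighting idea in the adaptive LASSO abstract above can be illustrated with a short coordinate-descent sketch. It is not the authors' code. It assumes the objective (1/2n)‖y − Xb‖² + λ Σ w_j |b_j| with weights w_j = 1/|b̃_j|, where b̃ comes from univariate regression of y on each column, as the abstract suggests under partial orthogonality. The function name `adaptive_lasso`, the weight exponent of one, and the stopping rule are assumptions.

```python
import numpy as np


def adaptive_lasso(X, y, lam, n_sweeps=1000, tol=1e-10):
    """Sketch: weighted lasso solved by cyclic coordinate descent."""
    n, p = X.shape
    col_sq = (X ** 2).sum(axis=0)
    # Initial estimator: univariate (marginal) regression on each column.
    b_init = (X.T @ y) / col_sq
    # Data-dependent weights; the small floor avoids division by zero.
    w = 1.0 / np.maximum(np.abs(b_init), 1e-12)
    b = np.zeros(p)
    r = np.asarray(y, dtype=float).copy()  # residual y - X b
    for _ in range(n_sweeps):
        max_step = 0.0
        for j in range(p):
            # Inner product of column j with the partial residual.
            rho = X[:, j] @ r + col_sq[j] * b[j]
            thr = n * lam * w[j]
            new = np.sign(rho) * max(abs(rho) - thr, 0.0) / col_sq[j]
            step = new - b[j]
            if step != 0.0:
                r -= X[:, j] * step
                b[j] = new
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    return b
```

Covariates with a small initial estimate receive a large penalty and are driven to zero, while covariates with a large initial estimate are shrunk only lightly.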
High-dimensional variable selection
 Ann. Statist
2009
Cited by 32 (3 self)
This paper explores the following question: what kind of statistical guarantees can be given when doing variable selection in high-dimensional models? In particular, we look at the error rates and power of some multistage regression methods. In the first stage we fit a set of candidate models. In the second stage we select one model by cross-validation. In the third stage we use hypothesis testing to eliminate some variables. We refer to the first two stages as “screening” and the last stage as “cleaning.” We consider three screening methods: the lasso, marginal regression, and forward stepwise regression. Our method gives consistent variable selection under certain conditions.
High-dimensional additive modeling
 Annals of Statistics
Cited by 31 (1 self)
We propose a new sparsity-smoothness penalty for high-dimensional generalized additive models. The combination of sparsity and smoothness is crucial for mathematical theory as well as performance for finite-sample data. We present a computationally efficient algorithm, with provable numerical convergence properties, for optimizing the penalized likelihood. Furthermore, we provide oracle results which yield asymptotic optimality of our estimator for high-dimensional but sparse additive models. Finally, an adaptive version of our sparsity-smoothness penalized approach yields large additional performance gains.
High dimensional classification using features annealed independence rules
 Ann. Statist
2008
Cited by 27 (8 self)
Classification using high-dimensional features arises frequently in many contemporary statistical studies such as tumor classification using microarray or other high-throughput data. The impact of dimensionality on classification is poorly understood. In a seminal paper, Bickel and Levina (2004) show that the Fisher discriminant performs poorly due to diverging spectra, and they propose to use the independence rule to overcome the problem. We first demonstrate that even for the independence classification rule, classification using all the features can be as bad as random guessing due to noise accumulation in estimating population centroids in high-dimensional feature space. In fact, we demonstrate further that almost all linear discriminants can perform as badly as random guessing. Thus, it is of paramount importance to select a subset of important features for high-dimensional classification, resulting in Features Annealed Independence Rules (FAIR). The conditions under which all the important features can be selected by the two-sample t-statistic are established. The choice of the optimal number of features, or equivalently, the threshold value of the test statistics, is proposed based on an upper bound of the classification error. Simulation studies and real data analysis support our theoretical results and demonstrate convincingly the advantage of our new classification procedure.
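The two ingredients named in the FAIR abstract above, ranking features by the two-sample t-statistic and classifying with the independence rule on the retained features, can be sketched as follows. This is a simplified illustration rather than the authors' procedure: the number of retained features `m` is passed in by the caller instead of being chosen from the error bound the abstract mentions, and the names `fair_fit` and `fair_predict` are hypothetical.

```python
import numpy as np


def fair_fit(X1, X2, m):
    """Keep the m features with the largest absolute two-sample t-statistic."""
    n1, n2 = len(X1), len(X2)
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    v1, v2 = X1.var(axis=0, ddof=1), X2.var(axis=0, ddof=1)
    t_stat = (mu1 - mu2) / np.sqrt(v1 / n1 + v2 / n2)
    keep = np.argsort(-np.abs(t_stat))[:m]
    # Pooled per-feature variance, used by the independence rule.
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    return {"keep": keep, "mu1": mu1[keep], "mu2": mu2[keep], "var": pooled[keep]}


def fair_predict(model, X):
    """Independence rule: a linear discriminant with a diagonal covariance."""
    Z = X[:, model["keep"]]
    mid = 0.5 * (model["mu1"] + model["mu2"])
    diff = model["mu1"] - model["mu2"]
    score = ((Z - mid) * diff / model["var"]).sum(axis=1)
    return np.where(score > 0, 1, 2)  # class labels 1 and 2
```

Setting `m` to the full number of features recovers the plain independence rule, which is the case the abstract warns can degrade toward random guessing when most features carry no signal.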
A SELECTIVE OVERVIEW OF VARIABLE SELECTION IN HIGH DIMENSIONAL FEATURE SPACE
2010
Cited by 23 (4 self)
High dimensional statistical problems arise from diverse fields of scientific research and technological development. Variable selection plays a pivotal role in contemporary statistical learning and scientific discoveries. The traditional idea of best subset selection methods, which can be regarded as a specific form of penalized likelihood, is computationally too expensive for many modern statistical applications. Other forms of penalized likelihood methods have been successfully developed over the last decade to cope with high dimensionality. They have been widely applied for simultaneously selecting important variables and estimating their effects in high dimensional statistical inference. In this article, we present a brief account of the recent developments of theory, methods, and implementations for high dimensional variable selection. Three questions rapidly drive the advances of the field: what limits of dimensionality such methods can handle, what the role of penalty functions is, and what the statistical properties are. The properties of nonconcave penalized likelihood and its roles in high dimensional statistical modeling are emphasized. We also review some recent advances in ultrahigh dimensional variable selection, with emphasis on independence screening and two-scale methods.
Ultrahigh dimensional feature selection: beyond the linear model
2009
Cited by 15 (3 self)
Variable selection in high-dimensional space characterizes many contemporary problems in scientific discovery and decision making. Many frequently-used techniques are based on independence screening; examples include correlation ranking (Fan and Lv, 2008) or feature selection using a two-sample t-test in high-dimensional classification (Tibshirani et al., 2003). Within the context of the linear model, Fan and Lv (2008) showed that this simple correlation ranking possesses a sure independence screening property under certain conditions and that its revision, called iteratively sure independent screening (ISIS), is needed when the features are marginally unrelated but jointly related to the response variable. In this paper, we extend ISIS, without explicit definition of residuals, to a general pseudo-likelihood framework, which includes generalized linear models as a special case. Even in the least-squares setting, the new method improves ISIS by allowing feature deletion in the iterative process. Our technique allows us to select important features in high-dimensional classification where the popularly used two-sample t-method fails. A new technique is introduced to reduce the false selection rate in the feature screening stage. Several simulated and two real data examples are presented to illustrate the methodology.
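The correlation ranking that the abstract above takes as its starting point can be sketched in a few lines. This illustrates only the simple marginal screening step for the linear model, not the iterative ISIS procedure or its pseudo-likelihood extension. The function name `correlation_screen` is hypothetical, and the default submodel size of n / log n is an assumption that does not appear in the abstract.

```python
import numpy as np


def correlation_screen(X, y, d=None):
    """Keep the d features with the largest absolute marginal correlation with y."""
    n, p = X.shape
    if d is None:
        # Assumed default submodel size; not stated in the abstract.
        d = int(n / np.log(n))
    d = min(d, p)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Pearson correlation of each column with the response
    # (columns are assumed to be non-constant).
    corr = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    order = np.argsort(-np.abs(corr))
    return np.sort(order[:d])
```

A feature that is marginally uncorrelated with the response but jointly related to it would be missed by this ranking, which is the situation the abstract says motivates the iterative variant.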
Strong rules for discarding predictors in lasso-type problems. arXiv preprint arXiv:1011.2234
2010
Cited by 10 (0 self)
We consider rules for discarding predictors in lasso regression and related problems, for computational efficiency. El Ghaoui et al. (2010) propose “SAFE” rules, based on univariate inner products between each predictor and the outcome, that guarantee a coefficient will be zero in the solution vector. This provides a reduction in the number of variables that need to be entered into the optimization. In this paper, we propose strong rules that are very simple and yet screen out far more predictors than the SAFE rules. This great practical improvement comes at a price: the strong rules are not foolproof and can mistakenly discard active predictors, that is, ones that have nonzero coefficients in the solution. We therefore combine them with simple checks of the Karush-Kuhn-Tucker (KKT) conditions to ensure that the exact solution to the convex problem is delivered. Of course, any (approximate) screening method can be combined with the KKT conditions to ensure the exact solution; the strength of the strong rules lies in the fact that, in practice, they discard a very large number of the inactive predictors and almost never commit mistakes. We also derive conditions under which they are foolproof. Strong rules provide substantial savings in computational time for a variety of statistical optimization problems.
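The screen-then-check loop described in the abstract above can be sketched as follows. The abstract does not state the rule itself, so this sketch assumes the basic strong rule in the form "discard predictor j when |x_j^T y| < 2λ − λ_max", with λ_max = max_j |x_j^T y|, for the objective ½‖y − Xb‖² + λ‖b‖₁. The solver is a plain coordinate descent, and the names `lasso_cd` and `lasso_with_strong_rule` are hypothetical.

```python
import numpy as np


def lasso_cd(X, y, lam, n_sweeps=5000, tol=1e-12):
    """Coordinate descent for 0.5 * ||y - X b||^2 + lam * ||b||_1."""
    p = X.shape[1]
    b = np.zeros(p)
    r = np.asarray(y, dtype=float).copy()  # residual y - X b
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_sweeps):
        max_step = 0.0
        for j in range(p):
            rho = X[:, j] @ r + col_sq[j] * b[j]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            step = new - b[j]
            if step != 0.0:
                r -= X[:, j] * step
                b[j] = new
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    return b


def lasso_with_strong_rule(X, y, lam):
    """Screen with the (assumed) basic strong rule, then repair via KKT checks."""
    p = X.shape[1]
    c = np.abs(X.T @ y)
    keep = c >= 2.0 * lam - c.max()   # predictors that survive screening
    n_screened = int((~keep).sum())
    while True:
        b = np.zeros(p)
        if keep.any():
            b[keep] = lasso_cd(X[:, keep], y, lam)
        # KKT condition for a zero coefficient: |x_j^T (y - X b)| <= lam.
        grad = np.abs(X.T @ (y - X @ b))
        violators = (~keep) & (grad > lam + 1e-8)
        if not violators.any():
            return b, n_screened
        keep |= violators             # restore wrongly discarded predictors
```

Because the loop only stops once every discarded predictor satisfies its KKT condition, the returned vector solves the full problem even if the screening step was wrong for some predictors.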