Results 1–10 of 94
Asymptotic properties of bridge estimators in sparse high-dimensional regression models
Ann. Statist., 2007
Cited by 96 (10 self)
We study the asymptotic properties of bridge estimators in sparse, high-dimensional, linear regression models when the number of covariates may increase to infinity with the sample size. We are particularly interested in the use of bridge estimators to distinguish between covariates whose coefficients are zero and covariates whose coefficients are nonzero. We show that under appropriate conditions, bridge estimators correctly select covariates with nonzero coefficients with probability converging to one and that the estimators of nonzero coefficients have the same asymptotic distribution that they would have if the zero coefficients were known in advance. Thus, bridge estimators have an oracle property in the sense of Fan and Li [J. Amer. Statist. Assoc. 96 (2001) 1348–1360] and Fan and Peng [Ann. Statist. 32 (2004) 928–961]. In general, the oracle property holds only if the number of covariates is smaller than the sample size. However, under a partial orthogonality condition in which the covariates with zero coefficients are uncorrelated or weakly correlated with the covariates with nonzero coefficients, we show that marginal bridge estimators can correctly distinguish between covariates with nonzero and zero coefficients with probability converging to one even when the number of covariates is greater than the sample size.
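For orientation, the bridge estimator the abstract refers to minimizes a least-squares criterion with an ℓγ penalty, 0 &lt; γ &lt; 1; the notation below is standard rather than copied from the paper:

```latex
\hat{\beta} = \operatorname*{arg\,min}_{\beta}\;
  \sum_{i=1}^{n} \bigl( y_i - x_i^{\top}\beta \bigr)^2
  + \lambda_n \sum_{j=1}^{p_n} |\beta_j|^{\gamma},
  \qquad 0 < \gamma < 1 .
```

The marginal bridge variant fits each covariate separately, minimizing \(\sum_{i=1}^{n} (y_i - x_{ij} b)^2 + \lambda_n |b|^{\gamma}\) over the scalar \(b\) for each \(j\); working one covariate at a time is what permits selection when the number of covariates exceeds the sample size under the partial orthogonality condition.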
Adaptive Lasso for sparse high-dimensional regression
University of Iowa, 2006
Cited by 93 (10 self)
Summary. We study the asymptotic properties of adaptive LASSO estimators in sparse, high-dimensional, linear regression models when the number of covariates may increase with the sample size. We consider variable selection using the adaptive LASSO, where the L1 norms in the penalty are reweighted by data-dependent weights. We show that, if a reasonable initial estimator is available, then under appropriate conditions the adaptive LASSO correctly selects covariates with nonzero coefficients with probability converging to one, and the estimators of nonzero coefficients have the same asymptotic distribution that they would have if the zero coefficients were known in advance. Thus, the adaptive LASSO has an oracle property in the sense of Fan and Li (2001) and Fan and Peng (2004). In addition, under a partial orthogonality condition in which the covariates with zero coefficients are weakly correlated with the covariates with nonzero coefficients, univariate regression can be used to obtain the initial estimator. With this initial estimator, the adaptive LASSO has the oracle property even when the number of covariates is greater than the sample size. Key words and phrases. Penalized regression, high-dimensional data, variable selection, asymptotic normality, oracle property, zero-consistency. Short title. Sparse high-dimensional regression.
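As a rough sketch of the reweighting idea (not the paper's exact estimator, weights, or tuning), the adaptive LASSO can be obtained from any plain lasso solver by rescaling columns with data-dependent weights; here a least-squares initial fit and a basic ISTA solver stand in for choices a real application would make more carefully:

```python
import numpy as np

def soft(z, t):
    # Componentwise soft-thresholding operator.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    # Plain ISTA solver for (1/2n)||y - Xb||^2 + lam * ||b||_1.
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n      # Lipschitz constant of the gradient
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y) / n
        b = soft(b - grad / L, lam / L)
    return b

def adaptive_lasso(X, y, lam, gamma=1.0):
    # Initial estimator: least squares here; the abstract's point is that under
    # partial orthogonality, univariate regression also works as the initial fit.
    b_init = np.linalg.lstsq(X, y, rcond=None)[0]
    w = 1.0 / (np.abs(b_init) ** gamma + 1e-8)   # data-dependent weights
    b = lasso_ista(X / w, y, lam)                # plain lasso on rescaled columns
    return b / w                                 # undo the rescaling
```

Covariates with small initial estimates receive large weights and are driven exactly to zero, while strong covariates are penalized only lightly — the mechanism behind the oracle property described above.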
Sensitivity of PCA for Traffic Anomaly Detection, 2007
Cited by 65 (3 self)
Detecting anomalous traffic is a crucial part of managing IP networks. In recent years, network-wide anomaly detection based on Principal Component Analysis (PCA) has emerged as a powerful method for detecting a wide variety of anomalies. We show that tuning PCA to operate effectively in practice is difficult and requires more robust techniques than have been presented thus far. We analyze a week of network-wide traffic measurements from two IP backbones (Abilene and Géant) across three different traffic aggregations (ingress routers, OD flows, and input links), and conduct a detailed inspection of the feature time series for each suspected anomaly. Our study identifies and evaluates four main challenges of using PCA to detect traffic anomalies: (i) the false positive rate is very sensitive to small differences in the number of principal components in the normal subspace, (ii) the effectiveness of PCA is sensitive to the level of aggregation of the traffic measurements, (iii) a large anomaly may inadvertently pollute the normal subspace, and (iv) correctly identifying which flow triggered the anomaly detector is an inherently challenging problem.
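The subspace method being tuned here can be sketched in a few lines: project the traffic matrix onto the residual of the top-k principal components and score each time bin by its squared prediction error. This is a generic sketch of the technique, not the authors' code; the choice of k is precisely the sensitivity the abstract's point (i) is about.

```python
import numpy as np

def pca_anomaly_scores(X, k):
    # X: (time bins x links/flows) traffic matrix; k: size of the normal subspace.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:k].T                        # top-k principal directions
    resid = Xc - Xc @ P @ P.T           # component in the residual subspace
    # Squared prediction error (SPE) per time bin; large values flag anomalies.
    # Note: a large anomaly left in X also distorts P itself -- challenge (iii).
    return np.sum(resid ** 2, axis=1)
```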
Forecasting economic time series using targeted predictors
Journal of Econometrics, 2008
Cited by 56 (1 self)
This paper studies two refinements to the method of factor forecasting. First, we consider the method of quadratic principal components that allows the link function between the predictors and the factors to be nonlinear. Second, the factors used in the forecasting equation are estimated in a way that takes into account that the goal is to forecast a specific series. This is accomplished by applying the method of principal components to 'targeted predictors' selected using hard and soft thresholding rules. Our three main findings can be summarized as follows. First, we find improvements at all forecast horizons over the current diffusion index forecasts by estimating the factors using fewer but informative predictors. Allowing for nonlinearity often leads to additional gains. Second, forecasting the volatile one-month-ahead inflation warrants a high degree of targeting to screen out the noisy predictors. A handful of variables, notably relating to housing starts and interest rates, are found to have systematic predictive power for inflation at all horizons. Third, the targeted predictors selected by both soft and hard thresholding change with the forecast horizon and the sample period. Holding the set of predictors fixed, as is the current practice of factor forecasting, is unnecessarily restrictive.
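A minimal version of the hard-thresholding rule described above: regress the target on each candidate predictor separately, keep those whose |t-statistic| clears a critical value, and extract principal-component factors from the survivors. This is an illustrative sketch only; the paper's soft-thresholding rule and quadratic principal components are omitted.

```python
import numpy as np

def targeted_pc_factors(X, y, t_crit=1.65, k=2):
    # Hard thresholding: keep predictors whose marginal t-statistic on y
    # exceeds t_crit, then estimate k factors from the targeted predictors.
    n, p = X.shape
    Xs = (X - X.mean(0)) / X.std(0)
    ys = y - y.mean()
    beta = Xs.T @ ys / n                     # univariate slopes (x'x = n)
    resid = ys[:, None] - Xs * beta          # residuals of each univariate fit
    se = np.sqrt((resid ** 2).mean(0) / n)   # standard error of each slope
    keep = np.abs(beta) / se > t_crit        # the "targeted predictors"
    _, _, Vt = np.linalg.svd(Xs[:, keep], full_matrices=False)
    return Xs[:, keep] @ Vt[:k].T, keep      # factors by principal components
```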
Ultrahigh dimensional feature selection: beyond the linear model, 2009
Cited by 48 (8 self)
Variable selection in high-dimensional space characterizes many contemporary problems in scientific discovery and decision making. Many frequently used techniques are based on independence screening; examples include correlation ranking (Fan and Lv, 2008) or feature selection using a two-sample t-test in high-dimensional classification (Tibshirani et al., 2003). Within the context of the linear model, Fan and Lv (2008) showed that this simple correlation ranking possesses a sure independence screening property under certain conditions and that its revision, called iterative sure independence screening (ISIS), is needed when the features are marginally unrelated but jointly related to the response variable. In this paper, we extend ISIS, without explicit definition of residuals, to a general pseudo-likelihood framework, which includes generalized linear models as a special case. Even in the least-squares setting, the new method improves ISIS by allowing feature deletion in the iterative process. Our technique allows us to select important features in high-dimensional classification where the popularly used two-sample t-test fails. A new technique is introduced to reduce the false selection rate in the feature screening stage. Several simulated and two real data examples are presented to illustrate the methodology.
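The correlation-ranking screen discussed above is simple to state: keep the d features most correlated with the response in absolute value. A minimal sketch (ISIS then iterates this with refitting and, per this paper, deletion steps, which are not shown):

```python
import numpy as np

def sis(X, y, d):
    # Sure independence screening: rank features by absolute marginal
    # correlation with the response and keep the top d.
    Xs = (X - X.mean(0)) / X.std(0)
    ys = (y - y.mean()) / y.std()
    corr = np.abs(Xs.T @ ys) / len(y)      # marginal sample correlations
    return np.argsort(corr)[::-1][:d]      # indices of the d largest
```

As the abstract notes, this marginal screen fails when features are individually uncorrelated with the response but jointly predictive, which is what motivates the iterative (ISIS) refinement.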
Sparse partial least squares regression for simultaneous dimension reduction and variable selection
J. R. Statist. Soc. B
Cited by 46 (0 self)
Summary. Analysis of modern biological data often involves ill-posed problems due to high dimensionality and multicollinearity. Partial Least Squares (pls) regression has been an alternative to ordinary least squares for handling multicollinearity in several areas of scientific research since the 1960s. At the core of the pls methodology lies a dimension reduction technique coupled with a regression model. Although pls regression has been shown to achieve good predictive performance, it is not particularly tailored for variable/feature selection and therefore often produces linear combinations of the original predictors that are hard to interpret due to high dimensionality. In this paper, we investigate the known asymptotic properties of the pls estimator and show that its consistency property no longer holds in the very large p and small n paradigm. We then propose a sparse partial least squares (spls) formulation which aims to simultaneously achieve good predictive performance and variable selection by producing sparse linear combinations of the original predictors. We provide an efficient implementation of spls regression based on the lars algorithm and benchmark the proposed method against well-known variable selection and dimension reduction approaches via simulation experiments. An additional advantage of spls regression is its ability to handle multivariate responses without much additional computational cost. We illustrate this in a joint analysis of gene expression and genome-wide binding data.
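The idea of a sparse direction vector can be illustrated on the first pls component: the ordinary pls weight vector is proportional to X'y, and sparsity is obtained by soft-thresholding it before renormalizing. This is a schematic of the formulation, not the authors' lars-based implementation; `eta` is an illustrative sparsity parameter.

```python
import numpy as np

def spls_direction(X, y, eta):
    # First sparse PLS direction: soft-threshold the PLS weight vector X'y,
    # zeroing coordinates smaller than eta times the largest, then renormalize.
    Xc = X - X.mean(0)
    yc = y - y.mean()
    w = Xc.T @ yc                              # ordinary PLS weight vector
    thresh = eta * np.abs(w).max()             # eta in [0, 1) tunes sparsity
    w = np.sign(w) * np.maximum(np.abs(w) - thresh, 0.0)
    return w / np.linalg.norm(w)
```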
Augmented sparse principal component analysis for high dimensional data
arXiv preprint arXiv:1202.1242, 2012
Cited by 36 (6 self)
Principal components analysis (PCA) has been a widely used technique for reducing the dimensionality of multivariate data. A traditional setting where PCA is applicable is when one has repeated observations from a multivariate population that can be described reasonably well by its first two moments. When the dimension of the sample observations is fixed, distributional ...
Supervised group Lasso with applications to microarray data analysis
BMC Bioinformatics 8:60, 2007
Cited by 33 (1 self)
Supervised group Lasso with applications to microarray data analysis
Averaged gene expressions for regression
Biostatistics, 2007
Cited by 30 (2 self)
Although averaging is a simple technique, it plays an important role in reducing variance. We use this essential property of averaging in regression on DNA microarray data, which poses the challenge of having far more features than samples. In this paper, we introduce a two-step procedure that combines (1) hierarchical clustering and (2) the Lasso. By averaging the genes within the clusters obtained from hierarchical clustering, we define supergenes and use them to fit regression models, thereby attaining concise interpretation and accuracy. Our methods are supported by theoretical justifications and demonstrated on simulated and real data sets.
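A minimal sketch of step (1), with hypothetical parameter choices (the paper's clustering details and tuning may differ): cluster the genes hierarchically by correlation, then average within clusters to form supergene predictors. Step (2) would then fit an ordinary lasso on the resulting supergene matrix.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def supergenes(X, n_clusters):
    # Step 1: hierarchically cluster the genes (columns) by correlation,
    # then average the genes within each cluster to form "supergenes".
    Z = linkage(X.T, method="average", metric="correlation")
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    S = np.column_stack([X[:, labels == c].mean(axis=1)
                         for c in range(1, n_clusters + 1)])
    return S, labels   # S: (samples x n_clusters) supergene matrix
```

Averaging correlated genes within a cluster reduces the variance of each predictor, which is the property the abstract leans on.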
Extensions of Sparse Canonical Correlation Analysis with Applications to Genomic Data, 2009
Cited by 30 (0 self)
In recent work, several authors have introduced methods for sparse canonical correlation analysis (sparse CCA). Suppose that two sets of measurements are available on the same set of observations. Sparse CCA is a method for identifying sparse linear combinations of the two sets of variables that are highly correlated with each other. It has been shown to be useful in the analysis of high-dimensional genomic data, when two sets of assays are available on the same set of samples. In this paper, we propose two extensions to the sparse CCA methodology. (1) Sparse CCA is an unsupervised method; that is, it does not make use of outcome measurements that may be available for each observation (e.g., survival time or cancer subtype). We propose an extension to sparse CCA, which we call sparse supervised CCA, which results in the identification of linear combinations of the two sets of variables that are correlated with each other and associated with the outcome. (2) It is becoming increasingly common for researchers to collect data on more than two assays on the same set of samples; for instance, SNP, gene expression, and DNA copy number measurements may all be available. We develop sparse multiple CCA in order to extend the sparse CCA methodology to the case of more than two data sets. We demonstrate these new methods on ...
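To make the notion of sparse canonical vectors concrete, here is a generic alternating scheme in the spirit of penalized matrix decompositions (a sketch under a diagonal-covariance approximation, not the specific algorithms proposed above): each canonical vector is updated by a soft-thresholded power step on the cross-product matrix.

```python
import numpy as np

def soft(z, t):
    # Componentwise soft-thresholding operator.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def sparse_cca(X1, X2, lam1, lam2, n_iter=50):
    # Leading pair of sparse canonical vectors via alternating
    # soft-thresholded power iterations on the cross-product matrix.
    X1 = X1 - X1.mean(0)
    X2 = X2 - X2.mean(0)
    C = X1.T @ X2                         # cross-product of the two data sets
    w2 = np.ones(X2.shape[1]) / np.sqrt(X2.shape[1])
    for _ in range(n_iter):
        w1 = soft(C @ w2, lam1)
        w1 /= np.linalg.norm(w1) + 1e-12
        w2 = soft(C.T @ w1, lam2)
        w2 /= np.linalg.norm(w2) + 1e-12
    return w1, w2
```

The sparse supervised and sparse multiple extensions described above would modify this core step by, respectively, restricting attention to outcome-associated variables and cycling the update over more than two data sets.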