Results 1 - 10 of 284
Least angle regression
Ann. Statist.
Abstract

Cited by 759 (35 self)
The purpose of model selection algorithms such as All Subsets, Forward Selection and Backward Elimination is to choose a linear model on the basis of the same set of data to which the model will be applied. Typically we have available a large collection of possible covariates from which we hope to select a parsimonious set for the efficient prediction of a response variable. Least Angle Regression (LARS), a new model selection algorithm, is a useful and less greedy version of traditional forward selection methods. Three main properties are derived: (1) A simple modification of the LARS algorithm implements the Lasso, an attractive version of ordinary least squares that constrains the sum of the absolute regression coefficients; the LARS modification calculates all possible Lasso estimates for a given problem, using an order of magnitude less computer time than previous methods. (2) A different LARS modification efficiently implements Forward Stagewise linear regression, another promising ...
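The abstract mentions Forward Stagewise linear regression, which a modified LARS can implement efficiently. For reference, here is a minimal pure-Python sketch of the plain incremental ("epsilon") forward stagewise procedure itself. It is not LARS. The sketch assumes standardized predictor columns and a centered response. The names `forward_stagewise` and `standardize` and the step size `eps` are illustrative choices, not taken from the paper.

```python
import math


def standardize(col):
    """Center a column and scale it to mean square 1."""
    n = len(col)
    mean = sum(col) / n
    centered = [v - mean for v in col]
    scale = math.sqrt(sum(v * v for v in centered) / n)
    return [v / scale for v in centered]


def forward_stagewise(columns, y, eps=0.01, n_steps=1000):
    """Incremental forward stagewise regression (hypothetical helper).

    At each step, pick the predictor most correlated with the current
    residual and move its coefficient by +/- eps in the direction of that
    correlation. Assumes standardized columns and a centered y.
    """
    n, p = len(y), len(columns)
    beta = [0.0] * p
    resid = list(y)
    for _ in range(n_steps):
        # Inner products (1/n) x_j^T r for every predictor.
        corrs = [sum(a * r for a, r in zip(col, resid)) / n for col in columns]
        j = max(range(p), key=lambda k: abs(corrs[k]))
        if abs(corrs[j]) < 1e-12:
            break  # residual is numerically uncorrelated with every predictor
        step = eps if corrs[j] > 0 else -eps
        beta[j] += step
        resid = [r - step * a for r, a in zip(resid, columns[j])]
    return beta


x0 = standardize([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x1 = standardize([2.0, -1.0, 0.5, 3.0, -2.0, 1.0])
y = [2.0 * v for v in x0]  # the response depends on x0 only
beta = forward_stagewise([x0, x1], y)
print(beta)
```

Because each move is tiny, the coefficient profile grows slowly and smoothly rather than jumping to a full least-squares fit. This is the "less greedy" behavior that LARS formalizes.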
High dimensional graphs and variable selection with the Lasso
ANNALS OF STATISTICS, 2006
Abstract

Cited by 399 (21 self)
The pattern of zero entries in the inverse covariance matrix of a multivariate normal distribution corresponds to conditional independence restrictions between variables. Covariance selection aims at estimating those structural zeros from data. We show that neighborhood selection with the Lasso is a computationally attractive alternative to standard covariance selection for sparse high-dimensional graphs. Neighborhood selection estimates the conditional independence restrictions separately for each node in the graph and is hence equivalent to variable selection for Gaussian linear models. We show that the proposed neighborhood selection scheme is consistent for sparse high-dimensional graphs. Consistency hinges on the choice of the penalty parameter. The oracle value for optimal prediction does not lead to a consistent neighborhood estimate. Controlling instead the probability of falsely joining some distinct connectivity components of the graph, consistent estimation for sparse graphs is achieved (with exponential rates), even when the number of variables grows as the number of observations raised to an arbitrary power.
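Neighborhood selection, as described above, runs one Lasso regression per node and reads edges off the nonzero coefficients. Below is a minimal pure-Python sketch under several assumptions:

- The Lasso is solved by plain cyclic coordinate descent on (1/2n)||y - Xb||^2 + lam*||b||_1 with standardized columns.
- The penalty `lam` is fixed by hand rather than chosen by the paper's error-control rule.
- All helper names are hypothetical.

The `rule` argument combines the two directed estimates for each pair with either OR or AND.

```python
import math
import random


def standardize(col):
    """Center a column and scale it to mean square 1."""
    n = len(col)
    mean = sum(col) / n
    centered = [v - mean for v in col]
    scale = math.sqrt(sum(v * v for v in centered) / n)
    return [v / scale for v in centered]


def soft_threshold(z, lam):
    """sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(columns, y, lam, n_sweeps=200):
    """Cyclic coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1.

    Assumes every column is standardized (mean 0, mean square 1).
    """
    n, p = len(y), len(columns)
    beta = [0.0] * p
    resid = list(y)  # y - X beta, with beta = 0
    for _ in range(n_sweeps):
        for j in range(p):
            xj = columns[j]
            # (1/n) x_j^T (residual with feature j's contribution added back)
            rho = sum(a * r for a, r in zip(xj, resid)) / n + beta[j]
            new = soft_threshold(rho, lam)
            if new != beta[j]:
                delta = new - beta[j]
                resid = [r - delta * a for r, a in zip(resid, xj)]
                beta[j] = new
    return beta


def neighborhood_selection(data, lam, rule="or"):
    """Regress each variable on all others; return edges (i, j) with i < j.

    rule="or" keeps an edge if either regression selects it, "and" if both do.
    """
    p = len(data)
    selected = [[False] * p for _ in range(p)]
    for a in range(p):
        others = [b for b in range(p) if b != a]
        beta = lasso_cd([data[b] for b in others], data[a], lam)
        for b, coef in zip(others, beta):
            selected[a][b] = coef != 0.0
    edges = set()
    for i in range(p):
        for j in range(i + 1, p):
            both = selected[i][j] and selected[j][i]
            either = selected[i][j] or selected[j][i]
            if both if rule == "and" else either:
                edges.add((i, j))
    return edges


rng = random.Random(0)
n = 200
x0 = [rng.gauss(0.0, 1.0) for _ in range(n)]
x1 = [v + 0.5 * rng.gauss(0.0, 1.0) for v in x0]  # strongly tied to x0
x2 = [rng.gauss(0.0, 1.0) for _ in range(n)]  # independent
x3 = [rng.gauss(0.0, 1.0) for _ in range(n)]  # independent
data = [standardize(c) for c in (x0, x1, x2, x3)]
print(neighborhood_selection(data, lam=0.4))
```

The per-node regressions are independent of each other, which is what makes the approach computationally attractive compared with estimating the full inverse covariance matrix at once.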
Simultaneous analysis of Lasso and Dantzig selector
ANNALS OF STATISTICS, 2009
Abstract

Cited by 189 (5 self)
We show that, under a sparsity scenario, the Lasso estimator and the Dantzig selector exhibit similar behavior. For both methods, we derive, in parallel, oracle inequalities for the prediction risk in the general nonparametric regression model, as well as bounds on the ℓp estimation loss for 1 ≤ p ≤ 2 in the linear model when the number of variables can be much larger than the sample size.
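One way to see why the two estimators behave similarly is to compare their definitions. The display below uses a common normalization, with design X ∈ R^{n×p} and response y; this scaling of λ is an assumption, and the paper's may differ. The last line is the Lasso's optimality (KKT) condition. It shows that the Lasso solution is always feasible for the Dantzig selector's constraint at the same λ, and hence that the Dantzig selector's ℓ1 norm is never larger than the Lasso's.

```latex
\begin{aligned}
\hat\beta^{\mathrm{L}} &\in \arg\min_{\beta \in \mathbb{R}^p}\; \frac{1}{2n}\,\lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1, \\
\hat\beta^{\mathrm{D}} &\in \arg\min_{\beta \in \mathbb{R}^p}\; \lVert \beta \rVert_1
  \quad \text{subject to} \quad \frac{1}{n}\,\lVert X^\top (y - X\beta) \rVert_\infty \le \lambda, \\
\frac{1}{n}\, X^\top\bigl(y - X\hat\beta^{\mathrm{L}}\bigr) &= \lambda\, s,\quad s \in \partial \lVert \hat\beta^{\mathrm{L}} \rVert_1
  \;\Longrightarrow\; \frac{1}{n}\,\bigl\lVert X^\top (y - X\hat\beta^{\mathrm{L}}) \bigr\rVert_\infty \le \lambda
  \;\Longrightarrow\; \lVert \hat\beta^{\mathrm{D}} \rVert_1 \le \lVert \hat\beta^{\mathrm{L}} \rVert_1 .
\end{aligned}
```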
A direct formulation for sparse pca using semidefinite programming
In NIPS 17, 2004
Abstract

Cited by 167 (29 self)
Given a covariance matrix, we consider the problem of maximizing the variance explained by a particular linear combination of the input variables while constraining the number of nonzero coefficients in this combination. This problem arises in the decomposition of a covariance matrix into sparse factors or sparse principal component analysis (PCA), and has wide applications ranging from biology to finance. We use a modification of the classical variational representation of the largest eigenvalue of a symmetric matrix, where cardinality is constrained, and derive a semidefinite programming–based relaxation for our problem. We also discuss Nesterov’s smooth minimization technique applied to the semidefinite program arising in the semidefinite relaxation of the sparse PCA problem. The method has complexity O(n^4 √(log n)/ε), where n is the size of the underlying covariance matrix and ε is the desired absolute accuracy on the optimal value of the problem.
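The abstract does not spell out the relaxation. Below is a short first-principles derivation of the cardinality-constrained problem and a semidefinite relaxation of the kind described. Σ is the covariance matrix and k the target cardinality; the notation is ours, and the paper's exact formulation (for example, a penalized rather than a constrained version) may differ. The key step uses Cauchy-Schwarz to turn the cardinality bound into a convex bound on the lifted matrix X = xxᵀ.

```latex
\begin{aligned}
&\text{Problem: } \max_{x}\; x^\top \Sigma x \quad \text{s.t.}\quad \lVert x \rVert_2 = 1,\; \operatorname{card}(x) \le k. \\
&\text{Lift } X = x x^\top:\quad x^\top \Sigma x = \operatorname{Tr}(\Sigma X),\quad \lVert x \rVert_2 = 1 \iff \operatorname{Tr}(X) = 1,\quad X \succeq 0,\ \operatorname{rank}(X) = 1. \\
&\operatorname{card}(x) \le k,\ \lVert x \rVert_2 = 1 \;\Longrightarrow\; \lVert x \rVert_1 \le \sqrt{k}\,\lVert x \rVert_2 = \sqrt{k}
  \;\Longrightarrow\; \mathbf{1}^\top \lvert X \rvert\, \mathbf{1} = \lVert x \rVert_1^2 \le k. \\
&\text{Drop the rank constraint: } \max_{X}\; \operatorname{Tr}(\Sigma X) \quad \text{s.t.}\quad \operatorname{Tr}(X) = 1,\; \mathbf{1}^\top \lvert X \rvert\, \mathbf{1} \le k,\; X \succeq 0.
\end{aligned}
```

Since every feasible x of the original problem yields a feasible X, the optimal value of the relaxation is an upper bound on the variance explained by the best k-sparse component.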
Pathwise coordinate optimization
2007
Abstract

Cited by 166 (19 self)
We consider “one-at-a-time” coordinate-wise descent algorithms for a class of convex optimization problems. An algorithm of this kind has been proposed for the L1-penalized regression (lasso) in the literature, but it seems to have been largely ignored. Indeed, it seems that coordinate-wise algorithms are not often used in convex optimization. We show that this algorithm is very competitive with the well-known LARS (or homotopy) procedure in large lasso problems, and that it can be applied to related methods such as the garotte and elastic net. It turns out that coordinate-wise descent does not work in the “fused lasso”, however, so we derive a generalized algorithm that yields the solution in much less time than a standard convex optimizer. Finally we generalize the procedure to the two-dimensional fused lasso, and demonstrate its performance on some image smoothing problems.
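The one-at-a-time idea for the lasso can be sketched as follows. Each coordinate update is a soft-thresholding step, and the "pathwise" part solves a decreasing sequence of penalties, warm-starting each fit from the previous solution. This is a minimal pure-Python sketch under assumptions of our own:

- the penalized objective (1/2n)||y - Xb||^2 + lam*||b||_1;
- standardized columns and a centered response;
- a log-spaced grid and a fixed number of sweeps instead of a convergence test.

It omits refinements such as active-set cycling, and all names are hypothetical.

```python
import math


def standardize(col):
    """Center a column and scale it to mean square 1."""
    n = len(col)
    mean = sum(col) / n
    centered = [v - mean for v in col]
    scale = math.sqrt(sum(v * v for v in centered) / n)
    return [v / scale for v in centered]


def soft_threshold(z, lam):
    """sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_path(columns, y, n_lambdas=20, min_ratio=0.01, n_sweeps=100):
    """Pathwise cyclic coordinate descent over a decreasing grid of lam.

    Each fit starts from the previous solution (warm start).
    Assumes standardized columns and a centered y.
    """
    n, p = len(y), len(columns)
    # Smallest lam at which every coefficient is exactly zero.
    lam_max = max(abs(sum(a * b for a, b in zip(col, y)) / n) for col in columns)
    lambdas = [lam_max * min_ratio ** (k / (n_lambdas - 1)) for k in range(n_lambdas)]
    beta = [0.0] * p
    resid = list(y)  # y - X beta, kept up to date after every coordinate move
    path = []
    for lam in lambdas:
        for _ in range(n_sweeps):
            for j in range(p):
                xj = columns[j]
                rho = sum(a * r for a, r in zip(xj, resid)) / n + beta[j]
                new = soft_threshold(rho, lam)
                if new != beta[j]:
                    delta = new - beta[j]
                    resid = [r - delta * a for r, a in zip(resid, xj)]
                    beta[j] = new
        path.append((lam, list(beta)))
    return path


x0 = standardize([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
x1 = standardize([1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0])  # orthogonal to x0
y = [3.0 * a - b for a, b in zip(x0, x1)]
path = lasso_path([x0, x1], y)
for lam, beta in path[::5]:
    print(round(lam, 3), [round(b, 3) for b in beta])
```

With this orthogonal design each coefficient is just soft-thresholded, so the first coefficient equals 3 - lam along the path, and the second enters only once lam drops below 1.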
Probing the Pareto frontier for basis pursuit solutions
2008
Abstract

Cited by 157 (2 self)
The basis pursuit problem seeks a minimum one-norm solution of an underdetermined least-squares problem. Basis pursuit denoise (BPDN) fits the least-squares problem only approximately, and a single parameter determines a curve that traces the optimal trade-off between the least-squares fit and the one-norm of the solution. We prove that this curve is convex and continuously differentiable over all points of interest, and show that it gives an explicit relationship to two other optimization problems closely related to BPDN. We describe a root-finding algorithm for finding arbitrary points on this curve; the algorithm is suitable for problems that are large scale and for those that are in the complex domain. At each iteration, a spectral gradient-projection method approximately minimizes a least-squares problem with an explicit one-norm constraint. Only matrix-vector operations are required. The primal-dual solution of this problem gives function and derivative information needed for the root-finding method. Numerical experiments on a comprehensive set of test problems demonstrate that the method scales well to large problems.
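The inner solver applies gradient projection to a least-squares problem with an explicit one-norm constraint. Its key primitive is Euclidean projection onto the one-norm ball. The sketch below uses a standard sort-based projection, so each iterate only needs matrix-vector products plus a cheap projection. For simplicity it uses a fixed step size rather than the spectral (Barzilai-Borwein-type) step the abstract refers to. The function names and the step choice are assumptions, and the outer root-finding on the Pareto curve is not shown.

```python
import math


def project_l1_ball(v, tau):
    """Euclidean projection of v onto {w : ||w||_1 <= tau}, sort-based."""
    if tau <= 0.0:
        return [0.0] * len(v)
    if sum(abs(x) for x in v) <= tau:
        return list(v)  # already feasible
    u = sorted((abs(x) for x in v), reverse=True)
    cumsum = 0.0
    theta = 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        t = (cumsum - tau) / j
        if uj - t > 0:
            theta = t  # the last index satisfying this defines the threshold
    return [math.copysign(max(abs(x) - theta, 0.0), x) for x in v]


def projected_gradient_l1(A, b, tau, step, n_iter=500):
    """Minimize 0.5*||A x - b||^2 subject to ||x||_1 <= tau (A is a list of rows)."""
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(n_iter):
        r = [sum(A[i][k] * x[k] for k in range(n)) - b[i] for i in range(m)]  # A x - b
        g = [sum(A[i][k] * r[i] for i in range(m)) for k in range(n)]  # A^T (A x - b)
        x = project_l1_ball([xk - step * gk for xk, gk in zip(x, g)], tau)
    return x


print(project_l1_ball([3.0, -1.0, 0.5], 2.0))
print(projected_gradient_l1([[1.0, 0.0], [0.0, 1.0]], [3.0, -1.0], tau=2.0, step=0.5))
```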
The Entire Regularization Path for the Support Vector Machine
2004
Abstract

Cited by 148 (9 self)
In this paper we argue that the choice of the SVM cost parameter can be critical. We then derive an algorithm that can fit the entire path of SVM solutions for every value of the cost parameter, with essentially the same computational cost as fitting one SVM model.
Sparsity and smoothness via the fused lasso
Journal of the Royal Statistical Society Series B, 2005
Abstract

Cited by 132 (11 self)
The lasso (Tibshirani 1996) penalizes a least squares regression by the sum of the absolute values (L1 norm) of the coefficients. The form of this penalty encourages sparse solutions, that is, having many coefficients equal to zero. Here we propose the “fused lasso”, a generalization of the lasso designed for problems with features that can be ordered in some meaningful way. The fused lasso penalizes both the L1 norm of the coefficients and their successive differences. Thus it encourages both sparsity ...
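As a concrete illustration of the two-part penalty described above, the sketch below evaluates it, together with a penalized (Lagrangian) version of the least-squares criterion with two tuning parameters `lam1` and `lam2`. These names, and the choice of the Lagrangian form rather than a constrained one, are assumptions for illustration. The code does not fit the model. It only shows how the second term favors coefficients that are equal across neighboring features.

```python
def fused_lasso_penalty(beta, lam1, lam2):
    """lam1 * sum_j |beta_j| + lam2 * sum_{j>=2} |beta_j - beta_{j-1}|."""
    l1 = sum(abs(b) for b in beta)  # sparsity term
    tv = sum(abs(beta[j] - beta[j - 1]) for j in range(1, len(beta)))  # successive differences
    return lam1 * l1 + lam2 * tv


def fused_lasso_objective(X, y, beta, lam1, lam2):
    """0.5 * ||y - X beta||^2 + fused lasso penalty; X is a list of rows."""
    rss = sum(
        (yi - sum(xij * bj for xij, bj in zip(row, beta))) ** 2
        for row, yi in zip(X, y)
    )
    return 0.5 * rss + fused_lasso_penalty(beta, lam1, lam2)


blocky = [1.0, 1.0, 0.0, 0.0]  # sparse and piecewise constant
jagged = [1.0, 0.0, 1.0, 0.0]  # same L1 norm, more jumps
print(fused_lasso_penalty(blocky, 1.0, 1.0), fused_lasso_penalty(jagged, 1.0, 1.0))
```

Both vectors have the same L1 norm. The difference term alone separates them, charging one jump for the blocky vector and three for the jagged one.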
Lasso-type recovery of sparse representations for high-dimensional data
ANNALS OF STATISTICS, 2009
Abstract

Cited by 122 (9 self)
The Lasso is an attractive technique for regularization and variable selection for high-dimensional data, where the number of predictor variables p_n is potentially much larger than the number of samples n. However, it was recently discovered that the sparsity pattern of the Lasso estimator can only be asymptotically identical to the true sparsity pattern if the design matrix satisfies the so-called irrepresentable condition. The latter condition can easily be violated in the presence of highly correlated variables. Here we examine the behavior of the Lasso estimators if the irrepresentable condition is relaxed. Even though the Lasso cannot recover the correct sparsity pattern, we show that the estimator is still consistent in the ℓ2-norm sense for fixed designs under conditions on (a) the number s_n of nonzero components of the vector β_n and (b) the minimal singular values of design matrices that are induced by selecting small subsets of variables. Furthermore, a rate of convergence result is obtained on the ℓ2 error with an appropriate choice of the smoothing parameter. The rate is shown to be ...
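For reference, the irrepresentable condition mentioned above is commonly stated in the strong form below, with C = XᵀX/n partitioned by the true support S. This is the form usually written in the literature, and the paper's own notation may differ. The worked example shows how correlation between an irrelevant variable and the relevant ones is enough to break the condition, even when the design is a perfectly valid covariance matrix.

```latex
\begin{aligned}
&\text{Strong irrepresentable condition: } \exists\,\eta > 0:\quad
  \bigl\lVert C_{S^c S}\, C_{SS}^{-1} \operatorname{sign}(\beta_S) \bigr\rVert_\infty \le 1 - \eta. \\
&\text{Example, } S = \{1, 2\},\ \beta_S = (1, 1)^\top:\quad
  C = \begin{pmatrix} 1 & 0 & \rho \\ 0 & 1 & \rho \\ \rho & \rho & 1 \end{pmatrix},\quad
  C_{S^c S}\, C_{SS}^{-1} \operatorname{sign}(\beta_S) = (\rho,\ \rho)\, I_2 \begin{pmatrix} 1 \\ 1 \end{pmatrix} = 2\rho. \\
&C \succ 0 \iff 1 - 2\rho^2 > 0 \iff \lvert \rho \rvert < 1/\sqrt{2},
  \quad \text{so every } \rho \in [1/2,\ 1/\sqrt{2}) \text{ gives a valid design that violates the condition.}
\end{aligned}
```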
Sure independence screening for ultrahigh dimensional feature space
2006
Abstract

Cited by 90 (12 self)
Variable selection plays an important role in high-dimensional statistical modeling, which nowadays appears in many areas and is key to various scientific discoveries. For problems of large scale or dimensionality p, estimation accuracy and computational cost are two top concerns. In a recent paper, Candès and Tao (2007) propose the Dantzig selector using L1 regularization and show that it achieves the ideal risk up to a logarithmic factor log p. Their innovative procedure and remarkable result are challenged when the dimensionality is ultra high, as the factor log p can be large and their uniform uncertainty principle can fail. Motivated by these concerns, we introduce the concept of sure screening and propose a sure screening method based on correlation learning, called Sure Independence Screening (SIS), to reduce dimensionality from high to a moderate scale that is below sample size. In a fairly general asymptotic framework, SIS is shown to have the sure screening property even for exponentially growing dimensionality. As a methodological extension, an iterative SIS (ISIS) is also proposed to enhance its finite sample performance. With dimension reduced accurately from high to below sample size, variable selection can be improved in both speed and accuracy, and can then be ac ...
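The screening step ranks predictors by their marginal correlation with the response and keeps the top ones. Below is a minimal pure-Python sketch. It assumes Pearson correlation as the marginal utility and a user-supplied submodel size `d`; the names and the choice of `d` are illustrative. The retained columns would then be passed to a lower-dimensional method such as the Lasso or the Dantzig selector. The iterative variant (ISIS) is not shown.

```python
import math


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    da = [x - ma for x in a]
    db = [x - mb for x in b]
    den = math.sqrt(sum(x * x for x in da) * sum(x * x for x in db))
    if den == 0.0:
        return 0.0  # constant column: treat as uncorrelated
    return sum(x * y for x, y in zip(da, db)) / den


def sure_independence_screening(columns, y, d):
    """Keep the d predictors with the largest |marginal correlation| with y.

    Returns the retained column indices in increasing order.
    """
    scores = [abs(pearson(col, y)) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:d])


columns = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [5.0, 1.0, 4.0, 2.0, 3.0],
    [1.0, 1.0, 2.0, 2.0, 3.0],
]
y = [2.0, 4.0, 6.0, 8.0, 10.5]
print(sure_independence_screening(columns, y, 2))
```

Each score costs one pass over the data, so screening is linear in the number of predictors. This is what makes it usable when p is far larger than n.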