Results 1  10
of
695
Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers
, 2010
"... ..."
(Show Context)
SIMULTANEOUS ANALYSIS OF LASSO AND DANTZIG SELECTOR
 SUBMITTED TO THE ANNALS OF STATISTICS
, 2007
"... We exhibit an approximate equivalence between the Lasso estimator and Dantzig selector. For both methods we derive parallel oracle inequalities for the prediction risk in the general nonparametric regression model, as well as bounds on the ℓp estimation loss for 1 ≤ p ≤ 2 in the linear model when th ..."
Abstract

Cited by 441 (8 self)
 Add to MetaCart
We exhibit an approximate equivalence between the Lasso estimator and Dantzig selector. For both methods we derive parallel oracle inequalities for the prediction risk in the general nonparametric regression model, as well as bounds on the ℓp estimation loss for 1 ≤ p ≤ 2 in the linear model when the number of variables can be much larger than the sample size.
Model selection through sparse maximum likelihood estimation
 Journal of Machine Learning Research
, 2008
"... We consider the problem of estimating the parameters of a Gaussian or binary distribution in such a way that the resulting undirected graphical model is sparse. Our approach is to solve a maximum likelihood problem with an added ℓ1norm penalty term. The problem as formulated is convex but the memor ..."
Abstract

Cited by 299 (2 self)
 Add to MetaCart
We consider the problem of estimating the parameters of a Gaussian or binary distribution in such a way that the resulting undirected graphical model is sparse. Our approach is to solve a maximum likelihood problem with an added ℓ1norm penalty term. The problem as formulated is convex but the memory requirements and complexity of existing interior point methods are prohibitive for problems with more than tens of nodes. We present two new algorithms for solving problems with at least a thousand nodes in the Gaussian case. Our first algorithm uses block coordinate descent, and can be interpreted as recursive ℓ1norm penalized regression. Our second algorithm, based on Nesterov’s first order method, yields a complexity estimate with a better dependence on problem size than existing interior point methods. Using a log determinant relaxation of the log partition function (Wainwright and Jordan, 2006), we show that these same algorithms can be used to solve an approximate sparse maximum likelihood problem for the binary case. We test our algorithms on synthetic data, as well as on gene expression and senate voting records data.
Sharp thresholds for highdimensional and noisy sparsity recovery using l1constrained quadratic programmming (Lasso)
, 2006
"... ..."
An interiorpoint method for largescale l1regularized logistic regression
 Journal of Machine Learning Research
, 2007
"... Logistic regression with ℓ1 regularization has been proposed as a promising method for feature selection in classification problems. In this paper we describe an efficient interiorpoint method for solving largescale ℓ1regularized logistic regression problems. Small problems with up to a thousand ..."
Abstract

Cited by 243 (8 self)
 Add to MetaCart
(Show Context)
Logistic regression with ℓ1 regularization has been proposed as a promising method for feature selection in classification problems. In this paper we describe an efficient interiorpoint method for solving largescale ℓ1regularized logistic regression problems. Small problems with up to a thousand or so features and examples can be solved in seconds on a PC; medium sized problems, with tens of thousands of features and examples, can be solved in tens of seconds (assuming some sparsity in the data). A variation on the basic method, that uses a preconditioned conjugate gradient method to compute the search step, can solve very large problems, with a million features and examples (e.g., the 20 Newsgroups data set), in a few minutes, on a PC. Using warmstart techniques, a good approximation of the entire regularization path can be computed much more efficiently than by solving a family of problems independently.
Sure independence screening for ultrahigh dimensional feature space
, 2006
"... Variable selection plays an important role in high dimensional statistical modeling which nowadays appears in many areas and is key to various scientific discoveries. For problems of large scale or dimensionality p, estimation accuracy and computational cost are two top concerns. In a recent paper, ..."
Abstract

Cited by 231 (26 self)
 Add to MetaCart
Variable selection plays an important role in high dimensional statistical modeling which nowadays appears in many areas and is key to various scientific discoveries. For problems of large scale or dimensionality p, estimation accuracy and computational cost are two top concerns. In a recent paper, Candes and Tao (2007) propose the Dantzig selector using L1 regularization and show that it achieves the ideal risk up to a logarithmic factor log p. Their innovative procedure and remarkable result are challenged when the dimensionality is ultra high as the factor log p can be large and their uniform uncertainty principle can fail. Motivated by these concerns, we introduce the concept of sure screening and propose a sure screening method based on a correlation learning, called the Sure Independence Screening (SIS), to reduce dimensionality from high to a moderate scale that is below sample size. In a fairly general asymptotic framework, the SIS is shown to have the sure screening property for even exponentially growing dimensionality. As a methodological extension, an iterative SIS (ISIS) is also proposed to enhance its finite sample performance. With dimension reduced accurately from high to below sample size, variable selection can be improved on both speed and accuracy, and can then be ac
A Shrinkage Approach to LargeScale Covariance Matrix Estimation and Implications for Functional Genomics
, 2005
"... ..."
A unified framework for highdimensional analysis of Mestimators with decomposable regularizers
"... ..."
Lassotype recovery of sparse representations for highdimensional data
 ANNALS OF STATISTICS
, 2009
"... The Lasso is an attractive technique for regularization and variable selection for highdimensional data, where the number of predictor variables pn is potentially much larger than the number of samples n. However, it was recently discovered that the sparsity pattern of the Lasso estimator can only ..."
Abstract

Cited by 189 (13 self)
 Add to MetaCart
(Show Context)
The Lasso is an attractive technique for regularization and variable selection for highdimensional data, where the number of predictor variables pn is potentially much larger than the number of samples n. However, it was recently discovered that the sparsity pattern of the Lasso estimator can only be asymptotically identical to the true sparsity pattern if the design matrix satisfies the socalled irrepresentable condition. The latter condition can easily be violated in the presence of highly correlated variables. Here we examine the behavior of the Lasso estimators if the irrepresentable condition is relaxed. Even though the Lasso cannot recover the correct sparsity pattern, we show that the estimator is still consistent in the ℓ2norm sense for fixed designs under conditions on (a) the number sn of nonzero components of the vector βn and (b) the minimal singular values of design matrices that are induced by selecting small subsets of variables. Furthermore, a rate of convergence result is obtained on the ℓ2 error with an appropriate choice of the smoothing parameter. The rate is shown to be
Regularized estimation of large covariance matrices
 Ann. Statist
, 2008
"... This paper considers estimating a covariance matrix of p variables from n observations by either banding or tapering the sample covariance matrix, or estimating a banded version of the inverse of the covariance. We show that these estimates are consistent in the operator norm as long as (log p)/n → ..."
Abstract

Cited by 185 (14 self)
 Add to MetaCart
(Show Context)
This paper considers estimating a covariance matrix of p variables from n observations by either banding or tapering the sample covariance matrix, or estimating a banded version of the inverse of the covariance. We show that these estimates are consistent in the operator norm as long as (log p)/n → 0, and obtain explicit rates. The results are uniform over some fairly natural wellconditioned families of covariance matrices. We also introduce an analogue of the Gaussian white noise model and show that if the population covariance is embeddable in that model and wellconditioned, then the banded approximations produce consistent estimates of the eigenvalues and associated eigenvectors of the covariance matrix. The results can be extended to smooth versions of banding and to nonGaussian distributions with sufficiently short tails. A resampling approach is proposed for choosing the banding parameter in practice. This approach is illustrated numerically on both simulated and real data. 1. Introduction. Estimation