Lasso-type recovery of sparse representations for high-dimensional data
Annals of Statistics, 2009
Abstract

Cited by 122 (9 self)
The Lasso is an attractive technique for regularization and variable selection for high-dimensional data, where the number of predictor variables pₙ is potentially much larger than the number of samples n. However, it was recently discovered that the sparsity pattern of the Lasso estimator can only be asymptotically identical to the true sparsity pattern if the design matrix satisfies the so-called irrepresentable condition. The latter condition can easily be violated in the presence of highly correlated variables. Here we examine the behavior of the Lasso estimators if the irrepresentable condition is relaxed. Even though the Lasso cannot recover the correct sparsity pattern, we show that the estimator is still consistent in the ℓ2-norm sense for fixed designs under conditions on (a) the number sₙ of nonzero components of the vector βₙ and (b) the minimal singular values of design matrices that are induced by selecting small subsets of variables. Furthermore, a rate of convergence result is obtained on the ℓ2 error with an appropriate choice of the smoothing parameter. The rate is shown to be
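To make the estimator under discussion concrete, here is a minimal Python sketch of the Lasso fitted by cyclic coordinate descent with soft-thresholding. It assumes the common scaling (1/(2n))‖y − Xβ‖²₂ + λ‖β‖₁, with `lam` playing the role of the smoothing parameter. The function names are hypothetical, and this is not an algorithm from the paper, which is a theoretical analysis.

```python
import numpy as np


def soft_threshold(z: float, t: float) -> float:
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_cd(X: np.ndarray, y: np.ndarray, lam: float, n_iter: int = 200) -> np.ndarray:
    """Minimise (1 / (2n)) * ||y - X b||_2^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n   # (1/n) * ||X_j||^2 for each column
    r = y - X @ b                        # current residual y - X b
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # correlation of column j with the partial residual (coordinate j added back)
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            r += X[:, j] * (b[j] - new_bj)   # keep the residual in sync with b
            b[j] = new_bj
    return b
```

With orthogonal columns scaled so that ‖X_j‖²/n = 1, a single sweep returns the soft-thresholded least-squares coefficients, which makes the sketch easy to check by hand.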
Group Lasso with Overlap and Graph Lasso
Abstract

Cited by 112 (12 self)
We propose a new penalty function which, when used as regularization for empirical risk minimization procedures, leads to sparse estimators. The support of the sparse vector is typically a union of potentially overlapping groups of covariates defined a priori, or a set of covariates which tend to be connected to each other when a graph of covariates is given. We study theoretical properties of the estimator, and illustrate its behavior on simulated and breast cancer gene expression data.
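One way to fit an overlapping-group penalty, sketched here in Python under the assumption that the groups are given as lists of column indices, is covariate duplication: each group receives its own copy of its columns, an ordinary (non-overlapping) group Lasso is fitted on the expanded design, and the latent blocks v_g are summed back into w = Σ_g v_g. The helper names below are hypothetical, and the group-Lasso solver itself is not shown.

```python
import numpy as np
from typing import List, Tuple


def expand_design(X: np.ndarray, groups: List[List[int]]) -> Tuple[np.ndarray, np.ndarray]:
    """Duplicate columns of X so that each (possibly overlapping) group gets its own copy."""
    col_index = np.concatenate([np.asarray(g, dtype=int) for g in groups])
    return X[:, col_index], col_index


def collapse(v: np.ndarray, col_index: np.ndarray, p: int) -> np.ndarray:
    """Map latent coefficients v back to w = sum_g v_g by summing the duplicated copies."""
    w = np.zeros(p)
    np.add.at(w, col_index, v)   # unbuffered add handles repeated indices correctly
    return w


def group_norm_penalty(v: np.ndarray, groups: List[List[int]]) -> float:
    """Sum of Euclidean norms of the latent blocks v_g (disjoint after expansion)."""
    total, start = 0.0, 0
    for g in groups:
        total += float(np.linalg.norm(v[start:start + len(g)]))
        start += len(g)
    return total
```

The identity X̃v = Xw, where X̃ is the expanded design, is what makes fitting v on the expanded design equivalent to fitting w with the overlap penalty.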
SpAM: Sparse additive models
In Advances in Neural Information Processing Systems 20, 2007
Abstract

Cited by 81 (17 self)
We present a new class of models for high-dimensional nonparametric regression and classification called sparse additive models (SpAM). Our methods combine ideas from sparse linear modeling and additive nonparametric regression. We derive a method for fitting the models that is effective even when the number of covariates is larger than the sample size. A statistical analysis of the properties of SpAM is given together with empirical results on synthetic and real data, showing that SpAM can be effective in fitting sparse nonparametric models in high-dimensional data.
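The fitting method is only named in the abstract. As an illustration, a sparse backfitting loop in the spirit of SpAM can be sketched in Python: smooth the partial residual for each covariate, then shrink the whole component function toward zero by the factor [1 − λ/ŝ_j]₊, where ŝ_j is the root-mean-square of the smoothed residual. The Gaussian-kernel smoother, the bandwidth default and the function names are assumptions made for this sketch.

```python
import numpy as np


def gaussian_smoother(x: np.ndarray, bandwidth: float) -> np.ndarray:
    """Row-normalised Nadaraya-Watson smoother matrix with a Gaussian kernel."""
    d = (x[:, None] - x[None, :]) / bandwidth
    K = np.exp(-0.5 * d ** 2)
    return K / K.sum(axis=1, keepdims=True)


def spam_backfit(X: np.ndarray, y: np.ndarray, lam: float,
                 bandwidth: float = 0.3, n_iter: int = 50) -> np.ndarray:
    """Sparse backfitting; column j of the result holds the fitted values f_j(X[:, j])."""
    n, p = X.shape
    smoothers = [gaussian_smoother(X[:, j], bandwidth) for j in range(p)]
    y_c = y - y.mean()
    F = np.zeros((n, p))
    for _ in range(n_iter):
        for j in range(p):
            R = y_c - F.sum(axis=1) + F[:, j]   # partial residual without f_j
            P = smoothers[j] @ R                 # smooth the partial residual
            s = np.sqrt(np.mean(P ** 2))         # estimated norm of the component
            shrink = max(1.0 - lam / s, 0.0) if s > 0 else 0.0
            f = shrink * P                       # soft-threshold the whole function
            F[:, j] = f - f.mean()               # keep each component centred
    return F
```

A component whose smoothed residual has norm below λ is set exactly to zero, which is how the additive model becomes sparse in the covariates.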
The sparsity and bias of the Lasso selection in high-dimensional linear regression. Ann. Statist. Volume 36, Number 4, 1567–1594. Alexandre Belloni Duke University Fuqua School of Business 1 Towerview Drive Durham, NC 27708-0120 PO Box 90120 Email: abn5@duke.edu Victor Chernozhukov Massachusetts Institute of Technology Department of Economics and Operations Research Center 50 Memorial Drive Room E52262f Cambridge, MA 02
2008
Abstract

Cited by 80 (14 self)
showed that, for neighborhood selection in Gaussian graphical models, under a neighborhood stability condition, the LASSO is consistent, even when the number of variables is of greater order than the sample size. Zhao and Yu [(2006) J. Machine Learning Research 7 2541–2567] formalized the neighborhood stability condition in the context of linear regression as a strong irrepresentable condition. That paper showed that under this condition, the LASSO selects exactly the set of nonzero regression coefficients, provided that these coefficients are bounded away from zero at a certain rate. In this paper, the regression coefficients outside an ideal model are assumed to be small, but not necessarily zero. Under a sparse Riesz condition on the correlation of design variables, we prove that the LASSO selects a model of the correct order of dimensionality, controls the bias of the selected model at a level determined by the contributions of small regression coefficients and threshold bias, and selects all coefficients of greater order than the bias of the selected model. Moreover, as a consequence of this rate consistency of the LASSO in model selection, it is proved that the sum of error squares for the mean response and the ℓα-loss for the regression coefficients converge at the best possible rates under the given conditions. An interesting aspect of our results is that the logarithm of the number of variables can be of the same order as the sample size for certain random dependent designs.
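Conditions of this kind can be made concrete on small designs. The Python sketch below brute-forces the extreme eigenvalues of X_Aᵀ X_A / n over all column subsets A of size q, which is the quantity a sparse Riesz condition bounds from below and above (assuming that scaling of the Gram matrix). By eigenvalue interlacing, subsets of size exactly q already give the extremes over all |A| ≤ q. The cost grows as C(p, q), so this is only a diagnostic for toy problems, and the function name is hypothetical.

```python
import itertools
import numpy as np


def sparse_riesz_bounds(X: np.ndarray, q: int) -> tuple:
    """Smallest and largest eigenvalues of X_A^T X_A / n over all subsets |A| = q."""
    n, p = X.shape
    lo, hi = np.inf, -np.inf
    for A in itertools.combinations(range(p), q):
        X_A = X[:, list(A)]
        eig = np.linalg.eigvalsh(X_A.T @ X_A / n)   # eigenvalues in ascending order
        lo, hi = min(lo, eig[0]), max(hi, eig[-1])
    return lo, hi
```

Highly correlated columns drive the lower bound toward zero; in the extreme case of a duplicated column it is exactly zero, and no positive lower bound can hold.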
High-dimensional graphical model selection using ℓ1-regularized logistic regression
Advances in Neural Information Processing Systems 19, 2007
Abstract

Cited by 78 (6 self)
We consider the problem of estimating the graph structure associated with a discrete Markov random field. We describe a method based on ℓ1-regularized logistic regression, in which the neighborhood of any given node is estimated by performing logistic regression subject to an ℓ1-constraint. Our framework applies to the high-dimensional setting, in which both the number of nodes p and maximum neighborhood sizes d are allowed to grow as a function of the number of observations n. Our main results provide sufficient conditions on the triple (n, p, d) for the method to succeed in consistently estimating the neighborhood of every node in the graph simultaneously. Under certain assumptions on the population Fisher information matrix, we prove that consistent neighborhood selection can be obtained for sample sizes n = Ω(d³ log p), with the error decaying as O(exp(−Cn/d³)) for some constant C. If these same assumptions are imposed directly on the sample matrices, we show that n = Ω(d² log p) samples are sufficient.
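A minimal Python sketch of the per-node step follows. It assumes ±1 spins, a zero external field (so no intercept is fitted), and a plain proximal-gradient (ISTA) solver in place of whatever solver the authors used; `node_neighborhood` is a hypothetical name. The estimated neighborhood of node s is the set of other nodes whose coefficients are nonzero.

```python
import numpy as np


def soft_threshold(z: np.ndarray, t: float) -> np.ndarray:
    """Elementwise soft-thresholding, the proximal map of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def node_neighborhood(X: np.ndarray, s: int, lam: float,
                      n_iter: int = 500, tol: float = 1e-8) -> np.ndarray:
    """Estimate the neighbourhood of node s from +/-1 samples X (n x p).

    Minimises (1/n) * sum_i log(1 + exp(-y_i * z_i^T theta)) + lam * ||theta||_1
    by proximal gradient descent, with y = X[:, s] and z = the other columns.
    """
    n, p = X.shape
    y = X[:, s]
    others = np.array([j for j in range(p) if j != s])
    Z = X[:, others]
    # Lipschitz constant of the logistic-loss gradient: ||Z||_2^2 / (4n)
    L = np.linalg.norm(Z, 2) ** 2 / (4.0 * n)
    step = 1.0 / L if L > 0 else 1.0
    theta = np.zeros(p - 1)
    for _ in range(n_iter):
        margins = y * (Z @ theta)
        # gradient of the mean loss: -mean(y_i * z_i * sigmoid(-margin_i))
        grad = -(Z * (y / (1.0 + np.exp(margins)))[:, None]).mean(axis=0)
        new_theta = soft_threshold(theta - step * grad, step * lam)
        converged = np.max(np.abs(new_theta - theta)) < tol
        theta = new_theta
        if converged:
            break
    return others[np.abs(theta) > 1e-6]
```

Running it for every node gives one neighborhood per node; the per-node estimates need not be symmetric, so some rule for combining them into an edge set is still needed.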
A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers
Stability selection
Cited by 60 (2 self)
An interior-point method for large-scale ℓ1-regularized logistic regression
Journal of Machine Learning Research, 2007
Abstract

Cited by 57 (4 self)
Recently, a lot of attention has been paid to ℓ1-regularization-based methods for sparse signal reconstruction (e.g., basis pursuit denoising and compressed sensing) and feature selection (e.g., the Lasso algorithm) in signal processing, statistics, and related fields. These problems can be cast as ℓ1-regularized least-squares programs (LSPs), which can be reformulated as convex quadratic programs, and then solved by several standard methods such as interior-point methods, at least for small and medium size problems. In this paper, we describe a specialized interior-point method for solving large-scale ℓ1-regularized LSPs that uses the preconditioned conjugate gradients algorithm to compute the search direction. The interior-point method can solve large sparse problems, with a million variables and observations, in a few tens of minutes on a PC. It can efficiently solve large dense problems that arise in sparse signal recovery with orthogonal transforms, by exploiting fast algorithms for these transforms. The method is illustrated on a magnetic resonance imaging data set.
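The reformulation mentioned in the abstract can be written out as follows. This is the standard bound-variable construction, and the paper's exact formulation may differ; the auxiliary vector u and the barrier Φ are notation chosen here. At any optimum u_i = |x_i|, so both problems have the same minimizers in x, and an interior-point method handles the 2n linear inequality constraints through a logarithmic barrier.

```latex
\min_{x \in \mathbb{R}^n} \; \|Ax - y\|_2^2 + \lambda \|x\|_1
\quad\Longleftrightarrow\quad
\min_{x,\,u \in \mathbb{R}^n} \; \|Ax - y\|_2^2 + \lambda \sum_{i=1}^{n} u_i
\quad \text{s.t.} \quad -u_i \le x_i \le u_i, \; i = 1, \dots, n,

\Phi(x, u) = -\sum_{i=1}^{n} \log(u_i + x_i) - \sum_{i=1}^{n} \log(u_i - x_i).
```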
Information-theoretic limits on sparsity recovery in the high-dimensional and noisy setting
2007
Abstract

Cited by 51 (2 self)
The problem of sparsity pattern or support set recovery refers to estimating the set of nonzero coefficients of an unknown vector β* ∈ ℝᵖ based on a set of n noisy observations. It arises in a variety of settings, including subset selection in regression, graphical model selection, signal denoising, compressive sensing, and constructive approximation. The sample complexity of a given method for subset recovery refers to the scaling of the required sample size n as a function of the signal dimension p, sparsity index k (number of nonzeros in β*), as well as the minimum value β_min of β* over its support and other parameters of the measurement matrix. This paper studies the information-theoretic limits of sparsity recovery: in particular, for a noisy linear observation model based on random measurement matrices drawn from general Gaussian measurement matrices, we derive both a set of sufficient conditions for exact support recovery using an exhaustive search decoder, as well as a set of necessary conditions that any decoder, regardless of its computational complexity, must satisfy for exact support recovery. This analysis of fundamental limits complements our previous work on sharp thresholds for support set recovery over the same set of random measurement ensembles using the polynomial-time Lasso method (ℓ1-constrained quadratic programming). Index Terms: Compressed sensing, ℓ1-relaxation, Fano’s method, high-dimensional statistical inference, information-theoretic
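The exhaustive search decoder can be sketched in Python, assuming Gaussian noise and known sparsity k. In that case maximum-likelihood support recovery reduces to choosing the size-k column subset whose least-squares fit leaves the smallest residual. The function name is hypothetical; the loop visits all C(p, k) subsets, which is why such a decoder serves as an information-theoretic benchmark rather than a practical method.

```python
import itertools
import numpy as np


def exhaustive_support_decoder(X: np.ndarray, y: np.ndarray, k: int) -> tuple:
    """Return the size-k support S minimising the residual sum of squares of y on X_S."""
    n, p = X.shape
    best_support, best_rss = None, np.inf
    for S in itertools.combinations(range(p), k):
        X_S = X[:, list(S)]
        coef, *_ = np.linalg.lstsq(X_S, y, rcond=None)   # least-squares fit on S
        rss = float(np.sum((y - X_S @ coef) ** 2))
        if rss < best_rss:
            best_support, best_rss = S, rss
    return best_support
```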
Necessary and sufficient conditions on sparsity pattern recovery
2009
Abstract

Cited by 46 (8 self)
The paper considers the problem of detecting the sparsity pattern of a k-sparse vector in ℝⁿ from m random noisy measurements. A new necessary condition on the number of measurements for asymptotically reliable detection with maximum likelihood (ML) estimation and Gaussian measurement matrices is derived. This necessary condition for ML detection is compared against a sufficient condition for simple maximum correlation (MC) or thresholding algorithms. The analysis shows that the gap between thresholding and ML can be described by a simple expression in terms of the total signal-to-noise ratio (SNR), with the gap growing with increasing SNR. Thresholding is also compared against the more sophisticated lasso and orthogonal matching pursuit (OMP) methods. At high SNRs, it is shown that the gap between lasso and OMP over thresholding is described by the range of powers of the nonzero component values of the unknown signals. Specifically, the key benefit of lasso and OMP over thresholding is the ability of lasso and OMP to detect signals with relatively small components.
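For comparison with the ML decoder, the maximum correlation (thresholding) estimator can be sketched in a few lines of Python: score each column by |a_jᵀ y| and keep the k largest. Normalizing by column norm is an assumption of this sketch (for Gaussian measurement matrices the columns have roughly equal norms anyway), and the function name is hypothetical.

```python
import numpy as np


def max_correlation_support(A: np.ndarray, y: np.ndarray, k: int) -> np.ndarray:
    """Keep the k columns of A with the largest normalised |a_j^T y| (thresholding / MC)."""
    norms = np.linalg.norm(A, axis=0)
    # normalise so that column scale does not dominate the correlation score
    scores = np.abs(A.T @ y) / np.where(norms > 0, norms, 1.0)
    return np.sort(np.argsort(scores)[-k:])   # indices of the k largest scores
```

Because each score is a single inner product, a small component can be swamped by interference from the other columns and the noise. This is consistent with the abstract's observation that lasso and OMP gain mainly on signals with relatively small components.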