Results 1  10
of
1,296
Regularization and variable selection via the Elastic Net
 Journal of the Royal Statistical Society, Series B
, 2005
"... Summary. We propose the elastic net, a new regularization and variable selection method. Real world data and a simulation study show that the elastic net often outperforms the lasso, while enjoying a similar sparsity of representation. In addition, the elastic net encourages a grouping effect, where ..."
Abstract

Cited by 922 (13 self)
 Add to MetaCart
(Show Context)
Summary. We propose the elastic net, a new regularization and variable selection method. Real world data and a simulation study show that the elastic net often outperforms the lasso, while enjoying a similar sparsity of representation. In addition, the elastic net encourages a grouping effect, where strongly correlated predictors tend to be in or out of the model together.The elastic net is particularly useful when the number of predictors (p) is much bigger than the number of observations (n). By contrast, the lasso is not a very satisfactory variable selection method in the p n case. An algorithm called LARSEN is proposed for computing elastic net regularization paths efficiently, much like algorithm LARS does for the lasso.
Regularization paths for generalized linear models via coordinate descent
, 2009
"... We develop fast algorithms for estimation of generalized linear models with convex penalties. The models include linear regression, twoclass logistic regression, and multinomial regression problems while the penalties include ℓ1 (the lasso), ℓ2 (ridge regression) and mixtures of the two (the elastic ..."
Abstract

Cited by 698 (14 self)
 Add to MetaCart
We develop fast algorithms for estimation of generalized linear models with convex penalties. The models include linear regression, twoclass logistic regression, and multinomial regression problems while the penalties include ℓ1 (the lasso), ℓ2 (ridge regression) and mixtures of the two (the elastic net). The algorithms use cyclical coordinate descent, computed along a regularization path. The methods can handle large problems and can also deal efficiently with sparse features. In comparative timings we find that the new algorithms are considerably faster than competing methods.
Guaranteed minimumrank solutions of linear matrix equations via nuclear norm minimization
, 2007
"... The affine rank minimization problem consists of finding a matrix of minimum rank that satisfies a given system of linear equality constraints. Such problems have appeared in the literature of a diverse set of fields including system identification and control, Euclidean embedding, and collaborative ..."
Abstract

Cited by 568 (23 self)
 Add to MetaCart
(Show Context)
The affine rank minimization problem consists of finding a matrix of minimum rank that satisfies a given system of linear equality constraints. Such problems have appeared in the literature of a diverse set of fields including system identification and control, Euclidean embedding, and collaborative filtering. Although specific instances can often be solved with specialized algorithms, the general affine rank minimization problem is NPhard, because it contains vector cardinality minimization as a special case. In this paper, we show that if a certain restricted isometry property holds for the linear transformation defining the constraints, the minimum rank solution can be recovered by solving a convex optimization problem, namely the minimization of the nuclear norm over the given affine space. We present several random ensembles of equations where the restricted isometry property holds with overwhelming probability, provided the codimension of the subspace is sufficiently large. The techniques used in our analysis have strong parallels in the compressed sensing framework. We discuss how affine rank minimization generalizes this preexisting concept and outline a dictionary relating concepts from cardinality minimization to those of rank minimization. We also discuss several algorithmic approaches to solving the norm minimization relaxations, and illustrate our results with numerical examples.
Gradient projection for sparse reconstruction: Application to compressed sensing and other inverse problems
 IEEE Journal of Selected Topics in Signal Processing
, 2007
"... Abstract—Many problems in signal processing and statistical inference involve finding sparse solutions to underdetermined, or illconditioned, linear systems of equations. A standard approach consists in minimizing an objective function which includes a quadratic (squared ℓ2) error term combined wi ..."
Abstract

Cited by 524 (15 self)
 Add to MetaCart
(Show Context)
Abstract—Many problems in signal processing and statistical inference involve finding sparse solutions to underdetermined, or illconditioned, linear systems of equations. A standard approach consists in minimizing an objective function which includes a quadratic (squared ℓ2) error term combined with a sparsenessinducing (ℓ1) regularization term.Basis pursuit, the least absolute shrinkage and selection operator (LASSO), waveletbased deconvolution, and compressed sensing are a few wellknown examples of this approach. This paper proposes gradient projection (GP) algorithms for the boundconstrained quadratic programming (BCQP) formulation of these problems. We test variants of this approach that select the line search parameters in different ways, including techniques based on the BarzilaiBorwein method. Computational experiments show that these GP approaches perform well in a wide range of applications, often being significantly faster (in terms of computation time) than competing methods. Although the performance of GP methods tends to degrade as the regularization term is deemphasized, we show how they can be embedded in a continuation scheme to recover their efficient practical performance. A. Background I.
SIMULTANEOUS ANALYSIS OF LASSO AND DANTZIG SELECTOR
 SUBMITTED TO THE ANNALS OF STATISTICS
, 2007
"... We exhibit an approximate equivalence between the Lasso estimator and Dantzig selector. For both methods we derive parallel oracle inequalities for the prediction risk in the general nonparametric regression model, as well as bounds on the ℓp estimation loss for 1 ≤ p ≤ 2 in the linear model when th ..."
Abstract

Cited by 465 (8 self)
 Add to MetaCart
(Show Context)
We exhibit an approximate equivalence between the Lasso estimator and Dantzig selector. For both methods we derive parallel oracle inequalities for the prediction risk in the general nonparametric regression model, as well as bounds on the ℓp estimation loss for 1 ≤ p ≤ 2 in the linear model when the number of variables can be much larger than the sample size.
On Model Selection Consistency of Lasso
, 2006
"... Sparsity or parsimony of statistical models is crucial for their proper interpretations, as in sciences and social sciences. Model selection is a commonly used method to find such models, but usually involves a computationally heavy combinatorial search. Lasso (Tibshirani, 1996) is now being used ..."
Abstract

Cited by 462 (23 self)
 Add to MetaCart
Sparsity or parsimony of statistical models is crucial for their proper interpretations, as in sciences and social sciences. Model selection is a commonly used method to find such models, but usually involves a computationally heavy combinatorial search. Lasso (Tibshirani, 1996) is now being used as a computationally feasible alternative to model selection.
Efficient sparse coding algorithms
 In NIPS
, 2007
"... Sparse coding provides a class of algorithms for finding succinct representations of stimuli; given only unlabeled input data, it discovers basis functions that capture higherlevel features in the data. However, finding sparse codes remains a very difficult computational problem. In this paper, we ..."
Abstract

Cited by 440 (14 self)
 Add to MetaCart
(Show Context)
Sparse coding provides a class of algorithms for finding succinct representations of stimuli; given only unlabeled input data, it discovers basis functions that capture higherlevel features in the data. However, finding sparse codes remains a very difficult computational problem. In this paper, we present efficient sparse coding algorithms that are based on iteratively solving two convex optimization problems: an L1regularized least squares problem and an L2constrained least squares problem. We propose novel algorithms to solve both of these optimization problems. Our algorithms result in a significant speedup for sparse coding, allowing us to learn larger sparse codes than possible with previously described algorithms. We apply these algorithms to natural images and demonstrate that the inferred sparse codes exhibit endstopping and nonclassical receptive field surround suppression and, therefore, may provide a partial explanation for these two phenomena in V1 neurons. 1
Sparse Reconstruction by Separable Approximation
, 2008
"... Finding sparse approximate solutions to large underdetermined linear systems of equations is a common problem in signal/image processing and statistics. Basis pursuit, the least absolute shrinkage and selection operator (LASSO), waveletbased deconvolution and reconstruction, and compressed sensing ( ..."
Abstract

Cited by 373 (36 self)
 Add to MetaCart
(Show Context)
Finding sparse approximate solutions to large underdetermined linear systems of equations is a common problem in signal/image processing and statistics. Basis pursuit, the least absolute shrinkage and selection operator (LASSO), waveletbased deconvolution and reconstruction, and compressed sensing (CS) are a few wellknown areas in which problems of this type appear. One standard approach is to minimize an objective function that includes a quadratic (ℓ2) error term added to a sparsityinducing (usually ℓ1) regularization term. We present an algorithmic framework for the more general problem of minimizing the sum of a smooth convex function and a nonsmooth, possibly nonconvex regularizer. We propose iterative methods in which each step is obtained by solving an optimization subproblem involving a quadratic term with diagonal Hessian (which is therefore separable in the unknowns) plus the original sparsityinducing regularizer. Our approach is suitable for cases in which this subproblem can be solved much more rapidly than the original problem. In addition to solving the standard ℓ2 − ℓ1 case, our framework yields an efficient solution technique for other regularizers, such as an ℓ∞norm regularizer and groupseparable (GS) regularizers. It also generalizes immediately to the case in which the data is complex rather than real. Experiments with CS problems show that our approach is competitive with the fastest known methods for the standard ℓ2 − ℓ1 problem, as well as being efficient on problems with other separable regularization terms.
Probing the Pareto frontier for basis pursuit solutions
, 2008
"... The basis pursuit problem seeks a minimum onenorm solution of an underdetermined leastsquares problem. Basis pursuit denoise (BPDN) fits the leastsquares problem only approximately, and a single parameter determines a curve that traces the optimal tradeoff between the leastsquares fit and the ..."
Abstract

Cited by 363 (4 self)
 Add to MetaCart
The basis pursuit problem seeks a minimum onenorm solution of an underdetermined leastsquares problem. Basis pursuit denoise (BPDN) fits the leastsquares problem only approximately, and a single parameter determines a curve that traces the optimal tradeoff between the leastsquares fit and the onenorm of the solution. We prove that this curve is convex and continuously differentiable over all points of interest, and show that it gives an explicit relationship to two other optimization problems closely related to BPDN. We describe a rootfinding algorithm for finding arbitrary points on this curve; the algorithm is suitable for problems that are large scale and for those that are in the complex domain. At each iteration, a spectral gradientprojection method approximately minimizes a leastsquares problem with an explicit onenorm constraint. Only matrixvector operations are required. The primaldual solution of this problem gives function and derivative information needed for the rootfinding method. Numerical experiments on a comprehensive set of test problems demonstrate that the method scales well to large problems.