Results 1  10
of
111
Sparse Reconstruction by Separable Approximation
, 2008
"... Finding sparse approximate solutions to large underdetermined linear systems of equations is a common problem in signal/image processing and statistics. Basis pursuit, the least absolute shrinkage and selection operator (LASSO), waveletbased deconvolution and reconstruction, and compressed sensing ( ..."
Abstract

Cited by 358 (36 self)
 Add to MetaCart
(Show Context)
Finding sparse approximate solutions to large underdetermined linear systems of equations is a common problem in signal/image processing and statistics. Basis pursuit, the least absolute shrinkage and selection operator (LASSO), waveletbased deconvolution and reconstruction, and compressed sensing (CS) are a few wellknown areas in which problems of this type appear. One standard approach is to minimize an objective function that includes a quadratic (ℓ2) error term added to a sparsityinducing (usually ℓ1) regularization term. We present an algorithmic framework for the more general problem of minimizing the sum of a smooth convex function and a nonsmooth, possibly nonconvex regularizer. We propose iterative methods in which each step is obtained by solving an optimization subproblem involving a quadratic term with diagonal Hessian (which is therefore separable in the unknowns) plus the original sparsityinducing regularizer. Our approach is suitable for cases in which this subproblem can be solved much more rapidly than the original problem. In addition to solving the standard ℓ2 − ℓ1 case, our framework yields an efficient solution technique for other regularizers, such as an ℓ∞norm regularizer and groupseparable (GS) regularizers. It also generalizes immediately to the case in which the data is complex rather than real. Experiments with CS problems show that our approach is competitive with the fastest known methods for the standard ℓ2 − ℓ1 problem, as well as being efficient on problems with other separable regularization terms.
An interiorpoint method for largescale l1regularized logistic regression
 Journal of Machine Learning Research
, 2007
"... Logistic regression with ℓ1 regularization has been proposed as a promising method for feature selection in classification problems. In this paper we describe an efficient interiorpoint method for solving largescale ℓ1regularized logistic regression problems. Small problems with up to a thousand ..."
Abstract

Cited by 268 (8 self)
 Add to MetaCart
(Show Context)
Logistic regression with ℓ1 regularization has been proposed as a promising method for feature selection in classification problems. In this paper we describe an efficient interiorpoint method for solving largescale ℓ1regularized logistic regression problems. Small problems with up to a thousand or so features and examples can be solved in seconds on a PC; medium sized problems, with tens of thousands of features and examples, can be solved in tens of seconds (assuming some sparsity in the data). A variation on the basic method, that uses a preconditioned conjugate gradient method to compute the search step, can solve very large problems, with a million features and examples (e.g., the 20 Newsgroups data set), in a few minutes, on a PC. Using warmstart techniques, a good approximation of the entire regularization path can be computed much more efficiently than by solving a family of problems independently.
Group Lasso with Overlap and Graph Lasso
"... We propose a new penalty function which, when used as regularization for empirical risk minimization procedures, leads to sparse estimators. The support of the sparse vector is typically a union of potentially overlapping groups of covariates defined a priori, or a set of covariates which tend to be ..."
Abstract

Cited by 222 (19 self)
 Add to MetaCart
(Show Context)
We propose a new penalty function which, when used as regularization for empirical risk minimization procedures, leads to sparse estimators. The support of the sparse vector is typically a union of potentially overlapping groups of covariates defined a priori, or a set of covariates which tend to be connected to each other when a graph of covariates is given. We study theoretical properties of the estimator, and illustrate its behavior on simulated and breast cancer gene expression data. 1.
A unified framework for highdimensional analysis of Mestimators with decomposable regularizers
"... ..."
Structured variable selection with sparsityinducing norms
, 2011
"... We consider the empirical risk minimization problem for linear supervised learning, with regularization by structured sparsityinducing norms. These are defined as sums of Euclidean norms on certain subsets of variables, extending the usual ℓ1norm and the group ℓ1norm by allowing the subsets to ov ..."
Abstract

Cited by 189 (29 self)
 Add to MetaCart
We consider the empirical risk minimization problem for linear supervised learning, with regularization by structured sparsityinducing norms. These are defined as sums of Euclidean norms on certain subsets of variables, extending the usual ℓ1norm and the group ℓ1norm by allowing the subsets to overlap. This leads to a specific set of allowed nonzero patterns for the solutions of such problems. We first explore the relationship between the groups defining the norm and the resulting nonzero patterns, providing both forward and backward algorithms to go back and forth from groups to patterns. This allows the design of norms adapted to specific prior knowledge expressed in terms of nonzero patterns. We also present an efficient active set algorithm, and analyze the consistency of variable selection for leastsquares linear regression in low and highdimensional settings.
The composite absolute penalties family for grouped and hierarchical variable selection
 Ann. Statist
"... Extracting useful information from highdimensional data is an important focus of today’s statistical research and practice. Penalized loss function minimization has been shown to be effective for this task both theoretically and empirically. With the virtues of both regularization and sparsity, the ..."
Abstract

Cited by 137 (3 self)
 Add to MetaCart
(Show Context)
Extracting useful information from highdimensional data is an important focus of today’s statistical research and practice. Penalized loss function minimization has been shown to be effective for this task both theoretically and empirically. With the virtues of both regularization and sparsity, the L1penalized squared error minimization method Lasso has been popular in regression models and beyond. In this paper, we combine different norms including L1 to form an intelligent penalty in order to add side information to the fitting of a regression or classification model to obtain reasonable estimates. Specifically, we introduce the Composite Absolute Penalties (CAP) family, which allows given grouping and hierarchical relationships between the predictors to be expressed. CAP penalties are built by defining groups and combining the properties of norm penalties at the acrossgroup and withingroup levels. Grouped selection occurs for nonoverlapping groups. Hierarchical variable selection is reached
Efficient Online and Batch Learning using Forward Backward Splitting
"... We describe, analyze, and experiment with a framework for empirical loss minimization with regularization. Our algorithmic framework alternates between two phases. On each iteration we first perform an unconstrained gradient descent step. We then cast and solve an instantaneous optimization problem ..."
Abstract

Cited by 134 (1 self)
 Add to MetaCart
(Show Context)
We describe, analyze, and experiment with a framework for empirical loss minimization with regularization. Our algorithmic framework alternates between two phases. On each iteration we first perform an unconstrained gradient descent step. We then cast and solve an instantaneous optimization problem that trades off minimization of a regularization term while keeping close proximity to the result of the first phase. This view yields a simple yet effective algorithm that can be used for batch penalized risk minimization and online learning. Furthermore, the two phase approach enables sparse solutions when used in conjunction with regularization functions that promote sparsity, such as ℓ1. We derive concrete and very simple algorithms for minimization of loss functions with ℓ1, ℓ2, ℓ 2 2, and ℓ ∞ regularization. We also show how to construct efficient algorithms for mixednorm ℓ1/ℓq regularization. We further extend the algorithms and give efficient implementations for very highdimensional data with sparsity. We demonstrate the potential of the proposed framework in a series of experiments with synthetic and natural datasets.
TreeGuided Group Lasso for MultiTask Regression with Structured Sparsity
"... We consider the problem of learning a sparse multitask regression, where the structure in the outputs can be represented as a tree with leaf nodes as outputs and internal nodes as clusters of the outputs at multiple granularity. Our goal is to recover the common set of relevant inputs for each outp ..."
Abstract

Cited by 112 (13 self)
 Add to MetaCart
(Show Context)
We consider the problem of learning a sparse multitask regression, where the structure in the outputs can be represented as a tree with leaf nodes as outputs and internal nodes as clusters of the outputs at multiple granularity. Our goal is to recover the common set of relevant inputs for each output cluster. Assuming that the tree structure is available as prior knowledge, we formulate this problem as a new multitask regularized regression called treeguided group lasso. Our structured regularization is based on a grouplasso penalty, where groups are defined with respect to the tree structure. We describe a systematic weighting scheme for the groups in the penalty such that each output variable is penalized in a balanced manner even if the groups overlap. We present an efficient optimization method that can handle a largescale problem. Using simulated and yeast datasets, we demonstrate that our method shows a superior performance in terms of both prediction errors and recovery of true sparsity patterns compared to other methods for multitask learning. 1.
Exploring large feature spaces with hierarchical MKL
, 2008
"... For supervised and unsupervised learning, positive definite kernels allow to use large and potentially infinite dimensional feature spaces with a computational cost that only depends on the number of observations. This is usually done through the penalization of predictor functions by Euclidean or H ..."
Abstract

Cited by 110 (24 self)
 Add to MetaCart
(Show Context)
For supervised and unsupervised learning, positive definite kernels allow to use large and potentially infinite dimensional feature spaces with a computational cost that only depends on the number of observations. This is usually done through the penalization of predictor functions by Euclidean or Hilbertian norms. In this paper, we explore penalizing by sparsityinducing norms such as the ℓ 1norm or the block ℓ 1norm. We assume that the kernel decomposes into a large sum of individual basis kernels which can be embedded in a directed acyclic graph; we show that it is then possible to perform kernel selection through a hierarchical multiple kernel learning framework, in polynomial time in the number of selected kernels. This framework is naturally applied to non linear variable selection; our extensive simulations on synthetic datasets and datasets from the UCI repository show that efficiently exploring the large feature space through sparsityinducing norms leads to stateoftheart predictive performance. 1
VARIABLE SELECTION IN NONPARAMETRIC ADDITIVE MODELS
, 2008
"... Summary. We consider a nonparametric additive model of a conditional mean function in which the number of variables and additive components may be larger than the sample size but the number of nonzero additive components is “small” relative to the sample size. The statistical problem is to determin ..."
Abstract

Cited by 64 (1 self)
 Add to MetaCart
Summary. We consider a nonparametric additive model of a conditional mean function in which the number of variables and additive components may be larger than the sample size but the number of nonzero additive components is “small” relative to the sample size. The statistical problem is to determine which additive components are nonzero. The additive components are approximated by truncated series expansions with Bspline bases. With this approximation, the problem of component selection becomes that of selecting the groups of coefficients in the expansion. We apply the adaptive group Lasso to select nonzero components, using the group Lasso to obtain an initial estimator and reduce the dimension of the problem. We give conditions under which the group Lasso selects a model whose number of components is comparable with the underlying model and, the adaptive group Lasso selects the nonzero components correctly with probability approaching one as the sample size increases and achieves the optimal rate of convergence. Following model selection, oracleefficient, asymptotically normal estimators of the nonzero components can be obtained by using existing methods. The results of Monte Carlo experiments show that the adaptive group Lasso procedure works well with samples of moderate size. A data example is used to illustrate the application of the proposed method. Key words and phrases. Adaptive group Lasso; component selection; highdimensional data; nonparametric regression; selection consistency. Short title. Nonparametric component selection AMS 2000 subject classification. Primary 62G08, 62G20; secondary 62G99 1