Results 1 - 10
of
27
An interior-point method for large-scale l1-regularized logistic regression
- Journal of Machine Learning Research
, 2007
"... Logistic regression with ℓ1 regularization has been proposed as a promising method for feature selection in classification problems. In this paper we describe an efficient interior-point method for solving large-scale ℓ1-regularized logistic regression problems. Small problems with up to a thousand ..."
Abstract
-
Cited by 77 (3 self)
- Add to MetaCart
Logistic regression with ℓ1 regularization has been proposed as a promising method for feature selection in classification problems. In this paper we describe an efficient interior-point method for solving large-scale ℓ1-regularized logistic regression problems. Small problems with up to a thousand or so features and examples can be solved in seconds on a PC; medium sized problems, with tens of thousands of features and examples, can be solved in tens of seconds (assuming some sparsity in the data). A variation on the basic method, that uses a preconditioned conjugate gradient method to compute the search step, can solve very large problems, with a million features and examples (e.g., the 20 Newsgroups data set), in a few minutes, on a PC. Using warm-start techniques, a good approximation of the entire regularization path can be computed much more efficiently than by solving a family of problems independently.
The group Lasso for logistic regression
- Journal of the Royal Statistical Society, Series B
, 2008
"... Summary. The group lasso is an extension of the lasso to do variable selection on (predefined) groups of variables in linear regression models. The estimates have the attractive property of being invariant under groupwise orthogonal reparameterizations. We extend the group lasso to logistic regressi ..."
Abstract
-
Cited by 75 (4 self)
- Add to MetaCart
Summary. The group lasso is an extension of the lasso to do variable selection on (predefined) groups of variables in linear regression models. The estimates have the attractive property of being invariant under groupwise orthogonal reparameterizations. We extend the group lasso to logistic regression models and present an efficient algorithm, that is especially suitable for high dimensional problems, which can also be applied to generalized linear models to solve the corresponding convex optimization problem. The group lasso estimator for logistic regression is shown to be statistically consistent even if the number of predictors is much larger than sample size but with sparse true underlying structure. We further use a two-stage procedure which aims for sparser models than the group lasso, leading to improved prediction performance for some cases. Moreover, owing to the two-stage nature, the estimates can be constructed to be hierarchical. The methods are used on simulated and real data sets about splice site detection in DNA sequences.
Sparse Reconstruction by Separable Approximation
, 2008
"... Finding sparse approximate solutions to large underdetermined linear systems of equations is a common problem in signal/image processing and statistics. Basis pursuit, the least absolute shrinkage and selection operator (LASSO), waveletbased deconvolution and reconstruction, and compressed sensing ( ..."
Abstract
-
Cited by 73 (7 self)
- Add to MetaCart
Finding sparse approximate solutions to large underdetermined linear systems of equations is a common problem in signal/image processing and statistics. Basis pursuit, the least absolute shrinkage and selection operator (LASSO), waveletbased deconvolution and reconstruction, and compressed sensing (CS) are a few well-known areas in which problems of this type appear. One standard approach is to minimize an objective function that includes a quadratic (ℓ2) error term added to a sparsity-inducing (usually ℓ1) regularization term. We present an algorithmic framework for the more general problem of minimizing the sum of a smooth convex function and a nonsmooth, possibly nonconvex regularizer. We propose iterative methods in which each step is obtained by solving an optimization subproblem involving a quadratic term with diagonal Hessian (which is therefore separable in the unknowns) plus the original sparsity-inducing regularizer. Our approach is suitable for cases in which this subproblem can be solved much more rapidly than the original problem. In addition to solving the standard ℓ2 − ℓ1 case, our framework yields an efficient solution technique for other regularizers, such as an ℓ∞-norm regularizer and groupseparable (GS) regularizers. It also generalizes immediately to the case in which the data is complex rather than real. Experiments with CS problems show that our approach is competitive with the fastest known methods for the standard ℓ2 − ℓ1 problem, as well as being efficient on problems with other separable regularization terms.
Grouped and hierarchical model selection through composite absolute penalties
- Annals of Statistics
, 2006
"... Extracting useful information from high-dimensional data is an important part of the focus of today’s statistical research and practice. Penalized loss function minimiza-tion has been shown to be effective for this task both theoretically and empirically. With the virtues of both regularization and ..."
Abstract
-
Cited by 61 (2 self)
- Add to MetaCart
Extracting useful information from high-dimensional data is an important part of the focus of today’s statistical research and practice. Penalized loss function minimiza-tion has been shown to be effective for this task both theoretically and empirically. With the virtues of both regularization and sparsity, the L1-penalized L2 minimization method Lasso has been popular in regression models. In this paper, we combine different norms including L1 to form an intelligent penalty in order to add side information to the fitting of a regression or classification model to obtain reasonable estimates. Specifically, we introduce the Composite Absolute Penal-ties (CAP) family which allows the grouping and hierarchical relationships between the predictors to be expressed. CAP penalties are built by defining groups and com-bining the properties of norm penalties at the across group and within group levels. Grouped selection occurs for non-overlapping groups. In that case, we give a Bayesian 1 interpretation for CAP penalties. Hierarchical variable selection is reached by defining groups with particular overlapping patterns. In the computation aspect, we propose using the BLASSO and cross-validation to obtain CAP estimates. For a subfamily of CAP estimates involving only the L1 and L ∞ norms, we introduce the iCAP algorithm to trace the entire regularization path for the grouped selection problem. Within this subfamily, unbiased estimates of the degrees of freedom (df) are derived allowing the regularization parameter to be selected without cross-validation. CAP is shown to im-prove on the predictive performance of the LASSO in a series of simulated experiments including cases with p>> n and mis-specified groupings. When the complexity of a model is properly calculated, iCAP is seen to be parsimonious in the experiments. 1
The composite absolute penalties family for grouped and hierarchical variable selection
- Ann. Statist
"... Extracting useful information from high-dimensional data is an important focus of today’s statistical research and practice. Penalized loss function minimization has been shown to be effective for this task both theoretically and empirically. With the virtues of both regularization and sparsity, the ..."
Abstract
-
Cited by 37 (1 self)
- Add to MetaCart
Extracting useful information from high-dimensional data is an important focus of today’s statistical research and practice. Penalized loss function minimization has been shown to be effective for this task both theoretically and empirically. With the virtues of both regularization and sparsity, the L1-penalized squared error minimization method Lasso has been popular in regression models and beyond. In this paper, we combine different norms including L1 to form an intelligent penalty in order to add side information to the fitting of a regression or classification model to obtain reasonable estimates. Specifically, we introduce the Composite Absolute Penalties (CAP) family, which allows given grouping and hierarchical relationships between the predictors to be expressed. CAP penalties are built by defining groups and combining the properties of norm penalties at the across-group and within-group levels. Grouped selection occurs for nonoverlapping groups. Hierarchical variable selection is reached
Structure learning in random fields for heart motion abnormality detection
- In CVPR
, 2008
"... Coronary Heart Disease can be diagnosed by assessing the regional motion of the heart walls in ultrasound images of the left ventricle. Even for experts, ultrasound images are difficult to interpret leading to high intra-observer variability. Previous work indicates that in order to approach this pr ..."
Abstract
-
Cited by 22 (3 self)
- Add to MetaCart
Coronary Heart Disease can be diagnosed by assessing the regional motion of the heart walls in ultrasound images of the left ventricle. Even for experts, ultrasound images are difficult to interpret leading to high intra-observer variability. Previous work indicates that in order to approach this problem, the interactions between the different heart regions and their overall influence on the clinical condition of the heart need to be considered. To do this, we propose a method for jointly learning the structure and parameters of conditional random fields, formulating these tasks as a convex optimization problem. We consider block-L1 regularization for each set of features associated with an edge, and formalize an efficient projection method to find the globally optimal penalized maximum likelihood solution. We perform extensive numerical experiments comparing the presented method with related methods that approach the structure learning problem differently. We verify the robustness of our method on echocardiograms collected in routine clinical practice at one hospital. 1.
The group-lasso for generalized linear models: uniqueness of solutions and efficient
, 2008
"... The Group-Lasso method for finding important explanatory factors suffers from the potential non-uniqueness of solutions and also from high computational costs. We formulate conditions for the uniqueness of Group-Lasso solutions which lead to an easily implementable test procedure that allows us to i ..."
Abstract
-
Cited by 18 (0 self)
- Add to MetaCart
The Group-Lasso method for finding important explanatory factors suffers from the potential non-uniqueness of solutions and also from high computational costs. We formulate conditions for the uniqueness of Group-Lasso solutions which lead to an easily implementable test procedure that allows us to identify all potentially active groups. These results are used to derive an efficient algorithm that can deal with input dimensions in the millions and can approximate the solution path efficiently. The derived methods are applied to large-scale learning problems where they exhibit excellent performance and where the testing procedure helps to avoid misinterpretations of the solutions. 1.
On the Asymptotic Properties of The Group Lasso Estimator in Least Squares Problems
"... We derive conditions guaranteeing estimation and model selection consistency, oracle properties and persistence for the group-lasso estimator and model selector proposed by Yuan and Lin (2006) for least squares problems when the covariates have a natural grouping structure. We study both the case of ..."
Abstract
-
Cited by 14 (0 self)
- Add to MetaCart
We derive conditions guaranteeing estimation and model selection consistency, oracle properties and persistence for the group-lasso estimator and model selector proposed by Yuan and Lin (2006) for least squares problems when the covariates have a natural grouping structure. We study both the case of a fixed-dimensional parameter space with increasing sample size and the case when the model complexity changes with the sample size. 1
Joint covariate selection for grouped classification
"... We address the problem of recovering a common set of covariates that are relevant simultaneously to several classification problems. We propose a joint measure of complexity for the group of problems that couples covariate selection. By penalizing the sum of ℓ2-norms of the blocks of coefficients as ..."
Abstract
-
Cited by 14 (2 self)
- Add to MetaCart
We address the problem of recovering a common set of covariates that are relevant simultaneously to several classification problems. We propose a joint measure of complexity for the group of problems that couples covariate selection. By penalizing the sum of ℓ2-norms of the blocks of coefficients associated with each covariate across different classification problems, we encourage similar sparsity patterns in all models. To fit parameters under this regularization, we propose a blockwise boosting scheme that follows the regularization path. As the regularization coefficient decreases, the algorithm maintains and updates concurrently a growing set of covariates that are simultaneously active for all problems. We show empirically that this approach outperforms independent ℓ1-based covariate selection on several data sets, both in accuracy and number of selected covariates. 1
HIGH-DIMENSIONAL ISING MODEL SELECTION USING ℓ1-REGULARIZED LOGISTIC REGRESSION
- SUBMITTED TO THE ANNALS OF STATISTICS
"... We consider the problem of estimating the graph associated with a binary Ising Markov random field. We describe a method based on ℓ1-regularized logistic regression, in which the neighborhood of any given node is estimated by performing logistic regression subject to an ℓ1-constraint. The method is ..."
Abstract
-
Cited by 13 (6 self)
- Add to MetaCart
We consider the problem of estimating the graph associated with a binary Ising Markov random field. We describe a method based on ℓ1-regularized logistic regression, in which the neighborhood of any given node is estimated by performing logistic regression subject to an ℓ1-constraint. The method is analyzed under high-dimensional scaling, in which both the number of nodes p and maximum neighborhood size d are allowed to grow as a function of the number of observations n. Our main results provide sufficient conditions on the triple (n, p, d) and the model parameters for the method to succeed in consistently estimating the neighborhood of every node in the graph simultaneously. With coherence conditions imposed on the population Fisher information matrix, we prove that consistent neighborhood selection can be obtained for sample sizes n = Ω(d 3 log p), with exponentially decaying error. When these same conditions are imposed directly on the sample matrices, we show that a reduced sample size of n = Ω(d 2 log p) suffices for the method to estimate neighborhoods consistently. Although this paper focuses on the binary graphical models, we indicate how a generalization of the method of the paper would apply to general discrete Markov random fields.

