Results 1 - 10 of 43
Gradient projection for sparse reconstruction: Application to compressed sensing and other inverse problems
IEEE Journal of Selected Topics in Signal Processing, 2007
Cited by 180 (7 self)
Abstract—Many problems in signal processing and statistical inference involve finding sparse solutions to under-determined, or ill-conditioned, linear systems of equations. A standard approach consists in minimizing an objective function which includes a quadratic (squared ℓ2) error term combined with a sparseness-inducing (ℓ1) regularization term. Basis pursuit, the least absolute shrinkage and selection operator (LASSO), wavelet-based deconvolution, and compressed sensing are a few well-known examples of this approach. This paper proposes gradient projection (GP) algorithms for the bound-constrained quadratic programming (BCQP) formulation of these problems. We test variants of this approach that select the line search parameters in different ways, including techniques based on the Barzilai-Borwein method. Computational experiments show that these GP approaches perform well in a wide range of applications, often being significantly faster (in terms of computation time) than competing methods. Although the performance of GP methods tends to degrade as the regularization term is de-emphasized, we show how they can be embedded in a continuation scheme to recover their efficient practical performance.
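A minimal pure-Python sketch of the BCQP idea, assuming the objective min ½‖y − Ax‖² + τ‖x‖₁ and the standard split x = u − v with u, v ≥ 0, which turns the problem into a quadratic program over the nonnegative orthant. The step follows a monotone Barzilai-Borwein variant as we read the approach (projected step, exact line search restricted to [0, 1], then a safeguarded BB step length). The names `gpsr_bb`, `matvec`, `rmatvec` and all default parameters are hypothetical; this is not the authors' code.

```python
def matvec(A, x):
    """Compute A @ x for a dense list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def rmatvec(A, r):
    """Compute A^T @ r."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]


def gpsr_bb(A, y, tau, iters=1000, alpha_min=1e-8, alpha_max=1e8, tol=1e-24):
    """Sketch: min 0.5*||y - Ax||^2 + tau*||x||_1 via a BCQP in z = [u; v] >= 0."""
    n = len(A[0])
    u, v = [0.0] * n, [0.0] * n
    b = rmatvec(A, y)  # A^T y
    alpha = 1.0
    for _ in range(iters):
        x = [ui - vi for ui, vi in zip(u, v)]
        AtAx = rmatvec(A, matvec(A, x))
        # Gradient of F(z) = c^T z + 0.5 z^T B z with c = tau*1 + [-b; b].
        gu = [tau - bi + gi for bi, gi in zip(b, AtAx)]
        gv = [tau + bi - gi for bi, gi in zip(b, AtAx)]
        # Projected gradient step onto z >= 0, expressed as a direction d.
        du = [max(ui - alpha * g, 0.0) - ui for ui, g in zip(u, gu)]
        dv = [max(vi - alpha * g, 0.0) - vi for vi, g in zip(v, gv)]
        dd = sum(d * d for d in du) + sum(d * d for d in dv)
        if dd <= tol:
            break
        # Curvature along d: d^T B d = ||A (du - dv)||^2.
        Ad = matvec(A, [a - c for a, c in zip(du, dv)])
        gamma = sum(t * t for t in Ad)
        gd = sum(g * d for g, d in zip(gu, du)) + sum(g * d for g, d in zip(gv, dv))
        # Exact minimizer of the quadratic along d, clipped to [0, 1] (keeps z >= 0).
        lam = 1.0 if gamma == 0 else min(max(-gd / gamma, 0.0), 1.0)
        u = [ui + lam * d for ui, d in zip(u, du)]
        v = [vi + lam * d for vi, d in zip(v, dv)]
        # Safeguarded Barzilai-Borwein step length for the next iteration.
        alpha = alpha_max if gamma == 0 else min(max(dd / gamma, alpha_min), alpha_max)
    return [ui - vi for ui, vi in zip(u, v)]
```

With an identity matrix the problem reduces to soft thresholding, so `gpsr_bb(I, [3.0, -0.5, 1.5], 1.0)` should return approximately `[2.0, 0.0, 0.5]`. The continuation scheme mentioned above would wrap such a solver in a loop over a decreasing sequence of τ values, warm-starting each solve.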
Sparse Reconstruction by Separable Approximation
2008
Cited by 73 (7 self)
Finding sparse approximate solutions to large underdetermined linear systems of equations is a common problem in signal/image processing and statistics. Basis pursuit, the least absolute shrinkage and selection operator (LASSO), wavelet-based deconvolution and reconstruction, and compressed sensing (CS) are a few well-known areas in which problems of this type appear. One standard approach is to minimize an objective function that includes a quadratic (ℓ2) error term added to a sparsity-inducing (usually ℓ1) regularization term. We present an algorithmic framework for the more general problem of minimizing the sum of a smooth convex function and a nonsmooth, possibly nonconvex regularizer. We propose iterative methods in which each step is obtained by solving an optimization subproblem involving a quadratic term with diagonal Hessian (which is therefore separable in the unknowns) plus the original sparsity-inducing regularizer. Our approach is suitable for cases in which this subproblem can be solved much more rapidly than the original problem. In addition to solving the standard ℓ2 − ℓ1 case, our framework yields an efficient solution technique for other regularizers, such as an ℓ∞-norm regularizer and group-separable (GS) regularizers. It also generalizes immediately to the case in which the data is complex rather than real. Experiments with CS problems show that our approach is competitive with the fastest known methods for the standard ℓ2 − ℓ1 problem, as well as being efficient on problems with other separable regularization terms.
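The iteration described above can be sketched for the standard ℓ2 − ℓ1 case, where the separable subproblem (quadratic term with Hessian αI plus τ‖·‖₁) is solved in closed form by soft thresholding. This is a minimal monotone sketch under our own assumptions: α is chosen by a Barzilai-Borwein rule and multiplied by η until a sufficient-decrease test passes. Other separable regularizers would be handled by passing a different `prox`/`reg` pair. The names `sparsa_like`, `soft_threshold`, `l1_norm` and the parameter defaults are hypothetical, not the authors' implementation.

```python
import math


def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def rmatvec(A, r):
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]


def soft_threshold(u, t):
    """Closed-form solution of min_z 0.5*||z - u||^2 + t*||z||_1 (separable in z)."""
    return [math.copysign(max(abs(ui) - t, 0.0), ui) for ui in u]


def l1_norm(x):
    return sum(abs(xi) for xi in x)


def sparsa_like(A, y, tau, prox=soft_threshold, reg=l1_norm, iters=500,
                eta=2.0, sigma=0.01, alpha_min=1e-8, alpha_max=1e8, tol=1e-20):
    """Sketch of min 0.5*||y - Ax||^2 + tau*reg(x) by separable approximation."""
    x = [0.0] * len(A[0])

    def objective(z):
        r = [yi - ai for yi, ai in zip(y, matvec(A, z))]
        return 0.5 * sum(t * t for t in r) + tau * reg(z)

    phi, alpha = objective(x), 1.0
    for _ in range(iters):
        grad = rmatvec(A, [ai - yi for ai, yi in zip(matvec(A, x), y)])
        while True:
            # Subproblem: min_z grad^T (z - x) + (alpha/2)||z - x||^2 + tau*reg(z).
            # Its Hessian alpha*I is diagonal, so it reduces to one prox step.
            u = [xi - gi / alpha for xi, gi in zip(x, grad)]
            x_new = prox(u, tau / alpha)
            s = [a - b for a, b in zip(x_new, x)]
            ss = sum(t * t for t in s)
            phi_new = objective(x_new)
            # Accept on sufficient decrease; otherwise increase alpha (shorter step).
            if phi_new <= phi - 0.5 * sigma * alpha * ss or alpha >= alpha_max:
                break
            alpha = min(eta * alpha, alpha_max)
        x, phi = x_new, phi_new
        if ss <= tol:
            break
        # Barzilai-Borwein choice alpha = ||A s||^2 / ||s||^2, clipped.
        As = matvec(A, s)
        alpha = min(max(sum(t * t for t in As) / ss, alpha_min), alpha_max)
    return x
```

Keeping the step rule separate from `prox` mirrors the point made above: the outer loop stays the same, and only the cheap separable subproblem changes with the regularizer.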
Online learning for matrix factorization and sparse coding
Cited by 35 (10 self)
Sparse coding—that is, modelling data vectors as sparse linear combinations of basis elements—is widely used in machine learning, neuroscience, signal processing, and statistics. This paper focuses on the large-scale matrix factorization problem that consists of learning the basis set, adapting it to specific data. Variations of this problem include dictionary learning in signal processing, non-negative matrix factorization and sparse principal component analysis. In this paper, we propose to address these tasks with a new online optimization algorithm, based on stochastic approximations, which scales up gracefully to large datasets with millions of training samples, and extends naturally to various matrix factorization formulations, making it suitable for a wide range of learning problems. A proof of convergence is presented, along with experiments with natural images and genomic data demonstrating that it leads to state-of-the-art performance in terms of speed and optimization for both small and large datasets.
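A compact sketch of an online scheme for the dictionary-learning case, following the structure of the algorithm as we understand it. Each incoming sample is sparse-coded against the current dictionary. The running statistics A = Σ aaᵀ and B = Σ xaᵀ are accumulated, and each dictionary column is then updated by block-coordinate descent and projected onto the unit ℓ2 ball. We substitute cyclic coordinate descent for the sparse-coding step and use a random unit-norm initialization; both are our assumptions, and `sparse_code` and `online_dictionary_learning` are hypothetical names.

```python
import math
import random


def sparse_code(D, x, lam, sweeps=100):
    """Lasso coding min_a 0.5*||x - D a||^2 + lam*||a||_1 by cyclic coordinate descent."""
    m, k = len(D), len(D[0])
    a = [0.0] * k
    r = list(x)  # residual x - D a
    for _ in range(sweeps):
        for j in range(k):
            col = [D[i][j] for i in range(m)]
            nrm2 = sum(c * c for c in col)
            if nrm2 == 0.0:
                continue
            rho = sum(c * ri for c, ri in zip(col, r)) + nrm2 * a[j]
            new = math.copysign(max(abs(rho) - lam, 0.0), rho) / nrm2
            delta = new - a[j]
            if delta != 0.0:
                r = [ri - delta * c for ri, c in zip(r, col)]
                a[j] = new
    return a


def online_dictionary_learning(samples, k, lam, seed=0):
    """One pass over `samples` (lists of length m); returns an m x k dictionary."""
    rng = random.Random(seed)
    m = len(samples[0])
    D = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(m)]
    for j in range(k):  # start from unit-norm columns
        norm = math.sqrt(sum(D[i][j] ** 2 for i in range(m)))
        for i in range(m):
            D[i][j] /= norm
    A = [[0.0] * k for _ in range(k)]  # running sum of a a^T
    B = [[0.0] * k for _ in range(m)]  # running sum of x a^T
    for x in samples:
        a = sparse_code(D, x, lam)
        for i in range(k):
            for j in range(k):
                A[i][j] += a[i] * a[j]
        for i in range(m):
            for j in range(k):
                B[i][j] += x[i] * a[j]
        # Block-coordinate update of each column, warm-started at the current D.
        for j in range(k):
            if A[j][j] == 0.0:
                continue  # atom not used yet; leave it unchanged
            Daj = [sum(D[i][l] * A[l][j] for l in range(k)) for i in range(m)]
            u = [(B[i][j] - Daj[i]) / A[j][j] + D[i][j] for i in range(m)]
            scale = max(math.sqrt(sum(t * t for t in u)), 1.0)
            for i in range(m):
                D[i][j] = u[i] / scale  # project onto the unit l2 ball
    return D
```

The design choice that makes this scale to large datasets is that past samples are summarized only through A (k × k) and B (m × k), so memory does not grow with the number of samples processed.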
Structure learning in random fields for heart motion abnormality detection
In CVPR, 2008
Cited by 22 (3 self)
Coronary Heart Disease can be diagnosed by assessing the regional motion of the heart walls in ultrasound images of the left ventricle. Even for experts, ultrasound images are difficult to interpret, leading to high intra-observer variability. Previous work indicates that in order to approach this problem, the interactions between the different heart regions and their overall influence on the clinical condition of the heart need to be considered. To do this, we propose a method for jointly learning the structure and parameters of conditional random fields, formulating these tasks as a convex optimization problem. We consider block-L1 regularization for each set of features associated with an edge, and formalize an efficient projection method to find the globally optimal penalized maximum likelihood solution. We perform extensive numerical experiments comparing the presented method with related methods that approach the structure learning problem differently. We verify the robustness of our method on echocardiograms collected in routine clinical practice at one hospital.
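One common way to build a projection step for block regularization is to reduce it to an ℓ1-ball projection of the block norms. The sketch below assumes each edge's parameter block is measured with the Euclidean norm, i.e. the constraint Σₑ‖wₑ‖₂ ≤ τ. The paper's exact constraint and projection routine may differ, and `project_nonneg_l1_ball` and `project_block_l1_ball` are hypothetical names. The inner step uses the standard sort-based projection onto the ℓ1 ball.

```python
import math


def project_nonneg_l1_ball(v, radius):
    """Project a nonnegative vector onto {x >= 0 : sum(x) <= radius} (sort-based)."""
    if sum(v) <= radius:
        return list(v)
    if radius <= 0.0:
        return [0.0] * len(v)
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        cumsum += ui
        t = (cumsum - radius) / i
        if ui > t:
            theta = t  # the last index passing this test fixes the threshold
    return [max(vi - theta, 0.0) for vi in v]


def project_block_l1_ball(blocks, radius):
    """Euclidean projection of per-edge blocks onto {sum_e ||w_e||_2 <= radius}."""
    norms = [math.sqrt(sum(w * w for w in blk)) for blk in blocks]
    shrunk = project_nonneg_l1_ball(norms, radius)
    # Each block keeps its direction; only its length is reduced (possibly to zero).
    return [[w * (s / n) for w in blk] if n > 0 else list(blk)
            for blk, n, s in zip(blocks, norms, shrunk)]
```

Inside a projected-gradient loop this would be used as `w = project_block_l1_ball(w_after_gradient_step, tau)`. Blocks whose norm is driven to zero correspond to edges removed from the learned structure.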
Blockwise Coordinate Descent Procedures for the Multi-task Lasso, with Applications to Neural Semantic Basis Discovery
Cited by 19 (0 self)
We develop a cyclical blockwise coordinate descent algorithm for the multi-task Lasso that efficiently solves problems with thousands of features and tasks. The main result shows that a closed-form Winsorization operator can be obtained for the sup-norm penalized least squares regression. This allows the algorithm to find solutions to very large-scale problems far more efficiently than existing methods. This result complements the pioneering work of Friedman et al. (2007) for the single-task Lasso. As a case study, we use the multi-task Lasso as a variable selector to discover a semantic basis for predicting human neural activation. The learned solution outperforms the standard basis for this task on the majority of test participants, while requiring far fewer assumptions about cognitive neuroscience. We demonstrate how this learned basis can yield insights into how the brain represents the meanings of words.
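For a single row z ∈ ℝᴷ, the sup-norm penalized problem min_b ½‖z − b‖² + λ‖b‖∞ has a closed form. If ‖z‖₁ ≤ λ, then b = 0. Otherwise every entry is clipped (winsorized) at a common level t chosen so that Σₖ max(|zₖ| − t, 0) = λ. This follows from the Moreau decomposition (our derivation, not quoted from the paper). The sketch plugs that operator into a cyclic blockwise coordinate descent. It assumes a design X shared by all tasks, with unit-norm columns, and the objective ½‖Y − XB‖²_F + λ Σⱼ‖Bⱼ‖∞. `winsorize` and `multitask_lasso_bcd` are hypothetical names.

```python
import math


def winsorize(z, lam):
    """argmin_b 0.5*||z - b||^2 + lam*||b||_inf: clip every |z_k| at a common level t."""
    if lam <= 0.0:
        return list(z)
    a = [abs(v) for v in z]
    if sum(a) <= lam:
        return [0.0] * len(z)  # the whole row is set to zero
    u = sorted(a, reverse=True)
    cumsum, t = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        cumsum += ui
        cand = (cumsum - lam) / i
        if ui > cand:
            t = cand  # t solves sum_k max(|z_k| - t, 0) = lam
    return [math.copysign(min(ak, t), v) for ak, v in zip(a, z)]


def multitask_lasso_bcd(X, Y, lam, sweeps=100):
    """Cyclic blockwise coordinate descent for
    min_B 0.5*||Y - X B||_F^2 + lam * sum_j ||B[j]||_inf,
    assuming every column of X has unit l2 norm (hypothetical setup)."""
    n, p, K = len(X), len(X[0]), len(Y[0])
    B = [[0.0] * K for _ in range(p)]
    R = [list(row) for row in Y]  # residual Y - X B
    for _ in range(sweeps):
        for j in range(p):
            # Correlation with the partial residual; unit norm lets us add B[j] directly.
            z = [sum(X[i][j] * R[i][k] for i in range(n)) + B[j][k] for k in range(K)]
            new_row = winsorize(z, lam)
            for k in range(K):
                delta = new_row[k] - B[j][k]
                if delta != 0.0:
                    for i in range(n):
                        R[i][k] -= X[i][j] * delta
            B[j] = new_row
    return B
```

For example, `winsorize([3.0, -1.0, 0.5], 1.0)` clips at t = 2 and returns `[2.0, -1.0, 0.5]`. Only the largest entry is pulled in, which is what lets a whole feature row enter or leave the model jointly across tasks.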
Simultaneous support recovery in high-dimensional regression: Benefits and perils of ℓ1,∞-regularization
2009
Cited by 9 (1 self)
Abstract—Given a collection of r ≥ 2 linear regression problems in p dimensions, suppose that the regression coefficients share partially common supports of size at most s. This set-up suggests the use of ℓ1/ℓ∞-regularized regression for joint estimation of the p × r matrix of regression coefficients. We analyze the high-dimensional scaling of ℓ1/ℓ∞-regularized quadratic programming, considering both consistency rates in ℓ∞-norm, and how the minimal sample size n required for consistent variable selection scales with model dimension, sparsity, and overlap between the supports. We first establish bounds on the ℓ∞-error as well as sufficient conditions for exact variable selection for fixed design matrices, as well as for designs drawn randomly from general Gaussian distributions. Specializing to the case of r = 2 linear regression problems with standard Gaussian designs whose supports overlap in a fraction α ∈ [0, 1] of their entries, we prove that the ℓ1/ℓ∞-regularized method undergoes a phase transition characterized by the rescaled sample size θ1,∞(n, p, s, α) = n/{(4 − 3α)s log(p − (2 − α)s)}. An implication is that the use of ℓ1/ℓ∞-regularization yields improved statistical efficiency if the overlap parameter is large enough (α > 2/3), but has worse statistical efficiency than a naive Lasso-based approach for moderate to small overlap (α < 2/3). Empirical simulations illustrate the close agreement between theory and actual behavior in practice. These results show that caution must be exercised in applying ℓ1/ℓ∞ block regularization: if the data does not match its structure very closely, it can impair statistical performance relative to computationally less expensive schemes. Index Terms — ℓ1-constraints, compressed sensing, convex relaxation, group Lasso, high-dimensional inference, model selection, phase transitions, sparse approximation, subset selection.
On the ℓ1-ℓq Regularized Regression
2008
Cited by 8 (0 self)
In this paper we consider the problem of grouped variable selection in high-dimensional regression using ℓ1-ℓq regularization (1 ≤ q ≤ ∞), which can be viewed as a natural generalization of the ℓ1-ℓ2 regularization (the group Lasso). The key condition is that the dimensionality pₙ can increase much faster than the sample size n, i.e., pₙ ≫ n (in our case pₙ is the number of groups), but the number of relevant groups is small. The main conclusion is that many good properties of ℓ1-regularization (Lasso) naturally carry over to the ℓ1-ℓq cases (1 ≤ q ≤ ∞), even if the number of variables within each group also increases with the sample size. With fixed design, we show that the whole family of estimators is both estimation consistent and variable selection consistent under different conditions. We also show the persistency result with random design under a much weaker condition. These results provide a unified treatment for the whole family of estimators ranging from q = 1 (Lasso) to q = ∞ (iCAP), with q = 2 (group Lasso) as a special case. When there is no group structure available, the analysis reduces to existing results for the Lasso estimator (q = 1).
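To make the family concrete, the ℓ1-ℓq penalty on a coefficient vector partitioned into groups is the sum over groups of each group's ℓq norm. The helper below is a hypothetical illustration of that definition, not code from the paper. With singleton groups every q gives the plain ℓ1 norm, matching the remark that the analysis reduces to the Lasso when there is no group structure.

```python
import math


def l1_lq_penalty(beta, groups, q):
    """Sum over groups g of ||beta_g||_q, for 1 <= q <= inf."""
    total = 0.0
    for g in groups:
        vals = [abs(beta[i]) for i in g]
        if q == math.inf:
            total += max(vals, default=0.0)  # sup-norm per group (the q = inf, iCAP case)
        else:
            total += sum(v ** q for v in vals) ** (1.0 / q)
    return total
```

For `beta = [3.0, -4.0, 0.0, 1.0]` with groups `[[0, 1], [2, 3]]`, the penalty is 8 for q = 1, 6 for q = 2 and 5 for q = ∞. Larger q charges less for spreading weight within a group that is already active.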
An Efficient Projection for l1,∞ Regularization
Cited by 7 (0 self)
In recent years the l1,∞ norm has been proposed for joint regularization. In essence, this type of regularization aims at extending the l1 framework for learning sparse models to a setting where the goal is to learn a set of jointly sparse models. In this paper we derive a simple and effective projected gradient method for optimization of l1,∞ regularized problems. The main challenge in developing such a method lies in computing projections onto the l1,∞ ball efficiently. We present an algorithm that works in O(n log n) time and O(n) memory, where n is the number of parameters. We test our algorithm on a multi-task image annotation problem. Our results show that l1,∞ leads to better performance than both l2 and l1 regularization and that it is effective in discovering jointly sparse solutions.
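The projection onto the l1,∞ ball {W : Σᵢ maxⱼ |Wᵢⱼ| ≤ C} decouples across rows once a Lagrange multiplier θ is fixed. Each row then becomes a sup-norm proximal problem whose solution clips the row at a level μᵢ(θ). The sketch below finds θ by bisection so that Σᵢ μᵢ(θ) = C. This is a simpler stand-in chosen for illustration, not the paper's O(n log n) algorithm: bisection multiplies the cost by the number of iterations. We assume rows index the jointly regularized parameters and columns the tasks; `clip_row` and `project_l1inf_ball` are hypothetical names.

```python
import math


def clip_row(row, theta):
    """Solve min_b 0.5*||row - b||^2 + theta*||b||_inf; return (b, clipping level mu)."""
    a = [abs(x) for x in row]
    if theta <= 0.0:
        return list(row), max(a, default=0.0)
    if sum(a) <= theta:
        return [0.0] * len(row), 0.0  # the whole row is zeroed
    u = sorted(a, reverse=True)
    cumsum, mu = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        cumsum += ui
        cand = (cumsum - theta) / i
        if ui > cand:
            mu = cand
    return [math.copysign(min(ai, mu), x) for ai, x in zip(a, row)], mu


def project_l1inf_ball(W, C, iters=100):
    """Euclidean projection of W (list of rows) onto {sum_i max_j |W_ij| <= C}, C >= 0."""
    if sum(max((abs(x) for x in row), default=0.0) for row in W) <= C:
        return [list(row) for row in W]  # already feasible
    lo, hi = 0.0, max(sum(abs(x) for x in row) for row in W)
    for _ in range(iters):
        theta = 0.5 * (lo + hi)
        # sum_i mu_i(theta) is nonincreasing in theta; keep hi on the feasible side.
        if sum(clip_row(row, theta)[1] for row in W) > C:
            lo = theta
        else:
            hi = theta
    return [clip_row(row, hi)[0] for row in W]
```

For `W = [[3, 1], [2, 2]]` and `C = 3`, the multiplier is θ = 4/3 and the projection is approximately `[[5/3, 1], [4/3, 4/3]]`. Rows whose ℓ1 norm falls below θ are zeroed outright, which is the mechanism behind the jointly sparse solutions mentioned above.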
Joint support recovery under high-dimensional scaling: Benefits and perils of ℓ1,∞-regularization
Cited by 7 (1 self)
Given a collection of r ≥ 2 linear regression problems in p dimensions, suppose that the regression coefficients share partially common supports. This set-up suggests the use of ℓ1/ℓ∞-regularized regression for joint estimation of the p × r matrix of regression coefficients. We analyze the high-dimensional scaling of ℓ1/ℓ∞-regularized quadratic programming, considering both consistency rates in ℓ∞-norm, and also how the minimal sample size n required for performing variable selection grows as a function of the model dimension, sparsity, and overlap between the supports. We begin by establishing bounds on the ℓ∞-error as well as sufficient conditions for exact variable selection for fixed design matrices, as well as designs drawn randomly from general Gaussian matrices. These results show that the high-dimensional scaling of ℓ1/ℓ∞-regularization is qualitatively similar to that of ordinary ℓ1-regularization. Our second set of results applies to design matrices drawn from standard Gaussian ensembles, for which we provide a sharp set of necessary and sufficient conditions: the ℓ1/ℓ∞-regularized method undergoes a phase transition characterized by the rescaled sample size θ1,∞(n, p, s, α) = n/{(4 − 3α)s log(p − (2 − α)s)}. More precisely, for any δ > 0, the probability of successfully recovering both supports converges to 1 for scalings such that θ1,∞ ≥ 1 + δ, and converges to 0 for scalings for which θ1,∞ ≤ 1 − δ. An implication of this threshold is that use of ℓ1,∞-regularization yields improved statistical efficiency if the overlap parameter is large enough (α > 2/3), but performs worse than a naive Lasso-based approach for moderate to small overlap (α < 2/3). We illustrate the close agreement between these theoretical predictions and the actual behavior in simulations.
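The threshold formula can be evaluated directly. The helpers below (hypothetical names) compute the rescaled sample size θ1,∞(n, p, s, α) exactly as stated above, and the sample size at which it equals 1. They assume the natural logarithm and p − (2 − α)s > 1, and they only restate the formula: none of the theorem's conditions are checked. Note that the prefactor 4 − 3α passes through 2 at α = 2/3, the same overlap at which the abstract locates the crossover with the naive Lasso-based approach.

```python
import math


def rescaled_sample_size(n, p, s, alpha):
    """theta_{1,inf}(n, p, s, alpha) = n / {(4 - 3*alpha) * s * log(p - (2 - alpha) * s)}."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("overlap alpha must lie in [0, 1]")
    return n / ((4.0 - 3.0 * alpha) * s * math.log(p - (2.0 - alpha) * s))


def critical_sample_size(p, s, alpha):
    """Sample size n at which theta_{1,inf} equals 1 (natural log assumed)."""
    return (4.0 - 3.0 * alpha) * s * math.log(p - (2.0 - alpha) * s)
```

For p = 1000 and s = 10, full overlap (α = 1) gives a critical n of 10·log(990), roughly 69.0, while disjoint supports (α = 0) give 40·log(980), roughly 275.5.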

