## NESTA: A Fast and Accurate First-Order Method for Sparse Recovery (2009)

Citations: 75 (2 self)

### BibTeX

```bibtex
@MISC{Becker09nesta:a,
  author = {Stephen Becker and Jérôme Bobin and Emmanuel J. Candès},
  title  = {NESTA: A Fast and Accurate First-Order Method for Sparse Recovery},
  year   = {2009}
}
```

### Abstract

Accurate signal recovery or image reconstruction from indirect and possibly undersampled data is a topic of considerable interest; for example, the literature in the recent field of compressed sensing is already quite immense. Inspired by recent breakthroughs in the development of novel first-order methods in convex optimization, most notably Nesterov’s smoothing technique, this paper introduces a fast and accurate algorithm for solving common recovery problems in signal processing. In the spirit of Nesterov’s work, one of the key ideas of this algorithm is a subtle averaging of sequences of iterates, which has been shown to improve the convergence properties of standard gradient-descent algorithms. This paper demonstrates that this approach is ideally suited for solving large-scale compressed sensing reconstruction problems as 1) it is computationally efficient, 2) it is accurate and returns solutions with several correct digits, 3) it is flexible and amenable to many kinds of reconstruction problems, and 4) it is robust in the sense that its excellent performance across a wide range of problems does not depend on the fine tuning of several parameters. Comprehensive numerical experiments on realistic signals exhibiting a large dynamic range show that this algorithm compares favorably with recently proposed state-of-the-art methods. We also apply the algorithm to solve other problems for which there are fewer alternatives, such as total-variation minimization, and convex programs seeking to minimize the ℓ1 norm of Wx under constraints, in which W is not diagonal.
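
The smoothing idea the abstract refers to can be made concrete. Nesterov's technique replaces the nonsmooth |t| with a Huber-like function whose gradient is Lipschitz, so an accelerated gradient method becomes applicable to ℓ1 minimization. The sketch below is ours (the function name and test values are illustrative, not from the paper):

```python
import numpy as np

def grad_smoothed_l1(x, mu):
    """Gradient of the smoothed l1 norm f_mu(x) = sum_i h_mu(x_i), where
    h_mu(t) = t^2/(2*mu) for |t| <= mu and |t| - mu/2 otherwise (Huber).
    The gradient is x/mu clipped to [-1, 1], and is (1/mu)-Lipschitz."""
    return np.clip(x / mu, -1.0, 1.0)

x = np.array([-2.0, 0.05, 0.0, 3.0])
g = grad_smoothed_l1(x, mu=0.1)
# entries with |x_i| > mu saturate at +/-1; small entries scale linearly
```

Smaller `mu` approximates the ℓ1 norm more closely but increases the Lipschitz constant, which is the accuracy/speed trade-off the paper's continuation scheme manages.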

### Citations

3300 | Variational Analysis
- Rockafellar, Wets
- 1998
Citation Context: ...m a posteriori estimate in a Bayesian setting. In statistics, the same problem is more well-known as the lasso [49]: (LSτ) minimize ‖b − Ax‖ℓ2 subject to ‖x‖ℓ1 ≤ τ. (1.4) Standard optimization theory [47] asserts that these three problems are of course equivalent provided that ε, λ, τ obey some special relationships. With the exception of the case where the matrix A is orthogonal, this functional depe... |

2072 | A Wavelet Tour of Signal Processing
- Mallat
- 1998
Citation Context: ...ed by an extra term, namely 2 CW where CW is the cost of applying W or W∗ to a vector. In practical situations, there is often a fast algorithm for applying W and W∗, e.g. a fast wavelet transform [39], a fast curvelet transform [11], a fast short-time Fourier transform [39] and so on, which makes this a low-cost extra step. 6.2. Numerical results for nonstandard ℓ1 minimization. Because NESTA i... |

1881 | Regression shrinkage and selection via the lasso
- Tibshirani
- 1996
Citation Context: ...is popular in signal and image processing because of its loose interpretation as a maximum a posteriori estimate in a Bayesian setting. In statistics, the same problem is more well-known as the lasso [49]: (LSτ) minimize ‖b − Ax‖ℓ2 subject to ‖x‖ℓ1 ≤ τ. (1.4) Standard optimization theory [47] asserts that these three problems are of course equivalent provided that ε, λ, τ obey some special relationshi... |

1764 | Compressed sensing
- Donoho
- 2006
Citation Context: ...pproximations of nonsmooth functions, ℓ1 minimization, duality in convex optimization, continuation methods, compressed sensing, total-variation minimization. 1. Introduction. Compressed sensing (CS) [13, 14, 25] is a novel sampling theory, which is based on the revelation that one can exploit sparsity or compressibility when acquiring signals of general interest. In a nutshell, compressed sensing designs non... |

1690 | Atomic decomposition by Basis Pursuit
- Chen, Donoho, et al.
- 1999
Citation Context: ...frequently discussed approach considers solving this problem in Lagrangian form, i.e. (QPλ) minimize λ‖x‖ℓ1 + ½‖b − Ax‖²ℓ2, (1.3) and is also known as the basis pursuit denoising problem (BPDN) [18]. This problem is popular in signal and image processing because of its loose interpretation as a maximum a posteriori estimate in a Bayesian setting. In statistics, the same problem is more well-know... |
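
The ℓ1 term in (QPλ) has a closed-form proximal operator, soft thresholding, which underlies most of the first-order solvers compared in this paper. A minimal sketch (the function name is ours):

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t*||x||_1: shrinks each entry toward zero by t,
    setting entries with |x_i| <= t exactly to zero."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

# When A is orthogonal, the (QP_lambda) solution is soft_threshold(A.T @ b, lam);
# for general A this operator is the building block of iterative schemes.
v = np.array([3.0, -0.5, 1.2])
out = soft_threshold(v, 1.0)
```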

1332 | Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information
- Candès, Romberg, et al.
Citation Context: ...pproximations of nonsmooth functions, ℓ1 minimization, duality in convex optimization, continuation methods, compressed sensing, total-variation minimization. 1. Introduction. Compressed sensing (CS) [13, 14, 25] is a novel sampling theory, which is based on the revelation that one can exploit sparsity or compressibility when acquiring signals of general interest. In a nutshell, compressed sensing designs non... |

850 | Near-optimal signal recovery from random projections: Universal encoding strategies
- Candès, Tao
- 2006
Citation Context: ...pproximations of nonsmooth functions, ℓ1 minimization, duality in convex optimization, continuation methods, compressed sensing, total-variation minimization. 1. Introduction. Compressed sensing (CS) [13, 14, 25] is a novel sampling theory, which is based on the revelation that one can exploit sparsity or compressibility when acquiring signals of general interest. In a nutshell, compressed sensing designs non... |

431 | An iterative thresholding algorithm for linear inverse problems with a sparsity constraint
- Daubechies, Defrise, et al.
Citation Context: ...int methods [10, 36, 48] are accurate but problematic for they need to solve large systems of linear equations to compute the Newton steps. On the other hand, inspired by iterative thresholding ideas [24, 30, 20], we have now available a great number of first-order methods, see [31, 9, 34, 35] and the many earlier references therein, which may be faster but not necessarily accurate. Indeed, these methods are ... |
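
The iterative thresholding idea cited here alternates a gradient step on the quadratic term with soft thresholding. A generic ISTA loop for (QPλ) is sketched below; the fixed step 1/L with L = ‖A‖² and the toy problem are our choices, not any cited implementation:

```python
import numpy as np

def ista(A, b, lam, n_iter=500):
    """Iterative soft-thresholding for min lam*||x||_1 + 0.5*||b - Ax||_2^2.
    Uses the constant step 1/L, L = ||A||_2^2 (Lipschitz constant of the
    gradient of the smooth term)."""
    L = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        g = A.T @ (A @ x - b)                                  # gradient step
        z = x - g / L
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # shrinkage
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 8))
x_true = np.zeros(8); x_true[2] = 1.5
b = A @ x_true                     # noiseless measurements of a 1-sparse signal
x_hat = ista(A, b, lam=0.01)
```

This is exactly the kind of method the passage describes as simple but slow: each iteration is cheap, yet many iterations are needed for high accuracy.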

428 | The Dantzig selector: statistical estimation when p is much larger than n
- Candes, Tao
- 2007
Citation Context: ...ated problems, which do not have the special ℓ1 + ℓ2² structure. One example might be the Dantzig selector, which is a convenient and flexible estimator for recovering sparse signals from noisy data [15]: minimize ‖x‖ℓ1 subject to ‖A∗(b − Ax)‖ℓ∞ ≤ δ. (7.2) This is of course equivalent to the unconstrained problem minimize λ‖x‖ℓ1 + ‖A∗(b − Ax)‖ℓ∞ for some value of λ. Clearly, one could apply Nestero... |
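
The Dantzig selector constraint bounds the correlation of the residual with every column of A, and is cheap to check numerically. A small sketch (the helper name and test data are ours):

```python
import numpy as np

def dantzig_feasible(A, b, x, delta):
    """Check the Dantzig selector constraint ||A^T (b - A x)||_inf <= delta."""
    return np.max(np.abs(A.T @ (b - A @ x))) <= delta

rng = np.random.default_rng(1)
A = rng.standard_normal((10, 4))
x0 = np.array([1.0, 0.0, 0.0, 0.0])
b = A @ x0                    # noiseless data: the true x0 gives zero residual
ok = dantzig_feasible(A, b, x0, delta=1e-8)
```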

378 | A fast iterative shrinkage-thresholding algorithm for linear inverse problems
- Beck, Teboulle
- 2009
Citation Context: ...optimal [41] two decades earlier. As a consequence of this breakthrough, a few recent works have followed up with improved techniques for some very special problems in signal or image processing, see [3, 21, 52, 1] for example, or for minimizing composite functions such as ℓ1-regularized least-squares problems [44]. In truth, these novel algorithms demonstrate great promise; they are fast, accurate and robust i... |
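
The Beck-Teboulle scheme cited here (FISTA) accelerates plain iterative thresholding with a momentum term built from the previous two iterates, improving the rate from O(1/k) to O(1/k²). A compact sketch under the same (QPλ) setup (ours, not the authors' code):

```python
import numpy as np

def fista(A, b, lam, n_iter=200):
    """FISTA for min lam*||x||_1 + 0.5*||b - Ax||_2^2 (Beck-Teboulle)."""
    L = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1]); y = x.copy(); t = 1.0
    for _ in range(n_iter):
        z = y - A.T @ (A @ y - b) / L                             # gradient step at y
        x_new = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0) # shrinkage
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)             # momentum step
        x, t = x_new, t_new
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 8))
x_true = np.zeros(8); x_true[5] = -2.0
b = A @ x_true
x_hat = fista(A, b, lam=0.01)
```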

357 | CoSaMP: Iterative signal recovery from incomplete and inaccurate samples
- Needell, Tropp
- 2008
Citation Context: ...ficantly outperforms standard FPC. All parameters were set to default values. 5.1.7. FPC Active Set (FPC-AS) [53]. In 2009, inspired by both first-order algorithms, such as FPC, and greedy algorithms [28, 40], Wen et al. [53] extend FPC into the two-part algorithm FPC Active Set to solve (QPλ). In the first stage, FPC-AS calls an improved version of FPC that allows the step-size to be updated dynamically,... |

298 | Gradient projection for sparse reconstruction: Application to compressed sensing and other inverse problems
- Figueiredo, Nowak, et al.
- 2007
Citation Context: ...rge systems of linear equations to compute the Newton steps. On the other hand, inspired by iterative thresholding ideas [24, 30, 20], we have now available a great number of first-order methods, see [31, 9, 34, 35] and the many earlier references therein, which may be faster but not necessarily accurate. Indeed, these methods are shown to converge slowly, and typically need a very large number of iterations whe... |

283 | Signal recovery by proximal forward-backward splitting
- Combettes, Wajs
- 2005
Citation Context: ...int methods [10, 36, 48] are accurate but problematic for they need to solve large systems of linear equations to compute the Newton steps. On the other hand, inspired by iterative thresholding ideas [24, 30, 20], we have now available a great number of first-order methods, see [31, 9, 34, 35] and the many earlier references therein, which may be faster but not necessarily accurate. Indeed, these methods are ... |

258 | Smooth minimization of non-smooth functions - Nesterov |

237 | An EM algorithm for waveletbased image restoration
- Figueiredo, Nowak
- 2003
Citation Context: ...int methods [10, 36, 48] are accurate but problematic for they need to solve large systems of linear equations to compute the Newton steps. On the other hand, inspired by iterative thresholding ideas [24, 30, 20], we have now available a great number of first-order methods, see [31, 9, 34, 35] and the many earlier references therein, which may be faster but not necessarily accurate. Indeed, these methods are ... |

230 | Sparse MRI: the application of compressed sensing for rapid MR imaging
- Lustig, Donoho, et al.
- 2007
Citation Context: ...to recover a signal accurately, engineers are changing the way they think about signal acquisition in areas ranging from analog-to-digital conversion [23], digital optics, magnetic resonance imaging [38], seismics [37] and astronomy [8]. In this field, a signal x0 ∈ Rn is acquired by collecting data of the form b = Ax0 + z, where x0 is the signal of interest (or its coefficient sequence in a repres... |

205 | Regularized paths for generalized linear models via coordinate descent - Friedman, Hastie, et al. |

203 | Introductory Lectures on Convex Optimization: A Basic Course - Nesterov - 2004 |

193 | Gradient methods for minimizing composite objective function - Nesterov |

178 | Sparse solution of underdetermined linear equations by stagewise orthogonal matching pursuit
- Donoho, Tsaig, et al.
- 2006
Citation Context: ...ficantly outperforms standard FPC. All parameters were set to default values. 5.1.7. FPC Active Set (FPC-AS) [53]. In 2009, inspired by both first-order algorithms, such as FPC, and greedy algorithms [28, 40], Wen et al. [53] extend FPC into the two-part algorithm FPC Active Set to solve (QPλ). In the first stage, FPC-AS calls an improved version of FPC that allows the step-size to be updated dynamically,... |

174 | Sparse reconstruction by separable approximation
- Wright, Nowak, et al.
Citation Context: ...n the final step) was changed. Future releases of GPSR will probably contain a similarly updated continuation stopping criterion. 5.1.3. Sparse reconstruction by separable approximation (SpaRSA) [54]. SpaRSA is an algorithm to minimize composite functions φ(x) = f(x) + λc(x) composed of a smooth term f and a separable non-smooth term c, e.g. (QPλ). At every step, a subproblem of the form minimiz... |

164 | A new approach to variable selection in least squares problems
- Osborne, Presnell, et al.
- 2000
Citation Context: ...olutions to the problem (QPλ) and, hence, the solutions to (1.1) and (1.4) may be found by solving a sequence of ℓ1-penalized least-squares problems. The point of this is that it has been noticed (see [34, 45, 27]) that solving (1.3) (resp. the lasso (1.4)) is faster when λ is large (resp. τ is low). This observation greatly motivates the use of continuation for solving (1.3) for a fixed λf. The idea is simpl... |
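
The continuation strategy described here can be sketched directly: solve (QPλ) for a decreasing sequence of λ values, warm-starting each stage at the previous solution. The geometric schedule, stage counts, and inner ISTA solver below are illustrative choices, not from any cited code:

```python
import numpy as np

def ista_warm(A, b, lam, x0, n_iter=100):
    """Plain iterative soft-thresholding for (QP_lambda), warm-started at x0."""
    L = np.linalg.norm(A, 2) ** 2
    x = x0.copy()
    for _ in range(n_iter):
        z = x - A.T @ (A @ x - b) / L
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    return x

def continuation(A, b, lam_final, n_stages=5):
    """Solve (QP_lambda) for lambdas decreasing geometrically from
    lam_max = ||A^T b||_inf (above which the solution is exactly 0)
    down to lam_final, warm-starting each stage."""
    lam_max = np.max(np.abs(A.T @ b))
    x = np.zeros(A.shape[1])
    for lam in np.geomspace(lam_max, lam_final, n_stages):
        x = ista_warm(A, b, lam, x)
    return x

rng = np.random.default_rng(2)
A = rng.standard_normal((30, 10))
x_true = np.zeros(10); x_true[[1, 7]] = [2.0, -1.0]
b = A @ x_true
x_hat = continuation(A, b, lam_final=0.01)
```

Each stage is easy because its λ is large relative to the warm start's accuracy, which is precisely the observation the passage credits to [34, 45, 27].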

163 | Two point step size gradient methods
- Barzilai, Borwein
- 1988
Citation Context: ...vector of ones, and v belongs to the nonnegative orthant, v[i] ≥ 0 for all i. The projection onto Q is then trivial. Different techniques for choosing the stepsize αk (backtracking, Barzilai-Borwein [2], and so on) are discussed in [31]. The code is available at http://www.lx.it.pt/~mtf/GPSR/. In the forthcoming experiments, the parameters are set to their default values. GPSR also implements contin... |
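
The Barzilai-Borwein step size mentioned here picks the scalar step that best mimics the secant equation from the last iterate pair. A sketch of the first BB formula (the demo quadratic is ours):

```python
import numpy as np

def bb_step(x, x_prev, g, g_prev):
    """First Barzilai-Borwein step size alpha = <s, s> / <s, y>, with
    s = x - x_prev and y = g - g_prev: a scalar secant approximation of
    inverse curvature along the most recent step."""
    s = x - x_prev
    y = g - g_prev
    return float(s @ s) / float(s @ y)

# On a quadratic 0.5 * x^T diag(D) x, the gradient is D * x, and the BB step
# lands between the inverse eigenvalues 1/max(D) and 1/min(D).
D = np.array([1.0, 4.0])
x_prev = np.array([1.0, 1.0]); g_prev = D * x_prev
x = np.array([0.5, 0.2]);      g = D * x
alpha = bb_step(x, x_prev, g, g_prev)
```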

162 | Probing the pareto frontier for basis pursuit solutions
- van den Berg, Friedlander
Citation Context: ...ction 5 presents a comprehensive series of numerical experiments which illustrate the behavior of several state-of-the-art methods including interior point methods [36], projected gradient techniques [34, 51, 31], fixed point continuation and iterative thresholding algorithms [34, 56, 3]. It is important to consider that most of these methods have been perfected after several years of research [36, 31], and d... |

158 | An interior-point method for large-scale l1-regularized least squares - Kim, Koh, et al. |

136 | Nonmonotone spectral projected gradient methods on convex sets
- Birgin, Mart́ınez, et al.
Citation Context: ...de is available at http://www.stanford.edu/~boyd/l1_ls/. 5.1.5. Spectral projected gradient (SPGL1) [51]. In 2008, van den Berg et al. adapted the spectral projected gradient algorithm introduced in [6] to solve the LASSO (LSτ). Interestingly, they introduced a clever root finding procedure such that solving a few instances of (LSτ) for different values of τ enables them to equivalently solve (BPε... |

121 | Fast discrete curvelet transforms
- Candès, Demanet, et al.
- 2005
Citation Context: ...where CW is the cost of applying W or W∗ to a vector. In practical situations, there is often a fast algorithm for applying W and W∗, e.g. a fast wavelet transform [39], a fast curvelet transform [11], a fast short-time Fourier transform [39] and so on, which makes this a low-cost extra step. 6.2. Numerical results for nonstandard ℓ1 minimization. Because NESTA is one of very few algorithms tha... |

120 | The split Bregman method for L1-regularized problems
- Goldstein, Osher
- 2009
Citation Context: ...g minimize ‖x‖TV subject to ‖b − Ax‖ℓ2 ≤ ε. (6.4) To be sure, a number of efficient TV-minimization algorithms have been proposed to solve (6.4) in the special case A = I (denoising problem), see [17, 22, 33]. In comparison, only a few methods have been proposed to solve the more general problem (6.4) even when A is a projector. Known methods include interior point methods (ℓ1-magic) [10], proximal-subgrad... |
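
The TV objective in (6.4) is straightforward to evaluate. A sketch of the isotropic discrete total variation on forward differences (our helper, using the common convention of trimming the last row and column so the two difference fields align):

```python
import numpy as np

def tv_norm(img):
    """Isotropic total variation of a 2-D array: sum over interior pixels of
    the Euclidean norm of the forward-difference gradient."""
    dx = np.diff(img, axis=1)[:-1, :]   # horizontal differences, shape (m-1, n-1)
    dy = np.diff(img, axis=0)[:, :-1]   # vertical differences,   shape (m-1, n-1)
    return np.sum(np.sqrt(dx ** 2 + dy ** 2))

# A constant image has zero TV; a vertical step edge contributes the jump
# magnitude once per interior row.
flat = np.ones((4, 4))
step = np.zeros((4, 4)); step[:, 2:] = 1.0
```

TV minimization favors exactly such piecewise-constant structure, which is why (6.4) is the natural recovery problem for images with sharp edges.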

119 | Fast discrete curvelet transforms, Multiscale Modeling - Candès, Demanet, et al. |

110 | A method for solving a convex programming problem with convergence rate O(1/k²) - Nesterov - 1983 |

100 | A new TwIST: Two-step iterative shrinkage/thresholding algorithms for image restoration
- Bioucas-Dias, Figueiredo
- 2007
Citation Context: ...on, only a few methods have been proposed to solve the more general problem (6.4) even when A is a projector. Known methods include interior point methods (ℓ1-magic) [10], proximal-subgradient methods [5, 19], Split-Bregman [33], and the very recently introduced RecPF [55], which operates in the special case of partial Fourier measurements. Roughly, proximal gradient methods approach the solution to (6.4... |

78 | Enhancing sparsity by reweighted ℓ1 minimization - Candes, Wakin, et al. - 2008 |

73 | On accelerated proximal gradient methods for convex-concave optimization
- Tseng
Citation Context: ...k from moving too far away from the center x_p^c. The point xk, at which the gradient of f is evaluated, is a weighted average between zk and yk. In truth, this is motivated by a theoretical analysis [43, 50], which shows that if αk = (k + 1)/2 and τk = 2/(k + 3), then the algorithm converges to x⋆ = argmin_{x∈Qp} f(x) with the convergence rate f(yk) − f(x⋆) ≤ 4Lp p(x⋆) / (σp (k + 1)²). (2.3) This decay... |
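
The three-sequence structure described in this excerpt (a gradient step yk, a weighted-history step zk, and an averaged iterate xk) can be sketched for an unconstrained smooth problem. This is our illustration with the prox-function p(x) = ½‖x − x0‖² (so σp = 1 and the z-step has closed form), using the αk and τk quoted above; it is not the NESTA code itself:

```python
import numpy as np

def nesterov_minimize(grad, L, x0, n_iter=300):
    """Nesterov's scheme with averaged iterates: y_k is a plain gradient step,
    z_k minimizes (L/2)*||x - x0||^2 plus the alpha-weighted sum of linearized
    models, and x_{k+1} = tau_k * z_k + (1 - tau_k) * y_k."""
    x = x0.copy()
    g_accum = np.zeros_like(x0)          # running sum of alpha_i * grad_i
    for k in range(n_iter):
        g = grad(x)
        y = x - g / L                    # gradient step from x_k
        g_accum += (k + 1) / 2.0 * g     # alpha_k = (k + 1)/2
        z = x0 - g_accum / L             # closed-form z-step for p = 0.5*||x - x0||^2
        tau = 2.0 / (k + 3.0)            # tau_k = 2/(k + 3)
        x = tau * z + (1.0 - tau) * y    # averaging of iterates
    return y

# Minimize the quadratic 0.5*||x - c||^2, whose gradient x - c is 1-Lipschitz.
c = np.array([1.0, -2.0])
x_min = nesterov_minimize(lambda x: x - c, L=1.0, x0=np.zeros(2))
```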

72 | Analysis versus synthesis in signal priors - Elad, Milanfar, et al. - 2007 |

63 | Bregman Iterative Algorithms for ℓ1-Minimization with Applications to Compressed Sensing
- Yin, Osher, et al.
- 2008
Citation Context: ...rate the behavior of several state-of-the-art methods including interior point methods [36], projected gradient techniques [34, 51, 31], fixed point continuation and iterative thresholding algorithms [34, 56, 3]. It is important to consider that most of these methods have been perfected after several years of research [36, 31], and did not exist two years ago. For example, the Fixed Point Continuation method... |

62 | Accelerated projected gradient methods for linear inverse problems with sparsity constraints - Daubechies, Fornasier, et al. |

59 | A method for unconstrained convex minimization problem with the rate of convergence O(1/k²)
- Nesterov
- 1983
Citation Context: ...per which couples smoothing techniques (see [4] and the references therein) with an improved gradient method to derive first-order methods which achieve a convergence rate he had proved to be optimal [41] two decades earlier. As a consequence of this breakthrough, a few recent works have followed up with improved techniques for some very special problems in signal or image processing, see [3, 21, 52, ... |

58 | Fast linearized Bregman iteration for Compressive Sensing and Sparse Denoising
- Osher, Mao, et al.
Citation Context: ...rge systems of linear equations to compute the Newton steps. On the other hand, inspired by iterative thresholding ideas [24, 30, 20], we have now available a great number of first-order methods, see [31, 9, 34, 35] and the many earlier references therein, which may be faster but not necessarily accurate. Indeed, these methods are shown to converge slowly, and typically need a very large number of iterations whe... |

54 | Coordinate and subspace optimization methods for linear least squares with nonquadratic regularization - Elad, Matalon, et al. |

50 | Fast image recovery using variable splitting and constrained optimization - Afonso, Bioucas-Dias, et al. - 2010 |

48 | Near-ideal model selection by ℓ1 minimization - Candes, Plan |

45 | Fixed-point continuation for ℓ1-minimization: Methodology and convergence - Hale, Yin, et al. |

44 | An efficient primal-dual hybrid gradient algorithm for total variation image restoration - Zhu, Chan - 2008 |

42 | Efficient schemes for total variation minimization under constraints in image processing - Weiss, Blanc-Féraud, et al. |

38 | A proximal decomposition method for solving convex variational inverse problems, Inverse Problems 24
- Combettes, Pesquet
- 2008
Citation Context: ...on, only a few methods have been proposed to solve the more general problem (6.4) even when A is a projector. Known methods include interior point methods (ℓ1-magic) [10], proximal-subgradient methods [5, 19], Split-Bregman [33], and the very recently introduced RecPF [55], which operates in the special case of partial Fourier measurements. Roughly, proximal gradient methods approach the solution to (6.4... |

31 | Fixed-point continuation for ℓ1-minimization: Methodology and convergence
- Hale, Yin, et al.
- 2008
Citation Context: ...sparse reconstruction algorithms. To repeat ourselves, many of these methods have been improved after several years of research [36, 31], and many did not exist two years ago [34, 51]. For instance, [35] was submitted for publication less than three months before we put the final touches on this paper. Finally, our focus is on rapid algorithms so that we are interested in methods which can take advan... |

26 | A fixed-point continuation method for ℓ1-regularized minimization with applications to compressed sensing
- Hale, Yin, et al.
- 2007
Citation Context: ...rge systems of linear equations to compute the Newton steps. On the other hand, inspired by iterative thresholding ideas [24, 30, 20], we have now available a great number of first-order methods, see [31, 9, 34, 35] and the many earlier references therein, which may be faster but not necessarily accurate. Indeed, these methods are shown to converge slowly, and typically need a very large number of iterations whe... |

25 | A fast algorithm for sparse reconstruction based on shrinkage, subspace optimization and continuation
- Wen, Yin, et al.
- 2009
Citation Context: ...ormance. In the numerical tests, the Barzilai-Borwein version (referred to as FPC-BB) significantly outperforms standard FPC. All parameters were set to default values. 5.1.7. FPC Active Set (FPC-AS) [53]. In 2009, inspired by both first-order algorithms, such as FPC, and greedy algorithms [28, 40], Wen et al. [53] extend FPC into the two-part algorithm FPC Active Set to solve (QPλ). In the first stag... |

23 | Some first-order algorithms for total variation based image restoration
- Aujol
Citation Context: ...optimal [41] two decades earlier. As a consequence of this breakthrough, a few recent works have followed up with improved techniques for some very special problems in signal or image processing, see [3, 21, 52, 1] for example, or for minimizing composite functions such as ℓ1-regularized least-squares problems [44]. In truth, these novel algorithms demonstrate great promise; they are fast, accurate and robust i... |

23 | Compressed sensing in astronomy
- Bobin, Starck, et al.
- 2008
Citation Context: ...ngineers are changing the way they think about signal acquisition in areas ranging from analog-to-digital conversion [23], digital optics, magnetic resonance imaging [38], seismics [37] and astronomy [8]. In this field, a signal x0 ∈ Rn is acquired by collecting data of the form b = Ax0 + z, where x0 is the signal of interest (or its coefficient sequence in a representation where it is assumed to b... |