## Sparse solution of underdetermined linear equations by stagewise orthogonal matching pursuit (2006)

Citations: 174 (20 self)

### BibTeX

```bibtex
@TECHREPORT{Donoho06sparsesolution,
  author      = {David L. Donoho and Yaakov Tsaig and Iddo Drori and Jean-Luc Starck},
  title       = {Sparse solution of underdetermined linear equations by stagewise orthogonal matching pursuit},
  institution = {},
  year        = {2006}
}
```

### Abstract

Finding the sparsest solution to underdetermined systems of linear equations y = Φx is NP-hard in general. We show here that for systems with ‘typical’/‘random’ Φ, a good approximation to the sparsest solution is obtained by applying a fixed number of standard operations from linear algebra. Our proposal, Stagewise Orthogonal Matching Pursuit (StOMP), successively transforms the signal into a negligible residual. Starting with initial residual r_0 = y, at the s-th stage it forms the ‘matched filter’ Φᵀr_{s−1}, identifies all coordinates with amplitudes exceeding a specially-chosen threshold, solves a least-squares problem using the selected coordinates, and subtracts the least-squares fit, producing a new residual. After a fixed number of stages (e.g. 10), it stops. In contrast to Orthogonal Matching Pursuit (OMP), many coefficients can enter the model at each stage in StOMP while only one enters per stage in OMP; and StOMP takes a fixed number of stages (e.g. 10), while OMP can take many (e.g. n). StOMP runs much faster than competing proposals for sparse solutions, such as ℓ1 minimization and OMP, and so is attractive for solving large-scale problems. We use phase diagrams to compare algorithm performance. The problem of recovering a k-sparse vector x0 from (y, Φ), where Φ is random n × N and y = Φx0, is represented by a point (n/N, k/n) in the phase diagram.
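The stagewise loop described in the abstract can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' code: the function name, the simple hard-threshold rule t·σ with formal noise level σ = ‖r‖/√n, and the default t = 2.5 are assumptions standing in for the paper's more elaborate CFAR/FDR threshold choices.

```python
import numpy as np

def stomp(Phi, y, n_stages=10, t=2.5):
    """Sketch of Stagewise Orthogonal Matching Pursuit (StOMP)."""
    n, N = Phi.shape
    r = y.astype(float).copy()        # initial residual r_0 = y
    active = np.zeros(N, dtype=bool)  # coordinates selected so far
    x = np.zeros(N)
    for _ in range(n_stages):
        if np.linalg.norm(r) <= 1e-12 * np.linalg.norm(y):
            break                               # residual already negligible
        c = Phi.T @ r                           # matched filter: Phi^T r_{s-1}
        sigma = np.linalg.norm(r) / np.sqrt(n)  # formal noise level
        new = np.abs(c) > t * sigma             # keep entries above threshold
        if not new.any():
            break                               # nothing new to add: stop early
        active |= new
        # least-squares fit of y on the active columns, then refresh residual
        coef, *_ = np.linalg.lstsq(Phi[:, active], y, rcond=None)
        x = np.zeros(N)
        x[active] = coef
        r = y - Phi @ x
    return x
```

Unlike OMP, many coordinates can cross the threshold and enter the model in a single stage, which is what makes the fixed stage count (e.g. 10) plausible.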

### Citations

2870 | Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B (Methodol.)
- Benjamini, Hochberg
- 1995
Citation Context: ...rity indicates approximate Gaussianity. a fixed fraction q of all discoveries, and to make the maximum number of discoveries possible subject to that constraint. This leads us to consider Simes’ rule [2, 1]. The False Alarm Control strategy requires knowledge of the number of nonzeros k or some upper bound. False Discovery Control does not require such knowledge, which makes it more convenient for appli...

1745 | Compressed sensing
- Donoho
- 2006
Citation Context: ...e n < N means that we collect fewer data than unknowns. Despite the indeterminacy, sparsity of x0 allows for accurate reconstruction of the object from what would naively seem to be ‘too few samples’ [17, 8, 48]. App2: Error Correction. Information is transmitted in a coded block in which a small fraction of the entries may be corrupted. From the received data, one constructs a system y = Φx0; here x0 repres...

1672 | Atomic decomposition by basis pursuit
- Chen, Donoho, et al.
- 1998
Citation Context: ...ng growing attention. Over the years, several applications have been found where signals of interest have sparse representations and exploiting this sparsity offers striking benefits; see for example [11, 28, 26, 25, 7]. At the ICASSP 2005 conference a special session addressed the theme of exploiting sparsity, and a recent international workshop, SPARS05, was largely devoted to this topic. Very recently, considerab...

1318 | Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information
- Candès, Romberg, et al.
Citation Context: ...ng growing attention. Over the years, several applications have been found where signals of interest have sparse representations and exploiting this sparsity offers striking benefits; see for example [11, 28, 26, 25, 7]. At the ICASSP 2005 conference a special session addressed the theme of exploiting sparsity, and a recent international workshop, SPARS05, was largely devoted to this topic. Very recently, considerab...

1052 | Matching pursuits with time-frequency dictionaries
- Mallat, Zhang
- 1993
Citation Context: ...n overcomplete expansion; those terms are the columns of Φ. The sparsity allows recovery of a unique representation using only a few terms, despite the fact that the representation is in general nonunique [43, 11, 21, 20, 50, 51]. In these applications, several algorithms are available to pursue sparse solutions; in some cases attractive theoretical results are known, guaranteeing that the solutions found are the sparsest pos...

1021 | Multi-user Detection
- Verdu
- 1998
Citation Context: ... − x0 is a measure of the disturbance to exact reconstruction caused by multiple nonzeros in x0. The same notion arises in digital communications where it is called Multiple-Access Interference (MAI) [60]. Perhaps surprisingly - because there is no noise in the problem - the MAI in our setting typically has a Gaussian behavior. [figure axis residue omitted; panel (a): matched filteri...]

840 | Near optimal signal recovery from random projections: Universal encoding strategies
- Candès, Tao
- 2006
Citation Context: ...e n < N means that we collect fewer data than unknowns. Despite the indeterminacy, sparsity of x0 allows for accurate reconstruction of the object from what would naively seem to be ‘too few samples’ [17, 8, 48]. App2: Error Correction. Information is transmitted in a coded block in which a small fraction of the entries may be corrupted. From the received data, one constructs a system y = Φx0; here x0 repres...

681 | Adapting to unknown smoothness via wavelet shrinkage
- Donoho, Johnstone
- 1995
Citation Context: ...ruly nonzero’ entries, combined with a large number of ‘Gaussian noise’ entries. The problem of separating ‘signal’ from ‘noise’ in such problems has generated a large literature including the papers [24, 27, 26, 1, 23, 37], which influenced our way of thinking. We adopt language from statistical decision theory [39] and the field of multiple comparisons [38]. Recall that the support I0 of x0 is being (crudely) estimate...

553 | Ideal spatial adaptation via wavelet shrinkage. Biometrika 81
- Donoho, Johnstone
- 1994
Citation Context: ...ng growing attention. Over the years, several applications have been found where signals of interest have sparse representations and exploiting this sparsity offers striking benefits; see for example [11, 28, 26, 25, 7]. At the ICASSP 2005 conference a special session addressed the theme of exploiting sparsity, and a recent international workshop, SPARS05, was largely devoted to this topic. Very recently, considerab...

527 | Greed is good: Algorithmic results for sparse approximation
- Tropp
Citation Context: ... is much too slow for large-scale applications. Some have advocated a heuristic approach, Orthogonal Matching Pursuit (OMP) (also called greedy approximation and stepwise regression in other fields) [43, 52, 53, 55, 54], which, though often effective in empirical work, does not offer the strong theoretical guarantees that attach to ℓ1 minimization. (For other heuristic approaches, see [50, 51, 29].) In this paper we ...

360 | Introduction to Space-Time Wireless Communications - Paulraj, Nabar, et al. - 2003 |

343 | For most large underdetermined systems of linear equations, the minimal ℓ1 solution is also the sparsest solution
- Donoho
- 2006
Citation Context: ... Also called Basis Pursuit (BP) [11], this method enjoys some particularly striking theoretical properties, such as rigorous proofs of exact reconstruction under seemingly quite general circumstances [21, 35, 32, 7, 16, 8, 17, 18]. Unfortunately, some of the most powerful theoretical results are associated with fairly heavy computational burdens. The research reported here began when, in applying the theory of compressed sens...

225 | On the inherent intractability of certain coding problems
- Berlekamp, McEliece, van Tilborg
- 1978
Citation Context: ...ted numbers with the ability to correct up to k arbitrary errors; unfortunately, for general linear coding schemes the task of identifying the k most likely sites for the errors is known to be NP-hard [3]. Lately, there has been much interest in developing good fast decoding schemes. The literature in IEEE Transactions on Information Theory on message passing decoding and turbo decoding is literally t...

217 | Spectral efficiency of CDMA with random spreading
- Verdú, Shamai
- 1999
Citation Context: ...mation means that the MAI has an approximately Gaussian marginal distribution. (The reader interested in formal proofs of Gaussian approximation can consult the literature of multiuser detection, e.g. [46, 61, 12]; such a proof is implicit in the proofs of Theorems 1 and 2 below. The connection between our work and MUD theory will be amplified in Section 11 below.) Properly speaking, the term ‘MAI’ applies onl...

215 | Sparse representations in unions of bases
- Gribonval, Nielsen
Citation Context: ... Also called Basis Pursuit (BP) [11], this method enjoys some particularly striking theoretical properties, such as rigorous proofs of exact reconstruction under seemingly quite general circumstances [21, 35, 32, 7, 16, 8, 17, 18]. Unfortunately, some of the most powerful theoretical results are associated with fairly heavy computational burdens. The research reported here began when, in applying the theory of compressed sens...

158 | On sparse representations in arbitrary redundant bases
- Fuchs
- 2004
Citation Context: ... Also called Basis Pursuit (BP) [11], this method enjoys some particularly striking theoretical properties, such as rigorous proofs of exact reconstruction under seemingly quite general circumstances [21, 35, 32, 7, 16, 8, 17, 18]. Unfortunately, some of the most powerful theoretical results are associated with fairly heavy computational burdens. The research reported here began when, in applying the theory of compressed sens...

147 | Signal recovery from partial information via Orthogonal Matching Pursuit
- Tropp, Gilbert
Citation Context: ... is much too slow for large-scale applications. Some have advocated a heuristic approach, Orthogonal Matching Pursuit (OMP) (also called greedy approximation and stepwise regression in other fields) [43, 52, 53, 55, 54], which, though often effective in empirical work, does not offer the strong theoretical guarantees that attach to ℓ1 minimization. (For other heuristic approaches, see [50, 51, 29].) In this paper we ...

144 | Probability of error in MMSE multiuser detection
- Poor, Verdú
- 1997
Citation Context: ...mation means that the MAI has an approximately Gaussian marginal distribution. (The reader interested in formal proofs of Gaussian approximation can consult the literature of multiuser detection, e.g. [46, 61, 12]; such a proof is implicit in the proofs of Theorems 1 and 2 below. The connection between our work and MUD theory will be amplified in Section 11 below.) Properly speaking, the term ‘MAI’ applies onl...

138 | Tensor Methods in Statistics - McCullagh - 1987 |

129 | Image decomposition via the combination of sparse representations and a variational approach - Starck, Elad, et al. - 2005 |

109 | Adapting to unknown sparsity by controlling the false discovery rate
- Abramovich, Benjamini, et al.
- 2006
Citation Context: ...mple x̂S = x0. (3.1) We give a simple example showing that the procedure works in a special case. We generated a coefficient vector x0 with k = 32 nonzeros, having amplitudes uniformly distributed on [0, 1]. We sampled a matrix Φ at random from the USE with n = 256, N = 1024, and computed a linear measurement vector y = Φx0. Thus the problem of recovering x0 given y is 1 : 4 underdetermined (i.e. δ = n/...
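The experimental setup described in this snippet (k = 32 nonzeros with amplitudes uniform on [0, 1], Φ from the USE with n = 256, N = 1024) can be reproduced as follows. Sampling USE columns as normalized i.i.d. Gaussian vectors is a standard construction; the variable names and the seed are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
n, N, k = 256, 1024, 32            # delta = n/N = 0.25: 1:4 underdetermined

# Uniform Spherical Ensemble: columns uniform on the unit sphere S^{n-1},
# obtained by normalizing i.i.d. standard Gaussian vectors.
Phi = rng.standard_normal((n, N))
Phi /= np.linalg.norm(Phi, axis=0)

# k-sparse coefficient vector with amplitudes uniform on [0, 1]
x0 = np.zeros(N)
support = rng.choice(N, size=k, replace=False)
x0[support] = rng.uniform(size=k)

y = Phi @ x0                       # the linear measurement vector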

89 | Wavelab and reproducible research
- Buckheit, Donoho
- 1995
Citation Context: ...ance of StOMP and the thresholding strategies in concrete instances, inspired by applications in spectroscopy and imaging. 9.1.1 Bumps Our first example uses the object Bumps from the Wavelab package [5], rendered with N = 4096 samples. This object, shown in panel (a) of Figure 12, is a caricature of signals arising in NMR spectroscopy, characterized by a few localized peaks. Such signals are known t...

89 | Just relax: Convex programming methods for subset selection and sparse approximation
- Tropp
- 2004
Citation Context: ...ons. However, we are seeking a sparse solution, and for certain matrices Φ, sparsity will prove a powerful constraint. Some of the rapidly accumulating literature documenting this phenomenon includes [21, 20, 32, 55, 56, 50, 51, 8, 18, 16, 57, 58, 48]. For now, we consider a specific collection of matrices where sparsity proves valuable. Until we say otherwise, let Φ be a random matrix taken from the Uniform Spherical ensemble (USE); the columns o...

87 | Geometric approach to error correcting codes and reconstruction of signals
- Rudelson, Vershynin
- 2005
Citation Context: ...e n < N means that we collect fewer data than unknowns. Despite the indeterminacy, sparsity of x0 allows for accurate reconstruction of the object from what would naively seem to be ‘too few samples’ [17, 8, 48]. App2: Error Correction. Information is transmitted in a coded block in which a small fraction of the entries may be corrupted. From the received data, one constructs a system y = Φx0; here x0 repres...

85 | Neighborly polytopes and sparse solution of underdetermined linear equations. Stanford dept. of statistics working paper
- Donoho
- 2004
Citation Context: ...11.5 Fine Points Two small but essential points: • The notion of phase transition considered here is weaker than notions often mentioned in connection with the study of ℓ1 optimization. As the papers [18, 19] may help clarify, much of the literature discusses the notion of strong equivalence of ℓ1 and ℓ0; in this notion, for a given Φ, every sparse x0 generates a problem (y, Φ) for which the ℓ1 soluti...

80 | Near-optimal sparse Fourier representations via sampling
- Gilbert, Guha, et al.
Citation Context: ... as well. 9 Stylized Applications We now illustrate the performance of StOMP and the thresholding strategies. 9.1 Compressed Sensing Recently, there has been considerable interest both from theorists [33, 7, 17, 8] and from practitioners [42, 47, 31, 40, 41] in the possibility of dramatically reducing the ‘number of samples’ that ‘have to be measured’ in various remote sensing problems. In effect, one views the...

80 | Redundant multiscale transforms and their application for morphological component analysis
- Starck, Elad, et al.
Citation Context: ...n overcomplete expansion; those terms are the columns of Φ. The sparsity allows recovery of a unique representation using only a few terms, despite the fact that the representation is in general nonunique [43, 11, 21, 20, 50, 51]. In these applications, several algorithms are available to pursue sparse solutions; in some cases attractive theoretical results are known, guaranteeing that the solutions found are the sparsest pos...

65 | Maximum entropy and the nearly black object
- Donoho, Johnstone, et al.
- 1992

63 | Analyzing the Turbo decoder using the Gaussian approximation
- El Gamal, Hammons
- 2001
Citation Context: ...iterative decoder. Finally, the specific analysis technique we have developed – the conditioned Gaussian limit – bears some resemblance to density evolution schemes used in the MUD/CDMA literature, e.g. [4, 6, 13, 30]. However, there are important differences in the problem, the primary one being that in the binary CDMA case the vector x0 is not sparse and takes known values ±1, which cause important differences i...

56 | Needles and straw in haystacks: Empirical Bayes estimates of possibly sparse sequences
- Johnstone, Silverman
- 2004
Citation Context: ...ruly nonzero’ entries, combined with a large number of ‘Gaussian noise’ entries. The problem of separating ‘signal’ from ‘noise’ in such problems has generated a large literature including the papers [24, 27, 26, 1, 23, 37], which influenced our way of thinking. We adopt language from statistical decision theory [39] and the field of multiple comparisons [38]. Recall that the support I0 of x0 is being (crudely) estimate...

48 | Output MAI distributions of linear MMSE multiuser receivers in DS–CDMA systems
- Zhang, Chong, et al.
Citation Context: ...mation means that the MAI has an approximately Gaussian marginal distribution. (The reader interested in formal proofs of Gaussian approximation can consult the literature of multiuser detection, e.g. [46, 61, 12]; such a proof is implicit in the proofs of Theorems 1 and 2 below. The connection between our work and MUD theory will be amplified in Section 11 below.) Properly speaking, the term ‘MAI’ applies onl...

47 | Nonlinear methods of approximation
- Temlyakov
- 2002
Citation Context: ... is much too slow for large-scale applications. Some have advocated a heuristic approach, Orthogonal Matching Pursuit (OMP) (also called greedy approximation and stepwise regression in other fields) [43, 52, 53, 55, 54], which, though often effective in empirical work, does not offer the strong theoretical guarantees that attach to ℓ1 minimization. (For other heuristic approaches, see [50, 51, 29].) In this paper we ...

39 | Iterative multiuser joint decoding: optimal power allocation and low-complexity implementation
- Caire, Mueller, et al.
- 2004
Citation Context: ... Poor, Verdú, and others [46, 61, 12, 36]. Also, the idea that sophisticated multistage algorithms can be applied to successively reduce MAI – e.g. onion-peeling schemes [10] or iterative decoders [4, 6, 13] - is completely consistent with our approach in this paper: stagewise least-squares projection when the nonzeros in x0 have a power-law distribution is something like onion-peeling; ℓ1 minimization i...

38 | Asymptotic normality of linear multiuser receiver outputs
- Guo, Verdu, et al.
- 2002
Citation Context: ... be sparse; and (b) Φ need not have more columns than rows. In the MUD literature, the idea that z = x0 − Φᵀy looks like Gaussian noise is clearly established in the work of Poor, Verdú, and others [46, 61, 12, 36]. Also, the idea that sophisticated multistage algorithms can be applied to successively reduce MAI – e.g. onion-peeling schemes [10] or iterative decoders [4, 6, 13] - is completely consistent wit...

32 | Signal recovery and the large sieve
- Donoho, Logan
- 1992

23 | PDCO: Primal-Dual interior method for Convex Objectives - Saunders, Kim - 2007 |

20 | Optimally Sparse Representation from Overcomplete Dictionaries via ℓ1-norm minimization
- Donoho, Elad
- 2002
Citation Context: ...n overcomplete expansion; those terms are the columns of Φ. The sparsity allows recovery of a unique representation using only a few terms, despite the fact that the representation is in general nonunique [43, 11, 21, 20, 50, 51]. In these applications, several algorithms are available to pursue sparse solutions; in some cases attractive theoretical results are known, guaranteeing that the solutions found are the sparsest pos...

20 | Matrix Computations, third edition, The Johns Hopkins University Press
- Golub, Van Loan
- 1996
Citation Context: ...onal flops. The core of the computation lies in calculating the projection of y onto the subset of columns Φ_{Is}, to get a new approximation xs. This is implemented via a Conjugate Gradient (CG) solver [34]. Each CG iteration involves application of Φ_{Is} and Φᵀ_{Is}, costing at most 2nN + O(N) flops. The number of CG iterations used is a small constant, independent of n and N, which we denote ν. In our im...
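The projection step described in this snippet can be sketched with a hand-rolled conjugate gradient applied to the normal equations. This is an illustrative stand-in, not the paper's implementation: the function name and the fixed iteration count (playing the role of the small constant ν) are ours. Each iteration applies Φ_{Is} and its transpose exactly once, matching the 2nN + O(N) per-iteration cost quoted above.

```python
import numpy as np

def project_cg(Phi_I, y, n_iters=10):
    """Approximate argmin_x ||y - Phi_I x|| by CG on Phi_I^T Phi_I x = Phi_I^T y."""
    x = np.zeros(Phi_I.shape[1])
    r = Phi_I.T @ y          # normal-equation residual at the zero start
    p = r.copy()
    rs = r @ r
    for _ in range(n_iters):
        if rs < 1e-28:
            break                          # already converged
        Ap = Phi_I.T @ (Phi_I @ p)         # one apply of Phi_I and one of Phi_I^T
        alpha = rs / (p @ Ap)
        x += alpha * p                     # step along the search direction
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p          # conjugate update of the direction
        rs = rs_new
    return x
```

In exact arithmetic CG solves the normal equations in at most |Is| iterations, which is why a small fixed count suffices when the active set is well conditioned.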

19 | Minimax risk over ℓp-balls for ℓq-error. Probability Theory and Related Fields
- Donoho, Johnstone
- 1994
Citation Context: ...ruly nonzero’ entries, combined with a large number of ‘Gaussian noise’ entries. The problem of separating ‘signal’ from ‘noise’ in such problems has generated a large literature including the papers [24, 27, 26, 1, 23, 37], which influenced our way of thinking. We adopt language from statistical decision theory [39] and the field of multiple comparisons [38]. Recall that the support I0 of x0 is being (crudely) estimate...

16 | Testing Statistical Hypotheses, 3rd edition
- Lehmann, Romano
- 2010
Citation Context: ...’ from ‘noise’ in such problems has generated a large literature including the papers [24, 27, 26, 1, 23, 37], which influenced our way of thinking. We adopt language from statistical decision theory [39] and the field of multiple comparisons [38]. Recall that the support I0 of x0 is being (crudely) estimated in the StOMP algorithm. If a coordinate belonging to I0 does not appear in IS, we call this a...

15 | An iterative thresholding algorithm for linear inverse problems with a sparsity constraint
- Daubechies, Defrise, De Mol
- 2004
Citation Context: ... columns surviving thresholding. The use of alternating thresholding to obtain sparse solutions has been extensively deployed by Starck and co-workers [50, 51] and studied by Daubechies and co-workers [15]. An early reference applying a kind of alternating thresholding to seek sparse solutions to underdetermined systems was Coifman and Wickerhauser (1993) [14]. To our knowledge, the alternating thresho...

15 | High-dimensional centrosymmetric polytopes with neighborliness proportional to dimension
- Donoho
- 2006
Citation Context: ...is perfectly sharp. Suppose that problem (y, Φ) is drawn at random from the standard problem suite, and consider the event E_{k,n,N} that x0 = x1, i.e. that ℓ1 minimization exactly recovers x0. The paper [19] defines a function ρ_{ℓ1}(δ) (called there ρ_W) with the following property. Consider sequences (k_n), (N_n) obeying k_n/n → ρ and n/N_n → δ. Suppose that ρ < ρ_{ℓ1}(δ). Then as n → ∞, Prob(E_{k_n,n,N_n}) → 1...

14 | Asymptotic minimaxity of false discovery rate thresholding for sparse exponential data
- Donoho, Jin
- 2006

11 | Decoding via linear programming
- Candès, Tao
- 2005
Citation Context: ...rature discusses the notion of strong equivalence of ℓ1 and ℓ0; in this notion, for a given Φ, every sparse x0 generates a problem (y, Φ) for which the ℓ1 solution is the unique sparsest solution [21, 20, 16, 32, 35, 48, 55, 8, 9]. In contrast, in discussing ℓ1 minimization in Section 4.2 above, we used the notion of weak equivalence, which says that equivalence holds for the typical sparse x0 (rather than for every sparse x0)...

11 | Optimum Asymptotic Multiuser Efficiency of Randomly Spread CDMA - Tse, Verdú - 2000 |

7 | Breakdown of Equivalence between the Minimal ℓ1-norm Solution and the Sparsest Solution
- Tsaig, Donoho
- 2006
Citation Context: ...ons. However, we are seeking a sparse solution, and for certain matrices Φ, sparsity will prove a powerful constraint. Some of the rapidly accumulating literature documenting this phenomenon includes [21, 20, 32, 55, 56, 50, 51, 8, 18, 16, 57, 58, 48]. For now, we consider a specific collection of matrices where sparsity proves valuable. Until we say otherwise, let Φ be a random matrix taken from the Uniform Spherical ensemble (USE); the columns o...

6 | Greedy algorithms and m-term approximation
- Temlyakov
- 1999

5 | Uncertainty Principles and Ideal Atomic Decomposition
- Donoho, Huo
- 2001

5 | Rapid MR angiography with randomly under-sampled 3DFT trajectories and non-linear reconstruction
- Lustig, Santos, et al.
- 2006
Citation Context: ...ow illustrate the performance of StOMP and the thresholding strategies. 9.1 Compressed Sensing Recently, there has been considerable interest both from theorists [33, 7, 17, 8] and from practitioners [42, 47, 31, 40, 41] in the possibility of dramatically reducing the ‘number of samples’ that ‘have to be measured’ in various remote sensing problems. In effect, one views the problem as one of reconstructing a high-dim...

3 | Iterative multiuser joint decoding: unified framework and asymptotic analysis
- Boutros, Caire
- 2002
Citation Context: ... Poor, Verdú, and others [46, 61, 12, 36]. Also, the idea that sophisticated multistage algorithms can be applied to successively reduce MAI – e.g. onion-peeling schemes [10] or iterative decoders [4, 6, 13] - is completely consistent with our approach in this paper: stagewise least-squares projection when the nonzeros in x0 have a power-law distribution is something like onion-peeling; ℓ1 minimization i...