## From Sparse Solutions of Systems of Equations to Sparse Modeling of Signals and Images (2007)

### Download Links

- [www.cs.technion.ac.il]
- [www.ece.duke.edu]
- [people.ee.duke.edu]

Citations: 225 (31 self)

### BibTeX

```bibtex
@MISC{Bruckstein07fromsparse,
  author = {Alfred M. Bruckstein and David L. Donoho and Michael Elad},
  title  = {From Sparse Solutions of Systems of Equations to Sparse Modeling of Signals and Images},
  year   = {2007}
}
```

### Abstract

A full-rank matrix A ∈ ℝ^{n×m} with n < m generates an underdetermined system of linear equations Ax = b having infinitely many solutions. Suppose we seek the sparsest solution, i.e., the one with the fewest nonzero entries: can it ever be unique? If so, when? As optimization of sparsity is combinatorial in nature, are there efficient methods for finding the sparsest solution? These questions have been answered positively and constructively in recent years, exposing a wide variety of surprising phenomena, in particular the existence of easily verifiable conditions under which optimally sparse solutions can be found by concrete, effective computational methods. Such theoretical results inspire a bold perspective on some important practical problems in signal and image processing. Several well-known signal and image processing problems can be cast as demanding solutions of underdetermined systems of equations. Such problems have previously seemed, to many, intractable. There is considerable evidence that these problems often have sparse solutions. Hence, advances in finding sparse solutions to underdetermined systems energize research on such signal and image processing problems – to striking effect. In this paper we review the theoretical results on sparse solutions of linear systems, empirical...

### Citations

5035 | Matrix Analysis
- Horn, Johnson
- 1985
Citation Context: ...| ≤ µ : 1 ≤ k, j ≤ m, k ≠ j}. Consider an arbitrary minor from G of size p × p, built by choosing a subgroup of p columns from Ã and computing their sub-Gram matrix. From the Gershgorin disk theorem [91], if this minor is diagonally dominant—i.e., if ∑_{j≠i} |G_{i,j}| < |G_{i,i}| for every i—then this submatrix of G is positive definite, and so those p columns from Ã are linearly independent. The condition ...
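
The Gershgorin argument in this excerpt is easy to check numerically. The sketch below is an illustration of the quoted reasoning with toy data of my own choosing (not from the paper): it forms the Gram matrix of a few unit-norm columns and confirms that diagonal dominance implies positive definiteness, hence linear independence.

```python
import numpy as np

# Three nearly orthogonal directions in R^4: coordinate spikes plus a small
# common offset (toy data chosen so the dominance condition below holds).
V = np.eye(4)[:, :3] + 0.2
A = V / np.linalg.norm(V, axis=0)          # unit-norm columns => G has unit diagonal

G = A.T @ A                                 # sub-Gram matrix of the chosen columns
radius = np.abs(G).sum(axis=1) - 1.0        # Gershgorin disk radii (off-diagonal row sums)

# Every eigenvalue of G lies in a disk centered at G[i,i] = 1 with radius radius[i],
# so diagonal dominance (radius[i] < 1) forces all eigenvalues to be positive.
assert np.all(radius < 1.0)
assert np.all(np.linalg.eigvalsh(G) > 0)    # positive definite: columns independent
```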

2287 | A wavelet tour of signal processing
- Mallat
- 1999
Citation Context: ...e then processed to produce the encoded bitstream. JPEG relies on the discrete cosine transform (DCT)—a variant of the Fourier transform—while JPEG-2000 relies on the discrete wavelet transform (DWT) [116]. These transforms can be viewed analytically as rotations of coordinate axes from the standard Euclidean basis to a new basis. Why does it make sense to change coordinates in this way? Sparsity provi...
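
To illustrate the sparsity that transform coding exploits, a toy sketch (using `scipy.fft.dct`, not the actual JPEG pipeline): the orthonormal DCT-II of a smooth cosine concentrates essentially all of the signal's energy in a single coefficient. The signal and sizes are my own choices.

```python
import numpy as np
from scipy.fft import dct

n, k0 = 64, 5
t = np.arange(n)
y = np.cos(np.pi * k0 * (2 * t + 1) / (2 * n))   # a smooth cosine (a DCT-II mode)

c = dct(y, norm='ortho')                          # orthonormal DCT-II: a rotation of axes
ratio = np.max(c**2) / np.sum(c**2)

# One coefficient carries essentially all of the energy: the sparsity that
# transform coding (JPEG / JPEG-2000) exploits after quantization.
assert ratio > 0.999
```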

1866 | Compressed sensing
- Donoho
- 2006
Citation Context: ...tries, rather than y, which has n. Attempt recovery by solving min_x ‖x‖0 subject to ‖c − PAx‖2 ≤ ɛ to obtain the sparse representation, and then synthesize an approximate reconstruction using Ax_ɛ^0 [18, 15, 17, 58, 42]. • Morphological Component Analysis (MCA). Suppose that the observed signal is a superposition of two different subsignals y1, y2 (i.e., y = y1 + y2), where y1 is sparsely generated using model M1 an...

1779 | Atomic decomposition by basis pursuit
- Chen
- 1998
Citation Context: ...l local minimum will actually be a good approximation to a global minimum of (P0). Another strategy is to replace the ℓ0 norm by the ℓ1 norm, which is, in a natural sense, its best convex approximant [24, 25, 142]; many optimization tools are available "off the shelf" for solving (P1). Turning from (P0) to its regularizations (Pp) with 0 < p ≤ 1, care must be taken with respect to normalization of the columns in...
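
The "off the shelf" route mentioned here can be sketched concretely: (P1), min ‖x‖1 subject to Ax = b, becomes a linear program after splitting x = u − v with u, v ≥ 0, so that ∑(u + v) = ‖x‖1 at the optimum. The toy instance below (my own, using `scipy.optimize.linprog`) is only a sketch of that reduction; for sufficiently sparse ground truth the minimizer typically coincides with it.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 20))
x_true = np.zeros(20)
x_true[[3, 11]] = [2.0, -1.5]                 # a 2-sparse feasible point
b = A @ x_true

# min sum(u + v)  s.t.  A(u - v) = b,  u, v >= 0   (then x = u - v)
m, n = A.shape
res = linprog(c=np.ones(2 * n),
              A_eq=np.hstack([A, -A]), b_eq=b,
              bounds=[(0, None)] * (2 * n))
x_hat = res.x[:n] - res.x[n:]

assert res.status == 0                                        # LP solved
assert np.allclose(A @ x_hat, b, atol=1e-6)                   # feasible
assert np.abs(x_hat).sum() <= np.abs(x_true).sum() + 1e-6     # l1-optimal
```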

1754 | Vector Quantization and Signal Compression
- Gersho, Gray
- 1992
Citation Context: ...tation coefficients to be binary (1 or 0), the above-posed problem reduces to a clustering task. Furthermore, in such a case the above training algorithms simplify to the well-known K-means algorithm [81]. While each iteration of K-means computes means over K different subsets, the K-SVD algorithm performs the SVD over each of K different submatrices, hence the name K-SVD (K is assumed to be the numbe...

1509 | Nonlinear total variation based noise removal algorithms
- Rudin, Osher, et al.
- 1992
Citation Context: ...+ λ‖Ψ^T y‖1. We now step out of the Bayesian framework and interpret these optimization criteria as algorithm generators. We recognize that the first generates a variant of total-variation denoising [141, 41], while the second generates an instance of wavelet denoising—both very successful algorithms with hundreds of application papers in print. The frequent success of these algorithms causes difficulties...

1401 | Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information - Candès, Romberg, et al.

1136 | Matching pursuit with time-frequency dictionaries
- Mallat, Zhang
- 1993
Citation Context: ...fficult to evaluate as solving (P0). Thus, simpler ways to guarantee uniqueness are of interest. A very simple way exploits the mutual coherence of the matrix A, defined as follows. Definition 3 (see [118, 49, 46]). The mutual coherence of a given matrix A is the largest absolute normalized inner product between different columns from A. Denoting the kth column in A by a_k, the mutual coherence is given by ...
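
Definition 3 translates directly into a few lines of code. The matrix below is a toy example of my own (coordinate spikes plus one flat atom), not one from the paper; for the flat atom in ℝ⁴ the coherence works out to 1/√4 = 0.5.

```python
import numpy as np

def mutual_coherence(A):
    """Largest absolute normalized inner product between distinct columns of A."""
    An = A / np.linalg.norm(A, axis=0)     # normalize columns
    G = np.abs(An.T @ An)                  # absolute Gram matrix
    np.fill_diagonal(G, 0.0)               # ignore self inner products
    return G.max()

# Spikes plus a single flat atom in R^4: |<e_i, flat>| = 1/sqrt(4) = 0.5.
A = np.hstack([np.eye(4), np.full((4, 1), 0.5)])
assert np.isclose(mutual_coherence(A), 0.5)
```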

1083 | Learning the parts of objects by non-negative matrix factorization
- Lee, Seung
- 1999
Citation Context: ...he matrix Y as AX, where A and X have the indicated shapes and X has sparse columns. The matrix factorization viewpoint connects this problem with related problems of nonnegative matrix factorization [104, 55] and sparse nonnegative matrix factorization [92, 1]. Clearly, there is no general practical algorithm for solving problem (52) or (53), for the same reasons that there is no general practical algorit...

991 | Emergence of simple-cell receptive field properties by learning a sparse code for natural images
- Field, Olshausen
- 1996
Citation Context: ...own model M{A, k0, α, ɛ}. Can this training database allow us to identify the generating model, specifically the dictionary A? This rather difficult problem was studied initially by Olshausen and Field [128, 126, 127], who were motivated by an analogy between the atoms of a dictionary and the population of simple ...

909 | Ideal spatial adaptation by wavelet shrinkage, Biometrika 81
- Donoho, Johnstone
- 1994
Citation Context: ...n is key to widely used techniques of transform-based image compression. Transform sparsity is also a driving factor for other important signal and image processing problems, including image denoising [50, 51, 27, 43, 53, 52, 144, 124, 96] and image deblurring [76, 75, 74, 41]. Repeatedly, it has been shown that a better representation technique—one that leads to more sparsity—can be the basis for a practically better solution to such ...

896 | De-noising by soft thresholding
- Donoho
- 1995
Citation Context: ...n is key to widely used techniques of transform-based image compression. Transform sparsity is also a driving factor for other important signal and image processing problems, including image denoising [50, 51, 27, 43, 53, 52, 144, 124, 96] and image deblurring [76, 75, 74, 41]. Repeatedly, it has been shown that a better representation technique—one that leads to more sparsity—can be the basis for a practically better solution to such ...

888 | Near-optimal signal recovery from random projections: Universal encoding strategies - Candès, Tao

803 | Stable signal recovery from incomplete and inaccurate measurements
- Candès, Romberg, et al.
Citation Context: ...ilarity of the coherence-based results for the two methods. A great deal of algorithmic progress was made while this paper was in review and revision. We mention only two examples. Candès and Romberg [14] have developed a fast approximate ℓ1 solver using projections onto convex sets. Stephen Boyd and coworkers [100] have found a way to speed up standard interior-point methods so that, when the solutio...

705 | Decoding by linear programming
- Candès, Tao
- 2005
Citation Context: ...k ≈ r(δ)n for an unspecified function r > 0. Candès and Tao considered random Gaussian matrices and were able to show that k ≤ r(m/n)n was sufficient for equivalence for a certain explicit function r [18]. These qualitative results opened the way to asking for the precise quantitative behavior, i.e., for ρW above. • Tropp, Gilbert, and coworkers [158] studied running OMP over the problem suite consist...

641 | Sparse coding with an overcomplete basis set: A strategy employed by V1? (Vision Research)
- Olshausen, Field
- 1997
Citation Context: ...own model M{A, k0, α, ɛ}. Can this training database allow us to identify the generating model, specifically the dictionary A? This rather difficult problem was studied initially by Olshausen and Field [128, 126, 127], who were motivated by an analogy between the atoms of a dictionary and the population of simple ...

559 | Greed is good: Algorithmic results for sparse approximation
- Tropp
Citation Context: ...obtained by standard optimization tools—in fact, linear programming. Even more surprisingly, for the same class A, some very simple greedy algorithms (GAs) can also find the sparsest solution to (P0) [156]. Today many pure and applied mathematicians are pursuing results concerning sparse solutions to underdetermined systems of linear equations. The results achieved so far range from identifying conditi...

462 | Shiftable Multi-scale Transforms
- Simoncelli, Freeman, et al.
- 1992
Citation Context: ...A. How can we wisely choose A to perform well on the signals we have in mind? One line of work considered choosing preconstructed dictionaries, such as undecimated wavelets [149], steerable wavelets [145, 37, 136], contourlets [38, 39, 40, 70, 71], curvelets [146, 12], and others [22, 123]. These are generally suitable for stylized "cartoon-like" image content, assumed to be piecewise smooth and with smooth bo...

460 | An iterative thresholding algorithm for linear inverse problems with a sparsity constraint - Daubechies, Defrise, et al.

437 | K-SVD: An algorithm for designing overcomplete dictionaries for sparse representations - Aharon, Elad, et al. - 2006

394 | Optimally sparse representation in general (nonorthogonal) dictionaries via l1 minimization
- Donoho, Elad
Citation Context: ...= J1(x) = ‖x‖1. This problem is somehow intermediate between (P2) and (P0). It is a convex optimization problem, and among convex problems it is in some sense the one closest to (P0). We will see below [49, 93, 46] that for matrices A with incoherent columns, whenever (P0) has a sufficiently sparse solution, that solution is unique and is equal to the solution of (P1). Since (P1) is convex, the solution can thu...

386 | Image denoising using scale mixtures of Gaussians in the wavelet domain
- Portilla, Strela, et al.
Citation Context: ...A. How can we wisely choose A to perform well on the signals we have in mind? One line of work considered choosing preconstructed dictionaries, such as undecimated wavelets [149], steerable wavelets [145, 37, 136], contourlets [38, 39, 40, 70, 71], curvelets [146, 12], and others [22, 123]. These are generally suitable for stylized "cartoon-like" image content, assumed to be piecewise smooth and with smooth bo...

381 | Uncertainty principles and ideal atomic decomposition
- Donoho, Huo
Citation Context: ...rinciple was true; while the uncertainty principle guarantees that the number of nonzeros in the combined time-frequency analysis must exceed √n, in fact the typical number is closer to n. Also, in [49] simulations very much like those reported above in section 3.3.1 were presented to show that the equivalence between ℓ1 and ℓ0 representations is typical at surprisingly weak levels of sparsity; in f...

369 | Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition
- Pati, Rezaiifar, et al.
- 1993
Citation Context: ...ns why this type of algorithm has earned the name "greedy algorithm" in approximation theory. Many variants on this algorithm are available, offering improvements in accuracy or in complexity or both [118, 34, 33, 23, 130, 30, 159, 82]. This family of GAs is well known and extensively used, and, in fact, these algorithms have been reinvented in various fields. In the setting of statistical modeling, greedy stepwise least squares is...
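
A minimal sketch of the greedy iteration this family of algorithms shares, in its standard OMP form (select the column most correlated with the residual, then re-fit by least squares on the support). The toy instance is my own; the orthonormal test matrix makes exact recovery deterministic, while the cited results extend the guarantee to incoherent overcomplete dictionaries.

```python
import numpy as np

def omp(A, b, k, tol=1e-10):
    """Greedy OMP sketch: grow the support one atom at a time, re-fitting
    all selected coefficients by least squares at every step."""
    support, xs, r = [], np.array([]), b.astype(float)
    for _ in range(k):
        if np.linalg.norm(r) <= tol:
            break
        support.append(int(np.argmax(np.abs(A.T @ r))))        # best-matching atom
        xs, *_ = np.linalg.lstsq(A[:, support], b, rcond=None)  # re-fit on support
        r = b - A[:, support] @ xs                               # update residual
    x = np.zeros(A.shape[1])
    x[support] = xs
    return x

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((12, 12)))  # orthonormal toy dictionary
x_true = np.zeros(12)
x_true[[2, 9]] = [1.0, -2.0]                        # 2-sparse ground truth
x_hat = omp(Q, Q @ x_true, k=2)
assert np.allclose(x_hat, x_true, atol=1e-8)
```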

362 | For Most Large Underdetermined Systems of Linear Equations the Minimal ℓ1-Norm Solution Is Also the Sparsest Solution (URL: http://stat.stanford.edu/˜donoho/Reports/2004)
- Donoho
- 2004
Citation Context: ...0% success. As the problem size increases, the transition from typicality of success to typicality of failure becomes increasingly sharp—in the large-n limit, perfectly sharp. A rigorous result from [44, 56, 57] explains the meaning of the curve in panel (a). Theorem 11. Fix a (δ, ρ) pair. At problem size n, set m_n = ⌊n/δ⌋ and k_n = ⌊nρ⌋. Draw a problem instance y = Ax at random with A an n × m_n matrix from the...

349 | JPEG 2000: Image Compression Fundamentals, Standards and Practice
- Taubman, Marcellin
- 2002
Citation Context: ...a can be fruitfully viewed as finding sparse solutions to underdetermined systems of linear equations. Many readers will be familiar with the media encoding standard JPEG and its successor, JPEG-2000 [153]. Both standards are based on the notion of transform encoding. The data vector representing the raw pixel samples is transformed—i.e., represented in a new coordinate system—and the resulting coord...

334 | Sparse approximate solutions to linear systems
- Natarajan
- 1995
Citation Context: ...columns of A with indices in S, and checking whether b = A_S x_S can be solved. The complexity of exhaustive search is exponential in m and, indeed, it has been proven that (P0) is, in general, NP-hard [125]. Thus, a mandatory and crucial set of questions arises: Can (P0) be efficiently solved by some other means? Can approximate solutions be accepted? How accurate can those be? What kind of approximatio...

330 | The contourlet transform: An efficient directional multiresolution image representation
- Do, Vetterli
Citation Context: ...A to perform well on the signals we have in mind? One line of work considered choosing preconstructed dictionaries, such as undecimated wavelets [149], steerable wavelets [145, 37, 136], contourlets [38, 39, 40, 70, 71], curvelets [146, 12], and others [22, 123]. These are generally suitable for stylized "cartoon-like" image content, assumed to be piecewise smooth and with smooth boundaries. Some of these papers pro...

324 | Image compression through wavelet transform coding
- DeVore, Jawerth, et al.
- 1992
Citation Context: ...ld in such cases seek a better prior. Careful empirical modeling of wavelet coefficients of images with edges has shown that, in many cases, the prior model p(y) ∝ exp(−λ‖Ty‖1) can indeed be improved [35, 144, 10]. The general form p(y) ∝ exp(−λ‖Ty‖_r^r) with 0 < r < 1 has been studied, and values...

324 | Signal recovery from random measurements via orthogonal matching pursuit
- Tropp, Gilbert
- 2007
Citation Context: ...t for equivalence for a certain explicit function r [18]. These qualitative results opened the way to asking for the precise quantitative behavior, i.e., for ρW above. • Tropp, Gilbert, and coworkers [158] studied running OMP over the problem suite consisting of the Gaussian matrix ensemble and k-sparse coefficient solutions; they showed that the sparsest solution is found with high probability provide...

322 | Non-negative matrix factorization with sparseness constraints
- Hoyer
- 2004
Citation Context: ...apes and X has sparse columns. The matrix factorization viewpoint connects this problem with related problems of nonnegative matrix factorization [104, 55] and sparse nonnegative matrix factorization [92, 1]. Clearly, there is no general practical algorithm for solving problem (52) or (53), for the same reasons that there is no general practical algorithm for solving (P0), only more so! However, just as ...

319 | Stable recovery of sparse overcomplete representations in the presence of noise
- Donoho, Elad, et al.
Citation Context: ...esponds to the OMP algorithm as described in Exhibit 1, and the other for BP (i.e., solving (P1) in place of (P0)). 2.3.1. The GA Solves (P0) in Sufficiently Sparse Cases. Theorem 6 (equivalence: OGA [156, 48]). For a system of linear equations Ax = b (A ∈ R^{n×m} full-rank with n < m), if a solution x exists obeying (9) ‖x‖0 < (1/2)(1 + 1/µ(A)), an OGA run with threshold parameter ɛ0 = 0 is guaranteed to find i...

310 | The Volume of Convex Bodies and Banach Space Geometry
- Pisier
- 1989
Citation Context: ...of (P1) and (P0) and in [158] in studying OMP. Other fundamental ideas include Kashin's results on n-widths of the octahedron [98], Milman's quotient of a subspace theorem, and Szarek's volume bounds [132], all reflecting the miraculous properties of ℓ1 norms when restricted to random subspaces, which lie at the heart of the (P0)–(P1) equivalence. Rudelson and Vershynin [140] have made very effective u...

310 | The curvelet transform for image denoising
- Starck, Candès, et al.
- 2002
Citation Context: ...coworkers [148, 147], where the image Barbara is decomposed into piecewise smooth (cartoon) and texture, using MCA as described above. They used a dictionary combining two representations: curvelets [146, 12, 13] for representing the cartoon part, and local overlapped DCT for the texture. The second row in this figure, taken from [68], presents inpainting results, where missing values (the text) are recovered...

307 | Image denoising via sparse and redundant representations over learned dictionaries
- Elad, Aharon
Citation Context: ...f results from JPEG, JPEG-2000, PCA, and sparse coding with K-SVD dictionary training. The values below each result show the PSNR. 6.2.2. Methodology and Algorithms. The denoising methods described in [63, 64] take a different approach: by training a dictionary on the image content directly. One option is to use a standard library of clean images, e.g., the Corel library of 60,000 images, and develop a sta...

284 | Learning overcomplete representations - Lewicki, Sejnowski - 2000

275 | Good quantum error correcting codes exist
- 1996
Citation Context: ...nformation theory, constructing error-correcting codes using a collection of orthogonal bases with minimal coherence, obtaining similar bounds on the mutual coherence for amalgams of orthogonal bases [11]. Mutual coherence, relatively easy to compute, allows us to lower-bound the spark, which is often hard to compute. Lemma 4 (see [46]). For any matrix A ∈ R^{n×m}, the following relationship holds: (7) ...
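
The bound of Lemma 4, spark(A) ≥ 1 + 1/µ(A) as stated in [46], can be verified directly on a toy matrix of my own. Note that the brute-force spark computation below is itself combinatorial, which is exactly why the easily computed coherence bound is valuable.

```python
import itertools
import numpy as np

def spark(A, tol=1e-10):
    """Smallest number of linearly dependent columns (combinatorial search)."""
    m = A.shape[1]
    for k in range(1, m + 1):
        for S in itertools.combinations(range(m), k):
            if np.linalg.matrix_rank(A[:, list(S)], tol=tol) < k:
                return k
    return m + 1          # columns in "general position"

def mutual_coherence(A):
    An = A / np.linalg.norm(A, axis=0)
    G = np.abs(An.T @ An)
    np.fill_diagonal(G, 0.0)
    return G.max()

# Spikes plus one flat atom in R^3: spark = 4, coherence = 1/sqrt(3),
# and the bound spark >= 1 + 1/mu indeed holds (4 >= 1 + sqrt(3)).
A = np.hstack([np.eye(3), np.full((3, 1), 1 / np.sqrt(3))])
assert spark(A) >= 1 + 1 / mutual_coherence(A)
```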

271 | New tight frames of curvelets and optimal representations of objects with C² singularities
Citation Context: ...ery powerful sparsity constraint. The weak ℓp norm is a popular measure of sparsity in the mathematical analysis community; models of cartoon images have sparse representations as measured in weak ℓp [26, 13]. Almost equivalent are the usual ℓp norms, defined by ‖x‖_p = (∑_i |x_i|^p)^{1/p}. These will seem more familiar objects than the weak ℓp norms, in the range 1 ≤ p ≤ ∞; however, for measuring sparsity, ...
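
The behavior of the ℓp cost for small p can be seen on two toy vectors of my own with equal ℓ2 energy: for p ≤ 1 the cost prefers the sparse vector, and as p → 0 it approaches the ℓ0 count of nonzeros.

```python
import numpy as np

def lp_cost(x, p):
    """The l_p^p cost sum |x_i|^p; as p -> 0 it counts the nonzeros (l_0)."""
    return np.sum(np.abs(x) ** p)

sparse = np.array([2.0, 0.0, 0.0, 0.0])
spread = np.array([1.0, 1.0, 1.0, 1.0])   # same l2 energy as `sparse`

# For p < 1 the sparse vector is much cheaper; near p = 0 the cost is
# essentially the number of nonzero entries.
assert lp_cost(sparse, 0.5) < lp_cost(spread, 0.5)
assert np.isclose(lp_cost(sparse, 0.01), np.count_nonzero(sparse), atol=0.01)
```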

265 | Minimax estimation via wavelet shrinkage
- Donoho, Johnstone
- 1998
Citation Context: ...n is key to widely used techniques of transform-based image compression. Transform sparsity is also a driving factor for other important signal and image processing problems, including image denoising [50, 51, 27, 43, 53, 52, 144, 124, 96] and image deblurring [76, 75, 74, 41]. Repeatedly, it has been shown that a better representation technique—one that leads to more sparsity—can be the basis for a practically better solution to such ...

257 | Adaptive wavelet thresholding for image denoising and compression
- Chang, Yu, et al.
Citation Context: ...resent edges than do Fourier and wavelet methods. By shrinkage of transform coefficients followed by reconstruction, some reduction in image noise is observed, while edges are approximately preserved [103, 19, 20, 21, 146, 136, 70, 71, 88, 89]. ...
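
A minimal shrinkage-denoising sketch in the spirit of this passage: soft-threshold the orthonormal DCT coefficients of a noisy signal, then transform back. The signal, noise level, and threshold are my own toy choices, not values from the paper.

```python
import numpy as np
from scipy.fft import dct, idct

def soft(c, t):
    """Soft-threshold (shrink) coefficients toward zero by t."""
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)

rng = np.random.default_rng(0)
n, k0 = 256, 4
u = np.arange(n)
clean = np.cos(np.pi * k0 * (2 * u + 1) / (2 * n))   # sparse in the DCT domain
noisy = clean + 0.1 * rng.standard_normal(n)

# Shrink in the transform domain, reconstruct in the signal domain.
denoised = idct(soft(dct(noisy, norm='ortho'), t=0.4), norm='ortho')

# Shrinkage removes most of the noise while barely touching the one
# large signal coefficient.
assert np.linalg.norm(denoised - clean) < np.linalg.norm(noisy - clean)
```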

256 | Wavelet shrinkage: Asymptopia - Donoho, Johnstone, et al. - 1995

252 | An EM algorithm for wavelet-based image restoration
- Figueiredo, Nowak
- 2003
Citation Context: ...ompression. Transform sparsity is also a driving factor for other important signal and image processing problems, including image denoising [50, 51, 27, 43, 53, 52, 144, 124, 96] and image deblurring [76, 75, 74, 41]. Repeatedly, it has been shown that a better representation technique—one that leads to more sparsity—can be the basis for a practically better solution to such problems. For example, it has been fou...

244 | Sparse MRI: The application of compressed sensing for rapid MR imaging
- Lustig, Donoho, et al.
- 2007
Citation Context: ...ures 5 and 6 two worked-out large-scale applications. Figure 5 presents compressed sensing of dynamic MRI—real-time acquisition of heart motion—by Michael Lustig and coworkers at the Stanford MRI lab [112, 111]. They obtain a successful reconstruction of moving imagery of the beating heart from raw pseudorandom samples of the k-t space, with a factor of 7 undersampling, i.e., they solve a system of equation...

242 | Translation-invariant de-noising
- Coifman, Donoho
- 1995

235 | Sparse signal reconstruction from limited data using FOCUSS: A re-weighted minimum norm algorithm
- Gorodnitsky, Rao
- 1997
Citation Context: ...k, as it calls for a combinatorial search over all possible subsets of columns from A. The importance of this property of matrices for the study of the uniqueness of sparse solutions was unraveled in [84]. Interestingly, this property previously appeared in the literature of psychometrics (termed Kruskal rank), used in the context of studying uniqueness of tensor decomposition [102, 110]. The spark is...

210 | Blind source separation by sparse decomposition in a signal dictionary
- Zibulevsky, Pearlmutter
- 2001
Citation Context: ...Such source separation problems are fundamental in the processing of acoustic signals, for example, in the separation of speech from impulsive noise by independent component analysis (ICA) algorithms [94, 164, 109]. Turning to the signal model presented here, if we could solve min_{x1,x2} ‖x1‖0 + ‖x2‖0 subject to ‖y − A1x1 − A2x2‖₂² ≤ ɛ₁² + ɛ₂², the resulting solution (x_1^ɛ, x_2^ɛ) would generate a plausib...

209 | Noise Removal via Bayesian Wavelet Coring
- Simoncelli, Adelson
- 1996

207 | Image Compression via Joint Statistical Characterization
- Buccigrossi, Simoncelli
- 1999
Citation Context: ...ld in such cases seek a better prior. Careful empirical modeling of wavelet coefficients of images with edges has shown that, in many cases, the prior model p(y) ∝ exp(−λ‖Ty‖1) can indeed be improved [35, 144, 10]. The general form p(y) ∝ exp(−λ‖Ty‖_r^r) with 0 < r < 1 has been studied, and values...

195 | A Wavelet Tour - Mallat - 1999

190 | Spatially adaptive wavelet thresholding with context modeling for image denoising - Chang, Yu, et al. - 2000