## Just relax: Convex programming methods for subset selection and sparse approximation (2004)


### Download Links

- [www.ices.utexas.edu]
- [users.cms.caltech.edu]


Citations: 90 (4 self)

### BibTeX

@TECHREPORT{Tropp04justrelax,
  author      = {Joel A. Tropp},
  title       = {Just relax: Convex programming methods for subset selection and sparse approximation},
  institution = {},
  year        = {2004}
}


### Abstract

Subset selection and sparse approximation problems request a good approximation of an input signal using a linear combination of elementary signals, yet they stipulate that the approximation may only involve a few of the elementary signals. This class of problems arises throughout electrical engineering, applied mathematics and statistics, but little theoretical progress has been made over the last fifty years. Subset selection and sparse approximation both admit natural convex relaxations, but the literature contains few results on the behavior of these relaxations for general input signals. This report demonstrates that the solution of the convex program frequently coincides with the solution of the original approximation problem. The proofs depend essentially on geometric properties of the ensemble of elementary signals. The results are powerful because sparse approximation problems are combinatorial, while convex programs can be solved in polynomial time with standard software. Comparable new results for a greedy algorithm, Orthogonal Matching Pursuit, are also stated. This report should have a major practical impact because the theory applies immediately to many real-world signal processing problems.

### Citations

4708 |
Matrix analysis
- Horn, Johnson
- 1990
(Show Context)
Citation Context ...mplicitly by Gilbert, Muthukrishnan, and Strauss in [GMS03]. The present result was first published by Donoho and Elad [DE03]. Proof. Consider the Gram matrix G = Φ_Λ^* Φ_Λ. The Geršgorin Disc Theorem [HJ85] states that every eigenvalue of G lies in one of the m discs ∆_λ = { z : |G_λλ − z| ≤ Σ_{ω≠λ} |G_λω| }, one disc for each λ in Λ. The normalization of the atoms implies that G_λλ ≡ 1. Meanwhile, the sum is boun...
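The Geršgorin disc argument quoted in this context is easy to check numerically. A minimal NumPy sketch, using a small made-up dictionary of unit-norm atoms (the matrix entries are purely illustrative, not from the report):

```python
import numpy as np

# Unit-norm atoms as columns; the values are purely illustrative.
Phi = np.array([[1.0, 0.6, 0.0],
                [0.0, 0.8, 0.6],
                [0.0, 0.0, 0.8]])
Phi /= np.linalg.norm(Phi, axis=0)   # ensure the Gram matrix has unit diagonal

G = Phi.T @ Phi                      # Gram matrix of the restricted dictionary
eigs = np.linalg.eigvalsh(G)

# Gershgorin: every eigenvalue lies in some disc centred at G[k, k] with
# radius equal to the off-diagonal absolute row sum.
radii = np.sum(np.abs(G), axis=1) - np.abs(np.diag(G))
covered = [any(abs(G[k, k] - lam) <= r + 1e-12 for k, r in enumerate(radii))
           for lam in eigs]
print("all eigenvalues inside some disc:", all(covered))   # True
```

Because the atoms are normalized, every disc is centred at 1, so the eigenvalues of G stay within the off-diagonal row sums of 1, which is how the coherence bounds in this report control the conditioning of subdictionaries.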

3736 |
Convex optimization
- Boyd, Vandenberghe
- 2004
(Show Context)
Citation Context ...tion approach replaces the nonconvex sparsity measure with a related convex function to obtain a convex programming problem. The convex program can be solved in polynomial time with standard software [BV04], and one expects that it will yield a good sparse approximation. Much more will be said in the sequel. (2) Greedy methods make a sequence of locally optimal choices in an effort to produce a good glo...

3296 | Convex Analysis
- ROCKAFELLAR
- 1970
(Show Context)
Citation Context ...subdifferential is additive, viz. ∂(f1 + f2)(z) = ∂f1(z) + ∂f2(z). It is straightforward (but tedious) to verify that the complex subdifferential satisfies all the properties of real subdifferentials [Roc70]. Lemma 4.6. Suppose that the vector b⋆ minimizes the objective function (4.1) over all coefficient vectors supported on Λ. A necessary and suffi...

2068 |
A Wavelet Tour of Signal Processing
- Mallat
- 1998
(Show Context)
Citation Context ... dictionary is orthonormal, one may solve the subset selection problem by applying a hard threshold operator (see Figure 7) with cutoff τ to each coefficient in the orthogonal expansion of the signal [Mal99]. In effect, one retains every atom whose inner product with the signal is larger than τ and discards the rest. This heuristic is nearly correct even if the dictionary is not orthonormal. Proposition ...
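The hard-threshold heuristic described in this context can be sketched in a few lines of NumPy. The dictionary (the identity, standing in for any orthobasis), the signal, the noise level, and the cutoff τ below are all illustrative choices, not taken from the report:

```python
import numpy as np

rng = np.random.default_rng(0)

d = 8
Phi = np.eye(d)                      # stand-in for any orthonormal dictionary

b_true = np.array([3.0, -2.5, 0.0, 0.0, 1.8, 0.0, 0.0, 0.0])
s = Phi @ b_true + 0.05 * rng.standard_normal(d)   # sparse signal plus light noise

tau = 1.0                            # illustrative cutoff
coeffs = Phi.T @ s                   # orthogonal expansion of the signal
keep = np.abs(coeffs) > tau          # hard threshold: retain the large coefficients
approx = Phi @ np.where(keep, coeffs, 0.0)

print("atoms retained:", np.flatnonzero(keep))
```

With an orthonormal dictionary the expansion coefficients decouple, so thresholding each one independently solves the subset selection problem exactly; the report's point is that the same recipe remains nearly correct for incoherent non-orthonormal dictionaries.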

1881 | Regression shrinkage and selection via the lasso
- Tibshirani
- 1996
(Show Context)
Citation Context ...sis and γ is related to the variance of the noise. Slightly later, Tibshirani proposed that (2.5), which he calls the lasso, could be used to solve subset selection problems in the stochastic setting [Tib96]. From here, it is only a short step to Basis Pursuit and Basis Pursuit de-noising [CDS99]. This history could not be complete without mention of parallel developments in the theoretical computer scie...

1688 | Atomic decomposition by basis pursuit
- Chen, Donoho, et al.
- 1999
(Show Context)
Citation Context ...lysis of audio [GB03], images [FViVK04] and video [NZ03]. Sparsity criteria also arise in deconvolution [TBM79], signal modeling [Ris79], pre-conditioning [GH97], machine learning [Gir98], de-noising [CDS99] and regularization [DDM03]. Most sparse approximation problems employ a linear model in which the collection of elementary signals is both linearly dependent and large. These models are often called ...

1665 |
Vector Quantization and Signal Compression
- Gersho, Gray
- 1991
(Show Context)
Citation Context ...e. An N-point optimal codebook for quantizing dx is a set that solves the mathematical program min_{|Y| = N} quant(Y). Optimal quantization is a difficult problem, and it has been studied extensively [GG92]. Heuristic methods are available for constructing good codebooks [Llo57, Csi84, Ros98]. Suppose that we define a probability measure dν on the projective space P^{d−1}(C). The expected error in quantiz...

1281 |
Combinatorial Optimization: Algorithms and Complexity
- Papadimitriou, Steiglitz
- 1982
(Show Context)
Citation Context ...h continuous convex programming problems. In particular, the problem of determining the maximum value that an affine function attains at some vertex of a polytope can be solved using a linear program [PS98]. A major theme in modern computer science is that many other combinatorial problems can be solved approximately by means of a convex relaxation. For example, a celebrated paper of Goemans and William...
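The claim that an affine function attains its maximum over a polytope at a vertex can be illustrated on a toy example. The sketch below brute-forces the vertices of the unit cube (the objective coefficients are arbitrary); a linear-programming solver would locate the same vertex without enumeration:

```python
import itertools

import numpy as np

# Maximize the affine function f(x) = c.x + b over the box [0, 1]^3.
# LP theory says the maximum is attained at a vertex, so for this toy
# polytope we can simply enumerate all 2^3 vertices.
c = np.array([2.0, -1.0, 0.5])   # illustrative objective coefficients
b = 3.0

vertices = [np.array(v, dtype=float) for v in itertools.product([0, 1], repeat=3)]
best = max(vertices, key=lambda v: float(c @ v) + b)
print("maximizing vertex:", best, "value:", float(c @ best) + b)   # [1, 0, 1], 5.5
```

The exponential vertex count is exactly why one solves the LP instead of enumerating; the convex-relaxation theme of this report generalizes that trade-off to harder combinatorial problems.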

1174 |
Modeling by shortest data description
- Rissanen
- 1978
(Show Context)
Citation Context ...has become interested in sparse representations for compression and analysis of audio [GB03], images [FViVK04] and video [NZ03]. Sparsity criteria also arise in deconvolution [TBM79], signal modeling [Ris79], pre-conditioning [GH97], machine learning [Gir98], de-noising [CDS99] and regularization [DDM03]. Most sparse approximation problems employ a linear model in which the collection of elementary signa...

1059 | Matching pursuits with time-frequency dictionaries - Mallat, Zhang - 1993 |

942 | Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming
- Goemans, Williamson
- 1995
(Show Context)
Citation Context ...s of a convex relaxation. For example, a celebrated paper of Goemans and Williamson proves that a certain convex program can be used to produce a graph cut whose weight exceeds 87% of the maximum cut [GW95]. The present work draws deeply on the fundamental idea that a combinatorial problem and its convex relaxation often have closely related solutions. 3. Dictionaries, Matrices and Geometry The study of...

932 |
Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature
- Olshausen, Field
- 1996
(Show Context)
Citation Context ...nverse problems [DDM03]. Most intriguing, perhaps, Olshausen and Field have argued that the mammalian visual cortex may solve similar minimization problems to produce sparse representations of images [OF96]. This paper asks what relationship, if any, a solution to the convex relaxation (2.6) might share with a solution to the subset selection problem (2.5). We shall see that, if the dictionary is incohe...

640 | Nonlinear approximation
- DeVore
- 1998
(Show Context)
Citation Context ...rists have also studied the asymptotic error rate of greedy approximations. Most of these results are not immediately relevant to the problems we have stated here. See the superb monographs of DeVore [DeV98] and Temlyakov [Tem02] for an introduction to this literature. 2.8. A Brief History. I believe that the ascendance of convex relaxation for sparse approximation was propelled by two theoretical-techno...

534 | Greed is Good: Algorithmic Results for Sparse Approximation
- Tropp
- 2004
(Show Context)
Citation Context ...s finding the sparsest exact representation of the input signal. It is somewhat academic because the signals that can be represented with fewer than d atoms form a set of Lebesgue measure zero in C^d [Tro03a]. The obvious convex relaxation of (2.7) is min_{b ∈ C^Ω} ‖b‖₁ subject to ‖s − Φb‖₂ ≤ δ. (2.8) Since (2.6) requests the minimizer of a convex function over a convex set, we may apply standard mathematic...

494 | Entropy-based algorithms for best basis selection,” Information Theory - Coifman, Wickerhauser - 1992 |

430 |
An iterative thresholding algorithm for linear inverse problems with a sparsity constraint
- Daubechies, Defrise, et al.
(Show Context)
Citation Context ...es [FViVK04] and video [NZ03]. Sparsity criteria also arise in deconvolution [TBM79], signal modeling [Ris79], pre-conditioning [GH97], machine learning [Gir98], de-noising [CDS99] and regularization [DDM03]. Most sparse approximation problems employ a linear model in which the collection of elementary signals is both linearly dependent and large. These models are often called redundant or overcomplete. ...
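The cited iterative thresholding idea alternates a gradient step on the quadratic misfit with a soft-threshold step for the ℓ1 penalty. A minimal NumPy sketch of that scheme; the dictionary, sparsity pattern, and parameter values are illustrative, not taken from the cited paper:

```python
import numpy as np

def ista(Phi, s, gamma, steps=1000):
    """Minimize 0.5*||s - Phi b||_2^2 + gamma*||b||_1 by iterative soft thresholding."""
    L = np.linalg.norm(Phi, 2) ** 2              # Lipschitz constant of the gradient
    b = np.zeros(Phi.shape[1])
    for _ in range(steps):
        z = b + Phi.T @ (s - Phi @ b) / L        # gradient step on the quadratic term
        b = np.sign(z) * np.maximum(np.abs(z) - gamma / L, 0.0)   # soft threshold
    return b

rng = np.random.default_rng(1)
Phi = rng.standard_normal((20, 50))
Phi /= np.linalg.norm(Phi, axis=0)               # unit-norm atoms
b_true = np.zeros(50)
b_true[[3, 17]] = [1.5, -2.0]                    # a 2-sparse coefficient vector
s = Phi @ b_true                                 # noiseless observation

b_hat = ista(Phi, s, gamma=0.01)
print("recovered support:", np.flatnonzero(np.abs(b_hat) > 0.5))
```

With a small penalty weight γ, the iteration converges to a near-exact sparse representation; larger γ trades fidelity for sparsity, mirroring the penalized formulations discussed in the report.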

360 | Uncertainty principles and ideal atomic decomposition
- Donoho, Huo
- 2001
(Show Context)
Citation Context ... with slightly higher coherence [GMS03]. The coherence parameter of a dictionary was mentioned as a quantity of heuristic interest in [DMA97], but the first formal treatment appears in [DH01]. It is also related to an eponymous concept from the geometry of numbers [Yap00]. The concept of cumulative coherence was developed independently in [DE03, Tro03a]. In Section 3.10, we shall suggest ...

358 | P.: Orthogonal matching pursuit : recursive function approximation with application to wavelet decomposition - Pati, Rezaiifar, et al. - 1993 |

321 |
Sparse approximate solutions to linear systems
- Natarajan
- 1995
(Show Context)
Citation Context ...term in the linear combination whenever the approximation is evaluated—perhaps trillions of times. Therefore, one may wish to maximize the sparsity of the approximation subject to an error constraint [Nat95]. Suppose that s is an arbitrary input signal, and fix an error level ε. The sparse approximation problem we have described may be stated as min c...
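The error-constrained formulation quoted here can be attacked greedily in the style of Orthogonal Matching Pursuit, which the abstract mentions: repeatedly add the atom best correlated with the residual until the error budget ε is met. A minimal sketch, using a toy dictionary chosen so the outcome is easy to verify by hand:

```python
import numpy as np

def omp(Phi, s, eps):
    """Greedy (OMP-style) sketch: add atoms until the residual norm <= eps."""
    d, N = Phi.shape
    support = []
    coef = np.zeros(N)
    residual = s.astype(float).copy()
    while np.linalg.norm(residual) > eps and len(support) < d:
        k = int(np.argmax(np.abs(Phi.T @ residual)))    # best-correlated atom
        support.append(k)
        c, *_ = np.linalg.lstsq(Phi[:, support], s, rcond=None)  # refit on support
        coef = np.zeros(N)
        coef[support] = c
        residual = s - Phi[:, support] @ c
    return coef, support

# Toy dictionary: the standard basis of R^4 plus one extra normalized atom.
extra = np.ones(4) / 2.0
Phi = np.column_stack([np.eye(4), extra])
s = 2.0 * Phi[:, 0] - 1.0 * Phi[:, 1]        # exactly two atoms

coef, support = omp(Phi, s, eps=1e-10)
print("support found:", sorted(support))      # -> [0, 1]
```

The orthogonal projection after each selection (the least-squares refit) is what distinguishes OMP from plain Matching Pursuit; the report's coherence conditions guarantee that this greedy loop finds the true support for incoherent dictionaries.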

303 | Stable recovery of sparse overcomplete representations in the presence of noise
- Donoho, Elad, et al.
- 2006
(Show Context)
Citation Context ... approximation problem, just try to relax. After the first draft of this report had been completed, it came to my attention that Donoho, Elad and Temlyakov had been studying some of the same problems [DET04]. Their methods and results have a rather different flavor, and their report will certainly generate significant interest on its own. I would like to ensure that the reader is aware of their investiga...

262 | Learning overcomplete representations - Lewicki, Sejnowski - 2000 |

250 | Deterministic annealing for clustering, compression, classification, regression, and related optimization problems - Rose - 1998 |

247 | Minimax estimation via wavelet shrinkage - Donoho, Johnstone - 1998 |

238 |
Subset Selection in Regression
- Miller
- 1990
(Show Context)
Citation Context ...tes the first example in a 1907 paper of Schmidt [Sch07]. In the 1950s, statisticians launched an extensive investigation of another sparse approximation problem called subset selection in regression [Mil02]. Later, approximation theorists began a systematic study of m-term approximation with respect to orthonormal bases and redundant systems [DeV98, Tem02]. Date: 14 February 2004. Revised 10 April 2004....

226 | Introductory functional analysis with applications - Kreyszig - 1978 |

215 |
Sparse representations in unions of bases
- Gribonval, Nielsen
(Show Context)
Citation Context ...ter in terms of the dimension d and the number of atoms N [SH03]: µ ≥ 0 when N ≤ d, and µ ≥ √((N − d)/(d(N − 1))) when N > d. If N > d and the dictionary contains an orthonormal basis, the lower bound increases to µ ≥ 1/√d [GN03b]. Although the coherence exhibits a quantum jump as soon as the number of atoms exceeds the dimension, it is possible to construct very large dictionaries with low coherence. When d = 2^k, Calderbank...
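The lower bound quoted in this context is straightforward to test against a concrete dictionary. A NumPy sketch; the dimensions and the random Gaussian dictionary are illustrative choices:

```python
import numpy as np

def coherence(Phi):
    """Largest absolute inner product between distinct unit-norm atoms."""
    G = np.abs(Phi.T @ Phi)
    np.fill_diagonal(G, 0.0)
    return float(G.max())

rng = np.random.default_rng(3)
d, N = 16, 48                                   # illustrative sizes with N > d
Phi = rng.standard_normal((d, N))
Phi /= np.linalg.norm(Phi, axis=0)              # normalize the atoms

mu = coherence(Phi)
bound = np.sqrt((N - d) / (d * (N - 1)))        # lower bound for the case N > d
print(f"coherence mu = {mu:.3f} >= lower bound {bound:.3f}")
```

A random dictionary sits well above the bound; the structured constructions cited in this context (e.g. unions of orthonormal bases for d = 2^k) come much closer to attaining it.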

203 | An equivalence between sparse approximation and support vector machines
- Girosi
(Show Context)
Citation Context ... compression and analysis of audio [GB03], images [FViVK04] and video [NZ03]. Sparsity criteria also arise in deconvolution [TBM79], signal modeling [Ris79], pre-conditioning [GH97], machine learning [Gir98], de-noising [CDS99] and regularization [DDM03]. Most sparse approximation problems employ a linear model in which the collection of elementary signals is both linearly dependent and large. These mode...

185 | Parallel preconditioning with sparse approximate inverses
- Grote, Huckle
- 1997
(Show Context)
Citation Context ...parse representations for compression and analysis of audio [GB03], images [FViVK04] and video [NZ03]. Sparsity criteria also arise in deconvolution [TBM79], signal modeling [Ris79], pre-conditioning [GH97], machine learning [Gir98], de-noising [CDS99] and regularization [DDM03]. Most sparse approximation problems employ a linear model in which the collection of elementary signals is both linearly depen...

174 | A generalized uncertainty principle and sparse representation in pairs of bases
- Elad, Bruckstein
(Show Context)
Citation Context ...s sufficient for a convex relaxation method, Basis Pursuit (BP), to accomplish the same feat [Tro03a]. Independently, Fuchs exhibited a slightly weaker sufficient condition for BP in the real setting [Fuc02]. His work was extended to the complex setting in [Tro03b]. Gribonval and Nielsen have established a necessary and sufficient condition for BP to recover every signal over Λ, but they have provided no...

164 | Uncertainty principles and signal recovery - Donoho, Stark - 1989 |

160 | On sparse representations in arbitrary redundant bases
- Fuchs
(Show Context)
Citation Context ...s sufficient for a convex relaxation method, Basis Pursuit (BP), to accomplish the same feat [Tro03a]. Independently, Fuchs exhibited a slightly weaker sufficient condition for BP in the real setting [Fuc02]. His work was extended to the complex setting in [Tro03b]. Gribonval and Nielsen have established a necessary and sufficient condition for BP to recover every signal over Λ, but they have provided no...

159 | Orthogonal least squares methods and their application to non-linear system identification - Chen, Billings, et al. - 1989 |

153 | Grassmannian frames with applications to coding and communication
- Strohmer, Heath
(Show Context)
Citation Context ...ferences are non-positive. (3) For an orthonormal basis, µ1(m) = 0 for each m. It is possible to develop a lower bound on the coherence parameter in terms of the dimension d and the number of atoms N [SH03]: µ ≥ 0 when N ≤ d, and µ ≥ √((N − d)/(d(N − 1))) when N > d. If N > d and the dictionary contains an orthonormal basis, the lower bound increases to µ ≥ 1/√d [GN03b]. Although the coherence exhibits a quantum jump as soon ...

140 |
Fundamental Problems of Algorithmic Algebra
- Yap
- 2000
(Show Context)
Citation Context ...of a dictionary was mentioned as a quantity of heuristic interest in [DMA97], but the first formal treatment appears in [DH01]. It is also related to an eponymous concept from the geometry of numbers [Yap00]. The concept of cumulative coherence was developed independently in [DE03, Tro03a]. In Section 3.10, we shall suggest geometric interpretations of both quantities. 3.3. Operator Norms. One of the mos...

123 | On basis pursuit - Chen, Donoho - 1994 |

94 | Some remarks on greedy algorithms - Devore, Temlyakov - 1996 |

86 | Packing lines, planes, etc.: packings in Grassmannian spaces, Experiment
- Conway, Hardin, et al.
- 1996
(Show Context)
Citation Context ...ave unit norm, dist(z, w) = √(1 − |⟨z, w⟩|²). Evidently, the distance between two lines ranges between zero and one. Equipped with this metric, P^{d−1}(C) forms a smooth, compact, Riemannian manifold [CHS96]. 3.9. Minimum Distance, Maximum Correlation. We view the dictionary D as a finite set of lines in the projective space P^{d−1}(C). Given an arbitrary nonzero signal s, we shall calculate the minimum di...
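The projective distance dist(z, w) = √(1 − |⟨z, w⟩|²) between the lines spanned by two unit vectors can be computed directly; the vectors below are illustrative:

```python
import numpy as np

def proj_dist(z, w):
    """Distance between the lines spanned by unit vectors z and w."""
    return np.sqrt(max(0.0, 1.0 - abs(np.vdot(z, w)) ** 2))

e1 = np.array([1.0, 0.0])
e2 = np.array([0.0, 1.0])
diag = np.array([1.0, 1.0]) / np.sqrt(2.0)

print(proj_dist(e1, e2))    # orthogonal lines: distance 1.0
print(proj_dist(e1, -e1))   # same line, opposite sign: distance 0.0
print(proj_dist(e1, diag))  # 45 degrees: sqrt(1 - 1/2) ~ 0.707
```

Taking the absolute value of the inner product makes the distance insensitive to sign (and, for complex vectors, phase), which is exactly why it measures lines rather than vectors.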

81 | An affine scaling methodology for best basis selection
- Rao, Kreutz-Delgado
(Show Context)
Citation Context ..., SO02, Mil02]. (4) Some researchers have developed specialized nonlinear programming software that attempts to solve sparse approximation problems directly using, for example, interior point methods [RKD99]. These techniques are only guaranteed to discover a locally optimal solution. (5) Brute force methods sift through all potential approximations to find the global optimum. Exhaustive searches quickly...

79 | Block coordinate relaxation methods for nonparametric signal denoising with wavelet dictionaries
- Sardy, Bruce, et al.
- 1998
(Show Context)
Citation Context ...putations [CDS99]. Sardy, Bruce, and Tseng have also written a paper on using cyclic minimization to solve the Basis Pursuit problem when the dictionary has a block-basis structure [SBT00]. Starck, Donoho, and Candès have proposed an iterative method for solving (A.1) when the dictionary has block-basis structure [SDC03]. Most recently, Daubechies, Defrise, and De Mol have developed an...

78 | The role of Occam’s Razor in knowledge discovery
- Domingos
- 1999
(Show Context)
Citation Context ...ques, such as branch-and-bound, do not accelerate the hunt significantly enough to be practical [Mil02]. 1 Beware! The antiquity of Occam's Razor guarantees neither its accuracy nor its applicability [Dom99]. This paper focuses on the convex relaxation approach. Convex programming has been applied for over three decades to recover sparse representatio...

76 | Sanov property, generalized I-projection and a conditional limit theorem - Csiszár - 1984 |

76 | Adaptive time-frequency decompositions - Davis, Mallat, et al. - 1994 |

71 |
Zur Theorie der linearen und nichtlinearen Integralgleichungen. I
- Schmidt
- 1907
(Show Context)
Citation Context ...utions to both problems. ⋆ ⋆ ⋆ ⋆ ⋆ Sparse approximation has been studied for nearly a century, and it has numerous applications. Temlyakov [Tem02] locates the first example in a 1907 paper of Schmidt [Sch07]. In the 1950s, statisticians launched an extensive investigation of another sparse approximation problem called subset selection in regression [Mil02]. Later, approximation theorists began a systemat...

70 | Approximation of functions over redundant dictionaries using coherence
- Gilbert, Muthukrishnan, et al.
- 2003
(Show Context)
Citation Context ...ains (d+1) orthonormal bases yet retains coherence 1/√d [CCKS97]. Gilbert, Muthukrishnan and Strauss have exhibited a method for constructing even larger dictionaries with slightly higher coherence [GMS03]. The coherence parameter of a dictionary was mentioned as a quantity of heuristic interest in [DMA97], but the first formal treatment appears in [DH01]. It is also related to an eponym...

58 | Z4-Kerdock codes, orthogonal spreads, and extremal euclidean line-sets
- Calderbank, Cameron, et al.
- 1997
(Show Context)
Citation Context ...struct very large dictionaries with low coherence. When d = 2^k, Calderbank et al. have produced a striking example of a dictionary that contains (d+1) orthonormal bases yet retains coherence 1/√d [CCKS97]. Gilbert, Muthukrishnan and Strauss have exhibited a method for constructing even larger dictionaries with slightly higher coherence [GMS03]. The coherence parameter of a dictionary wa...

56 |
Greedy adaptive approximation
- Davis, Mallat, et al.
- 1997
(Show Context)
Citation Context ... a method for constructing even larger dictionaries with slightly higher coherence [GMS03]. The coherence parameter of a dictionary was mentioned as a quantity of heuristic interest in [DMA97], but the first formal treatment appears in [DH01]. It is also related to an eponymous concept from the geometry of numbers [Yap00]. The concept of cumulative coherence was developed independently in ...

56 | Highly sparse representations from dictionaries are unique and independent of the sparseness measure
- Gribonval, Nielsen
- 2003
(Show Context)
Citation Context ...ression would measure diversity as the number of bits necessary to represent the coefficient vector with a certain precision. Gribonval and Nielsen have made some preliminary progress on this problem [GN03a]. Second, one may wish to consider sparse approximation problems with respect to error measures other than the usual Euclidean distance. For example, the uniform norm would promote sparse approximatio...

55 |
Harmonic decompositions of audio signals with matching pursuit
- Gribonval, Bacry
- 2003
(Show Context)
Citation Context ...al processing community—spurred by the work of Coifman et al. [CM89, CW92] and Mallat et al. [MZ93, DMZ94, DMA97]—has become interested in sparse representations for compression and analysis of audio [GB03], images [FViVK04] and video [NZ03]. Sparsity criteria also arise in deconvolution [TBM79], signal modeling [Ris79], pre-conditioning [GH97], machine learning [Gir98], de-noising [CDS99] and regulariz...

55 |
Linear inversion of band-limited reflection seismograms
- Santosa, Symes
- 1986
(Show Context)
Citation Context ...M79, LF81, OSL83]. In 1986, Santosa and Symes proposed the convex relaxation (2.5) as a method for recovering sparse spike trains, and they proved that the method succeeds under moderate restrictions [SS86]. Around 1990, the work on ℓ1 criteria in signal processing recycled to the statistics community. Donoho and Johnstone wrote a pathbreaking paper which proved that one could determine a nearly optimal...

53 | On sparse representation in pairs of bases - Feuer, Nemirovski |