## Sparse Representation For Computer Vision and Pattern Recognition (2009)

Citations: 47 (1 self)

### BibTeX

@MISC{Wright09sparserepresentation,
  author = {John Wright and Yi Ma and Julien Mairal and Guillermo Sapiro and Thomas Huang and Shuicheng Yan},
  title = {Sparse Representation For Computer Vision and Pattern Recognition},
  year = {2009}
}


### Abstract

Techniques from sparse signal representation are beginning to see significant impact in computer vision, often on non-traditional applications where the goal is not just to obtain a compact high-fidelity representation of the observed signal, but also to extract semantic information. The choice of dictionary plays a key role in bridging this gap: unconventional dictionaries consisting of, or learned from, the training samples themselves provide the key to obtaining state-of-the-art results and to attaching semantic meaning to sparse signal representations. Understanding the good performance of such unconventional dictionaries in turn demands new algorithmic and analytical techniques. This review paper highlights a few representative examples of how the interaction between sparse signal representation and computer vision can enrich both fields, and raises a number of open questions for further study.

### Citations

4011 | Convex Optimization
- Boyd, Vandenberghe
- 2004
Citation Context: ...in Figure 1, has a strong tendency to separate the identity of the face (red coefficients) from the error due to corruption or occlusion. Once the ℓ1-minimization problem has been solved (see, e.g., [9], [26], [30]), classification (identifying the subject pictured) or validation (determining if the subject is present in the training database) can proceed by considering how strongly the recovered co...
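The decision rule quoted above, in which the recovered sparse coefficients are examined class by class, can be illustrated with a short residual comparison. A minimal sketch in Python, assuming a coefficient vector `alpha` has already been recovered by ℓ1-minimization; `src_classify` and `labels` are illustrative names, not from the paper:

```python
import numpy as np

def src_classify(x, D, alpha, labels):
    """Assign x to the class whose coefficients best explain it.

    labels[j] gives the class of dictionary column j. For each class c,
    keep only the coefficients of that class and measure the residual
    ||x - D * delta_c(alpha)||_2; the class with the smallest residual wins.
    """
    classes = np.unique(labels)
    residuals = []
    for c in classes:
        delta = np.where(labels == c, alpha, 0.0)  # zero out other classes
        residuals.append(np.linalg.norm(x - D @ delta))
    return classes[int(np.argmin(residuals))]
```

When the test image truly lies near the subspace spanned by one subject's training images, the coefficients concentrate on that class and its residual is near zero.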

2982 | Eigenfaces for Recognition
- Turk, Pentland
- 1991
Citation Context: ...levels of synthetic block occlusion. Figure 2 compares the sparsity-based approach outlined here with several popular methods from the literature: the Principal Component Analysis (PCA) approach of [67], Independent Component Analysis (ICA) [43], and Local Nonnegative Matrix Factorization (LNMF) [46]. The first provides a standard baseline of comparison, while the latter two methods are more directl...

2750 | Normalized cuts and image segmentation
- Shi, Malik
- 2000
Citation Context: ...tasks such as data clustering, subspace learning, and semi-supervised learning. Popular spectral approaches to clustering start with a graph representing pairwise relationships between the data samples [61]. Manifold learning algorithms such as ISOMAP [63], Locally Linear Embedding (LLE) [58], and Laplacian Eigenmaps (LE) [8] all rely on graphs constructed with different motivations [73]. Moreover, mos...

2024 | Regression shrinkage and selection via the LASSO
- Tibshirani
- 1996
Citation Context: ...(this parameter has been studied in [35], [39], [55], [65], [79]). As mentioned above, the ℓ1 norm can be used as an approximation to ℓ0, making the problem convex in A while still encouraging sparse solutions [64]. While for reconstruction we found that the ℓ0 penalty often produces better results, ℓ1 leads to more stable active sets and is preferred for the classification tasks introduced in the next section....

1864 | Compressed sensing
- Donoho
- 2006
Citation Context: ...linear combination of a few atoms from such a dictionary leads to state-of-the-art results in image and video restoration and classification. The emerging area of compressed sensing (CS), see [3], [14], [27] and references therein, has shown that sparse signals can be recovered from far fewer samples than required by the classical Shannon-Nyquist theorem. The samples used in CS correspond to linear proje...

1775 | Atomic decomposition by basis pursuit
- Chen, Donoho, et al.
- 1999
Citation Context: ...contributions to the theory and practice of learning such collections of atoms (usually called dictionaries or codebooks), e.g., [4], [34], [52], and of representing the actual data in terms of them, e.g., [17], [20], [30], have been developed in recent years, leading to state-of-the-art results in many signal and image processing tasks [11], [32], [44], [48], [51], [54]. We refer the reader to [10] for a re...

1708 | Nonlinear dimensionality reduction by locally linear embedding
- Roweis, Saul
- 2000
Citation Context: ...spectral approaches to clustering start with a graph representing pairwise relationships between the data samples [61]. Manifold learning algorithms such as ISOMAP [63], Locally Linear Embedding (LLE) [58], and Laplacian Eigenmaps (LE) [8] all rely on graphs constructed with different motivations [73]. Moreover, most popular subspace learning algorithms, e.g., Principal Component Analysis (PCA) [42] a...

1619 | Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection
- Belhumeur, Hespanha, et al.
- 1997
Citation Context: ...all rely on graphs constructed with different motivations [73]. Moreover, most popular subspace learning algorithms, e.g., Principal Component Analysis (PCA) [42] and Linear Discriminant Analysis (LDA) [7], can all be explained within the graph embedding framework [73]. Also, a number of semi-supervised learning algorithms are driven by the regularizing graphs constructed over both labeled and unlabeled...

954 | Face Recognition: A Literature Survey
- Chellappa, Zhao
- 2000
Citation Context: ...problems in computer vision. II. ROBUST FACE RECOGNITION: CONFLUENCE OF PRACTICE AND THEORY. Automatic face recognition remains one of the most visible and challenging application domains of computer vision [77]. Foundational results in the theory of sparse representation have recently inspired significant progress on this difficult problem. (Footnote 1: We use the term “basis” loosely here, since the dictionary can be...)

809 | Least angle regression
- Efron
- 2004
Citation Context: ...has a strong tendency to separate the identity of the face (red coefficients) from the error due to corruption or occlusion. Once the ℓ1-minimization problem has been solved (see, e.g., [9], [26], [30]), classification (identifying the subject pictured) or validation (determining if the subject is present in the training database) can proceed by considering how strongly the recovered coefficients c...

781 | Laplacian eigenmaps for dimensionality reduction and data representation (Neural Computation)
- Belkin, Niyogi
- 2003
Citation Context: ...start with a graph representing pairwise relationships between the data samples [61]. Manifold learning algorithms such as ISOMAP [63], Locally Linear Embedding (LLE) [58], and Laplacian Eigenmaps (LE) [8] all rely on graphs constructed with different motivations [73]. Moreover, most popular subspace learning algorithms, e.g., Principal Component Analysis (PCA) [42] and Linear Discriminant Analysis (L...

703 | Decoding by linear programming
- Candès, Tao
- 2005
Citation Context: ...min ‖α‖0 + ‖e‖0 subject to x = Dα + e. (4) Here, the ℓ0 “norm” ‖·‖0 counts the number of nonzeros in a vector. Originally inspired by theoretical results on the equivalence between ℓ1- and ℓ0-minimization [13], [24], in [71] the authors proposed to seek this informative vector α0 by solving the convex relaxation min ‖α‖1 + ‖e‖1 subject to x = Dα + e, (5) where ‖α‖1 = ∑i |αi|. That work reported striking empir...
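The convex relaxation (5) quoted above becomes a standard linear program once α and e are split into nonnegative parts. A hedged sketch using SciPy; the helper name `l1_min` and all dimensions are illustrative, not from the paper:

```python
import numpy as np
from scipy.optimize import linprog

def l1_min(D, x):
    """Solve min ||a||_1 + ||e||_1  subject to  x = D a + e  as an LP.

    Split a = a_pos - a_neg and e = e_pos - e_neg with all parts >= 0;
    the objective is then the sum of all parts, and the equality
    constraint becomes D (a_pos - a_neg) + (e_pos - e_neg) = x.
    """
    m, n = D.shape
    c = np.ones(2 * n + 2 * m)                       # sum of all split parts
    A_eq = np.hstack([D, -D, np.eye(m), -np.eye(m)])
    res = linprog(c, A_eq=A_eq, b_eq=x, bounds=(0, None), method="highs")
    z = res.x
    a = z[:n] - z[n:2 * n]
    e = z[2 * n:2 * n + m] - z[2 * n + m:]
    return a, e
```

Any feasible pair (α0, e0) upper-bounds the optimal objective, so the recovered pair is always at least as "cheap" in the ℓ1 sense as the one that generated the data.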

688 | Compressive sampling
- Candès
Citation Context: ...data as a linear combination of a few atoms from such a dictionary leads to state-of-the-art results in image and video restoration and classification. The emerging area of compressed sensing (CS), see [3], [14], [27] and references therein, has shown that sparse signals can be recovered from far fewer samples than required by the classical Shannon-Nyquist theorem. The samples used in CS correspond to linear...

640 | Sparse coding with an overcomplete basis set: A strategy employed by V1? (Vision Research 37)
- Olshausen, Field
- 1997

615 | Sparse Bayesian learning and the relevance vector machine
- Tipping
- 2001
Citation Context: ...quadratic fitting term and an ℓ0 or ℓ1 regularization term for each column of A, the balance of the two being defined by the penalty parameter λ (this parameter has been studied in [35], [39], [55], [65], [79]). As mentioned above, the ℓ1 norm can be used as an approximation to ℓ0, making the problem convex in A while still encouraging sparse solutions [64]. While for reconstruction we found that th...

559 | Greed is good: algorithmic results for sparse approximation
- Tropp
Citation Context: ...the coding algorithms, are related to intrinsic properties of the dictionary such as the mutual coherence, the cumulative coherence, and the Gram matrix norm of the dictionary [28], [31], [40], [59], [66]. Dictionaries can be learned by locally optimizing these and related objectives [29], [55]. In this section, we present basic concepts associated with dictionary learning, and provide illustrative ex...

497 | Semi-supervised learning literature survey (University of Wisconsin-Madison)
- Zhu
- 2005
Citation Context: ...all be explained within the graph embedding framework [73]. Also, a number of semi-supervised learning algorithms are driven by the regularizing graphs constructed over both labeled and unlabeled data [78]. Most of the works described above rely on one of two popular approaches to graph construction: the k-nearest-neighbor method and the ε-ball method. The first assigns edges between each data point an...

474 | From few to many: Illumination cone models for face recognition under variable lighting and pose
- Georghiades, Belhumeur
- 2001
Citation Context: ...∈ Rm×Ni, each normalized to have unit ℓ2 norm. One classical observation from computer vision is that images of the same face under varying illumination lie near a special low-dimensional subspace [6], [38], often called a face subspace. So, given a sufficiently expressive training set Di, a new image of subject i taken under different illumination and also stacked as a vector x ∈ Rm can be represente...

459 | An iterative thresholding algorithm for linear inverse problems with a sparsity constraint
- Daubechies, Defrise, et al.
Citation Context: ...contributions to the theory and practice of learning such collections of atoms (usually called dictionaries or codebooks), e.g., [4], [34], [52], and of representing the actual data in terms of them, e.g., [17], [20], [30], have been developed in recent years, leading to state-of-the-art results in many signal and image processing tasks [11], [32], [44], [48], [51], [54]. We refer the reader to [10] for a recent r...
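The iterative thresholding scheme cited in this entry alternates a gradient step on the quadratic data-fit term with entrywise soft-thresholding (shrinkage). A minimal NumPy sketch of this idea, often called ISTA; the function name, step size, and iteration count are illustrative choices:

```python
import numpy as np

def ista(D, x, lam, n_iter=200):
    """Iterative soft-thresholding for min_a 0.5*||x - D a||^2 + lam*||a||_1.

    Each iteration takes a gradient step on the quadratic term with step
    size 1/L (L = largest eigenvalue of D^T D, the gradient's Lipschitz
    constant), then applies the shrinkage operator, which guarantees a
    non-increasing objective.
    """
    L = np.linalg.norm(D, 2) ** 2          # squared spectral norm of D
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        g = a - (D.T @ (D @ a - x)) / L    # gradient step on the fit term
        a = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # shrinkage
    return a
```

Since the iteration is monotone, the objective at the result can never exceed its value at the all-zero starting point.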

437 | K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation
- Aharon, Elad, et al.
- 2006
Citation Context: ...few typical patterns (atoms) learned from the data itself. Significant contributions to the theory and practice of learning such collections of atoms (usually called dictionaries or codebooks), e.g., [4], [34], [52], and of representing the actual data in terms of them, e.g., [17], [20], [30], have been developed in recent years, leading to state-of-the-art results in many signal and image processing...
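Dictionary learning of the kind this entry describes is usually posed as alternating minimization over the sparse codes and the dictionary. The sketch below uses a simpler closed-form least-squares (MOD-style) dictionary update rather than K-SVD's per-atom SVD step; all names and parameter values are illustrative assumptions:

```python
import numpy as np

def soft(v, t):
    """Entrywise soft-thresholding (shrinkage) operator."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_code(D, X, lam, n_iter=30):
    """ISTA-style coding of all columns of X at once, started from zero."""
    L = max(np.linalg.norm(D, 2) ** 2, 1e-12)  # guard against a zero D
    A = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_iter):
        A = soft(A - D.T @ (D @ A - X) / L, lam / L)
    return A

def learn_dictionary(X, n_atoms, lam=0.1, n_outer=10, seed=0):
    """Alternate sparse coding with a least-squares dictionary update."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((X.shape[0], n_atoms))
    D /= np.linalg.norm(D, axis=0)
    for _ in range(n_outer):
        A = sparse_code(D, X, lam)
        D = X @ np.linalg.pinv(A)                     # min_D ||X - D A||_F^2
        D /= np.maximum(np.linalg.norm(D, axis=0), 1e-12)  # renormalize atoms
    A = sparse_code(D, X, lam)                        # final codes
    return D, A
```

Because each coding pass restarts from zero and only lowers the penalized objective, the learned pair reconstructs the data at least as well as the trivial all-zero code.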

392 | Optimally sparse representation in general (non-orthogonal) dictionaries via ℓ1 minimization
- Donoho, Elad
- 2002
Citation Context: ...rs are tightly bundled together as a “bouquet,” whereas the standard pixel basis ±I with respect to which we represent the errors e forms a “cross” in Rm, as illustrated in Figure 3. The incoherence [25] and restricted isometry [13] properties that are so useful in providing performance guarantees for ℓ1-minimization therefore do not hold for the “cross-and-bouquet” matrix [D I] (similarly, conditions...

364 | Robust face recognition via sparse representation
- Wright, Yang, et al.
- 2009
Citation Context: ...the Department of Electrical and Computer Engineering, National University of Singapore. Address: Office E4-05-11, 4 Engineering Drive 3, 117576, Singapore. Email: eleyans@nus.edu.sg. ...face recognition [71], image super-resolution [75], motion and data segmentation [33], [56], supervised denoising and inpainting [51], background modeling [16], [21], and image classification [47], [48]. In almost all o...

363 | Lambertian reflectance and linear subspaces (Pattern Analysis and Machine Intelligence)
- Basri, Jacobs
- 2003
Citation Context: ...] ∈ Rm×Ni, each normalized to have unit ℓ2 norm. One classical observation from computer vision is that images of the same face under varying illumination lie near a special low-dimensional subspace [6], [38], often called a face subspace. So, given a sufficiently expressive training set Di, a new image of subject i taken under different illumination and also stacked as a vector x ∈ Rm can be repr...

362 | For Most Large Underdetermined Systems of Linear Equations the Minimal ℓ1-norm Solution is also the Sparsest Solution (2004; URL: http://stat.stanford.edu/˜donoho/Reports/2004)
- Donoho
Citation Context: ...min ‖α‖0 + ‖e‖0 subject to x = Dα + e. (4) Here, the ℓ0 “norm” ‖·‖0 counts the number of nonzeros in a vector. Originally inspired by theoretical results on the equivalence between ℓ1- and ℓ0-minimization [13], [24], in [71] the authors proposed to seek this informative vector α0 by solving the convex relaxation min ‖α‖1 + ‖e‖1 subject to x = Dα + e, (5) where ‖α‖1 = ∑i |αi|. That work reported striking empirical r...

306 | Image denoising via sparse and redundant representations over learned dictionaries
- Elad, Aharon
Citation Context: ...and of representing the actual data in terms of them, e.g., [17], [20], [30], have been developed in recent years, leading to state-of-the-art results in many signal and image processing tasks [11], [32], [44], [48], [51], [54]. We refer the reader to [10] for a recent review on the subject. The actual dictionary plays a critical role, and it has been shown again and again that learned and data-adapt...

292 | A review of image denoising algorithms, with a new one (Multiscale Modeling)
- Buades, Coll, et al.
- 2005
Citation Context: ...se), with a sparsity of about 1/10th of the signal dimension m. State-of-the-art results obtained in [51] are “shared” with those in [19], which extends the non-local means approach developed in [5], [12]. Interestingly, the two frameworks are quite related, since they both use patches as building blocks (in [51], the sparse coding is applied to all overlapping image patches), and while a dictionary i...

282 | The adaptive LASSO and its oracle properties
- Zou
- 2006
Citation Context: ...quadratic fitting term and an ℓ0 or ℓ1 regularization term for each column of A, the balance of the two being defined by the penalty parameter λ (this parameter has been studied in [35], [39], [55], [65], [79]). As mentioned above, the ℓ1 norm can be used as an approximation to ℓ0, making the problem convex in A while still encouraging sparse solutions [64]. While for reconstruction we found that the ℓ0 p...

225 | From sparse solutions of systems of equations to sparse modeling of signals and images
- Bruckstein, Donoho, et al.
Citation Context: ...concatenations of such bases. Moreover, efficient and provably effective algorithms based on convex optimization or greedy pursuit are available for computing such representations with high fidelity [10]. While these successes in classical signal processing applications are inspiring, in computer vision we are often more interested in the content or semantics of an image rather than a compact, high-f...

224 | Sparse representations in unions of bases
- Gribonval, Nielsen
Citation Context: ...efficiency of the coding algorithms, are related to intrinsic properties of the dictionary such as the mutual coherence, the cumulative coherence, and the Gram matrix norm of the dictionary [28], [31], [40], [59], [66]. Dictionaries can be learned by locally optimizing these and related objectives [29], [55]. In this section, we present basic concepts associated with dictionary learning, and provide ill...

207 | Self-taught learning: Transfer learning from unlabeled data
- Raina, Battle, et al.
- 2007
Citation Context: ...of recent attention. The main idea is to improve classifier performance by using additional unlabeled training samples to characterize the intrinsic geometry of the observation space (see for example [54] for the application of sparse models to semi-supervised learning problems). For classification algorithms that rely on optimal projections or embeddings of the data, this can be achieved by adding a...

184 | A database for handwritten text recognition research
- Hull
- 1994
Citation Context: ...purposes. D. Experimental Results. In this section, we systematically evaluate the effectiveness of the ℓ1-graph in the machine learning scenarios outlined above. The USPS handwritten digit database [41] (200 samples are selected for each class), the forest covertype database [1] (120 samples are selected for each class), and the ETH-80 object recognition database [2] are used for the experiments. Note that...

144 | Principal component analysis
- Jolliffe
- 1986
Citation Context: ...) [58], and Laplacian Eigenmaps (LE) [8] all rely on graphs constructed with different motivations [73]. Moreover, most popular subspace learning algorithms, e.g., Principal Component Analysis (PCA) [42] and Linear Discriminant Analysis (LDA) [7], can all be explained within the graph embedding framework [73]. Also, a number of semi-supervised learning algorithms are driven by the regularizing graphs...

144 | Acquiring linear subspaces for face recognition under variable lighting
- Lee, Ho, et al.
- 2005
Citation Context: ...while the latter two methods are more directly suited for occlusion, as they produce lower-dimensional feature sets that are spatially localized. Figure 2 (left) also compares to the Nearest Subspace method [45], which makes similar use of linear illumination models, but is not based on sparsity and does not correct sparse errors. The ℓ1-based approach achieves the highest overall recognition rate of the m...

133 | Learning spatially localized parts-based representation
- Li, Hou, et al.
Citation Context: ...h several popular methods from the literature: the Principal Component Analysis (PCA) approach of [67], Independent Component Analysis (ICA) [43], and Local Nonnegative Matrix Factorization (LNMF) [46]. The first provides a standard baseline of comparison, while the latter two methods are more directly suited for occlusion, as they produce lower-dimensional feature sets that are spatially localized....

122 | Sparse multinomial logistic regression: Fast algorithms and generalization bounds
- Krishnapuram, Carin, et al.
- 2005
Citation Context: ...of representing the actual data in terms of them, e.g., [17], [20], [30], have been developed in recent years, leading to state-of-the-art results in many signal and image processing tasks [11], [32], [44], [48], [51], [54]. We refer the reader to [10] for a recent review on the subject. The actual dictionary plays a critical role, and it has been shown again and again that learned and data-adaptive di...

117 | Sparse Representation for Color Image Restoration
- Mairal, Elad, et al.
- 2008
Citation Context: ...se approaches have been extended to multiscale dictionaries and color images in [51], leading to state-of-the-art results. See Figure 4 for an example of color image denoising with this approach, and [49], [51] for numerous additional examples, comparisons, and applications in image demosaicing, image inpainting, and image denoising. An example of a... (Footnote 5: The expression (11) can be derived from a MAP esti...)

105 | Supervised dictionary learning
- Mairal, Bach, et al.
- 2008

104 | Enhancing sparsity by reweighted ℓ1 minimization
- Candès, Wakin
Citation Context: ...the classification tasks introduced in the next section. In addition, these costs can be replaced by a (non-convex) Lorentzian penalty function, motivated either by further approximating the ℓ0 by ℓ1 [15], or by considering a mixture-of-Laplacians prior for the coefficients in A and exploiting MDL concepts [55], instead of the more classical Laplacian prior. (Footnote 5: Since (11) is not simultaneously convex i...)

101 | Graph Embedding and Extensions: A General Framework for Dimensionality Reduction
- Yan, Xu, et al.
- 2007
Citation Context: ...e data samples [61]. Manifold learning algorithms such as ISOMAP [63], Locally Linear Embedding (LLE) [58], and Laplacian Eigenmaps (LE) [8] all rely on graphs constructed with different motivations [73]. Moreover, most popular subspace learning algorithms, e.g., Principal Component Analysis (PCA) [42] and Linear Discriminant Analysis (LDA) [7], can all be explained within the graph embedding framewo...

88 | Neighborly Polytopes and the Sparse Solution of Underdetermined Systems of Linear Equations (2005)
- Donoho
Citation Context: ...the minimizer is uniquely defined. From the geometry of ℓ1-minimization, if (8) does not hold for some pair (α0, e0), then it does not hold for any (α, e) with the same signs and support as (α0, e0) [23]. Understanding ℓ1-recoverability at each (I, J, σ) completely characterizes which solutions to x = Dα + e can be correctly recovered. In this language, the following characterization of the error cor...

85 | Sparse subspace clustering
- Elhamifar, Vidal
- 2009
Citation Context: ...face recognition [71], image super-resolution [75], motion and data segmentation [33], [56], supervised denoising and inpainting [51], background modeling [16], [21], and image classification [47], [48]. In almost all of these applications, using sparsity as a prior leads to state-o...

65 | Image super-resolution as sparse representation of raw image patches
- Yang, Wright, et al.
- 2008
Citation Context: ...face recognition [71], image super-resolution [75], motion and data segmentation [33], [56], supervised denoising and inpainting [51], background modeling [16], [21], and image classification [47], [48]. In almost all of these applications, using s...

64 | Bayesian inference and optimal design for the sparse linear model
- Seeger
- 2008
Citation Context: ...ses of the proposed framework include the optimization of the sensing matrix for a given dictionary as well as the optimization of the dictionary for a pre-defined sensing environment (see also [31], [60], [69]). D. Learning to Sense. As we have seen, learning overcomplete dictionaries that facilitate a sparse representation of the data as a linear combination of a few atoms from such a dictionary leads t...

64 | Distance metric learning: A comprehensive survey
- Yang, Jin
- 2006
Citation Context: ...ry for the task as well as the processing and acquisition pipeline. Sparse modeling, and in particular the (semi-)supervised case, can be considered a non-linear extension of metric learning (see [76] for a bibliography on the subject and [62] for details on the connections between sparse modeling and metric learning). This interesting connection brings yet another exciting aspect into the ongoing s...

60 | Learning Multiscale Sparse Representations for Image and Video Restoration (Multiscale Modeling and Simulation)
- Mairal, Sapiro, et al.
- 2008
Citation Context: ...face recognition [71], image super-resolution [75], motion and data segmentation [33], [56], supervised denoising and inpainting [51], background modeling [16], [21], and image classification [47], [48]. In almost all of these applications, using sparsity as a prior leads to state-of-the-art results. The ability of sparse represe...

46 | Fast solution of ℓ1-norm minimization problems when the solution may be sparse
- Donoho, Tsaig
Citation Context: ...Figure 1, has a strong tendency to separate the identity of the face (red coefficients) from the error due to corruption or occlusion. Once the ℓ1-minimization problem has been solved (see, e.g., [9], [26], [30]), classification (identifying the subject pictured) or validation (determining if the subject is present in the training database) can proceed by considering how strongly the recovered coeffici...

44 | Towards a practical face recognition system: Robust alignment and illumination by sparse representation
- Wagner, Wright, et al.
- 2011
Citation Context: ...number of observations, and we cannot directly solve for α0. However, under mild conditions [28], the desired solution (α0, e0) is... (Footnote 2: For a detailed explanation of how such images can be obtained, see [68].) Fig. 1. Overview of the face recognition approach. The method represents a test image (left), which is potentially occluded (top) or corrupted (bottom), as a sparse linear combination of all the tra...

40 | Adaptive sparseness using Jeffreys prior
- Figueiredo
- 2002
Citation Context: ...(11) consists of a quadratic fitting term and an ℓ0 or ℓ1 regularization term for each column of A, the balance of the two being defined by the penalty parameter λ (this parameter has been studied in [35], [39], [55], [65], [79]). As mentioned above, the ℓ1 norm can be used as an approximation to ℓ0, making the problem convex in A while still encouraging sparse solutions [64]. While for reconstructio...

36 | Compressive sensing for background subtraction
- Cevher, Sankaranarayanan, et al.
- 2009
Citation Context: ...face recognition [71], image super-resolution [75], motion and data segmentation [33], [56], supervised denoising and inpainting [51], background modeling [16], [21], and image classification [47], [48]. In almost all of these applications, using sparsity as a prior leads to state-of-the-art results. The ability of sparse representations to uncover semantic...

36 | Motion segmentation via robust subspace separation in the presence of outlying, incomplete, or corrupted trajectories
- Rao, Tron, et al.
- 2008
Citation Context: ...face recognition [71], image super-resolution [75], motion and data segmentation [33], [56], supervised denoising and inpainting [51], background modeling [16], [21], and image classification [47], [48]. In almost all of these applications, using sparsity as a prior leads to state-of-the-...