### Citations

7647 | Matrix Analysis - Horn, Johnson - 1986 |

3191 | A Wavelet Tour of Signal Processing - Mallat - 1999 |
Citation Context: ...which makes the question of designing “good” dictionaries prominent. A great deal of effort has been dedicated to coming up with efficient predefined dictionaries, e.g., the various types of wavelets [Mallat, 2008]. These representations have notably contributed to many successful image processing applications such as compression, denoising and deblurring. More recently, the idea of simultaneously learning the...

2695 | Atomic decomposition by basis pursuit - Chen, Donoho, et al. - 1998 |
Citation Context: ...we introduce F_X(D) ≜ (1/n) ∑_{i=1}^n f_{x_i}(D). (3) Based on problem (2) with the ℓ1 penalty, g(α) ≜ λ‖α‖₁, (4) referred to as the Lasso in statistics [Tibshirani, 1996] and basis pursuit in signal processing [Chen et al., 1998], the standard approach to perform sparse coding [Olshausen and Field, 1997, Zibulevsky and Pearlmutter, 2001, Mairal et al., 2010] solves the minimization problem min_{D∈D} F_X(D), (5) where the regula...
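The Lasso / basis pursuit problem quoted in this excerpt, min_α ½‖x − Dα‖² + λ‖α‖₁, can be solved in a few lines by proximal gradient descent (ISTA). A minimal sketch; the dictionary, signal, regularization weight, and iteration count below are illustrative assumptions, not the paper's setup:

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1 (component-wise soft-thresholding).
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(D, x, lam, n_iter=500):
    # Minimize 0.5 * ||x - D a||_2^2 + lam * ||a||_1 by proximal gradient (ISTA).
    step = 1.0 / np.linalg.norm(D, 2) ** 2  # 1 / Lipschitz constant of the smooth part
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        a = soft_threshold(a - step * D.T @ (D @ a - x), step * lam)
    return a

# Illustrative data: a 2-sparse code under a random unit-norm dictionary.
rng = np.random.default_rng(0)
D = rng.standard_normal((20, 50))
D /= np.linalg.norm(D, axis=0)   # unit l2-norm atoms
a_true = np.zeros(50)
a_true[[3, 17]] = [1.5, -2.0]    # k = 2 nonzero coefficients
x = D @ a_true
a_hat = lasso_ista(D, x, lam=0.05)
```

At the minimizer the subgradient condition ‖Dᵀ(x − Dα̂)‖∞ ≤ λ holds, which gives an easy sanity check on convergence.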

1298 | Emergence of simple-cell receptive field properties by learning a sparse code for natural images - Olshausen, Field - 1996 |

954 | Sparse coding with an overcomplete basis set: a strategy employed by V1? (Vision Research) - Olshausen, Field - 1997 |
Citation Context: ...Dα_i for i in ⟦1;n⟧. By sparse, we mean that the vector α_i has k ≪ p non-zero coefficients, so that we aim at reconstructing x_i from only a few atoms. Before introducing the sparse coding formulation [Olshausen and Field, 1997, Zibulevsky and Pearlmutter, 2001, Mairal et al., 2010], we need some definitions. We denote by g : R^p → R_+ a penalty function that will typically promote sparsity. Definition 1. For any dictionary D...
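The notion of k-sparsity in this excerpt (α_i with k ≪ p nonzeros, so x_i is rebuilt from only a few atoms) is easy to illustrate; the dimensions and support set below are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
m, p, k = 10, 30, 3
D = rng.standard_normal((m, p))
D /= np.linalg.norm(D, axis=0)        # unit-norm atoms
alpha = np.zeros(p)
support = [2, 11, 25]                  # k = 3 active atoms out of p = 30
alpha[support] = rng.standard_normal(k)
x = D @ alpha                          # x is a combination of only k atoms
```

The signal is exactly the span of the k selected atoms, which is what "reconstructing x_i from only a few atoms" means here.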

904 | Greed is good: Algorithmic results for sparse approximation - Tropp - 2004 |

575 | Uncertainty principles and ideal atomic decomposition - Donoho, Huo - 2001 |

471 | On model selection consistency of Lasso - Zhao, Yu - 2007 |

323 | Online learning for matrix factorization and sparse coding - Mairal, Bach, et al. |

291 | Optimization algorithms on matrix manifolds - Absil, Mahony, et al. - 2008 |
Citation Context: ...eoff between sparsity and approximation quality, while D ⊆ R^{m×p} is a compact constraint set; in this paper, D denotes the set of dictionaries with unit ℓ2-norm atoms, also called the oblique manifold [Absil et al., 2008], which is a natural choice in signal and image processing [Mairal et al., 2010, Gribonval and Schnass, 2010, Rubinstein et al., 2010, Tosic and Frossard, 2011]. Note however that other choices for t...
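The oblique-manifold constraint described above (unit ℓ2-norm atoms) amounts to column-wise normalization of the dictionary. A minimal sketch; the dimensions are arbitrary assumptions:

```python
import numpy as np

def project_oblique(D, eps=1e-12):
    # Normalize each atom (column) of D to unit l2 norm, i.e., project
    # onto the oblique manifold of unit-norm dictionaries.
    return D / np.maximum(np.linalg.norm(D, axis=0), eps)

rng = np.random.default_rng(0)
D = project_oblique(rng.standard_normal((8, 5)))   # m = 8, p = 5
frob_sq = np.linalg.norm(D, "fro") ** 2            # equals p for unit-norm atoms
```

A side effect used later in the paper's analysis: any such dictionary has Frobenius norm exactly √p.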

272 | Blind source separation by sparse decomposition in a signal dictionary - Zibulevsky, Pearlmutter - 2001 |

134 | Recovery of exact sparse representations in the presence of bounded noise - Fuchs |

111 | Optimization with sparsity-inducing penalties - Bach, Jenatton, et al. - 2012 |

111 | Concentration Inequalities: A Non-asymptotic Theory of Independence - Boucheron, Lugosi, et al. - 2013 |
Citation Context: ...ions on a measurable set X, and n i.i.d. random variables X_1, ..., X_n in X. We assume that all functions are bounded by B (i.e., |f(X)| ≤ B almost surely). Using usual symmetrisation arguments [Boucheron et al., 2013, Sec. 9.3], we get E_X sup_{f∈F} ((1/n) ∑_{i=1}^n f(X_i) − E_X f(X)) ≤ 2 E_{X,ε} sup_{f∈F} ((1/n) ∑_{i=1}^n ε_i f(X_i)), where the ε_i, 1 ≤ i ≤ n, are independent Rademacher random variables, i.e., with values 1 and −1 with eq...
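The symmetrised quantity on the right-hand side, E_ε sup_{f∈F} (1/n) ∑_i ε_i f(X_i), is the empirical Rademacher complexity; for a small finite class it can be computed exactly by enumerating all 2^n sign vectors. A toy sketch with synthetic function values (not from the paper):

```python
import itertools
import numpy as np

def empirical_rademacher(F_vals):
    # F_vals[f, i] = f(x_i) for a finite class of functions and n sample points.
    # Returns the exact empirical Rademacher complexity
    #     E_eps sup_f (1/n) sum_i eps_i f(x_i)
    # by enumerating all 2^n Rademacher sign vectors (only viable for small n).
    n = F_vals.shape[1]
    total = 0.0
    for eps in itertools.product([-1.0, 1.0], repeat=n):
        total += np.max(F_vals @ np.asarray(eps)) / n
    return total / 2 ** n

rng = np.random.default_rng(0)
F_vals = rng.uniform(-1.0, 1.0, size=(3, 8))   # 3 functions bounded by B = 1, n = 8
rad = empirical_rademacher(F_vals)
```

Since sup_f E_ε(...) = 0 and every summand is at most B, the result always lies in [0, B].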

111 | Sharp thresholds for noisy and high-dimensional recovery of sparsity using ℓ1-constrained quadratic programming (Lasso) - Wainwright |

102 | On the conditions used to prove oracle results for the Lasso - van de Geer, Bühlmann |

92 | Spike and slab variable selection: Frequentist and Bayesian strategies (The Annals of Statistics) - Ishwaran, Rao - 2005 |

91 | Some sharp performance bounds for least squares regression with L1 penalization - Zhang - 2009 |

91 | Nonparametric Bayesian dictionary learning for analysis of noisy and incomplete images (IMA) - Zhou, Chen, et al. - 2010 |
Citation Context: ...e.g., [Mairal et al., 2010] and references therein). Although sparse dictionary learning can sometimes be formulated as convex [Bach et al., 2008, Bradley and Bagnell, 2009], non-parametric Bayesian [Zhou et al., 2009] and submodular [Krause and Cevher, 2010] problems, the most popular and widely used definition of sparse coding brings into play a non-convex optimization problem. Despite its empirical and practica...

84 | Near-ideal model selection by ℓ1 minimization - Candès, Plan - 2009 |

79 | Proximal methods for hierarchical sparse coding - Jenatton, Mairal, et al. |

74 | Sparse component analysis and blind source separation of underdetermined mixtures - Georgiev, Theis, et al. - 2005 |

73 | Concentration Inequalities and Model Selection: École d'Été de Probabilités de Saint-Flour - Massart - 2007 |
Citation Context: ...∑_{i=1}^n η_i h_i(D) and C_D = (R/√n) ∑_{i=1}^m ∑_{j=1}^p ζ_{ij} (D − D_o)_{ij}, where η and ζ are standard Gaussian vectors. We have, for all D and D′, E|A_D − A_{D′}|² ≤ (R²/n) ‖D − D′‖²_F = E|C_D − C_{D′}|². Hence, by Slepian's lemma [Massart, 2003, Sec. 3.3], E sup_{‖D−D_o‖_F ≤ r} A_D ≤ E sup_{‖D−D_o‖_F ≤ r} C_D = (Rr/√n) E‖ζ‖_F ≤ Rr√(mp)/√n. Thus, by applying the above reasoning to the functions h_i and −h_i and taking the expectation with respect to the draw of...
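The final step in this excerpt uses E‖ζ‖_F ≤ √(mp) for a standard Gaussian m × p matrix ζ, which follows from Jensen's inequality applied to E‖ζ‖²_F = mp. A quick seeded Monte Carlo check; the dimensions and trial count are arbitrary assumptions:

```python
import math
import random

# Seeded Monte Carlo check of E||zeta||_F <= sqrt(m * p) for a standard
# Gaussian m x p matrix zeta (Jensen: E||zeta||_F <= sqrt(E||zeta||_F^2)).
random.seed(0)
m, p, trials = 5, 5, 20000
acc = 0.0
for _ in range(trials):
    acc += math.sqrt(sum(random.gauss(0.0, 1.0) ** 2 for _ in range(m * p)))
est = acc / trials   # should land slightly below sqrt(25) = 5
```

The estimate sits just under the bound because ‖ζ‖_F concentrates tightly around √(mp) in moderate dimension.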

52 | On the uniqueness of overcomplete dictionaries, and a practical way to retrieve them - Aharon, Elad, et al. |
Citation Context: ...ization under equality constraints [Gribonval and Schnass, 2010, Geng et al., 2011], for which there is no known efficient heuristic implementation, or on an ℓ0 criterion [Schnass, 2013] à la K-SVD [Aharon et al., 2006]. More algorithmic approaches have also recently emerged [Spielman et al., 2012, Arora et al., 2013], demonstrating the existence of provably good algorithms of polynomial complexity for dictionary le...

51 | Dictionary learning - Tošić, Frossard |

36 | Exact recovery of sparsely used dictionaries - Spielman, Wang, et al. - 2012 |
Citation Context: ...2011], for which there is no known efficient heuristic implementation, or on an ℓ0 criterion [Schnass, 2013] à la K-SVD [Aharon et al., 2006]. More algorithmic approaches have also recently emerged [Spielman et al., 2012, Arora et al., 2013], demonstrating the existence of provably good algorithms of polynomial complexity for dictionary learning. Agarwal et al. [2013a] combine the best of both worlds by providing a po...

27 | Submodular dictionary selection for sparse representation - Krause, Cevher - 2010 |
Citation Context: ...erences therein). Although sparse dictionary learning can sometimes be formulated as convex [Bach et al., 2008, Bradley and Bagnell, 2009], non-parametric Bayesian [Zhou et al., 2009] and submodular [Krause and Cevher, 2010] problems, the most popular and widely used definition of sparse coding brings into play a non-convex optimization problem. Despite its empirical and practical success, the theoretical analysis of th...

25 | Convex sparse matrix factorizations - Bach, Mairal, et al. |
Citation Context: ...ance in many tasks, including inpainting and image classification (see, e.g., [Mairal et al., 2010] and references therein). Although sparse dictionary learning can sometimes be formulated as convex [Bach et al., 2008, Bradley and Bagnell, 2009], non-parametric Bayesian [Zhou et al., 2009] and submodular [Krause and Cevher, 2010] problems, the most popular and widely used definition of sparse coding brings into pl...

24 | K-dimensional coding schemes in Hilbert spaces - Maurer, Pontil |

21 | The sample complexity of dictionary learning - Vainsencher, Mannor, et al. |

19 | A Mathematical Introduction to Compressive Sensing (Applied and Numerical Harmonic Analysis, Birkhäuser) - Foucart, Rauhut - 2013 |

19 | Local stability and robustness of sparse dictionary learning in the presence of noise - Jenatton, Gribonval, et al. - 2012 |

14 | Compressible Distributions for High-dimensional Statistics - Gribonval, Cevher, et al. - 2012 |

13 | Convex coding - Bradley, Bagnell - 2009 |
Citation Context: ...including inpainting and image classification (see, e.g., [Mairal et al., 2010] and references therein). Although sparse dictionary learning can sometimes be formulated as convex [Bach et al., 2008, Bradley and Bagnell, 2009], non-parametric Bayesian [Zhou et al., 2009] and submodular [Krause and Cevher, 2010] problems, the most popular and widely used definition of sparse coding brings into play a non-convex optimizatio...

8 | Learning sparsely used overcomplete dictionaries via alternating minimization (arXiv:1310.7991) - Agarwal, Anandkumar, Jain, Netrapalli, Tandon - 2013 |

8 | New algorithms for learning incoherent and overcomplete dictionaries - Arora, Ge, Moitra - 2014 |

8 | On the identifiability of overcomplete dictionaries via the minimisation principle underlying - Schnass |
Citation Context: ...e function based on ℓ1 minimization under equality constraints [Gribonval and Schnass, 2010, Geng et al., 2011], for which there is no known efficient heuristic implementation, or on an ℓ0 criterion [Schnass, 2013] à la K-SVD [Aharon et al., 2006]. More algorithmic approaches have also recently emerged [Spielman et al., 2012, Arora et al., 2013], demonstrating the existence of provably good algorithms of polyn...

7 | Exact recovery of sparsely used overcomplete dictionaries (arXiv:1309.1952) - Agarwal, Anandkumar, Netrapalli - 2013 |

7 | Sample Complexity of Dictionary Learning and other Matrix Factorizations - Gribonval, Jenatton, et al. - 2013 |
Citation Context: ...ich is weaker). 4 Main steps of the analysis. For many classical penalty functions g, including the considered ℓ1 penalty g(α) = λ‖α‖₁, the function D ↦ F_X(D) is continuous, and in fact Lipschitz [Gribonval et al., 2013] with respect to the Frobenius metric ρ(D′, D) ≜ ‖D′ − D‖_F on all of R^{m×p}, hence in particular on the compact constraint set D ⊂ R^{m×p}. Given a dictionary D ∈ D, we have ‖D‖_F = √p, and for any radius 0 < r...

5 | Local identification of overcomplete dictionaries (arXiv e-prints) - Schnass - 2014 |
Citation Context: ...o zero. (b) how many signals n are sufficient to hope for the existence of such a controlled local minimum, i.e., n = Ω(mp³). In contrast to several recent results [Schnass, 2013, Arora et al., 2013, Schnass, 2014], where the sample complexity depends on the targeted resolution r such that ‖D̂ − D_o‖ ≤ r, our main sample complexity estimates are resolution-independent. This is similar in nature to the better sampl...

2 | Performance limits of dictionary learning for sparse coding - Jung, Eldar, et al. - 2014 |

2 | On the sample complexity of predictive sparse coding - Mehta, Gray - 2012 |
Citation Context: ...rance, and CMAP, École Polytechnique (UMR CNRS 7641), 91128 Palaiseau, France. ...dictionary learning is still in its infancy. A recent line of work [Maurer and Pontil, 2010, Vainsencher et al., 2011, Mehta and Gray, 2012] establishes generalization bounds which quantify how much the expected signal-reconstruction error differs from the empirical one, computed from a random and finite-size sample of signals. In particu...

2 | Model Consistency of Partly Smooth Regularizers (arXiv) - Vaiter, Peyré, Fadili - 2014 |
