
## Learning efficient sparse and low rank models

Venue: CoRR

Citations: 5 (3 self)

### Citations

3999 | Regression shrinkage and selection via the lasso
- Tibshirani
- 1996
Citation Context: ...esented in some domain, usually referred to as a dictionary. The pursuit of sparse representations was shown to be possible using tools from convex optimization, in particular via ℓ1 norm minimization [1, 2]. Works [3, 4], followed by many others, introduced efficient computational techniques for dictionary learning and adaptation. Sparse modeling is at the heart of modern approaches to image enhancement...

2684 | Atomic decomposition by basis pursuit
- Chen, Donoho, et al.
- 1998
Citation Context: ...esented in some domain, usually referred to as a dictionary. The pursuit of sparse representations was shown to be possible using tools from convex optimization, in particular via ℓ1 norm minimization [1, 2]. Works [3, 4], followed by many others, introduced efficient computational techniques for dictionary learning and adaptation. Sparse modeling is at the heart of modern approaches to image enhancement...

1645 | Learning the parts of objects by non-negative matrix factorization. Nature
- Lee, Seung
- 1999
Citation Context: ...in important applications such as face recognition and modeling, background modeling, and audio source separation. Another relevant low rank modeling scheme is non-negative matrix factorization (NMF) [16], where the input vectors are represented as non-negative linear combinations of a non-negative under-complete dictionary. NMF has been particularly successful in applications such as object recognition...

1288 | Emergence of simple-cell receptive field properties by learning a sparse code for natural images
- Olshausen, Field
- 1996
Citation Context: ...e domain, usually referred to as a dictionary. The pursuit of sparse representations was shown to be possible using tools from convex optimization, in particular via ℓ1 norm minimization [1, 2]. Works [3, 4], followed by many others, introduced efficient computational techniques for dictionary learning and adaptation. Sparse modeling is at the heart of modern approaches to image enhancement such as denoising...

1151 | Nonlinear Programming
- Bertsekas
- 1995
Citation Context: ...mization tools often constitute a major computational bottleneck. The quest for efficiently solving sparse representation pursuit has given rise to a rich family of algorithms for sparse coding [17, 18, 19, 20, 21, 22], RPCA [13, 23, 24, 25] and NMF [16, 26] problems. Despite the steady progress reported in the literature, state-of-the-art algorithms require hundreds or thousands of iterations to converge, ...

1135 | On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications 16
- Vapnik, Chervonenkis
- 1971
Citation Context: ...at is, the generalization capabilities of the model. When the family F is sufficiently restrictive, statistical learning theory justifies minimizing the empirical risk instead of the expected risk [43]. We will come back to this issue in Section 4.2, where we address the accuracy of the proposed encoders in approximating L(zΘ). When the functions belonging to F are almost everywhere differentiable ...

1127 | Model Selection and Estimation in Regression with Grouped Variables
- Yuan, Lin
Citation Context: ...parsity model and the ℓ1 norm inducing it, more elaborate structured sparse models have been developed, in which non-zero elements are no longer unrelated, but appear in groups or hierarchies of groups [5, 6, 7, 8, 9, 10]. Such models have been shown useful in the analysis of functional MRI and genetic data, for example. In the case of matrix-valued data, complexity is naturally measured by the rank, which also induces...

1025 | A fast iterative shrinkage-thresholding algorithm for linear inverse problems
- Beck, Teboulle
Citation Context: ...mization tools often constitute a major computational bottleneck. The quest for efficiently solving sparse representation pursuit has given rise to a rich family of algorithms for sparse coding [17, 18, 19, 20, 21, 22], RPCA [13, 23, 24, 25] and NMF [16, 26] problems. Despite the steady progress reported in the literature, state-of-the-art algorithms require hundreds or thousands of iterations to converge, ...

919 | The K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation
- Aharon, Elad, et al.
- 2006
Citation Context: ...e domain, usually referred to as a dictionary. The pursuit of sparse representations was shown to be possible using tools from convex optimization, in particular via ℓ1 norm minimization [1, 2]. Works [3, 4], followed by many others, introduced efficient computational techniques for dictionary learning and adaptation. Sparse modeling is at the heart of modern approaches to image enhancement such as denoising...

832 | Exact matrix completion via convex optimization
- Candès, Recht
Citation Context: ...arsimony. A recent series of works has elucidated the beautiful relationship between sparsity and low rank representations, showing that rank minimization can be achieved through convex optimization [11, 12]. The combination of low-rank and sparse models paved the path to new robust alternatives to principal component analysis (RPCA) [13, 14] and non-negative matrix factorization (RNMF) [15], and to addressing...

755 | Reducing the dimensionality of data with neural networks
- Hinton, Salakhutdinov
- 2006

733 | An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Communications on Pure and Applied Mathematics
- Daubechies, Defrise, et al.
- 2004
Citation Context: ...mization tools often constitute a major computational bottleneck. The quest for efficiently solving sparse representation pursuit has given rise to a rich family of algorithms for sparse coding [17, 18, 19, 20, 21, 22], RPCA [13, 23, 24, 25] and NMF [16, 26] problems. Despite the steady progress reported in the literature, state-of-the-art algorithms require hundreds or thousands of iterations to converge, ...

543 | Robust principal component analysis
- Candes, Li, et al.
Citation Context: ...rank minimization can be achieved through convex optimization [11, 12]. The combination of low-rank and sparse models paved the path to new robust alternatives to principal component analysis (RPCA) [13, 14] and non-negative matrix factorization (RNMF) [15], and to addressing challenging matrix completion problems [12]. RPCA was also found useful in important applications such as face recognition and modeling...

527 | A Singular Value Thresholding Algorithm for Matrix Completion
- Cai, Candès, et al.
Citation Context: ...a major computational bottleneck. The quest for efficiently solving sparse representation pursuit has given rise to a rich family of algorithms for sparse coding [17, 18, 19, 20, 21, 22], RPCA [13, 23, 24, 25] and NMF [16, 26] problems. Despite the steady progress reported in the literature, state-of-the-art algorithms require hundreds or thousands of iterations to converge, making their use impractical...

503 | The Aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions
- Hirsch, Pearce
- 2000
Citation Context: ...e GRID clips were artificially contaminated by six categories of noise recorded in different real environments (street, restaurant, car, exhibition, train, and airport) taken from the AURORA corpus [51]. The voice and noise clips were mixed linearly with equal energy (0 dB SNR). In all experiments, the spectrogram of each mixture was computed using a window of size 512 and a step size of 128 samples...

388 | Gradient methods for minimizing composite objective function
- Nesterov
- 2007

359 | Filtering for texture classification: A comparative study
- Randen, Husoy
- 1999
Citation Context: ...valuate the performance of the unstructured CoD (Unsupervised) encoders in the online learning regime, we used 30 × 10^4 randomly located 8 × 8 patches from three images from the Brodatz texture dataset [46]. The patches were ordered in three consecutive blocks of 10^4 patches from each image. The dictionary size was fixed to q = 64 atoms, and T = 4 layers were used in all CoD encoders. Unsupervised online learning...

289 | Convergence of a block coordinate descent method for nondifferentiable minimization
- Tseng
- 2001
Citation Context: ...(or group, in the case of structured sparsity) at a time, in a (block) coordinate manner. Several variants of coordinate descent (CoD) and block-coordinate descent (BCoD) proximal methods have been proposed [42, 19]. Typically, one proceeds as in Algorithm 1, first applying the proximal operator y = π(b_k). Next, the residual e = y − z_k is evaluated, and the group is selected, e.g., according to r = arg max_r ‖e_r‖_2...
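The select-and-update scheme described in this context can be sketched in a few lines of numpy. The following is an illustrative greedy coordinate-descent pass for the plain ℓ1 (lasso) case, in the spirit of the CoD algorithm of [19]; the dictionary `D`, input `x`, and parameter `lam` are hypothetical inputs, and unit-norm dictionary atoms are assumed. It is a sketch, not the paper's Algorithm 1:

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t*||.||_1, applied element-wise."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def cod(D, x, lam, n_iter=100):
    """Greedy coordinate descent for min_z 0.5||x - Dz||^2 + lam*||z||_1.

    Mirrors the scheme sketched in the text: apply the proximal operator,
    form the residual e = y - z, and commit only the coordinate where the
    change is largest.  Assumes the columns of D have unit Euclidean norm.
    """
    q = D.shape[1]
    S = np.eye(q) - D.T @ D            # mutual-inhibition matrix
    B = D.T @ x                        # running correlations with the data
    z = np.zeros(q)
    for _ in range(n_iter):
        y = soft_threshold(B, lam)     # proximal step: candidate code
        e = y - z                      # residual between candidate and current code
        k = int(np.argmax(np.abs(e)))  # coordinate with the largest change
        B += S[:, k] * e[k]            # propagate the update to all correlations
        z[k] = y[k]                    # commit the single-coordinate update
    return soft_threshold(B, lam)
```

For an orthonormal dictionary the mutual inhibition vanishes and the method reduces to one-shot soft thresholding of D.T @ x, which makes the sketch easy to sanity-check.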

273 | Projected gradient methods for nonnegative matrix factorization
- Lin
- 2007
Citation Context: ...tleneck. The quest for efficiently solving sparse representation pursuit has given rise to a rich family of algorithms for sparse coding [17, 18, 19, 20, 21, 22], RPCA [13, 23, 24, 25] and NMF [16, 26] problems. Despite the steady progress reported in the literature, state-of-the-art algorithms require hundreds or thousands of iterations to converge, making their use impractical in scenarios...

264 | Performance measurement in blind audio source separation
- Vincent, Gribonval, et al.
- 2006
Citation Context: ...ings closely followed those of [48], to which the reader is referred for further details. As the evaluation criterion, we used the normalized source-to-distortion ratio (NSDR) from the BSS-EVAL metrics [49], averaged over the test set. Encoders with the RNMF architecture, composed of T = 20 layers and q = 25, were trained using different training regimes. We used λ = √2nσ and λ∗ = √2σ with σ = 0.3, set following...

239 | What is the best multistage architecture for object recognition
- Jarrett, Kavukcuoglu, et al.
- 2009
Citation Context: ...odestly sized input vectors; however, since real data tend to have low intrinsic dimensionality, the mapping can be inferred explicitly on the support of the distribution of the input data. Recently, [28, 29] have proposed to trade off precision in the sparse representation for computational speed-up by learning non-linear regressors capable of producing good approximations of sparse codes in a fixed amount...

237 | Online dictionary learning for sparse coding
- Mairal, Bach, et al.
Citation Context: ...a priori. The data samples {x_t}_{t∈N}, x_t ∈ R^m, arrive sequentially; the index t should be interpreted as time. Online parsimonious modeling aims at estimating and refining the model as the data come in [36]. The need for online schemes also arises when the available training data are simply too large to be handled together. When the regularizer ψ is vector-wise separable, ψ(Z) = Σ_{i=1}^n ψ(z_i),...

229 | Principal Component Analysis, 2nd ed
- Jolliffe
- 2002
Citation Context: ...X is an m-dimensional data vector), is decomposed into X = L + E, where L is a low rank matrix and E is a perturbation matrix. PCA is known to produce very good results when the perturbation is small [38]. However, its performance is highly sensitive to the presence of samples not following the model; even a single outlier in the data matrix X can render the estimation of the low rank component arbitrarily...
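The decomposition X = L + E described here can be illustrated with the classical rank-q PCA estimate of L, which is just a truncated SVD (Eckart-Young). A minimal numpy sketch, with `X` and `q` as hypothetical inputs; note this is the plain, outlier-sensitive PCA the context refers to, not RPCA:

```python
import numpy as np

def pca_low_rank(X, q):
    """Best rank-q approximation L of X in the Frobenius norm (Eckart-Young).

    Returns the low rank component L of the decomposition X = L + E by
    keeping the q leading singular triplets of X.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :q] * s[:q]) @ Vt[:q]
```

Since the estimate depends on every entry of X through the singular vectors, a single grossly corrupted entry can swing L arbitrarily, which is exactly the sensitivity to outliers that motivates RPCA.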

222 | Group lasso with overlap and graph lasso
- Jacob, Obozinski, et al.
- 2009
Citation Context: ...parsity model and the ℓ1 norm inducing it, more elaborate structured sparse models have been developed, in which non-zero elements are no longer unrelated, but appear in groups or hierarchies of groups [5, 6, 7, 8, 9, 10]. Such models have been shown useful in the analysis of functional MRI and genetic data, for example. In the case of matrix-valued data, complexity is naturally measured by the rank, which also induces...

156 | RASL: Robust alignment by sparse and low-rank decomposition for linearly correlated images
- Peng, Ganesh, et al.
- 2012
Citation Context: ...n particular, in Section 5 we show a very simple and efficient extension of the proposed RPCA framework to cases where the data undergo an unknown transformation that is sought during the pursuit [33]. We also show the construction of discriminative parsimonious models. Finally, in Section 6 we demonstrate our approaches on applications in image classification, face modeling, signal separation and ...

147 | Large-scale machine learning with stochastic gradient descent
- Bottou
- 2010
Citation Context: ...elonging to F are almost everywhere differentiable with respect to the parameters Θ, stochastic gradient descent (SGD) can be used to optimize (21), with almost sure convergence to a stationary point [44]. At each iteration, a random subset of the training data, {x_{i_1}, . . . , x_{i_r}}, is selected and used to produce an estimate of the (sub)gradient of the objective function. Specifically, in our case the...
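The mini-batch step described in this context can be sketched generically; `grad_fn`, the `data` array, and all hyper-parameters below are hypothetical placeholders, not the paper's encoder-training setup:

```python
import numpy as np

def sgd(grad_fn, theta0, data, lr=0.01, batch=8, n_steps=500, seed=0):
    """Minimize an empirical risk by stochastic (sub)gradient descent.

    At each step a random subset of the training data is drawn and used
    to estimate the (sub)gradient, as described in the text.
    grad_fn(theta, X_batch) is a hypothetical callback returning that
    mini-batch gradient estimate.
    """
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float).copy()
    for _ in range(n_steps):
        idx = rng.choice(len(data), size=batch, replace=False)  # sample a mini-batch
        theta -= lr * grad_fn(theta, data[idx])                 # gradient step
    return theta
```

With `batch=len(data)` the loop degenerates to ordinary full-batch gradient descent, which is a convenient way to sanity-check a `grad_fn`.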

143 | The composite absolute penalties family for grouped and hierarchical variable selection
- Zhao, Rocha, et al.
Citation Context: ...parsity model and the ℓ1 norm inducing it, more elaborate structured sparse models have been developed, in which non-zero elements are no longer unrelated, but appear in groups or hierarchies of groups [5, 6, 7, 8, 9, 10]. Such models have been shown useful in the analysis of functional MRI and genetic data, for example. In the case of matrix-valued data, complexity is naturally measured by the rank, which also induces...

107 | An Audio-visual Corpus for Speech Perception and Automatic Speech Recognition
- Cooke, Barker, et al.
- 2006
Citation Context: ...NMF for both noise and human speech. In all our examples we used T = 10 layers and q = 50. The parameters λ and λ∗ were chosen as in Section 6.4.1. As the speech dataset we used a subset of the GRID dataset [50] containing 10 distinct speakers, each comprising 1,000 short clips. Three sets of 200 distinct clips each were used for training, validation, and testing. The GRID clips were artificially contaminated...

100 | Average case analysis of multichannel sparse recovery using convex relaxation
- Eldar, Rauhut
Citation Context: ...trices of Z. Collaborative sparse coding generalizes the concept of structured sparse coding to collections of input vectors by promoting given patterns of non-zero elements in the coefficient matrix [9, 37]. 2.2 Low-rank models and robust PCA. Another significant manifestation of parsimony typical of many classes of data is low rank. The classical low rank model is principal component analysis (PCA), in ...

89 | Robust pca via outlier pursuit
- Xu, Caramanis, et al.
- 2012
Citation Context: ...rank minimization can be achieved through convex optimization [11, 12]. The combination of low-rank and sparse models paved the path to new robust alternatives to principal component analysis (RPCA) [13, 14] and non-negative matrix factorization (RNMF) [15], and to addressing challenging matrix completion problems [12]. RPCA was also found useful in important applications such as face recognition and modeling...

81 | Proximal methods for hierarchical sparse coding
- Jenatton, Mairal, et al.

78 | Convex optimization with sparsity-inducing norms
- Bach, Jenatton, et al.
- 2011

70 | Parallel stochastic gradient algorithms for large-scale matrix completion
- Recht, Re
- 2011
Citation Context: ...a major computational bottleneck. The quest for efficiently solving sparse representation pursuit has given rise to a rich family of algorithms for sparse coding [17, 18, 19, 20, 21, 22], RPCA [13, 23, 24, 25] and NMF [16, 26] problems. Despite the steady progress reported in the literature, state-of-the-art algorithms require hundreds or thousands of iterations to converge, making their use impractical...

68 | A note on the group lasso and a sparse group lasso
- Friedman, Hastie, et al.
- 2010
Citation Context: ...roup of each level. A particular case of the tree-structured hierarchical sparse model is the two-level HiLasso model, introduced to simultaneously promote sparsity at both the group and coefficient levels [9, 41]. Algorithm 1 is straightforward to generalize to the case of hierarchical sparsity by using the appropriate proximal operator. It is worth noting that the update in Algorithm 1 can be applied to...

66 | Fast inference in sparse coding algorithms with applications to object recognition
- Kavukcuoglu, Ranzato, et al.
- 2008
Citation Context: ...odestly sized input vectors; however, since real data tend to have low intrinsic dimensionality, the mapping can be inferred explicitly on the support of the distribution of the input data. Recently, [28, 29] have proposed to trade off precision in the sparse representation for computational speed-up by learning non-linear regressors capable of producing good approximations of sparse codes in a fixed amount...

65 | An overview of bilevel optimization
- Colson, Marcotte, et al.
Citation Context: ...sulting bilevel optimization problems are notoriously difficult to solve in general; the non-differentiability of the lower-level parsimony-inducing objective makes the solution practically impossible [27]. This partially explains why sparse representations, so widely adopted for the construction of generative models, have had such modest success in the construction of discriminative models. In t...

50 | Learning fast approximations of sparse coding
- Gregor, LeCun
- 2010
Citation Context: ...d amount of time. However, the large number of degrees of freedom, for which a good initialization is difficult to provide, made this effort only modestly successful. In their inspiring recent paper, [30] showed that a particular network architecture can be derived from the iterative shrinkage-thresholding (ISTA) [17] and proximal coordinate descent (CoD) [19] algorithms. These works were among the first...
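The ISTA iteration referenced here alternates a gradient step on the quadratic data term with a soft-thresholding (shrinkage) step; it is this pair of operations that gets unrolled into network layers. A minimal numpy sketch for the lasso objective, with hypothetical `D`, `x`, and `lam`:

```python
import numpy as np

def soft_threshold(v, t):
    """Element-wise soft thresholding, the proximal operator of t*||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(D, x, lam, n_iter=200):
    """ISTA for the lasso problem  min_z 0.5||x - Dz||^2 + lam*||z||_1.

    Each iteration takes a gradient step on the data-fidelity term with
    step size 1/L, then shrinks the result toward zero.
    """
    L = np.linalg.norm(D, 2) ** 2        # Lipschitz constant of the gradient
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = soft_threshold(z - (D.T @ (D @ z - x)) / L, lam / L)
    return z
```

Reading one loop iteration as "linear map, then point-wise non-linearity" is what makes the truncated iteration look like a feed-forward network with shared weights.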

36 | C-Hilasso: a collaborative hierarchical sparse modeling framework
- Sprechmann, Ramirez, et al.
- 2011

24 | trace-norm and max-norm
- Srebro, Shraibman, et al.
Citation Context: ...arsimony. A recent series of works has elucidated the beautiful relationship between sparsity and low rank representations, showing that rank minimization can be achieved through convex optimization [11, 12]. The combination of low-rank and sparse models paved the path to new robust alternatives to principal component analysis (RPCA) [13, 14] and non-negative matrix factorization (RNMF) [15], and to addressing...

23 | Exploiting statistical dependencies in sparse representations for signal recovery
- Peleg, Eldar, et al.
- 2012

21 | On the improvement of singing voice separation for monaural recordings using the MIR-1K dataset
- Hsu, Jang
- 2010
Citation Context: ...inging voice separation. We now evaluate the source separation problem (singing voice / background accompaniment) described in Section 4.4. The separation performance was evaluated on the MIR-1K dataset [48], containing 1,000 Chinese karaoke clips performed by amateur singers. The experimental settings closely followed those of [48], to which the reader is referred for further details. As the evaluation criterion...

16 | Robust PCA as bilinear decomposition with outliersparsity regularization
- Mateos, Giannakis
Citation Context: ...cently exploited in parallel processing across multiple processors to produce state-of-the-art algorithms for matrix completion problems [25], as well as an alternative approach to robustifying PCA in [39]. In (7), neither the rank of L nor the level of sparsity in O is assumed known a priori. However, in many applications it is reasonable to have a rough upper bound on the rank, say rank(L) ≤ q. C...

13 | Accelerated low-rank visual recovery by random projection
- Mu, Dong, et al.
Citation Context: ...a major computational bottleneck. The quest for efficiently solving sparse representation pursuit has given rise to a rich family of algorithms for sparse coding [17, 18, 19, 20, 21, 22], RPCA [13, 23, 24, 25] and NMF [16, 26] problems. Despite the steady progress reported in the literature, state-of-the-art algorithms require hundreds or thousands of iterations to converge, making their use impractical...

8 | Coordinate descent optimization for ℓ1 minimization with application to compressed sensing: a greedy algorithm
- Li, Osher
- 2009

8 | The cosparse analysis model and algorithms. Applied and Computational Harmonic Analysis
- Nam, Davies, et al.
- 2012
Citation Context: ...en several orders of magnitude speedups when compared to exact solvers. While we limited our attention to synthesis models, the proposed framework can be naturally extended to analysis cosparse models [52, 53], in which the signal is known to be sparse in a transformed domain. Specifically, given a “sensing” matrix M ∈ R^{n×q} and an analysis dictionary Ω ∈ R^{p×m}, in an analysis counterpart of (21), one looks ...

6 | Learning efficient structured sparse models
- Sprechmann, Bronstein, et al.
- 2012
Citation Context: ...than real-time state-of-the-art results are achieved in several such applications. The present paper generalizes and gives a more rigorous treatment to results previously published by the authors in [34, 35]. 2 Parsimonious models. Let X ∈ R^{m×n} be a given data matrix. In this work, we concentrate our attention on the general parsimonious modeling problem that can be posed as the solution of the minimization...

6 | Robust sparse analysis regularization. arXiv:1109.6222
- Vaiter, Peyré, et al.
- 2011
Citation Context: ...en several orders of magnitude speedups when compared to exact solvers. While we limited our attention to synthesis models, the proposed framework can be naturally extended to analysis cosparse models [52, 53], in which the signal is known to be sparse in a transformed domain. Specifically, given a “sensing” matrix M ∈ R^{n×q} and an analysis dictionary Ω ∈ R^{p×m}, in an analysis counterpart of (21), one looks ...

5 | Robust non-negative matrix factorization
- Zhang, Chen, et al.
- 2011
Citation Context: ...imization [11, 12]. The combination of low-rank and sparse models paved the path to new robust alternatives to principal component analysis (RPCA) [13, 14] and non-negative matrix factorization (RNMF) [15], and to addressing challenging matrix completion problems [12]. RPCA was also found useful in important applications such as face recognition and modeling, background modeling, and audio source separation...

4 | Unveiling network anomalies in large-scale networks via sparsity and low rank
- Mardani, Mateos, et al.
- 2011
Citation Context: ...etting (3)). However, problem (9) is no longer convex. Fortunately, it can be shown that any stationary point {D0, S, O} of (9) satisfying ||X − D0S − O||_2 ≤ λ∗ is a globally optimal solution of (9) [40]. Thus, problem (9) can be solved using alternating minimization, as in our online setting, without the risk of falling into a stationary point that is not globally optimal. 2.3 Non-negative matrix...

4 | Collaborative sources identification in mixed signals via hierarchical sparse modeling, tech
- SPRECHMANN, RAMIREZ, et al.
- 2010
Citation Context: ...entification misclassification rates (code / error rate): Exact HiLasso, 2.35%; NN CoD (Approximation), 6.08%; NN BCoD (Approximation), 3.53%; NN BCoD (Discriminative), 3.44%. Identification task reproduced from [47]. In this application the authors use hierarchical sparse coding to automatically detect the speakers in a given mixed signal. The dataset consists of recordings of five different radio speakers, two ...

1 | Real-time online singing voice separation from monaural recordings using robust low-rank modeling
- 2012
Citation Context: ...than real-time state-of-the-art results are achieved in several such applications. The present paper generalizes and gives a more rigorous treatment to results previously published by the authors in [34, 35]. 2 Parsimonious models. Let X ∈ R^{m×n} be a given data matrix. In this work, we concentrate our attention on the general parsimonious modeling problem that can be posed as the solution of the minimization...

1 | Learnable low rank sparse models for speech denoising. arXiv.org:1221.1288
- Sprechmann, Bronstein, et al.
- 2012
Citation Context: ...ion is typically non-invertible and is intended to capture various invariant properties of the data. As an illustration, we use the simultaneous speech denoising and speaker identification model from [45], in which the spectrogram of the input signal is decomposed into X ≈ D0S + DO, where D0S, capturing the noise, is required to be low-rank, while the activation O, representing the speech, is required to ...