Results 1 - 6 of 6
Sparse and spurious: dictionary learning with noise and outliers, 2014
Cited by 2 (0 self)
A popular approach within the signal processing and machine learning communities consists in modelling signals as sparse linear combinations of atoms selected from a learned dictionary. While this paradigm has led to numerous empirical successes in various fields ranging from image to audio processing, there have only been a few theoretical arguments supporting this evidence. In particular, sparse coding, or sparse dictionary learning, relies on a nonconvex procedure whose local minima have not been fully analyzed yet. In this paper, we consider a probabilistic model of sparse signals, and show that, with high probability, sparse coding admits a local minimum around the reference dictionary generating the signals. Our study takes into account the case of overcomplete dictionaries, noisy signals, and possible outliers, thus extending previous work limited to noiseless settings and/or undercomplete dictionaries. The analysis we conduct is nonasymptotic and makes it possible to understand how the key quantities of the problem, such as the coherence or the level of noise, can scale with respect to the dimension of the signals, the number of atoms, the sparsity and the number of observations.
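The sparse coding formulation analyzed in this abstract is typically stated as minimizing a reconstruction error plus an l1 penalty over both the dictionary and the codes. As a hedged illustration (not the paper's algorithm or analysis), the following sketch implements the standard alternating-minimization approach to this nonconvex objective: ISTA steps for the sparse codes, a least-squares update for the dictionary, and renormalization of atoms to unit norm. All parameter values (`lam`, iteration counts) are illustrative choices.

```python
import numpy as np

def soft_threshold(z, t):
    """Elementwise soft-thresholding: the proximal operator of the l1 norm."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def dictionary_learning(Y, k, lam=0.1, n_iter=30, seed=0):
    """Alternating minimization for the sparse coding objective
        min_{D, X}  0.5 * ||Y - D X||_F^2 + lam * ||X||_1
    with unit-norm dictionary atoms (columns of D).
    A minimal sketch; converges only to a local minimum in general."""
    rng = np.random.default_rng(seed)
    n, p = Y.shape
    D = rng.standard_normal((n, k))
    D /= np.linalg.norm(D, axis=0)                # normalize atoms
    X = np.zeros((k, p))
    for _ in range(n_iter):
        # Sparse coding step: a few ISTA iterations on X with D fixed.
        L = np.linalg.norm(D, 2) ** 2             # Lipschitz const. of the gradient
        for _ in range(10):
            X = soft_threshold(X - (D.T @ (D @ X - Y)) / L, lam / L)
        # Dictionary update: regularized least squares with X fixed.
        G = X @ X.T + 1e-8 * np.eye(k)
        D = Y @ X.T @ np.linalg.inv(G)
        # Renormalize atoms; rescale codes so the product D @ X is unchanged.
        norms = np.maximum(np.linalg.norm(D, axis=0), 1e-8)
        D /= norms
        X *= norms[:, None]
    return D, X
```

The local-minimum result in the abstract concerns exactly this kind of objective: near the reference dictionary that generated the signals, the landscape has a genuine local minimum, even with noise and outliers.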
Select-and-Sample for Spike-and-Slab Sparse Coding
Probabilistic inference serves as a popular model for neural processing. It is still unclear, however, how approximate probabilistic inference can be accurate and scalable to very high-dimensional continuous latent spaces, especially as typical posteriors for sensory data can be expected to exhibit complex latent dependencies, including multiple modes. Here, we study an approach that can efficiently be scaled while maintaining a richly structured posterior approximation under these conditions. As an example model we use spike-and-slab sparse coding for V1 processing, and combine latent subspace selection with Gibbs sampling (select-and-sample). Unlike factored variational approaches, the method can maintain large numbers of posterior modes and complex latent dependencies. Unlike pure sampling, the method is scalable to very high-dimensional latent spaces. Among all sparse coding approaches with nontrivial posterior approximations (MAP or ICA-like models), we report the largest-scale results. In applications we first verify the approach by showing competitiveness in standard denoising benchmarks. Second, we use its scalability to, for the first time, study highly overcomplete settings for V1 encoding using sophisticated posterior representations. More generally, our study shows that very accurate probabilistic inference for multimodal posteriors with complex dependencies is tractable, functionally desirable and consistent with models for neural inference.
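To make the select-and-sample idea concrete, here is a toy sketch, not the paper's method: for a spike-and-slab linear model, "select" restricts inference to the atoms most correlated with the observation, and "sample" runs Gibbs updates over the binary spikes within that subspace, with the Gaussian slab integrated out of the spike conditional. The model parameters (`pi`, `sigma`, `sigma_z`) and the correlation-based selection rule are illustrative assumptions.

```python
import numpy as np

def select_and_sample(y, D, H=5, n_sweeps=50, pi=0.1, sigma=0.1, sigma_z=1.0, seed=0):
    """Toy select-and-sample inference for a spike-and-slab linear model
        y = D (s * z) + noise,  s_h ~ Bernoulli(pi),  z_h ~ N(0, sigma_z^2).
    Select: restrict Gibbs updates to the H atoms most correlated with y.
    Sample: Gibbs-sample spikes s_h (slab z_h collapsed), then slabs."""
    rng = np.random.default_rng(seed)
    n, k = D.shape
    selected = np.argsort(-np.abs(D.T @ y))[:H]   # candidate latent subspace
    s = np.zeros(k, dtype=bool)
    z = np.zeros(k)
    samples = []
    for _ in range(n_sweeps):
        for h in selected:
            # Residual with atom h's current contribution removed.
            r = y - D @ (s * z) + (s[h] * z[h]) * D[:, h]
            a = D[:, h] @ D[:, h]
            b = D[:, h] @ r
            # Log-odds of s_h = 1 with the slab z_h integrated out
            # (rank-one Gaussian marginal-likelihood ratio).
            log_odds = (np.log(pi / (1 - pi))
                        - 0.5 * np.log(1 + sigma_z**2 * a / sigma**2)
                        + 0.5 * sigma_z**2 * b**2
                          / (sigma**2 * (sigma**2 + sigma_z**2 * a)))
            s[h] = rng.random() < 1.0 / (1.0 + np.exp(-log_odds))
            if s[h]:
                # Conditional posterior of the slab given s_h = 1 is Gaussian.
                prec = a / sigma**2 + 1.0 / sigma_z**2
                z[h] = rng.normal((b / sigma**2) / prec, np.sqrt(1.0 / prec))
            else:
                z[h] = 0.0
        samples.append((s * z).copy())
    return selected, np.array(samples)
```

Because updates only touch the selected subspace, the cost per sweep is independent of the full latent dimension, which is the scalability property the abstract emphasizes, while the sampler can still represent multiple posterior modes within the subspace.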
Complete Dictionary Recovery Using Nonconvex Optimization
We consider the problem of recovering a complete (i.e., square and invertible) dictionary A0, from Y = A0X0 with Y ∈ Rn×p. This recovery setting is central to the theoretical understanding of dictionary learning. We give the first efficient algorithm that provably recovers A0 when X0 has O(n) nonzeros per column, under a suitable probability model for X0. Prior results provide recovery guarantees when X0 has only O(√n) nonzeros per column. Our algorithm is based on nonconvex optimization with a spherical constraint, and hence is naturally phrased in the language of manifold optimization. Our proofs give a geometric characterization of the high-dimensional objective landscape, which shows that with high probability there are no spurious local minima. Experiments with synthetic data corroborate our theory. Full version of this paper is available online:
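The "nonconvex optimization with a spherical constraint" can be illustrated with a minimal sketch, under assumptions, not the paper's full algorithm: minimize a smooth sparsity surrogate f(q) = (1/p) Σᵢ μ·log cosh(qᵀyᵢ/μ) over the unit sphere by Riemannian gradient descent (project the gradient onto the tangent space, step, then retract by normalization). When Y = A0X0 with sparse X0, minimizers of such surrogates tend to align qᵀY with one sparse row of X0; the surrogate, step size, and μ here are illustrative choices.

```python
import numpy as np

def sphere_dict_row(Y, mu=0.05, step=0.05, n_iter=300, seed=0):
    """Riemannian gradient descent on the unit sphere for
        min_{||q|| = 1}  f(q) = (1/p) * sum_i mu * log cosh(q @ y_i / mu),
    a smooth surrogate for the l1 norm of q @ Y. A minimal sketch only."""
    rng = np.random.default_rng(seed)
    n, p = Y.shape
    q = rng.standard_normal(n)
    q /= np.linalg.norm(q)                # random initialization on the sphere
    for _ in range(n_iter):
        t = np.tanh((q @ Y) / mu)         # derivative of mu * log cosh(u / mu)
        g = Y @ t / p                     # Euclidean gradient of f at q
        g -= (g @ q) * q                  # project onto the tangent space at q
        q -= step * g                     # gradient step along the sphere
        q /= np.linalg.norm(q)            # retract back onto the sphere
    return q
```

The geometric result summarized in the abstract is what makes such a simple method viable: if the high-dimensional landscape has no spurious local minima with high probability, local descent from a random initialization can succeed globally; repeating with deflation over recovered directions would assemble the full dictionary.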