Results 1 - 6 of 6
Sparse and spurious: dictionary learning with noise and outliers, 2014
"... A popular approach within the signal processing and machine learning communities consists in mod-elling signals as sparse linear combinations of atoms selected from a learned dictionary. While this paradigm has led to numerous empirical successes in various fields ranging from image to audio process ..."
Abstract - Cited by 2 (0 self)
A popular approach within the signal processing and machine learning communities consists in modelling signals as sparse linear combinations of atoms selected from a learned dictionary. While this paradigm has led to numerous empirical successes in various fields ranging from image to audio processing, there have only been a few theoretical arguments supporting this empirical evidence. In particular, sparse coding, or sparse dictionary learning, relies on a non-convex procedure whose local minima have not been fully analyzed yet. In this paper, we consider a probabilistic model of sparse signals, and show that, with high probability, sparse coding admits a local minimum around the reference dictionary generating the signals. Our study takes into account the case of over-complete dictionaries, noisy signals, and possible outliers, thus extending previous work limited to noiseless settings and/or under-complete dictionaries. The analysis we conduct is non-asymptotic and makes it possible to understand how the key quantities of the problem, such as the coherence or the level of noise, can scale with respect to the dimension of the signals, the number of atoms, the sparsity and the number of observations.
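To make the objective studied above concrete, the following is a minimal numpy sketch of L1-penalized sparse coding / dictionary learning by alternating minimization (an ISTA sparse-coding step followed by a least-squares dictionary update with column normalization). It illustrates the non-convex procedure whose local minima the paper analyzes; the function names, penalty value and toy dimensions are illustrative assumptions, not taken from the paper.

# Sketch of: min_{D, X}  0.5 * ||Y - D X||_F^2 + lam * ||X||_1, with unit-norm atoms.
import numpy as np

def soft_threshold(Z, t):
    return np.sign(Z) * np.maximum(np.abs(Z) - t, 0.0)

def dictionary_learning(Y, n_atoms, lam=0.1, n_outer=30, n_ista=50, seed=0):
    rng = np.random.default_rng(seed)
    n, p = Y.shape
    D = rng.standard_normal((n, n_atoms))
    D /= np.linalg.norm(D, axis=0, keepdims=True)        # unit-norm atoms
    X = np.zeros((n_atoms, p))
    for _ in range(n_outer):
        # Sparse coding step: ISTA on all columns of X at once.
        L = np.linalg.norm(D, 2) ** 2                     # Lipschitz constant of the quadratic part
        for _ in range(n_ista):
            grad = D.T @ (D @ X - Y)
            X = soft_threshold(X - grad / L, lam / L)
        # Dictionary step: least squares in D, then re-normalize the atoms.
        G = X @ X.T + 1e-8 * np.eye(n_atoms)
        D = (Y @ X.T) @ np.linalg.inv(G)
        D /= np.linalg.norm(D, axis=0, keepdims=True) + 1e-12
    return D, X

# Toy usage: noisy sparse combinations of a random reference dictionary.
rng = np.random.default_rng(1)
D0 = rng.standard_normal((20, 30)); D0 /= np.linalg.norm(D0, axis=0)
X0 = rng.standard_normal((30, 500)) * (rng.random((30, 500)) < 0.1)
Y = D0 @ X0 + 0.01 * rng.standard_normal((20, 500))
D_hat, X_hat = dictionary_learning(Y, n_atoms=30)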
Select-and-Sample for Spike-and-Slab Sparse Coding
"... Abstract Probabilistic inference serves as a popular model for neural processing. It is still unclear, however, how approximate probabilistic inference can be accurate and scalable to very high-dimensional continuous latent spaces. Especially as typical posteriors for sensory data can be expected t ..."
Abstract
Probabilistic inference serves as a popular model for neural processing. It is still unclear, however, how approximate probabilistic inference can be accurate and scalable to very high-dimensional continuous latent spaces, especially as typical posteriors for sensory data can be expected to exhibit complex latent dependencies, including multiple modes. Here, we study an approach that can efficiently be scaled while maintaining a richly structured posterior approximation under these conditions. As an example model we use spike-and-slab sparse coding for V1 processing, and combine latent subspace selection with Gibbs sampling (select-and-sample). Unlike factored variational approaches, the method can maintain large numbers of posterior modes and complex latent dependencies. Unlike pure sampling, the method is scalable to very high-dimensional latent spaces. Among all sparse coding approaches with non-trivial posterior approximations (MAP or ICA-like models), we report the largest-scale results. In applications we first verify the approach by showing competitiveness in standard denoising benchmarks. Second, we use its scalability to study, for the first time, highly overcomplete settings for V1 encoding using sophisticated posterior representations. More generally, our study shows that very accurate probabilistic inference for multi-modal posteriors with complex dependencies is tractable, functionally desirable and consistent with models for neural inference.
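As an illustration of the select-and-sample idea described above, the following numpy sketch runs Gibbs sampling for a toy spike-and-slab model y = W (s * z) + noise, restricted to a preselected latent subspace. The selection score |W^T y|, the priors, and all parameter values are simplifying assumptions for this sketch, not the authors' exact model or implementation.

# Toy spike-and-slab model: y = W (s * z) + noise, s_h ~ Bernoulli(pi), z_h ~ N(0, 1),
# noise ~ N(0, sigma^2 I). "Select": keep the H_sel latents best matching the data point.
# "Sample": Gibbs sweeps over spike bits s and slab values z inside that subspace only.
import numpy as np

def select_and_sample(y, W, pi=0.1, sigma=0.1, H_sel=8, n_sweeps=50, seed=0):
    rng = np.random.default_rng(seed)
    D, H = W.shape
    scores = np.abs(W.T @ y)                              # select step (assumed criterion)
    selected = np.argsort(scores)[-H_sel:]
    s = np.zeros(H, dtype=bool)
    z = rng.standard_normal(H)
    samples = []
    for _ in range(n_sweeps):
        # Gibbs over spike bits s_h, only inside the selected subspace.
        for h in selected:
            s[h] = False
            r = y - W[:, s] @ z[s]                        # residual without atom h
            log_odds = (np.log(pi / (1 - pi))
                        + (np.sum(r**2) - np.sum((r - z[h] * W[:, h])**2)) / (2 * sigma**2))
            s[h] = rng.random() < 1.0 / (1.0 + np.exp(-log_odds))
        # Gibbs over slab values: Gaussian posterior on active z, prior draw for the rest.
        active = np.where(s)[0]
        z = rng.standard_normal(H)
        if active.size > 0:
            Wa = W[:, active]
            Sigma = np.linalg.inv(Wa.T @ Wa / sigma**2 + np.eye(active.size))
            mu = Sigma @ Wa.T @ y / sigma**2
            z[active] = rng.multivariate_normal(mu, Sigma)
        samples.append((s.copy(), z.copy()))
    return samples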
Complete Dictionary Recovery Using Nonconvex Optimization
"... We consider the problem of recovering a complete (i.e., square and invertible) dictionary A0, from Y = A0X0 with Y ∈ Rn×p. This recovery set-ting is central to the theoretical understanding of dictionary learning. We give the first efficient al-gorithm that provably recoversA0 whenX0 has O (n) nonze ..."
Abstract
We consider the problem of recovering a complete (i.e., square and invertible) dictionary A0 from Y = A0 X0, with Y ∈ R^(n×p). This recovery setting is central to the theoretical understanding of dictionary learning. We give the first efficient algorithm that provably recovers A0 when X0 has O(n) nonzeros per column, under a suitable probability model for X0. Prior results provide recovery guarantees when X0 has only O(√n) nonzeros per column. Our algorithm is based on nonconvex optimization with a spherical constraint, and hence is naturally phrased in the language of manifold optimization. Our proofs give a geometric characterization of the high-dimensional objective landscape, which shows that with high probability there are no spurious local minima. Experiments with synthetic data corroborate our theory. The full version of this paper is available online.
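The spherical, manifold-optimization formulation mentioned above can be illustrated with a short numpy sketch: minimize a smooth surrogate of the L1 norm of q^T Y over the unit sphere by Riemannian gradient descent, which drives q^T Y toward one sparse row of X0. This omits the paper's preconditioning, rounding, and deflation steps, and the orthogonal-dictionary toy data, step size, and surrogate parameter are simplifying assumptions for this sketch.

# Sketch of: min_{||q||_2 = 1}  (1/p) * sum_k h_mu(q^T y_k),  h_mu(t) = mu * log(cosh(t / mu)),
# solved by Riemannian gradient descent on the sphere.
import numpy as np

def recover_sparse_row(Y, mu=0.05, step=0.5, n_iters=500, seed=0):
    rng = np.random.default_rng(seed)
    n, p = Y.shape
    q = rng.standard_normal(n)
    q /= np.linalg.norm(q)
    for _ in range(n_iters):
        t = np.tanh((q @ Y) / mu)                         # h_mu'(q^T y_k) = tanh(q^T y_k / mu)
        euclid_grad = Y @ t / p
        riem_grad = euclid_grad - (q @ euclid_grad) * q   # project onto the tangent space at q
        q = q - step * riem_grad
        q /= np.linalg.norm(q)                            # retract back onto the sphere
    return q                                              # q^T Y approximates one sparse row of X0

# Toy usage with an orthogonal dictionary (a simplifying assumption for this sketch).
rng = np.random.default_rng(2)
n, p = 30, 2000
A0, _ = np.linalg.qr(rng.standard_normal((n, n)))
X0 = rng.standard_normal((n, p)) * (rng.random((n, p)) < 0.1)
Y = A0 @ X0
q = recover_sparse_row(Y)
row = q @ Y                                               # close to (plus or minus) a row of X0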