## Learning Overcomplete Representations (2000)

Citations: 277 (11 self)

### BibTeX

@MISC{Lewicki00learningovercomplete,
  author = {Michael S. Lewicki and Terrence J. Sejnowski},
  title = {Learning Overcomplete Representations},
  year = {2000}
}

### Abstract

In an overcomplete basis, the number of basis vectors is greater than the dimensionality of the input, and the representation of an input is not a unique combination of basis vectors. Overcomplete representations have been advocated because they have greater robustness in the presence of noise, can be sparser, and can have greater flexibility in matching structure in the data. Overcomplete codes have also been proposed as a model of some of the response properties of neurons in primary visual cortex. Previous work has focused on finding the best representation of a signal using a fixed overcomplete basis (or dictionary). We present an algorithm for learning an overcomplete basis by viewing it as a probabilistic model of the observed data. We show that overcomplete bases can yield a better approximation of the underlying statistical distribution of the data and can thus lead to greater coding efficiency. This can be viewed as a generalization of the technique of independent component analysis and provides a method for Bayesian reconstruction of signals in the presence of noise and for blind source separation when there are more sources than mixtures.
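A minimal sketch of the generative model the abstract describes, assuming the standard linear form x = As + ε with more basis vectors than input dimensions and a sparse (Laplacian) prior on the coefficients; the dimensions and noise scale below are illustrative, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

L, M = 2, 3                      # input dimension L, number of basis vectors M > L
A = rng.standard_normal((L, M))  # basis matrix (columns are basis vectors)
A /= np.linalg.norm(A, axis=0)   # unit-norm basis vectors

# Sparse (Laplacian) coefficients plus additive Gaussian noise: x = A s + eps.
s = rng.laplace(scale=1.0, size=M)
eps = rng.normal(scale=0.1, size=L)
x = A @ s + eps
```

Because M > L, a given x has infinitely many exact decompositions in A; it is the prior on s that selects among them.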

### Citations

2380 | Density estimation for statistics and data analysis - Silverman - 1986
Citation Context: ...the data x_i and is exact if A is orthogonal. We use the mean value of δs_i to obtain a single quantization level δs for all coefficients. The function f(s) is estimated by applying kernel density estimation (Silverman, 1986) to the distribution of coefficients fit to a training data set. We use a Laplacian kernel with a window width of 2δs. The most straightforward method of estimating the coding cost (in bits per pat... |
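The density-estimation step in this excerpt can be mocked up as follows. This is a sketch only: the unit-scale Laplacian coefficients and the quantization level δs are stand-ins, and the paper's exact coding-cost procedure is truncated above. The per-coefficient cost of a quantized value is then roughly −log2(f(s)·δs) bits:

```python
import numpy as np

rng = np.random.default_rng(1)
coeffs = rng.laplace(scale=1.0, size=5000)  # stand-in for coefficients fit to training data
ds = 0.1                                    # illustrative quantization level (delta-s)

def laplacian_kde(points, samples, width):
    """Kernel density estimate using a Laplacian kernel with the given window width."""
    b = width / 2.0                         # scale parameter of the Laplacian kernel
    k = np.exp(-np.abs(points[None, :] - samples[:, None]) / b) / (2.0 * b)
    return k.mean(axis=0)

grid = np.array([0.0, 1.0, 2.0])
f = laplacian_kde(grid, coeffs, width=2 * ds)

# Approximate cost of coding a coefficient quantized to level ds, in bits.
bits = -np.log2(f * ds)
```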

1761 | Atomic Decomposition by Basis Pursuit - Chen, Donoho, et al. - 2001
Citation Context: ...called overcomplete dictionaries), which allow a greater number of basis functions (also called dictionary elements) than samples in the input signal (Simoncelli et al., 1992; Mallat and Zhang, 1993; Chen et al., 1996). Overcomplete bases are typically constructed by merging a set of complete bases (e.g. Fourier, wavelet, and Gabor), or by adding basis functions to a complete basis (e.g. adding frequencies to a Fo... |

1413 | Independent component analysis, a new concept - Comon - 1994
Citation Context: ...resentation, because they can better approximate the underlying statistical density of the input data. This also generalizes the technique of independent component analysis (Jutten and Herault, 1991; Comon, 1994; Bell and Sejnowski, 1995) and provides a method for the identification of more sources than mixtures. 2 Model We assume that each data vector, x = x_1, ..., x_L, can be described with an overcom... |

1113 | Matching pursuit with time-frequency dictionaries - Mallat, Zhang - 1993
Citation Context: ...ercomplete" bases (also called overcomplete dictionaries), which allow a greater number of basis functions (also called dictionary elements) than samples in the input signal (Simoncelli et al., 1992; Mallat and Zhang, 1993; Chen et al., 1996). Overcomplete bases are typically constructed by merging a set of complete bases (e.g. Fourier, wavelet, and Gabor), or by adding basis functions to a complete basis (e.g. adding ... |

632 | Sparse coding with an overcomplete basis set: A strategy employed by V1 - Olshausen, Field - 1997
Citation Context: ...n the special case of zero noise and a complete representation, i.e. A is invertible, this integral can be solved and leads to the well-known ICA algorithm (MacKay, 1996; Pearlmutter and Parra, 1997; Olshausen and Field, 1997; Cardoso, 1997). But in the case of an overcomplete basis, such a solution is not possible. Some recent approaches have tried to approximate this integral by evaluating P(s)P(x|A, s) at its maximum... |

545 | The wavelet transform, time-frequency localization and signal analysis - Daubechies - 1990
Citation Context: ...the representation of the signal. Developing efficient algorithms to solve this equation is an active area of research. One approach to removing the degeneracy in (1) is to place a constraint on s (Daubechies, 1990; Chen et al., 1996), e.g. find s satisfying (1) with minimum L1 norm. A different approach is to iteratively construct a sparse representation of the signal (Coifman and Wickerhauser, 1992; Mallat a... |
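The minimum-L1-norm approach mentioned here can be sketched with an off-the-shelf LP solver. This is a toy 2×4 system, not the paper's setup; the variable split s = u − v with u, v ≥ 0 is the standard reduction of basis pursuit to a linear program:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(2)
L, M = 2, 4                               # 4 basis vectors for a 2-D signal
A = rng.standard_normal((L, M))
x = rng.standard_normal(L)

# min ||s||_1  subject to  A s = x, via the split s = u - v with u, v >= 0:
# minimize sum(u) + sum(v) subject to [A, -A] [u; v] = x.
c = np.ones(2 * M)
res = linprog(c, A_eq=np.hstack([A, -A]), b_eq=x, bounds=[(0, None)] * (2 * M))
s = res.x[:M] - res.x[M:]
```

A vertex solution of this LP has at most L nonzero entries among u and v, which is what makes the minimum-L1 representation sparse.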

529 | A new learning algorithm for blind signal separation - Amari, Cichocki, et al. - 1996
Citation Context: ...d the vector z involves only the derivative of the log prior. We note that in the case where A is square, this form of the rule is exactly the natural gradient ICA learning rule for the basis matrix (Amari et al., 1996). The difference in the more general case where A is rectangular is in how the coefficients s are calculated. In the standard ICA learning algorithm (A square, zero noise), the coefficients are given... |
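For the square, zero-noise case this excerpt refers to, the natural-gradient rule of Amari et al. (1996) can be sketched as follows. The two Laplacian sources and the mixing matrix are illustrative; φ(y) = sign(y) is the score function corresponding to a Laplacian prior:

```python
import numpy as np

rng = np.random.default_rng(3)
M, N = 2, 10000
S = rng.laplace(size=(M, N))                  # two independent Laplacian sources
A_mix = np.array([[1.0, 0.6], [0.4, 1.0]])    # illustrative square mixing matrix
X = A_mix @ S

W = np.eye(M)                                 # unmixing matrix: y = W x
eta = 0.02
for _ in range(1000):
    Y = W @ X
    phi = np.sign(Y)                          # score function for a Laplacian prior
    # Natural-gradient ICA rule: dW = eta * (I - E[phi(y) y^T]) W
    W += eta * (np.eye(M) - (phi @ Y.T) / N) @ W

P = W @ A_mix   # approximately diagonal (up to scale) once the sources are separated
```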

518 | Entropy-based algorithms for best basis selection - Coifman, Wickerhauser - 1992 |

450 | Shiftable multiscale transforms - Simoncelli, Freeman, et al. - 1992 |

393 | What is the goal of sensory coding? - Field - 1994
Citation Context: ...alysis), but this yields no advantage to having an overcomplete representation because the underlying assumption is still that the data are Gaussian. An alternative choice, advocated by some authors (Field, 1994; Olshausen and Field, 1996; Chen et al., 1996), is to use priors that assume sparse representations. This is accomplished by priors that have high kurtosis, such as the Laplacian, P(s_m) ∝ exp(−... |
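The kurtosis claim can be checked numerically: excess kurtosis is approximately 0 for a Gaussian and 3 for a Laplacian, which is why a Laplacian prior favors sparse coefficient vectors. A quick sketch with synthetic samples:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000
gauss = rng.normal(size=n)
lap = rng.laplace(size=n)

def excess_kurtosis(v):
    """Sample excess kurtosis: fourth standardized moment minus 3."""
    v = v - v.mean()
    return np.mean(v**4) / np.mean(v**2) ** 2 - 3.0

k_g = excess_kurtosis(gauss)  # near 0 for a Gaussian
k_l = excess_kurtosis(lap)    # near 3 for a Laplacian
```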

386 | Complete Discrete 2-D Gabor Transform by Neural Network for Image Analysis and Compression - Daugman - 1988
Citation Context: ...Inferring the internal state A general approach for optimizing s in the case of finite noise (ε > 0) and non-Gaussian P(s) is to use the gradient of the log posterior in an optimization algorithm (Daugman, 1988; Olshausen and Field, 1996). A suitable initial condition is s = A^T x or s = A^+ x. An alternative method, which can be used when the prior is Laplacian and ε = 0, is to view the problem as a line... |
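A sketch of this gradient-based inference for a Laplacian prior, where the log posterior is −||x − As||²/(2σ²) − θ||s||₁ up to constants. The noise level σ, prior scale θ, step size, and toy dimensions are all assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(5)
L, M = 2, 3
A = rng.standard_normal((L, M))
x = rng.standard_normal(L)

sigma, theta = 0.1, 1.0            # noise level and Laplacian prior scale (illustrative)

def neg_log_posterior(s):
    return 0.5 * np.sum((x - A @ s) ** 2) / sigma**2 + theta * np.sum(np.abs(s))

s = A.T @ x                        # suggested initial condition s = A^T x
start_obj = neg_log_posterior(s)
eta = 1e-3
for _ in range(3000):
    # (Sub)gradient of the log posterior: likelihood term plus Laplacian prior term.
    s += eta * (A.T @ (x - A @ s) / sigma**2 - theta * np.sign(s))
```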

338 | Possible principles underlying the transformation of sensory messages - Barlow - 1961
Citation Context: ...f redundancy reduction and maximizing the mutual information between the input and the representation (Nadal and Parga, 1994a, 1994b; Cardoso, 1997), which have been advocated by several researchers (Barlow, 1961; Hinton and Sejnowski, 1986; Barlow, 1989; Daugman, 1989; Linsker, 1988; Atick, 1992). 4.1 Fitting bases to the data distribution To understand how overcomplete representations can yield better appro... |

309 | An information-maximization approach to blind separation and blind deconvolution - Bell, Sejnowski - 1995
Citation Context: ...because they can better approximate the underlying statistical density of the input data. This also generalizes the technique of independent component analysis (Jutten and Herault, 1991; Comon, 1994; Bell and Sejnowski, 1995) and provides a method for the identification of more sources than mixtures. 2 Model We assume that each data vector, x = x_1, ..., x_L, can be described with an overcomplete linear basis plus ad... |

307 | Self-organization in a perceptual network - Linsker - 1988
Citation Context: ...e input and the representation (Nadal and Parga, 1994a, 1994b; Cardoso, 1997), which have been advocated by several researchers (Barlow, 1961; Hinton and Sejnowski, 1986; Barlow, 1989; Daugman, 1989; Linsker, 1988; Atick, 1992). 4.1 Fitting bases to the data distribution To understand how overcomplete representations can yield better approximations to the underlying density, it is helpful to contrast different... |

300 | Learning and relearning in Boltzmann machines - Hinton, Sejnowski - 1986
Citation Context: ...eduction and maximizing the mutual information between the input and the representation (Nadal and Parga, 1994a, 1994b; Cardoso, 1997), which have been advocated by several researchers (Barlow, 1961; Hinton and Sejnowski, 1986; Barlow, 1989; Daugman, 1989; Linsker, 1988; Atick, 1992). 4.1 Fitting bases to the data distribution To understand how overcomplete representations can yield better approximations to the underlying ... |

268 | An improved training algorithm for support vector machines - Osuna, Freund, et al. - 1997
Citation Context: ...This can be solved efficiently and exactly with interior point linear programming methods (Chen et al., 1996). Quadratic programming approaches have also recently been suggested for similar problems (Osuna et al., 1997). We have used both the linear programming and gradient-based methods. The linear programming methods were superior for finding exact solutions in the case of zero noise. The sta... |

233 | Unsupervised learning - Barlow - 1989
Citation Context: ...mutual information between the input and the representation (Nadal and Parga, 1994a, 1994b; Cardoso, 1997), which have been advocated by several researchers (Barlow, 1961; Hinton and Sejnowski, 1986; Barlow, 1989; Daugman, 1989; Linsker, 1988; Atick, 1992). 4.1 Fitting bases to the data distribution To understand how overcomplete representations can yield better approximations to the underlying density, it is... |

205 | Image compression via joint statistical characterization in the wavelet domain - Simoncelli, Freeman, et al. - 1999 |

146 | Nonlinear neurons in the low-noise limit: A factorial code maximizes information transfer - Nadal, Parga - 1994
Citation Context: ...mentation-related issues such as synaptic noise are ignored, this is equivalent to the methods of redundancy reduction and maximizing the mutual information between the input and the representation (Nadal and Parga, 1994a, 1994b; Cardoso, 1997), which have been advocated by several researchers (Barlow, 1961; Hinton and Sejnowski, 1986; Barlow, 1989; Daugman, 1989; Linsker, 1988; Atick, 1992). 4.1 Fitting bases to the... |

139 | Infomax and maximum likelihood for blind separation - Cardoso - 1997
Citation Context: ...s synaptic noise are ignored, this is equivalent to the methods of redundancy reduction and maximizing the mutual information between the input and the representation (Nadal and Parga, 1994a, 1994b; Cardoso, 1997), which have been advocated by several researchers (Barlow, 1961; Hinton and Sejnowski, 1986; Barlow, 1989; Daugman, 1989; Linsker, 1988; Atick, 1992). 4.1 Fitting bases to the data distribution To u... |

116 | A probabilistic framework for the adaptation and comparison of image codes - Lewicki, Olshausen - 1999 |

109 | Maximum likelihood and covariant algorithms for independent component analysis. Draft manuscript, available from http://wol.ra.phy.cam.ac.uk/mackay/homepage.html - MacKay - 1996
Citation Context: ...he integral in equation 8 is intractable. In the special case of zero noise and a complete representation, i.e. A is invertible, this integral can be solved and leads to the well-known ICA algorithm (MacKay, 1996; Pearlmutter and Parra, 1997; Olshausen and Field, 1997; Cardoso, 1997). But in the case of an overcomplete basis, such a solution is not possible. Some recent approaches have tried to approximate th... |

106 | Blind source separation of more sources than mixtures using overcomplete representations - Lee, Lewicki, et al. - 1999
Citation Context: ...ces are being mapped down to a smaller subspace and there is necessarily a loss of information. Nonetheless, it is possible to successfully separate three speakers on two channels with good fidelity (Lee et al., 1998). We have also shown, in the case of natural speech, that learned bases have better coding properties than commonly used representations such as the Fourier basis. In these examples, the learned basi... |

76 | Maximum likelihood blind source separation: A context-sensitive generalization of ICA - Pearlmutter, Parra - 1997
Citation Context: ...equation 8 is intractable. In the special case of zero noise and a complete representation, i.e. A is invertible, this integral can be solved and leads to the well-known ICA algorithm (MacKay, 1996; Pearlmutter and Parra, 1997; Olshausen and Field, 1997; Cardoso, 1997). But in the case of an overcomplete basis, such a solution is not possible. Some recent approaches have tried to approximate this integral by evaluating P(... |

74 | Spatially independent activity patterns in functional MRI data during the Stroop color-naming task - McKeown, Jung, et al. - 1998
Citation Context: ...tten and Herault, 1991; Bell and Sejnowski, 1995), decomposition of electroencephalographic (EEG) signals (Makeig et al., 1996), and the analysis of functional magnetic resonance imaging (fMRI) data (McKeown et al., 1998). In all of these techniques, the number of basis vectors is equal to the number of inputs. Because these bases span the input space, they are complete and are sufficient to represent the data, but, ... |

49 | Entropy reduction and decorrelation in visual coding by oriented neural receptive fields - Daugman - 1989
Citation Context: ...tion between the input and the representation (Nadal and Parga, 1994a, 1994b; Cardoso, 1997), which have been advocated by several researchers (Barlow, 1961; Hinton and Sejnowski, 1986; Barlow, 1989; Daugman, 1989; Linsker, 1988; Atick, 1992). 4.1 Fitting bases to the data distribution To understand how overcomplete representations can yield better approximations to the underlying density, it is helpful to con... |

45 | Emergence of simple-cell receptive-field properties by learning a sparse code for natural images - Olshausen, Field - 1996
Citation Context: ...this yields no advantage to having an overcomplete representation because the underlying assumption is still that the data are Gaussian. An alternative choice, advocated by some authors (Field, 1994; Olshausen and Field, 1996; Chen et al., 1996), is to use priors that assume sparse representations. This is accomplished by priors that have high kurtosis, such as the Laplacian, P(s_m) ∝ exp(−θ|s_m|). Compared to a G... |

33 |
Inferring sparse, overcomplete image codes using an efficiient coding framework
- Lewicki, Olshausen
- 1997
(Show Context)
Citation Context ...lows the noise level to be set to zero. In this case, the model attempts to account for all of the variability in the data. This approach to denoising has been applied successfully to natural images (=-=Lewicki and Olshausen, 1998-=-). Another potential application is the blind separation of more sources than mixtures. For example, the two-dimensional examples in figure 3 can be viewed as a source separation problem in which a nu... |

33 | Redundancy Reduction and Independent Component Analysis: Conditions on Cumulants and Adaptive Approaches. Neural Computation - Nadal, Parga - 1994
Citation Context: ...mentation-related issues such as synaptic noise are ignored, this is equivalent to the methods of redundancy reduction and maximizing the mutual information between the input and the representation (Nadal and Parga, 1994a, 1994b; Cardoso, 1997), which have been advocated by several researchers (Barlow, 1961; Hinton and Sejnowski, 1986; Barlow, 1989; Daugman, 1989; Linsker, 1988; Atick, 1992). 4.1 Fitting bases to the... |

27 | Blind separation of event-related brain responses into independent components - Makeig, Jung, et al. - 1997
Citation Context: ...effective in several applications such as blind source separation of mixed audio signals (Jutten and Herault, 1991; Bell and Sejnowski, 1995), decomposition of electroencephalographic (EEG) signals (Makeig et al., 1996), and the analysis of functional magnetic resonance imaging (fMRI) data (McKeown et al., 1998). In all of these techniques, the number of basis vectors is equal to the number of inputs. Because these... |

15 | Blind separation of sources, Part I: An adaptive algorithm based... - Jutten, Herault - 1991
Citation Context: ...ter and more efficient representation, because they can better approximate the underlying statistical density of the input data. This also generalizes the technique of independent component analysis (Jutten and Herault, 1991; Comon, 1994; Bell and Sejnowski, 1995) and provides a method for the identification of more sources than mixtures. 2 Model We assume that each data vector, x = x_1, ..., x_L, can be described wi... |

7 | Could information theory provide an ecological theory of sensory processing? - Atick - 1992
Citation Context: ...representation (Nadal and Parga, 1994a, 1994b; Cardoso, 1997), which have been advocated by several researchers (Barlow, 1961; Hinton and Sejnowski, 1986; Barlow, 1989; Daugman, 1989; Linsker, 1988; Atick, 1992). 4.1 Fitting bases to the data distribution To understand how overcomplete representations can yield better approximations to the underlying density, it is helpful to contrast different techniques f... |

5 | Numerical recipes in C: The art of scientific computing (2nd ed.) - Press, Teukolsky, et al. - 1992 |

2 | BPMPD: An interior point linear programming solver. Code available online at: ftp://ftp.netlib.org/opt/bpmpd.tar.gz - Meszaros - 1997
Citation Context: ...size of 0.1 for the first 30 iterations, which was reduced to 0.001 over the last 20. The most probable coefficients were obtained using a publicly available interior point linear programming package (Meszaros, 1997). Convergence of the learning algorithm was rapid, usually reaching the solution in fewer than 30 iterations. A solution was discarded if the magnitude of one of the basis vectors dropped to zero. This... |
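Putting the pieces of this excerpt together, a toy version of the learning loop might look as follows. This is a sketch under several assumptions: the step-size schedule follows the excerpt, gradient-based MAP inference stands in for the interior-point LP solver, the basis update uses the natural-gradient form discussed earlier, and all sizes and scales are illustrative:

```python
import numpy as np

rng = np.random.default_rng(6)
L, M = 2, 3                              # 2-D data, 3 basis vectors (overcomplete)
sigma, theta = 0.1, 1.0                  # noise level and Laplacian prior scale (assumed)

# Synthetic training data drawn from a random "true" overcomplete basis.
A_true = rng.standard_normal((L, M))
A_true /= np.linalg.norm(A_true, axis=0)
X = A_true @ rng.laplace(size=(M, 500))

def map_coefficients(A, Xb, n_steps=500, lr=2e-4):
    """Approximate MAP coefficients by (sub)gradient ascent on the log posterior."""
    S = A.T @ Xb
    for _ in range(n_steps):
        S += lr * (A.T @ (Xb - A @ S) / sigma**2 - theta * np.sign(S))
    return S

A = rng.standard_normal((L, M))
A /= np.linalg.norm(A, axis=0)

# Step size 0.1 for the first 30 iterations, annealed down to 0.001 over the last 20.
for eta in np.concatenate([np.full(30, 0.1), np.geomspace(0.1, 0.001, 20)]):
    batch = X[:, rng.choice(X.shape[1], 50, replace=False)]
    S = map_coefficients(A, batch)
    Z = -theta * np.sign(S)              # z = d log P(s)/ds for the Laplacian prior
    # Natural-gradient basis update: dA = -A (z s^T + I), averaged over the batch.
    A += eta * (-A @ (Z @ S.T / S.shape[1] + np.eye(M)))
    # A run would be discarded here if any basis vector's norm dropped to zero.
```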

1 | Atomic decomposition by basis pursuit (Technical Rep - Chen, Donoho, et al. - 1996 |