## Unsupervised Learning by Convex and Conic Coding (1997)

### Download Links

- [hebb.mit.edu]
- [seunglab.mit.edu]
- CiteULike
- DBLP

### Other Repositories/Bibliography

Venue: Advances in Neural Information Processing Systems 9

Citations: 34 (6 self)

### BibTeX

    @INPROCEEDINGS{Lee97unsupervisedlearning,
      author = {D. D. Lee and H. S. Seung},
      title = {Unsupervised Learning by Convex and Conic Coding},
      booktitle = {Advances in Neural Information Processing Systems 9},
      year = {1997},
      pages = {515--521},
      publisher = {MIT Press}
    }

### Abstract

Unsupervised learning algorithms based on convex and conic encoders are proposed. The encoders find the closest convex or conic combination of basis vectors to the input. The learning algorithms produce basis vectors that minimize the reconstruction error of the encoders. The convex algorithm develops locally linear models of the input, while the conic algorithm discovers features. Both algorithms are used to model handwritten digits and compared with vector quantization and principal component analysis. The neural network implementations involve feedback connections that project a reconstruction back to the input layer.

1 Introduction

Vector quantization (VQ) and principal component analysis (PCA) are two widely used unsupervised learning algorithms, based on two fundamentally different ways of encoding data. In VQ, the input is encoded as the index of the closest prototype stored in memory. In PCA, the input is encoded as the coefficients of a linear superposition of a set of basis ...
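The distinction between the two encoders in the abstract can be made concrete. Below is a minimal sketch (not the authors' code; the function names and solver choices are assumptions): conic coding finds nonnegative coefficients v minimizing ||x - Wv||, while convex coding additionally constrains the coefficients to sum to one, i.e. projects onto the probability simplex.

```python
import numpy as np
from scipy.optimize import nnls

def conic_encode(W, x):
    """Conic coding: coefficients v >= 0 minimizing ||x - W @ v||."""
    v, _ = nnls(W, x)  # nonnegative least squares
    return v

def project_simplex(v):
    """Euclidean projection onto {v : v >= 0, sum(v) = 1} (sort-based method)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, v.size + 1) > (css - 1.0))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def convex_encode(W, x, n_iter=2000, lr=0.01):
    """Convex coding: projected gradient descent on the simplex."""
    v = np.full(W.shape[1], 1.0 / W.shape[1])
    for _ in range(n_iter):
        v = project_simplex(v - lr * W.T @ (W @ v - x))
    return v
```

With orthonormal basis columns this reduces to projecting the linear coefficients onto the nonnegative orthant (conic) or the simplex (convex), which matches the paper's description of the encoders as constrained least-squares problems.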

### Citations

966 citations: Olshausen, Field (1996), "Emergence of simple-cell receptive field properties by learning a sparse code for natural images"

Citation context: ...nonnegativity constraints in the optimization of Eq. (1). This method of obtaining sparse encodings is distinct from the method of simply truncating a linear combination by discarding small coefficients [3]. 3 Learning There correspond learning algorithms for each of the encoders described above that minimize the average reconstruction error over an ensemble of inputs. If a training set of m examples is...

317 citations: LeCun, Boser, et al. (1989), "Backpropagation applied to handwritten Zip code recognition"

Citation context: ...g can be very difficult. The issue of local minima is discussed in the following example. 4 Example: modeling handwritten digits We applied Affine, Convex, Conic, and VQ learning to the USPS database [4], which consists of examples of handwritten digits segmented from actual zip codes. Each of the 7291 training and 2007 test images were normalized to a 16 × 16 grid...

242 citations: Simard, Le Cun, et al. (1993), "Efficient Pattern Recognition Using a New Transformation Distance"

Citation context: ...prior knowledge of invariances, such as the support vector machine (4.0% [5]). However, it is not as good as methods that do use prior knowledge, such as nearest neighbor with tangent distance (2.6% [6]). On the other hand, Conic coding with r = 25 results in an error rate of 6.8% (138 errors). With larger basis sets r > 50, Conic shows worse performance as the features shrink to small spots. These...

186 citations: Scholkopf, Burges, et al. (1995), "Extracting support data for a given task"

Citation context: ...nt the overall nonlinear nature of the input distributions. This is good performance relative to other methods that do not use prior knowledge of invariances, such as the support vector machine (4.0% [5]). However, it is not as good as methods that do use prior knowledge, such as nearest neighbor with tangent distance (2.6% [6]). On the other hand, Conic coding with r = 25 results in an error rate of...

153 citations: Hinton, Dayan, et al. (1997), "Modeling the manifolds of images of handwritten digits"

Citation context: ...e bound constraints on W_ia. This performs stochastic gradient descent on the ensemble reconstruction error with learning rate η. 6 Discussion Convex coding is similar to other locally linear models [8, 9, 10, 11]. Distance to a convex hull was previously used in nearest neighbor classification [12], though no learning algorithm was proposed. Conic coding is similar to the noisy OR [13, 14] and harmonium [15]...

115 citations: Hinton, Zemel (1993), "Autoencoders, minimum description length and Helmholtz free energy"

Citation context: ...ld be contrasted with computationally inefficient ones. A natural modification of the point encoder with combinatorial expressiveness can be obtained by allowing v to be any vector of zeros and ones [1, 2]. Unfortunately, with this constraint the optimization of Eq. (1) becomes an integer programming problem and is quite inefficient to solve. The convex and conic encodings of an input generally contain...

76 citations: Tank, Hopfield (1986), "Simple neural optimization network"

Citation context: ...inct from when it fits "9"'s. 5 Neural network implementation Conic and Convex were described above as matrix factorizations. Alternatively, the encoding can be performed by a neural network dynamics [7] and the learning by a synaptic update rule. We describe here the implementation for the Conic network; the Convex network is similar. The Conic network has a layer of N error neurons e_i and a layer...
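The citation contexts above outline the learning procedure: the Conic network's error neurons compute e = x - Wv, and learning performs stochastic gradient descent on the reconstruction error with learning rate η under bound constraints on W_ia. A hedged sketch of such an online loop follows; the function name, the NNLS-based encoder, and the [0, 1] bound on W are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np
from scipy.optimize import nnls

def conic_learn(X, r, eta=0.02, n_epochs=50, seed=0):
    """Learn r basis vectors by stochastic gradient descent on the conic
    reconstruction error (a sketch, not the authors' code)."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(0.1, 1.0, size=(X.shape[1], r))  # random nonnegative init
    for _ in range(n_epochs):
        for x in X:
            v, _ = nnls(W, x)            # conic encoding of this input
            e = x - W @ v                # error neurons: e_i = x_i - sum_a W_ia v_a
            W += eta * np.outer(e, v)    # gradient step: dW_ia = eta * e_i * v_a
            np.clip(W, 0.0, 1.0, out=W)  # bound constraints on W_ia (assumed [0, 1])
    return W
```

Because v is re-encoded before each weight update, the outer-product step is the stochastic gradient of the per-sample reconstruction error with respect to W, and the clip enforces the bound constraints the context mentions.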

65 citations: Saund (1995), "A multiple cause mixture model for unsupervised learning"

Citation context: ...linear models [8, 9, 10, 11]. Distance to a convex hull was previously used in nearest neighbor classification [12], though no learning algorithm was proposed. Conic coding is similar to the noisy OR [13, 14] and harmonium [15] models. The main difference is that these previous models contain discrete binary variables, whereas Conic uses continuous ones. The use of analog rather than binary variables make...

63 citations: Freund, Haussler (1992), "Unsupervised learning of distributions on binary vectors using 2-layer networks"

Citation context: ..., 11]. Distance to a convex hull was previously used in nearest neighbor classification [12], though no learning algorithm was proposed. Conic coding is similar to the noisy OR [13, 14] and harmonium [15] models. The main difference is that these previous models contain discrete binary variables, whereas Conic uses continuous ones. The use of analog rather than binary variables makes the encoding comp...

53 citations: Ghahramani (1995), "Factorial learning and the EM algorithm"

Citation context: ...ld be contrasted with computationally inefficient ones. A natural modification of the point encoder with combinatorial expressiveness can be obtained by allowing v to be any vector of zeros and ones [1, 2]. Unfortunately, with this constraint the optimization of Eq. (1) becomes an integer programming problem and is quite inefficient to solve. The convex and conic encodings of an input generally contain...

53 citations: Bregler, Omohundro (1995), "Nonlinear image interpolation using manifold learning"

Citation context: ...e bound constraints on W_ia. This performs stochastic gradient descent on the ensemble reconstruction error with learning rate η. 6 Discussion Convex coding is similar to other locally linear models [8, 9, 10, 11]. Distance to a convex hull was previously used in nearest neighbor classification [12], though no learning algorithm was proposed. Conic coding is similar to the noisy OR [13, 14] and harmonium [15]...

53 citations: Dayan, Zemel (1995), "Competition and multiple cause models"

Citation context: ...linear models [8, 9, 10, 11]. Distance to a convex hull was previously used in nearest neighbor classification [12], though no learning algorithm was proposed. Conic coding is similar to the noisy OR [13, 14] and harmonium [15] models. The main difference is that these previous models contain discrete binary variables, whereas Conic uses continuous ones. The use of analog rather than binary variables make...

31 citations: Bezdek, Coray, et al. (1981), "Detection and characterization of cluster substructure, II. Linear structure: Fuzzy c-varieties and convex combinations thereof"

Citation context: ...e bound constraints on W_ia. This performs stochastic gradient descent on the ensemble reconstruction error with learning rate η. 6 Discussion Convex coding is similar to other locally linear models [8, 9, 10, 11]. Distance to a convex hull was previously used in nearest neighbor classification [12], though no learning algorithm was proposed. Conic coding is similar to the noisy OR [13, 14] and harmonium [15]...

29 citations: Hastie, Simard, et al. (1994), "Learning prototype models for tangent distance"

1 citation: Haas, Backer (1980), "Convex hull nearest neighbor rule"

Citation context: ...ction error with learning rate η. 6 Discussion Convex coding is similar to other locally linear models [8, 9, 10, 11]. Distance to a convex hull was previously used in nearest neighbor classification [12], though no learning algorithm was proposed. Conic coding is similar to the noisy OR [13, 14] and harmonium [15] models. The main difference is that these previous models contain discrete binary varia...