Building high-level features using large scale unsupervised learning
In International Conference on Machine Learning, 2012.
"... We consider the problem of building highlevel, classspecific feature detectors from only unlabeled data. For example, is it possible to learn a face detector using only unlabeled images? To answer this, we train a 9layered locally connected sparse autoencoder withpoolingandlocalcontrastnormalizati ..."
Abstract

Cited by 180 (9 self)
 Add to MetaCart
(Show Context)
We consider the problem of building high-level, class-specific feature detectors from only unlabeled data. For example, is it possible to learn a face detector using only unlabeled images? To answer this, we train a 9-layered locally connected sparse autoencoder with pooling and local contrast normalization on a large dataset of images (the model has 1 billion connections; the dataset has 10 million 200x200-pixel images downloaded from the Internet). We train this network using model parallelism and asynchronous SGD on a cluster with 1,000 machines (16,000 cores) for three days. Contrary to what appears to be a widely held intuition, our experimental results reveal that it is possible to train a face detector without having to label images as containing a face or not. Control experiments show that this feature detector is robust not only to translation but also to scaling and out-of-plane rotation. We also find that the same network is sensitive to other high-level concepts such as cat faces and human bodies. Starting with these learned features, we trained our network to obtain 15.8% accuracy in recognizing 20,000 object categories from ImageNet, a leap of 70% relative improvement over the previous state of the art.
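To make the sparsity mechanism concrete, the following is a minimal single-layer sparse autoencoder training step in NumPy. It is an illustrative sketch only: the paper's model is 9-layered and locally connected, adds pooling and local contrast normalization, and is trained with model parallelism and asynchronous SGD, none of which is reproduced here. The penalty weight lam and learning rate lr are placeholder values.

    import numpy as np

    def sparse_autoencoder_step(W1, b1, W2, b2, x, lam=1e-3, lr=1e-2):
        # Forward pass: tanh encoder, linear decoder.
        h = np.tanh(W1 @ x + b1)
        xhat = W2 @ h + b2
        # Objective: 0.5*||xhat - x||^2 + lam*||h||_1; the L1 term
        # encourages sparse hidden activations.
        err = xhat - x
        # Backward pass.
        dW2 = np.outer(err, h)
        db2 = err
        dh = W2.T @ err + lam * np.sign(h)
        dpre = dh * (1.0 - h ** 2)      # derivative of tanh
        dW1 = np.outer(dpre, x)
        db1 = dpre
        # Plain in-place SGD update (the paper distributes this step
        # asynchronously across many machines).
        for p, g in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
            p -= lr * g
        return 0.5 * float(err @ err) + lam * float(np.abs(h).sum())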
Sparse deep belief net model for visual area V2
In Advances in Neural Information Processing Systems 20, 2008.
"... Abstract 1 Motivated in part by the hierarchical organization of the neocortex, a number of recently proposed algorithms have tried to learn hierarchical, or “deep, ” structure from unlabeled data. While several authors have formally or informally compared their algorithms to computations performed ..."
Abstract

Cited by 164 (19 self)
 Add to MetaCart
(Show Context)
Motivated in part by the hierarchical organization of the neocortex, a number of recently proposed algorithms have tried to learn hierarchical, or “deep,” structure from unlabeled data. While several authors have formally or informally compared their algorithms to computations performed in visual area V1 (and the cochlea), little attempt has been made thus far to evaluate these algorithms in terms of their fidelity for mimicking computations at deeper levels in the cortical hierarchy. This thesis describes an unsupervised learning model that faithfully mimics certain properties of visual area V2. Specifically, we develop a sparse variant of the deep belief networks described by Hinton et al. (2006). We learn two layers of representation in the network, and demonstrate that the first layer, similar to prior work on sparse coding and ICA, results in localized, oriented edge filters, similar to the Gabor functions known to model simple-cell receptive fields in area V1. Further, the second layer in our model encodes various combinations of the first-layer responses in the data. Specifically, it picks up both collinear (“contour”) features as well as corners and junctions. More interestingly, in a quantitative comparison, the encoding of these more complex “corner” features matches well with the results from Ito & Komatsu’s study of neural responses to angular stimuli in area V2 of the macaque. This suggests that our sparse variant of deep belief networks holds promise for modeling higher-order features that are encoded in visual cortex. Conversely, one may also interpret the results reported here as suggesting that visual area V2 is performing computations on its input similar to those performed in (sparse) deep belief networks. This plausible relationship generates some intriguing hypotheses about V2 computations. (This thesis is an extended version of an earlier paper by Honglak Lee, Chaitanya Ekanadham, and Andrew Ng titled “Sparse deep belief net model for visual area V2.”)
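A sparse deep belief network of this kind is trained greedily, one restricted Boltzmann machine at a time, with an extra penalty that keeps each hidden unit's mean activation near a small target. The sketch below shows one contrastive-divergence (CD-1) update with such a penalty folded linearly into the hidden-unit updates, which is a common simplification of the regularizer described by Lee et al.; rho, lam, and lr are placeholder values.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def sparse_rbm_cd1(W, b, c, v0, rho=0.02, lam=0.1, lr=0.01, rng=None):
        # One CD-1 step for a binary RBM. W: (hidden, visible) weights,
        # b: visible biases, c: hidden biases, v0: one training vector.
        rng = rng or np.random.default_rng()
        h0 = sigmoid(W @ v0 + c)                         # hidden probabilities
        h0s = (rng.random(h0.shape) < h0).astype(float)  # sampled hidden states
        v1 = sigmoid(W.T @ h0s + b)                      # reconstruction
        h1 = sigmoid(W @ v1 + c)
        # CD gradient plus a sparsity term nudging each hidden unit's
        # activation toward the small target rho.
        dW = np.outer(h0, v0) - np.outer(h1, v1) + lam * np.outer(rho - h0, v0)
        db = v0 - v1
        dc = (h0 - h1) + lam * (rho - h0)
        W += lr * dW
        b += lr * db
        c += lr * dc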
Nonlinear Learning using Local Coordinate Coding
"... This paper introduces a new method for semisupervised learning on high dimensional nonlinear manifolds, which includes a phase of unsupervised basis learning and a phase of supervised function learning. The learned bases provide a set of anchor points to form a local coordinate system, such that ea ..."
Abstract

Cited by 122 (8 self)
 Add to MetaCart
(Show Context)
This paper introduces a new method for semi-supervised learning on high-dimensional nonlinear manifolds, which includes a phase of unsupervised basis learning and a phase of supervised function learning. The learned bases provide a set of anchor points to form a local coordinate system, such that each data point x on the manifold can be locally approximated by a linear combination of its nearby anchor points, and the linear weights become its local coordinate coding. We show that a high-dimensional nonlinear function can be approximated by a global linear function with respect to this coding scheme, and the approximation quality is ensured by the locality of such coding. The method turns a difficult nonlinear learning problem into a simple global linear learning problem, which overcomes some drawbacks of traditional local learning methods.
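The coding step can be sketched as follows: take a point's nearest anchors and solve a small least-squares problem for reconstruction weights that sum to one. This LLE-style solve is a simplification of the paper's coding objective, which additionally penalizes anchors by their distance to x; k and reg here are arbitrary choices.

    import numpy as np

    def lcc_code(x, anchors, k=5, reg=1e-4):
        # Select the k anchor points nearest to x.
        d = np.linalg.norm(anchors - x, axis=1)
        idx = np.argsort(d)[:k]
        # Solve for weights that reconstruct x from those anchors and
        # sum to one (shift to a local frame, then a regularized solve).
        Z = anchors[idx] - x
        G = Z @ Z.T
        G += reg * (np.trace(G) + 1e-12) * np.eye(k)
        w = np.linalg.solve(G, np.ones(k))
        w /= w.sum()
        # Embed into a code over the full anchor set: the nonzeros are
        # the point's local coordinates.
        gamma = np.zeros(len(anchors))
        gamma[idx] = w
        return gamma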
Learning Invariant Features through Topographic Filter Maps
"... Several recentlyproposed architectures for highperformance object recognition are composed of two main stages: a feature extraction stage that extracts locallyinvariant feature vectors from regularly spaced image patches, and a somewhat generic supervised classifier. The first stage is often compos ..."
Abstract

Cited by 119 (20 self)
 Add to MetaCart
(Show Context)
Several recently proposed architectures for high-performance object recognition are composed of two main stages: a feature extraction stage that extracts locally invariant feature vectors from regularly spaced image patches, and a somewhat generic supervised classifier. The first stage is often composed of three main modules: (1) a bank of filters (often oriented edge detectors); (2) a nonlinear transform, such as a pointwise squashing function, quantization, or normalization; (3) a spatial pooling operation which combines the outputs of similar filters over neighboring regions. We propose a method that automatically learns such feature extractors in an unsupervised fashion by simultaneously learning the filters and the pooling units that combine multiple filter outputs together. The method automatically generates topographic maps of similar filters that extract features of orientations, scales, and positions. These similar filters are pooled together, producing locally invariant outputs. The learned feature descriptors give results comparable to SIFT on image recognition tasks for which SIFT is well suited, and better results than SIFT on tasks for which it is less well suited.
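The pooling units in such an architecture typically output the energy (L2 norm) of the filter responses in a local neighborhood of the topographic map, so that activity shifting between neighboring, similar filters leaves the output nearly unchanged. Below is a minimal 1-D version of that pooling step; the paper learns overlapping neighborhoods on a 2-D map, and pool_size here is a placeholder.

    import numpy as np

    def topographic_pool(responses, pool_size=3):
        # responses: (n_filters,) filter outputs arranged along a 1-D
        # topographic line. Each pooling unit returns the L2 norm of the
        # responses in its neighborhood, giving local invariance.
        n = len(responses)
        out = np.empty(n)
        half = pool_size // 2
        for i in range(n):
            lo, hi = max(0, i - half), min(n, i + half + 1)
            out[i] = np.sqrt(np.sum(responses[lo:hi] ** 2))
        return out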
Exploring large feature spaces with hierarchical MKL
2008
"... For supervised and unsupervised learning, positive definite kernels allow to use large and potentially infinite dimensional feature spaces with a computational cost that only depends on the number of observations. This is usually done through the penalization of predictor functions by Euclidean or H ..."
Abstract

Cited by 110 (22 self)
 Add to MetaCart
(Show Context)
For supervised and unsupervised learning, positive definite kernels allow the use of large and potentially infinite-dimensional feature spaces with a computational cost that depends only on the number of observations. This is usually done through the penalization of predictor functions by Euclidean or Hilbertian norms. In this paper, we explore penalizing by sparsity-inducing norms such as the ℓ1-norm or the block ℓ1-norm. We assume that the kernel decomposes into a large sum of individual basis kernels which can be embedded in a directed acyclic graph; we show that it is then possible to perform kernel selection through a hierarchical multiple kernel learning framework, in polynomial time in the number of selected kernels. This framework is naturally applied to nonlinear variable selection; our extensive simulations on synthetic datasets and datasets from the UCI repository show that efficiently exploring the large feature space through sparsity-inducing norms leads to state-of-the-art predictive performance.
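The block ℓ1-norm referred to above sums the ℓ2 norms of predefined groups of coefficients, and penalizing with it drives whole groups (here, whole basis kernels) exactly to zero. A small sketch of the norm and its proximal operator (group soft-thresholding), with groups given as plain index arrays; the paper's DAG-structured grouping and polynomial-time selection are not reproduced.

    import numpy as np

    def block_l1(w, groups):
        # Sum of per-group l2 norms: zero at a group iff the whole
        # group's coefficients are zero.
        return sum(np.linalg.norm(w[g]) for g in groups)

    def prox_block_l1(w, groups, t):
        # Group soft-thresholding: shrink each group's norm by t,
        # zeroing groups whose norm is below t.
        out = w.copy()
        for g in groups:
            nrm = np.linalg.norm(w[g])
            out[g] = 0.0 if nrm <= t else (1.0 - t / nrm) * w[g]
        return out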
Sparse Online Learning via Truncated Gradient
"... We propose a general method called truncated gradient to induce sparsity in the weights of onlinelearning algorithms with convex loss. This method has several essential properties. First, the degree of sparsity is continuous—a parameter controls the rate of sparsification from no sparsification to ..."
Abstract

Cited by 107 (4 self)
 Add to MetaCart
(Show Context)
We propose a general method called truncated gradient to induce sparsity in the weights of online learning algorithms with convex loss. This method has several essential properties. First, the degree of sparsity is continuous: a parameter controls the rate of sparsification, from no sparsification to total sparsification. Second, the approach is theoretically motivated, and an instance of it can be regarded as an online counterpart of the popular L1-regularization method in the batch setting. We prove that small rates of sparsification result in only small additional regret with respect to typical online learning guarantees. Finally, the approach works well empirically. We apply it to several datasets and find that, for datasets with large numbers of features, substantial sparsity is discoverable.
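Concretely, the method runs ordinary online gradient descent and, every K steps, applies a truncation operator: weights with magnitude at most theta are shrunk toward zero by a cumulative amount, while larger weights are left alone. A sketch following the shape of the paper's truncation (the hyperparameters lr, g, K, and theta are placeholders):

    import numpy as np

    def truncate(w, alpha, theta):
        # Shrink small weights toward zero by alpha, but only those
        # whose magnitude is at most theta; larger weights are untouched.
        out = w.copy()
        pos = (w > 0) & (w <= theta)
        neg = (w < 0) & (w >= -theta)
        out[pos] = np.maximum(0.0, w[pos] - alpha)
        out[neg] = np.minimum(0.0, w[neg] + alpha)
        return out

    def truncated_gradient_step(w, grad, t, lr=0.1, g=0.05, K=10, theta=1.0):
        # Ordinary online gradient step; on every K-th step, truncate
        # with cumulative strength lr * g * K. g controls how aggressive
        # the sparsification is (g = 0 recovers plain gradient descent).
        w = w - lr * grad
        if t % K == 0:
            w = truncate(w, lr * g * K, theta)
        return w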
Learning A Discriminative Dictionary for Sparse Coding via Label Consistent K-SVD
"... A label consistent KSVD (LCKSVD) algorithm to learn a discriminative dictionary for sparse coding is presented. In addition to using class labels of training data, we also associate label information with each dictionary item (columns of the dictionary matrix) to enforce discriminability in sparse ..."
Abstract

Cited by 88 (9 self)
 Add to MetaCart
A label consistent K-SVD (LC-KSVD) algorithm to learn a discriminative dictionary for sparse coding is presented. In addition to using class labels of training data, we also associate label information with each dictionary item (columns of the dictionary matrix) to enforce discriminability in sparse codes during the dictionary learning process. More specifically, we introduce a new label consistent constraint called ‘discriminative sparse-code error’ and combine it with the reconstruction error and the classification error to form a unified objective function. The optimal solution is efficiently obtained using the K-SVD algorithm. Our algorithm learns a single overcomplete dictionary and an optimal linear classifier jointly. It yields dictionaries so that feature points with the same class labels have similar sparse codes. Experimental results demonstrate that our algorithm outperforms many recently proposed sparse coding techniques for face and object category recognition under the same learning conditions.
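The unified objective has the form ||Y - DX||^2 + alpha||Q - AX||^2 + beta||H - WX||^2, where Y holds the training signals, Q the discriminative sparse-code targets, H the class labels, and A, W the linear transform and classifier (these symbol names are this sketch's choice, not quoted from the abstract). It reduces to a single K-SVD problem by stacking:

    import numpy as np

    def lcksvd_stack(Y, Q, H, alpha, beta):
        # Fold the label-consistency term alpha*||Q - A X||^2 and the
        # classification term beta*||H - W X||^2 into one reconstruction
        # problem: minimize ||Y' - D' X||^2 with
        #   Y' = [Y; sqrt(alpha) Q; sqrt(beta) H],
        #   D' = [D; sqrt(alpha) A; sqrt(beta) W],
        # so a standard K-SVD solver optimizes D, A, and W jointly.
        return np.vstack([Y, np.sqrt(alpha) * Q, np.sqrt(beta) * H])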
Proximal Methods for Hierarchical Sparse Coding
2010
"... Sparse coding consists in representing signals as sparse linear combinations of atoms selected from a dictionary. We consider an extension of this framework where the atoms are further assumed to be embedded in a tree. This is achieved using a recently introduced treestructured sparse regularizatio ..."
Abstract

Cited by 83 (18 self)
 Add to MetaCart
(Show Context)
Sparse coding consists of representing signals as sparse linear combinations of atoms selected from a dictionary. We consider an extension of this framework where the atoms are further assumed to be embedded in a tree. This is achieved using a recently introduced tree-structured sparse regularization norm, which has proven useful in several applications. This norm leads to regularized problems that are difficult to optimize, and we propose in this paper efficient algorithms for solving them. More precisely, we show that the proximal operator associated with this norm is computable exactly via a dual approach that can be viewed as the composition of elementary proximal operators. Our procedure has a complexity linear, or close to linear, in the number of atoms, and allows the use of accelerated gradient techniques to solve the tree-structured sparse approximation problem at the same computational cost as traditional methods using the ℓ1-norm. Our method is efficient and scales gracefully to millions of variables, which we illustrate in two types of applications: first, we consider fixed hierarchical dictionaries of wavelets to denoise natural images; then, we apply our optimization tools in the context of dictionary learning, where learned dictionary elements naturally organize in a prespecified arborescent structure, leading to better performance in reconstruction of natural image patches. When applied to text documents, our method learns hierarchies of topics, thus providing a competitive alternative to probabilistic topic models.
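For the ℓ2 variant of the tree-structured norm, the composition result means the proximal operator is computed exactly by one pass of ordinary group soft-thresholding per group, visiting groups from the leaves of the tree up to the root. A sketch under that assumption, with groups given as index arrays listed children-before-parents and per-group penalty weights:

    import numpy as np

    def prox_tree_l2(u, groups, weights, t):
        # Proximal operator of t * sum_g weights[g] * ||u[g]||_2 when the
        # groups form a tree (each pair is disjoint or nested). Applying
        # the elementary group-shrinkage operators leaves-to-root gives
        # the exact answer, per the composition result.
        v = u.copy()
        for g, wgt in zip(groups, weights):
            nrm = np.linalg.norm(v[g])
            thr = t * wgt
            v[g] = 0.0 if nrm <= thr else (1.0 - thr / nrm) * v[g]
        return v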
Supervised translation-invariant sparse coding
In IEEE Conference on Computer Vision and Pattern Recognition, 2010.
"... In this paper, we propose a novel supervised hierarchical sparse coding model based on local image descriptors for classification tasks. The supervised dictionary training is performed via backprojection, by minimizing the training error of classifying the image level features, which are extracted ..."
Abstract

Cited by 76 (8 self)
 Add to MetaCart
In this paper, we propose a novel supervised hierarchical sparse coding model based on local image descriptors for classification tasks. The supervised dictionary training is performed via back-projection, by minimizing the training error of classifying the image-level features, which are extracted by max pooling over the sparse codes within a spatial pyramid. Such a max pooling procedure across multiple spatial scales offers the model translation-invariant properties, similar to the Convolutional Neural Network (CNN). Experiments show that our supervised dictionary improves the performance of the proposed model significantly over the unsupervised dictionary, leading to state-of-the-art performance on diverse image databases. Furthermore, our supervised model targets learning linear features, implying its great potential in handling large-scale datasets in real applications.
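The image-level feature is built by dividing the image into cells at several pyramid scales, taking the element-wise max of the sparse codes that fall in each cell, and concatenating the results; the max makes the feature indifferent to where within a cell a descriptor fires. A sketch, assuming descriptor positions normalized to [0, 1) and placeholder pyramid levels:

    import numpy as np

    def spatial_pyramid_max_pool(codes, xy, levels=(1, 2, 4)):
        # codes: (n_descriptors, dict_size) sparse codes of local image
        # descriptors; xy: (n_descriptors, 2) positions in [0, 1).
        feats = []
        for L in levels:
            # Assign each descriptor to an L x L grid cell.
            cell = np.minimum((xy * L).astype(int), L - 1)
            for i in range(L):
                for j in range(L):
                    mask = (cell[:, 0] == i) & (cell[:, 1] == j)
                    if mask.any():
                        feats.append(codes[mask].max(axis=0))
                    else:
                        feats.append(np.zeros(codes.shape[1]))
        return np.concatenate(feats)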