Results 1  10
of
84
What is the Best MultiStage Architecture for Object Recognition?
"... In many recent object recognition systems, feature extraction stages are generally composed of a filter bank, a nonlinear transformation, and some sort of feature pooling layer. Most systems use only one stage of feature extraction in which the filters are hardwired, or two stages where the filter ..."
Abstract

Cited by 122 (19 self)
 Add to MetaCart
(Show Context)
In many recent object recognition systems, feature extraction stages are generally composed of a filter bank, a nonlinear transformation, and some sort of feature pooling layer. Most systems use only one stage of feature extraction in which the filters are hardwired, or two stages where the filters in one or both stages are learned in supervised or unsupervised mode. This paper addresses three questions: 1. How does the nonlinearities that follow the filter banks influence the recognition accuracy? 2. does learning the filter banks in an unsupervised or supervised manner improve the performance over random filters or hardwired filters? 3. Is there any advantage to using an architecture with two stages of feature extraction, rather than one? We show that using nonlinearities that include rectification and local contrast normalization is the single most important ingredient for good accuracy on object recognition benchmarks. We show that two stages of feature extraction yield better accuracy than one. Most surprisingly, we show that a twostage system with random filters can yield almost 63 % recognition rate on Caltech101, provided that the proper nonlinearities and pooling layers are used. Finally, we show that with supervised refinement, the system achieves stateoftheart performance on NORB dataset (5.6%) and unsupervised pretraining followed by supervised refinement produces good accuracy on Caltech101 (> 65%), and the lowest known error rate on the undistorted, unprocessed MNIST dataset (0.53%). 1.
Structured variable selection with sparsityinducing norms
, 2011
"... We consider the empirical risk minimization problem for linear supervised learning, with regularization by structured sparsityinducing norms. These are defined as sums of Euclidean norms on certain subsets of variables, extending the usual ℓ1norm and the group ℓ1norm by allowing the subsets to ov ..."
Abstract

Cited by 99 (19 self)
 Add to MetaCart
(Show Context)
We consider the empirical risk minimization problem for linear supervised learning, with regularization by structured sparsityinducing norms. These are defined as sums of Euclidean norms on certain subsets of variables, extending the usual ℓ1norm and the group ℓ1norm by allowing the subsets to overlap. This leads to a specific set of allowed nonzero patterns for the solutions of such problems. We first explore the relationship between the groups defining the norm and the resulting nonzero patterns, providing both forward and backward algorithms to go back and forth from groups to patterns. This allows the design of norms adapted to specific prior knowledge expressed in terms of nonzero patterns. We also present an efficient active set algorithm, and analyze the consistency of variable selection for leastsquares linear regression in low and highdimensional settings.
Modeling Pixel Means and Covariances Using Factorized ThirdOrder Boltzmann Machines
, 2010
"... Learning a generative model of natural images is a useful way of extracting features that capture interesting regularities. Previous work on learning such models has focused on methods in which the latent features are used to determine the mean and variance of each pixel independently, or on methods ..."
Abstract

Cited by 40 (1 self)
 Add to MetaCart
Learning a generative model of natural images is a useful way of extracting features that capture interesting regularities. Previous work on learning such models has focused on methods in which the latent features are used to determine the mean and variance of each pixel independently, or on methods in which the hidden units determine the covariance matrix of a zeromean Gaussian distribution. In this work, we propose a probabilistic model that combines these two approaches into a single framework. We represent each image using one set of binary latent features that model the imagespecific covariance and a separate set that model the mean. We show that this approach provides a probabilistic framework for the widely used simplecell complexcell architecture, it produces very realistic samples of natural images and it extracts features that yield stateoftheart recognition accuracy on the challenging CIFAR 10 dataset.
Supervised translationinvariant sparse coding
 IN: IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION
, 2010
"... In this paper, we propose a novel supervised hierarchical sparse coding model based on local image descriptors for classification tasks. The supervised dictionary training is performed via backprojection, by minimizing the training error of classifying the image level features, which are extracted ..."
Abstract

Cited by 34 (6 self)
 Add to MetaCart
In this paper, we propose a novel supervised hierarchical sparse coding model based on local image descriptors for classification tasks. The supervised dictionary training is performed via backprojection, by minimizing the training error of classifying the image level features, which are extracted by max pooling over the sparse codes within a spatial pyramid. Such a max pooling procedure across multiple spatial scales offer the model translation invariant properties, similar to the Convolutional Neural Network (CNN). Experiments show that our supervised dictionary improves the performance of the proposed model significantly over the unsupervised dictionary, leading to stateoftheart performance on diverse image databases. Further more, our supervised model targets learning linear features, implying its great potential in handling large scale datasets in real applications.
Factored 3Way Restricted Boltzmann Machines For Modeling Natural Images
, 2010
"... Deep belief nets have been successful in modeling handwritten characters, but it has proved more difficult to apply them to real images. The problem lies in the restricted Boltzmann machine (RBM) which is used as a module for learning deep belief nets one layer at a time. The GaussianBinary RBMs th ..."
Abstract

Cited by 32 (4 self)
 Add to MetaCart
Deep belief nets have been successful in modeling handwritten characters, but it has proved more difficult to apply them to real images. The problem lies in the restricted Boltzmann machine (RBM) which is used as a module for learning deep belief nets one layer at a time. The GaussianBinary RBMs that have been used to model realvalued data are not a good way to model the covariance structure of natural images. We propose a factored 3way RBM that uses the states of its hidden units to represent abnormalities in the local covariance structure of an image. This provides a probabilistic framework for the widely used simple/complex cell architecture. Our model learns binary features that work very well for object recognition on the “tiny images” data set. Even better features are obtained by then using standard binary RBM’s to learn a deeper model.
Kernel Descriptors for Visual Recognition
"... The design of lowlevel image features is critical for computer vision algorithms. Orientation histograms, such as those in SIFT [16] and HOG [3], are the most successful and popular features for visual object and scene recognition. We highlight the kernel view of orientation histograms, and show th ..."
Abstract

Cited by 28 (13 self)
 Add to MetaCart
(Show Context)
The design of lowlevel image features is critical for computer vision algorithms. Orientation histograms, such as those in SIFT [16] and HOG [3], are the most successful and popular features for visual object and scene recognition. We highlight the kernel view of orientation histograms, and show that they are equivalent to a certain type of match kernels over image patches. This novel view allows us to design a family of kernel descriptors which provide a unified and principled framework to turn pixel attributes (gradient, color, local binary pattern, etc.) into compact patchlevel features. In particular, we introduce three types of match kernels to measure similarities between image patches, and construct compact lowdimensional kernel descriptors from these match kernels using kernel principal component analysis (KPCA) [23]. Kernel descriptors are easy to design and can turn any type of pixel attribute into patchlevel features. They outperform carefully tuned and sophisticated features including SIFT and deep belief networks. We report superior performance on standard image classification benchmarks: Scene15, Caltech101, CIFAR10 and CIFAR10ImageNet. 1
TaskDriven Dictionary Learning
"... Abstract—Modeling data with linear combinations of a few elements from a learned dictionary has been the focus of much recent research in machine learning, neuroscience, and signal processing. For signals such as natural images that admit such sparse representations, it is now well established that ..."
Abstract

Cited by 27 (1 self)
 Add to MetaCart
(Show Context)
Abstract—Modeling data with linear combinations of a few elements from a learned dictionary has been the focus of much recent research in machine learning, neuroscience, and signal processing. For signals such as natural images that admit such sparse representations, it is now well established that these models are well suited to restoration tasks. In this context, learning the dictionary amounts to solving a largescale matrix factorization problem, which can be done efficiently with classical optimization tools. The same approach has also been used for learning features from data for other purposes, e.g., image classification, but tuning the dictionary in a supervised way for these tasks has proven to be more difficult. In this paper, we present a general formulation for supervised dictionary learning adapted to a wide variety of tasks, and present an efficient algorithm for solving the corresponding optimization problem. Experiments on handwritten digit classification, digital art identification, nonlinear inverse image problems, and compressed sensing demonstrate that our approach is effective in largescale settings, and is well suited to supervised and semisupervised classification, as well as regression tasks for data that admit sparse representations. Index Terms—Basis pursuit, Lasso, dictionary learning, matrix factorization, semisupervised learning, compressed sensing. Ç 1
Tiled convolutional neural networks
 In NIPS, in press
, 2010
"... Convolutional neural networks (CNNs) have been successfully applied to many tasks such as digit and object recognition. Using convolutional (tied) weights significantly reduces the number of parameters that have to be learned, and also allows translational invariance to be hardcoded into the archit ..."
Abstract

Cited by 24 (6 self)
 Add to MetaCart
(Show Context)
Convolutional neural networks (CNNs) have been successfully applied to many tasks such as digit and object recognition. Using convolutional (tied) weights significantly reduces the number of parameters that have to be learned, and also allows translational invariance to be hardcoded into the architecture. In this paper, we consider the problem of learning invariances, rather than relying on hardcoding. We propose tiled convolution neural networks (Tiled CNNs), which use a regular “tiled ” pattern of tied weights that does not require that adjacent hidden units share identical weights, but instead requires only that hidden units k steps away from each other to have tied weights. By pooling over neighboring units, this architecture is able to learn complex invariances (such as scale and rotational invariance) beyond translational invariance. Further, it also enjoys much of CNNs’ advantage of having a relatively small number of learned parameters (such as ease of learning and greater scalability). We provide an efficient learning algorithm for Tiled CNNs based on Topographic ICA, and show that learning complex invariant features allows us to achieve highly competitive results for both the NORB and CIFAR10 datasets. 1
Efficient Highly OverComplete Sparse Coding using a Mixture Model
"... Abstract. Sparse coding of sensory data has recently attracted notable attention in research of learning useful features from the unlabeled data. Empirical studies show that mapping the data into a significantly higherdimensional space with sparse coding can lead to superior classification performan ..."
Abstract

Cited by 24 (0 self)
 Add to MetaCart
(Show Context)
Abstract. Sparse coding of sensory data has recently attracted notable attention in research of learning useful features from the unlabeled data. Empirical studies show that mapping the data into a significantly higherdimensional space with sparse coding can lead to superior classification performance. However, computationally it is challenging to learn a set of highly overcomplete dictionary bases and to encode the test data with the learned bases. In this paper, we describe a mixture sparse coding model that can produce highdimensional sparse representations very efficiently. Besides the computational advantage, the model effectively encourages data that are similar to each other to enjoy similar sparse representations. What’s more, the proposed model can be regarded as an approximation to the recently proposed local coordinate coding (LCC), which states that sparse coding can approximately learn the nonlinear manifold of the sensory data in a locally linear manner. Therefore, the feature learned by the mixture sparse coding model works pretty well with linear classifiers. We apply the proposed model to PASCAL VOC 2007 and 2009 datasets for the classification task, both achieving stateoftheart performances. Key words: Sparse coding, highly overcomplete dictionary training, mixture model, mixture sparse coding, image classification, PASCAL VOC challenge 1
Learning Convolutional Feature Hierarchies for Visual Recognition
"... We propose an unsupervised method for learning multistage hierarchies of sparse convolutional features. While sparse coding has become an increasingly popular method for learning visual features, it is most often trained at the patch level. Applying the resulting filters convolutionally results in ..."
Abstract

Cited by 22 (1 self)
 Add to MetaCart
(Show Context)
We propose an unsupervised method for learning multistage hierarchies of sparse convolutional features. While sparse coding has become an increasingly popular method for learning visual features, it is most often trained at the patch level. Applying the resulting filters convolutionally results in highly redundant codes because overlapping patches are encoded in isolation. By training convolutionally over large image windows, our method reduces the redudancy between feature vectors at neighboring locations and improves the efficiency of the overall representation. In addition to a linear decoder that reconstructs the image from sparse features, our method trains an efficient feedforward encoder that predicts quasisparse features from the input. While patchbased training rarely produces anything but oriented edge detectors, we show that convolutional training produces highly diverse filters, including centersurround filters, corner detectors, cross detectors, and oriented grating detectors. We show that using these filters in multistage convolutional network architecture improves performance on a number of visual recognition and detection tasks. 1