Results 1–10 of 445
Linear spatial pyramid matching using sparse coding for image classification
 in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009
Abstract

Cited by 497 (21 self)
Recently SVMs using the spatial pyramid matching (SPM) kernel have been highly successful in image classification. Despite its popularity, these nonlinear SVMs have a complexity of O(n²–n³) in training and O(n) in testing, where n is the training size, implying that it is nontrivial to scale up the algorithms to handle more than thousands of training images. In this paper we develop an extension of the SPM method by generalizing vector quantization to sparse coding followed by multi-scale spatial max pooling, and propose a linear SPM kernel based on SIFT sparse codes. This new approach remarkably reduces the complexity of SVMs to O(n) in training and a constant in testing. In a number of image categorization experiments, we find that, in terms of classification accuracy, the suggested linear SPM based on sparse coding of SIFT descriptors always significantly outperforms the linear SPM kernel on histograms, and is even better than the nonlinear SPM kernels, leading to state-of-the-art performance on several benchmarks using a single type of descriptor.
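The coding-and-pooling pipeline the abstract describes can be sketched in a few lines of NumPy. This is an illustrative stand-in, not the authors' implementation: the dictionary `D` and the descriptors are synthetic, the lasso step uses a plain ISTA solver, and the spatial pyramid is reduced to a single pooling region.

```python
import numpy as np

def sparse_encode(x, D, lam=0.1, n_iter=100):
    """Encode descriptor x over dictionary D (columns are atoms) by solving
    min_a 0.5*||x - D a||^2 + lam*||a||_1 with ISTA iterations."""
    L = np.linalg.norm(D, 2) ** 2                 # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = a - D.T @ (D @ a - x) / L
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return a

def max_pool(codes):
    """Pool a set of sparse codes (rows) into one vector by the coordinate-wise
    maximum of absolute values, as in the linear-SPM pipeline."""
    return np.max(np.abs(codes), axis=0)

rng = np.random.default_rng(0)
D = rng.standard_normal((16, 32))
D /= np.linalg.norm(D, axis=0)                    # unit-norm atoms
descriptors = rng.standard_normal((5, 16))        # stand-ins for SIFT descriptors
codes = np.vstack([sparse_encode(x, D) for x in descriptors])
feature = max_pool(codes)                         # region feature for a linear SVM
```

Max pooling of absolute sparse codes, rather than averaging histogram counts, is what makes the resulting representation work well with a linear classifier.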
A Survey on Transfer Learning
Abstract

Cited by 459 (24 self)
A major assumption in many machine learning and data mining algorithms is that the training and future data must be in the same feature space and have the same distribution. However, in many real-world applications, this assumption may not hold. For example, we sometimes have a classification task in one domain of interest, but we only have sufficient training data in another domain of interest, where the latter data may be in a different feature space or follow a different data distribution. In such cases, knowledge transfer, if done successfully, would greatly improve the performance of learning by avoiding expensive data-labeling efforts. In recent years, transfer learning has emerged as a new learning framework to address this problem. This survey focuses on categorizing and reviewing the current progress on transfer learning for classification, regression, and clustering problems. In this survey, we discuss the relationship between transfer learning and other related machine learning techniques such as domain adaptation, multi-task learning, sample selection bias, and covariate shift. We also explore some potential future issues in transfer learning research.
Locality-constrained linear coding for image classification
 in IEEE Conference on Computer Vision and Pattern Recognition, 2010
Abstract

Cited by 443 (20 self)
The traditional SPM approach based on bag-of-features (BoF) requires nonlinear classifiers to achieve good image classification performance. This paper presents a simple but effective coding scheme called Locality-constrained Linear Coding (LLC) in place of the VQ coding in traditional SPM. LLC utilizes locality constraints to project each descriptor into its local coordinate system, and the projected coordinates are integrated by max pooling to generate the final representation. With a linear classifier, the proposed approach performs remarkably better than the traditional nonlinear SPM, achieving state-of-the-art performance on several benchmarks. Compared with the sparse coding strategy [22], the objective function used by LLC has an analytical solution. In addition, the paper proposes a fast approximated LLC method by first performing a K-nearest-neighbor search and then solving a constrained least squares fitting problem, bearing a computational complexity of O(M + K²). Hence even with very large codebooks, our system can still process multiple frames per second. This efficiency significantly adds to the practical value of LLC for real applications.
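The approximated LLC step, a K-nearest-neighbor search followed by a shift-invariant constrained least squares fit, admits a closed-form solution. The sketch below uses synthetic data and an illustrative regularizer `beta`; it follows the standard formulation rather than reproducing the paper's exact code.

```python
import numpy as np

def llc_encode(x, B, K=5, beta=1e-4):
    """Approximated LLC coding: restrict x to its K nearest codebook atoms
    (rows of B) and solve min ||x - c^T B_K||^2 s.t. sum(c) = 1 in closed form."""
    d2 = np.sum((B - x) ** 2, axis=1)
    idx = np.argsort(d2)[:K]                 # K-nearest-neighbor atoms
    Z = B[idx] - x                           # shift atoms to the descriptor
    C = Z @ Z.T                              # local covariance
    C += beta * np.trace(C) * np.eye(K)      # regularize for numerical stability
    w = np.linalg.solve(C, np.ones(K))
    w /= w.sum()                             # enforce the sum-to-one constraint
    code = np.zeros(B.shape[0])
    code[idx] = w
    return code

rng = np.random.default_rng(1)
B = rng.standard_normal((64, 8))             # codebook: 64 atoms in R^8
x = rng.standard_normal(8)
c = llc_encode(x, B)
```

The resulting code has at most K nonzeros, which is why very large codebooks stay cheap: the linear solve is only K×K regardless of codebook size.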
Online learning for matrix factorization and sparse coding
, 2010
Abstract

Cited by 330 (31 self)
Sparse coding—that is, modelling data vectors as sparse linear combinations of basis elements—is widely used in machine learning, neuroscience, signal processing, and statistics. This paper focuses on the large-scale matrix factorization problem that consists of learning the basis set in order to adapt it to specific data. Variations of this problem include dictionary learning in signal processing, non-negative matrix factorization, and sparse principal component analysis. In this paper, we propose to address these tasks with a new online optimization algorithm, based on stochastic approximations, which scales up gracefully to large data sets with millions of training samples, and extends naturally to various matrix factorization formulations, making it suitable for a wide range of learning problems. A proof of convergence is presented, along with experiments with natural images and genomic data demonstrating that it leads to state-of-the-art performance in terms of speed and optimization for both small and large data sets.
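A minimal sketch of the online scheme the abstract outlines: stream samples one at a time, sparse-code each against the current dictionary, accumulate sufficient statistics, and update atoms by block coordinate descent. A toy ISTA solver stands in for the paper's lasso step, and all sizes and iteration counts are illustrative only.

```python
import numpy as np

def sparse_code(x, D, lam=0.1, n_iter=50):
    """ISTA sparse coding step (stand-in for a lasso/LARS solver)."""
    L = np.linalg.norm(D, 2) ** 2
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = a - D.T @ (D @ a - x) / L
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    return a

def online_dictionary_learning(X, k=8, lam=0.1, n_epochs=2, seed=0):
    """Accumulate sufficient statistics A = sum a a^T and B = sum x a^T,
    then refresh one atom at a time (block coordinate descent)."""
    rng = np.random.default_rng(seed)
    m = X.shape[1]
    D = rng.standard_normal((m, k))
    D /= np.linalg.norm(D, axis=0)
    A = np.zeros((k, k))
    B = np.zeros((m, k))
    for _ in range(n_epochs):
        for x in X:
            a = sparse_code(x, D, lam)
            A += np.outer(a, a)
            B += np.outer(x, a)
            for j in range(k):                  # per-atom update
                if A[j, j] > 1e-10:
                    u = D[:, j] + (B[:, j] - D @ A[:, j]) / A[j, j]
                    D[:, j] = u / max(np.linalg.norm(u), 1.0)
    return D

rng = np.random.default_rng(2)
X = rng.standard_normal((30, 12))               # 30 streamed samples in R^12
D = online_dictionary_learning(X)
```

Only the fixed-size statistics A and B are kept between samples, which is what lets the method scale to millions of training vectors.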
Self-taught learning: Transfer learning from unlabeled data
 in Proceedings of the Twenty-fourth International Conference on Machine Learning, 2007
Abstract

Cited by 299 (20 self)
We present a new machine learning framework called “self-taught learning” for using unlabeled data in supervised classification tasks. We do not assume that the unlabeled data follows the same class labels or generative distribution as the labeled data. Thus, we would like to use a large number of unlabeled images (or audio samples, or text documents) randomly downloaded from the Internet to improve performance on a given image (or audio, or text) classification task. Such unlabeled data is significantly easier to obtain than in typical semi-supervised or transfer learning settings, making self-taught learning widely applicable to many practical learning problems. We describe an approach to self-taught learning that uses sparse coding to construct higher-level features from the unlabeled data. These features form a succinct input representation and significantly improve classification performance. When using an SVM for classification, we further show how a Fisher kernel can be learned for this representation.
What is the Best Multi-Stage Architecture for Object Recognition?
Abstract

Cited by 252 (22 self)
In many recent object recognition systems, feature extraction stages are generally composed of a filter bank, a nonlinear transformation, and some sort of feature pooling layer. Most systems use only one stage of feature extraction in which the filters are hardwired, or two stages where the filters in one or both stages are learned in supervised or unsupervised mode. This paper addresses three questions: (1) How do the nonlinearities that follow the filter banks influence recognition accuracy? (2) Does learning the filter banks in an unsupervised or supervised manner improve performance over random or hardwired filters? (3) Is there any advantage to using an architecture with two stages of feature extraction, rather than one? We show that using nonlinearities that include rectification and local contrast normalization is the single most important ingredient for good accuracy on object recognition benchmarks. We show that two stages of feature extraction yield better accuracy than one. Most surprisingly, we show that a two-stage system with random filters can yield almost 63% recognition rate on Caltech-101, provided that the proper nonlinearities and pooling layers are used. Finally, we show that with supervised refinement, the system achieves state-of-the-art performance on the NORB dataset (5.6%), and unsupervised pre-training followed by supervised refinement produces good accuracy on Caltech-101 (>65%) and the lowest known error rate on the undistorted, unprocessed MNIST dataset (0.53%).
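The two nonlinearities the paper singles out, rectification and local contrast normalization, can be sketched for a single 2-D feature map as follows. The 3×3 window, the `eps` floor, and the random input are illustrative choices, not the paper's exact configuration.

```python
import numpy as np

def rectify(x):
    """Rectification nonlinearity: absolute value of filter responses."""
    return np.abs(x)

def box_mean(x, r=1):
    """Mean over a (2r+1)x(2r+1) window, with edge padding."""
    p = np.pad(x, r, mode="edge")
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = p[i:i + 2 * r + 1, j:j + 2 * r + 1].mean()
    return out

def local_contrast_normalize(x, eps=1e-5):
    """Subtractive then divisive normalization over a local neighborhood."""
    centered = x - box_mean(x)                     # subtract the local mean
    sigma = np.sqrt(box_mean(centered ** 2))       # local standard deviation
    return centered / np.maximum(sigma, eps)       # divide, flooring at eps

rng = np.random.default_rng(6)
fmap = rng.standard_normal((8, 8))                 # one filter-bank response map
out = local_contrast_normalize(rectify(fmap))
```

A constant patch normalizes to zero response, which is the point of the divisive step: only local structure, not absolute intensity, survives.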
Learning mid-level features for recognition
, 2010
Abstract

Cited by 228 (13 self)
Many successful models for scene or object recognition transform low-level descriptors (such as Gabor filter responses, or SIFT descriptors) into richer representations of intermediate complexity. This process can often be broken down into two steps: (1) a coding step, which performs a pointwise transformation of the descriptors into a representation better adapted to the task, and (2) a pooling step, which summarizes the coded features over larger neighborhoods. Several combinations of coding and pooling schemes have been proposed in the literature. The goal of this paper is threefold. We seek to establish the relative importance of each step of mid-level feature extraction through a comprehensive cross-evaluation of several types of coding modules (hard and soft vector quantization, sparse coding) and pooling schemes (by taking the average, or the maximum), which obtains state-of-the-art performance or better on several recognition benchmarks. We show how to improve the best-performing coding scheme by learning a supervised discriminative dictionary for sparse coding. We provide theoretical and empirical insight into the remarkable performance of max pooling. By teasing apart components shared by modern mid-level feature extractors, our approach aims to facilitate the design of better recognition architectures.
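The coding/pooling decomposition is easy to make concrete. The sketch below pairs hard vector quantization (one of the coding modules in the paper's cross-evaluation) with both average and max pooling; the codebook and descriptors are synthetic stand-ins.

```python
import numpy as np

def hard_vq(X, D):
    """Hard vector quantization: a one-hot code at the nearest atom (rows of D)."""
    d2 = ((X[:, None, :] - D[None, :, :]) ** 2).sum(-1)   # squared distances
    codes = np.zeros((X.shape[0], D.shape[0]))
    codes[np.arange(X.shape[0]), d2.argmin(1)] = 1.0
    return codes

def pool(codes, how="max"):
    """Summarize the codes over a neighborhood by average or maximum."""
    return codes.max(0) if how == "max" else codes.mean(0)

rng = np.random.default_rng(3)
D = rng.standard_normal((10, 4))          # codebook: 10 atoms in R^4
X = rng.standard_normal((20, 4))          # 20 descriptors in one neighborhood
codes = hard_vq(X, D)
avg_feat = pool(codes, "avg")             # a normalized histogram of assignments
max_feat = pool(codes, "max")             # presence/absence of each atom
```

With hard VQ, average pooling yields the classical BoF histogram while max pooling records only whether each atom fired at all, which is one way to see why the two behave so differently downstream.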
An Analysis of Single-Layer Networks in Unsupervised Feature Learning
Abstract

Cited by 223 (19 self)
A great deal of research has focused on algorithms for learning features from unlabeled data. Indeed, much progress has been made on benchmark datasets like NORB and CIFAR by employing increasingly complex unsupervised learning algorithms and deep models. In this paper, however, we show that several very simple factors, such as the number of hidden nodes in the model, may be as important to achieving high performance as the choice of learning algorithm or the depth of the model. Specifically, we apply several off-the-shelf feature learning algorithms (sparse auto-encoders, sparse RBMs, K-means clustering, and Gaussian mixtures) to the NORB and CIFAR datasets using only single-layer networks. We then present a detailed analysis of the effect of changes in the model setup: the receptive field size, the number of hidden nodes (features), the step size (“stride”) between extracted features, and the effect of whitening. Our results show that large numbers of hidden nodes and dense feature extraction are as critical to achieving high performance as the choice of algorithm itself—so critical, in fact, that when these parameters are pushed to their limits, we are able to achieve state-of-the-art performance on both CIFAR and NORB using only a single layer of features. More surprisingly, our best performance is based on K-means clustering, which is extremely fast, has no hyperparameters to tune beyond the model structure itself, and is very easy to implement. Despite the simplicity of our system, we achieve performance beyond all previously published results on the CIFAR-10 and NORB datasets (79.6% and 97.0% accuracy respectively).
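A compact sketch of the single-layer pipeline: whiten patches, run K-means, then encode new patches with a "triangle" activation on distances to the centroids (an encoding associated with this line of work). The whitening `eps`, sizes, and all data here are synthetic stand-ins, not the paper's settings.

```python
import numpy as np

def whiten(X, eps=0.1):
    """ZCA whitening of a patch matrix X (rows are patches)."""
    Xc = X - X.mean(0)
    w, V = np.linalg.eigh(np.cov(Xc, rowvar=False))
    return Xc @ (V @ np.diag(1.0 / np.sqrt(w + eps)) @ V.T)

def kmeans(X, k, n_iter=20, seed=0):
    """Plain K-means; the centroids double as the feature dictionary."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        z = ((X[:, None] - C[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if np.any(z == j):
                C[j] = X[z == j].mean(0)
    return C

def triangle_features(X, C):
    """'Triangle' activation: f_j(x) = max(0, mean_k d_k(x) - d_j(x))."""
    d = np.sqrt(((X[:, None] - C[None]) ** 2).sum(-1))
    return np.maximum(0.0, d.mean(1, keepdims=True) - d)

rng = np.random.default_rng(4)
patches = rng.standard_normal((200, 16))   # stand-ins for 4x4 image patches
Z = whiten(patches)
C = kmeans(Z, k=8)
F = triangle_features(Z, C)                # sparse nonnegative features
```

Each patch's activation for its farthest centroids is clipped to zero, so the encoding is sparse without any tunable penalty, one reason the approach is so easy to run.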
Image Super-Resolution via Sparse Representation
Abstract

Cited by 194 (9 self)
This paper presents a new approach to single-image super-resolution, based on sparse signal representation. Research on image statistics suggests that image patches can be well represented as a sparse linear combination of elements from an appropriately chosen over-complete dictionary. Inspired by this observation, we seek a sparse representation for each patch of the low-resolution input, and then use the coefficients of this representation to generate the high-resolution output. Theoretical results from compressed sensing suggest that under mild conditions, the sparse representation can be correctly recovered from the downsampled signals. By jointly training two dictionaries for the low-resolution and high-resolution image patches, we can enforce the similarity of sparse representations between the low-resolution and high-resolution image patch pair with respect to their own dictionaries. Therefore, the sparse representation of a low-resolution image patch can be applied with the high-resolution image patch dictionary to generate a high-resolution image patch. The learned dictionary pair is a more compact representation of the patch pairs, compared to previous approaches which simply sample a large number of image patch pairs, reducing the computation cost substantially. The effectiveness of such a sparsity prior is demonstrated for general image super-resolution and also for the special case of face hallucination. In both cases, our algorithm generates high-resolution images that are competitive or even superior in quality to images produced by other similar SR methods, but with faster processing speed.
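The coupled-dictionary idea can be sketched as: code a low-resolution patch against the low-resolution dictionary, then reconstruct with the paired high-resolution atoms. Everything below is synthetic, assuming a toy downsampling operator `P` and an ISTA lasso solver in place of the paper's training and inference procedures.

```python
import numpy as np

def ista(x, D, lam=0.05, n_iter=200):
    """Lasso sparse coding via ISTA (a stand-in for the paper's solver)."""
    L = np.linalg.norm(D, 2) ** 2
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = a - D.T @ (D @ a - x) / L
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    return a

rng = np.random.default_rng(5)
Dh = rng.standard_normal((32, 20))          # high-resolution patch dictionary
Dh /= np.linalg.norm(Dh, axis=0)
P = rng.standard_normal((8, 32)) / 8        # toy downsampling operator
Dl = P @ Dh                                 # coupled low-resolution dictionary

a_true = np.zeros(20)
a_true[[2, 7]] = [1.0, -0.5]                # a sparse "ground-truth" patch code
y_low = Dl @ a_true                         # observed low-resolution patch
a = ista(y_low, Dl)                         # code w.r.t. the low-res dictionary
x_high = Dh @ a                             # reconstruct with high-res atoms
```

The key assumption is that the low- and high-resolution patch pair share one sparse code, so a code recovered from the downsampled observation can be decoded through the high-resolution dictionary.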
Structured variable selection with sparsity-inducing norms
, 2011
Abstract

Cited by 187 (27 self)
We consider the empirical risk minimization problem for linear supervised learning, with regularization by structured sparsity-inducing norms. These are defined as sums of Euclidean norms on certain subsets of variables, extending the usual ℓ1-norm and the group ℓ1-norm by allowing the subsets to overlap. This leads to a specific set of allowed nonzero patterns for the solutions of such problems. We first explore the relationship between the groups defining the norm and the resulting nonzero patterns, providing both forward and backward algorithms to go back and forth from groups to patterns. This allows the design of norms adapted to specific prior knowledge expressed in terms of nonzero patterns. We also present an efficient active set algorithm, and analyze the consistency of variable selection for least-squares linear regression in low- and high-dimensional settings.
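The norm itself is simple to write down: a sum of Euclidean norms over (possibly overlapping) index subsets. The group structure and weight vector below are made-up illustrations, not examples from the paper.

```python
import numpy as np

def group_norm(w, groups):
    """Structured sparsity-inducing norm: sum of Euclidean norms over
    (possibly overlapping) index subsets of w."""
    return sum(np.linalg.norm(w[g]) for g in groups)

w = np.array([3.0, 4.0, 0.0, 0.0, 5.0])
groups = [[0, 1], [1, 2, 3], [4]]           # groups 1 and 2 overlap at index 1
val = group_norm(w, groups)                 # ||(3,4)|| + ||(4,0,0)|| + ||(5)|| = 14
```

With singleton groups the penalty reduces to the ordinary ℓ1-norm; choosing which subsets may overlap is exactly how the norm encodes the allowed nonzero patterns.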