Results 1  10
of
81
Linear spatial pyramid matching using sparse coding for image classification
 in IEEE Conference on Computer Vision and Pattern Recognition(CVPR
, 2009
"... Recently SVMs using spatial pyramid matching (SPM) kernel have been highly successful in image classification. Despite its popularity, these nonlinear SVMs have a complexity O(n 2 ∼ n 3) in training and O(n) in testing, where n is the training size, implying that it is nontrivial to scaleup the algo ..."
Abstract

Cited by 168 (15 self)
 Add to MetaCart
Recently SVMs using spatial pyramid matching (SPM) kernel have been highly successful in image classification. Despite its popularity, these nonlinear SVMs have a complexity O(n 2 ∼ n 3) in training and O(n) in testing, where n is the training size, implying that it is nontrivial to scaleup the algorithms to handle more than thousands of training images. In this paper we develop an extension of the SPM method, by generalizing vector quantization to sparse coding followed by multiscale spatial max pooling, and propose a linear SPM kernel based on SIFT sparse codes. This new approach remarkably reduces the complexity of SVMs to O(n) in training and a constant in testing. In a number of image categorization experiments, we find that, in terms of classification accuracy, the suggested linear SPM based on sparse coding of SIFT descriptors always significantly outperforms the linear SPM kernel on histograms, and is even better than the nonlinear SPM kernels, leading to stateoftheart performance on several benchmarks by using a single type of descriptors. 1.
What is the Best MultiStage Architecture for Object Recognition?
"... In many recent object recognition systems, feature extraction stages are generally composed of a filter bank, a nonlinear transformation, and some sort of feature pooling layer. Most systems use only one stage of feature extraction in which the filters are hardwired, or two stages where the filter ..."
Abstract

Cited by 103 (16 self)
 Add to MetaCart
In many recent object recognition systems, feature extraction stages are generally composed of a filter bank, a nonlinear transformation, and some sort of feature pooling layer. Most systems use only one stage of feature extraction in which the filters are hardwired, or two stages where the filters in one or both stages are learned in supervised or unsupervised mode. This paper addresses three questions: 1. How does the nonlinearities that follow the filter banks influence the recognition accuracy? 2. does learning the filter banks in an unsupervised or supervised manner improve the performance over random filters or hardwired filters? 3. Is there any advantage to using an architecture with two stages of feature extraction, rather than one? We show that using nonlinearities that include rectification and local contrast normalization is the single most important ingredient for good accuracy on object recognition benchmarks. We show that two stages of feature extraction yield better accuracy than one. Most surprisingly, we show that a twostage system with random filters can yield almost 63 % recognition rate on Caltech101, provided that the proper nonlinearities and pooling layers are used. Finally, we show that with supervised refinement, the system achieves stateoftheart performance on NORB dataset (5.6%) and unsupervised pretraining followed by supervised refinement produces good accuracy on Caltech101 (> 65%), and the lowest known error rate on the undistorted, unprocessed MNIST dataset (0.53%). 1.
Online learning for matrix factorization and sparse coding
"... Sparse coding—that is, modelling data vectors as sparse linear combinations of basis elements—is widely used in machine learning, neuroscience, signal processing, and statistics. This paper focuses on the largescale matrix factorization problem that consists of learning the basis set, adapting it t ..."
Abstract

Cited by 97 (19 self)
 Add to MetaCart
Sparse coding—that is, modelling data vectors as sparse linear combinations of basis elements—is widely used in machine learning, neuroscience, signal processing, and statistics. This paper focuses on the largescale matrix factorization problem that consists of learning the basis set, adapting it to specific data. Variations of this problem include dictionary learning in signal processing, nonnegative matrix factorization and sparse principal component analysis. In this paper, we propose to address these tasks with a new online optimization algorithm, based on stochastic approximations, which scales up gracefully to large datasets with millions of training samples, and extends naturally to various matrix factorization formulations, making it suitable for a wide range of learning problems. A proof of convergence is presented, along with experiments with natural images and genomic data demonstrating that it leads to stateoftheart performance in terms of speed and optimization for both small and large datasets.
Sparse subspace clustering
 In CVPR
, 2009
"... We propose a method based on sparse representation (SR) to cluster data drawn from multiple lowdimensional linear or affine subspaces embedded in a highdimensional space. Our method is based on the fact that each point in a union of subspaces has a SR with respect to a dictionary formed by all oth ..."
Abstract

Cited by 70 (6 self)
 Add to MetaCart
We propose a method based on sparse representation (SR) to cluster data drawn from multiple lowdimensional linear or affine subspaces embedded in a highdimensional space. Our method is based on the fact that each point in a union of subspaces has a SR with respect to a dictionary formed by all other data points. In general, finding such a SR is NP hard. Our key contribution is to show that, under mild assumptions, the SR can be obtained ’exactly ’ by using ℓ1 optimization. The segmentation of the data is obtained by applying spectral clustering to a similarity matrix built from this SR. Our method can handle noise, outliers as well as missing data. We apply our subspace clustering algorithm to the problem of segmenting multiple motions in video. Experiments on 167 video sequences show that our approach significantly outperforms stateoftheart methods. 1.
Learning Invariant Features through Topographic Filter Maps
"... Several recentlyproposed architectures for highperformance object recognition are composed of two main stages: a feature extraction stage that extracts locallyinvariant feature vectors from regularly spaced image patches, and a somewhat generic supervised classifier. The first stage is often compos ..."
Abstract

Cited by 68 (12 self)
 Add to MetaCart
Several recentlyproposed architectures for highperformance object recognition are composed of two main stages: a feature extraction stage that extracts locallyinvariant feature vectors from regularly spaced image patches, and a somewhat generic supervised classifier. The first stage is often composed of three main modules: (1) a bank of filters (often oriented edge detectors); (2) a nonlinear transform, such as a pointwise squashing functions, quantization, or normalization; (3) a spatial pooling operation which combines the outputs of similar filters over neighboring regions. We propose a method that automatically learns such feature extractors in an unsupervised fashion by simultaneously learning the filters and the pooling units that combine multiple filter outputs together. The method automatically generates topographic maps of similar filters that extract features of orientations, scales, and positions. These similar filters are pooled together, producing locallyinvariant outputs. The learned feature descriptors give comparable results as SIFT on image recognition tasks for which SIFT is well suited, and better results than SIFT on tasks for which SIFT is less well suited. 1.
Fast inference in sparse coding algorithms with applications to object recognition
 Technical report, Computational and Biological Learning Lab, Courant Institute, NYU
"... Adaptive sparse coding methods learn a possibly overcomplete set of basis functions, such that natural image patches can be reconstructed by linearly combining a small subset of these bases. The applicability of these methods to visual object recognition tasks has been limited because of the prohibi ..."
Abstract

Cited by 36 (11 self)
 Add to MetaCart
Adaptive sparse coding methods learn a possibly overcomplete set of basis functions, such that natural image patches can be reconstructed by linearly combining a small subset of these bases. The applicability of these methods to visual object recognition tasks has been limited because of the prohibitive cost of the optimization algorithms required to compute the sparse representation. In this work we propose a simple and efficient algorithm to learn basis functions. After training, this model also provides a fast and smooth approximator to the optimal representation, achieving even better accuracy than exact sparse coding algorithms on visual object recognition tasks. 1
Learning to Sense Sparse Signals: Simultaneous Sensing Matrix and Sparsifying Dictionary Optimization
, 2008
"... Abstract Sparse signals representation, analysis, and sensing, has received a lot of attention in recent years from the signal processing, optimization, and learning communities. On one hand, the learning of overcomplete dictionaries that facilitate a sparse representation of the image as a liner c ..."
Abstract

Cited by 28 (4 self)
 Add to MetaCart
Abstract Sparse signals representation, analysis, and sensing, has received a lot of attention in recent years from the signal processing, optimization, and learning communities. On one hand, the learning of overcomplete dictionaries that facilitate a sparse representation of the image as a liner combination of a few atoms from such dictionary, leads to stateoftheart results in image and video restoration and image classification. On the other hand, the framework of compressed sensing (CS) has shown that sparse signals can be recovered from far less samples than those required by the classical ShannonNyquist Theorem. The goal of this paper is to present a framework that unifies the learning of overcomplete dictionaries for sparse image representation with the concepts of signal recovery from very few samples put forward by the CS theory. The samples used in CS correspond to linear projections defined by a sampling projection matrix. It has been shown that, for example, a nonadaptive random sampling matrix satisfies the fundamental theoretical requirements of CS, enjoying the additional benefit of universality. On the other hand, a projection sensing matrix that is optimally designed for a certain signal class can further improve the reconstruction accuracy or further reduce the necessary number of samples. In this work we introduce a framework for the joint design and optimization, from a set of training images, of the
TaskDriven Dictionary Learning
"... Abstract—Modeling data with linear combinations of a few elements from a learned dictionary has been the focus of much recent research in machine learning, neuroscience, and signal processing. For signals such as natural images that admit such sparse representations, it is now well established that ..."
Abstract

Cited by 23 (1 self)
 Add to MetaCart
Abstract—Modeling data with linear combinations of a few elements from a learned dictionary has been the focus of much recent research in machine learning, neuroscience, and signal processing. For signals such as natural images that admit such sparse representations, it is now well established that these models are well suited to restoration tasks. In this context, learning the dictionary amounts to solving a largescale matrix factorization problem, which can be done efficiently with classical optimization tools. The same approach has also been used for learning features from data for other purposes, e.g., image classification, but tuning the dictionary in a supervised way for these tasks has proven to be more difficult. In this paper, we present a general formulation for supervised dictionary learning adapted to a wide variety of tasks, and present an efficient algorithm for solving the corresponding optimization problem. Experiments on handwritten digit classification, digital art identification, nonlinear inverse image problems, and compressed sensing demonstrate that our approach is effective in largescale settings, and is well suited to supervised and semisupervised classification, as well as regression tasks for data that admit sparse representations. Index Terms—Basis pursuit, Lasso, dictionary learning, matrix factorization, semisupervised learning, compressed sensing. Ç 1
Learning A Discriminative Dictionary for Sparse Coding via Label Consistent KSVD
"... A label consistent KSVD (LCKSVD) algorithm to learn a discriminative dictionary for sparse coding is presented. In addition to using class labels of training data, we also associate label information with each dictionary item (columns of the dictionary matrix) to enforce discriminability in sparse ..."
Abstract

Cited by 21 (6 self)
 Add to MetaCart
A label consistent KSVD (LCKSVD) algorithm to learn a discriminative dictionary for sparse coding is presented. In addition to using class labels of training data, we also associate label information with each dictionary item (columns of the dictionary matrix) to enforce discriminability in sparse codes during the dictionary learning process. More specifically, we introduce a new label consistent constraint called ‘discriminative sparsecode error ’ and combine it with the reconstruction error and the classification error to form a unified objective function. The optimal solution is efficiently obtained using the KSVD algorithm. Our algorithm learns a single overcomplete dictionary and an optimal linear classifier jointly. It yields dictionaries so that feature points with the same class labels have similar sparse codes. Experimental results demonstrate that our algorithm outperforms many recently proposed sparse coding techniques for face and object category recognition under the same learning conditions. 1.
Dense error correction via ℓ1 minimization
, 2009
"... This paper studies the problem of recovering a nonnegative sparse signal x ∈ Rn from highly corrupted linear measurements y = Ax + e ∈ Rm, where e is an unknown error vector whose nonzero entries may be unbounded. Motivated by an observation from face recognition in computer vision, this paper prov ..."
Abstract

Cited by 17 (5 self)
 Add to MetaCart
This paper studies the problem of recovering a nonnegative sparse signal x ∈ Rn from highly corrupted linear measurements y = Ax + e ∈ Rm, where e is an unknown error vector whose nonzero entries may be unbounded. Motivated by an observation from face recognition in computer vision, this paper proves that for highly correlated (and possibly overcomplete) dictionaries A, any nonnegative, sufficiently sparse signal x can be recovered by solving an ℓ1minimization problem: min ‖x‖1 + ‖e‖1 subject to y = Ax + e. More precisely, if the fraction ρ of errors is bounded away from one and the support of x grows sublinearly in the dimension m of the observation, then as m goes to infinity, the above ℓ1minimization succeeds for all signals x and almost all signandsupport patterns of e. This result suggests that accurate recovery of sparse signals is possible and computationally feasible even with nearly 100 % of the observations corrupted. The proof relies on a careful characterization of the faces of a convex polytope spanned together by the standard crosspolytope and a set of iid Gaussian vectors with nonzero mean and small variance, which we call the “crossandbouquet ” model. Simulations and experimental results corroborate the findings, and suggest extensions to the result.