Linear spatial pyramid matching using sparse coding for image classification
- In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009
Cited by 72 (9 self)
Recently, SVMs using the spatial pyramid matching (SPM) kernel have been highly successful in image classification. Despite their popularity, these nonlinear SVMs have a complexity of O(n^2 ∼ n^3) in training and O(n) in testing, where n is the training size, implying that it is nontrivial to scale up the algorithms to handle more than thousands of training images. In this paper we develop an extension of the SPM method, by generalizing vector quantization to sparse coding followed by multi-scale spatial max pooling, and propose a linear SPM kernel based on SIFT sparse codes. This new approach remarkably reduces the complexity of SVMs to O(n) in training and a constant in testing. In a number of image categorization experiments, we find that, in terms of classification accuracy, the suggested linear SPM based on sparse coding of SIFT descriptors always significantly outperforms the linear SPM kernel on histograms, and is even better than the nonlinear SPM kernels, leading to state-of-the-art performance on several benchmarks by using a single type of descriptor.
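The pooling step named in this abstract (sparse codes followed by multi-scale spatial max pooling, then a linear kernel) can be sketched as follows. This is a minimal pure-Python illustration, not the authors' implementation: the function names, the pyramid levels, the use of absolute values before the max, and the normalized (x, y) positions are all assumptions made for the example, and the sparse codes are taken as given.

```python
from typing import List, Sequence, Tuple


def spatial_pyramid_max_pool(
    codes: Sequence[Sequence[float]],
    positions: Sequence[Tuple[float, float]],
    levels: Sequence[int] = (1, 2, 4),
) -> List[float]:
    """Max-pool absolute sparse codes over a spatial pyramid.

    codes[i] is the sparse code of local descriptor i, and positions[i] is
    its (x, y) location normalized to [0, 1]. A level g splits the image
    into g x g cells; the pooled vectors of all cells are concatenated.
    """
    dim = len(codes[0])
    feature: List[float] = []
    for g in levels:
        cells = [[0.0] * dim for _ in range(g * g)]
        for code, (x, y) in zip(codes, positions):
            # Clamp so that a coordinate of exactly 1.0 falls in the last cell.
            col = min(int(x * g), g - 1)
            row = min(int(y * g), g - 1)
            cell = cells[row * g + col]
            for k, value in enumerate(code):
                cell[k] = max(cell[k], abs(value))
        for cell in cells:
            feature.extend(cell)
    return feature


def linear_kernel(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain inner product between two pooled image features."""
    return sum(x * y for x, y in zip(a, b))
```

Because the kernel is a plain inner product on the pooled features, a linear SVM can be trained on them directly, which is the property the abstract credits for the O(n) training cost.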
Sparse Representation For Computer Vision and Pattern Recognition
- 2009
Cited by 18 (0 self)
Techniques from sparse signal representation are beginning to see significant impact in computer vision, often on non-traditional applications where the goal is not just to obtain a compact high-fidelity representation of the observed signal, but also to extract semantic information. The choice of dictionary plays a key role in bridging this gap: unconventional dictionaries consisting of, or learned from, the training samples themselves provide the key to obtaining state-of-the-art results and to attaching semantic meaning to sparse signal representations. Understanding the good performance of such unconventional dictionaries in turn demands new algorithmic and analytical techniques. This review paper highlights a few representative examples of how the interaction between sparse signal representation and computer vision can enrich both fields, and raises a number of open questions for further study.
Supervised translation-invariant sparse coding
- In IEEE Conference on Computer Vision and Pattern Recognition, 2010
Cited by 13 (2 self)
In this paper, we propose a novel supervised hierarchical sparse coding model based on local image descriptors for classification tasks. The supervised dictionary training is performed via back-projection, by minimizing the training error of classifying the image-level features, which are extracted by max pooling over the sparse codes within a spatial pyramid. Such a max pooling procedure across multiple spatial scales offers the model translation-invariant properties, similar to the Convolutional Neural Network (CNN). Experiments show that our supervised dictionary improves the performance of the proposed model significantly over the unsupervised dictionary, leading to state-of-the-art performance on diverse image databases. Furthermore, our supervised model targets learning linear features, implying its great potential in handling large-scale datasets in real applications.
Direct Sparse Deblurring
Cited by 7 (0 self)
We propose a deblurring algorithm that explicitly takes into account the sparse characteristics of natural images and does not entail solving a numerically ill-conditioned backward-diffusion. The key observation is that the sparse coefficients that encode a given image with respect to an over-complete basis are the same that encode a blurred version of the image with respect to a modified basis. Following an “analysis-by-synthesis” approach, an explicit generative model is used to compute a sparse representation of the blurred image, whose coefficients are then used to combine elements of the original basis to yield a restored image. We compare our algorithm against the state of the art in variational methods as well as wavelet-based algorithms.
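The key observation in this abstract follows from the linearity of blurring: blurring a weighted sum of basis elements gives the same result as the same weighted sum of blurred basis elements. A minimal sketch of that identity on 1-D signals is shown below. The circular blur kernel, the toy atoms, and the coefficients are all hypothetical values chosen for illustration, and the sparse-coding step that would estimate the coefficients from the blurred signal is not shown.

```python
from typing import List, Sequence


def blur(signal: Sequence[float],
         kernel: Sequence[float] = (0.25, 0.5, 0.25)) -> List[float]:
    """Circular 1-D blur with a small symmetric kernel (assumed for the demo)."""
    n = len(signal)
    half = len(kernel) // 2
    return [
        sum(kernel[k] * signal[(i + k - half) % n] for k in range(len(kernel)))
        for i in range(n)
    ]


def synthesize(atoms: Sequence[Sequence[float]],
               coeffs: Sequence[float]) -> List[float]:
    """Linear combination of basis elements (atoms) with the given coefficients."""
    return [
        sum(c * atom[i] for c, atom in zip(coeffs, atoms))
        for i in range(len(atoms[0]))
    ]


# Toy over-complete basis: three atoms for a length-4 signal.
atoms = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]]
coeffs = [2.0, 0.0, -1.0]  # sparse: one coefficient is zero

sharp = synthesize(atoms, coeffs)
blurred_atoms = [blur(atom) for atom in atoms]  # the "modified basis"

# Same coefficients, two routes to the blurred signal.
blur_of_synthesis = blur(sharp)
synthesis_of_blurred = synthesize(blurred_atoms, coeffs)

# Once the coefficients are known, the original atoms give the restored signal.
restored = synthesize(atoms, coeffs)
```

In a full pipeline the coefficients would be estimated by sparse-coding the blurred input against `blurred_atoms`; here they are fixed so that only the identity itself is demonstrated.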
Sparsity Induced Similarity Measure for Label Propagation
Cited by 5 (0 self)
Graph-based semi-supervised learning has gained considerable interest in the past several years thanks to its effectiveness in combining labeled and unlabeled data through label propagation for better object modeling and classification. A critical issue in constructing a graph is the weight assignment, where the weight of an edge specifies the similarity between two data points. In this paper, we present a novel technique to measure the similarities among data points by decomposing each data point as an L1 sparse linear combination of the rest of the data points. The main idea is that the coefficients in such a sparse decomposition reflect the point’s neighborhood structure, thus providing better similarity measures between the decomposed data point and the rest of the data points. The proposed approach is evaluated on four commonly used data sets, and the experimental results show that the proposed Sparsity Induced Similarity (SIS) measure significantly improves label propagation performance. As an application of the SIS-based label propagation, we show that the SIS measure can be used to improve the Bag-of-Words approach for scene classification.
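The decomposition described in this abstract can be illustrated with a small L1-regularized least-squares solver. The sketch below uses ISTA (iterative soft-thresholding), which is a swapped-in generic solver and not necessarily the one used in the paper. The function names, the regularization weight `lam`, the iteration count, and the choice to turn coefficients into edge weights by normalizing their absolute values are all assumptions made for the example.

```python
from typing import List, Sequence


def soft_threshold(v: float, t: float) -> float:
    """Shrink v toward zero by t (proximal operator of the L1 norm)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def sparse_decompose(target: Sequence[float],
                     atoms: Sequence[Sequence[float]],
                     lam: float = 0.01,
                     iters: int = 500) -> List[float]:
    """ISTA for min_a 0.5 * ||target - sum_j a[j] * atoms[j]||^2 + lam * ||a||_1."""
    n, d = len(atoms), len(target)
    # The squared Frobenius norm upper-bounds the gradient's Lipschitz constant,
    # so its reciprocal is a safe step size.
    frob = sum(v * v for atom in atoms for v in atom)
    step = 1.0 / frob if frob > 0 else 1.0
    a = [0.0] * n
    for _ in range(iters):
        residual = [sum(a[j] * atoms[j][k] for j in range(n)) - target[k]
                    for k in range(d)]
        grad = [sum(atoms[j][k] * residual[k] for k in range(d))
                for j in range(n)]
        a = [soft_threshold(a[j] - step * grad[j], step * lam) for j in range(n)]
    return a


def sis_weights(points: Sequence[Sequence[float]], i: int,
                lam: float = 0.01) -> List[float]:
    """Edge weights from point i to every other point (weight to itself is 0)."""
    others = [j for j in range(len(points)) if j != i]
    coeffs = sparse_decompose(points[i], [points[j] for j in others], lam)
    total = sum(abs(c) for c in coeffs)
    weights = [0.0] * len(points)
    if total > 0:
        for j, c in zip(others, coeffs):
            weights[j] = abs(c) / total
    return weights
```

For example, with `points = [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0]]`, calling `sis_weights(points, 0)` puts all of the weight on the second point, which lies along the same direction as the first, and none on the orthogonal third point.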
Efficient Highly Over-Complete Sparse Coding using a Mixture Model
Cited by 5 (0 self)
Sparse coding of sensory data has recently attracted notable attention in research on learning useful features from unlabeled data. Empirical studies show that mapping the data into a significantly higher-dimensional space with sparse coding can lead to superior classification performance. However, it is computationally challenging to learn a set of highly over-complete dictionary bases and to encode the test data with the learned bases. In this paper, we describe a mixture sparse coding model that can produce high-dimensional sparse representations very efficiently. Besides the computational advantage, the model effectively encourages data that are similar to each other to have similar sparse representations. Moreover, the proposed model can be regarded as an approximation to the recently proposed local coordinate coding (LCC), which states that sparse coding can approximately learn the nonlinear manifold of the sensory data in a locally linear manner. Therefore, the features learned by the mixture sparse coding model work well with linear classifiers. We apply the proposed model to the PASCAL VOC 2007 and 2009 datasets for the classification task, achieving state-of-the-art performance on both. Key words: Sparse coding, highly over-complete dictionary training, mixture model, mixture sparse coding, image classification, PASCAL VOC challenge
Image hallucination with feature enhancement
- In CVPR, 2009
Cited by 3 (1 self)
Example-based super-resolution recovers missing high frequencies in a magnified image by learning the correspondence between co-occurrence examples at two different resolution levels. As high-resolution examples usually contain more details and are of higher dimensionality in comparison with low-resolution ones, the mapping from low-resolution to high-resolution is an ill-posed problem. Rather than imposing more complicated mapping constraints, we propose to improve the mapping accuracy by enhancing low-resolution examples in terms of mapped features, e.g., derivatives and primitives. A feature enhancement method is presented through a combination of interpolation with prefiltering and non-blind sparse prior deblurring. By enhancing low-resolution examples, unique feature information carried by high-resolution examples is decreased. This regularization reduces the intrinsic dimensionality disparity between two different resolution examples and thus improves the feature mapping accuracy. Experiments demonstrate our super-resolution scheme with feature enhancement produces high quality results both perceptually and quantitatively.
Facial action unit recognition with sparse representation
- In 2011 IEEE International Conference on Automatic Face & Gesture Recognition and Workshops (FG 2011)
Cited by 1 (0 self)
This paper presents a novel framework for recognition of facial action unit (AU) combinations by viewing the classification as a sparse representation problem. Based on this framework, we represent a facial image exhibiting a combination of AUs as a sparse linear combination of the basis elements constituting an overcomplete dictionary. We build an overcomplete dictionary whose main elements are mean Gabor features of the AU combinations under examination. The other elements of the dictionary are randomly sampled from a distribution (e.g., a Gaussian distribution) that guarantees sparse signal recovery. Afterwards, by solving an L1-norm minimization, a facial image is represented as a sparse vector which is used to distinguish various AU patterns. After calculating the sparse representation, the classification problem is simply viewed as a rank maximal problem: the index of the maximal value of the sparse vector is regarded as the class label of the facial image under test. Extensive experiments on the Cohn-Kanade facial expressions database demonstrate that this sparse learning framework is promising for recognition of AU combinations. Keywords: sparse representation; L1-norm minimization; facial expression recognition; FACS-AU detection.
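The final decision rule in this abstract (take the index of the largest entry of the sparse vector as the class label) is simple enough to sketch directly. In the illustration below, the function name and the label list are hypothetical, the sparse vector is assumed to have been computed already by some L1 solver, and restricting the maximum to the class atoms (ignoring the randomly sampled atoms, marked with `None`) is an assumption rather than something the abstract states.

```python
from typing import Optional, Sequence


def classify_by_max_coefficient(
    sparse_vector: Sequence[float],
    atom_labels: Sequence[Optional[str]],
) -> str:
    """Return the label of the class atom with the largest coefficient.

    atom_labels[k] is the class label of dictionary atom k, or None for a
    randomly sampled atom that carries no class label.
    """
    labeled = [k for k, label in enumerate(atom_labels) if label is not None]
    if not labeled:
        raise ValueError("dictionary has no labeled atoms")
    best = max(labeled, key=lambda k: sparse_vector[k])
    return atom_labels[best]  # type: ignore[return-value]
```

For example, `classify_by_max_coefficient([0.1, 0.7, 0.05, 0.9], ["A", "B", "C", None])` returns `"B"`: the largest coefficient overall belongs to an unlabeled random atom, so the largest among the class atoms decides.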
A Shrinkage Learning Approach for Single Image Super-Resolution with Overcomplete Representations
Cited by 1 (0 self)
We present a novel approach for online shrinkage functions learning in single image super-resolution. The proposed approach leverages the classical Wavelet Shrinkage denoising technique where a set of scalar shrinkage functions is applied to the wavelet coefficients of a noisy image. In the proposed approach, a unique set of learned shrinkage functions is applied to the overcomplete representation coefficients of the interpolated input image. The super-resolution image is reconstructed from the post-shrinkage coefficients. During the learning stage, the low-resolution input image is treated as a reference high-resolution image and a super-resolution reconstruction process is applied to a scaled-down version of it. The shapes of all shrinkage functions are jointly learned by solving a Least Squares optimization problem that minimizes the sum of squared errors between the reference image and its super-resolution approximation. Computer simulations demonstrate superior performance compared to state-of-the-art results.
Context-Constrained Hallucination for Image Super-Resolution
Cited by 1 (0 self)
This paper proposes a context-constrained hallucination approach for image super-resolution. Through building a training set of high-resolution/low-resolution image segment pairs, the high-resolution pixel is hallucinated from its texturally similar segments, which are retrieved from the training set by texture similarity. Given the discrete hallucinated examples, a continuous energy function is designed to enforce the fidelity of the high-resolution image to the low-resolution input and the constraints imposed by the hallucinated examples and the edge smoothness prior. The reconstructed high-resolution image is sharp with minimal artifacts both along the edges and in the textural regions.

