Results 1 - 10 of 66
Robust visual tracking via multi-task sparse learning
- In IEEE Conference on Computer Vision and Pattern Recognition, 2012
"... In this paper, we formulate object tracking in a particle filter framework as a multi-task sparse learning problem, which we denote as Multi-Task Tracking (MTT). Since we model particles as linear combinations of dictionary templates that are updated dynamically, learning the representation of each ..."
Abstract
-
Cited by 53 (6 self)
- Add to MetaCart
(Show Context)
In this paper, we formulate object tracking in a particle filter framework as a multi-task sparse learning problem, which we denote as Multi-Task Tracking (MTT). Since we model particles as linear combinations of dictionary templates that are updated dynamically, learning the representation of each particle is considered a single task in MTT. By employing popular sparsity-inducing ℓp,q mixed norms (p ∈ {2, ∞} and q = 1), we regularize the representation problem to enforce joint sparsity and learn the particle representations together. As compared to previous methods that handle particles independently, our results demonstrate that mining the interdependencies between particles improves tracking performance and reduces overall computational complexity. Interestingly, we show that the popular L1 tracker [15] is a special case of our MTT formulation (denoted as the L11 tracker) when p = q = 1. The learning problem can be efficiently solved using an Accelerated Proximal Gradient (APG) method that yields a sequence of closed-form updates. As such, MTT is computationally attractive. We test our proposed approach on challenging sequences involving heavy occlusion, drastic illumination changes, and large pose variations. Experimental results show that MTT methods consistently outperform state-of-the-art trackers.
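The joint-sparsity regularizer described in this abstract has a closed-form proximal operator in the ℓ2,1 case, which is what makes each APG iteration cheap. A minimal numpy sketch of that setup (function names, shapes, and parameter values here are illustrative assumptions, not the authors' code):

```python
import numpy as np

def prox_l21(C, t):
    """Row-wise soft-thresholding: the proximal operator of t * ||C||_{2,1}.
    Each row (dictionary atom) is shrunk or zeroed jointly across all tasks."""
    norms = np.linalg.norm(C, axis=1, keepdims=True)
    scale = np.maximum(1.0 - t / np.maximum(norms, 1e-12), 0.0)
    return C * scale

def mtt_l21(D, Y, lam=0.1, n_iter=200):
    """Jointly code all particles Y (columns) over dictionary D via
    accelerated proximal gradient on 0.5||Y - DC||_F^2 + lam*||C||_{2,1}."""
    L = np.linalg.norm(D, 2) ** 2           # Lipschitz constant of the gradient
    C = np.zeros((D.shape[1], Y.shape[1]))
    Z, t = C.copy(), 1.0
    for _ in range(n_iter):
        G = Z - (D.T @ (D @ Z - Y)) / L     # gradient step
        C_new = prox_l21(G, lam / L)        # closed-form prox step
        t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2
        Z = C_new + ((t - 1) / t_new) * (C_new - C)   # Nesterov momentum
        C, t = C_new, t_new
    return C
```

The row-wise coupling is the whole point: an atom is either used by all particles or by none, which is how the interdependencies between particles are mined.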
Relaxed Collaborative Representation for Pattern Classification
"... Regularized linear representation learning has led to interesting results in image classification, while how the object should be represented is a critical issue to be investigated. Considering the fact that the different features in a sample should contribute differently to the pattern representati ..."
Abstract
-
Cited by 20 (0 self)
- Add to MetaCart
(Show Context)
Regularized linear representation learning has led to interesting results in image classification, yet how the object should be represented remains a critical issue to investigate. Considering that different features in a sample should contribute differently to pattern representation and classification, in this paper we present a novel relaxed collaborative representation (RCR) model to effectively exploit the similarity and distinctiveness of features. In RCR, each feature vector is coded on its associated dictionary to allow flexibility of feature coding, while the variance of the coding vectors is minimized to address the similarity among features. In addition, the distinctiveness of different features is exploited by weighting their distances to other features in the coding domain. The proposed RCR is simple, while our extensive experimental results on benchmark image databases (e.g., various face and flower databases) show that it is very competitive with state-of-the-art image classification methods.
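The RCR objective (per-feature ridge coding plus a variance penalty pulling the coding vectors together) admits simple alternating closed-form updates. A rough sketch, under the assumptions that every feature's dictionary has the same number of atoms and that the distinctiveness weights are uniform (all names hypothetical):

```python
import numpy as np

def rcr_code(Ds, ys, lam=0.01, tau=0.1, n_iter=20):
    """Alternating solve for sum_k ||y_k - D_k a_k||^2 + lam*||a_k||^2
    + tau*||a_k - abar||^2: each a_k is a ridge solution pulled toward
    the mean code abar, which is then re-estimated."""
    K = len(Ds)
    A = [np.zeros(D.shape[1]) for D in Ds]
    for _ in range(n_iter):
        abar = np.mean(A, axis=0)                 # current mean coding vector
        for k in range(K):
            D, y = Ds[k], ys[k]
            M = D.T @ D + (lam + tau) * np.eye(D.shape[1])
            A[k] = np.linalg.solve(M, D.T @ y + tau * abar)
    return A
```

In the paper's full model the pull toward the mean is weighted per feature; the uniform-weight version above is only meant to show the structure of the coupled ridge solves.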
Low-rank matrix recovery with structural incoherence for robust face recognition
- In Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2012
"... We address the problem of robust face recognition, in which both training and test image data might be corrupted due to occlusion and disguise. From standard face recog-nition algorithms such as Eigenfaces to recently proposed sparse representation-based classification (SRC) methods, most prior work ..."
Abstract
-
Cited by 17 (2 self)
- Add to MetaCart
(Show Context)
We address the problem of robust face recognition, in which both training and test image data might be corrupted due to occlusion and disguise. From standard face recognition algorithms such as Eigenfaces to recently proposed sparse representation-based classification (SRC) methods, most prior work did not consider possible contamination of the data during training, and thus the associated performance might be degraded. Based on the recent success of low-rank matrix recovery, we propose a novel low-rank matrix approximation algorithm with structural incoherence for robust face recognition. Our method not only decomposes raw training data into a set of representative bases with corresponding sparse errors for better modeling of the face images, but also advocates structural incoherence between the bases learned from different classes. These bases are encouraged to be as independent as possible due to the regularization on structural incoherence. We show that this provides additional discriminating ability to the original low-rank models for improved performance. Experimental results on public face databases verify the effectiveness and robustness of our method, which is also shown to outperform state-of-the-art SRC-based approaches.
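Stripped of the incoherence term, the core decomposition is a standard low-rank-plus-sparse recovery. A simplified alternating sketch of that relaxed problem (a generic stand-in, not the authors' exact algorithm, which additionally penalizes incoherence between per-class bases):

```python
import numpy as np

def svt(M, t):
    """Singular-value thresholding: the prox of t * (nuclear norm)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - t, 0.0)) @ Vt

def low_rank_plus_sparse(Y, mu=1.0, lam=0.1, n_iter=50):
    """Block-coordinate descent on 0.5||Y - L - S||_F^2 + mu*||L||_* + lam*||S||_1:
    alternate a low-rank update (SVT) and a sparse update (soft-thresholding)."""
    L = np.zeros_like(Y)
    S = np.zeros_like(Y)
    for _ in range(n_iter):
        L = svt(Y - S, mu)                                          # low-rank part
        S = np.sign(Y - L) * np.maximum(np.abs(Y - L) - lam, 0.0)   # sparse errors
    return L, S
```

Here L plays the role of the clean representative bases and S absorbs occlusion/disguise corruptions; the parameter values are illustrative only.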
Robust Visual Tracking via Structured Multi-Task Sparse Learning
"... Abstract In this paper, we formulate object tracking in a particle filter framework as a structured multi-task sparse learning problem, which we denote as Structured Multi-Task Tracking (S-MTT). Since we model particles as linear combinations of dictionary templates that are updated dynamically, lea ..."
Abstract
-
Cited by 17 (5 self)
- Add to MetaCart
(Show Context)
In this paper, we formulate object tracking in a particle filter framework as a structured multi-task sparse learning problem, which we denote as Structured Multi-Task Tracking (S-MTT). Since we model particles as linear combinations of dictionary templates that are updated dynamically, learning the representation of each particle is considered a single task in Multi-Task Tracking (MTT). By employing popular sparsity-inducing ℓp,q mixed norms (specifically p ∈ {2, ∞} and q = 1), we regularize the representation problem to enforce joint sparsity and learn the particle representations together. As compared to previous methods that handle particles independently, our results demonstrate that mining the interdependencies between particles improves tracking performance. Electronic supplementary material: the online version of this article (doi:10.1007/s11263-012-0582-z) contains supplementary material, which is available to authorized users.
Predicting occupation via human clothing and contexts
- In IEEE International Conference on Computer Vision
"... Predicting human occupations in photos has great application potentials in intelligent services and systems. However, using traditional classification methods cannot reliably distinguish different occupations due to the complex relations between occupations and the low-level image features. In this ..."
Abstract
-
Cited by 16 (0 self)
- Add to MetaCart
(Show Context)
Predicting human occupations in photos has great application potential in intelligent services and systems. However, traditional classification methods cannot reliably distinguish different occupations due to the complex relations between occupations and low-level image features. In this paper, we investigate the human occupation prediction problem by modeling the appearance of human clothing as well as the surrounding context. Human clothing, with its complex details and varied appearance, is described via part-based modeling on automatically aligned patches of human body parts. The image patches are represented with semantic-level patterns such as clothes and haircut styles using methods based on sparse coding, chosen for their informative and noise-tolerant capacity. This description of human clothing proves to be more effective than traditional methods. Different kinds of surrounding context are also investigated as a complement to the human clothing features when background information is available. Experiments are conducted on a well-labeled image database that contains more than 5,000 images from 20 representative occupation categories. This preliminary study shows that human occupation is reasonably predictable using the proposed clothing features and available context.
Learning Active Facial Patches for Expression Analysis
"... In this paper, we present a new idea to analyze facial expression by exploring some common and specific information among different expressions. Inspired by the observation that only a few facial parts are active in expression disclosure (e.g. around mouth, eye), we try to discover the common and sp ..."
Abstract
-
Cited by 14 (0 self)
- Add to MetaCart
(Show Context)
In this paper, we present a new idea for analyzing facial expressions by exploring common and specific information among different expressions. Inspired by the observation that only a few facial parts are active in expression disclosure (e.g., around the mouth and eyes), we try to discover the common and specific patches which are important for discriminating all expressions and a particular expression, respectively. A two-stage multi-task sparse learning (MTSL) framework is proposed to efficiently locate those discriminative patches. In the first stage of MTSL, expression recognition tasks, each of which aims to find dominant patches for one expression, are combined to locate common patches. Second, two related tasks, facial expression recognition and face verification, are coupled to learn specific facial patches for individual expressions. Extensive experiments validate the existence and significance of common and specific patches. Utilizing these learned patches, we achieve superior performance on expression recognition compared to the state of the art.
Hyperspectral Image Classification via Kernel Sparse Representation
"... In this paper, a novel nonlinear technique for hyperspectral image classification is proposed. Our approach relies on sparsely representing a test sample in terms of all of the training samples in a feature space induced by a kernel function. For each test pixel in the feature space, a sparse repres ..."
Abstract
-
Cited by 13 (1 self)
- Add to MetaCart
In this paper, a novel nonlinear technique for hyperspectral image classification is proposed. Our approach relies on sparsely representing a test sample in terms of all of the training samples in a feature space induced by a kernel function. For each test pixel in the feature space, a sparse representation vector is obtained by decomposing the test pixel over a training dictionary, also in the same feature space, by using a kernel-based greedy pursuit algorithm. The recovered sparse representation vector is then used directly to determine the class label of the test pixel. Projecting the samples into a high-dimensional feature space and kernelizing the sparse representation improves the data separability between different classes, providing a higher classification accuracy compared to the more conventional linear sparsity-based classification algorithms. Moreover, the spatial coherency across neighboring pixels is also incorporated through a kernelized joint sparsity model, where all of the pixels within a small neighborhood are jointly represented in the feature space by selecting a few common training samples. Kernel greedy optimization algorithms are suggested in this paper to solve the kernel versions of the single-pixel and multi-pixel joint sparsity-based recovery problems. Experimental results on several hyperspectral images show that the proposed technique outperforms the linear sparsity-based classification technique, as well as the classical Support Vector Machines and sparse kernel logistic regression classifiers.
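The greedy pursuit described here can run entirely on kernel evaluations, since every feature-space inner product is a kernel entry. A toy kernel-OMP sketch with an RBF kernel (a generic stand-in to show the idea; the paper's kernelized pursuit algorithms differ in their details, and all names and parameters below are illustrative):

```python
import numpy as np

def rbf(X, Z, gamma=0.5):
    """RBF kernel matrix between the rows of X and the rows of Z."""
    d = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d)

def kernel_omp(X, y, sparsity=3, gamma=0.5):
    """Greedy feature-space pursuit: pick the training atom most correlated
    with the residual (all correlations computed via kernels), then refit
    the coefficients on the active set S."""
    K = rbf(X, X, gamma)                       # Gram matrix of training atoms
    ky = rbf(X, y[None, :], gamma).ravel()     # kernel between atoms and test pixel
    S, a = [], np.zeros(0)
    for _ in range(sparsity):
        # residual correlation: <phi(x_j), phi(y) - Phi_S a> = ky_j - K[j,S] a
        corr = ky - (K[:, S] @ a if S else np.zeros(len(X)))
        corr[S] = 0.0                          # never reselect active atoms
        S.append(int(np.argmax(np.abs(corr))))
        Ks = K[np.ix_(S, S)] + 1e-8 * np.eye(len(S))
        a = np.linalg.solve(Ks, ky[S])         # least-squares refit in feature space
    return S, a
```

Classification then reduces to comparing class-wise reconstruction residuals, all of which are again expressible through K and ky alone.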
Sparse Representation with Kernels
2012
"... Recent research has shown the initial success of sparse coding (Sc) in solving many computer vision tasks. Motivated by the fact that kernel trick can capture the nonlinear similarity of features, which helps find a sparse representation of nonlinear features, we therefore propose Kernel Sparse Repr ..."
Abstract
-
Cited by 11 (3 self)
- Add to MetaCart
(Show Context)
Recent research has shown the initial success of sparse coding (Sc) in solving many computer vision tasks. Motivated by the fact that the kernel trick can capture the nonlinear similarity of features, which helps find a sparse representation of nonlinear features, we propose Kernel Sparse Representation (KSR). Essentially, KSR is a sparse coding technique in a high-dimensional feature space mapped by an implicit mapping function. We apply KSR to feature coding in image classification, face recognition, and kernel matrix approximation. More specifically, by incorporating KSR into Spatial Pyramid Matching (SPM), we develop KSRSPM, which achieves good performance for image classification. Moreover, KSR-based feature coding can be shown to be a generalization of the Efficient Match Kernel (EMK) and an extension of Sc-based SPM (ScSPM). We further show that our proposed KSR using the Histogram Intersection Kernel (HIK) can be considered a soft-assignment extension of HIK-based feature quantization in the feature coding process. Beyond feature coding, compared with sparse coding, KSR can learn more discriminative sparse codes and achieve higher accuracy for face recognition. KSR can also be applied to kernel matrix approximation in large-scale learning tasks, and it demonstrates robustness especially when only a small fraction of the data is used. Extensive experiments demonstrate promising results for KSR in image classification, face recognition, and kernel matrix approximation. All these applications prove the effectiveness of KSR in computer vision and machine learning tasks.
Eigen-PEP for Video Face Recognition
"... Abstract. To effectively solve the problem of large scale video face recognition, we argue for a comprehensive, compact, and yet flexible rep-resentation of a face subject. It shall comprehensively integrate the visual information from all relevant video frames of the subject in a compact form. It s ..."
Abstract
-
Cited by 11 (3 self)
- Add to MetaCart
(Show Context)
To effectively solve the problem of large-scale video face recognition, we argue for a comprehensive, compact, and yet flexible representation of a face subject. It shall comprehensively integrate the visual information from all relevant video frames of the subject in a compact form. It shall also be flexible enough to be incrementally updated, incorporating new or retiring obsolete observations. In search of such a representation, we present the Eigen-PEP, which is built upon the recent success of the probabilistic elastic part (PEP) model. It first integrates the information from relevant video sources by a part-based average pooling through the PEP model, which produces an intermediate high-dimensional, part-based, and pose-invariant representation. We then compress the intermediate representation through principal component analysis, and only a number of principal eigen dimensions are kept (as few as 100). We evaluate the Eigen-PEP representation both for video-based face verification and identification on the YouTube Faces dataset and a new Celebrity-1000 video face dataset, respectively. On YouTube Faces, we further improve the state-of-the-art recognition accuracy. On Celebrity-1000, we lead the competing baselines by a significant margin while offering a scalable solution that is linear with respect to the number of subjects. [Fig. 1: Sample images from three unconstrained face recognition datasets: (a) LFW, (b) YouTube Faces, (c) Celebrity-1000.]
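The pooling-then-compression pipeline described above reduces, in its simplest form, to averaging per-frame descriptors and projecting onto top principal components. A bare-bones sketch that ignores the part-based, pose-aligned structure of the actual PEP model (all names and shapes are hypothetical):

```python
import numpy as np

def pool_and_compress(frame_feats, basis_feats, k=100):
    """Average-pool per-frame descriptors (n_frames x D) into one vector,
    then project onto the top-k principal components estimated from a
    reference set basis_feats (n_ref x D)."""
    pooled = frame_feats.mean(axis=0)                   # (D,) pooled representation
    mu = basis_feats.mean(axis=0)
    _, _, Vt = np.linalg.svd(basis_feats - mu, full_matrices=False)
    W = Vt[:k]                                          # top-k eigen directions
    return W @ (pooled - mu)                            # compact (k,) descriptor
```

Incremental updates fall out naturally: the pooled average can be maintained as a running mean as frames arrive or retire, and only the final projection needs recomputing.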
KERNEL DICTIONARY LEARNING
"... In this paper, we present dictionary learning methods for sparse and redundant signal representations in high dimensional feature space. Using the kernel method, we describe how the well-known dictionary learning approaches such as the method of optimal directions and K-SVD can be made nonlinear. We ..."
Abstract
-
Cited by 8 (5 self)
- Add to MetaCart
(Show Context)
In this paper, we present dictionary learning methods for sparse and redundant signal representations in a high-dimensional feature space. Using the kernel method, we describe how well-known dictionary learning approaches such as the method of optimal directions (MOD) and K-SVD can be made nonlinear. We analyze these constructions and demonstrate their improved performance through several experiments on classification problems. It is shown that nonlinear dictionary learning approaches can provide better discrimination than their linear counterparts and kernel PCA, especially when the data is corrupted by noise.
Index Terms — kernel methods, dictionary learning, method of optimal directions, K-SVD.
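The MOD update kernelizes cleanly once the dictionary atoms are constrained to lie in the span of the mapped training data, D = Φ(X)A: every quantity then depends only on the Gram matrix K. A sketch of one alternation, with ridge coding standing in for the sparse coding step (an assumption made for brevity here; the actual methods use sparse pursuit, and all names below are hypothetical):

```python
import numpy as np

def kernel_mod_step(K, A, lam=1e-3):
    """One alternation of a kernelized MOD: ridge-code all training samples
    over the feature-space dictionary Phi(X) A, then update A in closed form.
    K is the (n x n) Gram matrix; A is (n x n_atoms)."""
    # coding: c_i = (A^T K A + lam I)^{-1} A^T K[:, i], for all samples at once
    G = A.T @ K @ A + lam * np.eye(A.shape[1])
    C = np.linalg.solve(G, A.T @ K)                     # (n_atoms, n) code matrix
    # MOD dictionary update in feature space, Phi(X) A = Phi(X) C^T (C C^T)^{-1}:
    A_new = C.T @ np.linalg.inv(C @ C.T + 1e-8 * np.eye(C.shape[0]))
    return A_new, C
```

The reconstruction error trace((I − AC)ᵀ K (I − AC)) can be monitored across alternations, again using K alone, which is what makes the nonlinear extension tractable.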