Sparse dictionary-based representation and recognition of action attributes, ICCV (2011)

by Q Qiu, Z Jiang, R Chellappa

Results 1 - 10 of 23

Submodular Dictionary Learning for Sparse Coding

by Zhuolin Jiang, Guangxiao Zhang, Larry S. Davis
Cited by 12 (0 self)
A greedy-based approach to learn a compact and discriminative dictionary for sparse representation is presented. We propose an objective function consisting of two components: entropy rate of a random walk on a graph and a discriminative term. Dictionary learning is achieved by finding a graph topology which maximizes the objective function. By exploiting the monotonicity and submodularity properties of the objective function and the matroid constraint, we present a highly efficient greedy-based optimization algorithm. It is more than an order of magnitude faster than several recently proposed dictionary learning approaches. Moreover, the greedy algorithm gives a near-optimal solution with a (1/2)-approximation bound. Our approach yields dictionaries having the property that feature points from the same class have very similar sparse codes. Experimental results demonstrate that our approach outperforms several recently proposed dictionary learning
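The greedy selection idea in this abstract can be illustrated with a minimal sketch. It does not reproduce the authors' objective (entropy rate of a random walk plus a discriminative term); it assumes a toy monotone submodular coverage function and a plain cardinality limit as the constraint. The names `greedy_select`, `coverage`, and `COVERS` are hypothetical.

```python
from typing import Callable, FrozenSet, Hashable, Iterable, List


def greedy_select(
    candidates: Iterable[Hashable],
    objective: Callable[[FrozenSet], float],
    k: int,
) -> List[Hashable]:
    """Pick up to k items, each time adding the one with the largest marginal gain."""
    selected: List[Hashable] = []
    remaining = list(candidates)
    current = objective(frozenset())
    while remaining and len(selected) < k:
        # Marginal gain of adding each remaining candidate to the current selection.
        gains = [objective(frozenset(selected + [c])) - current for c in remaining]
        best = max(range(len(remaining)), key=gains.__getitem__)
        if gains[best] <= 0:
            break  # no candidate improves the objective any further
        current += gains[best]
        selected.append(remaining.pop(best))
    return selected


# Toy data (assumption): each candidate item "covers" a set of sample ids.
COVERS = {
    "a": {1, 2, 3},
    "b": {3, 4},
    "c": {4, 5, 6, 7},
    "d": {1, 7},
}


def coverage(chosen: FrozenSet) -> float:
    """Number of distinct samples covered; monotone and submodular."""
    covered = set()
    for item in chosen:
        covered |= COVERS[item]
    return float(len(covered))


picked = greedy_select(COVERS.keys(), coverage, k=2)
print(picked)
```

In this toy run the greedy rule first takes the item covering the most samples, then the item that adds the most new samples given the first choice.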

Citation Context

...lass and classification is based on the corresponding reconstruction errors. Some algorithms learn a dictionary by merging or selecting dictionary items from a large set of dictionary item candidates =-=[18, 14, 26, 13]-=-. [18, 14] learn a dictionary through merging two items by maximizing the mutual information of class distributions. [13] constructs a dictionary for signal reconstruction from a set of dictionary ite...

Sparse representations, compressive sensing and dictionaries for pattern recognition

by Vishal M. Patel - in Asian Conference on Pattern Recognition (ACPR), 2011
Cited by 8 (6 self)
Abstract—In recent years, the theories of Compressive Sensing

Citation Context

... been proposed for learning discriminative dictionaries [18], [19], [20], [21], [22], and [23]. In particular, a dictionary learning method based on information maximization principle was proposed in =-=[24]-=- for action recognition. The objective function in [24] maximizes the mutual information between what has been learned and what remains to be learned in terms of appearance information and class distr...

Human gesture recognition on product manifolds

by Yui Man Lui, Isabelle Guyon, Vassilis Athitsos - Journal of Machine Learning Research
Cited by 8 (0 self)
Action videos are multidimensional data and can be naturally represented as data tensors. While tensor computing is widely used in computer vision, the geometry of tensor space is often ignored. The aim of this paper is to demonstrate the importance of the intrinsic geometry of tensor space which yields a very discriminating structure for action recognition. We characterize data tensors as points on a product manifold and model it statistically using least squares regression. To this aim, we factorize a data tensor relating to each order of the tensor using Higher Order Singular Value Decomposition (HOSVD) and then impose each factorized element on a Grassmann manifold. Furthermore, we account for underlying geometry on manifolds and formulate least squares regression as a composite function. This gives a natural extension from Euclidean space to manifolds. Consequently, classification is performed using geodesic distance on a product manifold where each factor manifold is Grassmannian. Our method exploits appearance and motion without explicitly modeling the shapes and dynamics. We assess the proposed method using three gesture databases, namely the Cambridge hand-gesture, the UMD Keck body-gesture, and the CHALEARN gesture challenge data sets. Experimental results reveal that not only does the proposed method perform well on the standard benchmark data sets, but also it generalizes well on the one-shot-learning gesture challenge. Furthermore, it is based on a simple statistical model and the intrinsic geometry of tensor space.
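The geodesic distance mentioned in this abstract can be sketched with principal angles between subspaces. This is an illustration under stated assumptions, not the paper's implementation: it assumes NumPy is available, uses the arc-length distance (the 2-norm of the principal angles) on each Grassmann factor, and combines factors with the standard product metric. The function names are hypothetical, and the HOSVD step that would produce the factor bases is omitted.

```python
import numpy as np


def grassmann_geodesic(A: np.ndarray, B: np.ndarray) -> float:
    """Arc-length distance between span(A) and span(B); columns are basis vectors."""
    Qa, _ = np.linalg.qr(A)  # orthonormal basis for span(A)
    Qb, _ = np.linalg.qr(B)  # orthonormal basis for span(B)
    # Singular values of Qa^T Qb are the cosines of the principal angles.
    cosines = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    angles = np.arccos(np.clip(cosines, -1.0, 1.0))
    return float(np.linalg.norm(angles))


def product_geodesic(factors_a, factors_b) -> float:
    """Distance on a product of Grassmannians under the product metric (assumption)."""
    dists = [grassmann_geodesic(a, b) for a, b in zip(factors_a, factors_b)]
    return float(np.sqrt(np.sum(np.square(dists))))


e = np.eye(3)
same = grassmann_geodesic(e[:, :2], e[:, :2])   # identical planes
orth = grassmann_geodesic(e[:, :1], e[:, 1:2])  # two orthogonal lines
both = product_geodesic([e[:, :1], e[:, :2]], [e[:, 1:2], e[:, :2]])
```

Identical subspaces give a distance of zero, and two orthogonal lines are a quarter turn apart, which is the largest single principal angle.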

Citation Context

... competitive to the current state-of-the-art methods in both protocols. One of the key advantages of our method is its direct use of raw pixels while the prototype-tree (Lin et al., 2009), MMI-2+SIFT (=-=Qiu et al., 2011-=-), and CC K

Method | Set1 | Set2 | Set3 | Set4 | Total
Graph Embedding (Yuan et al., 2010) | - | - | - | - | 82%
TCCA (Kim and Cipolla, 2009) | 81% | 81% | 78% | 86% | 82±3.5%
DCCA+SIFT (Kim and Cipolla, 2007) | - | - | - | - | 85±2...

Online Semi-Supervised Discriminative Dictionary Learning for Sparse Representation

by Guangxiao Zhang, Zhuolin Jiang, Larry S. Davis
Cited by 4 (2 self)
Abstract. We present an online semi-supervised dictionary learning algorithm for classification tasks. Specifically, we integrate the reconstruction error of labeled and unlabeled data, the discriminative sparse-code error, and the classification error into an objective function for online dictionary learning, which enhances the dictionary’s representative and discriminative power. In addition, we propose a probabilistic model over the sparse codes of input signals, which allows us to expand the labeled set. As a consequence, the dictionary and the classifier learned from the enlarged labeled set yield lower generalization error on unseen data. Our approach learns a single dictionary and a predictive linear classifier jointly. Experimental results demonstrate the effectiveness of our approach in face and object category recognition applications.

Citation Context

...l the dictionary discriminates the input signal. To quantify the confidence level of the discriminability of an input signal, we compute the entropy of its sparse code: $\mathrm{ent}(x) = -\sum_{l=1}^{m} p_l(x) \log p_l(x)$ =-=(10)-=-. Intuitively, if the dictionary is highly discriminative to an input signal, we expect the large values of the sparse code to concentrate at certain dictionary items, and thus the class distributio...
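The entropy in equation (10) can be sketched in a few lines. The excerpt is cut off before it defines how the class distribution p_l(x) is obtained, so the sketch assumes that each dictionary item carries a class label and that p_l(x) is the normalized absolute sparse-code mass on items of class l. That definition, and the names `class_distribution` and `sparse_code_entropy`, are assumptions for illustration only.

```python
import math
from typing import List, Sequence


def class_distribution(code: Sequence[float], item_labels: Sequence[int], m: int) -> List[float]:
    """Assumed definition: share of absolute coefficient mass falling on each class."""
    mass = [0.0] * m
    for coeff, label in zip(code, item_labels):
        mass[label] += abs(coeff)
    total = sum(mass)
    if total == 0.0:
        return [1.0 / m] * m  # all-zero code: treat as maximally uncertain
    return [v / total for v in mass]


def sparse_code_entropy(p: Sequence[float]) -> float:
    """ent(x) = -sum_l p_l(x) log p_l(x), skipping zero-probability classes."""
    return -sum(pl * math.log(pl) for pl in p if pl > 0.0)


labels = [0, 0, 1, 1]  # class label of each of four dictionary items
confident = sparse_code_entropy(class_distribution([0.9, 0.4, 0.0, 0.0], labels, m=2))
ambiguous = sparse_code_entropy(class_distribution([0.5, 0.0, 0.5, 0.0], labels, m=2))
```

A code concentrated on one class gives zero entropy, while a code split evenly across two classes gives log 2, matching the intuition stated in the excerpt.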

Structure-Preserving Sparse Decomposition for Facial Expression Analysis

by Sima Taheri, Qiang Qiu
Cited by 4 (1 self)
Abstract—Although facial expressions can be decomposed in terms of action units (AUs) as suggested by the Facial Action Coding System (FACS), there have been only a few attempts that recognize expression using AUs and their composition rules. In this paper, we propose a dictionary-based approach for facial expression analysis by decomposing expressions in terms of AUs. First, we construct an AU-dictionary using domain experts’ knowledge of AUs. To incorporate the high-level knowledge regarding expression decomposition and AUs, we then perform structure-preserving sparse coding by imposing two layers of grouping over AU-dictionary atoms as well as over the test image matrix columns. We use the computed sparse code matrix for each expressive face to perform expression decomposition and recognition. Since domain experts’ knowledge may not always be available for constructing an AU-dictionary, we also propose a structure-preserving dictionary learning algorithm which we use to learn a structured dictionary as well as divide expressive faces into several semantic regions. Experimental results on publicly available expression datasets demonstrate the effectiveness of the proposed approach for facial expression analysis.

Citation Context

...e level of noise and intra-class variations. Algorithms for data-driven learning of domain-specific overcomplete dictionaries are widely employed for reconstruction and recognition applications [18], =-=[19]-=-, [20], [21]. Local variations in the appearance of the faces due to various expressions can also be modeled using a set of dictionary atoms. But facial expressions are structured actions (e.g. deform...

Joint Sparsity-based Representation and Analysis of Unconstrained Activities

by Raghuraman Gopalan
Cited by 2 (0 self)
While the notion of joint sparsity in understanding common and innovative components of a multi-receiver signal ensemble has been well studied, we investigate the utility of such joint sparse models in representing information contained in a single video signal. By decomposing the content of a video sequence into that observed by multiple spatially and/or temporally distributed receivers, we first recover a collection of common and innovative components pertaining to individual videos. We then present modeling strategies based on subspace-driven manifold metrics to characterize patterns among these components, across other videos in the system, to perform subsequent video analysis. We demonstrate the efficacy of our approach for activity classification and clustering by reporting competitive results on standard datasets such as HMDB, UCF-50, Olympic Sports and KTH.

Citation Context

...ed sparse coding principles to obtain a generic mid-level video representation termed ‘video primal sketch’, [41] that proposed efficient sparse random projection algorithms for video classification, =-=[31]-=- that presented an information maximization approach for learning sparse action attribute dictionaries, and [19] that encoded motion interchange to decouple image edges from motion edges to facilitate...

Action Recognition Using Global Spatio-Temporal Features Derived from Sparse Representations

by Guruprasad Somasundaram, Anoop Cherian, Vassilios Morellas, Nikolaos Papanikolopoulos
Cited by 1 (0 self)
Recognizing actions is one of the important challenges in computer vision with respect to video data, with applications to surveillance, diagnostics of mental disorders, and video retrieval. Compared to other data modalities such as documents and images, processing video data demands orders of magnitude higher computational and storage resources. One way to alleviate this difficulty is to focus the computations on informative (salient) regions of the video. In this paper, we propose a novel global spatio-temporal self-similarity measure to score saliency using the ideas of dictionary learning and sparse coding. In contrast to existing methods that use local spatio-temporal feature detectors along with descriptors (such as HOG, HOG3D, HOF, etc.), dictionary learning helps consider the saliency in a global setting (on the entire video) in a computationally efficient way. We consider only a small percentage of the most salient (least self-similar) regions found using our algorithm, over which spatio-temporal descriptors such as HOG and region covariance descriptors are computed. The ensemble of such block descriptors in a bag-of-features framework provides a holistic description of the motion sequence which can be used in a classification setting. Experiments on several benchmark datasets in video-based action classification demonstrate that our approach performs competitively with the state of the art.

Citation Context

...sed human action classification. Another powerful representation method involves describing actions as a sequence of shapes [9], and has gained popularity due to its invariance properties. Qiu et al. =-=[10]-=- propose a Gaussian process based dictionary objective function for efficient modeling of actions and for learning new actions. They use a discriminative dictionary learning approach and a probabilist...

Multi-Task Sparse Learning with Beta Process Prior for Action Recognition

by Chunfeng Yuan, Weiming Hu, Guodong Tian, Shuang Yang, Haoran Wang
Cited by 1 (0 self)
In this paper, we formulate human action recognition as a novel Multi-Task Sparse Learning (MTSL) framework which aims to construct a test sample with multiple features from as few bases as possible. Learning the sparse representation under each feature modality is considered as a single task in MTSL. Since the tasks are generated from multiple features associated with the same visual input, they are not independent but inter-related. We introduce a Beta process (BP) prior to the hierarchical MTSL model, which efficiently learns a compact dictionary and infers the sparse structure shared across all the tasks. The MTSL model enforces the robustness in coefficient estimation compared with performing each task independently. Besides, the sparseness is achieved via the Beta process formulation rather than the computationally expensive l1 norm penalty. In terms of non-informative gamma hyper-priors, the sparsity level is totally decided by the data. Finally, the learning problem is solved by Gibbs sampling inference which estimates the full posterior on the model parameters. Experimental results on the KTH and UCF sports datasets demonstrate the effectiveness of the proposed MTSL approach for action recognition.

Citation Context

...ning, we initialize all the variables. Except the dictionary D, all the other variables are initialized randomly. Let Dj denote the dictionary associated with the jth task. We initialize Dj via K-SVD =-=[15]-=-. K-SVD is a method to learn an over-complete dictionary for sparse representation. For every task and every action class, we obtain an initial dictionary Dj,c of a large size Kc by K-SVD from the tra...

Support Vector Guided Dictionary Learning

by Sijia Cai, Wangmeng Zuo, Lei Zhang, Xiangchu Feng, Ping Wang
Cited by 1 (0 self)
Abstract. Discriminative dictionary learning aims to learn a dictionary from training samples to enhance the discriminative capability of their coding vectors. Several discrimination terms have been proposed by assessing the prediction loss (e.g., logistic regression) or class separation criterion (e.g., Fisher discrimination criterion) on the coding vectors. In this paper, we provide a new insight on discriminative dictionary learning. Specifically, we formulate the discrimination term as the weighted summation of the squared distances between all pairs of coding vectors. The discrimination term in the state-of-the-art Fisher discrimination dictionary learning (FDDL) method can be explained as a special case of our model, where the weights are simply determined by the numbers of samples of each class. We then propose a parameterization method to adaptively determine the weight of each coding vector pair, which leads to a support vector guided dictionary learning (SVGDL) model. Compared with FDDL, SVGDL can adaptively assign different weights to different pairs of coding vectors. More importantly, SVGDL automatically selects a few critical pairs to assign non-zero weights, resulting in better generalization ability for pattern recognition tasks. The experimental results on a series of benchmark databases show that SVGDL outperforms many state-of-the-art discriminative dictionary learning methods.
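The discrimination term described in this abstract, a weighted sum of squared distances between all pairs of coding vectors, can be sketched as follows. The weights here are purely illustrative (an assumed +1 for same-class pairs and -1 for different-class pairs); they are neither the FDDL weights nor the support-vector-derived weights of SVGDL, and the function name is hypothetical.

```python
from typing import List, Sequence


def discrimination_term(codes: Sequence[Sequence[float]], weights: Sequence[Sequence[float]]) -> float:
    """Sum over all ordered pairs (i, j) of weights[i][j] * ||codes[i] - codes[j]||^2."""
    total = 0.0
    n = len(codes)
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(codes[i], codes[j]))
            total += weights[i][j] * sq_dist
    return total


# Toy coding vectors with class labels (assumption, for illustration only).
codes = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
labels = [0, 0, 1]

# Illustrative weights: pull same-class pairs together, push other pairs apart.
weights: List[List[float]] = [
    [1.0 if labels[i] == labels[j] else -1.0 for j in range(len(codes))]
    for i in range(len(codes))
]

value = discrimination_term(codes, weights)
```

With these toy weights the term is smaller when same-class codes are close and different-class codes are far apart, which is the behavior a discrimination term is meant to reward when minimized.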

Citation Context

....1 76.7 1.5e3 1.2e-5 The results of SVGDL are evaluated via five-fold cross validation, where one fold is used for testing and the remaining four folds for training. We compare SVGDL with Qiu et al. =-=[41]-=-, Yao et al. [42], Sadanand et al. [43], SRC, K-SVD, DKSVD, LC-KSVD and FDDL. The recognition accuracies, training and testing time are shown in Table 6. SVGDL outperforms the state-of-the-art metho...

REJECTION-BASED CLASSIFICATION FOR ACTION RECOGNITION USING A SPATIO-TEMPORAL DICTIONARY

by Stefen Chan Wai Tim, Michele Rombaut, Denis Pellerin, 2015

Citation Context

...lone only reach 80%, the combination of the proposed descriptors and classification method performs really well. The size of the codebook used in our method is 150 compared to 4000 for [15] and 40 for =-=[16]-=- even if the final dimension of the signatures is 1350. Speed-wise, the experiments were done on MATLAB so a significant gain in speed is possible: the part-based human detector was the limiting facto...
