Canonical correlation analysis of video volume tensors for action categorization and detection
- IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009
Cited by 58 (0 self)
Abstract: This paper addresses a spatiotemporal pattern recognition problem. The main purpose of this study is to find the right representation and matching of action video volumes for categorization. A novel method is proposed to measure video-to-video volume similarity by extending Canonical Correlation Analysis (CCA), a principled tool to inspect linear relations between two sets of vectors, to two multiway data arrays (or tensors). The proposed method analyzes video volumes as inputs, avoiding the difficult problem of explicit motion estimation required in traditional methods, and provides a way of spatiotemporal pattern matching that is robust to intraclass variations of actions. The proposed matching is demonstrated for action classification by a simple Nearest Neighbor classifier. We, moreover, propose an automatic action detection method, which performs 3D window search over an input video with action exemplars. The search is sped up by dynamic learning of subspaces in the proposed CCA. Experiments on a public action data set (KTH) and a self-recorded hand gesture data set showed that the proposed method is significantly better than various state-of-the-art methods with respect to accuracy. Our method has low time complexity and does not require any major tuning parameters.
Index Terms: Action categorization, gesture recognition, canonical correlation analysis, tensor, action detection, incremental subspace learning, spatiotemporal pattern classification.
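As a rough illustration of the vector-set case that the abstract starts from, the sketch below computes canonical correlations between two sets of vectors in their subspace form, as the cosines of the principal angles between the two column spans. It is not the paper's tensor extension. The function name `canonical_correlations`, the synthetic data, and the assumption that both matrices have full column rank are illustrative choices, not taken from the paper.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Cosines of the principal angles between the column spans of X and Y.

    Assumes X and Y have the same number of rows and full column rank.
    """
    # Orthonormal bases for each span (reduced QR factorization).
    Qx, _ = np.linalg.qr(X)
    Qy, _ = np.linalg.qr(Y)
    # The singular values of Qx^T Qy are the canonical correlations,
    # returned in descending order.
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    # Guard against rounding slightly outside [0, 1].
    return np.clip(s, 0.0, 1.0)

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
# Y shares exactly one direction with span(X); its other two columns are random.
shared = X @ np.array([1.0, -2.0, 0.5])
Y = np.column_stack([shared, rng.standard_normal((50, 2))])

corr = canonical_correlations(X, Y)
print(corr)  # the first value is close to 1 because of the shared direction
```

A similarity score between two sets can then be formed from these values, for example by summing them. Which aggregate to use is a design choice that this sketch leaves open.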
Human gesture recognition on product manifolds
- Journal of Machine Learning Research
Cited by 8 (0 self)
Action videos are multidimensional data and can be naturally represented as data tensors. While tensor computing is widely used in computer vision, the geometry of tensor space is often ignored. The aim of this paper is to demonstrate the importance of the intrinsic geometry of tensor space which yields a very discriminating structure for action recognition. We characterize data tensors as points on a product manifold and model it statistically using least squares regression. To this aim, we factorize a data tensor relating to each order of the tensor using Higher Order Singular Value Decomposition (HOSVD) and then impose each factorized element on a Grassmann manifold. Furthermore, we account for underlying geometry on manifolds and formulate least squares regression as a composite function. This gives a natural extension from Euclidean space to manifolds. Consequently, classification is performed using geodesic distance on a product manifold where each factor manifold is Grassmannian. Our method exploits appearance and motion without explicitly modeling the shapes and dynamics. We assess the proposed method using three gesture databases, namely the Cambridge hand-gesture, the UMD Keck body-gesture, and the CHALEARN gesture challenge data sets. Experimental results reveal that not only does the proposed method perform well on the standard benchmark data sets, but also it generalizes well on the one-shot-learning gesture challenge. Furthermore, it is based on a simple statistical model and the intrinsic geometry of tensor space.
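The pipeline in this abstract (unfold the tensor along each order, take a subspace per unfolding, measure geodesic distance on the product of Grassmann manifolds) can be sketched roughly as follows. This is a simplified illustration, not the paper's regression-based classifier. The helper names, the choice of the leading left singular vectors as the subspace for each mode, and the value `rank=2` are all assumptions made for the example.

```python
import numpy as np

def unfold(T, mode):
    """Mode-k unfolding: the chosen axis becomes the rows of a matrix."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_subspaces(T, rank):
    """One orthonormal basis per mode, from the SVD of each unfolding.

    Assumption: keep the `rank` leading left singular vectors of each mode.
    """
    bases = []
    for mode in range(T.ndim):
        U, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
        bases.append(U[:, :rank])
    return bases

def grassmann_geodesic(U1, U2):
    """Geodesic distance: the 2-norm of the vector of principal angles."""
    s = np.linalg.svd(U1.T @ U2, compute_uv=False)
    theta = np.arccos(np.clip(s, -1.0, 1.0))
    return float(np.linalg.norm(theta))

def product_distance(A, B, rank=2):
    """Distance on the product manifold: root of summed squared factor distances."""
    d = [grassmann_geodesic(Ua, Ub)
         for Ua, Ub in zip(mode_subspaces(A, rank), mode_subspaces(B, rank))]
    return float(np.sqrt(np.sum(np.square(d))))

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 5, 4))   # stand-in for a small video tensor
B = rng.standard_normal((6, 5, 4))

d_ab = product_distance(A, B)
d_aa = product_distance(A, 2.0 * A)  # rescaling leaves the subspaces unchanged
print(d_ab, d_aa)
```

Because only subspaces are compared, the distance ignores a global rescaling of the tensor, which is why `d_aa` is numerically zero.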
Tangent Bundle for Human Action Recognition
Cited by 7 (1 self)
Abstract: Common human actions are instantly recognizable by people, and increasingly machines need to understand this language if they are to engage smoothly with people. Here we introduce a new method for automated human action recognition. The proposed method represents videos as a tangent bundle on a Grassmann manifold. Videos are expressed as third order tensors and factorized to a set of tangent spaces. Tangent vectors are then computed between elements on a Grassmann manifold and exploited for action classification. In particular, logarithmic mapping is applied to map a point from the manifold to tangent vectors centered at a given element. The canonical metric is used to induce the intrinsic distance for a set of tangent spaces. Empirical results show that our method is effective on both uniform and non-uniform backgrounds for action classification. We achieve recognition rates of 91% on the Cambridge gesture dataset, 88% on the UCF sport dataset, and 97% on the KTH human action dataset. Additionally, our method does not require prior training.
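The logarithmic mapping mentioned in this abstract can be illustrated with the standard closed form for the Grassmann manifold. The sketch below is a generic textbook construction and not necessarily the exact procedure used in the paper. The function names are hypothetical, and the code assumes that `X` and `Y` have orthonormal columns and that `X.T @ Y` is invertible (no principal angle equals 90 degrees).

```python
import numpy as np

def orthonormal_basis(A):
    """Orthonormal basis for the column span of A (reduced QR)."""
    Q, _ = np.linalg.qr(A)
    return Q

def grassmann_log(X, Y):
    """Tangent vector at span(X) pointing toward span(Y)."""
    XtY = X.T @ Y
    # Component of Y orthogonal to span(X), normalized by X^T Y.
    M = (Y - X @ XtY) @ np.linalg.inv(XtY)
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    # The singular values of M are tangents of the principal angles.
    return U @ np.diag(np.arctan(s)) @ Vt

def grassmann_exp(X, H):
    """Follow the geodesic from span(X) along the tangent vector H."""
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    return X @ Vt.T @ np.diag(np.cos(s)) @ Vt + U @ np.diag(np.sin(s)) @ Vt

rng = np.random.default_rng(1)
X = orthonormal_basis(rng.standard_normal((8, 3)))
Y = orthonormal_basis(rng.standard_normal((8, 3)))

H = grassmann_log(X, Y)       # tangent vector centered at span(X)
Y_back = grassmann_exp(X, H)  # spans the same subspace as Y

# The Frobenius norm of H equals the geodesic distance between the subspaces.
angles = np.arccos(np.clip(np.linalg.svd(X.T @ Y, compute_uv=False), -1.0, 1.0))
print(np.linalg.norm(H), np.linalg.norm(angles))
```

Mapping back with the exponential map recovers the same subspace as `Y`, though generally a different basis for it, so the comparison should be made on projection matrices rather than on the bases themselves.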
Real-Time Object Identification using SURF Key-Points
- 2011
The thesis addresses the topic of image features and how they can be used in object identification. Two state-of-the-art algorithms, Scale Invariant Feature Transform (SIFT) and Speeded-Up Robust Features (SURF), are studied and their qualities are measured. Based on the results of these tests, the best algorithm, SURF, is chosen for building a real-time object identification application. The application is expected to run on an ARM Cortex-A8 based embedded processor platform known as i.MX515EVK. Because the algorithm is computation intensive and hardware resources are limited, several optimization strategies were applied to the algorithm to increase its speed, namely Algorithmic Optimizations, Implementation Optimization and Application Optimization. Special emphasis is given to the SIMD unit of the Cortex-A8 core known as NEON; in fact, the major contributor to the speedup is the extensive usage of NEON. Taking the most effective version of the SURF implementation, a real-time Euro currency note identification application is built. Experiments are conducted to show that the application can be resilient to changing scale, illumination, blur and orientation conditions and still identify currency notes from image frames at a rate of 3.5 to 4 frames per second.
Partial Least Squares Kernel for Computing Similarities between Video Sequences
Abstract: Computing similarities between data samples is a fundamental step in most Pattern Recognition (PR) tasks. Better similarity measures lead to more accurate prediction of labels. Computing similarities between video sequences has long been a challenging problem for the PR community because videos have both spatial and temporal context, which are hard to capture. We describe a novel approach that employs Partial Least Squares (PLS) regression to derive a measure of similarity between two tensors (videos). We demonstrate the use of this tensor similarity measure along with SVM classifiers to solve the tasks of hand gesture recognition and action classification. We show that our methods significantly outperform the state-of-the-art approaches on two popular datasets: the Cambridge hand gesture dataset and the UCF sports action dataset. Our method requires no parameter tuning.
Action Recognition using Canonical Correlation Kernels
Abstract. In this paper, we propose the canonical correlation kernel (CCK), which seamlessly integrates the advantages of lower dimensional representation of videos with a discriminative classifier like SVM. In the process of defining the kernel, we learn a low-dimensional (linear as well as nonlinear) representation of the video data, which is originally represented as a tensor. We densely compute features at single (or two) frame level, and avoid any explicit tracking. Tensor representation provides the holistic view of the video data, which is the starting point of computing the CCK. Our kernel is defined in terms of the principal angles between the lower dimensional representations of the tensor, and captures the similarity of two videos in an efficient manner. We test our approach on four public data sets and demonstrate consistent superior results over the state-of-the-art methods, including those that use canonical correlations.
Discriminant Analysis of Patterns in Images, Image Ensembles, and Videos
This work addresses three visual classification tasks: face recognition from a single model image, object recognition by image sets (or ensembles) and action classification in videos. The work assumes that images and videos are given as 2D and 3D bounding boxes of patterns respectively, focusing on classification of isolated patterns. Whereas traditional classification problems have involved a single query image and a set of model images per class, the so-called Single-to-Set matching task, the three tasks require different matching strategies: Single-to-Single, Set-to-Set, and Video-to-Video matching for each of the three tasks (in the aforementioned order), respectively. They are difficult to tackle in conventional ways due to extremely limited model data and lack of principles to exploit image sets or videos as inputs. We propose novel methods of Discriminant Analysis (DA) for tackling the problems concerned. Discriminant Analysis (DA) is a well-established method of classification that approaches and often outperforms more complex modern methods. Owing to its simplicity and power as a statistical representation method, Discriminant Analysis (DA)
NON-INVASIVE DATA ACQUISITION FOR USER MODELING
Abstract: One of the most pressing drawbacks of modern information technology and communication devices is the problem of user interfacing. The research community's efforts to tackle this problem focus almost exclusively on the concept of personalization based on user modeling procedures. The major difficulty of user modeling procedures is the required input data about the modeled user. Studies and practical experiences show that, on one hand, the efficiency of user modeling relies on acquired data about the user, and on the other hand, it is extremely irritating for the user to constantly provide such data about her or his feelings and behavior. Therefore there is a need for non-intrusive and non-irritating data collection about the user. There are several approaches, such as distributed sensors and real-time analysis of video capturing the user's behavior, with the intention of identifying specific information about that behavior. The problem statement of non-invasive data acquisition for user modeling, along with a typical system architecture, will be presented in the paper. A set of unavoidable information processing techniques in this context will be introduced. The proposed approach will be further detailed by data acquisition based on real-time video analysis and higher order post-processing of gathered user data. A major part of the paper is focused on post-processing of user data in terms of geometrical, topological and logical analysis with the aim of achieving an effective user model.