Middle-Level Representation for Human Activities Recognition: the Role of Spatio-temporal Relationships
 in ECCV Workshop on Human Motion
, 2010
Cited by 5 (1 self)
We tackle the challenging problem of human activity recognition in realistic video sequences. Unlike local features-based methods or global template-based methods, we propose to represent a video sequence by a set of middle-level parts. A part, or component, has consistent spatial structure and consistent motion. We first segment the visual motion patterns and generate a set of middle-level components by clustering keypoints-based trajectories extracted from the video. To further exploit the interdependencies of the moving parts, we then define spatio-temporal relationships between pairwise components. The resulting descriptive middle-level components and pairwise components thereby capture the essential motion characteristics of human activities. They also give a very compact representation of the video. We apply our framework to popular and challenging video datasets: the Weizmann dataset and the UT-Interaction dataset. We demonstrate experimentally that our middle-level representation combined with a χ²-SVM classifier equals or outperforms the state-of-the-art results on these datasets.
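The abstract above mentions a χ²-SVM classifier without spelling out the kernel. As a minimal sketch, assuming the common exponential χ² kernel over non-negative histograms (the function names, the `gamma` parameter, and the `eps` guard are illustrative assumptions, not taken from the paper), the kernel value between two feature histograms could be computed like this:

```python
import math

def chi2_distance(h1, h2, eps=1e-12):
    """Chi-square distance between two non-negative histograms of equal length.

    eps is an assumed small constant that avoids division by zero
    when both bins are empty.
    """
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same length")
    return 0.5 * sum((a - b) ** 2 / (a + b + eps) for a, b in zip(h1, h2))

def chi2_kernel(h1, h2, gamma=1.0):
    """Exponential chi-square kernel: exp(-gamma * chi2_distance)."""
    return math.exp(-gamma * chi2_distance(h1, h2))
```

A kernel matrix built from `chi2_kernel` over all training pairs could then be passed to any SVM implementation that accepts precomputed kernels.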
Cross-view Activity Recognition using Hankelets
 In Proc. IEEE Conference on Computer Vision and Pattern Recognition
, 2012
Cited by 4 (1 self)
Human activity recognition is central to many practical applications, ranging from visual surveillance to gaming interfaces. Most approaches addressing this problem are based on localized spatio-temporal features that can vary significantly when the viewpoint changes. As a result, their performance rapidly deteriorates as the difference between the viewpoints of the training and testing data increases. In this paper, we introduce a new type of feature, the “Hankelet”, which captures dynamic properties of short tracklets. While Hankelets do not carry any spatial information, they bring invariant properties to changes in viewpoint that allow for robust cross-view activity recognition, i.e. when actions are recognized using a classifier trained on data from a different viewpoint. Our experiments on the IXMAS dataset show that using Hankelets improves the state-of-the-art performance by over 20%.
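The abstract does not define how a Hankelet is built. As a rough illustration only, the sketch below constructs a plain block-Hankel matrix from a short tracklet of points, which is the standard object the name alludes to; the function name `hankel_matrix`, the `rows` parameter, and the choice to use raw coordinates without any normalization are assumptions made here, not details from the paper.

```python
def hankel_matrix(track, rows):
    """Build a block-Hankel matrix from a tracklet.

    track: sequence of equal-length coordinate tuples, one per frame.
    rows:  number of block rows (assumed parameter).
    Block row i holds frames i, i+1, ..., i+cols-1, so every
    anti-diagonal of blocks repeats the same frame.
    """
    n = len(track)
    cols = n - rows + 1
    if rows < 1 or cols < 1:
        raise ValueError("rows must be between 1 and len(track)")
    dim = len(track[0])
    matrix = []
    for i in range(rows):
        for d in range(dim):
            # One matrix row per coordinate of each block row.
            matrix.append([track[i + j][d] for j in range(cols)])
    return matrix
```

For a four-frame 2D tracklet and `rows=2`, the result has 4 rows and 3 columns, with the x and y coordinates of each block row stacked.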
An Unsupervised Framework for Action Recognition Using Actemes
Cited by 2 (0 self)
In speech recognition, phonemes have demonstrated their efficacy to model the words of a language. While they are well defined for languages, their extension to human actions is not straightforward. In this paper, we study such an extension and propose an unsupervised framework to find phoneme-like units for actions, which we call actemes, using 3D data and without any prior assumptions. To this purpose, we build on a framework proposed earlier in the speech literature to automatically find actemes in the training data. We experimentally show that actions defined in terms of actemes and actions defined by whole units give similar recognition results. We define actions outside the training set in terms of these actemes to see whether the actemes generalize to unseen actions. The results show that although the acteme definitions of the actions are not always semantically meaningful, they yield optimal recognition accuracy and constitute a promising direction of research for action modeling.
Human gesture recognition on product manifolds
 Journal of Machine Learning Research
Cited by 1 (0 self)
Action videos are multidimensional data and can be naturally represented as data tensors. While tensor computing is widely used in computer vision, the geometry of tensor space is often ignored. The aim of this paper is to demonstrate the importance of the intrinsic geometry of tensor space which yields a very discriminating structure for action recognition. We characterize data tensors as points on a product manifold and model it statistically using least squares regression. To this aim, we factorize a data tensor relating to each order of the tensor using Higher Order Singular Value Decomposition (HOSVD) and then impose each factorized element on a Grassmann manifold. Furthermore, we account for underlying geometry on manifolds and formulate least squares regression as a composite function. This gives a natural extension from Euclidean space to manifolds. Consequently, classification is performed using geodesic distance on a product manifold where each factor manifold is Grassmannian. Our method exploits appearance and motion without explicitly modeling the shapes and dynamics. We assess the proposed method using three gesture databases, namely the Cambridge hand-gesture, the UMD Keck body-gesture, and the CHALEARN gesture challenge data sets. Experimental results reveal that not only does the proposed method perform well on the standard benchmark data sets, but also it generalizes well on the one-shot-learning gesture challenge. Furthermore, it is based on a simple statistical model and the intrinsic geometry of tensor space.
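To make the idea of a geodesic distance on a product of Grassmannians concrete, here is a deliberately simplified sketch. It assumes every factor is the simplest Grassmannian, Gr(1, n), whose points are lines through the origin, so the geodesic distance on each factor is just the principal angle between two lines, and it combines the factors with the usual product-manifold rule (square root of the sum of squared factor distances). The function names are hypothetical, and the paper's own factors are higher-dimensional subspaces obtained from HOSVD, which this sketch does not reproduce.

```python
import math

def line_angle(u, v):
    """Geodesic distance on Gr(1, n): the principal angle between
    the lines spanned by non-zero vectors u and v (sign-invariant)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("a zero vector does not span a line")
    # Clamp to guard against rounding slightly above 1.
    cosine = min(1.0, abs(dot) / (norm_u * norm_v))
    return math.acos(cosine)

def product_distance(xs, ys):
    """Geodesic distance on a product of Gr(1, n) factors:
    square root of the sum of squared per-factor distances."""
    if len(xs) != len(ys):
        raise ValueError("points must have the same number of factors")
    return math.sqrt(sum(line_angle(u, v) ** 2 for u, v in zip(xs, ys)))
```

A nearest-neighbour classifier under this distance would assign a query the label of the training point with the smallest `product_distance`.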
Behavioural Analysis with Movement Cluster Model for Concurrent Actions
We present an approach to model articulated human movements and to analyse their behavioural semantics. First, we describe a novel dynamic and behavioural model that uses movements, each a sequence of consecutive poses, from motion-captured video data to establish priors for both tracking and behavioural analysis. Second, using that model, we show how we can both learn and subsequently recognise human activity. Activities are modelled and recognised independently to allow concurrent and complex actions. Finally, we combine activity recognition with tracking to produce an overall evaluation of the effectiveness of the approach using publicly available datasets.
Recognition of Complex Events in Open-Source Web-Scale Videos
Recognition of complex events in unconstrained Internet videos is a challenging research problem. In this symposium proposal, we present a systematic decomposition of complex events into hierarchical components and make an in-depth analysis of how existing research is being used to cater to various levels of this hierarchy. We also identify three key stages where we make novel contributions which are necessary not only to improve the overall recognition performance, but also to develop a richer understanding of these events. At the lowest level, our contributions include (a) compact covariance descriptors of appearance and motion features used in a sparse coding framework to recognize realistic actions and gestures, and (b) a Lie-algebra based representation of dominant camera motion present in video shots which can be used as a complementary feature for video analysis. In the next level, we propose (c) an efficient maximum likelihood estimate based representation from low-level features computed from videos which demonstrates state-of-the-art performance in large-scale visual concept detection, and finally, we propose to (d) model temporal interactions between concepts detected in video shots through two new discriminative feature spaces derived from linear dynamical systems which eventually boosts event recognition performance. In all cases, we conduct thorough experiments to demonstrate promising performance gains over some of the prominent approaches.
Statistical Computations on Grassmann and Stiefel Manifolds for Image and Video-Based Recognition
, 2010
In this paper, we examine image and video based recognition applications where the underlying models have a special structure – the linear subspace structure. We discuss how commonly used parametric models for videos and image sets can be described using the unified framework of Grassmann and Stiefel manifolds. We first show that the parameters of linear dynamic models are finite-dimensional linear subspaces of appropriate dimensions. Unordered image sets as samples from a finite-dimensional linear subspace naturally fall under this framework. We show that the study of inference over subspaces can be naturally cast as an inference problem on the Grassmann manifold. To perform recognition using subspace-based models, we need tools from the Riemannian geometry of the Grassmann manifold. This involves a study of the geometric properties of the space, appropriate definitions of Riemannian metrics, and definition of geodesics. Further, we derive statistical modeling of inter- and intra-class variations that respect the geometry of the space. We apply techniques such as intrinsic and extrinsic statistics to enable maximum-likelihood classification. We also provide algorithms for unsupervised clustering derived from the geometry of the manifold. Finally, we demonstrate the improved performance of these methods in a wide variety of vision applications such as activity ...
A preliminary version of this paper appeared in [1].
Statistical Computations on Grassmann and Stiefel Manifolds for Image and Video-Based Recognition
, 2011
In this paper, we examine image and video based recognition applications where the underlying models have a special structure – the linear subspace structure. We discuss how commonly used parametric models for videos and image sets can be described using the unified framework of Grassmann and Stiefel manifolds. We first show that the parameters of linear dynamic models are finite-dimensional linear subspaces of appropriate dimensions. Unordered image sets as samples from a finite-dimensional linear subspace naturally fall under this framework. We show that the study of inference over subspaces can be naturally cast as an inference problem on the Grassmann manifold. To perform recognition using subspace-based models, we need tools from the Riemannian geometry of the Grassmann manifold. This involves a study of the geometric properties of the space, appropriate definitions of Riemannian metrics, and definition of geodesics. Further, we derive statistical modeling of inter- and intra-class variations that respect the geometry of the space. We apply techniques such as intrinsic and extrinsic statistics to enable maximum-likelihood classification. We also provide algorithms for unsupervised clustering derived from the geometry of the manifold.
its applications to Video Analysis
, 2010
The analysis and interpretation of video data is an important component of modern vision applications such as biometrics, surveillance, motion synthesis and web-based user interfaces. A common requirement among these very different applications is the ability to learn statistical models of appearance and motion from a collection of videos, and then use them for recognizing actions or persons in a new video. These applications in video analysis require statistical inference methods to be devised on non-Euclidean spaces or, more formally, on manifolds. This chapter outlines a broad survey of applications in video analysis that involve manifolds. We develop the mathematical tools needed to perform statistical inference on manifolds and show their effectiveness in real video-understanding applications.
Contents lists available at SciVerse ScienceDirect Image and Vision Computing