Spatio-temporal covariance descriptors for action and gesture recognition
- In IEEE Workshop on Applications of Computer Vision (WACV), 2013
Cited by 14 (4 self)
We propose a new action and gesture recognition method based on spatio-temporal covariance descriptors and a weighted Riemannian locality preserving projection approach that takes into account the curved space formed by the descriptors. The weighted projection is then exploited during boosting to create a final multiclass classification algorithm that employs the most useful spatio-temporal regions. We also show how the descriptors can be computed quickly through the use of integral video representations. Experiments on the UCF sport, CK+ facial expression and Cambridge hand gesture datasets indicate superior performance of the proposed method compared to several recent state-of-the-art techniques. The proposed method is robust and does not require additional processing of the videos, such as foreground detection, interest-point detection or tracking.
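As a rough illustration of what a spatio-temporal covariance descriptor is, the sketch below computes the covariance matrix of per-pixel features over a video region. The feature set (coordinates, intensity, absolute gradients) and the function names `video_region_features` and `covariance_descriptor` are assumptions made for this example, not the paper's definition, and the covariance is computed directly rather than through the integral video representation the abstract mentions.

```python
import numpy as np

def video_region_features(volume):
    """Hypothetical per-pixel features for a grayscale video region of shape (T, H, W):
    x, y, t coordinates, intensity, and absolute gradient along each axis."""
    volume = np.asarray(volume, dtype=float)
    T, H, W = volume.shape
    t, y, x = np.meshgrid(np.arange(T), np.arange(H), np.arange(W), indexing="ij")
    gt, gy, gx = np.gradient(volume)  # one gradient array per axis (t, y, x)
    feats = np.stack([x, y, t, volume, np.abs(gx), np.abs(gy), np.abs(gt)], axis=-1)
    return feats.reshape(-1, feats.shape[-1])  # (T*H*W, 7)

def covariance_descriptor(features):
    """Sample covariance (d x d) of an (n_samples, d) feature array."""
    features = np.asarray(features, dtype=float)
    centered = features - features.mean(axis=0)
    return centered.T @ centered / (features.shape[0] - 1)
```

The resulting matrix is symmetric and positive semi-definite, so a region of any size is summarised by a fixed-size descriptor. Such matrices do not form a vector space, which is the "curved space" the abstract refers to.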
Improved foreground detection via block-based classifier cascade with probabilistic decision integration
- IEEE Transactions on Circuits and Systems for Video Technology, 2013
Cited by 13 (4 self)
Foreground detection (also known as background subtraction) is a fundamental low-level processing task in numerous computer vision applications. The vast majority of algorithms in the literature process images on a pixel-by-pixel basis, where an independent decision is made for each pixel. A general limitation of such processing is that rich contextual information is not taken into account. We propose a block-based method capable of dealing with noise, illumination variations and dynamic backgrounds, while still obtaining smooth contours of foreground objects. Specifically, image sequences are analysed on an overlapping block-by-block basis. A low-dimensional texture descriptor obtained from each block is passed through an adaptive classifier cascade, where each stage handles a distinct problem. A probabilistic foreground mask generation approach then exploits block overlaps to integrate interim block-level decisions into final pixel-level foreground segmentation. Unlike many pixel-based methods, ad-hoc post-processing of foreground masks is not required. Experiments on the difficult Wallflower and I2R datasets show that the proposed method obtains on average better results (both qualitatively and quantitatively) than several prominent methods available in the literature. We furthermore propose the use of tracking performance as an unbiased approach for assessing the practical usefulness of foreground segmentation methods, and show that the proposed method leads to considerable improvements in object tracking accuracy on the CAVIAR dataset. Index Terms: foreground detection, segmentation, background modelling, background subtraction, patch analysis.
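One plausible reading of the step that integrates block-level decisions into a pixel-level mask is sketched below. It assumes each pixel's foreground probability is the fraction of overlapping blocks covering it that were classified as foreground; the function name `integrate_block_decisions`, the grid layout, and the 0.5 threshold are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def integrate_block_decisions(block_labels, image_shape, block_size, step, threshold=0.5):
    """Turn per-block foreground decisions into a per-pixel probability and mask.

    block_labels[r, c] is the 0/1 decision for the block whose top-left corner
    is at pixel (r * step, c * step); blocks overlap whenever step < block_size.
    """
    votes = np.zeros(image_shape, dtype=float)   # foreground votes per pixel
    counts = np.zeros(image_shape, dtype=float)  # number of blocks covering each pixel
    for (r, c), is_fg in np.ndenumerate(np.asarray(block_labels, dtype=bool)):
        top, left = r * step, c * step
        votes[top:top + block_size, left:left + block_size] += is_fg
        counts[top:top + block_size, left:left + block_size] += 1
    # Fraction of covering blocks that voted foreground; 0 where no block covers the pixel.
    prob = np.divide(votes, counts, out=np.zeros_like(votes), where=counts > 0)
    return prob, prob >= threshold
```

With a 6x6 image, 4x4 blocks, and a step of 2, the central pixels are covered by four blocks each, so a single foreground block yields a probability of 0.25 there and 1.0 in the corner that only that block covers.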
Online Dominant and Anomalous Behavior Detection in Videos
Cited by 7 (2 self)
We present a novel approach for video parsing and simultaneous online learning of dominant and anomalous behaviors in surveillance videos. Dominant behaviors are those occurring frequently in videos and hence, usually do not attract much attention. They can be characterized by different complexities in space and time, ranging from a scene background to human activities. In contrast, an anomalous behavior is defined as having a low likelihood of occurrence. We do not employ any models of the entities in the scene in order to detect these two kinds of behaviors. In this paper, video events are learnt at each pixel without supervision using densely constructed spatio-temporal video volumes. Furthermore, the volumes are organized into large contextual graphs. These compositions are employed to construct a hierarchical codebook model for the dominant behaviors. By decomposing spatio-temporal contextual information into unique spatial and temporal contexts, the proposed framework learns the models of the dominant spatial and temporal events. Thus, it is ultimately capable of simultaneously modeling high-level behaviors as well as low-level spatial, temporal and spatio-temporal pixel level changes.
Log-Euclidean bag of words for human action recognition
- IET Computer Vision
Cited by 1 (0 self)
Representing videos by densely extracted local space-time features has recently become a popular approach for analysing actions. In this paper, we tackle the problem of categorising human actions by devising Bag of Words (BoW) models based on covariance matrices of spatio-temporal features, with the features formed from histograms of optical flow. Since covariance matrices form a special type of Riemannian manifold, the space of Symmetric Positive Definite (SPD) matrices, non-Euclidean geometry should be taken into account while discriminating between covariance matrices. To this end, we propose to embed SPD manifolds into Euclidean spaces via a diffeomorphism and extend the BoW approach to its Riemannian version. The proposed BoW approach takes into account the manifold geometry of SPD matrices during the generation of the codebook and histograms. Experiments on challenging human action datasets show that the proposed method obtains notable improvements in discrimination accuracy, in comparison to several state-of-the-art methods.
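The embedding described above can be sketched as follows, assuming the diffeomorphism is the matrix logarithm, as the "Log-Euclidean" in the title suggests. The function names are hypothetical, and the logarithm is computed through an eigendecomposition, which is valid for SPD inputs. The off-diagonal entries are scaled by the square root of 2 so that the Euclidean distance between vectors equals the Frobenius distance between the matrix logarithms.

```python
import numpy as np

def spd_log(matrix):
    """Matrix logarithm of a symmetric positive definite matrix via eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(matrix)
    return (eigvecs * np.log(eigvals)) @ eigvecs.T

def log_euclidean_vector(matrix):
    """Map an SPD matrix to a Euclidean vector: diagonal of log(M),
    then the upper triangle scaled by sqrt(2) to preserve the Frobenius norm."""
    log_m = spd_log(matrix)
    upper = np.triu_indices(log_m.shape[0], k=1)
    return np.concatenate([np.diag(log_m), np.sqrt(2.0) * log_m[upper]])

def log_euclidean_distance(a, b):
    """Distance between two SPD matrices in the log-Euclidean embedding."""
    return np.linalg.norm(log_euclidean_vector(a) - log_euclidean_vector(b))
```

Once descriptors are mapped this way, a standard Euclidean clustering method could in principle be used to build the codebook; whether the paper does exactly this is not stated in the abstract.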
by
, 2012
An Approach to detecting crowd anomalies for entrance and checkpoint security
Panic Detection in Human Crowds using Sparse Coding
© Abhishek Kumar 2012. I hereby declare that I am the sole author of this thesis. This is a true copy of the thesis, including any required final revisions, as accepted by my examiners. I understand that my thesis may be made electronically available to the public. Recently, the surveillance of human activities has drawn a lot of attention from the research community, and camera-based surveillance is being attempted with the aid of computers. Surveillance is required to detect abnormal or unwanted activities. Such abnormal activities are very infrequent compared to regular activities. At present, surveillance is done manually: operators watch a set of surveillance video screens to discover an abnormal event. This is expensive and prone to error. The limitations of these surveillance systems can be effectively removed if an automated anomaly detection system is designed. With powerful computers, computer vision is being seen as a panacea for surveillance. A computer-vision-aided anomaly detection system will enable the selection of those video frames which contain an anomaly, and only those selected frames will be used for manual verification.
AUTOMATED CROWD BEHAVIOR ANALYSIS FOR VIDEO SURVEILLANCE APPLICATIONS
, 2012
Assist. Prof. Dr. Alptekin Temizel
Rights Creative Commons: Attribution 3.0 Hong Kong License
A motion texture is an instantaneous motion map extracted from a dynamic texture. We observe that such motion maps exhibit values of two types: a discrete component at zero (absence of motion) and continuous motion values. We thus develop a mixed-state Markov random field model to represent motion textures. The core of our approach is to show that motion information is powerful enough to classify and segment dynamic textures if it is properly modeled regarding its specific nature and the local interactions involved. A parsimonious set of 11 parameters constitutes the descriptive feature of a motion texture. The motivation of the proposed formulation runs toward the analysis of dynamic video contents, and we tackle two related problems. First, we present a method for recognition and classification of motion textures, by means of the Kullback–Leibler distance between mixed-state statistical models. Second, we define a two-frame motion texture maximum a posteriori (MAP)-based segmentation method applicable to motion textures with deforming boundaries. We also investigate a new issue, the space-time dynamic texture segmentation, by combining the spatial segmentation and the recognition methods. Numerous experimental results are reported for those
Anomaly Detection Based on Local Nearest Neighbor Distance Descriptor in Crowded Scenes
Copyright © 2014 Xing Hu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. We propose a novel local nearest neighbor distance (LNND) descriptor for anomaly detection in crowded scenes. Compared with the low-level feature descriptors commonly used in previous works, the LNND descriptor has two major advantages. First, the LNND descriptor efficiently incorporates spatial and temporal contextual information around the video event, which is important for detecting anomalous interactions among multiple events, while most existing feature descriptors only contain the information of a single event. Second, the LNND descriptor is a compact representation and its dimensionality is typically much lower than that of a low-level feature descriptor. Therefore, using the LNND descriptor in an anomaly detection method with offline training not only saves computation time and storage, but also avoids the negative effects of high-dimensional feature descriptors. We validate the effectiveness of the LNND descriptor by conducting extensive experiments on different benchmark datasets. Experimental results show the promising performance of the LNND-based method against the state-of-the-art methods. It is worth noting that the LNND-based approach requires fewer intermediate processing steps and no subsequent processing such as smoothing, yet achieves comparable or even better performance.