A Large-scale Benchmark Dataset for Event Recognition in Surveillance Video
Cited by 2 (1 self)
We introduce a new large-scale video dataset designed to assess the performance of diverse visual event recognition algorithms, with a focus on continuous visual event recognition (CVER) in outdoor areas with wide coverage. Previous datasets for action recognition are unrealistic for real-world surveillance because they consist of short clips showing one action by one individual [15, 8]. Datasets have been developed for movies [11] and sports [12], but these actions and scene conditions do not transfer well to surveillance videos. Our dataset consists of many outdoor scenes with actions performed naturally by non-actors in continuously captured videos of the real world. The dataset includes large numbers of instances for 23 event types distributed throughout 29 hours of video. The data is accompanied by detailed annotations, including both moving object tracks and event examples, which provide a solid basis for large-scale evaluation. Additionally, we propose different types of evaluation modes for visual recognition tasks and evaluation metrics, along with our preliminary experimental results. We believe that this dataset will stimulate diverse aspects of computer vision research and help advance CVER tasks in the years ahead.
Action Recognition from One Example
2009
Cited by 1 (0 self)
We present a novel action recognition method based on space-time locally adaptive regression kernels and the matrix cosine similarity measure. The proposed method uses a single example of an action to find similar matches. It does not require prior knowledge about actions, foreground/background segmentation, or any motion estimation or tracking. Our method is based on the computation of novel space-time descriptors from a query video, which measure the likeness of a voxel to its surroundings. Salient features are extracted from these descriptors and compared against analogous features from the target video. This comparison is done using a matrix generalization of the cosine similarity measure. The algorithm yields a scalar resemblance volume, with each voxel indicating the likelihood of similarity between the query video and all cubes in the target video. Using nonparametric significance tests and non-maxima suppression, we detect the presence and location of actions similar to the query video. High performance is demonstrated on challenging sets of action data containing fast motions and varied contexts, including cases where multiple complex actions occur simultaneously within the field of view. Further experiments on the Weizmann and KTH datasets demonstrate state-of-the-art performance in action categorization, despite the use of only a single example.
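The "matrix generalization of the cosine similarity measure" can be illustrated with a minimal sketch. It assumes the generalization takes the common form of a Frobenius inner product divided by the product of the Frobenius norms; the abstract does not spell out the formula, so this form, the function name `matrix_cosine_similarity`, and the toy feature matrices are illustrative assumptions rather than the authors' implementation.

```python
import math


def matrix_cosine_similarity(a, b):
    """Assumed form: <A, B>_F / (||A||_F * ||B||_F) for same-shaped matrices.

    `a` and `b` are lists of rows (hypothetical feature matrices).
    """
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrices must have the same shape")
    # Frobenius inner product: sum of element-wise products.
    inner = sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    # Frobenius norms: square root of the sum of squared entries.
    norm_a = math.sqrt(sum(x * x for row in a for x in row))
    norm_b = math.sqrt(sum(y * y for row in b for y in row))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("similarity is undefined for an all-zero matrix")
    return inner / (norm_a * norm_b)


# Toy matrices standing in for query and target features (made up for illustration).
query = [[1.0, 2.0], [3.0, 4.0]]
scaled = [[2.0, 4.0], [6.0, 8.0]]      # same direction as query, different magnitude
unrelated = [[4.0, -3.0], [2.0, -1.0]]

print(matrix_cosine_similarity(query, scaled))
print(matrix_cosine_similarity(query, unrelated))
```

Under this assumed form the score depends on the direction of the feature matrix and not its overall magnitude, so a uniformly scaled copy of the query scores as a near-perfect match.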
Generalized Time Warping for Alignment of Human Behavior
Temporal alignment of human motion performing similar activities has been a topic of recent interest due to its many applications in animation, tele-rehabilitation, and activity recognition. This paper presents generalized time warping (GTW), an extension of dynamic time warping (DTW) for temporally aligning multi-modal sequences from multiple subjects performing similar activities. GTW addresses three major drawbacks of existing approaches based on DTW: (1) GTW provides a feature weighting layer to adapt different modalities (e.g., video and motion capture data); (2) GTW extends DTW by allowing a more flexible time warping as a combination of monotonic functions; and (3) unlike DTW, which typically has a quadratic cost, GTW has linear complexity in terms of the length of the sequence. Experimental results demonstrate that GTW can efficiently solve the multi-modal temporal alignment problem, and outperforms state-of-the-art methods for temporal alignment of signals with the same modality.
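For context on the quadratic cost mentioned above, here is a minimal sketch of the classic DTW baseline that GTW extends. It is not GTW itself. The function name `dtw_distance`, the one-dimensional inputs, and the absolute-difference local cost are assumptions chosen for illustration.

```python
def dtw_distance(x, y):
    """Classic dynamic time warping cost between two 1-D sequences.

    Fills an (n+1) x (m+1) table, so time and memory grow with n * m.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning x[:i] with y[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])  # assumed local distance
            # Monotonic steps: advance in x, advance in y, or advance in both.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# The second sequence repeats one sample; warping absorbs the repetition.
print(dtw_distance([1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0]))
```

The nested loops over both sequences are the source of the quadratic cost that the abstract contrasts with GTW's linear complexity.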

