Results 1 - 10 of 14
Recognition and localization of relevant human behavior in videos, SPIE, 2013
Cited by 4 (3 self)
Ground surveillance is normally performed by human assets, since it requires visual intelligence. However, especially in military operations, this can be dangerous and is very resource intensive. Therefore, unmanned autonomous visual-intelligence systems are desired. In this paper, we present an improved system that can recognize the actions of individual humans and interactions between multiple humans. Central to the new system is our agent-based architecture. The system is trained on thousands of videos and evaluated on realistic persistent-surveillance data from the DARPA Mind's Eye program, comprising hours of video of challenging scenes. The results show that our system is able to track people, detect and localize events, and discriminate between different behaviors, and that it performs 3.4 times better than our previous system.
Multi-Task Sparse Learning with Beta Process Prior for Action Recognition
Cited by 1 (0 self)
In this paper, we formulate human action recognition as a novel Multi-Task Sparse Learning (MTSL) framework, which aims to reconstruct a test sample with multiple features from as few bases as possible. Learning the sparse representation under each feature modality is considered a single task in MTSL. Since the tasks are generated from multiple features associated with the same visual input, they are not independent but inter-related. We introduce a Beta process (BP) prior to the hierarchical MTSL model, which efficiently learns a compact dictionary and infers the sparse structure shared across all the tasks. The MTSL model makes coefficient estimation more robust than performing each task independently. Moreover, sparsity is achieved via the Beta process formulation rather than the computationally expensive ℓ1-norm penalty. With non-informative gamma hyperpriors, the sparsity level is determined entirely by the data. Finally, the learning problem is solved by Gibbs sampling inference, which estimates the full posterior over the model parameters. Experimental results on the KTH and UCF Sports datasets demonstrate the effectiveness of the proposed MTSL approach for action recognition.
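For orientation, a standard finite approximation of a Beta-process prior for sparse dictionary learning (as used in Beta-process factor analysis) can be written as below. The symbols and hyperparameters ($K$ dictionary atoms, $M$ feature modalities, $a$, $b$, $\gamma_s$, $\gamma_\varepsilon$) are illustrative assumptions, and the paper's hierarchical MTSL model may differ in detail; sharing the binary indicators $z$ across all $M$ tasks is one way to encode a sparsity pattern common to every feature modality.

```latex
\begin{aligned}
\pi_k &\sim \mathrm{Beta}\!\left(\tfrac{a}{K},\ \tfrac{b(K-1)}{K}\right), && k = 1,\dots,K \\
z_k &\sim \mathrm{Bernoulli}(\pi_k) && \text{(shared by all tasks)} \\
s^{(m)}_k &\sim \mathcal{N}\!\left(0,\ \gamma_s^{-1}\right), && m = 1,\dots,M \\
x^{(m)} &= D^{(m)}\left(z \odot s^{(m)}\right) + \varepsilon^{(m)}, && \varepsilon^{(m)} \sim \mathcal{N}\!\left(0,\ \gamma_\varepsilon^{-1} I\right)
\end{aligned}
```

Under this prior the expected number of active atoms is $\sum_k \mathbb{E}[\pi_k] = Ka/(a + b(K-1))$, which tends to $a/b$ as $K \to \infty$, so sparsity comes from the binary indicators rather than from an ℓ1 penalty; gamma hyperpriors on $\gamma_s$ and $\gamma_\varepsilon$ let the data set the coefficient and noise scales.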
Motion Binary Patterns for Action Recognition
Cited by 1 (1 self)
In this paper, we propose a novel feature type for recognizing human actions in video data. By combining the benefits of Volume Local Binary Patterns and optical flow, we construct a simple and efficient descriptor. Motion Binary Patterns (MBP) are computed in the spatio-temporal domain, capturing static object appearance as well as motion information. Histograms of these patterns are used to train a Random Forest classifier, which is applied to the task of human action recognition. The proposed framework is evaluated on the well-known, publicly available KTH and Weizmann datasets and on the IXMAS dataset for multi-view action recognition. The results demonstrate state-of-the-art accuracy compared with other methods.
Analyzing Complex Events and Human Actions (dissertation)
We are living in a world where it is easy to acquire videos of events ranging from private picnics to public concerts, and to share them publicly via websites such as YouTube. The ability of smartphones to create these videos and upload them to the internet has led to an explosion of video data, which in turn has opened interesting research directions involving the analysis of “in-the-wild” videos. To process these types of videos, recognition tasks such as pose estimation, action recognition, and event recognition become important in computer vision. This thesis presents various recognition problems and proposes mid-level models to address them. First, a discriminative deformable part model is presented for the recovery of qualitative pose, inferring coarse pose labels (e.g., left, front-right, back), a task that is more robust to the common confounding factors that hinder the inference of exact 2D or 3D joint locations. Our approach automatically selects parts that are predictive of qualitative pose and trains their appearance and deformation costs to best discriminate between qualitative poses. Unlike previous approaches, our parts are both selected and trained to improve qualitative pose discrimination and are shared by ...
Extracting Latent Attributes from Video Scenes Using Text as Background Knowledge
We explore the novel task of identifying latent attributes in video scenes, such as the mental states of actors, using only large text collections as background knowledge and minimal information about the videos, such as activity and actor types. We formalize the task and define a measure of merit that accounts for the semantic relatedness of mental-state terms. We develop and test several largely unsupervised information extraction models that identify the mental states of human participants in video scenes. We show that these models produce complementary information and that their combination significantly outperforms the individual models as well as other baseline methods.
Discriminative Non-Linear Stationary Subspace Analysis for Video Classification, IEEE Transactions on Pattern Analysis and Machine Intelligence
Low-dimensional representations are key to the success of many video classification algorithms. However, commonly used dimensionality reduction techniques fail to account for the fact that only part of the signal is shared across all the videos in one class. As a consequence, the resulting representations contain instance-specific information, which introduces noise into the classification process. In this paper, we introduce Non-Linear Stationary Subspace Analysis, a method that overcomes this issue by explicitly separating the stationary parts of the video signal (i.e., the parts shared across all videos in one class) from its non-stationary parts (i.e., the parts specific to individual videos). Our method also encourages the new representation to be discriminative, thus accounting for the underlying classification problem. We demonstrate the effectiveness of our approach on dynamic texture recognition, scene classification and action recognition.
Chapter 9 Action Recognition in Realistic Sports Videos
The ability to analyze the actions that occur in a video is essential for the automatic understanding of sports. Action localization and recognition in videos are two main research topics in this context. In this chapter, we provide a detailed study of the prominent methods devised for these two tasks that yield superior results on sports videos. We adopt UCF Sports, a dataset of realistic sports videos collected from broadcast television channels, as our evaluation benchmark. First, we present an overview of UCF Sports along with comprehensive statistics of the techniques tested on this dataset, as well as the evolution of their performance over time. To provide further details about the existing action recognition methods in this area, we decompose the action recognition framework into three main steps (feature extraction, dictionary learning to represent a video, and classification) and review several successful techniques for each step. We also review the problem of spatio-temporal localization of actions and argue that, in general, it is a more challenging problem than action recognition. We study several recent methods for action localization that have shown promising results on sports videos. Finally, we discuss a number of forward-looking insights drawn from this review of action recognition and localization methods. In particular, we argue that performing recognition on temporally untrimmed videos, and attempting to describe an action instead of conducting a forced-choice classification, are essential for analyzing human actions in a realistic environment.
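The three-step decomposition (feature extraction, dictionary learning, classification) can be sketched as a minimal bag-of-words pipeline. Everything below is a hypothetical illustration rather than a method from the chapter: random 2-D points stand in for real local descriptors, plain k-means serves as the dictionary learner, and a deliberately simple 1-nearest-neighbour classifier with a chi-squared histogram distance performs classification.

```python
import numpy as np

def nearest_codeword(descriptors, centers):
    """Index of the closest centre (squared Euclidean distance) for each descriptor."""
    d = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def learn_dictionary(descriptors, k, iters=20):
    """Dictionary learning via plain k-means (Lloyd's algorithm).

    Uses a deterministic farthest-point initialisation so the sketch is reproducible.
    """
    descriptors = np.asarray(descriptors, dtype=float)
    centers = [descriptors[0]]
    for _ in range(k - 1):
        # Pick the descriptor farthest from all centres chosen so far.
        d = np.min([((descriptors - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(descriptors[d.argmax()])
    centers = np.array(centers)
    for _ in range(iters):
        labels = nearest_codeword(descriptors, centers)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):  # keep the old centre if its cluster is empty
                centers[j] = members.mean(axis=0)
    return centers

def encode(video_descriptors, centers):
    """Bag-of-words representation: L1-normalised histogram of codeword assignments."""
    idx = nearest_codeword(np.asarray(video_descriptors, dtype=float), centers)
    hist = np.bincount(idx, minlength=len(centers)).astype(float)
    return hist / hist.sum()

def classify(query, train_hists, train_labels):
    """1-nearest-neighbour classification with the chi-squared histogram distance."""
    dists = [0.5 * np.sum((query - h) ** 2 / (query + h + 1e-12)) for h in train_hists]
    return train_labels[int(np.argmin(dists))]

# Toy demo with hypothetical 2-D "descriptors": class c is centred at (10c, 10c).
rng = np.random.default_rng(0)

def fake_video(cls, n=40):
    return rng.normal(loc=10.0 * cls, scale=0.5, size=(n, 2))

train_videos = [fake_video(c) for c in (0, 0, 1, 1)]
train_labels = [0, 0, 1, 1]
centers = learn_dictionary(np.vstack(train_videos), k=4)
train_hists = [encode(v, centers) for v in train_videos]
pred0 = classify(encode(fake_video(0), centers), train_hists, train_labels)
pred1 = classify(encode(fake_video(1), centers), train_hists, train_labels)
```

Because each stage is a separate function, swapping k-means for another dictionary learner or the nearest-neighbour rule for a kernel classifier changes only one step, which is the practical benefit of the decomposition.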
Action Recognition in the Frequency Domain
In this paper, we describe a simple strategy for mitigating variability in temporal data series by shifting the focus onto long-term, frequency-domain features that are less susceptible to variability. We apply this method to the human action recognition task and demonstrate how working in the frequency domain can yield good recognition features from commonly used optical flow and articulated pose features, which are highly sensitive to small differences in motion, viewpoint, dynamic backgrounds, occlusion and other sources of variability. We show how these frequency-based features can be combined with a simple forest classifier to achieve good and robust results on the popular KTH Actions dataset.
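A minimal sketch of the idea, assuming per-frame features have already been extracted: taking FFT magnitudes along the time axis discards phase, so a circularly time-shifted copy of the same motion yields identical features. The function name `frequency_features`, the number of retained coefficients and the mean removal are assumptions for illustration, not the authors' exact feature design; real clips are not circularly shifted, so in practice the invariance is only approximate.

```python
import numpy as np

def frequency_features(trajectory, n_coeffs=8):
    """Low-frequency FFT magnitudes of each feature channel over time.

    trajectory: array of shape (T, D), one D-dimensional feature vector per frame
    (hypothetically, e.g. pooled optical-flow statistics or joint angles).
    Returns a vector of length n_coeffs * D.
    """
    traj = np.asarray(trajectory, dtype=float)
    traj = traj - traj.mean(axis=0)               # remove the per-channel DC offset
    spectrum = np.abs(np.fft.rfft(traj, axis=0))  # magnitudes discard phase (timing)
    return spectrum[1:n_coeffs + 1].ravel()       # skip bin 0, keep the lowest frequencies

# Demo: a two-channel periodic "motion" signal and a time-shifted copy of it.
t = np.arange(64)
signal = np.stack([np.sin(2 * np.pi * 3 * t / 64),
                   np.cos(2 * np.pi * 5 * t / 64)], axis=1)
shifted = np.roll(signal, 10, axis=0)
f_orig = frequency_features(signal)
f_shift = frequency_features(shifted)
```

The shifted and unshifted signals produce the same feature vector, and each channel's energy appears in a single low-frequency bin (bin 3 for the first channel, bin 5 for the second).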
Thresholding a Random Forest Classifier
The original Random Forest derives its final result from the number of leaf nodes voting for each class: each leaf node is treated equally, and the class with the most votes wins. However, certain leaf nodes in the topology have better classification accuracy, while others often lead to wrong decisions. The performance of the forest also differs across classes because of uneven class proportions. In this work, a novel voting mechanism is introduced in which each leaf node has an individual weight. The final decision is determined not by majority voting but by a linear combination of these individual weights, leading to a better and more robust decision. This method is inspired by the construction of a strong classifier from a linear combination of simple rules of thumb (AdaBoost). Small fluctuations caused by the use of binary decision trees are thereby better balanced. Experimental results on several datasets for object recognition and action recognition demonstrate that our method improves the classification accuracy of the original Random Forest algorithm.
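The difference between the two voting rules can be sketched in a few lines. Each tree is assumed to have already routed the sample to a leaf, represented here as a hypothetical `(label, weight)` pair. The AdaBoost-style weight derived from a leaf's held-out accuracy is an assumption made for illustration; the abstract does not specify how the paper computes its leaf weights.

```python
import math
from collections import Counter, defaultdict

def leaf_weight(correct, total, eps=1e-6):
    """AdaBoost-style weight 0.5 * ln((1 - err) / err) from a leaf's held-out error.

    This weighting rule is an illustrative assumption; leaves with accuracy
    below 50 percent receive a negative weight, i.e. they count against their class.
    """
    err = min(max(1.0 - correct / total, eps), 1.0 - eps)
    return 0.5 * math.log((1.0 - err) / err)

def majority_vote(leaves):
    """Original Random Forest rule: every reached leaf casts one equal vote."""
    return Counter(label for label, _ in leaves).most_common(1)[0][0]

def weighted_vote(leaves):
    """Weighted rule: the class with the largest summed leaf weight wins."""
    scores = defaultdict(float)
    for label, weight in leaves:
        scores[label] += weight
    return max(scores, key=scores.get)

# Leaves reached by one sample in a three-tree forest (hypothetical numbers):
# two unreliable leaves vote "walk", one reliable leaf votes "run".
reached = [
    ("walk", leaf_weight(11, 20)),  # 55 percent held-out accuracy
    ("walk", leaf_weight(11, 20)),
    ("run", leaf_weight(19, 20)),   # 95 percent held-out accuracy
]
print(majority_vote(reached))  # walk
print(weighted_vote(reached))  # run
```

The two weak leaves together contribute about 0.20, while the single reliable leaf contributes about 1.47, so the weighted rule overturns the majority decision.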
Computation Strategies for Volume Local Binary Patterns Applied to Action Recognition
Volume Local Binary Patterns are a well-known feature type for describing object characteristics in the spatio-temporal domain. Beyond computing the binary pattern itself, further steps are required to create a discriminative feature. In this paper, we propose different computation methods for Volume Local Binary Patterns. These methods are evaluated in detail and the best strategy is identified. A Random Forest is used to find discriminative patterns. The proposed methods are applied to the well-known, publicly available KTH and Weizmann datasets for single-view action recognition and to the IXMAS dataset for multi-view action recognition. Furthermore, the proposed framework is compared with state-of-the-art methods.
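For reference, the basic Volume LBP code that such computation strategies start from can be sketched as follows. This is a simplified illustration, not one of the paper's strategies: it assumes a 4-connected spatial neighbourhood (P = 4) and a fixed bit order, so each code has 3P + 2 = 14 bits. With the more usual P = 8 a code has 26 bits, and a full histogram would need 2^26 = 67,108,864 bins, which illustrates why the raw code usually needs further processing before it becomes a practical feature.

```python
import numpy as np

# Spatial neighbour offsets (dy, dx); P = 4 keeps each code at 3P + 2 = 14 bits.
OFFSETS = [(-1, 0), (0, 1), (1, 0), (0, -1)]
N_BITS = 3 * len(OFFSETS) + 2

def vlbp_codes(volume, L=1):
    """Simplified Volume LBP codes for all interior voxels of a (T, H, W) volume.

    Against the centre value g_c at frame t, the code thresholds (in this bit
    order, an assumption): the centre at t-L, its P neighbours, the P neighbours
    at t, the centre at t+L and its P neighbours. A bit is 1 when sample >= g_c.
    """
    v = np.asarray(volume, dtype=float)
    T, H, W = v.shape
    centre = v[L:T - L, 1:H - 1, 1:W - 1]
    samples = []
    for dt in (-L, 0, L):
        frame = v[L + dt:T - L + dt]  # frames aligned with the centre voxels
        if dt != 0:
            samples.append(frame[:, 1:H - 1, 1:W - 1])
        for dy, dx in OFFSETS:
            samples.append(frame[:, 1 + dy:H - 1 + dy, 1 + dx:W - 1 + dx])
    codes = np.zeros(centre.shape, dtype=np.int64)
    for bit, s in enumerate(samples):
        codes |= (s >= centre).astype(np.int64) << bit
    return codes

def vlbp_histogram(volume, L=1):
    """Normalised histogram over all 2**N_BITS possible codes."""
    codes = vlbp_codes(volume, L)
    hist = np.bincount(codes.ravel(), minlength=2 ** N_BITS).astype(float)
    return hist / hist.sum()
```

As a check, a volume whose brightness increases over time yields bits 0 to 4 cleared (the earlier frame is darker) and bits 5 to 13 set, i.e. the code 2^14 - 2^5 = 16352 for every interior voxel.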