CiteSeerX
Scalable action recognition with a subspace forest, CVPR (2012)

by S O’Hara, B Draper
Results 1 - 10 of 14

Recognition and localization of relevant human behavior in videos, SPIE,

by Henri Bouma , Gertjan Burghouts , Leo De Penning , Patrick Hanckmann , Johan-Martijn Ten Hove , Sanne Korzec , Maarten Kruithof , Sander Landsmeer , Coen Van Leeuwen , Sebastiaan Van Den Broek , 2013
"... ABSTRACT Ground surveillance is normally performed by human assets, since it requires visual intelligence. However, especially for military operations, this can be dangerous and is very resource intensive. Therefore, unmanned autonomous visualintelligence systems are desired. In this paper, we pres ..."
Abstract - Cited by 4 (3 self)
ABSTRACT Ground surveillance is normally performed by human assets, since it requires visual intelligence. However, especially for military operations, this can be dangerous and is very resource intensive. Therefore, unmanned autonomous visual-intelligence systems are desired. In this paper, we present an improved system that can recognize actions of a human and interactions between multiple humans. Central to the new system is our agent-based architecture. The system is trained on thousands of videos and evaluated on realistic persistent surveillance data in the DARPA Mind's Eye program, with hours of videos of challenging scenes. The results show that our system is able to track the people, detect and localize events, and discriminate between different behaviors, and it performs 3.4 times better than our previous system.

Citation Context

...dy by Burghouts [15]. Recently, new results have become available on IXMAS [22][25][44][44] and UT-Interaction [49]. Our literature overview shows a wide variety of ideas for automatic action recognition – including unsupervised learning, simultaneous analysis, pose estimation and high-level representations – which are discussed in more detail below. O’Hara e.a. [34] presented a method for unsupervised learning and recognition of human actions in video. In their experiments, the product manifolds perform better than bag-of-features for clustering video clips of actions. In another publication [33], the subspace forest was presented, designed to provide an efficient approximate nearest neighbor query of subspaces represented as points on Grassmann manifolds, and applied to action recognition. Truyen e.a. [43] also tried to avoid completely supervised training. They proposed an approach based on semi-supervised training of partially hidden discriminative models such as the conditional random field (CRF) and the maximum entropy Markov model (MEMM). Barbu e.a. [5] proposed a method for simultaneous object detection, tracking, and event recognition. Many person and object detectors, e.g. th...
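The subspace forest referenced above rests on representing each video clip as a low-dimensional subspace (a point on a Grassmann manifold) and retrieving the nearest labelled subspaces. The sketch below, in Python with NumPy and synthetic data, shows only the underlying principal-angle distance and an exact linear-scan nearest-neighbour query; it is not O'Hara and Draper's tree-based index, whose purpose is precisely to avoid this exhaustive search.

    import numpy as np

    def orthonormal_basis(X, k):
        # Summarize a clip's feature matrix X (features x frames) by the span of
        # its top-k left singular vectors: a point on the Grassmann manifold.
        U, _, _ = np.linalg.svd(X, full_matrices=False)
        return U[:, :k]

    def grassmann_distance(U1, U2):
        # Singular values of U1^T U2 are the cosines of the principal angles
        # between the two subspaces; use the norm of the angles as a distance.
        cosines = np.clip(np.linalg.svd(U1.T @ U2, compute_uv=False), -1.0, 1.0)
        return np.linalg.norm(np.arccos(cosines))

    def nearest_subspace(query, gallery):
        # Exact 1-NN by exhaustive search; a subspace forest replaces this
        # linear scan with randomized trees for approximate but scalable queries.
        dists = [grassmann_distance(query, U) for U in gallery]
        return int(np.argmin(dists)), float(min(dists))

    # Toy usage: 50 gallery clips and one query, each summarized by a
    # 3-dimensional subspace of a 100-dimensional feature space.
    rng = np.random.default_rng(0)
    gallery = [orthonormal_basis(rng.standard_normal((100, 20)), 3) for _ in range(50)]
    query = orthonormal_basis(rng.standard_normal((100, 20)), 3)
    print(nearest_subspace(query, gallery))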

Multi-Task Sparse Learning with Beta Process Prior for Action Recognition

by Chunfeng Yuan, Weiming Hu, Guodong Tian, Shuang Yang, Haoran Wang
"... In this paper, we formulate human action recognition as a novel Multi-Task Sparse Learning(MTSL) framework which aims to construct a test sample with multiple fea-tures from as few bases as possible. Learning the sparse representation under each feature modality is considered as a single task in MTS ..."
Abstract - Cited by 1 (0 self)
In this paper, we formulate human action recognition as a novel Multi-Task Sparse Learning (MTSL) framework which aims to construct a test sample with multiple features from as few bases as possible. Learning the sparse representation under each feature modality is considered as a single task in MTSL. Since the tasks are generated from multiple features associated with the same visual input, they are not independent but inter-related. We introduce a Beta process (BP) prior to the hierarchical MTSL model, which efficiently learns a compact dictionary and infers the sparse structure shared across all the tasks. The MTSL model enforces the robustness in coefficient estimation compared with performing each task independently. Besides, the sparseness is achieved via the Beta process formulation rather than the computationally expensive l1 norm penalty. In terms of non-informative gamma hyper-priors, the sparsity level is totally decided by the data. Finally, the learning problem is solved by Gibbs sampling inference which estimates the full posterior on the model parameters. Experimental results on the KTH and UCF sports datasets demonstrate the effectiveness of the proposed MTSL approach for action recognition.
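The abstract above casts recognition as reconstructing a test sample sparsely from dictionary atoms, once per feature modality, with the tasks coupled. The paper's own machinery (a Beta process prior with Gibbs sampling) is considerably more involved; the following is only a minimal sketch of the shared idea, classifying by class-wise reconstruction residual summed over modalities, with orthogonal matching pursuit standing in for the Bayesian inference and hypothetical modality names ("hog", "hof").

    import numpy as np
    from sklearn.linear_model import orthogonal_mp

    def classify_multitask(test_feats, dicts, atom_labels, n_nonzero=5):
        # test_feats: {modality: feature vector}; dicts: {modality: dictionary with
        # unit-norm columns as atoms}; atom_labels: class label of each atom,
        # shared across modalities.
        classes = np.unique(atom_labels)
        residual = np.zeros(len(classes))
        for m, y in test_feats.items():
            D = dicts[m]
            coef = orthogonal_mp(D, y, n_nonzero_coefs=n_nonzero)  # sparse code (one "task")
            for i, c in enumerate(classes):
                keep = atom_labels == c
                # reconstruction error using only the atoms of class c
                residual[i] += np.linalg.norm(y - D[:, keep] @ coef[keep])
        return classes[np.argmin(residual)]

    # Toy usage: two modalities, three classes, ten atoms per class.
    rng = np.random.default_rng(0)
    atom_labels = np.repeat([0, 1, 2], 10)
    dicts = {m: rng.standard_normal((64, 30)) for m in ("hog", "hof")}
    for m in dicts:
        dicts[m] /= np.linalg.norm(dicts[m], axis=0)               # unit-norm atoms
    test = {m: dicts[m][:, 3] + 0.05 * rng.standard_normal(64) for m in dicts}
    print(classify_multitask(test, dicts, atom_labels))            # expected: class 0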

Citation Context

...asets.

Method                 Year  KTH    UCF
Yeffet et al. [22]     2009  90.1   79.2
Wang et al. [23]       2009  92.1   85.6
Kovashka et al. [24]   2010  94.53  87.27
Le et al. [25]         2011  93.9   86.5
Wang et al. [16]       2011  94.2   88.2
O'Hara et al. [20]     2012  97.9   91.32
Wang et al. [21]       2012  79.8   -
Raptis et al. [19]     2012  -      79.4
Our approach                 97.05  92.67

...fication method is employed on the resulting long feature vector. This method is symbolized by ”CF...

Motion Binary Patterns for Action Recognition

by Florian Baumann, Jie Liao, Arne Ehlers, Bodo Rosenhahn
"... In this paper, we propose a novel feature type to recognize human actions from video data. By combining the benefit of Volume Local Binary Patterns and Optical Flow, a simple and efficient descriptor is constructed. Motion Binary Patterns (MBP) are computed in spatio-temporal domain while static obj ..."
Abstract - Cited by 1 (1 self)
In this paper, we propose a novel feature type to recognize human actions from video data. By combining the benefit of Volume Local Binary Patterns and Optical Flow, a simple and efficient descriptor is constructed. Motion Binary Patterns (MBP) are computed in the spatio-temporal domain while static object appearances as well as motion information are gathered. Histograms are used to learn a Random Forest classifier which is applied to the task of human action recognition. The proposed framework is evaluated on the well-known, publicly available KTH dataset, Weizman dataset and on the IXMAS dataset for multi-view action recognition. The results demonstrate state-of-the-art accuracies in comparison to other methods.
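The MBP descriptor above combines Volume Local Binary Patterns with optical flow; its exact definition is in the paper. Purely to illustrate the spatio-temporal binary-pattern idea, the sketch below (a simplification, not the published MBP) compares each pixel against temporal and spatial neighbours, packs the comparisons into a small binary code, and pools the codes into a clip-level histogram of the kind that could feed a Random Forest.

    import numpy as np

    def binary_codes(prev, curr, nxt, threshold=0):
        # For each interior pixel, compare the current value against the same
        # location in the previous/next frames and 4 spatial neighbours,
        # packing the six comparisons into a 6-bit code.
        c = curr[1:-1, 1:-1].astype(np.int16)
        neighbours = [
            prev[1:-1, 1:-1], nxt[1:-1, 1:-1],     # temporal neighbours
            curr[:-2, 1:-1], curr[2:, 1:-1],       # spatial: up, down
            curr[1:-1, :-2], curr[1:-1, 2:],       # spatial: left, right
        ]
        codes = np.zeros_like(c, dtype=np.uint8)
        for bit, n in enumerate(neighbours):
            codes |= ((n.astype(np.int16) - c) > threshold).astype(np.uint8) << bit
        return codes

    def clip_histogram(frames):
        # Pool the 64-bin code histograms of all frame triplets into one descriptor.
        hist = np.zeros(64, dtype=np.int64)
        for t in range(1, len(frames) - 1):
            codes = binary_codes(frames[t - 1], frames[t], frames[t + 1])
            hist += np.bincount(codes.ravel(), minlength=64)
        return hist

    # Toy usage on a random 10-frame grayscale clip.
    rng = np.random.default_rng(0)
    frames = rng.integers(0, 256, size=(10, 32, 32), dtype=np.uint8)
    print(clip_histogram(frames)[:8])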

ABSTRACT Title of dissertation: Analyzing Complex Events and Human Actions in “in-the-wild” Videos

by unknown authors
"... We are living in a world where it is easy to acquire videos of events ranging from private picnics to public concerts, and to share them publicly via websites such as YouTube. The ability of smart-phones to create these videos and upload them to the internet has led to an explosion of video data, wh ..."
Abstract
We are living in a world where it is easy to acquire videos of events ranging from private picnics to public concerts, and to share them publicly via websites such as YouTube. The ability of smart-phones to create these videos and upload them to the internet has led to an explosion of video data, which in turn has led to interesting research directions involving the analysis of “in-the-wild” videos. To process these types of videos, various recognition tasks such as pose estimation, action recognition, and event recognition become important in computer vision. This thesis presents various recognition problems and proposes mid-level models to address them. First, a discriminative deformable part model is presented for the recovery of qualitative pose, inferring coarse pose labels (e.g., left, front-right, back), a task more robust to common confounding factors that hinder the inference of exact 2D or 3D joint locations. Our approach automatically selects parts that are predictive of qualitative pose and trains their appearance and deformation costs to best discriminate between qualitative poses. Unlike previous approaches, our parts are both selected and trained to improve qualitative pose discrimination and are shared by

Citation Context

...ognition rates on the YouTube sports data set.

Method                      Accuracy (%)
Wang et al. [44]            85.6
Le et al. [38]              86.5
Kovashka and Grauman [45]   87.3
Wang et al. [35]            88.2
Wu et al. [46]              91.3
O’Hara and Draper [47]      91.3
Todorovic [39]              92.1
Sadanand and Corso [40]     95.0
Shape                       71.3
Motion                      75.3
Pose                        76.7
Pose + Shape                84.7
Motion + Shape              86.7
Motion + Pose               90.7
Motion + Pose + Shape       96.0

each part. Here, we set the ...

Extracting Latent Attributes from Video Scenes Using Text as Background Knowledge

by Anh Tran, Mihai Surdeanu, Paul Cohen
"... We explore the novel task of identify-ing latent attributes in video scenes, such as the mental states of actors, using only large text collections as background knowledge and minimal information about the videos, such as activity and actor types. We formalize the task and a measure of merit that ac ..."
Abstract
We explore the novel task of identifying latent attributes in video scenes, such as the mental states of actors, using only large text collections as background knowledge and minimal information about the videos, such as activity and actor types. We formalize the task and a measure of merit that accounts for the semantic relatedness of mental state terms. We develop and test several largely unsupervised information extraction models that identify the mental states of human participants in video scenes. We show that these models produce complementary information and their combination significantly outperforms the individual models as well as other baseline methods.

Discriminative Non-Linear Stationary Subspace Analysis for Video Classification (IEEE Transactions on Pattern Analysis and Machine Intelligence)

by Mahsa Baktashmotlagh, Mehrtash Harandi, Brian C. Lovell, Mathieu Salzmann
"... Abstract—Low-dimensional representations are key to the success of many video classification algorithms. However, the commonly-used dimensionality reduction techniques fail to account for the fact that only part of the signal is shared across all the videos in one class. As a consequence, the result ..."
Abstract
Abstract—Low-dimensional representations are key to the success of many video classification algorithms. However, the commonly-used dimensionality reduction techniques fail to account for the fact that only part of the signal is shared across all the videos in one class. As a consequence, the resulting representations contain instance-specific information, which introduces noise in the classification process. In this paper, we introduce Non-Linear Stationary Subspace Analysis: A method that overcomes this issue by explicitly separating the stationary parts of the video signal (i.e., the parts shared across all videos in one class), from its non-stationary parts (i.e., the parts specific to individual videos). Our method also encourages the new representation to be discriminative, thus accounting for the underlying classification problem. We demonstrate the effectiveness of our approach on dynamic texture recognition, scene classification and action recognition.
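As a toy stand-in for the stationary/non-stationary split described above (not the authors' discriminative, kernel-based formulation), the sketch below pools the frames of several clips from one class, computes a global PCA basis, and keeps the directions along which the per-clip means agree most; these act as a crude proxy for the "stationary" part shared by the class, while the remaining directions carry clip-specific variation.

    import numpy as np

    def stationary_directions(clips, n_keep=2):
        # clips: list of (frames x dims) feature sequences from videos of one class.
        # Crude proxy for stationarity: in a global PCA basis, rank directions by
        # how much the per-clip means differ across clips and keep the most stable ones.
        pooled = np.vstack(clips)
        mean = pooled.mean(axis=0)
        _, _, components = np.linalg.svd(pooled - mean, full_matrices=False)
        per_clip_means = np.stack([(c - mean) @ components.T for c in clips]).mean(axis=1)
        spread = per_clip_means.var(axis=0)      # variance of clip means per direction
        return components[np.argsort(spread)[:n_keep]]

    # Toy usage: dimensions 3 and 4 carry clip-specific offsets, 0-2 are shared.
    rng = np.random.default_rng(0)
    clips = [rng.standard_normal((40, 5)) + np.array([0, 0, 0, i, -i]) for i in range(6)]
    print(stationary_directions(clips, n_keep=2))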

Citation Context

...ach, it relies on features learnt from 4. The flipped version of the test video was also left out.

Algorithm                  Accuracy
Subspace Forest [54]       91.3%
SM [53]                    97.3%
KFDA [26]                  80.00%
PCA                        46.6%
Kernel PCA                 48%
ICA                        63.3%
ASSA                       84.6%
DASSA                      86.2%
NLSSA - Linear Kernel      83.3%
NLSSA - RBF Kernel         93.3%
DNLSSA - Linear Kernel     85.3%
DNLSSA - RBF Kernel        93...

Chapter 9 Action Recognition in Realistic Sports Videos

by Khurram Soomro, Amir R. Zamir
"... Abstract The ability to analyze the actions which occur in a video is essential for automatic understanding of sports. Action localization and recognition in videos are two main research topics in this context. In this chapter, we provide a detailed study of the prominent methods devised for these t ..."
Abstract
Abstract The ability to analyze the actions which occur in a video is essential for automatic understanding of sports. Action localization and recognition in videos are two main research topics in this context. In this chapter, we provide a detailed study of the prominent methods devised for these two tasks which yield superior results for sports videos. We adopt UCF Sports, which is a dataset of realistic sports videos collected from broadcast television channels, as our evaluation benchmark. First, we present an overview of UCF Sports along with comprehensive statistics of the techniques tested on this dataset as well as the evolution of their performance over time. To provide further details about the existing action recognition methods in this area, we decompose the action recognition framework into three main steps of feature extraction, dictionary learning to represent a video, and classification; we overview several successful techniques for each of these steps. We also overview the problem of spatio-temporal localization of actions and argue that, in general, it manifests a more challenging problem compared to action recognition. We study several recent methods for action localization which have shown promising results on sports videos. Finally, we discuss a number of forward-thinking insights drawn from overviewing the action recognition and localization methods. In particular, we argue that performing the recognition on temporally untrimmed videos and attempting to describe an action, instead of conducting a forced-choice classification, are essential for analyzing the human actions in a realistic environment.

Citation Context

...tperform them and significantly improve the overall accuracy. Several of such methods for feature extraction [16, 32, 33], action representation [26, 76], dictionary learning [53], and classification [51, 57] will be discussed in more detail in Sect. 9.3. 9.2.1.1 Experimental Setup The original way [57] to test on UCF Sports was to use a Leave-One-Out (LOO) cross-validation scheme. This scenario takes out...
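The leave-one-out protocol mentioned here is straightforward to reproduce with scikit-learn. The sketch below uses a stand-in dataset and classifier (iris, linear SVM) purely to show the cross-validation mechanics; on UCF Sports each fold would instead hold out one video's descriptor.

    from sklearn.datasets import load_iris
    from sklearn.model_selection import LeaveOneOut, cross_val_score
    from sklearn.svm import SVC

    # Stand-in data; for UCF Sports each sample would be one video's descriptor.
    X, y = load_iris(return_X_y=True)
    scores = cross_val_score(SVC(kernel="linear"), X, y, cv=LeaveOneOut())
    print(f"LOO accuracy over {len(scores)} folds: {scores.mean():.3f}")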

Action Recognition in the Frequency Domain

by unknown authors
"... In this paper, we describe a simple strategy for miti-gating variability in temporal data series by shifting fo-cus onto long-term, frequency domain features that are less susceptible to variability. We apply this method to the human action recognition task and demonstrate how working in the frequen ..."
Abstract
In this paper, we describe a simple strategy for mitigating variability in temporal data series by shifting focus onto long-term, frequency domain features that are less susceptible to variability. We apply this method to the human action recognition task and demonstrate how working in the frequency domain can yield good recognition features for commonly used optical flow and articulated pose features, which are highly sensitive to small differences in motion, viewpoint, dynamic backgrounds, occlusion and other sources of variability. We show how these frequency-based features can be used in combination with a simple forest classifier to achieve good and robust results on the popular KTH Actions dataset.
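The strategy described above, summarizing a per-frame feature trajectory by low-frequency spectral magnitudes and feeding them to a forest classifier, can be sketched in a few lines. The data and features below are synthetic stand-ins, not the optical-flow and articulated-pose features used in the paper.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def frequency_features(track, n_coeffs=8):
        # track: (frames x dims) per-frame features, e.g. flow magnitudes or joint angles.
        # Keep magnitudes of the first few DFT coefficients of each dimension;
        # discarding phase makes the descriptor insensitive to temporal shifts.
        return np.abs(np.fft.rfft(track, axis=0))[:n_coeffs].ravel()

    # Toy data: two "actions" that differ only in how fast the motion oscillates.
    rng = np.random.default_rng(0)
    def clip(freq):
        t = np.linspace(0, 2 * np.pi, 60)
        signal = np.stack([np.sin(freq * t), np.cos(freq * t)], axis=1)
        return signal + 0.1 * rng.standard_normal((60, 2))

    X = np.array([frequency_features(clip(f)) for f in [2] * 20 + [5] * 20])
    y = np.array([0] * 20 + [1] * 20)
    clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
    print(clf.score(np.array([frequency_features(clip(f)) for f in (2, 5)]), [0, 1]))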

Thresholding a Random Forest Classifier

by Florian Baumann, Fangda Li, Arne Ehlers, Bodo Rosenhahn
"... Abstract. The original Random Forest derives the final result with respect to the number of leaf nodes voted for the corresponding class. Each leaf node is treated equally and the class with the most number of votes wins. Certain leaf nodes in the topology have better classification accuracies and o ..."
Abstract
Abstract. The original Random Forest derives the final result with respect to the number of leaf nodes voted for the corresponding class. Each leaf node is treated equally and the class with the most number of votes wins. Certain leaf nodes in the topology have better classification accuracies and others often lead to a wrong decision. Also the performance of the forest for different classes differs due to uneven class proportions. In this work, a novel voting mechanism is introduced: each leaf node has an individual weight. The final decision is not determined by majority voting but rather by a linear combination of individual weights leading to a better and more robust decision. This method is inspired by the construction of a strong classifier using a linear combination of small rules of thumb (AdaBoost). Small fluctuations which are caused by the use of binary decision trees are better balanced. Experimental results on several datasets for object recognition and action recognition demonstrate that our method successfully improves the classification accuracy of the original Random Forest algorithm.
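A minimal version of the weighted-leaf voting described above can be prototyped on top of scikit-learn's RandomForestClassifier: estimate a reliability weight for every leaf on a held-out split, then replace majority voting with a weighted sum of per-tree votes. The dataset and the particular weighting (per-leaf validation accuracy) are stand-ins, not the paper's exact construction.

    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X, y = load_digits(return_X_y=True)
    X_tr, X_rest, y_tr, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
    X_val, X_te, y_val, y_te = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

    forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)

    # Weight for every (tree, leaf) pair: accuracy of that leaf on the validation split.
    val_leaves = forest.apply(X_val)                   # (n_val, n_trees) leaf ids
    weights = {}
    for t, tree in enumerate(forest.estimators_):
        correct = tree.predict(X_val) == y_val
        for leaf in np.unique(val_leaves[:, t]):
            in_leaf = val_leaves[:, t] == leaf
            weights[(t, leaf)] = correct[in_leaf].mean()

    def weighted_vote(X_new):
        # Linear combination of per-tree votes instead of plain majority voting.
        leaves = forest.apply(X_new)
        scores = np.zeros((len(X_new), len(forest.classes_)))
        for t, tree in enumerate(forest.estimators_):
            votes = np.searchsorted(forest.classes_, tree.predict(X_new))
            w = np.array([weights.get((t, l), 0.5) for l in leaves[:, t]])  # 0.5 if unseen
            scores[np.arange(len(X_new)), votes] += w
        return forest.classes_[scores.argmax(axis=1)]

    print("majority voting:", forest.score(X_te, y_te))
    print("weighted leaves:", (weighted_vote(X_te) == y_te).mean())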

Citation Context

...known and publicly available KTH dataset [7] consists of six classes of actions. Each action is performed by 25 persons in four different scenarios. The KTH dataset consists of 599 videos. Similar to [28], a fixed position bounding box with a temporal window of 24 frames is selected, based on annotations by Lui [29]. Presumably, a smaller number of frames is sufficient [30]. Weizman: We evaluate our p...

Computation Strategies for Volume Local Binary Patterns applied to Action Recognition

by F. Baumann, A. Ehlers, B. Rosenhahn, Jie Liao
"... Volume Local Binary Patterns are a well-known fea-ture type to describe object characteristics in the spatio-temporal domain. Apart from the computation of a binary pattern further steps are required to create a discrimina-tive feature. In this paper we propose different computation methods for Volu ..."
Abstract
Volume Local Binary Patterns are a well-known feature type to describe object characteristics in the spatio-temporal domain. Apart from the computation of a binary pattern further steps are required to create a discriminative feature. In this paper we propose different computation methods for Volume Local Binary Patterns. These methods are evaluated in detail and the best strategy is shown. A Random Forest is used to find discriminative patterns. The proposed methods are applied to the well-known and publicly available KTH dataset and Weizman dataset for single-view action recognition and to the IXMAS dataset for multi-view action recognition. Furthermore, a comparison of the proposed framework to state-of-the-art methods is given.

Citation Context

...spectively each action is acted by 25 persons in 4 different scenarios: outdoors, outdoors with scale variations, outdoors with different clothes and indoors. There are totally 599 videos. Similar to [16], a fixed position bounding box with a temporal window of 32 frames is selected, based on annotations by Lui [13]. Presumably, a smaller number of frames is sufficient [18]. Furthermore, the original ...
