Discriminative subvolume search for efficient action detection

by J Yuan, Z Liu, Y Wu
Venue: CVPR (2009)

Results 1 - 10 of 13

Recognizing Realistic Actions from Videos “in the Wild”

by Jingen Liu, Jiebo Luo, Mubarak Shah
Cited by 47 (8 self)
In this paper, we present a systematic framework for recognizing realistic actions from videos “in the wild.” Such unconstrained videos are abundant in personal collections as well as on the web. Recognizing action from such videos has not been addressed extensively, primarily due to the tremendous variations that result from camera motion, background clutter, changes in object appearance, and scale, etc. The main challenge is how to extract reliable and informative features from the unconstrained videos. We extract both motion and static features from the videos. Since the raw features of both types are dense yet noisy, we propose strategies to prune these features. We use motion statistics to acquire stable motion features and clean static features. Furthermore, PageRank is used to mine the most informative static features. In order to further construct compact yet discriminative visual vocabularies, a divisive information-theoretic algorithm is employed to group semantically related features. Finally, AdaBoost is chosen to integrate all the heterogeneous yet complementary features for recognition. We have tested the framework on the KTH dataset and our own dataset consisting of 11 categories of actions collected from YouTube and personal videos, and have obtained impressive results for action recognition and action localization.
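The abstract above mentions using PageRank to mine informative static features but does not say how the feature graph is built. As a rough illustration only, the sketch below runs a standard PageRank power iteration over a nonnegative feature-similarity matrix; the function name `pagerank_scores`, the similarity matrix, and the damping value are assumptions made for this example, not details taken from the paper.

```python
import numpy as np

def pagerank_scores(similarity, damping=0.85, tol=1e-10, max_iter=200):
    """Rank items by PageRank over a nonnegative similarity matrix (hypothetical helper)."""
    w = np.asarray(similarity, dtype=float).copy()
    np.fill_diagonal(w, 0.0)  # ignore self-similarity
    n = w.shape[0]
    col_sums = w.sum(axis=0)
    safe_sums = np.where(col_sums > 0, col_sums, 1.0)
    # Column-normalise; columns with no links are replaced by a uniform column.
    transition = np.where(col_sums > 0, w / safe_sums, 1.0 / n)
    rank = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new_rank = damping * (transition @ rank) + (1.0 - damping) / n
        delta = np.abs(new_rank - rank).sum()
        rank = new_rank
        if delta < tol:
            break
    return rank

# Toy graph (assumed data): feature 0 is similar to every other feature.
toy_similarity = np.array([
    [0.0, 1.0, 1.0, 1.0],
    [1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
])
toy_ranks = pagerank_scores(toy_similarity)
```

In this toy graph the hub feature receives the largest score, which is the sense in which a well-connected feature could be treated as more informative.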

Recognizing Actions by Shape-Motion Prototype Trees

by Zhe Lin, Zhuolin Jiang, Larry S. Davis
Cited by 24 (4 self)
A prototype-based approach is introduced for action recognition. The approach represents an action as a sequence of prototypes for efficient and flexible action matching in long video sequences. During training, first, an action prototype tree is learned in a joint shape and motion space via hierarchical k-means clustering; then a lookup table of prototype-to-prototype distances is generated. During testing, based on a joint likelihood model of the actor location and action prototype, the actor is tracked while a frame-to-prototype correspondence is established by maximizing the joint likelihood, which is efficiently performed by searching the learned prototype tree; then actions are recognized using dynamic prototype sequence matching. Distance matrices used for sequence matching are rapidly obtained by look-up table indexing, which is an order of magnitude faster than brute-force computation of frame-to-frame distances. Our approach enables robust action matching in very challenging situations (such as moving cameras, dynamic backgrounds) and allows automatic alignment of action sequences. Experimental results demonstrate that our approach achieves recognition rates of 91.07% on a large gesture dataset (with dynamic backgrounds), 100% on the Weizmann action dataset and 95.77% on the KTH action dataset.
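The look-up table idea in this abstract can be sketched in a few lines. The example assumes prototypes are plain feature vectors compared with Euclidean distance and that each frame has already been assigned a prototype index; the function names and the toy data are hypothetical, and the paper's joint shape-motion distance is not reproduced here.

```python
import numpy as np

def build_prototype_table(prototypes):
    """Precompute all prototype-to-prototype Euclidean distances (K x K)."""
    p = np.asarray(prototypes, dtype=float)
    diff = p[:, None, :] - p[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def sequence_distance_matrix(table, seq_a, seq_b):
    """Frame-to-frame distances for two prototype-index sequences, by table lookup."""
    a = np.asarray(seq_a, dtype=int)
    b = np.asarray(seq_b, dtype=int)
    return table[np.ix_(a, b)]

# Toy data (assumed): three 2-D prototypes and two short index sequences.
toy_table = build_prototype_table([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
toy_matrix = sequence_distance_matrix(toy_table, [0, 1], [2, 2, 0])
```

Once the table exists, the distance matrix for any pair of sequences costs only an indexing operation, with no per-frame distance computation, which is the source of the speed-up the abstract describes.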

Modeling the temporal extent of actions

by Scott Satkin, Martial Hebert, 2010
Cited by 4 (0 self)
In this paper, we present a framework for estimating what portions of videos are most discriminative for the task of action recognition. We explore the impact of the temporal cropping of training videos on the overall accuracy of an action recognition system, and we formalize what makes a set of croppings optimal. In addition, we present an algorithm to determine the best set of croppings for a dataset, and experimentally show that our approach increases the accuracy of various state-of-the-art action recognition techniques.

An Efficient Divide-and-Conquer Cascade for Nonlinear Object Detection

by Christoph H. Lampert
Cited by 3 (1 self)
We introduce a method to accelerate the evaluation of object detection cascades with the help of a divide-and-conquer procedure in the space of candidate regions. Compared to the exhaustive procedure that thus far is the state-of-the-art for cascade evaluation, the proposed method requires fewer evaluations of the classifier functions, thereby speeding up the search. Furthermore, we show how the recently developed efficient subwindow search (ESS) procedure [11] can be integrated into the last stage of our method. This allows us to use our method to act not only as a faster procedure for cascade evaluation, but also as a tool to perform efficient branch-and-bound object detection with nonlinear quality functions, in particular kernelized support vector machines. Experiments on the PASCAL VOC 2006 dataset show an acceleration of more than 50% by our method compared to standard cascade evaluation.

Human Focused Action Localization in Video

by Alexander Kläser, Marcin Marszałek, Cordelia Schmid, Andrew Zisserman, 2010
Cited by 1 (1 self)
We propose a novel human-centric approach to detect and localize human actions in challenging video data, such as Hollywood movies. Our goal is to localize actions in time through the video and spatially in each frame. We achieve this by first obtaining generic spatiotemporal human tracks and then detecting specific actions within these using a sliding window classifier. We make the following contributions: (i) We show that splitting the action localization task into spatial and temporal search leads to an efficient localization algorithm where generic human tracks can be reused to recognize multiple human actions; (ii) We develop a human detector and tracker which is able to cope with a wide range of postures, articulations, motions and camera viewpoints. The tracker includes detection interpolation and a principled classification stage to suppress false positive tracks; (iii) We propose a track-aligned 3D-HOG action representation, investigate its parameters, and show that action localization benefits from using tracks; and (iv) We introduce a new action localization dataset based on Hollywood movies. Results are presented on a number of real-world movies with crowded, dynamic environment, partial occlusion and cluttered background. On the Coffee&Cigarettes dataset we significantly improve over the state of the art. Furthermore, we obtain excellent results on the new Hollywood–Localization dataset.

Measuring and Reducing Observational Latency when Recognizing Actions

by Syed Z. Masood, Chris Ellis, Adarsh Nagaraja, Marshall F. Tappen, Joseph J. LaViola Jr., Rahul Sukthankar
Cited by 1 (0 self)
An important aspect in interactive, action-based interfaces is the latency in recognizing the action. High latency will cause the system’s feedback to lag behind user actions, reducing the overall quality of the user experience. This paper presents a novel dataset and algorithms for reducing the latency in recognizing the action. Latency in classification is minimized with a classifier based on logistic regression that uses canonical poses to identify the action. The classifier is trained from the dataset using a learning formulation that makes it possible to train the classifier to reduce latency. The classifier is compared against both a Bag of Words and a Conditional Random Field classifier and is found to be superior in both pre-segmented and on-line classification tasks.

Speeding up Spatio-Temporal Sliding-Window Search for Efficient Event Detection in Crowded Videos

by Junsong Yuan, Zicheng Liu, Ying Wu, Zhengyou Zhang, 2009
Despite previous successes of sliding window-based object detection in images such as [6], searching for desired events in the volumetric video space is still a challenging problem, partially because the pattern search in spatio-temporal video space is much more complicated than that in spatial image space. Without knowing the location, temporal duration, and the spatial scale of the event, the search space for video events is prohibitively large for exhaustive search. To reduce the search complexity, we propose a heuristic branch-and-bound solution for event detection in videos. Unlike existing branch-and-bound methods, which search for an optimal subvolume before comparing its detection score against the threshold, we aim at directly finding subvolumes whose scores are higher than the threshold. In doing so, many unnecessary branches are terminated much earlier, thus the search speed can be much faster. To validate this approach, we select three human action classes from the KTH dataset for training while testing with our own action dataset which has clutter and moving backgrounds as well as large variations in lighting, scale, and performing speed of actions. The experimental results show that our technique dramatically reduces computational cost without significantly degrading the quality of the detection results.
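To make the threshold-driven pruning idea concrete, here is a simplified one-dimensional analogue: it searches temporal intervals over a list of per-frame scores instead of spatio-temporal subvolumes. This is a sketch under stated assumptions, not the authors' algorithm. The upper bound (positive scores from the largest interval in a candidate set plus negative scores from the smallest) is the standard subwindow-search style bound, and the function name, depth-first order, and toy scores are choices made for this example.

```python
import numpy as np

def find_interval_above(scores, threshold):
    """Return inclusive (start, end) whose score sum exceeds threshold, or None."""
    x = np.asarray(scores, dtype=float)
    n = len(x)
    if n == 0:
        return None
    # Prefix sums of the positive and negative parts give O(1) bound evaluation.
    pos = np.concatenate(([0.0], np.cumsum(np.maximum(x, 0.0))))
    neg = np.concatenate(([0.0], np.cumsum(np.minimum(x, 0.0))))

    def upper_bound(s_lo, s_hi, e_lo, e_hi):
        # Positives from the largest interval, negatives from the smallest one.
        bound = pos[e_hi + 1] - pos[s_lo]
        if s_hi <= e_lo:
            bound += neg[e_lo + 1] - neg[s_hi]
        return bound

    root = (0, n - 1, 0, n - 1)  # ranges for the start and end indices
    if upper_bound(*root) <= threshold:
        return None
    stack = [root]
    while stack:
        s_lo, s_hi, e_lo, e_hi = stack.pop()
        if s_lo == s_hi and e_lo == e_hi:
            return (s_lo, e_lo)  # the bound is exact for a single interval
        # Split the wider of the two index ranges in half.
        if s_hi - s_lo >= e_hi - e_lo:
            mid = (s_lo + s_hi) // 2
            children = [(s_lo, mid, e_lo, e_hi), (mid + 1, s_hi, e_lo, e_hi)]
        else:
            mid = (e_lo + e_hi) // 2
            children = [(s_lo, s_hi, e_lo, mid), (s_lo, s_hi, mid + 1, e_hi)]
        kept = []
        for child in children:
            if child[0] > child[3]:
                continue  # contains no interval with start <= end
            bound = upper_bound(*child)
            if bound > threshold:  # drop branches that cannot beat the threshold
                kept.append((bound, child))
        kept.sort()  # ascending, so the most promising child is popped first
        stack.extend(child for _, child in kept)
    return None

toy_scores = [-1.0, 2.0, 3.0, -4.0, 1.0]  # assumed per-frame detection scores
hit = find_interval_above(toy_scores, 4.0)
miss = find_interval_above(toy_scores, 5.0)
```

The search stops at the first interval that clears the threshold rather than continuing until the best interval is found, and any branch whose bound cannot exceed the threshold is discarded immediately, which is the kind of early termination the abstract refers to.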

Multimodal Partial Estimates Fusion

by Jiang Xu, Junsong Yuan, Ying Wu
Fusing partial estimates is a critical and common problem in many computer vision tasks such as part-based detection and tracking. It generally becomes complicated and intractable when there are a large number of multimodal partial estimates, and thus it is desirable to find an effective and scalable fusion method to integrate these partial estimates. This paper presents a novel and effective approach to fusing multimodal partial estimates in a principled way. In this new approach, fusion is related to a computational geometry problem of finding the minimum-volume orthotope, and an effective and scalable branch and bound search algorithm is designed to obtain the global optimal solution. Experiments on tracking articulated objects and occluded objects show the effectiveness of the proposed approach.

Submitted by:

by Pyry Matikainen, Yaser Sheikh, Ivan Laptev, 2011
Abstract not found

Feature Seeding for Action Recognition

by Pyry Matikainen, Rahul Sukthankar, Martial Hebert
Progress in action recognition has been in large part due to advances in the features that drive learning-based methods. However, the relative sparsity of training data and the risk of overfitting have made it difficult to directly search for good features. In this paper we suggest using synthetic data to search for robust features that can more easily take advantage of limited data, rather than using the synthetic data directly as a substitute for real data. We demonstrate that the features discovered by our selection method, which we call seeding, improve performance on an action classification task on real data, even though the synthetic data from which the features are seeded differs significantly from the real data, both in terms of appearance and the set of action classes.