CiteSeerX

Actionness ranking with lattice conditional ordinal random fields

by W Chen, C Xiong, R Xu, J Corso
Venue: CVPR
Results 1 - 8 of 8

Binarized normed gradients for objectness estimation at 300fps

by Ming-Ming Cheng, Ziming Zhang, Wen-Yan Lin, Philip Torr - in IEEE CVPR, 2014
"... Training a generic objectness measure to produce a small set of candidate object windows, has been shown to speed up the classical sliding window object detection paradigm. We observe that generic objects with well-defined closed boundary can be discriminated by looking at the norm of gradients, wit ..."
Abstract - Cited by 25 (6 self)
Training a generic objectness measure to produce a small set of candidate object windows has been shown to speed up the classical sliding window object detection paradigm. We observe that generic objects with well-defined closed boundaries can be discriminated by looking at the norm of gradients, with a suitable resizing of their corresponding image windows into a small fixed size. Based on this observation and for computational reasons, we propose to resize the window to 8 × 8 and use the norm of the gradients as a simple 64D feature to describe it, for explicitly training a generic objectness measure. We further show how the binarized version of this feature, namely binarized normed gradients (BING), can be used for efficient objectness estimation, which requires only a few atomic operations (e.g. ADD, BITWISE SHIFT, etc.). Experiments on the challenging PASCAL VOC 2007 dataset show that our method efficiently (300fps on a single laptop CPU) generates a small set of category-independent, high-quality object windows, yielding a 96.2% object detection rate (DR) with 1,000 proposals. Increasing the number of proposals and color spaces for computing BING features, our performance can be further improved to 99.5% DR.

Citation Context

...ng very simple BING features. It would be interesting to introduce other additional cues to further reduce the number of proposals while maintaining a high detection rate, and explore more applications [9] using BING. To encourage future work, we make the source code, links to related methods, FAQs, and live discussions available on the project page: http://mmcheng.net/bing/. Acknowledgements: We acknowle...
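
The NG feature in the abstract above is concrete enough to sketch: resize the candidate window to 8 × 8 and flatten the clipped L1 gradient norm into a 64D descriptor. A minimal illustration assuming OpenCV and NumPy; the kernel choice and clipping value follow the abstract's description rather than the authors' released code, and the binarized BING variant is omitted.

import cv2
import numpy as np

def ng_feature(window_bgr):
    # Grayscale the candidate window and shrink it to the fixed 8x8 size.
    gray = cv2.cvtColor(window_bgr, cv2.COLOR_BGR2GRAY)
    small = cv2.resize(gray, (8, 8)).astype(np.float32)
    # 1-D Sobel responses approximate the horizontal/vertical gradients.
    gx = cv2.Sobel(small, cv2.CV_32F, 1, 0, ksize=1)
    gy = cv2.Sobel(small, cv2.CV_32F, 0, 1, ksize=1)
    # Clipped L1 gradient norm, flattened into the 64-D descriptor that
    # a linear objectness scorer would consume.
    return np.minimum(np.abs(gx) + np.abs(gy), 255.0).flatten()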

Fast action proposals for human action detection and search

by Gang Yu, Junsong Yuan - in CVPR, 2015
"... In this paper we target at generating generic action pro-posals in unconstrained videos. Each action proposal cor-responds to a temporal series of spatial bounding boxes, i.e., a spatio-temporal video tube, which has a good poten-tial to locate one human action. Assuming each action is performed by ..."
Abstract - Cited by 2 (0 self)
In this paper we target generating generic action proposals in unconstrained videos. Each action proposal corresponds to a temporal series of spatial bounding boxes, i.e., a spatio-temporal video tube, which has good potential to locate one human action. Assuming each action is performed by a human with meaningful motion, both appearance and motion cues are utilized to measure the actionness of the video tubes. After picking those spatiotemporal paths with high actionness scores, our action proposal generation is formulated as a maximum set coverage problem, where greedy search is performed to select a set of action proposals that maximizes the overall actionness score. Compared with existing action proposal approaches, our action proposals do not rely on video segmentation and can be generated in nearly real time. Experimental results on two challenging datasets, MSR-II and UCF101, validate the superior performance of our action proposals as well as competitive results on action detection and search.

Citation Context

... relatively accurate video segmentation [37, 38], which is itself a challenging problem. Moreover, it is difficult to efficiently and accurately segment the human action from cluttered video sequences. In [34], "actionness" is measured based on lattice conditional ordinal random fields. However, it does not address the action localization problem. In this paper, we propose to formulate the action proposal ...
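
The maximum set coverage formulation in the abstract above admits the textbook greedy algorithm, which repeatedly picks the candidate adding the most uncovered actionness and carries the usual (1 - 1/e) approximation guarantee. A sketch under assumed data structures (each candidate tube as a set of covered video elements, plus a per-element actionness score); it illustrates the general technique, not the authors' implementation.

def greedy_max_coverage(candidates, actionness, k):
    # candidates: {tube_id: set of covered elements}; actionness: {element: score}.
    remaining = dict(candidates)
    covered, selected = set(), []
    for _ in range(k):
        # Marginal gain = total actionness of elements not yet covered.
        best, best_gain = None, 0.0
        for cid, elems in remaining.items():
            gain = sum(actionness[e] for e in elems - covered)
            if gain > best_gain:
                best, best_gain = cid, gain
        if best is None:  # no candidate adds anything new
            break
        selected.append(best)
        covered |= remaining.pop(best)
    return selected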

Can Humans Fly? Action Understanding with Multiple Classes of Actors

by Chenliang Xu, Shao-hang Hsieh, Caiming Xiong, Jason J. Corso
"... Can humans fly? Emphatically no. Can cars eat? Again, absolutely not. Yet, these absurd inferences result from the current disregard for particular types of actors in action understanding. There is no work we know of on simulta-neously inferring actors and actions in the video, not to mention a data ..."
Abstract - Cited by 1 (0 self)
Can humans fly? Emphatically no. Can cars eat? Again, absolutely not. Yet, these absurd inferences result from the current disregard for particular types of actors in action understanding. There is no work we know of on simultaneously inferring actors and actions in video, not to mention a dataset to experiment with. Our paper hence marks the first effort in the computer vision community to jointly consider various types of actors undergoing various actions. To start on the problem, we collect a dataset of 3782 videos from YouTube and label both pixel-level actors and actions in each video. We formulate the general actor-action understanding problem and instantiate it at various granularities: both video-level single- and multiple-label actor-action recognition and pixel-level actor-action semantic segmentation. Our experiments demonstrate that inference jointly over actors and actions outperforms inference independently over them, and hence concludes our argument for the value of explicitly considering various actors in comprehensive action understanding.

Citation Context

...on is limited. The community has indeed begun to move beyond this simplified problem into action detection [56, 65], action localization [22, 39], action segmentation [23, 24], and actionness ranking [8]. But all of these works do so strictly in the context of human actors. In this paper, we overcome both of these narrow viewpoints and introduce a new level of generality to the action understanding ...

Joint action recognition and pose estimation from video

by Bruce Xiaohan Nie, Caiming Xiong, Song-Chun Zhu - in Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2015
"... Action recognition and pose estimation from video are closely related tasks for understanding human motion, most methods, however, learn separate models and combine them sequentially. In this paper, we propose a framework to in-tegrate training and testing of the two tasks. A spatial-temporal And-Or ..."
Abstract - Cited by 1 (0 self)
Action recognition and pose estimation from video are closely related tasks for understanding human motion; most methods, however, learn separate models and combine them sequentially. In this paper, we propose a framework to integrate training and testing of the two tasks. A spatial-temporal And-Or graph model is introduced to represent action at three scales. Specifically, the action is decomposed into poses, which are further divided into mid-level ST-parts and then parts. The hierarchical structure of our model captures the geometric and appearance variations of pose at each frame, and lateral connections between ST-parts at adjacent frames capture the action-specific motion information. The model parameters for the three scales are learned discriminatively, and action labels and poses are efficiently inferred by dynamic programming. Experiments demonstrate that our approach achieves state-of-the-art accuracy in action recognition while also improving pose estimation.

Citation Context

...lausible poses in space and time [7]. Many methods for action recognition bypass body poses and achieve promising results by using coarse/mid-level features for action classification on some datasets [6, 10, 26, 12, 18, 2, 33, 30]. In this paper, we will jointly train coarse/mid-level features with pose estimation so that these features are better aligned with body parts and improve the results. The prevailing methods for pose...
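
The efficient inference by dynamic programming mentioned in the abstract reduces, in its simplest chain-structured form, to Viterbi decoding over per-frame pose states. The sketch below shows only that generic reduction; the paper's And-Or graph has richer structure, and both `unary` (a T × K array of per-frame state scores) and `pairwise` (a K × K transition-score matrix) are hypothetical inputs.

import numpy as np

def viterbi_poses(unary, pairwise):
    T, K = unary.shape
    score = unary[0].copy()                 # best score ending in each state
    back = np.zeros((T, K), dtype=int)      # best predecessor per state
    for t in range(1, T):
        trans = score[:, None] + pairwise   # trans[i, j]: come from i, enter j
        back[t] = trans.argmax(axis=0)
        score = trans.max(axis=0) + unary[t]
    # Backtrack from the best final state to recover the pose-state sequence.
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]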

Human Action Segmentation with Hierarchical Supervoxel Consistency

by Jiasen Lu, Ran Xu, Jason J. Corso
"... Detailed analysis of human action, such as action classi-fication, detection and localization has received increasing attention from the community; datasets like JHMDB have made it plausible to conduct studies analyzing the impact that such deeper information has on the greater action un-derstanding ..."
Abstract
Detailed analysis of human action, such as action classification, detection and localization, has received increasing attention from the community; datasets like JHMDB have made it plausible to conduct studies analyzing the impact that such deeper information has on the greater action understanding problem. However, detailed automatic segmentation of human action has been comparatively unexplored. In this paper, we take a step in that direction and propose a hierarchical MRF model to bridge low-level video fragments with high-level human motion and appearance; novel higher-order potentials connect different levels of the supervoxel hierarchy to enforce the consistency of the human segmentation by pulling from different segment scales. Our single-layer model significantly outperforms the current state of the art on actionness, and our full model improves upon the single-layer baselines in action segmentation.

Citation Context

...n saliency representation that is able to account for camera motion and balance human motion and human appearance cues automatically. A similar concept called "actionness" was proposed by Chen et al. [5]: it produces a rank ordering of video regions according to the degree to which they contain an action, but the regions to be ranked are small 3D cuboid-volumes (see Fig. 2f) and the ground-truth is a...
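
The hierarchical MRF with higher-order potentials described in the abstract has, in rough outline, an energy of the following shape. This is a toy illustration under assumed inputs (binary supervoxel labels, pairwise neighbour edges, and groups of child supervoxels per coarse segment), not the paper's actual potentials.

def hierarchical_mrf_energy(labels, unary, edges, hier_groups, lam=1.0, mu=1.0):
    # Unary term: per-supervoxel cost of its current 0/1 (background/human) label.
    e = sum(unary[i][labels[i]] for i in range(len(labels)))
    # Pairwise Potts term: penalize disagreeing spatial neighbours.
    e += lam * sum(labels[i] != labels[j] for i, j in edges)
    # Higher-order consistency term: charge any coarse segment whose child
    # supervoxels disagree, pulling the labeling toward segment-scale coherence.
    for group in hier_groups:
        if len({labels[i] for i in group}) > 1:
            e += mu
    return e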

Action Detection by Implicit Intentional Motion Clustering

by Wei Chen, Jason J. Corso
"... Explicitly using human detection and pose estimation has found limited success in action recognition problems. This may be due to the complexity in the articulated mo-tion human exhibit. Yet, we know that action requires an actor and intention. This paper hence seeks to understand the spatiotemporal ..."
Abstract
Explicitly using human detection and pose estimation has found limited success in action recognition problems. This may be due to the complexity of the articulated motion humans exhibit. Yet, we know that action requires an actor and intention. This paper hence seeks to understand the spatiotemporal properties of intentional movement and how to capture such intentional movement without relying on challenging human detection and tracking. We conduct a quantitative analysis of intentional movement, and our findings motivate a new approach for implicit intentional movement extraction that is based on spatiotemporal trajectory clustering, leveraging the properties of intentional movement. The intentional movement clusters are then used as action proposals for detection. Our results on three action detection benchmarks indicate the relevance of focusing on intentional movement for action detection; our method significantly outperforms the state of the art on the challenging MSR-II multi-action video benchmark.
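
The abstract does not specify the clustering itself, so here is a generic stand-in: group trajectories by start point, end point, and mean displacement with DBSCAN, treating each cluster as a candidate action proposal. Both the feature choice and DBSCAN are assumptions for illustration, not the paper's criteria for intentional movement.

import numpy as np
from sklearn.cluster import DBSCAN

def cluster_trajectories(tracks, eps=20.0, min_samples=5):
    # Each track is a (T, 2) array of (x, y) points; summarize it by start
    # point, end point, and mean per-frame displacement so that coherently
    # moving tracks land near each other in feature space.
    feats = np.array([np.concatenate([t[0], t[-1], (t[-1] - t[0]) / len(t)])
                      for t in tracks])
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(feats)
    clusters = {}
    for i, lbl in enumerate(labels):
        if lbl != -1:  # -1 is DBSCAN's noise label
            clusters.setdefault(lbl, []).append(i)
    return clusters    # each cluster of track indices: one candidate proposal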

Action Localization in Videos through Context Walk

by Khurram Soomro, Haroon Idrees, Mubarak Shah
"... This paper presents an efficient approach for localizing actions by learning contextual relations, in the form of rel-ative locations between different video regions. We begin by over-segmenting the videos into supervoxels, which have the ability to preserve action boundaries and also reduce the com ..."
Abstract
This paper presents an efficient approach for localizing actions by learning contextual relations, in the form of relative locations between different video regions. We begin by over-segmenting the videos into supervoxels, which have the ability to preserve action boundaries and also reduce the complexity of the problem. Context relations are learned during training, capturing displacements from all the supervoxels in a video to those belonging to foreground actions. Then, given a testing video, we select a supervoxel randomly and use the context information acquired during training to estimate the probability of each supervoxel belonging to the foreground action. The walk proceeds to a new supervoxel and the process is repeated for a few steps. This "context walk" generates a conditional distribution of an action over all the supervoxels. A Conditional Random Field is then used to find action proposals in the video, whose confidences are obtained using SVMs. We validated the proposed approach on several datasets and show that context in the form of relative displacements between supervoxels can be extremely useful for action localization. This also results in significantly fewer evaluations of the classifier, in sharp contrast to alternate sliding-window approaches.

Citation Context

... as a new problem where the goal is to determine the location of an action in addition to its class. Action detection, which may refer to temporal detection [16] or spatiotemporal action localization [6, 23, 4, 13], is especially dif...
[Figure 1: (a) training videos for an action c; (b) context graphs G1(V1, E1) ... Gn(Vn, En); (c) supervoxel action specificity H.]
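
The "context walk" loop in the abstract above can be paraphrased as: start at a random supervoxel, let its learned displacement context vote on where the foreground action lies, move to the strongest candidate, and repeat. A sketch with a hypothetical `context_votes` interface standing in for the learned training-time context; normalization yields the conditional distribution the CRF then consumes.

import random
from collections import defaultdict

def context_walk(supervoxels, context_votes, steps=10):
    belief = defaultdict(float)
    current = random.choice(supervoxels)
    for _ in range(steps):
        # The current supervoxel's learned relative-displacement context
        # casts evidence toward supervoxels likely to be foreground.
        for sv, score in context_votes(current).items():
            belief[sv] += score
        # Walk to the strongest foreground candidate found so far.
        current = max(supervoxels, key=lambda sv: belief[sv])
    # Normalize into a conditional distribution over all supervoxels.
    total = sum(belief.values()) or 1.0
    return {sv: b / total for sv, b in belief.items()}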

Compositional Structure Learning for Action Understanding

by unknown authors, 2014
"... The focus of the action understanding literature has predominately been classification, how-ever, there are many applications demanding richer action understanding such as mobile robotics and video search, with solutions to classification, localization and detection. In this paper, we propose a comp ..."
Abstract
The focus of the action understanding literature has predominantly been classification; however, there are many applications demanding richer action understanding, such as mobile robotics and video search, with solutions to classification, localization and detection. In this paper, we propose a compositional model that leverages a new mid-level representation called compositional trajectories and a locally articulated spatiotemporal deformable parts model (LASTDPM) for full action understanding. Our method is advantageous in capturing the variable structure of dynamic human activity over a long range. First, the compositional trajectories capture long-ranging, frequently co-occurring groups of trajectories in space-time and represent them in discriminative hierarchies, where human motion is largely separated from camera motion; second, LASTDPM learns a structured model with multi-layer deformable parts to capture multiple levels of articulated motion. We implement our methods and demonstrate state-of-the-art performance on all three problems: action detection, localization, and recognition.

Citation Context

...ove limitations by clustering trajectories. But, in their model the location of the structures is fixed before learning, therefore limiting the generality of the approach. As discussed by Chen et al. [4], motion in a video can occur in various forms such as agent (human/animal) moving, camera panning or jittering, background object moving, among many others. We are particularly interested in human ac...
