Results 1 - 10 of 701
Machine recognition of human activities: A survey
2008
"... The past decade has witnessed a rapid proliferation of video cameras in all walks of life and has resulted in a tremendous explosion of video content. Several applications such as content-based video annotation and retrieval, highlight extraction and video summarization require recognition of the a ..."
Abstract
-
Cited by 218 (0 self)
- Add to MetaCart
(Show Context)
The past decade has witnessed a rapid proliferation of video cameras in all walks of life and has resulted in a tremendous explosion of video content. Several applications such as content-based video annotation and retrieval, highlight extraction and video summarization require recognition of the activities occurring in the video. The analysis of human activities in videos is an area with increasingly important consequences from security and surveillance to entertainment and personal archiving. Several challenges at various levels of processing—robustness against errors in low-level processing, view and rate-invariant representations at midlevel processing and semantic representation of human activities at higher-level processing—make this problem hard to solve. In this review paper, we present a comprehensive survey of efforts in the past couple of decades to address the problems of representation, recognition, and learning of human activities from video and related applications. We discuss the problem at two major levels of complexity: 1) “actions” and 2) “activities.” “Actions” are characterized by simple motion patterns typically executed by a single human. “Activities” are more complex and involve coordinated actions among a small number of humans. We will discuss several approaches and classify them according to their ability to handle varying degrees of complexity as interpreted above. We begin with a discussion of approaches to model the simplest of action classes, known as atomic or primitive actions, that do not require sophisticated dynamical modeling. Then, methods to model actions with more complex dynamics are discussed. The discussion then leads naturally to methods for higher-level representation of complex activities.
Visual Tracking Decomposition
- In CVPR, 2010
"... We propose a novel tracking algorithm that can work robustly in a challenging scenario such that several kinds of appearance and motion changes of an object occur at the same time. Our algorithm is based on a visual tracking decomposition scheme for the efficient design of observation and motion mod ..."
Abstract
-
Cited by 120 (5 self)
- Add to MetaCart
(Show Context)
We propose a novel tracking algorithm that works robustly in challenging scenarios in which several kinds of appearance and motion change of an object occur at the same time. Our algorithm is based on a visual tracking decomposition scheme for the efficient design of observation and motion models as well as trackers. In our scheme, the observation model is decomposed into multiple basic observation models that are constructed by sparse principal component analysis (SPCA) of a set of feature templates. Each basic observation model covers a specific appearance of the object. The motion model is likewise represented as a combination of multiple basic motion models, each of which covers a different type of motion. Multiple basic trackers are then designed by associating the basic observation models with the basic motion models, so that each tracker takes charge of a certain change in the object. All basic trackers are integrated into one compound tracker through an interactive Markov Chain Monte Carlo (IMCMC) framework in which the basic trackers run in parallel and communicate with one another. By exchanging information, each tracker further improves its own performance, which in turn improves the overall tracking performance. Experimental results show that our method tracks objects accurately and reliably in realistic videos where appearance and motion change drastically over time.
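As a rough illustration of the decomposition step described above, the sketch below builds basic observation models by running sparse PCA over a stack of vectorized feature templates, then scores a candidate patch against each sparse component. It is a minimal sketch using scikit-learn's SparsePCA; the template data and the score function are invented stand-ins, not the paper's actual likelihood.

```python
# Minimal sketch, not the paper's implementation: build "basic observation
# models" as sparse principal components of vectorized feature templates.
import numpy as np
from sklearn.decomposition import SparsePCA

rng = np.random.default_rng(0)
templates = rng.random((20, 256))        # 20 vectorized 16x16 feature templates (toy data)

# Each sparse component is nonzero on only a few features; VTD treats each
# one as a basic observation model covering one specific appearance.
spca = SparsePCA(n_components=4, alpha=1.0, random_state=0)
spca.fit(templates)
basic_models = spca.components_          # shape (4, 256)

def score(candidate, component):
    """Toy per-model likelihood: reconstruction error after projecting the
    candidate onto one sparse component (a stand-in for the paper's score)."""
    c = component / (np.linalg.norm(component) + 1e-12)
    residual = candidate - (candidate @ c) * c
    return float(np.exp(-np.linalg.norm(residual)))

candidate = rng.random(256)              # a candidate patch from the new frame
print([round(score(candidate, m), 4) for m in basic_models])
```

In the paper, per-model likelihoods of this kind would feed the basic trackers that the IMCMC framework then runs in parallel.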
Combining multiple depth cameras and projectors for interactions on, above and between surfaces
- In Proc. UIST
"... Figure 1: LightSpace prototype combines depth cameras and projectors to provide interactivity on and between surfaces in everyday environments. LightSpace interactions include through-body object transitions between existing interactive surfaces (a-b) and interactions with an object in hand (c-d). I ..."
Abstract
-
Cited by 79 (6 self)
- Add to MetaCart
(Show Context)
Figure 1 (caption): The LightSpace prototype combines depth cameras and projectors to provide interactivity on and between surfaces in everyday environments. LightSpace interactions include through-body object transitions between existing interactive surfaces (a-b) and interactions with an object in hand (c-d). Images (a) and (c) show real images of the user experience, while images (b) and (d) show the virtual 3D mesh representation used to reason about users and interactions in space.

Instrumented with multiple depth cameras and projectors, LightSpace is a small room installation designed to explore a variety of interactions and computational strategies related to interactive displays and the space that they inhabit. LightSpace cameras and projectors are calibrated to 3D real-world coordinates, allowing graphics to be projected correctly onto any surface visible by both camera and projector. Selective projection of the depth camera data enables emulation of interactive displays on un-instrumented surfaces (such as a standard table or office desk), and facilitates mid-air interactions between and around these displays. For example, after performing multi-touch interactions on a virtual object on the tabletop, the user may transfer the object to another display by simultaneously touching the object and the destination display. Or the user may “pick up” the object by sweeping it into their hand, see it sitting in their hand as they walk over to an interactive wall display, and “drop” the object onto the wall by touching it with their other hand. We detail the interactions and algorithms unique to LightSpace, discuss some initial observations of use, and suggest future directions.
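The calibration claim above reduces, per device, to ordinary pinhole geometry: once a projector's pose and intrinsics are known in world coordinates, any 3D surface point can be mapped to a projector pixel. The sketch below is a minimal illustration with made-up matrices, not LightSpace's actual calibration code.

```python
# Minimal pinhole-projection sketch with invented calibration values.
import numpy as np

K = np.array([[1400.0, 0.0, 640.0],      # projector intrinsics (fx, fy, cx, cy)
              [0.0, 1400.0, 360.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                            # projector rotation in the world frame
t = np.array([0.0, 0.0, 2.5])            # projector translation (2.5 m offset)

def world_to_projector_pixel(p_world):
    """Project a 3D world point into the projector image (pinhole model)."""
    p_cam = R @ p_world + t               # world -> projector camera frame
    u, v, w = K @ p_cam                   # perspective projection
    return np.array([u / w, v / w])

# A point on a tabletop 1 m in front of the projector:
print(world_to_projector_pixel(np.array([0.1, -0.2, -1.5])))
```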
Online Object Tracking: A Benchmark
"... Object tracking is one of the most important components in numerous applications of computer vision. While much progress has been made in recent years with efforts on sharing code and datasets, it is of great importance to develop a library and benchmark to gauge the state of the art. After briefly ..."
Abstract
-
Cited by 77 (5 self)
- Add to MetaCart
(Show Context)
Object tracking is one of the most important components in numerous applications of computer vision. While much progress has been made in recent years through efforts on sharing code and datasets, it is of great importance to develop a library and benchmark to gauge the state of the art. After briefly reviewing recent advances in online object tracking, we carry out large-scale experiments with various evaluation criteria to understand how these algorithms perform. The test image sequences are annotated with different attributes for performance evaluation and analysis. By analyzing the quantitative results, we identify effective approaches for robust tracking and suggest potential future research directions in this field.
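Two evaluation criteria commonly associated with this benchmark are precision (fraction of frames whose center location error falls under a pixel threshold) and success (fraction of frames whose bounding-box overlap exceeds an IoU threshold). The sketch below computes both on synthetic boxes; the thresholds shown (20 px, 0.5 IoU) are conventional values, and the data is invented.

```python
# Sketch of per-frame precision and success scoring on synthetic boxes.
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x, y, w, h)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2 = min(a[0] + a[2], b[0] + b[2])
    y2 = min(a[1] + a[3], b[1] + b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    return inter / (a[2] * a[3] + b[2] * b[3] - inter)

def center_error(a, b):
    """Euclidean distance between box centers, in pixels."""
    ca = (a[0] + a[2] / 2.0, a[1] + a[3] / 2.0)
    cb = (b[0] + b[2] / 2.0, b[1] + b[3] / 2.0)
    return np.hypot(ca[0] - cb[0], ca[1] - cb[1])

pred = [(10, 10, 40, 40), (50, 48, 42, 40)]   # tracker output per frame (toy)
gt   = [(12, 11, 40, 40), (60, 60, 40, 40)]   # ground truth per frame (toy)

errors = np.array([center_error(p, g) for p, g in zip(pred, gt)])
overlaps = np.array([iou(p, g) for p, g in zip(pred, gt)])
print("precision@20px:", np.mean(errors <= 20))
print("success@0.5IoU:", np.mean(overlaps >= 0.5))
```

Sweeping the threshold instead of fixing it yields the benchmark's precision and success plots.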
Robust Visual Tracking and Vehicle Classification via Sparse Representation
"... In this paper, we propose a robust visual tracking method by casting tracking as a sparse approximation problem in a particle filter framework. In this framework, occlusion, noise and other challenging issues are addressed seamlessly through a set of trivial templates. Specifically, to find the trac ..."
Abstract
-
Cited by 71 (6 self)
- Add to MetaCart
In this paper, we propose a robust visual tracking method that casts tracking as a sparse approximation problem in a particle filter framework. In this framework, occlusion, noise, and other challenging issues are addressed seamlessly through a set of trivial templates. Specifically, to find the tracking target in a new frame, each target candidate is sparsely represented in the space spanned by target templates and trivial templates. The sparsity is achieved by solving an ℓ1-regularized least squares problem, and the candidate with the smallest projection error is taken as the tracking target. Tracking is then continued within a Bayesian state inference framework. Two strategies further improve tracking performance. First, target templates are dynamically updated to capture appearance changes. Second, nonnegativity constraints are enforced to filter out clutter that negatively resembles tracking targets. We test the proposed approach on numerous sequences involving different types of challenges, including occlusion and variations in illumination, scale, and pose. The proposed approach demonstrates excellent performance in comparison with previously proposed trackers. We also extend the method to simultaneous tracking and recognition by introducing a static template set that stores target images from different classes; the recognition result at each frame is propagated to produce the final result for the whole video. This extension is validated on a vehicle tracking and classification task using outdoor infrared video sequences.
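The core step the abstract describes, sparse coding of a candidate over target plus trivial templates via ℓ1-regularized least squares, can be sketched with an off-the-shelf Lasso solver. Everything below (dimensions, the alpha value, the synthetic templates) is illustrative; the paper's actual solver and template-update scheme are not reproduced.

```python
# Rough sketch: represent a candidate y sparsely in the dictionary
# [target templates | trivial templates], then score it by reconstruction
# error using the target-template coefficients alone.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
d, n = 64, 8                                  # patch dimension, #target templates
T = rng.random((d, n))                        # target templates (columns)
trivial = np.hstack([np.eye(d), -np.eye(d)])  # trivial templates absorb occlusion
D = np.hstack([T, trivial])                   # full dictionary

y = T[:, 0] + 0.05 * rng.standard_normal(d)   # candidate close to template 0

# l1-regularized least squares with nonnegative coefficients, as the
# abstract's nonnegativity strategy suggests.
lasso = Lasso(alpha=0.01, positive=True, fit_intercept=False, max_iter=10000)
lasso.fit(D, y)
c = lasso.coef_

target_part = T @ c[:n]                       # reconstruction from target templates
print("projection error:", np.linalg.norm(y - target_part))
```

Run over all particle-filter candidates, the one with the smallest projection error would be selected as the tracked target.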
A Survey of Vision-Based Trajectory Learning and Analysis for Surveillance
"... Abstract—This paper presents a survey of trajectory-based activity analysis for visual surveillance. It describes techniques that use trajectory data to define a general set of activities that are applicable to a wide range of scenes and environments. Events of interest are detected by building a ge ..."
Abstract
-
Cited by 57 (11 self)
- Add to MetaCart
(Show Context)
This paper presents a survey of trajectory-based activity analysis for visual surveillance. It describes techniques that use trajectory data to define a general set of activities that are applicable to a wide range of scenes and environments. Events of interest are detected by building a generic topographical scene description from underlying motion structure as observed over time. The scene topology is automatically learned and is distinguished by points of interest and motion characterized by activity paths. The methods we review are intended for real-time surveillance through definition of a diverse set of events for further analysis triggering, including virtual fencing, speed profiling, behavior classification, anomaly detection, and object interaction.

Index Terms: Event detection, motion analysis, situational awareness, statistical learning.

Figure 1 (caption): Relationship between analysis levels and required knowledge: high-level activity analysis requires large amounts of domain knowledge while low-level analysis assumes very little.
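As a toy illustration of two of the surveyed event types, the sketch below runs speed profiling and virtual fencing over a single hand-made trajectory; the fence location, frame rate, and speed threshold are all invented.

```python
# Toy speed-profiling and virtual-fencing events on one trajectory.
import numpy as np

track = np.array([[0, 0], [1, 0], [3, 0], [6, 1], [7, 1]], dtype=float)  # (x, y) per frame
fps = 10.0

# Speed profiling: flag frames whose instantaneous speed exceeds a threshold.
speeds = np.linalg.norm(np.diff(track, axis=0), axis=1) * fps
print("speeding at frame(s):", np.where(speeds > 25.0)[0])

# Virtual fencing: detect crossings of a vertical fence at x = 5.
inside = (track[:, 0] > 5.0).astype(int)
print("fence crossed after frame(s):", np.where(np.diff(inside) != 0)[0])
```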
Context-aware visual tracking
- IEEE Trans. on Pattern Analysis and Machine Intelligence, 2009
"... Enormous uncertainties in unconstrained environments lead to a fundamental dilemma that many tracking algorithms have to face in practice: Tracking has to be computationally efficient, but verifying whether or not the tracker is following the true target tends to be demanding, especially when the ba ..."
Abstract
-
Cited by 55 (8 self)
- Add to MetaCart
(Show Context)
Enormous uncertainties in unconstrained environments lead to a fundamental dilemma that many tracking algorithms have to face in practice: Tracking has to be computationally efficient, but verifying whether or not the tracker is following the true target tends to be demanding, especially when the background is cluttered and/or when occlusion occurs. Due to the lack of a good solution to this problem, many existing methods tend to be either effective but computationally intensive, by using sophisticated image observation models, or efficient but vulnerable to false alarms. This greatly challenges long-duration robust tracking. This paper presents a novel solution to this dilemma by considering the context of the tracking scene. Specifically, we integrate into the tracking process a set of auxiliary objects that are automatically discovered in the video on the fly by data mining. Auxiliary objects have three properties, at least over a short time interval: 1) persistent co-occurrence with the target, 2) consistent motion correlation to the target, and 3) being easy to track. Regarding these auxiliary objects as the context of the target, the collaborative tracking of these auxiliary objects leads to efficient computation as well as strong verification. Our extensive experiments have exhibited exciting performance in very challenging real-world testing cases.
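Property 2 above (consistent motion correlation) can be made concrete with a simple check: correlate the frame-to-frame displacements of the target with those of a candidate auxiliary object. The sketch below uses synthetic paths and plain Pearson correlation, which is only a stand-in for the paper's data-mining procedure.

```python
# Toy motion-correlation test between a target track and two candidates.
import numpy as np

rng = np.random.default_rng(2)
target = np.cumsum(rng.standard_normal((50, 2)), axis=0)      # target path
aux = target + 3.0 + 0.1 * rng.standard_normal((50, 2))       # co-moving object
distractor = np.cumsum(rng.standard_normal((50, 2)), axis=0)  # unrelated path

def motion_correlation(a, b):
    """Mean per-axis Pearson correlation of displacement sequences."""
    da, db = np.diff(a, axis=0), np.diff(b, axis=0)
    return np.mean([np.corrcoef(da[:, i], db[:, i])[0, 1] for i in range(2)])

print("co-moving candidate:", round(motion_correlation(target, aux), 2))       # near 1
print("distractor candidate:", round(motion_correlation(target, distractor), 2))  # near 0
```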
Video Browsing by Direct Manipulation
"... We present a method for browsing videos by directly dragging their content. This method brings the benefits of direct manipulation to an activity typically mediated by widgets. We support this new type of interactivity by: 1) automatically extracting motion data from videos; and 2) a new technique c ..."
Abstract
-
Cited by 54 (7 self)
- Add to MetaCart
(Show Context)
We present a method for browsing videos by directly dragging their content. This method brings the benefits of direct manipulation to an activity typically mediated by widgets. We support this new type of interactivity by: 1) automatically extracting motion data from videos; and 2) a new technique called relative flow dragging that lets users control video playback by moving objects of interest along their visual trajectory. We show that this method can outperform the traditional seeker bar in video browsing tasks that focus on visual content rather than time.

ACM Classification: H5.2 [Information interfaces and presentation]: User Interfaces - Graphical user interfaces.
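The scrubbing step behind relative flow dragging can be approximated as a nearest-point query: snap the drag position to the closest point on the object's per-frame trajectory and seek to that frame. The sketch below uses a synthetic trajectory; the paper's technique additionally handles curve geometry and ambiguity, which this toy ignores.

```python
# Toy "drag-to-seek" lookup over an object's extracted trajectory.
import numpy as np

# Pretend motion extraction produced one (x, y) position per frame.
frames = np.arange(100)
trajectory = np.stack([frames * 3.0, 50 + 20 * np.sin(frames / 10.0)], axis=1)

def frame_for_drag(drag_xy):
    """Return the frame whose trajectory point is closest to the cursor."""
    dists = np.linalg.norm(trajectory - np.asarray(drag_xy), axis=1)
    return int(np.argmin(dists))

print("dragging to (90, 60) seeks to frame", frame_for_drag((90.0, 60.0)))
```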
Tracking Multiple Occluding People by Localizing on Multiple Scene Planes
"... Abstract—Occlusion and lack of visibility in crowded and cluttered scenes make it difficult to track individual people correctly and consistently, particularly in a single view. We present a multiview approach to solve this problem. In our approach, we neither detect nor track objects from any singl ..."
Abstract
-
Cited by 54 (0 self)
- Add to MetaCart
(Show Context)
Occlusion and lack of visibility in crowded and cluttered scenes make it difficult to track individual people correctly and consistently, particularly in a single view. We present a multiview approach to solve this problem. In our approach, we neither detect nor track objects from any single camera or camera pair; rather, evidence is gathered from all of the cameras into a synergistic framework, and detection and tracking results are propagated back to each view. Unlike other multiview approaches that require fully calibrated views, our approach is purely image-based and uses only 2D constructs. To this end, we develop a planar homographic occupancy constraint that fuses foreground likelihood information from multiple views to resolve occlusions and localize people on a reference scene plane. For greater robustness, this process is extended to multiple planes parallel to the reference plane in the framework of plane-to-plane homologies. Our fusion methodology also models scene clutter using the Schmieder and Weathersby clutter measure, which acts as a confidence prior to assign higher fusion weight to views with less clutter. Detection and tracking are performed simultaneously by graph cuts segmentation of tracks in the space-time occupancy likelihood data. Experimental results with detailed qualitative and quantitative analysis are demonstrated on challenging multiview crowded scenes.

Index Terms: Tracking, sensor fusion, graph-theoretic methods.
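The homographic occupancy idea can be sketched as follows: warp each view's foreground-likelihood map onto the reference plane and fuse the warped maps multiplicatively, so occupancy survives only where the views agree on the plane. The homographies and likelihood maps below are toy values (OpenCV assumed), not real calibration data.

```python
# Toy multiplicative fusion of homography-warped foreground likelihoods.
import numpy as np
import cv2

h, w = 120, 160
rng = np.random.default_rng(3)

# Foreground likelihoods from two views, with a common blob near the center.
views = [rng.random((h, w), dtype=np.float32) * 0.2 for _ in range(2)]
for v in views:
    v[50:70, 70:90] = 0.95

# Invented plane-to-plane homographies (identity plus slight shift/shear).
Hs = [np.array([[1.0, 0.02, 3.0], [0.0, 1.0, -2.0], [0.0, 0.0, 1.0]]),
      np.array([[1.0, -0.01, -4.0], [0.01, 1.0, 1.0], [0.0, 0.0, 1.0]])]

fused = np.ones((h, w), np.float32)
for v, H in zip(views, Hs):
    fused *= cv2.warpPerspective(v, H, (w, h))   # likelihoods agree only on-plane

print("peak fused occupancy:", fused.max().round(3))
```

In the paper, this fusion is repeated over several parallel planes and the resulting space-time occupancy volume is segmented into tracks with graph cuts.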