Object Tracking: A Survey, 2006
Cited by 131 (3 self)
The goal of this article is to review the state-of-the-art tracking methods, classify them into different categories, and identify new trends. Object tracking, in general, is a challenging problem. Difficulties in tracking objects can arise due to abrupt object motion, changing appearance patterns of both the object and the scene, nonrigid object structures, object-to-object and object-to-scene occlusions, and camera motion. Tracking is usually performed in the context of higher-level applications that require the location and/or shape of the object in every frame. Typically, assumptions are made to constrain the tracking problem in the context of a particular application. In this survey, we categorize the tracking methods on the basis of the object and motion representations used, provide detailed descriptions of representative methods in each category, and examine their pros and cons. Moreover, we discuss the important issues related to tracking including the use of appropriate image features, selection of motion models, and detection of objects.
A survey on visual surveillance of object motion and behaviors - IEEE Transactions on Systems, Man and Cybernetics, 2004
Cited by 123 (2 self)
Visual surveillance in dynamic scenes, especially for humans and vehicles, is currently one of the most active research topics in computer vision. It has a wide spectrum of promising applications, including access control in special areas, human identification at a distance, crowd flux statistics and congestion analysis, detection of anomalous behaviors, and interactive surveillance using multiple cameras. In general, the processing framework of visual surveillance in dynamic scenes includes the following stages: modeling of environments, detection of motion, classification of moving objects, tracking, understanding and description of behaviors, human identification, and fusion of data from multiple cameras. We review recent developments and general strategies of all these stages. Finally, we analyze possible research directions, e.g., occlusion handling, a combination of two- and three-dimensional tracking, a combination of motion analysis and biometrics, anomaly detection and behavior prediction, content-based retrieval of surveillance videos, behavior understanding and natural language description, fusion of information from multiple sensors, and remote surveillance. Index Terms: Behavior understanding and description, fusion of data from multiple cameras, motion detection, personal identification, tracking, visual surveillance.
Counting people in crowds with a real-time network of simple image sensors - In Proc. IEEE International Conference on Computer Vision, 2003
Cited by 44 (1 self)
Estimating the number of people in a crowded environment is a central task in civilian surveillance. Most vision-based counting techniques depend on detecting individuals in order to count, an unrealistic proposition in crowded settings. We propose an alternative approach that directly estimates the number of people. In our system, groups of image sensors segment foreground objects from the background, aggregate the resulting silhouettes over a network, and compute a planar projection of the scene’s visual hull. We introduce a geometric algorithm that calculates bounds on the number of persons in each region of the projection, after phantom regions have been eliminated. The computational requirements scale well with the number of sensors and the number of people, and only limited amounts of data are transmitted over the network. Because of these properties, our system runs in real time and can be deployed as an untethered wireless sensor network. We describe the major components of our system, and report preliminary experiments with our first prototype implementation.
A system for learning statistical motion patterns - IEEE Transactions on Pattern Analysis and Machine Intelligence, 2006
Cited by 42 (0 self)
Multi-camera people tracking with a probabilistic occupancy map - IEEE Transactions on Pattern Analysis and Machine Intelligence, 2007
Cited by 38 (5 self)
Given three or four synchronized videos taken at eye level and from different angles, we show that we can effectively combine a generative model with dynamic programming to accurately follow up to six individuals across thousands of frames in spite of significant occlusions and lighting changes. In addition, we also derive metrically accurate trajectories for each one of them. Our contribution is twofold. First, we demonstrate that our generative model can effectively handle occlusions in each time frame independently, even when the only data available comes from the output of a simple background subtraction algorithm and when the number of individuals is unknown a priori. Second, we show that multi-person tracking can be reliably achieved by processing individual trajectories separately over long sequences, provided that a reasonable heuristic is used to rank these individuals and avoid confusing them with one another.
Figure 1: Images from two indoor and two outdoor multi-camera video sequences we use for our experiments. At each time step, we draw a box around people we detect and assign to them an Id number that follows them throughout the sequence.
Continuous tracking within and across camera streams - IEEE Int’l Conf. on Computer Vision and Pattern Recognition, 2003
Cited by 37 (7 self)
This paper presents a new approach for continuous tracking of moving objects observed by multiple, heterogeneous cameras. Our approach simultaneously processes video streams from stationary and Pan-Tilt-Zoom cameras. The detection of moving objects from moving camera streams is performed by defining an adaptive background model that takes into account the camera motion approximated by an affine transformation. We address the tracking problem by separately modeling motion and appearance of the moving objects using two probabilistic models. For the appearance model, multiple color distribution components are proposed for ensuring a more detailed description of the object being tracked. The motion model is obtained using a Kalman Filter (KF) process, which predicts the position of the moving object. The tracking is performed by the maximization of a joint probability model. The novelty of our approach consists in modeling the multiple trajectories observed by the moving and stationary cameras in the same KF framework. It allows deriving a more accurate motion measurement for objects simultaneously viewed by the two cameras and an automatic handling of occlusions, errors in the detection and camera handoff. We demonstrate the performance of the system on several video surveillance sequences.
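To illustrate the Kalman Filter motion model mentioned in this abstract, here is a minimal sketch of a one-dimensional constant-velocity filter that predicts an object's position and corrects it with a measurement. The class name `KalmanCV1D`, the noise parameters `q` and `r`, and the constant-velocity assumption are all illustrative choices, not details taken from the paper, which fuses trajectories from several cameras in a joint model.

```python
from dataclasses import dataclass


@dataclass
class KalmanCV1D:
    """Hypothetical 1-D constant-velocity Kalman filter (illustration only)."""
    p: float = 0.0                            # position estimate
    v: float = 0.0                            # velocity estimate
    P: tuple = ((1.0, 0.0), (0.0, 1.0))       # 2x2 state covariance
    q: float = 1e-3                           # assumed process noise (added to the diagonal)
    r: float = 1.0                            # assumed measurement noise variance

    def predict(self, dt: float = 1.0) -> float:
        """Propagate the state with x' = F x and P' = F P F^T + Q, F = [[1, dt], [0, 1]]."""
        (a, b), (c, d) = self.P
        self.p += dt * self.v
        self.P = (
            (a + dt * (b + c) + dt * dt * d + self.q, b + dt * d),
            (c + dt * d, d + self.q),
        )
        return self.p

    def update(self, z: float) -> None:
        """Correct the state with a position measurement z (H = [1, 0])."""
        (a, b), (c, d) = self.P
        s = a + self.r                        # innovation variance
        k0, k1 = a / s, c / s                 # Kalman gain
        y = z - self.p                        # innovation
        self.p += k0 * y
        self.v += k1 * y
        self.P = (((1 - k0) * a, (1 - k0) * b), (c - k1 * a, d - k1 * b))


if __name__ == "__main__":
    kf = KalmanCV1D()
    for t in range(1, 11):
        kf.predict(1.0)
        kf.update(2.0 * t)                    # object moving at 2 units per step
    print(kf.p, kf.v)
```

In a tracker, the value returned by `predict` would serve as the expected position against which new detections are matched.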
Multiple person and speaker activity tracking with a particle filter - In International Conference on Acoustics Speech and Signal Processing, 2004
Cited by 31 (0 self)
In this paper, we present a system that combines sound and vision to track multiple people. In a cluttered or noisy scene, multi-person tracking estimates have a distinctly non-Gaussian distribution. We apply a particle filter with audio and video state components, and derive observation likelihoods based on both audio and video measurements. Our state includes the number of people present, their positions, and whether each person is talking. We show experiments in an environment with sparse microphones and monocular cameras. Our results show that our system can accurately track the locations and speech activity of a varying number of people.
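As a rough illustration of the particle filtering idea named in this abstract, the sketch below runs one predict, weight, and resample cycle for a single target on a line. It is a deliberately reduced example: the function names `pf_step` and `estimate`, the Gaussian motion and measurement models, and the noise values are assumptions for illustration, and the paper's actual state (number of people, positions, speech activity) and audio-video likelihoods are not modeled here.

```python
import math
import random


def pf_step(particles, z, rng, motion_sigma=0.2, meas_sigma=0.5):
    """One bootstrap particle filter cycle for a 1-D position (illustrative)."""
    # Predict: diffuse each particle with an assumed random-walk motion model.
    moved = [x + rng.gauss(0.0, motion_sigma) for x in particles]
    # Weight: assumed Gaussian likelihood of the measurement z given each particle.
    weights = [math.exp(-0.5 * ((z - x) / meas_sigma) ** 2) for x in moved]
    if sum(weights) == 0.0:
        # All likelihoods underflowed; fall back to uniform weights.
        weights = [1.0] * len(moved)
    # Resample: draw a new particle set in proportion to the weights.
    return rng.choices(moved, weights=weights, k=len(moved))


def estimate(particles):
    """Point estimate: the mean of the particle set."""
    return sum(particles) / len(particles)


if __name__ == "__main__":
    rng = random.Random(0)
    particles = [rng.uniform(0.0, 10.0) for _ in range(500)]
    for _ in range(20):
        particles = pf_step(particles, 5.0, rng)
    print(estimate(particles))
```

Because the particle set is not forced into a single Gaussian, it can represent the multimodal distributions that arise in cluttered scenes, which is the property the abstract relies on.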
Visibility Analysis and Sensor Planning in Dynamic Environments - In European Conference on Computer Vision, 2004
Cited by 18 (3 self)
We analyze visibility from static sensors in a dynamic scene with moving obstacles (people). Such analysis is considered in a probabilistic sense in the context of multiple sensors, so that visibility from even one sensor might be sufficient. Additionally, we analyze worst-case scenarios for high-security areas where targets are non-cooperative. Such visibility analysis provides important performance characterization of multi-camera systems. Furthermore, maximization of visibility in a given region of interest yields the optimum number and placement of cameras in the scene. Our analysis has applications in surveillance - manual or automated - and can be utilized for sensor planning in places like museums, shopping malls, subway stations and parking lots. We present several example scenes - simulated and real - for which interesting camera configurations were obtained using the formal analysis developed in the paper.
Multi-camera tracking and segmentation of occluded people on ground plane using search-guided particle filtering - In ECCV, 2006
Cited by 18 (3 self)
A multi-view multi-hypothesis approach to segmenting and tracking multiple (possibly occluded) persons on a ground plane is proposed. During tracking, several iterations of segmentation are performed using information from human appearance models and ground plane homography. To more precisely locate the ground location of a person, all center vertical axes of the person across views are mapped to the top-view plane and their intersection point on the ground is estimated. To tackle the explosive state space due to multiple targets and views, iterative segmentation-searching is incorporated into a particle filtering framework. By searching for people’s ground point locations from segmentations, a set of a few good particles can be identified, resulting in low computational cost. In addition, even if all the particles are away from the true ground point, some of them move towards the true one through the iterated process as long as they are located nearby. We demonstrate the performance of the approach on several video sequences.
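The ground plane homography mentioned in this abstract maps image points to top-view coordinates through a 3x3 matrix applied in homogeneous coordinates. The sketch below shows only that point mapping; the function name `apply_homography` and the example matrix are hypothetical, and the paper's further step of intersecting mapped vertical axes from several views is not reproduced.

```python
def apply_homography(H, x, y):
    """Map an image point (x, y) to plane coordinates using a 3x3 homography H.

    H is a nested list [[h00, h01, h02], [h10, h11, h12], [h20, h21, h22]].
    The point is lifted to homogeneous coordinates (x, y, 1), multiplied by H,
    and divided by the resulting third component.
    """
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity under this homography")
    return u / w, v / w


if __name__ == "__main__":
    # Hypothetical matrix: scale by 2, then translate by (1, 3).
    H = [[2.0, 0.0, 1.0], [0.0, 2.0, 3.0], [0.0, 0.0, 1.0]]
    print(apply_homography(H, 1.0, 1.0))
```

In practice the matrix would be estimated from at least four point correspondences between the camera image and the ground plane.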
A probabilistic framework for multi-modal multi-person tracking - In Wkshp. on Multi-Object Tracking, 2003
Cited by 13 (0 self)
In this paper, we present a probabilistic tracking framework that combines sound and vision to achieve more robust and accurate tracking of multiple objects. In a cluttered or noisy scene, our measurements have a non-Gaussian, multimodal distribution. We apply a particle filter to track multiple people using combined audio and video observations. We have applied our algorithm to the domain of tracking people with a stereo-based visual foreground detection algorithm and audio localization using a beamforming technique. Our model also accurately reflects the number of people present. We test the efficacy of our system on a sequence of multiple people moving and speaking in an indoor environment.

