Results 1 - 10 of 110
You’ll never walk alone: modeling social behavior for multi-target tracking
- In Int. Conf. on Computer Vision (ICCV), 2009
"... Object tracking typically relies on a dynamic model to predict the object’s location from its past trajectory. In crowded scenarios a strong dynamic model is particularly important, because more accurate predictions allow for smaller search regions, which greatly simplifies data asso-ciation. Tradit ..."
Abstract - Cited by 120 (3 self)
Object tracking typically relies on a dynamic model to predict the object’s location from its past trajectory. In crowded scenarios a strong dynamic model is particularly important, because more accurate predictions allow for smaller search regions, which greatly simplifies data association. Traditional dynamic models predict the location for each target solely based on its own history, without taking into account the remaining scene objects. Collisions are resolved only when they happen. Such an approach ignores important aspects of human behavior: people are driven by their future destination, take into account their environment, anticipate collisions, and adjust their trajectories at an early stage in order to avoid them. In this work, we introduce a model of dynamic social behavior, inspired by models developed for crowd simulation. The model is trained with videos recorded from a bird’s-eye view at busy locations, and applied as a motion model for multi-person tracking from a vehicle-mounted camera. Experiments on real sequences show that accounting for social interactions and scene knowledge improves tracking performance, especially during occlusions.
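As a rough illustration of the kind of dynamic model this abstract describes, the sketch below predicts each pedestrian's next position from an attraction toward a goal and a repulsion from nearby people. It is a minimal social-force-style step in Python; the constants, the exponential repulsion term, and the name social_force_step are illustrative assumptions, not the model the authors learn from data.

import numpy as np

def social_force_step(pos, vel, goals, dt=0.4, tau=0.5, v_desired=1.3,
                      a_rep=2.0, b_rep=0.3):
    """One prediction step for N pedestrians (pos, vel, goals are (N, 2) arrays).

    Acceleration = relaxation toward the desired velocity plus
    exponential repulsion from every other pedestrian.
    """
    n = pos.shape[0]
    # Attraction toward each pedestrian's goal.
    to_goal = goals - pos
    dist = np.linalg.norm(to_goal, axis=1, keepdims=True) + 1e-9
    f_goal = (v_desired * to_goal / dist - vel) / tau

    # Pairwise repulsion between pedestrians.
    f_rep = np.zeros_like(pos)
    for i in range(n):
        diff = pos[i] - np.delete(pos, i, axis=0)
        d = np.linalg.norm(diff, axis=1, keepdims=True) + 1e-9
        f_rep[i] = np.sum(a_rep * np.exp(-d / b_rep) * diff / d, axis=0)

    acc = f_goal + f_rep
    new_vel = vel + acc * dt
    new_pos = pos + new_vel * dt
    return new_pos, new_vel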
Online Multi-Person Tracking-by-Detection from a Single, Uncalibrated Camera
- PAMI, 2010
"... In this paper, we address the problem of automatically detecting and tracking a variable number of persons in complex scenes using a monocular, potentially moving, uncalibrated camera. We propose a novel approach for multi-person tracking-by-detection in a particle filtering framework. In addition ..."
Abstract - Cited by 78 (0 self)
In this paper, we address the problem of automatically detecting and tracking a variable number of persons in complex scenes using a monocular, potentially moving, uncalibrated camera. We propose a novel approach for multi-person tracking-by-detection in a particle filtering framework. In addition to final high-confidence detections, our algorithm uses the continuous confidence of pedestrian detectors and online trained, instance-specific classifiers as a graded observation model. Thus, generic object category knowledge is complemented by instance-specific information. The main contribution of this paper is to explore how these unreliable information sources can be used for robust multi-person tracking. The algorithm detects and tracks a large number of dynamically moving persons in complex scenes with occlusions, does not rely on background modeling, requires no camera or ground plane calibration, and only makes use of information from the past. Hence, it imposes very few restrictions and is suitable for online applications. Our experiments show that the method yields good tracking performance in a large variety of highly dynamic scenarios, such as typical surveillance videos, webcam footage, or sports sequences. We demonstrate that our algorithm outperforms other methods that rely on additional information. Furthermore, we analyze the influence of different algorithm components on the robustness.
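A minimal sketch of one particle-filter update in the spirit of the graded observation model described above, assuming det_conf and inst_conf are placeholder callables that return detector confidence and instance-classifier scores in [0, 1]; the blending weight and the random-walk motion model are illustrative, not the paper's trained components.

import numpy as np

def update_particles(particles, weights, det_conf, inst_conf,
                     alpha=0.6, motion_std=5.0):
    """One tracking-by-detection particle filter step for (N, 2) particle positions."""
    # Propagate with a simple random-walk motion model.
    particles = particles + np.random.normal(0, motion_std, particles.shape)

    # Graded observation model: continuous detector confidence
    # blended with an instance-specific classifier score.
    scores = np.array([alpha * det_conf(p) + (1 - alpha) * inst_conf(p)
                       for p in particles])
    weights = weights * (scores + 1e-12)
    weights /= weights.sum()

    # Resample when the effective sample size collapses.
    if 1.0 / np.sum(weights ** 2) < 0.5 * len(weights):
        idx = np.random.choice(len(weights), len(weights), p=weights)
        particles = particles[idx]
        weights = np.full(len(weights), 1.0 / len(weights))
    return particles, weights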
Robust Visual Tracking and Vehicle Classification via Sparse Representation
"... In this paper, we propose a robust visual tracking method by casting tracking as a sparse approximation problem in a particle filter framework. In this framework, occlusion, noise and other challenging issues are addressed seamlessly through a set of trivial templates. Specifically, to find the trac ..."
Abstract - Cited by 71 (6 self)
In this paper, we propose a robust visual tracking method by casting tracking as a sparse approximation problem in a particle filter framework. In this framework, occlusion, noise and other challenging issues are addressed seamlessly through a set of trivial templates. Specifically, to find the tracking target in a new frame, each target candidate is sparsely represented in the space spanned by target templates and trivial templates. The sparsity is achieved by solving an ℓ1-regularized least squares problem. Then the candidate with the smallest projection error is taken as the tracking target. After that, tracking is continued using a Bayesian state inference framework. Two strategies are used to further improve the tracking performance. First, target templates are dynamically updated to capture appearance changes. Second, nonnegativity constraints are enforced to filter out clutter that negatively resembles tracking targets. We test the proposed approach on numerous sequences involving different types of challenges including occlusion and variations in illumination, scale, and pose. The proposed approach demonstrates excellent performance in comparison with previously proposed trackers. We also extend the method for simultaneous tracking and recognition by introducing a static template set, which stores target images from different classes. The recognition result at each frame is propagated to produce the final result for the whole video. The approach is validated on a vehicle tracking and classification task using outdoor infrared video sequences.
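The ℓ1 step described in this abstract can be sketched as follows: each candidate is coded over target templates plus positive and negative trivial (identity) templates under a nonnegativity constraint, and the candidate with the smallest reconstruction error on the target templates is selected. The small ISTA loop is a generic stand-in for the paper's ℓ1-regularized least-squares solver.

import numpy as np

def ista_nonneg_lasso(A, y, lam=0.01, iters=200):
    """Minimize 0.5*||A x - y||^2 + lam*||x||_1 subject to x >= 0 (simple ISTA)."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - y)
        x = np.maximum(x - grad / L - lam / L, 0.0)   # nonnegative soft-threshold
    return x

def best_candidate(candidates, targets):
    """candidates: (d, m) column vectors; targets: (d, k) target template columns."""
    d, k = targets.shape
    trivial = np.hstack([np.eye(d), -np.eye(d)])      # positive and negative trivial templates
    A = np.hstack([targets, trivial])
    errs = []
    for j in range(candidates.shape[1]):
        x = ista_nonneg_lasso(A, candidates[:, j])
        recon = targets @ x[:k]                       # reconstruction from target templates only
        errs.append(np.linalg.norm(candidates[:, j] - recon))
    return int(np.argmin(errs))                       # candidate with smallest projection error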
Multi-target tracking by on-line learned discriminative appearance models
- IEEE Conference on Computer Vision and Pattern Recognition, 2010
"... We present an approach for online learning of discriminative appearance models for robust multi-target tracking in a crowded scene from a single camera. Although much progress has been made in developing methods for optimal data association, there has been comparatively less work on the appearance m ..."
Abstract - Cited by 69 (5 self)
We present an approach for online learning of discriminative appearance models for robust multi-target tracking in a crowded scene from a single camera. Although much progress has been made in developing methods for optimal data association, there has been comparatively less work on the appearance models, which are key elements for good performance. Many previous methods either use simple features such as color histograms, or focus on the discriminability between a target and the background, which does not resolve ambiguities between different targets. We propose an algorithm for learning a discriminative appearance model for different targets. Training samples are collected online from tracklets within a sliding time window based on spatial-temporal constraints; this allows the models to adapt to target instances. Learning uses an AdaBoost algorithm that combines effective image descriptors and their corresponding similarity measurements. We term the learned models OLDAMs. Our evaluations indicate that OLDAMs have significantly higher discrimination between different targets than conventional holistic color histograms, and when integrated into a hierarchical association framework, they help improve tracking accuracy, particularly by reducing false alarms and identity switches.
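A sketch of the sample-collection idea described above: within a sliding window, detection pairs from the same tracklet become positive examples, pairs from tracklets that overlap in time become negatives, and a boosted classifier is trained on pair features. The absolute-difference pair feature and sklearn's AdaBoostClassifier are stand-ins for the paper's descriptors, similarity measurements, and boosting variant.

import numpy as np
from sklearn.ensemble import AdaBoostClassifier

def build_pairs(tracklets):
    """tracklets: list of dicts with 'frames' (list of ints) and 'feats' (list of 1-D arrays)."""
    X, y = [], []
    for i, ti in enumerate(tracklets):
        # Positives: detections within one tracklet belong to the same target.
        for a in range(len(ti['feats'])):
            for b in range(a + 1, len(ti['feats'])):
                X.append(np.abs(ti['feats'][a] - ti['feats'][b])); y.append(1)
        # Negatives: tracklets that co-exist in time cannot be the same target.
        for j, tj in enumerate(tracklets):
            if j != i and set(ti['frames']) & set(tj['frames']):
                X.append(np.abs(ti['feats'][0] - tj['feats'][0])); y.append(0)
    return np.array(X), np.array(y)

def train_appearance_model(tracklets):
    X, y = build_pairs(tracklets)
    model = AdaBoostClassifier(n_estimators=50)
    model.fit(X, y)
    return model   # model.predict_proba([pair_feature]) scores tracklet affinity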
Tracking-learning-detection
- IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012
"... Abstract—This paper investigates long-term tracking of unknown objects in a video stream. The object is defined by its location and extent in a single frame. In every frame that follows, the task is to determine the object’s location and extent or indicate that the object is not present. We propose ..."
Abstract - Cited by 52 (0 self)
This paper investigates long-term tracking of unknown objects in a video stream. The object is defined by its location and extent in a single frame. In every frame that follows, the task is to determine the object’s location and extent or indicate that the object is not present. We propose a novel tracking framework (TLD) that explicitly decomposes the long-term tracking task into tracking, learning and detection. The tracker follows the object from frame to frame. The detector localizes all appearances that have been observed so far and corrects the tracker if necessary. The learning component estimates the detector’s errors and updates it to avoid these errors in the future. We study how to identify the detector’s errors and learn from them. We develop a novel learning method (P-N learning) which estimates the errors by a pair of “experts”: (i) the P-expert estimates missed detections, and (ii) the N-expert estimates false alarms. The learning process is modeled as a discrete dynamical system and the conditions under which the learning guarantees improvement are found. We describe our real-time implementation of the TLD framework and the P-N learning. We carry out an extensive quantitative evaluation which shows a significant improvement over state-of-the-art approaches. Index Terms: long-term tracking, learning from video, bootstrapping, real-time, semi-supervised learning
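A schematic of the P-N learning loop described above, assuming a single validated tracker box per frame: the P-expert reinstates detections the detector missed along the tracked trajectory as positives, and the N-expert marks detections far from the validated track as negatives; both sets would then retrain the detector. The IoU threshold and the simplified expert rules are placeholders for the structural constraints used in the paper.

import numpy as np

def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def pn_learning_step(tracker_box, detections, pos_set, neg_set, thr=0.5):
    """One frame of a simplified P-N update.

    tracker_box : validated box followed by the tracker (or None if lost)
    detections  : boxes reported by the current detector
    pos_set, neg_set : growing training sets (lists of boxes) used to retrain the detector
    """
    if tracker_box is None:
        return pos_set, neg_set
    # P-expert: the detector should have fired on the tracked object.
    if all(iou(tracker_box, d) < thr for d in detections):
        pos_set.append(tracker_box)            # missed detection -> positive example
    # N-expert: detections far from the single tracked object are false alarms.
    for d in detections:
        if iou(tracker_box, d) < thr:
            neg_set.append(d)                  # false alarm -> negative example
    return pos_set, neg_set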
Robust tracking using local sparse appearance model and K-Selection
- In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011
"... Online learned tracking is widely used for it’s adap-tive ability to handle appearance changes. However, it in-troduces potential drifting problems due to the accumula-tion of errors during the self-updating, especially for the occluded scenarios. The recent literature demonstrates that appropriate ..."
Abstract - Cited by 49 (1 self)
Online learned tracking is widely used for its adaptive ability to handle appearance changes. However, it introduces potential drifting problems due to the accumulation of errors during self-updating, especially in occluded scenarios. The recent literature demonstrates that appropriate combinations of trackers can help balance stability and flexibility requirements. We have developed a robust tracking algorithm using a local sparse appearance model (SPT). A static sparse dictionary and a dynamically online updated basis distribution model the target appearance. A novel sparse representation-based voting map and a sparse constraint regularized mean-shift support the robust object tracking. Besides these contributions, we also introduce a new dictionary learning algorithm with a locally constrained sparse representation, called K-Selection. Based on a set of comprehensive experiments, our algorithm has demonstrated better performance than alternatives reported in the recent literature.
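A small sketch of scoring a candidate region with a static sparse dictionary, roughly in the spirit of the local sparse appearance model above: local patches are sparse-coded against the dictionary and the mean reconstruction error is turned into a confidence that could feed a voting map. Using sklearn's SparseCoder with OMP is an assumption for illustration; it is not the paper's solver or its K-Selection procedure.

import numpy as np
from sklearn.decomposition import SparseCoder

def patch_confidence(patches, dictionary, n_nonzero=5):
    """Score a candidate region by sparse-coding its local patches.

    patches    : (n_patches, patch_dim) vectorized local patches of the candidate
    dictionary : (n_atoms, patch_dim) static dictionary built from the first frame
                 (atoms assumed unit-normalized, as SparseCoder expects)
    Returns a scalar confidence that could drive a voting map / mean-shift step.
    """
    coder = SparseCoder(dictionary=dictionary,
                        transform_algorithm='omp',
                        transform_n_nonzero_coefs=n_nonzero)
    codes = coder.transform(patches)                  # (n_patches, n_atoms)
    recon = codes @ dictionary
    errors = np.linalg.norm(patches - recon, axis=1)
    return float(np.exp(-errors.mean()))              # higher = better match to the target model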
Multiobject tracking as maximum weight independent set
- In Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 2011
"... This paper addresses the problem of simultaneous tracking of multiple targets in a video. We first apply object detectors to every video frame. Pairs of detection responses from every two consecutive frames are then used to build a graph of tracklets. The graph helps transitively link the best match ..."
Abstract - Cited by 42 (1 self)
This paper addresses the problem of simultaneous tracking of multiple targets in a video. We first apply object detectors to every video frame. Pairs of detection responses from every two consecutive frames are then used to build a graph of tracklets. The graph helps transitively link the best matching tracklets that do not violate hard and soft contextual constraints between the resulting tracks. We prove that this data association problem can be formulated as finding the maximum-weight independent set (MWIS) of the graph. We present a new, polynomial-time MWIS algorithm, and prove that it converges to an optimum. Similarity and contextual constraints between object detections, used for data association, are learned online from object appearance and motion properties. Long-term occlusions are addressed by iteratively repeating MWIS to hierarchically merge smaller tracks into longer ones. Our results demonstrate advantages of simultaneously accounting for soft and hard contextual constraints in multitarget tracking. We outperform the state of the art on the benchmark datasets.
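The data-association problem described above is a maximum-weight independent set over candidate tracklet links. The paper contributes its own polynomial-time MWIS algorithm with a convergence proof; the greedy pass below only illustrates the problem structure (weighted nodes, conflict edges between links that share a detection) and is not that algorithm.

def greedy_mwis(weights, conflicts):
    """Approximate a maximum-weight independent set.

    weights   : dict node -> weight (e.g. affinity of a candidate tracklet link)
    conflicts : dict node -> set of nodes it conflicts with
                (two links conflict if they share a detection)
    """
    chosen, blocked = [], set()
    # Consider heaviest candidate links first.
    for node in sorted(weights, key=weights.get, reverse=True):
        if node not in blocked:
            chosen.append(node)
            blocked.add(node)
            blocked |= conflicts.get(node, set())
    return chosen

# Example: three candidate links; links 'a' and 'b' share a detection.
links = {'a': 0.9, 'b': 0.8, 'c': 0.4}
conf = {'a': {'b'}, 'b': {'a'}, 'c': set()}
print(greedy_mwis(links, conf))   # ['a', 'c']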
Improving Data Association by Joint Modeling of Pedestrian Trajectories and Groupings
"... Abstract. We consider the problem of data association in a multi-person tracking context. In semi-crowded environments, people are still discernible as individually moving entities, that undergo many interac-tions with other people in their direct surrounding. Finding the correct association is ther ..."
Abstract - Cited by 40 (1 self)
We consider the problem of data association in a multi-person tracking context. In semi-crowded environments, people are still discernible as individually moving entities that undergo many interactions with other people in their direct surroundings. Finding the correct association is therefore difficult, but higher-order social factors, such as group membership, are expected to ease the problem. However, estimating group membership is a chicken-and-egg problem: knowing pedestrian trajectories, it is rather easy to find possible groupings in the data, but in crowded scenes it is often difficult to estimate closely interacting trajectories without further knowledge about groups. To this end, we propose a third-order graphical model that is able to jointly estimate correct trajectories and group memberships over a short time window. A set of experiments on challenging data underlines the importance of joint reasoning for data association in crowded scenarios.
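As a very loose illustration of why joint reasoning can help, the sketch below scores a joint hypothesis of assignments and group labels by adding a motion-coherence bonus for hypothesized groups; the scoring terms are invented for illustration and are not the paper's third-order graphical model.

import numpy as np

def joint_score(assign_affinity, velocities, groups, w_group=0.5):
    """Score a joint hypothesis of data association and grouping.

    assign_affinity : (n_tracks,) affinity of each chosen track-detection assignment
    velocities      : (n_tracks, 2) velocities implied by the assignment
    groups          : list of index lists, one per hypothesized group
    Members of the same group are rewarded for moving coherently.
    """
    score = float(np.sum(assign_affinity))
    for g in groups:
        if len(g) > 1:
            v = velocities[g]
            coherence = -np.mean(np.linalg.norm(v - v.mean(axis=0), axis=1))
            score += w_group * coherence
    return score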
Part-based Multiple-Person Tracking with Partial Occlusion Handling
"... Single camera-based multiple-person tracking is often hindered by difficulties such as occlusion and changes in appearance. In this paper, we address such problems by proposing a robust part-based tracking-by-detection framework. Human detection using part models has become quite popular, yet its ex ..."
Abstract - Cited by 37 (8 self)
Single camera-based multiple-person tracking is often hindered by difficulties such as occlusion and changes in appearance. In this paper, we address such problems by proposing a robust part-based tracking-by-detection framework. Human detection using part models has become quite popular, yet its extension to tracking has not been fully explored. Our approach learns part-based, person-specific SVM classifiers which capture the articulations of human bodies under dynamically changing appearance and background. With the part-based model, our approach is able to handle partial occlusions in both the detection and the tracking stages. In the detection stage, we select the subset of parts which maximizes the probability of detection, which significantly improves the detection performance in crowded scenes. In the tracking stage, we dynamically handle occlusions by distributing the score of the learned person classifier among its corresponding parts, which allows us to detect and predict partial occlusions, and prevent the performance of the classifiers from being degraded. Extensive experiments using the proposed method on several challenging sequences demonstrate state-of-the-art performance in multiple-person tracking.
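A sketch of the score-distribution idea described above: the whole-person score is split among parts in proportion to learned part weights, and parts whose share falls below a threshold are treated as occluded and excluded from the model update. The weighting scheme and threshold are illustrative assumptions, not the paper's formulation.

import numpy as np

def part_occlusion_flags(part_scores, part_weights, occ_thresh=0.2):
    """Distribute a person score over its parts and flag likely occlusions.

    part_scores  : (n_parts,) responses of per-part classifiers on the current frame
    part_weights : (n_parts,) learned contribution of each part to the whole-person score
    A part whose normalized contribution falls below occ_thresh is treated as occluded
    and excluded from the appearance update so the classifier is not corrupted.
    """
    contrib = part_weights * np.maximum(part_scores, 0.0)
    total = contrib.sum() + 1e-9
    share = contrib / total
    occluded = share < occ_thresh
    visible_score = contrib[~occluded].sum()
    return visible_score, occluded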
GMCP-Tracker: Global Multi-object Tracking Using Generalized Minimum Clique Graphs
- In ECCV, 2012
"... Abstract. Data association is an essential component of any human tracking system. The majority of current methods, such as bipartite matching, incorporate a limited-temporal-locality of the sequence into the data association problem, which makes them inherently prone to IDswitches and difficulties ..."
Abstract - Cited by 36 (7 self)
Data association is an essential component of any human tracking system. The majority of current methods, such as bipartite matching, incorporate only a limited temporal locality of the sequence into the data association problem, which makes them inherently prone to ID switches and to difficulties caused by long-term occlusion, cluttered background, and crowded scenes. We propose an approach to data association which incorporates both motion and appearance in a global manner. Unlike limited-temporal-locality methods, which incorporate only a few frames into the data association problem, we incorporate the whole temporal span and solve the data association problem for one object at a time, while implicitly incorporating the rest of the objects. In order to achieve this, we utilize Generalized Minimum Clique Graphs to solve the optimization problem of our data association method. Our proposed method yields a better formulated approach to data association, which is supported by our superior results. Experiments show the proposed method makes significant improvements in tracking on the diverse sequences of Town Center [1], TUD-crossing [2], TUD-Stadtmitte [2], PETS2009 [3], and a new sequence called Parking Lot, compared to the state of the art methods.
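A structural illustration of the global, whole-span association described above for a single target: one detection is chosen per frame while scoring appearance distance to all previously chosen detections plus a motion-smoothness term. The paper solves this via Generalized Minimum Clique Graphs; the greedy pass below is only a stand-in to show the cost structure.

import numpy as np

def greedy_global_association(frames_feats, frames_pos, w_motion=1.0):
    """Pick one detection per frame for a single target, greedily.

    frames_feats : list over frames of (n_i, d) appearance features
    frames_pos   : list over frames of (n_i, 2) detection positions
    The cost of a pick combines appearance distance to all detections chosen so far
    and spatial distance to the previous pick; the first frame defaults to index 0
    (a real system would seed it with detector confidence).
    """
    chosen_feats, chosen_pos, picks = [], [], []
    for feats, pos in zip(frames_feats, frames_pos):
        costs = np.zeros(len(feats))
        for k in range(len(feats)):
            if chosen_feats:
                app = np.mean([np.linalg.norm(feats[k] - f) for f in chosen_feats])
                mot = np.linalg.norm(pos[k] - chosen_pos[-1])
                costs[k] = app + w_motion * mot
        best = int(np.argmin(costs))
        picks.append(best)
        chosen_feats.append(feats[best])
        chosen_pos.append(pos[best])
    return picks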