Results 1 - 10 of 153
Pedestrian Detection: An Evaluation of the State of the Art
- Submission to IEEE Transactions on Pattern Analysis and Machine Intelligence
Abstract - Cited by 174 (10 self)
Pedestrian detection is a key problem in computer vision, with several applications that have the potential to positively impact quality of life. In recent years, the number of approaches to detecting pedestrians in monocular images has grown steadily. However, multiple datasets and widely varying evaluation protocols are used, making direct comparisons difficult. To address these shortcomings, we perform an extensive evaluation of the state of the art in a unified framework. We make three primary contributions: (1) we put together a large, well-annotated and realistic monocular pedestrian detection dataset and study the statistics of the size, position and occlusion patterns of pedestrians in urban scenes, (2) we propose a refined per-frame evaluation methodology that allows us to carry out probing and informative comparisons, including measuring performance in relation to scale and occlusion, and (3) we evaluate the performance of sixteen pre-trained state-of-the-art detectors across six datasets. Our study allows us to assess the state of the art and provides a framework for gauging future efforts. Our experiments show that despite significant progress, performance still has much room for improvement. In particular, detection is disappointing at low resolutions and for partially occluded pedestrians.
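Per-frame evaluations of this kind are commonly summarized by a log-average miss rate over a range of false positives per image (FPPI). A minimal sketch in Python; the nine-point log-spaced sampling over [10^-2, 10^0] is a common convention assumed here, not a detail taken from the abstract:

```python
import numpy as np

def log_average_miss_rate(fppi, miss_rate):
    """Summarize a miss-rate vs. FPPI curve by the geometric mean of the
    miss rate sampled at nine FPPI values evenly spaced in log space over
    [1e-2, 1e0]. `fppi` and `miss_rate` are parallel arrays tracing the
    curve (fppi ascending). Reference points below the lowest achieved
    FPPI fall back to the highest observed miss rate."""
    fppi = np.asarray(fppi, dtype=float)
    miss_rate = np.asarray(miss_rate, dtype=float)
    ref = np.logspace(-2.0, 0.0, 9)              # 9 reference FPPI values
    samples = []
    for r in ref:
        mask = fppi <= r
        if mask.any():
            samples.append(miss_rate[mask].min())  # curve value at or below r
        else:
            samples.append(miss_rate.max())        # fallback below the curve
    # geometric mean of the sampled miss rates
    return float(np.exp(np.mean(np.log(np.maximum(samples, 1e-12)))))
```

Lower values indicate a better detector; the log-domain averaging weights the low-FPPI regime, where miss rates differ most, as heavily as the high-FPPI regime.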
New features and insights for pedestrian detection
- In CVPR, 2010
Abstract - Cited by 68 (5 self)
Despite impressive progress in people detection, the performance on challenging datasets like Caltech Pedestrians or TUD-Brussels is still unsatisfactory. In this work we show that motion features derived from optic flow yield substantial improvements on image sequences, if implemented correctly, even in the case of low-quality video and consequently degraded flow fields. Furthermore, we introduce a new feature, self-similarity on color channels, which consistently improves detection performance both for static images and for video sequences, across different datasets. In combination with HOG, these two features outperform the state of the art by up to 20%. Finally, we report two insights concerning detector evaluations, which apply to classifier-based object detection in general. First, we show that a commonly underestimated detail of training, the number of bootstrapping rounds, has a drastic influence on the relative (and absolute) performance of different feature/classifier combinations. Second, we discuss important intricacies of detector evaluation and show that current benchmarking protocols lack crucial details, which can distort evaluations.
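The self-similarity idea can be illustrated with a toy sketch: block statistics of the window are compared pairwise, so the feature describes how color repeats across the detection window. The grid size, the use of block mean colors (the paper uses color histograms), and the exponential similarity are simplifying assumptions:

```python
import numpy as np

def color_self_similarity(img, cells=4):
    """Toy color self-similarity feature: split the image into a
    cells x cells grid, compute the mean color of each block, and
    return the flattened matrix of pairwise block similarities.
    `img` is an (H, W, 3) array."""
    h, w, _ = img.shape
    bh, bw = h // cells, w // cells
    means = np.array([
        img[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw].reshape(-1, 3).mean(axis=0)
        for i in range(cells) for j in range(cells)
    ])                                          # (cells*cells, 3) block colors
    d = np.linalg.norm(means[:, None] - means[None, :], axis=-1)
    sim = np.exp(-d)                            # similarity decays with color distance
    return sim.ravel()
```

The resulting vector is the same length for any window size, which is what makes it usable alongside HOG in a sliding-window classifier.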
Multi-cue onboard pedestrian detection
- In Proc. CVPR, 2009
Abstract - Cited by 62 (6 self)
Various powerful people detection methods exist. Surprisingly, most approaches rely on static image features only, despite the obvious potential of motion information for people detection. This paper systematically evaluates different features and classifiers in a sliding-window framework. First, our experiments indicate that incorporating motion information improves detection performance significantly. Second, the combination of multiple and complementary feature types can also help improve performance. And third, the choice of the classifier-feature combination and several implementation details are crucial to reach the best performance. In contrast to many recent papers, experimental results are reported for four different datasets rather than a single one. Three of them are taken from the literature, allowing for direct comparison. The fourth dataset is newly recorded using an onboard camera driving through an urban environment. Consequently, this dataset is more realistic and more challenging than any currently available dataset.
Multiresolution models for object detection
Abstract - Cited by 55 (6 self)
Most current approaches to recognition aim to be scale-invariant. However, the cues available for recognizing a 300-pixel-tall object are qualitatively different from those for recognizing a 3-pixel-tall object. We argue that for sensors with finite resolution, one should instead use scale-variant, or multiresolution, representations that adapt in complexity to the size of a putative detection window. We describe a multiresolution model that acts as a deformable part-based model when scoring large instances and as a rigid template when scoring small instances. We also examine the interplay of resolution and context, and demonstrate that context is most helpful for detecting low-resolution instances, when local models are limited in discriminative power. We demonstrate impressive results on the Caltech Pedestrian benchmark, which contains object instances at a wide range of scales. Whereas recent state-of-the-art methods demonstrate missed detection rates of 86%-37% at 1 false positive per image, our multiresolution model reduces the rate to 29%.
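The multiresolution idea, switching between a rigid template for small windows and a part-based scorer for large ones, can be sketched as a simple dispatcher. The 80-pixel threshold and both scorers are illustrative placeholders, not the paper's actual parameters:

```python
import numpy as np

def score_window(feats, height, rigid_w, part_scorer, min_part_height=80):
    """Toy multiresolution dispatcher: candidate windows below a height
    threshold are scored with a rigid linear template (dot product with
    the feature vector); larger windows use a richer part-based scorer.
    `part_scorer` is any callable mapping features to a score."""
    if height < min_part_height:
        return float(np.dot(rigid_w, feats))   # rigid template: small instances
    return float(part_scorer(feats))           # deformable parts: large instances
```

The point of the design is that small instances simply do not carry enough pixels to localize parts, so falling back to a rigid template avoids fitting noise.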
Multi-Cue Pedestrian Classification With Partial Occlusion Handling
Abstract - Cited by 55 (7 self)
This paper presents a novel mixture-of-experts framework for pedestrian classification with partial occlusion handling. The framework involves a set of component-based expert classifiers trained on features derived from intensity, depth and motion. To handle partial occlusion, we compute expert weights that are related to the degree of visibility of the associated component. This degree of visibility is determined by examining occlusion boundaries, i.e. discontinuities in depth and motion. Occlusion-dependent component weights allow the combined decision of the mixture-of-experts classifier to focus on the unoccluded body parts. In experiments on extensive real-world data sets, with both partially occluded and non-occluded pedestrians, we obtain significant performance boosts over state-of-the-art approaches, by up to a factor of four in the reduction of false positives at constant detection rates. The dataset is made public for benchmarking purposes.
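The occlusion-dependent weighting can be sketched as a weighted vote over component experts, with weights following estimated component visibility. The visibility estimation from depth/motion discontinuities is omitted here, and the simple sum-to-one normalization is an assumption:

```python
import numpy as np

def fuse_experts(expert_scores, visibility):
    """Sketch of occlusion-weighted expert fusion: each component expert
    (e.g. head, torso, legs) outputs a pedestrian score, and weights
    proportional to the estimated visibility of each component focus the
    combined decision on the unoccluded parts."""
    v = np.asarray(visibility, dtype=float)
    w = v / v.sum()                      # normalized occlusion-dependent weights
    return float(np.dot(w, expert_scores))
```

With full visibility this reduces to an ordinary average of the experts; with one component fully occluded, its expert is silenced entirely.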
A Survey of Recent Advances in Face detection
- 2010
Abstract - Cited by 52 (1 self)
Face detection has been one of the most studied topics in the computer vision literature. In this technical report, we survey the recent advances in face detection for the past decade. The seminal Viola-Jones face detector is first reviewed. We then survey the various techniques according to how they extract features and what learning algorithms are adopted. It is our hope that by reviewing the many existing algorithms, we will see even better algorithms developed to solve this fundamental computer vision problem.
People detection in RGB-D data
- In IEEE/RSJ Int. Conf. on, 2011
Abstract - Cited by 26 (2 self)
People detection is a key issue for robots and intelligent systems sharing a space with people. Previous works have used cameras and 2D or 3D range finders for this task. In this paper, we present a novel people detection approach for RGB-D data. We take inspiration from the Histogram of Oriented Gradients (HOG) detector to design a robust method to detect people in dense depth data, called Histogram of Oriented Depths (HOD). HOD locally encodes the direction of depth changes and relies on a depth-informed scale-space search that leads to a 3-fold acceleration of the detection process. We then propose Combo-HOD, an RGB-D detector that probabilistically combines HOD and HOG. The experiments include a comprehensive comparison with several alternative detection approaches, including visual HOG, several variants of HOD, a geometric person detector for 3D point clouds, and a Haar-based AdaBoost detector. With an equal error rate of 85% at ranges up to 8 m, the results demonstrate the robustness of HOD and Combo-HOD on a real-world data set collected with a Kinect sensor in a populated indoor environment.
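HOD follows the HOG recipe on depth data: gradient orientations of the depth image, weighted by gradient magnitude, are accumulated into per-cell histograms. A minimal single-cell sketch under that reading of the abstract; block normalization and the depth-informed scale-space search are omitted:

```python
import numpy as np

def hod_cell(depth_cell, n_bins=9):
    """Toy Histogram of Oriented Depths for one cell: depth gradients are
    binned by unsigned orientation in [0, pi), each pixel contributing
    its gradient magnitude, mirroring HOG but on depth rather than
    image intensity. `depth_cell` is a 2D array of depth values."""
    gy, gx = np.gradient(depth_cell.astype(float))
    mag = np.hypot(gx, gy)                          # gradient magnitude
    ang = np.mod(np.arctan2(gy, gx), np.pi)         # unsigned orientation
    bins = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)
    hist = np.zeros(n_bins)
    np.add.at(hist, bins.ravel(), mag.ravel())      # magnitude-weighted votes
    return hist
```

On depth data this responds to geometric boundaries (body silhouettes against the background) rather than texture edges, which is what makes it complementary to visual HOG.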
Integrated Pedestrian Classification and Orientation Estimation
Abstract - Cited by 19 (4 self)
This paper presents a novel approach to single-frame pedestrian classification and orientation estimation. Unlike previous work, which addressed classification and orientation separately with different models, our method involves a probabilistic framework to approach both in a unified fashion. We address both problems in terms of a set of view-related models which couple discriminative expert classifiers with sample-dependent priors, facilitating easy integration of other cues (e.g. motion, shape) in a Bayesian fashion. This mixture-of-experts formulation approximates the probability density of pedestrian orientation and scales up to the use of multiple cameras. Experiments on large real-world data show a significant performance improvement of up to 50% in both pedestrian classification and orientation estimation, compared to the state of the art, using identical data and evaluation techniques.
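The mixture-of-experts formulation can be sketched, under simplifying assumptions, as Bayes' rule over a discrete set of orientations, with view-related classifier scores acting as unnormalized likelihoods. The discrete four-view orientation set is an illustrative assumption:

```python
import numpy as np

def orientation_posterior(view_scores, priors):
    """Sketch of probabilistic view fusion: scores from view-related
    expert classifiers (e.g. front, left, back, right) are combined with
    sample-dependent priors via Bayes' rule and normalized into an
    approximate density over discrete pedestrian orientations."""
    p = np.asarray(view_scores, dtype=float) * np.asarray(priors, dtype=float)
    return p / p.sum()                   # posterior over orientations
```

Because the output is a proper distribution, additional cues (motion, shape) or extra cameras can be folded in by multiplying in further likelihood terms before normalizing.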
People tracking in RGB-D data with on-line boosted target models
- In IEEE/RSJ Int. Conf. on, 2011
Abstract - Cited by 18 (1 self)
People tracking is a key component for robots that are deployed in populated environments. Previous works have used cameras and 2D and 3D range finders for this task. In this paper, we present a 3D people detection and tracking approach using RGB-D data. We combine a novel multi-cue person detector for RGB-D data with an on-line detector that learns individual target models. The two detectors are integrated into a decisional framework with a multi-hypothesis tracker that controls on-line learning through track interpretation feedback. For on-line learning, we take a boosting approach using three types of RGB-D features and a confidence maximization search in 3D space. The approach is general in that it relies neither on background learning nor on a ground-plane assumption. For the evaluation, we collect data in a populated indoor environment using a setup of three Microsoft Kinect sensors with a joint field of view. The results demonstrate reliable 3D tracking of people in RGB-D data and show how the framework is able to avoid drift of the on-line detector and increase the overall tracking performance.
Learning people detection models from few training samples
- In CVPR, 2011
Abstract - Cited by 17 (2 self)
People detection is an important task for a wide range of applications in computer vision. State-of-the-art methods learn appearance-based models, requiring tedious collection and annotation of large data corpora. Moreover, obtaining data sets that represent all relevant variations with sufficient accuracy for the intended application domain is often a non-trivial task. Therefore, this paper investigates how 3D shape models from computer graphics can be leveraged to ease training data generation. In particular, we employ a rendering-based reshaping method in order to generate thousands of synthetic training samples from only a few persons and views. We evaluate our data generation method for two different people detection models. Our experiments on a challenging multi-view dataset indicate that data from as few as eleven persons suffices to achieve good performance. When we additionally combine our synthetic training samples with real data, we even outperform existing state-of-the-art methods.