Results 1 - 10
of
68
Pedestrian Detection: An Evaluation of the State of the Art
- SUBMISSION TO IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 1
"... Pedestrian detection is a key problem in computer vision, with several applications that have the potential to positively impact quality of life. In recent years, the number of approaches to detecting pedestrians in monocular images has grown steadily. However, multiple datasets and widely varying e ..."
Abstract
-
Cited by 174 (10 self)
- Add to MetaCart
Pedestrian detection is a key problem in computer vision, with several applications that have the potential to positively impact quality of life. In recent years, the number of approaches to detecting pedestrians in monocular images has grown steadily. However, multiple datasets and widely varying evaluation protocols are used, making direct comparisons difficult. To address these shortcomings, we perform an extensive evaluation of the state of the art in a unified framework. We make three primary contributions: (1) we put together a large, well-annotated and realistic monocular pedestrian detection dataset and study the statistics of the size, position and occlusion patterns of pedestrians in urban scenes, (2) we propose a refined per-frame evaluation methodology that allows us to carry out probing and informative comparisons, including measuring performance in relation to scale and occlusion, and (3) we evaluate the performance of sixteen pre-trained state-of-the-art detectors across six datasets. Our study allows us to assess the state of the art and provides a framework for gauging future efforts. Our experiments show that despite significant progress, performance still has much room for improvement. In particular, detection is disappointing at low resolutions and for partially occluded pedestrians.
Learning convolutional feature hierarchies for visual recognition
- In Advances in
, 2010
"... We propose an unsupervised method for learning multi-stage hierarchies of sparse convolutional features. While sparse coding has become an increasingly popular method for learning visual features, it is most often trained at the patch level. Applying the resulting filters convolutionally results in ..."
Abstract
-
Cited by 56 (1 self)
- Add to MetaCart
(Show Context)
We propose an unsupervised method for learning multi-stage hierarchies of sparse convolutional features. While sparse coding has become an increasingly popular method for learning visual features, it is most often trained at the patch level. Applying the resulting filters convolutionally results in highly redundant codes because overlapping patches are encoded in isolation. By training convolutionally over large image windows, our method reduces the redudancy between feature vectors at neighboring locations and improves the efficiency of the overall repre-sentation. In addition to a linear decoder that reconstructs the image from sparse features, our method trains an efficient feed-forward encoder that predicts quasi-sparse features from the input. While patch-based training rarely produces any-thing but oriented edge detectors, we show that convolutional training produces highly diverse filters, including center-surround filters, corner detectors, cross de-tectors, and oriented grating detectors. We show that using these filters in multi-stage convolutional network architecture improves performance on a number of visual recognition and detection tasks. 1
Discrete-Continuous Optimization for Multi-Target Tracking
"... The problem of multi-target tracking is comprised of two distinct, but tightly coupled challenges: (i) the naturally discrete problem of data association, i.e. assigning image observations to the appropriate target; (ii) the naturally continuous problem of trajectory estimation, i.e. recovering the ..."
Abstract
-
Cited by 39 (5 self)
- Add to MetaCart
(Show Context)
The problem of multi-target tracking is comprised of two distinct, but tightly coupled challenges: (i) the naturally discrete problem of data association, i.e. assigning image observations to the appropriate target; (ii) the naturally continuous problem of trajectory estimation, i.e. recovering the trajectories of all targets. To go beyond simple greedy solutions for data association, recent approaches often perform multi-target tracking using discrete optimization. This has the disadvantage that trajectories need to be pre-computed or represented discretely, thus limiting accuracy. In this paper we instead formulate multi-target tracking as a discretecontinuous optimization problem that handles each aspect in its natural domain and allows leveraging powerful methods for multi-model fitting. Data association is performed using discrete optimization with label costs, yielding near optimality. Trajectory estimation is posed as a continuous fitting problem with a simple closed-form solution, which is used in turn to update the label costs. We demonstrate the accuracy and robustness of our approach with state-of-theart performance on several standard datasets. 1.
Joint deep learning for pedestrian detection
- In ICCV
, 2013
"... Feature extraction, deformation handling, occlusion handling, and classification are four important components in pedestrian detection. Existing methods learn or design these components either individually or sequentially. The interaction among these components is not yet well ex-plored. This paper ..."
Abstract
-
Cited by 34 (11 self)
- Add to MetaCart
(Show Context)
Feature extraction, deformation handling, occlusion handling, and classification are four important components in pedestrian detection. Existing methods learn or design these components either individually or sequentially. The interaction among these components is not yet well ex-plored. This paper proposes that they should be jointly learned in order to maximize their strengths through coop-eration. We formulate these four components into a joint deep learning framework and propose a new deep network architecture1. By establishing automatic, mutual interac-tion among components, the deep model achieves a 9 % re-duction in the average miss rate compared with the cur-rent best-performing pedestrian detection approaches on the largest Caltech benchmark dataset. 1.
Efficient Classification for Additive kernel SVMs
- IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
, 2012
"... We show that a class of non-linear kernel SVMs admit approximate classifiers with run-time and memory complexity that is independent of the number of support vectors. This class of kernels which we refer to as additive kernels, include the widely used kernels for histogram based image comparison lik ..."
Abstract
-
Cited by 24 (2 self)
- Add to MetaCart
(Show Context)
We show that a class of non-linear kernel SVMs admit approximate classifiers with run-time and memory complexity that is independent of the number of support vectors. This class of kernels which we refer to as additive kernels, include the widely used kernels for histogram based image comparison like intersection and chi-squared kernels. Additive kernel SVMs can offer significant improvements in accuracy over linear SVMs on a wide variety of tasks while having the same run-time, making them practical for large scale recognition or real-time detection tasks. We present experiments on a variety of datasets including the INRIA person, Daimler-Chrysler pedestrians, UIUC Cars, Caltech-101, MNIST and USPS digits, to demonstrate the effectiveness of our method for efficient evaluation of SVMs with additive kernels. Since its introduction, our method has become integral to various state of the art systems for PASCAL VOC object detection/image classification, ImageNet Challenge, TRECVID, etc. The techniques we propose can also be applied to settings where evaluation of weighted additive kernels is required, which include kernelized versions of PCA, LDA, regression, k-means, as well as speeding up the inner loop of SVM classifier training algorithms.
Exploring Weak Stabilization for Motion Feature Extraction
"... We describe novel but simple motion features for the problem of detecting objects in video sequences. Previous approaches either compute optical flow or temporal differences on video frame pairs with various assumptions about stabilization. We describe a combined approach that uses coarse-scale flow ..."
Abstract
-
Cited by 18 (0 self)
- Add to MetaCart
(Show Context)
We describe novel but simple motion features for the problem of detecting objects in video sequences. Previous approaches either compute optical flow or temporal differences on video frame pairs with various assumptions about stabilization. We describe a combined approach that uses coarse-scale flow and fine-scale temporal difference features. Our approach performs weak motion stabilization by factoring out camera motion and coarse object motion while preserving nonrigid motions that serve as useful cues for recognition. We show results for pedestrian detection and human pose estimation in video sequences, achieving state-of-the-art results in both. In particular, given a fixed detection rate our method achieves a five-fold reduction in false positives over prior art on the Caltech Pedestrian benchmark. Finally, we perform extensive diagnostic experiments to reveal what aspects of our system are crucial for good performance. Proper stabilization, long time-scale features, and proper normalization are all critical. 1.
A Discriminative Deep Model for Pedestrian Detection with Occlusion Handling
- In Proc. CVPR
, 2012
"... Part-based models have demonstrated their merit in ob-ject detection. However, there is a key issue to be solved on how to integrate the inaccurate scores of part detectors when there are occlusions or large deformations. To han-dle the imperfectness of part detectors, this paper presents a probabil ..."
Abstract
-
Cited by 18 (12 self)
- Add to MetaCart
(Show Context)
Part-based models have demonstrated their merit in ob-ject detection. However, there is a key issue to be solved on how to integrate the inaccurate scores of part detectors when there are occlusions or large deformations. To han-dle the imperfectness of part detectors, this paper presents a probabilistic pedestrian detection framework. In this frame-work, a deformable part-based model is used to obtain the scores of part detectors and the visibilities of parts are mod-eled as hidden variables. Unlike previous occlusion han-dling approaches that assume independence among visibil-ity probabilities of parts or manually define rules for the visibility relationship, a discriminative deep model is used in this paper for learning the visibility relationship among overlapping parts at multiple layers. Experimental results on three public datasets (Caltech, ETH and Daimler) and a new CUHK occlusion dataset1 specially designed for the evaluation of occlusion handling approaches show the ef-fectiveness of the proposed approach. 1.
Contextual Boost for Pedestrian Detection
"... Pedestrian detection from images is an important and yet challenging task. The conventional methods usually identify human figures using image features inside the local regions. In this paper we present that, besides the local features, context cues in the neighborhood provide important constraints ..."
Abstract
-
Cited by 16 (1 self)
- Add to MetaCart
(Show Context)
Pedestrian detection from images is an important and yet challenging task. The conventional methods usually identify human figures using image features inside the local regions. In this paper we present that, besides the local features, context cues in the neighborhood provide important constraints that are not yet well utilized. We propose a framework to incorporate the context constraints for detection. First, we combine the local window with neighborhood windows to construct a multi-scale image context descriptor, designed to represent the contextual cues in spatial, scaling, and color spaces. Second, we develop an iterative classification algorithm called contextual boost. At each iteration, the classifier responses from the previous iteration across the neighborhood and multiple image scales, called classification context, are incorporated as additional features to learn a new classifier. The number of iterations is determined in the training process when the error rate converges. Since the classification context incorporates contextual cues from the neighborhood, through iterations it implicitly propagates to greater areas and thus provides more global constraints. We evaluate our method on the Caltech benchmark dataset [11]. The results confirm the advantages of the proposed framework. Compared with state of the arts, our method reduces the miss rate from 29 % by [30] to 25 % at 1 false positive per image (FPPI). 1.
A MULTI-LEVEL MIXTURE-OF-EXPERTS FRAMEWORK FOR PEDESTRIAN CLASSIFICATION
, 2011
"... Notwithstanding many years of progress, pedestrian recognition is still a difficult but important problem. We present a novel multi-level Mixture-of-Experts approach to combine information from multiple features and cues with the objective of improved pedestrian classification. On pose-level, shape ..."
Abstract
-
Cited by 16 (3 self)
- Add to MetaCart
(Show Context)
Notwithstanding many years of progress, pedestrian recognition is still a difficult but important problem. We present a novel multi-level Mixture-of-Experts approach to combine information from multiple features and cues with the objective of improved pedestrian classification. On pose-level, shape cues based on Chamfer shape matching provide sample-dependent priors for a certain pedestrian view. On modality-level, we represent each data sample in terms of image intensity, (dense) depth and (dense) flow. On feature-level, we consider histograms of oriented gradients (HOG) and local binary patterns (LBP). Multilayer perceptrons (MLP) and linear support vector machines (linSVM) are used as expert classifiers. Experiments are performed on a unique real-world multimodality dataset captured from a moving vehicle in urban traffic. This dataset has been made public for research purposes. Our results show a significant performance boost of up to a factor of 42 in reduction of false positives at constant detection rates of our approach compared to a baseline intensity-only HOG/linSVM approach.
Multi-stage contextual deep learning for pedestrian detection
- In ICCV
, 2013
"... Cascaded classifiers1 have been widely used in pedes-trian detection and achieved great success. These classi-fiers are trained sequentially without joint optimization. In this paper, we propose a new deep model that can jointly train multi-stage classifiers through several stages of back-propagatio ..."
Abstract
-
Cited by 15 (11 self)
- Add to MetaCart
(Show Context)
Cascaded classifiers1 have been widely used in pedes-trian detection and achieved great success. These classi-fiers are trained sequentially without joint optimization. In this paper, we propose a new deep model that can jointly train multi-stage classifiers through several stages of back-propagation. It keeps the score map output by a classifier within a local region and uses it as contextual information to support the decision at the next stage. Through a spe-cific design of the training strategy, this deep architecture is able to simulate the cascaded classifiers by mining hard samples to train the network stage-by-stage. Each classi-fier handles samples at a different difficulty level. Unsu-pervised pre-training and specifically designed stage-wise supervised training are used to regularize the optimization problem. Both theoretical analysis and experimental re-sults show that the training strategy helps to avoid overfit-ting. Experimental results on three datasets (Caltech, ETH and TUD-Brussels) show that our approach outperforms the state-of-the-art approaches. 1.