Results 1 - 10 of 55
Object detection with grammar models
- In NIPS, 2011
Cited by 59 (4 self)
Compositional models provide an elegant formalism for representing the visual appearance of highly variable objects. While such models are appealing from a theoretical point of view, it has been difficult to demonstrate that they lead to performance advantages on challenging datasets. Here we develop a grammar model for person detection and show that it outperforms previous high-performance systems on the PASCAL benchmark. Our model represents people using a hierarchy of deformable parts, variable structure and an explicit model of occlusion for partially visible objects. To train the model, we introduce a new discriminative framework for learning structured prediction models from weakly-labeled data.
Joint deep learning for pedestrian detection
- In ICCV, 2013
Cited by 34 (11 self)
Feature extraction, deformation handling, occlusion handling, and classification are four important components in pedestrian detection. Existing methods learn or design these components either individually or sequentially. The interaction among these components is not yet well explored. This paper proposes that they should be jointly learned in order to maximize their strengths through cooperation. We formulate these four components into a joint deep learning framework and propose a new deep network architecture. By establishing automatic, mutual interaction among components, the deep model achieves a 9% reduction in the average miss rate compared with the current best-performing pedestrian detection approaches on the largest Caltech benchmark dataset.
Multimodal Templates for Real-Time Detection of Texture-less Objects in Heavily Cluttered Scenes
Cited by 28 (3 self)
We present a method for detecting 3D objects using multiple modalities. While it is generic, we demonstrate it on the combination of an image and a dense depth map, which give complementary object information. It works in real time, under heavy clutter, does not require a time-consuming training stage, and can handle untextured objects. It is based on an efficient representation of templates that capture the different modalities, and we show in many experiments on commodity hardware that our approach significantly outperforms state-of-the-art methods on single modalities.
People detection in RGB-D data
- In IEEE/RSJ Int. Conf. on ..., 2011
Cited by 26 (2 self)
People detection is a key issue for robots and intelligent systems sharing a space with people. Previous works have used cameras and 2D or 3D range finders for this task. In this paper, we present a novel people detection approach for RGB-D data. We take inspiration from the Histogram of Oriented Gradients (HOG) detector to design a robust method to detect people in dense depth data, called Histogram of Oriented Depths (HOD). HOD locally encodes the direction of depth changes and relies on a depth-informed scale-space search that leads to a 3-fold acceleration of the detection process. We then propose Combo-HOD, an RGB-D detector that probabilistically combines HOD and HOG. The experiments include a comprehensive comparison with several alternative detection approaches, including visual HOG, several variants of HOD, a geometric person detector for 3D point clouds, and a Haar-based AdaBoost detector. With an equal error rate of 85% in a range up to 8 m, the results demonstrate the robustness of HOD and Combo-HOD on a real-world data set collected with a Kinect sensor in a populated indoor environment.
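The HOD descriptor described in this abstract follows the HOG recipe on a dense depth map: per-cell histograms of depth-gradient orientations, weighted by gradient magnitude. The following is a minimal sketch under that assumption only; it omits the block normalization, the depth-informed scale-space search, and the SVM stage of the actual detector, and all names are illustrative:

```python
import numpy as np

def histogram_of_oriented_depths(depth, cell=8, bins=9):
    # Per-cell histograms of depth-gradient orientations, weighted by
    # gradient magnitude -- the HOG recipe applied to a depth map.
    gy, gx = np.gradient(depth.astype(np.float64))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)          # unsigned orientation
    h, w = depth.shape
    ch, cw = h // cell, w // cell
    bin_idx = np.minimum((ang / np.pi * bins).astype(int), bins - 1)
    hist = np.zeros((ch, cw, bins))
    for i in range(ch):
        for j in range(cw):
            m = mag[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
            b = bin_idx[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
            hist[i, j] = np.bincount(b, weights=m, minlength=bins)
    norm = np.linalg.norm(hist, axis=2, keepdims=True) + 1e-6
    return hist / norm                               # L2-normalized cells

depth = np.tile(np.linspace(0.5, 4.0, 64), (64, 1))  # synthetic depth ramp
feat = histogram_of_oriented_depths(depth)
print(feat.shape)  # (8, 8, 9)
```

For the synthetic horizontal ramp, all gradient energy falls into the first orientation bin, which is a quick sanity check that the binning behaves as intended.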
People tracking in RGB-D data with on-line boosted target models
- In IEEE/RSJ Int. Conf. on ..., 2011
Cited by 18 (1 self)
People tracking is a key component for robots that are deployed in populated environments. Previous works have used cameras and 2D and 3D range finders for this task. In this paper, we present a 3D people detection and tracking approach using RGB-D data. We combine a novel multi-cue person detector for RGB-D data with an on-line detector that learns individual target models. The two detectors are integrated into a decisional framework with a multi-hypothesis tracker that controls on-line learning through track interpretation feedback. For on-line learning, we take a boosting approach using three types of RGB-D features and a confidence maximization search in 3D space. The approach is general in that it relies neither on background learning nor on a ground-plane assumption. For the evaluation, we collect data in a populated indoor environment using a setup of three Microsoft Kinect sensors with a joint field of view. The results demonstrate reliable 3D tracking of people in RGB-D data and show how the framework is able to avoid drift of the on-line detector and increase the overall tracking performance.
Continuous Energy Minimization for Multi-Target Tracking
Cited by 18 (3 self)
Many recent advances in multiple target tracking aim at finding a (nearly) optimal set of trajectories within a temporal window. To handle the large space of possible trajectory hypotheses, it is typically reduced to a finite set by some form of data-driven or regular discretization. In this work we propose an alternative formulation of multi-target tracking as minimization of a continuous energy. Contrary to recent approaches, we focus on designing an energy that corresponds to a more complete representation of the problem, rather than one that is amenable to global optimization. Besides the image evidence, the energy function takes into account physical constraints, such as target dynamics, mutual exclusion, and track persistence. In addition, partial image evidence is handled with explicit occlusion reasoning, and different targets are disambiguated with an appearance model. To nevertheless find strong local minima of the proposed non-convex energy, we construct a suitable optimization scheme that alternates between continuous conjugate gradient descent and discrete trans-dimensional jump moves. These moves, which are executed such that they always reduce the energy, allow the search to escape weak minima and explore a much larger portion of the search space of varying dimensionality. We demonstrate the validity of our approach with an extensive quantitative evaluation on several public datasets. Index Terms — Multi-object tracking, tracking-by-detection, visual surveillance, continuous optimization.
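The alternation between continuous descent and energy-reducing trans-dimensional jumps described in this abstract can be illustrated on a toy 1D problem. This is a hedged sketch, not the paper's method: the data, dynamics, and per-target terms below are simplified stand-ins for the actual energy, and plain finite-difference gradient descent replaces conjugate gradients:

```python
import numpy as np

def energy(x, data):
    # Toy continuous energy: squared distance of each observation to its
    # nearest track point, a smoothness (dynamics) term, and a per-target
    # penalty so extra points must pay for themselves.
    if x.size == 0:
        return 10.0 * data.size
    fit = np.min((data[:, None] - x[None, :]) ** 2, axis=1).sum()
    smooth = np.sum(np.diff(np.sort(x)) ** 2)
    return fit + 0.01 * smooth + 0.5 * x.size

def num_grad(x, data, eps=1e-5):
    # finite-difference gradient (the paper uses conjugate gradients)
    g = np.zeros_like(x)
    e0 = energy(x, data)
    for i in range(x.size):
        xp = x.copy()
        xp[i] += eps
        g[i] = (energy(xp, data) - e0) / eps
    return g

def minimize(data, rounds=10, gd_steps=30, lr=0.05):
    x = np.array([data.mean()])
    for _ in range(rounds):
        for _ in range(gd_steps):              # continuous descent
            x = x - lr * num_grad(x, data)
        # growth jump: propose the worst-explained observation as a new
        # point; accept only if the total energy decreases
        resid = np.min(np.abs(data[:, None] - x[None, :]), axis=1)
        cand = np.append(x, data[np.argmax(resid)])
        if energy(cand, data) < energy(x, data):
            x = cand
        # shrink jump: drop any point whose removal decreases the energy
        for i in range(x.size - 1, -1, -1):
            if x.size > 1 and energy(np.delete(x, i), data) < energy(x, data):
                x = np.delete(x, i)
    return np.sort(x)

data = np.array([0.0, 0.1, -0.1, 5.0, 5.1, 4.9])  # detections, two targets
x = minimize(data)
print(x.size)  # 2
```

Because jump moves are only accepted when they lower the energy, the search changes dimensionality (here: number of track points) without ever undoing the progress of the continuous phase, which is the key property the abstract highlights.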
A Discriminative Deep Model for Pedestrian Detection with Occlusion Handling
- In Proc. CVPR, 2012
Cited by 18 (12 self)
Part-based models have demonstrated their merit in object detection. However, there is a key issue to be solved on how to integrate the inaccurate scores of part detectors when there are occlusions or large deformations. To handle the imperfection of part detectors, this paper presents a probabilistic pedestrian detection framework. In this framework, a deformable part-based model is used to obtain the scores of part detectors, and the visibilities of parts are modeled as hidden variables. Unlike previous occlusion handling approaches that assume independence among visibility probabilities of parts or manually define rules for the visibility relationship, a discriminative deep model is used in this paper for learning the visibility relationship among overlapping parts at multiple layers. Experimental results on three public datasets (Caltech, ETH and Daimler) and a new CUHK occlusion dataset specially designed for the evaluation of occlusion handling approaches show the effectiveness of the proposed approach.
Monocular 3D scene understanding with explicit occlusion reasoning
- In CVPR, 2011
Cited by 17 (3 self)
Scene understanding from a monocular, moving camera is a challenging problem with a number of applications including robotics and automotive safety. While recent systems have shown that this is best accomplished with a 3D scene model, handling of partial object occlusion is still unsatisfactory. In this paper we propose an approach that tightly integrates monocular 3D scene tracking-by-detection with explicit object-object occlusion reasoning. Full object and object part detectors are combined in a mixture of experts based on their expected visibility, which is obtained from the 3D scene model. For the difficult case of multi-people tracking, we demonstrate that our approach yields more robust detection and tracking of partially visible pedestrians, even when they are occluded over long periods of time. Our approach is evaluated on two challenging sequences recorded from a moving camera in busy pedestrian zones and outperforms several state-of-the-art approaches.
Contextual Boost for Pedestrian Detection
Cited by 16 (1 self)
Pedestrian detection from images is an important and yet challenging task. Conventional methods usually identify human figures using image features inside local regions. In this paper we show that, besides the local features, context cues in the neighborhood provide important constraints that are not yet well utilized. We propose a framework to incorporate these context constraints for detection. First, we combine the local window with neighborhood windows to construct a multi-scale image context descriptor, designed to represent the contextual cues in spatial, scaling, and color spaces. Second, we develop an iterative classification algorithm called contextual boost. At each iteration, the classifier responses from the previous iteration across the neighborhood and multiple image scales, called the classification context, are incorporated as additional features to learn a new classifier. The number of iterations is determined in the training process when the error rate converges. Since the classification context incorporates contextual cues from the neighborhood, through iterations it implicitly propagates to greater areas and thus provides more global constraints. We evaluate our method on the Caltech benchmark dataset [11]. The results confirm the advantages of the proposed framework. Compared with the state of the art, our method reduces the miss rate from 29% by [30] to 25% at 1 false positive per image (FPPI).
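The iterative "classification context" idea in this abstract can be sketched with plain logistic regression standing in for the boosted classifier. Everything below is an illustrative assumption: a 1D sequence of sliding windows plays the role of the spatial neighborhood, and only the two adjacent windows' responses are appended as context features:

```python
import numpy as np

def train_logreg(X, y, iters=300, lr=0.5):
    # plain batch logistic regression, a stand-in for the boosted classifier
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict(w, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return 1.0 / (1.0 + np.exp(-Xb @ w))

def contextual_boost(X, y, rounds=3):
    # Each round appends the previous round's responses at the two
    # neighboring windows (the "classification context") as new features.
    w, feats = train_logreg(X, y), X
    for _ in range(rounds):
        r = predict(w, feats)
        ctx = np.column_stack([np.roll(r, 1), r, np.roll(r, -1)])
        feats = np.hstack([X, ctx])        # raw features + context
        w = train_logreg(feats, y)
    return w, feats

# toy sliding-window sequence: positives come in a contiguous run
rng = np.random.default_rng(0)
y = np.array([0.0] * 20 + [1.0] * 20 + [0.0] * 20)
X = (y + rng.normal(0.0, 0.8, size=y.size)).reshape(-1, 1)
w, feats = contextual_boost(X, y)
scores = predict(w, feats)
print(feats.shape)  # (60, 4)
```

Because each round's classifier sees its neighbors' previous responses, contextual information propagates one window further per iteration, which is the implicit widening of the context area the abstract describes.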
A Multi-Level Mixture-of-Experts Framework for Pedestrian Classification, 2011
Cited by 16 (3 self)
Notwithstanding many years of progress, pedestrian recognition is still a difficult but important problem. We present a novel multi-level Mixture-of-Experts approach to combine information from multiple features and cues with the objective of improved pedestrian classification. On the pose level, shape cues based on Chamfer shape matching provide sample-dependent priors for a certain pedestrian view. On the modality level, we represent each data sample in terms of image intensity, (dense) depth and (dense) flow. On the feature level, we consider histograms of oriented gradients (HOG) and local binary patterns (LBP). Multilayer perceptrons (MLP) and linear support vector machines (linSVM) are used as expert classifiers. Experiments are performed on a unique real-world multi-modality dataset captured from a moving vehicle in urban traffic. This dataset has been made public for research purposes. Our results show a significant performance boost of our approach, reducing false positives by up to a factor of 42 at constant detection rates, compared to a baseline intensity-only HOG/linSVM approach.
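At its core, the sample-dependent fusion described in this abstract amounts to a per-sample convex combination of expert scores, with the combination weights supplied by the pose-level priors. A minimal sketch under that assumption only; the gate values standing in for the shape-based view priors are invented for illustration:

```python
import numpy as np

def moe_fuse(expert_scores, gate_weights):
    # sample-dependent Mixture-of-Experts fusion: normalize the gates per
    # sample, then take a convex combination of the expert scores
    w = gate_weights / gate_weights.sum(axis=1, keepdims=True)
    return (w * expert_scores).sum(axis=1)

# rows = samples, columns = experts (e.g. intensity/depth/flow classifiers)
scores = np.array([[0.9, 0.2, 0.6],
                   [0.1, 0.8, 0.5]])
gates = np.array([[3.0, 1.0, 1.0],   # hypothetical per-sample view priors
                  [1.0, 1.0, 2.0]])
fused = moe_fuse(scores, gates)      # one fused score per sample
```

Because the weights are renormalized per sample, an expert with a strong prior for that sample's view dominates the fused score, while the others still contribute; this is what makes the mixture sample-dependent rather than a fixed ensemble average.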