Results 1 - 10
of
70
Object Detection with Discriminatively Trained Part Based Models
"... We describe an object detection system based on mixtures of multiscale deformable part models. Our system is able to represent highly variable object classes and achieves state-of-the-art results in the PASCAL object detection challenges. While deformable part models have become quite popular, their ..."
Abstract
-
Cited by 171 (14 self)
- Add to MetaCart
We describe an object detection system based on mixtures of multiscale deformable part models. Our system is able to represent highly variable object classes and achieves state-of-the-art results in the PASCAL object detection challenges. While deformable part models have become quite popular, their value had not been demonstrated on difficult benchmarks such as the PASCAL datasets. Our system relies on new methods for discriminative training with partially labeled data. We combine a margin-sensitive approach for data-mining hard negative examples with a formalism we call latent SVM. A latent SVM is a reformulation of MI-SVM in terms of latent variables. A latent SVM is semi-convex and the training problem becomes convex once latent information is specified for the positive examples. This leads to an iterative training algorithm that alternates between fixing latent values for positive examples and optimizing the latent SVM objective function.
Beyond sliding windows: Object localization by efficient subwindow search
- In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR
, 2008
"... Most successful object recognition systems rely on binary classification, deciding only if an object is present or not, but not providing information on the actual object location. To perform localization, one can take a sliding window approach, but this strongly increases the computational cost, be ..."
Abstract
-
Cited by 63 (8 self)
- Add to MetaCart
Most successful object recognition systems rely on binary classification, deciding only if an object is present or not, but not providing information on the actual object location. To perform localization, one can take a sliding window approach, but this strongly increases the computational cost, because the classifier function has to be evaluated over a large set of candidate subwindows. In this paper, we propose a simple yet powerful branchand-bound scheme that allows efficient maximization of a large class of classifier functions over all possible subimages. It converges to a globally optimal solution typically in sublinear time. We show how our method is applicable to different object detection and retrieval scenarios. The achieved speedup allows the use of classifiers for localization that formerly were considered too slow for this task, such as SVMs with a spatial pyramid kernel or nearest neighbor classifiers based on the χ 2-distance. We demonstrate state-of-the-art performance of the resulting systems on the UIUC Cars dataset, the PASCAL VOC 2006 dataset and in the PASCAL VOC 2007 competition. 1.
Pedestrian detection: A benchmark
- In CVPR
, 2009
"... Pedestrian detection is a key problem in computer vision, with several applications including robotics, surveillance and automotive safety. Much of the progress of the past few years has been driven by the availability of challenging public datasets. To continue the rapid rate of innovation, we intr ..."
Abstract
-
Cited by 27 (3 self)
- Add to MetaCart
Pedestrian detection is a key problem in computer vision, with several applications including robotics, surveillance and automotive safety. Much of the progress of the past few years has been driven by the availability of challenging public datasets. To continue the rapid rate of innovation, we introduce the Caltech Pedestrian Dataset, which is two orders of magnitude larger than existing datasets. The dataset contains richly annotated video, recorded from a moving vehicle, with challenging images of low resolution and frequently occluded people. We propose improved evaluation metrics, demonstrating that commonly used perwindow measures are flawed and can fail to predict performance on full images. We also benchmark several promising detection systems, providing an overview of state-of-theart performance and a direct, unbiased comparison of existing methods. Finally, by analyzing common failure cases, we help identify future research directions for the field.
Gool. 3D urban scene modeling integrating recognition and reconstruction
- IJCV
, 2008
"... Abstract — Supplying realistically textured 3D city models at ground level promises to be useful for pre-visualizing upcoming traffic situations in car navigation systems. Because this previsualization can be rendered from the expected future viewpoints of the driver, the required maneuver will be m ..."
Abstract
-
Cited by 26 (1 self)
- Add to MetaCart
Abstract — Supplying realistically textured 3D city models at ground level promises to be useful for pre-visualizing upcoming traffic situations in car navigation systems. Because this previsualization can be rendered from the expected future viewpoints of the driver, the required maneuver will be more easily understandable. 3D city models can be reconstructed from the imagery recorded by surveying vehicles. The vastness of image material gathered by these vehicles, however, puts extreme demands on vision algorithms to ensure their practical usability. Algorithms need to be as fast as possible and should result in compact, memory efficient 3D city models for future ease of distribution and visualization. For the considered application, these are not contradictory demands. Simplified geometry assumptions can speed up vision algorithms while automatically guaranteeing compact geometry models. In this paper, we present a novel city modeling framework which builds upon this philosophy to create 3D content at high speed. Objects in the environment, such as cars and pedestrians, may however disturb the reconstruction, as they violate the simplified geometry assumptions, leading to visually unpleasant artifacts and degrading the visual realism of the resulting 3D city model. Unfortunately, such objects are prevalent in urban scenes. We therefore extend the reconstruction framework by integrating it with an object recognition module that automatically detects cars in the input video streams and localizes them in 3D. The two components of our system are tightly integrated and benefit from each other’s continuous input. 3D reconstruction delivers geometric scene context, which greatly helps improve detection precision. The detected car locations, on the other hand, are used to instantiate virtual placeholder models which augment the visual realism of the reconstructed city model. Index Terms — city modeling, structure from motion, 3D reconstruction, object detection, temporal integration I.
Class-specific hough forests for object detection
- In Proceedings IEEE Conference Computer Vision and Pattern Recognition
, 2009
"... We present a method for the detection of instances of an object class, such as cars or pedestrians, in natural images. Similarly to some previous works, this is accomplished via generalized Hough transform, where the detections of individual object parts cast probabilistic votes for possible locatio ..."
Abstract
-
Cited by 24 (10 self)
- Add to MetaCart
We present a method for the detection of instances of an object class, such as cars or pedestrians, in natural images. Similarly to some previous works, this is accomplished via generalized Hough transform, where the detections of individual object parts cast probabilistic votes for possible locations of the centroid of the whole object; the detection hypotheses then correspond to the maxima of the Hough image that accumulates the votes from all parts. However, whereas the previous methods detect object parts using generative codebooks of part appearances, we take a more discriminative approach to object part detection. Towards this end, we train a class-specific Hough forest, which is a random forest that directly maps the image patch appearance to the probabilistic vote about the possible location of the object centroid. We demonstrate that Hough forests improve the results of the Hough-transform object detection significantly and achieve state-of-the-art performance for several classes and datasets. 1.
Efficient Subwindow Search: A Branch and Bound Framework for Object Localization
- IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
"... Most successful object recognition systems rely on binary classification, deciding only if an object is present or not, but not providing information on the actual object location. To estimate the object’s location one can take a sliding window approach, but this strongly increases the computational ..."
Abstract
-
Cited by 18 (4 self)
- Add to MetaCart
Most successful object recognition systems rely on binary classification, deciding only if an object is present or not, but not providing information on the actual object location. To estimate the object’s location one can take a sliding window approach, but this strongly increases the computational cost, because the classifier or similarity function has to be evaluated over a large set of candidate subwindows. In this paper, we propose a simple yet powerful branch and bound scheme that allows efficient maximization of a large class of quality functions over all possible subimages. It converges to a globally optimal solution typically in linear or even sublinear time, in constrast to the quadratic scaling of exhaustive or sliding window search. We show how our method is applicable to different object detection and image retrieval scenarios. The achieved speedup allows the use of classifiers for localization that formerly were considered too slow for this task, such as SVMs with a spatial pyramid kernel or nearest neighbor classifiers based on the χ²-distance. We demonstrate state-of-the-art localization performance of the resulting systems on the
Robust tracking-by-detection using a detector confidence particle filter
- In ICCV
, 2009
"... We propose a novel approach for multi-person trackingby-detection in a particle filtering framework. In addition to final high-confidence detections, our algorithm uses the continuous confidence of pedestrian detectors and online trained, instance-specific classifiers as a graded observation model. ..."
Abstract
-
Cited by 15 (0 self)
- Add to MetaCart
We propose a novel approach for multi-person trackingby-detection in a particle filtering framework. In addition to final high-confidence detections, our algorithm uses the continuous confidence of pedestrian detectors and online trained, instance-specific classifiers as a graded observation model. Thus, generic object category knowledge is complemented by instance-specific information. A main contribution of this paper is the exploration of how these unreliable information sources can be used for multi-person tracking. The resulting algorithm robustly tracks a large number of dynamically moving persons in complex scenes with occlusions, does not rely on background modeling, and operates entirely in 2D (requiring no camera or ground plane calibration). Our Markovian approach relies only on information from the past and is suitable for online applications. We evaluate the performance on a variety of datasets and show that it improves upon state-of-the-art methods. 1.
Multiple component learning for object detection
- In Proc. of ECCV
, 2008
"... Abstract. Object detection is one of the key problems in computer vision. In the last decade, discriminative learning approaches have proven effective in detecting rigid objects, achieving very low false positives rates. The field has also seen a resurgence of part-based recognition methods, with im ..."
Abstract
-
Cited by 12 (3 self)
- Add to MetaCart
Abstract. Object detection is one of the key problems in computer vision. In the last decade, discriminative learning approaches have proven effective in detecting rigid objects, achieving very low false positives rates. The field has also seen a resurgence of part-based recognition methods, with impressive results on highly articulated, diverse object categories. In this paper we propose a discriminative learning approach for detection that is inspired by part-based recognition approaches. Our method, Multiple Component Learning (mcl), automatically learns individual component classifiers and combines these into an overall classifier. Unlike previous methods, which rely on either fairly restricted part models or labeled part data, mcl learns powerful component classifiers in a weakly supervised manner, where object labels are provided but part labels are not. The basis of mcl lies in learning a set classifier; we achieve this by combining boosting with weakly supervised learning, specifically the Multiple Instance Learning framework (mil). mcl is general, and we demonstrate results on a range of data from computer audition and computer vision. In particular, mcl outperforms all existing methods on the challenging INRIA pedestrian detection dataset, and unlike methods that are not part-based, mcl is quite robust to occlusions. 1
P.: On detection of multiple object instances using hough transforms
, 2010
"... To detect multiple objects of interest, the methods based on Hough transform use non-maxima supression or mode seeking in order to locate and to distinguish peaks in Hough images. Such postprocessing requires tuning of extra parameters and is often fragile, especially when objects of interest tend t ..."
Abstract
-
Cited by 11 (5 self)
- Add to MetaCart
To detect multiple objects of interest, the methods based on Hough transform use non-maxima supression or mode seeking in order to locate and to distinguish peaks in Hough images. Such postprocessing requires tuning of extra parameters and is often fragile, especially when objects of interest tend to be closely located. In the paper, we develop a new probabilistic framework that is in many ways related to Hough transform, sharing its simplicity and wide applicability. At the same time, the framework bypasses the problem of multiple peaks identification in Hough images, and permits detection of multiple objects without invoking nonmaximum suppression heuristics. As a result, the experiments demonstrate a significant improvement in detection accuracy both for the classical task of straight line detection and for a more modern category-level (pedestrian) detection problem. 1. Hough Transform in Object Detection The Hough transform [10] is one of the classical computer vision techniques which dates 50 years back. It was initially suggested as a method for line detection in edge maps of images but was then extended to detect general low-parametric objects such as circles [2]. In recent years, Hough-based methods were successful adapted to the problem of part-based category-level object detection where they have obtained state-of-the-art results for some popular datasets [12, 13, 7, 8, 15, 3]. Both the classical Hough transform and its more modern variants proceed by converting the input image into a new representation called the Hough image which lives in a domain called the Hough space (Figure 1). Each point in the Hough space corresponds to a hypothesis about the object of interest being present in the original image at a particular location and configuration. Any Hough transform based method essentially works by splitting the input image into a set of voting elements. Each such element votes for the hypotheses that might have generated this element. For instance, a feature that fires ∗ The first two authors were with Microsoft Research through the initial stages of the work and are currently supported by Microsoft Research programs in Russia. Victor Lempitsky is also supported by EU under ERC
Object Recognition as Ranking Holistic Figure-Ground Hypotheses
- In CVPR, 2010. 7
"... We present an approach to visual object-class recognition and segmentation based on a pipeline that combines multiple, holistic figure-ground hypotheses generated in a bottom-up, object independent process. Decisions are performed based on continuous estimates of the spatial overlap between image se ..."
Abstract
-
Cited by 10 (2 self)
- Add to MetaCart
We present an approach to visual object-class recognition and segmentation based on a pipeline that combines multiple, holistic figure-ground hypotheses generated in a bottom-up, object independent process. Decisions are performed based on continuous estimates of the spatial overlap between image segment hypotheses and each putative class. We differ from existing approaches not only in our seemingly unreasonable assumption that good object-level segments can be obtained in a feed-forward fashion, but also in framing recognition as a regression problem. Instead of focusing on a one-vs-all winning margin that can scramble ordering inside the non-maximum (non-winning) set, learning produces a globally consistent ranking with close ties to segment quality, hence to the extent entire object or part hypotheses spatially overlap with the ground truth. We demonstrate results beyond the current state of the art for image classification, object detection and semantic segmentation, in a number of challenging datasets including Caltech-101, ETHZ-Shape and PASCAL VOC 2009. 1.

