Ensemble of Exemplar-SVMs for Object Detection and Beyond
Cited by 164 (10 self)
This paper proposes a conceptually simple but surprisingly powerful method which combines the effectiveness of a discriminative object detector with the explicit correspondence offered by a nearest-neighbor approach. The method is based on training a separate linear SVM classifier for every exemplar in the training set. Each of these Exemplar-SVMs is thus defined by a single positive instance and millions of negatives. While each detector is quite specific to its exemplar, we empirically observe that an ensemble of such Exemplar-SVMs offers surprisingly good generalization. Our performance on the PASCAL VOC detection task is on par with the much more complex latent part-based model of Felzenszwalb et al., at only a modest computational cost increase. But the central benefit of our approach is that it creates an explicit association between each detection and a single training exemplar. Because most detections show good alignment to their associated exemplar, it is possible to transfer any available exemplar meta-data (segmentation, geometric structure, 3D model, etc.) directly onto the detections, which can then be used as part of overall scene understanding.
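The core idea in this abstract (one linear SVM per exemplar, trained on a single positive and many negatives, with each detection tied to one exemplar) can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the class-weighted hinge loss trained by plain subgradient descent, the cost values, and the synthetic 4-dimensional features are all assumptions made for illustration, and the score calibration a real ensemble would need is omitted.

```python
import numpy as np

def train_exemplar_svm(positive, negatives, c_pos=0.5, c_neg=0.01,
                       lr=0.1, epochs=200):
    """Fit one linear SVM (w, b) from a single positive and many negatives.

    Hypothetical helper: subgradient descent on a class-weighted hinge loss,
    with the lone positive weighted more heavily than each negative.
    """
    x = np.vstack([positive[None, :], negatives])
    y = np.concatenate([[1.0], -np.ones(len(negatives))])
    cost = np.concatenate([[c_pos], np.full(len(negatives), c_neg)])
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(epochs):
        margins = y * (x @ w + b)
        active = margins < 1.0              # samples violating the margin
        grad_w = w - (cost * y * active) @ x
        grad_b = -np.sum(cost * y * active)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def best_exemplar(window, models):
    """Score a window with every exemplar model; return the top index and all scores."""
    scores = np.array([w @ window + b for w, b in models])
    return int(np.argmax(scores)), scores

# Synthetic data (assumption): three exemplars far from a shared negative cloud.
rng = np.random.default_rng(0)
dim = 4
exemplars = [3.0 * np.eye(dim)[k] for k in range(3)]
negatives = rng.normal(0.0, 0.3, size=(200, dim))
models = [train_exemplar_svm(e, negatives) for e in exemplars]

query = exemplars[1] + 0.05                 # a window resembling exemplar 1
best, scores = best_exemplar(query, models)
```

Because `best` indexes a specific training exemplar, any meta-data stored for that exemplar could be looked up and transferred to the detection, which is the association the abstract highlights.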
Recognition using Regions
Cited by 106 (5 self)
This paper presents a unified framework for object detection, segmentation, and classification using regions. Region features are appealing in this context because: (1) they encode shape and scale information of objects naturally; (2) they are only mildly affected by background clutter. Regions have not been popular as features due to their sensitivity to segmentation errors. In this paper, we start by producing a robust bag of overlaid regions for each image using the method of Arbeláez et al. (CVPR 2009). Each region is represented by a rich set of image cues (shape, color and texture). We then learn region weights using a max-margin framework. In detection and segmentation, we apply a generalized Hough voting scheme to generate hypotheses of object locations, scales and support, followed by a verification classifier and a constrained segmenter on each hypothesis. The proposed approach significantly outperforms the state of the art on the ETHZ shape database (87.1% average detection rate compared to Ferrari et al.’s 67.2%), and achieves competitive performance on the Caltech 101 database.
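The voting step mentioned in this abstract can be illustrated with a heavily simplified sketch. It assumes each region already carries a predicted offset to the object center and a learned weight; the function name, the coarse accumulator grid, the cell size, and the sample values are hypothetical, and the scale and support dimensions the paper votes over are left out.

```python
import numpy as np

def hough_vote(region_centers, offsets, weights, grid_shape, cell=8):
    """Accumulate weighted votes for object centers on a coarse grid.

    Each region at (cx, cy) votes for the center (cx + dx, cy + dy);
    the vote is added to the grid cell containing that point.
    """
    acc = np.zeros(grid_shape)
    for (cx, cy), (dx, dy), w in zip(region_centers, offsets, weights):
        gx = int((cx + dx) // cell)
        gy = int((cy + dy) // cell)
        if 0 <= gy < grid_shape[0] and 0 <= gx < grid_shape[1]:
            acc[gy, gx] += w
    return acc

# Hypothetical regions: three agree on one center, one is an outlier.
centers = [(10, 12), (40, 30), (70, 20), (90, 90)]
offsets = [(30, 22), (2, 5), (-28, 13), (0, 0)]
weights = [1.0, 0.8, 0.6, 0.5]

acc = hough_vote(centers, offsets, weights, grid_shape=(16, 16))
peak = tuple(int(i) for i in np.unravel_index(np.argmax(acc), acc.shape))
```

The accumulator peak is only a hypothesis; in the pipeline described above it would still go through the verification classifier and the constrained segmenter.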
How important are ‘deformable parts’ in the deformable parts model?
In ECCV Workshop on Parts and Attributes, 2012
Cited by 41 (4 self)
The Deformable Parts Model (DPM) has recently emerged as a very useful and popular tool for tackling the intra-category diversity problem in object detection. In this paper, we summarize the key insights from our empirical analysis of the important elements constituting this detector. More specifically, we study the relationship between the role of deformable parts and the mixture model components within this detector, and understand their relative importance. First, we find that by increasing the number of components, and switching the initialization step from their aspect-ratio, left-right flipping heuristics to appearance-based clustering, considerable improvement in performance is obtained. But more intriguingly, we observed that with these new components, the part deformations can now be turned off while still obtaining results that are almost on par with the original DPM detector.
Teaching 3D geometry to deformable part models
In Proc. CVPR, 2012
Cited by 40 (7 self)
Current object class recognition systems typically target 2D bounding box localization, encouraged by benchmark data sets, such as Pascal VOC. While this seems suitable for the detection of individual objects, higher-level applications such as 3D scene understanding or 3D object tracking would benefit from more fine-grained object hypotheses incorporating 3D geometric information, such as viewpoints or the locations of individual parts. In this paper, we help narrow the representational gap between the ideal input of a scene understanding system and object class detector output, by designing a detector particularly tailored towards 3D geometric reasoning. In particular, we extend the successful discriminatively trained deformable part models to include both estimates of viewpoint and 3D parts that are consistent across viewpoints. We experimentally verify that adding 3D geometric information comes at minimal performance loss w.r.t. 2D bounding box localization, but outperforms prior work in 3D viewpoint estimation and ultra-wide baseline matching.
Viewpoint-Aware Object Detection and Pose Estimation
Cited by 37 (0 self)
We describe an approach to category-level detection and viewpoint estimation for rigid 3D objects from single 2D images. In contrast to many existing methods, we directly integrate 3D reasoning with an appearance-based voting architecture. Our method relies on a nonparametric representation of a joint distribution of shape and appearance of the object class. Our voting method employs a novel parametrization of joint detection and viewpoint hypothesis space, allowing efficient accumulation of evidence. We combine this with a re-scoring and refinement mechanism, using an ensemble of view-specific Support Vector Machines. We evaluate the performance of our approach in detection and pose estimation of cars on a number of benchmark datasets.
Learning part-based templates from large collections of 3D shapes
Trans. on Graphics (Proc. of SIGGRAPH), 2013
Cited by 33 (19 self)
As large repositories of 3D shape collections continue to grow, understanding the data, especially encoding the inter-model similarity and their variations, is of central importance. For example, many data-driven approaches now rely on access to semantic segmentation information, accurate inter-model point-to-point correspondence, and deformation models that characterize the model collections. Existing approaches, however, are either supervised requiring manual labeling; or employ super-linear matching algorithms and thus are unsuited for analyzing large collections spanning many thousands of models. We propose an automatic algorithm that starts with an initial template model and then jointly optimizes for part segmentation, point-to-point surface correspondence, and a compact deformation model to best explain the input model collection. As output, the algorithm produces a set of probabilistic part-based templates that groups the original models into clusters of models capturing their styles and variations. We evaluate our algorithm on several standard datasets and demonstrate its scalability by analyzing much larger collections of up to thousands of shapes.
Todorovic: From Contours to 3D Object Detection and Pose Estimation
IEEE International Conference on Computer Vision, 2011
Cited by 31 (1 self)
This paper addresses view-invariant object detection and pose estimation from a single image. While recent work focuses on object-centered representations of point-based object features, we revisit the viewer-centered framework, and use image contours as basic features. Given training examples of arbitrary views of an object, we learn a sparse object model in terms of a few view-dependent shape templates. The shape templates are jointly used for detecting object occurrences and estimating their 3D poses in a new image. Instrumental to this is our new mid-level feature, called bag of boundaries (BOB), aimed at lifting from individual edges toward their more informative summaries for identifying object boundaries amidst the background clutter. In inference, BOBs are placed on deformable grids both in the image and the shape templates, and then matched. This is formulated as a convex optimization problem that accommodates invariance to non-rigid, locally affine shape deformations. Evaluation on benchmark datasets demonstrates our competitive results relative to the state of the art.
Estimating the aspect layout of object categories
In CVPR, 2012
Cited by 30 (8 self)
In this work we seek to move away from the traditional paradigm for 2D object recognition whereby objects are identified in the image as 2D bounding boxes. We focus instead on: i) detecting objects; ii) identifying their 3D poses; iii) characterizing the geometrical and topological properties of the objects in terms of their aspect configurations in 3D. We call such characterization an object’s aspect layout (see Fig. 1). We propose a new model for solving these problems in a joint fashion from a single image for object categories. Our model is constructed upon a novel framework based on conditional random fields with maximal margin parameter estimation. Extensive experiments are conducted to evaluate our model’s performance in determining object pose and layout from images. We achieve superior viewpoint accuracy results on three public datasets and show extensive quantitative analysis to demonstrate the ability of accurately recovering the aspect layout of objects.
3D2PM - 3D deformable part models
In ECCV, 2012
Cited by 29 (4 self)
As objects are inherently 3-dimensional, they have been modeled in 3D in the early days of computer vision. Due to the ambiguities arising from mapping 2D features to 3D models, 2D feature-based models are the predominant paradigm in object recognition today. While such models have shown competitive bounding box (BB) detection performance, they are clearly limited in their capability of fine-grained reasoning in 3D or continuous viewpoint estimation as required for advanced tasks such as 3D scene understanding. This work extends the deformable part model [1] to a 3D object model. It consists of multiple parts modeled in 3D and a continuous appearance model. As a result, the model generalizes beyond BB oriented object detection and can be jointly optimized in a discriminative fashion for object detection and viewpoint estimation. Our 3D Deformable Part Model (3D2PM) leverages CAD data of the object class as a 3D geometry proxy.
3D Object Detection and Viewpoint Estimation with a Deformable 3D Cuboid Model
Cited by 29 (3 self)
This paper addresses the problem of category-level 3D object detection. Given a monocular image, our aim is to localize the objects in 3D by enclosing them with tight oriented 3D bounding boxes. We propose a novel approach that extends the well-acclaimed deformable part-based model [1] to reason in 3D. Our model represents an object class as a deformable 3D cuboid composed of faces and parts, which are both allowed to deform with respect to their anchors on the 3D box. We model the appearance of each face in fronto-parallel coordinates, thus effectively factoring out the appearance variation induced by viewpoint. Our model reasons about face visibility patterns called aspects. We train the cuboid model jointly and discriminatively and share weights across all aspects to attain efficiency. Inference then entails sliding and rotating the box in 3D and scoring object hypotheses. While for inference we discretize the search space, the variables are continuous in our model. We demonstrate the effectiveness of our approach in indoor and outdoor scenarios, and show that our approach significantly outperforms the state-of-the-art in both 2D [1] and 3D object detection [2].