A benchmark for the evaluation of RGB-D SLAM systems. In International Conference on Intelligent Robots and Systems (IROS), 2012.
"... Abstract — In this paper, we present a novel benchmark for the evaluation of RGB-D SLAM systems. We recorded a large set of image sequences from a Microsoft Kinect with highly accurate and time-synchronized ground truth camera poses from a motion capture system. The sequences contain both the color ..."
Abstract
-
Cited by 70 (11 self)
- Add to MetaCart
(Show Context)
In this paper, we present a novel benchmark for the evaluation of RGB-D SLAM systems. We recorded a large set of image sequences from a Microsoft Kinect with highly accurate and time-synchronized ground truth camera poses from a motion capture system. The sequences contain both the color and depth images in full sensor resolution (640 × 480) at video frame rate (30 Hz). The ground-truth trajectory was obtained from a motion-capture system with eight high-speed tracking cameras (100 Hz). The dataset consists of 39 sequences that were recorded in an office environment and an industrial hall. The dataset covers a large variety of scenes and camera motions. We provide sequences for debugging with slow motions as well as longer trajectories with and without loop closures. Most sequences were recorded from a handheld Kinect with unconstrained 6-DOF motions, but we also provide sequences from a Kinect mounted on a Pioneer 3 robot that was manually navigated through a cluttered indoor environment. To stimulate the comparison of different approaches, we provide automatic evaluation tools both for the evaluation of drift of visual odometry systems and the global pose error of SLAM systems. The benchmark website [1] contains all data, detailed descriptions of the scenes, specifications of the data formats, sample code, and evaluation tools.
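The released tools measure odometry drift via relative pose error and global SLAM accuracy via absolute trajectory error (ATE). As a hedged illustration, the sketch below computes an ATE-style RMSE after rigidly aligning an estimated trajectory to ground truth with the Kabsch method; the array names are hypothetical and this is a simplification, not the benchmark's actual scripts.

    import numpy as np

    def ate_rmse(gt, est):
        # gt, est: (N, 3) time-associated camera positions.
        gt_c = gt - gt.mean(axis=0)
        est_c = est - est.mean(axis=0)
        # Kabsch: best rigid rotation mapping est onto gt.
        H = est_c.T @ gt_c
        U, _, Vt = np.linalg.svd(H)
        d = np.sign(np.linalg.det(Vt.T @ U.T))
        R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
        # RMSE of per-frame translational error after alignment.
        err = est_c @ R.T - gt_c
        return np.sqrt((err ** 2).sum(axis=1).mean())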
Teaching 3D geometry to deformable part models. In Proc. CVPR, 2012.
"... Current object class recognition systems typically target 2D bounding box localization, encouraged by benchmark data sets, such as Pascal VOC. While this seems suitable for the detection of individual objects, higher-level applica-tions such as 3D scene understanding or 3D object tracking would bene ..."
Abstract
-
Cited by 40 (7 self)
- Add to MetaCart
(Show Context)
Current object class recognition systems typically target 2D bounding box localization, encouraged by benchmark data sets, such as Pascal VOC. While this seems suitable for the detection of individual objects, higher-level applications such as 3D scene understanding or 3D object tracking would benefit from more fine-grained object hypotheses incorporating 3D geometric information, such as viewpoints or the locations of individual parts. In this paper, we help narrow the representational gap between the ideal input of a scene understanding system and object class detector output, by designing a detector particularly tailored towards 3D geometric reasoning. In particular, we extend the successful discriminatively trained deformable part models to include both estimates of viewpoint and 3D parts that are consistent across viewpoints. We experimentally verify that adding 3D geometric information comes at minimal performance loss w.r.t. 2D bounding box localization, but outperforms prior work in 3D viewpoint estimation and ultra-wide baseline matching.
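In spirit, the viewpoint extension scores one viewpoint-specific mixture component per discretized azimuth and reads the viewpoint off the winning component. A toy sketch under that assumption (the response maps and names are hypothetical, not the paper's DPM code):

    import numpy as np

    def detect_with_viewpoint(score_maps):
        # score_maps: dict mapping azimuth in degrees to the 2D
        # response map of that viewpoint's component on the image.
        az, smap = max(score_maps.items(), key=lambda kv: kv[1].max())
        y, x = np.unravel_index(smap.argmax(), smap.shape)
        return az, (x, y), float(smap[y, x])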
Estimating the aspect layout of object categories. In CVPR, 2012.
"... In this work we seek to move away from the traditional paradigm for 2D object recognition whereby objects are identified in the image as 2D bounding boxes. We focus instead on: i) detecting objects; ii) identifying their 3D poses; iii) characterizing the geometrical and topological properties of the ..."
Abstract
-
Cited by 30 (8 self)
- Add to MetaCart
In this work we seek to move away from the traditional paradigm for 2D object recognition whereby objects are identified in the image as 2D bounding boxes. We focus instead on: i) detecting objects; ii) identifying their 3D poses; iii) characterizing the geometrical and topological properties of the objects in terms of their aspect configurations in 3D. We call such a characterization an object's aspect layout (see Fig. 1). We propose a new model for solving these problems in a joint fashion from a single image for object categories. Our model is constructed upon a novel framework based on conditional random fields with maximal margin parameter estimation. Extensive experiments are conducted to evaluate our model's performance in determining object pose and layout from images. We achieve superior viewpoint accuracy on three public datasets and present extensive quantitative analysis demonstrating the ability to accurately recover the aspect layout of objects.
3D2PM - 3D deformable part models. In ECCV, 2012.
"... Abstract. As objects are inherently 3-dimensional, they have been mod-eled in 3D in the early days of computer vision. Due to the ambiguities arising from mapping 2D features to 3D models, 2D feature-based models are the predominant paradigm in object recognition today. While such models have shown ..."
Abstract
-
Cited by 29 (4 self)
- Add to MetaCart
(Show Context)
As objects are inherently 3-dimensional, they were modeled in 3D in the early days of computer vision. Due to the ambiguities arising from mapping 2D features to 3D models, 2D feature-based models are the predominant paradigm in object recognition today. While such models have shown competitive bounding box (BB) detection performance, they are clearly limited in their capability of fine-grained reasoning in 3D or continuous viewpoint estimation as required for advanced tasks such as 3D scene understanding. This work extends the deformable part model [1] to a 3D object model. It consists of multiple parts modeled in 3D and a continuous appearance model. As a result, the model generalizes beyond BB-oriented object detection and can be jointly optimized in a discriminative fashion for object detection and viewpoint estimation. Our 3D Deformable Part Model (3D2PM) leverages CAD data of the object class as a 3D geometry proxy.
Understanding indoor scenes using 3D geometric phrases. In CVPR, 2013.
"... Visual scene understanding is a difficult problem inter-leaving object detection, geometric reasoning and scene classification. We present a hierarchical scene model for learning and reasoning about complex indoor scenes which is computationally tractable, can be learned from a reason-able amount of ..."
Abstract
-
Cited by 21 (5 self)
- Add to MetaCart
(Show Context)
Visual scene understanding is a difficult problem interleaving object detection, geometric reasoning, and scene classification. We present a hierarchical scene model for learning and reasoning about complex indoor scenes which is computationally tractable, can be learned from a reasonable amount of training data, and avoids oversimplification. At the core of this approach is the 3D Geometric Phrase Model, which captures the semantic and geometric relationships between objects that frequently co-occur in the same 3D spatial configuration. Experiments show that this model effectively explains scene semantics, geometry, and object groupings from a single image, while also improving individual object detections.
Efficient structured prediction for 3D indoor scene understanding, 2012.
"... Existing approaches to indoor scene understanding formulate the problem as a structured prediction task focusing on estimating the 3D bounding box which best describes the scene layout. Unfortunately, these approaches utilize high order potentials which are computationally intractable and rely on ad ..."
Abstract
-
Cited by 21 (6 self)
- Add to MetaCart
(Show Context)
Existing approaches to indoor scene understanding formulate the problem as a structured prediction task focusing on estimating the 3D bounding box which best describes the scene layout. Unfortunately, these approaches utilize high-order potentials which are computationally intractable and rely on ad-hoc approximations for both learning and inference. In this paper we show that the potentials commonly used in the literature can be decomposed into pairwise potentials by extending the concept of integral images to geometry. As a consequence, no heuristic reduction of the search space is required. In practice, this results in large improvements in performance over the state of the art, while being orders of magnitude faster.
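For intuition, this is the 2D integral-image identity the paper generalizes: one cumulative-sum pass makes the sum over any axis-aligned rectangle a four-lookup, O(1) query, which is what keeps the decomposed pairwise potentials cheap to evaluate. A minimal sketch:

    import numpy as np

    def integral_image(img):
        # Zero-padded cumulative sums so rectangle queries need
        # no boundary special-casing.
        ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
        ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
        return ii

    def rect_sum(ii, r0, c0, r1, c1):
        # Sum of img[r0:r1, c0:c1] via four corner lookups.
        return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]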
SUN3D: A database of big spaces reconstructed using SfM and object labels. In ICCV, 2013.
"... Existing scene understanding datasets contain only a limited set of views of a place, and they lack representations of complete 3D spaces. In this paper, we introduce SUN3D, a large-scale RGB-D video database with camera pose and object labels, capturing the full 3D extent of many places. The tasks ..."
Abstract
-
Cited by 17 (8 self)
- Add to MetaCart
(Show Context)
Existing scene understanding datasets contain only a limited set of views of a place, and they lack representations of complete 3D spaces. In this paper, we introduce SUN3D, a large-scale RGB-D video database with camera pose and object labels, capturing the full 3D extent of many places. The tasks that go into constructing such a dataset are difficult in isolation: hand-labeling videos is painstaking, and structure from motion (SfM) is unreliable for large spaces. But if we combine them together, we make the dataset construction task much easier. First, we introduce an intuitive labeling tool that uses a partial reconstruction to propagate labels from one frame to another. Then we use the object labels to fix errors in the reconstruction. For this, we introduce a generalization of bundle adjustment that incorporates object-to-object correspondences. This algorithm works by constraining points for the same object from different frames to lie inside a fixed-size bounding box, parameterized by its rotation and translation. The SUN3D database, the source code for the generalized bundle adjustment, and the web-based 3D annotation tool are all available at
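As a hedged sketch of that constraint (the names and the hinge form are assumptions, not the paper's exact formulation): points labeled as one object are penalized for leaving a fixed-size box whose rotation R and translation t are free variables of the bundle adjustment, stacked alongside the usual reprojection terms.

    import numpy as np

    def box_residuals(points_world, R, t, half_extent):
        # points_world: (N, 3) triangulated points of one object,
        # gathered across frames. R (3x3), t (3,): box pose.
        # half_extent: (3,) half side lengths of the fixed-size box.
        local = (points_world - t) @ R   # world -> box coordinates
        # Hinge: zero inside the box, linear violation outside.
        return np.maximum(np.abs(local) - half_extent, 0.0)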
Fine-grained categorization for 3D scene understanding. In Proceedings of the 23rd British Machine Vision Conference (BMVC), Surrey, UK, 2012.
"... Fine-grained categorization of object classes is receiving increased attention, since it promises to automate classification tasks that are difficult even for humans, such as the distinction between different animal species. In this paper, we consider fine-grained categorization for a different reas ..."
Abstract
-
Cited by 15 (6 self)
- Add to MetaCart
(Show Context)
Fine-grained categorization of object classes is receiving increased attention, since it promises to automate classification tasks that are difficult even for humans, such as the distinction between different animal species. In this paper, we consider fine-grained categorization for a different reason: following the intuition that fine-grained categories encode metric information, we aim to generate metric constraints from fine-grained category predictions, for the benefit of 3D scene understanding. To that end, we propose two novel methods for fine-grained classification, both based on part information, as well as a new fine-grained category dataset of car types. We demonstrate superior performance of our methods over state-of-the-art classifiers, and show first promising results for estimating the depth of objects from fine-grained category predictions from a monocular camera.
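The metric intuition can be made concrete with the pinhole relation: if a fine-grained label predicts an object's real-world size, its depth follows from the focal length and the object's image size. A minimal sketch with a hypothetical lookup table of car heights:

    # Hypothetical metric priors per fine-grained class (meters).
    CAR_HEIGHT_M = {"sedan": 1.45, "suv": 1.75, "van": 1.95}

    def depth_from_category(category, bbox_height_px, focal_px):
        # Pinhole camera: depth = focal * real_height / pixel_height.
        return focal_px * CAR_HEIGHT_M[category] / bbox_height_px

    # e.g. a sedan 120 px tall under a 1000 px focal length sits
    # at roughly 1000 * 1.45 / 120 ~= 12 m.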
3D Scene Understanding by Voxel-CRF
"... Scene understanding is an important yet very challenging problem in computer vision. In the past few years, researchers have taken advantage of the recent diffusion of depth-RGB (RGB-D) cameras to help simplify the problem of inferring scene semantics. However, while the added 3D geometry is certain ..."
Abstract
-
Cited by 13 (1 self)
- Add to MetaCart
(Show Context)
Scene understanding is an important yet very challenging problem in computer vision. In the past few years, researchers have taken advantage of the recent diffusion of depth-RGB (RGB-D) cameras to help simplify the problem of inferring scene semantics. However, while the added 3D geometry is certainly useful to segment out objects with different depth values, it also adds complications in that the 3D geometry is often incorrect because of noisy depth measurements, and the actual 3D extent of the objects is usually unknown because of occlusions. In this paper we propose a new method that allows us to jointly refine the 3D reconstruction of the scene (raw depth values) while accurately segmenting out the objects or scene elements from the 3D reconstruction. This is achieved by introducing a new model which we call Voxel-CRF. The Voxel-CRF model is based on the idea of constructing a conditional random field over a 3D volume of interest which captures the semantic and 3D geometric relationships among different elements (voxels) of the scene. Such a model allows us to jointly estimate (1) a dense voxel-based 3D reconstruction and (2) the semantic labels associated with each voxel, even in the presence of partial occlusions, using an approximate yet efficient inference strategy. We evaluated our method on the challenging NYU Depth dataset (Versions 1 and 2). Experimental results show that our method achieves competitive accuracy in inferring scene semantics and visually appealing results in improving the quality of the 3D reconstruction. We also demonstrate an interesting application of object removal and scene completion from RGB-D images.
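At its core, such a model assigns each voxel a label by minimizing an energy with unary evidence terms and pairwise terms between neighboring voxels. A minimal Potts-style sketch of that energy over a dense grid (the paper's actual potentials and inference are richer):

    import numpy as np

    def voxel_crf_energy(labels, unary, w_pair=1.0):
        # labels: (X, Y, Z) integer label per voxel.
        # unary: (X, Y, Z, L) cost of assigning each label.
        e = np.take_along_axis(unary, labels[..., None], -1).sum()
        # Potts pairwise term over the 6-neighborhood: penalize
        # each pair of adjacent voxels with differing labels.
        for ax in range(3):
            a = np.moveaxis(labels, ax, 0)
            e += w_pair * np.count_nonzero(a[1:] != a[:-1])
        return e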
Dense reconstruction using 3D object shape priors. In CVPR, IEEE.
"... We propose a formulation of monocular SLAM which combines live dense reconstruction with shape priors-based 3D tracking and reconstruction. Current live dense SLAM approaches are limited to the reconstruction of visible sur-faces. Moreover, most of them are based on the minimi-sation of a photo-cons ..."
Abstract
-
Cited by 13 (1 self)
- Add to MetaCart
(Show Context)
We propose a formulation of monocular SLAM which combines live dense reconstruction with shape-prior-based 3D tracking and reconstruction. Current live dense SLAM approaches are limited to the reconstruction of visible surfaces. Moreover, most of them are based on the minimisation of a photo-consistency error, which usually makes them sensitive to specularities. In the 3D pose recovery literature, problems caused by imperfect and ambiguous image information have been dealt with by using prior shape knowledge. At the same time, the success of depth sensors has shown that combining joint image and depth information drastically increases the robustness of the classical monocular 3D tracking and 3D reconstruction approaches. In this work we link dense SLAM to 3D object pose and shape recovery. More specifically, we automatically augment our SLAM system with object-specific identity, together with 6D pose and additional shape degrees of freedom for the object(s) of known class in the scene, combining image data and depth information for the pose and shape recovery. This leads to a system that allows for fully scaled 3D reconstruction with the known object(s) segmented from the scene. The segmentation enhances the clarity, accuracy and completeness of the maps built by the dense SLAM system, while the dense 3D data aids the segmentation process, yielding faster and more reliable convergence than when using 2D image data alone.