Modeling the World from Internet Photo Collections
International Journal of Computer Vision, 2007
Cited by 45 (1 self)
There are billions of photographs on the Internet, comprising the largest and most diverse photo collection ever assembled. How can computer vision researchers exploit this imagery? This paper explores this question from the standpoint of 3D scene modeling and visualization. We present structure-from-motion and image-based rendering algorithms that operate on hundreds of images downloaded as a result of keyword-based image search queries like “Notre Dame” or “Trevi Fountain.” This approach, which we call Photo Tourism, has enabled reconstructions of numerous well-known world sites. This paper presents these algorithms and results as a first step towards 3D modeling of the world’s well-photographed sites, cities, and landscapes from Internet imagery, and discusses key open problems and challenges for the research community.
A general solution to the P4P problem for camera with unknown focal length
2008
Cited by 17 (6 self)
This paper presents a general solution to the determination of the pose of a perspective camera with unknown focal length from images of four 3D reference points. Our problem is a generalization of the P3P and P4P problems previously developed for fully calibrated cameras. Given four 2D-to-3D correspondences, we estimate the camera position and orientation and recover the camera focal length. We formulate the problem and provide a minimal solution from four points by solving a system of algebraic equations. We compare the hidden variable resultant and Gröbner basis techniques for solving the algebraic equations of our problem. By evaluating them on synthetic and real data, we show that the Gröbner basis technique provides stable results.
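To make the unknowns of the P4P problem with unknown focal length concrete, the sketch below writes out the forward projection model that such a solver inverts. It assumes square pixels, zero skew and the principal point at the image origin, so the calibration matrix is diag(f, f, 1); these are illustrative assumptions, not details taken from the abstract. Each correspondence yields two scalar equations, so four points give eight equations for seven unknowns (3 rotation, 3 translation, 1 focal length). The function names are hypothetical, and the code only evaluates residuals; it is not the Gröbner basis solver.

```python
import math

def rot_z(theta):
    """Rotation about the z-axis as a 3x3 nested list (used here only for test poses)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def project(f, R, t, X):
    """Pinhole projection with assumed K = diag(f, f, 1)."""
    # Camera-frame coordinates: Xc = R X + t
    Xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    return (f * Xc[0] / Xc[2], f * Xc[1] / Xc[2])

def residuals(f, R, t, points3d, points2d):
    """Two residuals per 2D-to-3D correspondence; four points give eight."""
    out = []
    for X, (u, v) in zip(points3d, points2d):
        pu, pv = project(f, R, t, X)
        out += [pu - u, pv - v]
    return out
```

A pose and focal length that explain all four correspondences drive every residual to zero, while a wrong focal length with the same pose does not; a P4Pf solver searches for the (R, t, f) satisfying these equations.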
RGB-D Mapping: Using Depth Cameras for Dense 3D Modeling of Indoor Environments
In RGB-D: Advanced Reasoning with Depth Cameras Workshop in conjunction with RSS, 2010
Cited by 16 (5 self)
RGB-D cameras are novel sensing systems that capture RGB images along with per-pixel depth information. In this paper we investigate how such cameras can be used in the context of robotics, specifically for building dense 3D maps of indoor environments. Such maps have applications in robot navigation, manipulation, semantic mapping, and telepresence. We present RGB-D Mapping, a full 3D mapping system that utilizes a novel joint optimization algorithm combining visual features and shape-based alignment. Visual and depth information are also combined for view-based loop closure detection, followed by pose optimization to achieve globally consistent maps. We evaluate RGB-D Mapping on two large indoor environments, and show that it effectively combines the visual and shape information available from RGB-D cameras.
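Both feature-based and shape-based (ICP-style) alignment of RGB-D frames repeatedly need the rigid transform that best maps one set of matched 3D points onto another. A standard closed-form solution is the SVD-based least-squares fit (often called the Kabsch or Umeyama method, here without scale). The sketch below assumes the correspondences are already known; it is a common building block of such pipelines, not the paper's joint optimization, and the function name is hypothetical.

```python
import numpy as np

def fit_rigid_transform(src, dst):
    """Least-squares rotation R and translation t with dst ≈ R @ src + t.

    src, dst: (N, 3) arrays of matched 3D points, N >= 3, not all collinear.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    # Cross-covariance of the centred point sets
    H = (src - mu_s).T @ (dst - mu_d)
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection so that det(R) = +1
    d = np.linalg.det(Vt.T @ U.T)
    D = np.diag([1.0, 1.0, 1.0 if d > 0 else -1.0])
    R = Vt.T @ D @ U.T
    t = mu_d - R @ mu_s
    return R, t
```

Inside ICP this fit alternates with re-estimating nearest-neighbour correspondences; with visual features it is typically wrapped in RANSAC to reject wrong matches.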
Out-of-Core Bundle Adjustment for Large-Scale 3D Reconstruction
Cited by 15 (0 self)
Large-scale 3D reconstruction has recently received much attention from the computer vision community. Bundle adjustment is a key component of 3D reconstruction problems. However, traditional bundle adjustment algorithms require a considerable amount of memory and computational resources. In this paper, we present an extremely efficient, inherently out-of-core bundle adjustment algorithm. We decouple the original problem into several submaps that have their own local coordinate systems and can be optimized in parallel. A key contribution of our algorithm is making as much progress as possible towards optimizing the global non-linear cost function using the fragments of the reconstruction that are currently in core memory. This allows us to converge with very few global sweeps (often only two) through the entire reconstruction. We present experimental results on large-scale 3D reconstruction datasets, both synthetic and real.
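The role of submap-local coordinates in the global cost can be illustrated with a sum of squared reprojection errors in which each point is stored in its submap's frame and first mapped to world coordinates by that submap's pose. This is a minimal sketch under assumed conventions (known focal length, principal point at the origin, poses as (R, t) pairs); the names are hypothetical and it shows only the cost being minimized, not the paper's out-of-core optimizer.

```python
def apply(R, t, X):
    """Rigid transform R X + t, with R a 3x3 nested list."""
    return [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]

def reprojection_error(focal, cam_pose, submap_pose, X_local, observed_uv):
    """Residual (du, dv) for one observation of a submap-local point."""
    # Submap-local coordinates -> world coordinates
    X_world = apply(*submap_pose, X_local)
    # World coordinates -> camera coordinates
    Xc = apply(*cam_pose, X_world)
    u = focal * Xc[0] / Xc[2]
    v = focal * Xc[1] / Xc[2]
    return (u - observed_uv[0], v - observed_uv[1])

def global_cost(focal, observations):
    """Sum of squared reprojection errors.

    Each observation is a tuple (cam_pose, submap_pose, X_local, observed_uv).
    """
    total = 0.0
    for cam_pose, submap_pose, X_local, uv in observations:
        du, dv = reprojection_error(focal, cam_pose, submap_pose, X_local, uv)
        total += du * du + dv * dv
    return total
```

Because points only enter through their submap's pose, moving a whole submap changes a single (R, t) pair rather than every point coordinate, which is what lets submaps be optimized locally and then aligned.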
StereoScan: Dense 3D Reconstruction in Real-time
In IEEE Intelligent Vehicles Symposium, 2011
Cited by 7 (3 self)
Accurate 3D perception from video sequences is a core subject in computer vision and robotics, since it forms the basis of subsequent scene analysis. In practice, however, online requirements often severely limit the usable camera resolution and hence also the reconstruction accuracy. Furthermore, real-time systems often rely on heavy parallelism, which can prevent their use in mobile devices or driver assistance systems, especially where FPGAs cannot be employed. This paper proposes a novel approach to build 3D maps from high-resolution stereo sequences in real time. Inspired by recent progress in stereo matching, we propose a sparse feature matcher in conjunction with an efficient and robust visual odometry algorithm. Our reconstruction pipeline combines both techniques with efficient stereo matching and a multi-view linking scheme for generating consistent 3D point clouds. In our experiments we show that the proposed odometry method achieves state-of-the-art accuracy. Including feature matching, the visual odometry part of our algorithm runs at 25 frames per second, while new depth maps are obtained concurrently at 3-4 fps, sufficient for online 3D reconstruction.
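For a rectified stereo pair, the depth maps such a pipeline produces are tied to disparity by the standard relation Z = f·B/d, with f the focal length in pixels, B the baseline and d the disparity in pixels. The sketch below back-projects one pixel to a 3D point in the left camera frame; the principal point (cu, cv) and the example numbers are illustrative assumptions, not values from the paper, and the function name is hypothetical.

```python
def disparity_to_point(u, v, d, f, baseline, cu, cv):
    """Back-project pixel (u, v) with disparity d from a rectified stereo pair.

    f: focal length in pixels; baseline in metres; (cu, cv): principal point.
    Returns (X, Y, Z) in the left camera frame, in metres.
    """
    if d <= 0:
        raise ValueError("disparity must be positive")
    Z = f * baseline / d  # depth is inversely proportional to disparity
    X = (u - cu) * Z / f
    Y = (v - cv) * Z / f
    return (X, Y, Z)
```

For example, with f = 700 px and B = 0.5 m, a disparity of 35 px gives Z = 10 m. Differentiating Z = f·B/d shows that a disparity error δd produces a depth error of roughly Z²/(f·B)·δd, which is why higher camera resolution (larger f in pixels) improves accuracy at range.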
Gain adaptive real-time stereo streaming
In Int. Conf. on Vision Systems, 2007
Cited by 7 (7 self)
This paper introduces a multi-view stereo matcher that generates depth in real time from a monocular video stream of a static scene. A key feature of our processing pipeline is that it estimates global camera gain changes in the feature tracking stage and efficiently compensates for them in the stereo stage without impacting real-time performance. This is very important for outdoor applications, where the brightness range often far exceeds the dynamic range of the camera. Real-time performance is achieved by leveraging the processing power of the graphics processing unit (GPU) in addition to the CPU. We demonstrate the effectiveness of our approach on videos of urban scenes recorded by a vehicle-mounted camera with auto-gain enabled.
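Global gain compensation can be illustrated with a simple multiplicative model in which a tracked feature's intensity in the next frame is approximately g times its intensity in the previous frame. The sketch below estimates g by least squares over tracked features and then undoes it before a sum-of-absolute-differences comparison. The purely multiplicative model, the separate estimation step and the function names are assumptions for illustration; the paper estimates gain inside its feature tracker.

```python
def estimate_gain(prev_intensities, next_intensities):
    """Least-squares gain g minimising sum (next - g * prev)^2 over tracked features."""
    num = sum(p * n for p, n in zip(prev_intensities, next_intensities))
    den = sum(p * p for p in prev_intensities)
    if den == 0:
        raise ValueError("need at least one non-zero intensity")
    return num / den

def compensated_sad(patch_a, patch_b, gain_b_over_a):
    """Sum of absolute differences after undoing the gain of patch_b relative to patch_a."""
    return sum(abs(a - b / gain_b_over_a) for a, b in zip(patch_a, patch_b))
```

With intensities [10, 20, 30] brightening to [15, 30, 45], the estimated gain is 1.5 and the compensated SAD drops from 30 to 0. Least squares is sensitive to mistracked features, so a robust estimate (for example the median of per-feature ratios) may be preferable in practice.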
Measuring camera translation by the dominant apical angle
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008
Cited by 5 (4 self)
This paper provides a technique for measuring camera translation relative to the scene from two images. We demonstrate that the amount of translation can be reliably measured, for general as well as planar scenes, by the most frequent apical angle: the angle under which the camera centers are seen from the reconstructed scene points. Simulated experiments show that the dominant apical angle is a linear function of the length of the true camera translation. In a real experiment, we demonstrate that by skipping image pairs with too small motion, we can reliably initialize structure from motion, compute an accurate camera trajectory in order to rectify images, and use the ground plane constraint in recognition of pedestrians in a hand-held video sequence.
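The apical angle of a reconstructed point X is the angle at X between the rays towards the two camera centres C1 and C2. The sketch below computes it per point and takes the centre of the most populated histogram bin as the dominant angle. The 0.5° bin width and the function names are assumptions for illustration, not parameters from the paper.

```python
import math
from collections import Counter

def apical_angle(X, C1, C2):
    """Angle (radians) at scene point X subtended by camera centres C1 and C2."""
    a = [c - x for c, x in zip(C1, X)]
    b = [c - x for c, x in zip(C2, X)]
    dot = sum(p * q for p, q in zip(a, b))
    na = math.sqrt(sum(p * p for p in a))
    nb = math.sqrt(sum(q * q for q in b))
    # Clamp to guard against rounding slightly outside [-1, 1]
    return math.acos(max(-1.0, min(1.0, dot / (na * nb))))

def dominant_apical_angle(points, C1, C2, bin_width=math.radians(0.5)):
    """Centre of the most populated histogram bin of apical angles."""
    bins = Counter(int(apical_angle(X, C1, C2) // bin_width) for X in points)
    best_bin, _ = bins.most_common(1)[0]
    return (best_bin + 0.5) * bin_width
```

A pair whose dominant apical angle falls below a chosen threshold can be treated as having too small a baseline and skipped, which is how the measure supports the frame selection described above.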
3-D Reconstruction from Sparse Views using Monocular Vision
Cited by 2 (1 self)
We consider the task of creating a 3D model of a large novel environment, given only a small number of images of the scene. This is a difficult problem, because if the images are taken from very different viewpoints or if they contain similar-looking structures, then most geometric reconstruction methods will have great difficulty finding good correspondences. Further, the reconstructions given by most algorithms include only 3D points that were observed in two or more images; a point observed in only a single image would not be reconstructed. In this paper, we show how monocular image cues can be combined with triangulation cues to build a photo-realistic model of a scene given only a few images, even ones taken from very different viewpoints or with little overlap. Our approach begins by oversegmenting each image into small patches (superpixels). It then simultaneously tries to infer the 3D position and orientation of every superpixel in every image. This is done using a Markov Random Field (MRF) that simultaneously reasons about monocular cues and about the relations between multiple image patches, both within the same image and across different images (triangulation cues). MAP inference in our model is efficiently approximated using a series of linear programs, and our algorithm scales well to a large number of images.
Generalized detection and merging of loop closures for video sequences
In 3DPVT, 2008
Cited by 2 (0 self)
In this work we present a method to detect overlaps in image sequences and use this information to integrate overlapping sparse 3D structure from video sequences. The additional temporal information of these images is used to increase robustness over single image pair matching. A scanline optimization formulation is used to compute the best sequence alignment using wide-baseline image matching techniques. Compared to a direct dynamic programming approach, the scanline optimization formulation increases the robustness of sequence alignment for general relative motions. The proposed alignment method is employed to integrate sparse 3D models reconstructed from separate video sequences. In addition, loop closures are detected. Consequently, the 3D modeling process from sequential image data can be split into fast sequence processing and subsequent global integration steps.
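The abstract contrasts its scanline optimization with a direct dynamic programming approach to sequence alignment. For reference, the sketch below shows only that direct DP baseline, not the paper's scanline formulation: given a similarity matrix S[i][j] between frame i of sequence A and frame j of sequence B (assumed here to come from something like wide-baseline match counts), it finds the monotonic alignment that maximizes total similarity, with a fixed penalty for skipping a frame. The gap penalty and function name are illustrative assumptions.

```python
def align_sequences(S, gap=0.0):
    """Monotonic alignment of two frame sequences maximising summed similarity.

    S[i][j]: similarity of frame i in sequence A and frame j in sequence B.
    Skipping a frame in either sequence costs `gap`. Returns (score, pairs).
    """
    n, m = len(S), len(S[0])
    # D[i][j]: best score aligning the first i frames of A with the first j of B
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = D[i - 1][0] - gap
    for j in range(1, m + 1):
        D[0][j] = D[0][j - 1] - gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = max(D[i - 1][j - 1] + S[i - 1][j - 1],
                          D[i - 1][j] - gap,
                          D[i][j - 1] - gap)
    # Backtrack from the end to recover the matched frame pairs
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if D[i][j] == D[i - 1][j - 1] + S[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif D[i][j] == D[i - 1][j] - gap:
            i -= 1
        else:
            j -= 1
    return D[n][m], pairs[::-1]
```

This baseline enforces a single monotonic path through the similarity matrix, which fits sequences traversed in the same direction; the abstract reports that its scanline formulation is more robust for general relative motions.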
Toward automatic 3D modeling of scenes using a generic camera model
In CVPR’08
Cited by 2 (2 self)
maxime.lhuillier.free.fr
The automatic reconstruction of 3D models from image sequences is still a very active field of research. All existing methods are designed for a given camera model, and a new (and ambitious) challenge is 3D modeling with a method that is applicable to any kind of camera. A similar approach was recently suggested for structure-from-motion thanks to the use of generic camera models. In this paper, we first introduce geometric tools designed for 3D scene modeling with a generic camera model. Then, these tools are used to address several issues: matching errors, a wide range of point depths, depth discontinuities, and viewpoint selection for reconstruction. Experiments are provided for perspective and catadioptric cameras.
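A generic camera model describes each pixel by a viewing ray (an origin and a direction) rather than by a specific projection formula, so geometric tools are written purely in terms of rays. One simple example, not necessarily the tool used in the paper, is midpoint triangulation: the estimate is the midpoint of the shortest segment between two viewing lines, which works unchanged for perspective and catadioptric cameras. The function name is hypothetical, and the sketch treats rays as infinite lines without checking that the point lies in front of both cameras.

```python
def midpoint_triangulate(o1, d1, o2, d2):
    """Midpoint of the shortest segment between lines o1 + s*d1 and o2 + t*d2."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    w0 = [a - b for a, b in zip(o1, o2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w0), dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel")
    # Parameters of the closest points, from setting the gradient of
    # |w0 + s*d1 - t*d2|^2 with respect to s and t to zero
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = [o + s * k for o, k in zip(o1, d1)]
    p2 = [o + t * k for o, k in zip(o2, d2)]
    return [(x + y) / 2 for x, y in zip(p1, p2)]
```

When the two rays intersect, the result is the intersection point; when they are skew, as is usual with noisy matches, it splits the difference between them.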

