Results 1 - 10 of 439
MonoSLAM: Real-time single camera SLAM
- IEEE Transactions on Pattern Analysis and Machine Intelligence, 2007
"... Abstract—We present a real-time algorithm which can recover the 3D trajectory of a monocular camera, moving rapidly through a previously unknown scene. Our system, which we dub MonoSLAM, is the first successful application of the SLAM methodology from mobile robotics to the “pure vision ” domain of ..."
Abstract
-
Cited by 490 (26 self)
Abstract—We present a real-time algorithm which can recover the 3D trajectory of a monocular camera, moving rapidly through a previously unknown scene. Our system, which we dub MonoSLAM, is the first successful application of the SLAM methodology from mobile robotics to the “pure vision” domain of a single uncontrolled camera, achieving real-time but drift-free performance inaccessible to Structure from Motion approaches. The core of the approach is the online creation of a sparse but persistent map of natural landmarks within a probabilistic framework. Our key novel contributions include an active approach to mapping and measurement, the use of a general motion model for smooth camera movement, and solutions for monocular feature initialization and feature orientation estimation. Together, these add up to an extremely efficient and robust algorithm which runs at 30 Hz with standard PC and camera hardware. This work not only extends the range of robotic systems in which SLAM can be usefully applied, but also opens up new areas. We present applications of MonoSLAM to real-time 3D localization and mapping for a high-performance full-size humanoid robot and live augmented reality with a hand-held camera. Index Terms—Autonomous vehicles, 3D/stereo scene analysis, tracking.
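
As a rough illustration of the probabilistic machinery such a system rests on, the sketch below shows an EKF-style prediction step under a smooth constant-velocity motion model, with the state reduced to camera position and linear velocity (orientation and map landmarks omitted). The names and noise values are illustrative assumptions, not the authors' implementation.

    import numpy as np

    def ekf_predict(x, P, dt, accel_sigma=1.0):
        """Propagate the camera state and covariance over one frame interval."""
        F = np.eye(6)
        F[0:3, 3:6] = dt * np.eye(3)               # position integrates velocity
        # Process noise from unknown accelerations acting over dt.
        G = np.vstack([0.5 * dt**2 * np.eye(3), dt * np.eye(3)])
        Q = accel_sigma**2 * (G @ G.T)
        return F @ x, F @ P @ F.T + Q

    # Example: one prediction step at 30 Hz.
    x = np.zeros(6)                                 # [px, py, pz, vx, vy, vz]
    P = np.eye(6) * 0.01
    x, P = ekf_predict(x, P, dt=1.0 / 30.0)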
KinectFusion: Real-Time Dense Surface Mapping and Tracking
"... We present a system for accurate real-time mapping of complex and arbitrary indoor scenes in variable lighting conditions, using only a moving low-cost depth camera and commodity graphics hardware. We fuse all of the depth data streamed from a Kinect sensor into a single global implicit surface mo ..."
Abstract
-
Cited by 280 (25 self)
We present a system for accurate real-time mapping of complex and arbitrary indoor scenes in variable lighting conditions, using only a moving low-cost depth camera and commodity graphics hardware. We fuse all of the depth data streamed from a Kinect sensor into a single global implicit surface model of the observed scene in real time. The current sensor pose is simultaneously obtained by tracking the live depth frame relative to the global model using a coarse-to-fine iterative closest point (ICP) algorithm, which uses all of the observed depth data available. We demonstrate the advantages of tracking against the growing full surface model compared with frame-to-frame tracking, obtaining tracking and mapping results in constant time within room-sized scenes with limited drift and high accuracy.
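
To give a sense of what "fusing all depth data into a single global implicit surface model" involves, here is a heavily simplified sketch of a weighted truncated signed distance function (TSDF) update for one depth frame. The voxel grid, intrinsics K, camera pose T_wc, and truncation distance are illustrative assumptions; the real system runs this on the GPU with raycasting and ICP tracking around it.

    import numpy as np

    def fuse_depth(tsdf, weight, origin, voxel_size, depth, K, T_wc, trunc=0.05):
        """Fuse one depth image (metres) into the TSDF volume in place."""
        nx, ny, nz = tsdf.shape
        ii, jj, kk = np.meshgrid(np.arange(nx), np.arange(ny), np.arange(nz), indexing='ij')
        pts_w = origin + voxel_size * np.stack([ii, jj, kk], -1).reshape(-1, 3)
        T_cw = np.linalg.inv(T_wc)                       # world-to-camera
        pts_c = pts_w @ T_cw[:3, :3].T + T_cw[:3, 3]
        z = pts_c[:, 2]
        z_safe = np.where(z > 1e-6, z, 1.0)
        u = np.round(K[0, 0] * pts_c[:, 0] / z_safe + K[0, 2]).astype(int)
        v = np.round(K[1, 1] * pts_c[:, 1] / z_safe + K[1, 2]).astype(int)
        h, w = depth.shape
        valid = (z > 1e-6) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        d = np.zeros_like(z)
        d[valid] = depth[v[valid], u[valid]]
        sdf = d - z                                      # signed distance along the ray
        update = valid & (d > 0) & (sdf > -trunc)
        new_val = np.clip(sdf / trunc, -1.0, 1.0)
        f, wgt = tsdf.reshape(-1), weight.reshape(-1)    # views into the volumes
        f[update] = (wgt[update] * f[update] + new_val[update]) / (wgt[update] + 1.0)
        wgt[update] += 1.0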
Fusing Points and Lines for High Performance Tracking
- In International Conference on Computer Vision, 2005
"... This paper addresses the problem of real-time 3D modelbased tracking by combining point-based and edge-based tracking systems. We present a careful analysis of the properties of these two sensor systems and show that this leads to some non-trivial design choices that collectively yield extremely hig ..."
Abstract
-
Cited by 151 (5 self)
This paper addresses the problem of real-time 3D model-based tracking by combining point-based and edge-based tracking systems. We present a careful analysis of the properties of these two sensor systems and show that this leads to some non-trivial design choices that collectively yield extremely high performance. In particular, we present a method for integrating the two systems and robustly combining the pose estimates they produce. Further, we show how on-line learning can be used to improve the performance of feature tracking. Finally, to aid real-time performance, we introduce the FAST feature detector, which can perform full-frame feature detection at 400 Hz. The combination of these techniques results in a system which is capable of tracking with average prediction errors of 200 pixels. This level of robustness allows us to track very rapid motions, such as 50° camera shake at 6 Hz.
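
The FAST detector mentioned above relies on a segment test on a 16-pixel circle around each candidate pixel. The unoptimised sketch below captures that test for a grayscale numpy image; real implementations add a high-speed rejection test and a machine-learned decision tree, and the threshold and arc length here are example values only.

    import numpy as np

    # Offsets of the 16-pixel Bresenham circle of radius 3 around the candidate.
    CIRCLE = [(0,3),(1,3),(2,2),(3,1),(3,0),(3,-1),(2,-2),(1,-3),
              (0,-3),(-1,-3),(-2,-2),(-3,-1),(-3,0),(-3,1),(-2,2),(-1,3)]

    def is_fast_corner(img, y, x, t=20, n=12):
        """Segment test: n contiguous circle pixels all brighter or all darker."""
        c = int(img[y, x])
        ring = [int(img[y + dy, x + dx]) for dx, dy in CIRCLE]
        brighter = [p > c + t for p in ring]
        darker = [p < c - t for p in ring]
        for flags in (brighter, darker):
            run, best = 0, 0
            for f in flags + flags:            # duplicate to handle wrap-around
                run = run + 1 if f else 0
                best = max(best, run)
            if best >= n:
                return True
        return False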
FrameSLAM: From bundle adjustment to real-time visual mapping
- IEEE Trans. on Robotics, 2008
"... Abstract—Many successful indoor mapping techniques employ frame-to-frame matching of laser scans to produce detailed local maps as well as the closing of large loops. In this paper, we propose a framework for applying the same techniques to visual imagery. We match visual frames with large numbers o ..."
Abstract
-
Cited by 147 (5 self)
Abstract—Many successful indoor mapping techniques employ frame-to-frame matching of laser scans to produce detailed local maps as well as the closing of large loops. In this paper, we propose a framework for applying the same techniques to visual imagery. We match visual frames with large numbers of point features, using classic bundle adjustment techniques from computational vision, but we keep only relative frame pose information (a skeleton). The skeleton is a reduced nonlinear system that is a faithful approximation of the larger system and can be used to solve large loop closures quickly, as well as forming a backbone for data association and local registration. We illustrate the workings of the system with large outdoor datasets (10 km), showing large-scale loop closure and precise localization in real time. Index Terms—Visual mapping, visual odometry, visual SLAM.
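
To make the "skeleton" idea concrete, the toy sketch below keeps only relative pose constraints between frames and solves them jointly when a loop closure arrives. It is a linear, translation-only 2D stand-in for the full 6-DoF nonlinear system described in the paper, with made-up measurements.

    import numpy as np

    def solve_pose_graph(n, constraints):
        """constraints: list of (i, j, z) meaning p_j - p_i ≈ z (2-D). Pose 0 is fixed."""
        A = np.zeros((2 * len(constraints) + 2, 2 * n))
        b = np.zeros(2 * len(constraints) + 2)
        A[0, 0] = A[1, 1] = 1.0                      # gauge: anchor pose 0 at the origin
        for r, (i, j, z) in enumerate(constraints):
            row = 2 * (r + 1)
            A[row:row+2, 2*j:2*j+2] = np.eye(2)
            A[row:row+2, 2*i:2*i+2] = -np.eye(2)
            b[row:row+2] = z
        p, *_ = np.linalg.lstsq(A, b, rcond=None)
        return p.reshape(n, 2)

    # Odometry around a square plus one slightly inconsistent loop-closure constraint.
    odo = [(0, 1, [1, 0]), (1, 2, [0, 1]), (2, 3, [-1, 0]), (3, 0, [0, -1.1])]
    print(solve_pose_graph(4, odo))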
Real-time markerless tracking for augmented reality: the virtual visual servoing framework
- IEEE Trans. on Visualization and Computer Graphics, 2006
"... Tracking is a very important research subject in a real-time augmented reality context. The main requirements for trackers are high accuracy and little latency at a reasonable cost. In order to address these issues, a real-time, robust, and efficient 3D modelbased tracking algorithm is proposed for ..."
Abstract
-
Cited by 114 (29 self)
Tracking is a very important research subject in a real-time augmented reality context. The main requirements for trackers are high accuracy and low latency at a reasonable cost. In order to address these issues, a real-time, robust, and efficient 3D model-based tracking algorithm is proposed for a “video see-through” monocular vision system. The tracking of objects in the scene amounts to calculating the pose between the camera and the objects. Virtual objects can then be projected into the scene using this pose. Here, nonlinear pose estimation is formulated by means of a virtual visual servoing approach. In this context, the derivation of point-to-curve interaction matrices is given for different 3D geometrical primitives including straight lines, circles, cylinders, and spheres. A local moving-edges tracker is used in order to provide real-time tracking of points normal to the object contours. Robustness is obtained by integrating an M-estimator into the visual control law via an iteratively reweighted least squares implementation. This approach is then extended to address the 3D model-free augmented reality problem. The method presented in this paper has been validated on several complex image sequences including outdoor environments. Results show the method to be robust to occlusion, changes in illumination, and mistracking.
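
The robustification step, folding an M-estimator into the control law through iteratively reweighted least squares (IRLS), can be sketched on a generic linear system as below. The Huber weight function, scale estimate, and example data are assumptions for illustration; the paper applies the same weighting to the interaction-matrix-based pose update.

    import numpy as np

    def huber_weights(r, k=1.345):
        a = np.abs(r)
        w = np.ones_like(r)
        w[a > k] = k / a[a > k]
        return w

    def irls(A, b, iters=10):
        x = np.linalg.lstsq(A, b, rcond=None)[0]
        for _ in range(iters):
            r = A @ x - b
            scale = 1.4826 * np.median(np.abs(r - np.median(r))) + 1e-9   # robust sigma
            W = np.diag(huber_weights(r / scale))
            x = np.linalg.solve(A.T @ W @ A, A.T @ W @ b)
        return x

    # Example: fit y = 2t + 1 despite a gross outlier.
    t = np.linspace(0, 1, 20)
    y = 2 * t + 1
    y[5] += 10.0                                   # outlier
    A = np.column_stack([t, np.ones_like(t)])
    print(irls(A, y))                              # close to [2, 1]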
Fast Keypoint Recognition using Random Ferns
- PAMI, 2009 (accepted for publication)
"... While feature point recognition is a key component of modern approaches to object detection, existing approaches require computationally expensive patch preprocessing to handle perspective distortion. In this paper, we show that formulating the problem in a Naive Bayesian classification framework ma ..."
Abstract
-
Cited by 113 (8 self)
While feature point recognition is a key component of modern approaches to object detection, existing approaches require computationally expensive patch preprocessing to handle perspective distortion. In this paper, we show that formulating the problem in a Naive Bayesian classification framework makes such preprocessing unnecessary and produces an algorithm that is simple, efficient, and robust. Furthermore, it scales well as the number of classes grows. To recognize the patches surrounding keypoints, our classifier uses hundreds of simple binary features and models class posterior probabilities. We make the problem computationally tractable by assuming independence between arbitrary sets of features. Even though this is not strictly true, we demonstrate that our classifier nevertheless performs remarkably well on image datasets containing very significant perspective changes. Index Terms—Image processing and computer vision, object recognition, tracking, image registration, feature matching, naive Bayesian.
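
A compact sketch of the fern idea follows: binary pixel comparisons are grouped into small sets, each set's bit pattern indexes a per-class probability table, and the tables are treated as independent so their log-probabilities add. The patch size, number of ferns, and training scheme below are placeholder choices, not the published configuration.

    import numpy as np

    class Ferns:
        def __init__(self, n_classes, n_ferns=30, fern_size=10, patch=32, seed=0):
            rng = np.random.default_rng(seed)
            # Each binary feature compares two random pixels of the flattened patch.
            self.pairs = rng.integers(0, patch * patch, size=(n_ferns, fern_size, 2))
            self.counts = np.ones((n_ferns, 2 ** fern_size, n_classes))  # uniform prior

        def _codes(self, patch_flat):
            bits = patch_flat[self.pairs[:, :, 0]] > patch_flat[self.pairs[:, :, 1]]
            return bits.dot(1 << np.arange(bits.shape[1]))   # one integer per fern

        def train(self, patch_flat, label):
            self.counts[np.arange(len(self.counts)), self._codes(patch_flat), label] += 1

        def classify(self, patch_flat):
            probs = self.counts / self.counts.sum(axis=2, keepdims=True)
            logp = np.log(probs[np.arange(len(self.counts)), self._codes(patch_flat)])
            return int(np.argmax(logp.sum(axis=0)))          # most probable class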
Live dense reconstruction with a single moving camera
- IEEE Conference on Computer Vision and Pattern Recognition, 2010
"... We present a method which enables rapid and dense reconstruction of scenes browsed by a single live camera. We take point-based real-time structure from motion (SFM) as our starting point, generating accurate 3D camera pose estimates and a sparse point cloud. Our main novel contribution is to use an ..."
Abstract
-
Cited by 107 (5 self)
We present a method which enables rapid and dense reconstruction of scenes browsed by a single live camera. We take point-based real-time structure from motion (SFM) as our starting point, generating accurate 3D camera pose estimates and a sparse point cloud. Our main novel contribution is to use an approximate but smooth base mesh generated from the SFM to predict the view at a bundle of poses around automatically selected reference frames spanning the scene, and then warp the base mesh into highly accurate depth maps based on view-predictive optical flow and a constrained scene flow update. The quality of the resulting depth maps means that a convincing global scene model can be obtained simply by placing them side by side and removing overlapping regions. We show that a cluttered indoor environment can be reconstructed from a live hand-held camera in a few seconds, with all processing performed by current desktop hardware. Real-time monocular dense reconstruction opens up many application areas, and we demonstrate both real-time novel view synthesis and advanced augmented reality where augmentations interact physically with the 3D scene and are correctly clipped by occlusions.
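
The final stitching step, placing depth maps side by side and removing overlapping regions, could look roughly like the sketch below: each depth map is back-projected and points falling into voxels already covered by an earlier map are dropped. Intrinsics, poses, and the voxel size are illustrative assumptions rather than the authors' method.

    import numpy as np

    def backproject(depth, K, T_wc):
        h, w = depth.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        z = depth.reshape(-1)
        x = (u.reshape(-1) - K[0, 2]) * z / K[0, 0]
        y = (v.reshape(-1) - K[1, 2]) * z / K[1, 1]
        pts_c = np.stack([x, y, z], axis=1)[z > 0]
        return pts_c @ T_wc[:3, :3].T + T_wc[:3, 3]   # camera-to-world

    def stitch(depth_maps, poses, K, voxel=0.01):
        occupied, cloud = set(), []
        for depth, T_wc in zip(depth_maps, poses):
            pts = backproject(depth, K, T_wc)
            keys = map(tuple, np.floor(pts / voxel).astype(int))
            for p, key in zip(pts, keys):
                if key not in occupied:        # skip regions another map already covers
                    occupied.add(key)
                    cloud.append(p)
        return np.array(cloud)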
Real-time 3D SLAM with wide-angle vision
- In Proc. IFAC/EURON Symp. Intelligent Autonomous Vehicles, 2004
"... ..."
Mapping large loops with a single hand-held camera
- In Proc. Robotics: Science and Systems, 2007
"... This paper 1 presents a method for Simultaneous Localization and Mapping (SLAM) relying on a monocular camera as the only sensor which is able to build outdoor, closedloop maps much larger than previously achieved with such input. Our system, based on the Hierarchical Map approach [1], builds inde ..."
Abstract
-
Cited by 100 (19 self)
This paper presents a method for Simultaneous Localization and Mapping (SLAM), relying on a monocular camera as the only sensor, which is able to build outdoor, closed-loop maps much larger than previously achieved with such input. Our system, based on the Hierarchical Map approach [1], builds independent local maps in real time using the EKF-SLAM technique and the inverse depth representation proposed in [2]. The main novelty in the local mapping process is the use of a data association technique that greatly improves its robustness in dynamic and complex environments. A new visual map matching algorithm stitches these maps together and is able to detect large loops automatically, taking into account the unobservability of scale intrinsic to pure monocular SLAM. The loop closing constraint is applied at the upper level of the Hierarchical Map in near real time. We present experimental results demonstrating monocular SLAM as a human carries a camera over long walked trajectories in outdoor areas with people and other clutter, even in the more difficult case of a forward-looking camera, and show the closing of loops of several hundred meters.
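
The inverse depth representation cited as [2] stores a landmark as the camera position at first observation, the azimuth/elevation of the observation ray, and an inverse depth rho, which stays well behaved for distant and newly initialised features. The small sketch below recovers the Euclidean point from that parameterisation; it follows the common formulation and the names are illustrative.

    import numpy as np

    def inverse_depth_to_point(x0, y0, z0, theta, phi, rho):
        """theta: azimuth, phi: elevation, rho: inverse depth (1/metres)."""
        ray = np.array([np.cos(phi) * np.sin(theta),
                        -np.sin(phi),
                        np.cos(phi) * np.cos(theta)])   # unit ray in the world frame
        return np.array([x0, y0, z0]) + ray / rho

    # A feature 5 m in front of the origin along +z:
    print(inverse_depth_to_point(0, 0, 0, theta=0.0, phi=0.0, rho=0.2))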
Real-time Monocular SLAM: Why Filter?
"... Abstract—While the most accurate solution to off-line structure from motion (SFM) problems is undoubtedly to extract as much correspondence information as possible and perform global optimisation, sequential methods suitable for live video streams must approximate this to fit within fixed computatio ..."
Abstract
-
Cited by 67 (4 self)
Abstract—While the most accurate solution to off-line structure from motion (SFM) problems is undoubtedly to extract as much correspondence information as possible and perform global optimisation, sequential methods suitable for live video streams must approximate this to fit within fixed computational bounds. Two quite different approaches to real-time SFM — also called monocular SLAM (Simultaneous Localisation and Mapping) — have proven successful, but they sparsify the problem in different ways. Filtering methods marginalise out past poses and summarise the information gained over time with a probability distribution. Keyframe methods retain the optimisation approach of global bundle adjustment, but computationally must select only a small number of past frames to process. In this paper we perform the first rigorous analysis of the relative advantages of filtering and sparse optimisation for sequential monocular SLAM. A series of experiments in simulation, as well as with a real image SLAM system, was performed by means of covariance propagation and Monte Carlo methods, and comparisons were made using a combined cost/accuracy measure. With some well-discussed reservations, we conclude that while filtering may have a niche in systems with low processing resources, in most modern applications keyframe optimisation gives the most accuracy per unit of computing time.
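
As a back-of-envelope illustration of the trade-off the paper quantifies, the sketch below contrasts how per-frame cost scales for the two strategies: a filter updating a dense covariance over N features costs roughly O(N^2) per frame, while keyframe bundle adjustment over K keyframes costs roughly O(iterations * K * N) given sparse structure. The constants are placeholders, not measurements from the paper.

    def filter_cost(n_features, c=1e-6):
        """Rough per-frame cost of an EKF update with a dense covariance."""
        return c * n_features ** 2

    def keyframe_ba_cost(n_features, n_keyframes=10, iters=5, c=1e-7):
        """Rough per-frame cost of re-running sparse BA over a keyframe window."""
        return c * iters * n_keyframes * n_features

    for n in (100, 1000, 10000):
        print(f"{n:6d} features: filter {filter_cost(n):.3f}s  keyframe BA {keyframe_ba_cost(n):.3f}s")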