Pose Tracking from Natural Features on Mobile Phones
Cited by 36 (7 self)
In this paper we present two techniques for natural feature tracking in real time on mobile phones. We achieve interactive frame rates of up to 20Hz for natural feature tracking from textured planar targets on current-generation phones. We use an approach based on heavily modified state-of-the-art feature descriptors, namely SIFT and Ferns. While SIFT is known to be a strong but computationally expensive feature descriptor, Ferns classification is fast but requires large amounts of memory. This renders both original designs unsuitable for mobile phones. We give detailed descriptions of how we modified both approaches to make them suitable for mobile phones. We present evaluations of robustness and performance on various devices and finally discuss their appropriateness for Augmented Reality applications.
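The abstract notes that Ferns classification is fast but memory-hungry. A minimal sketch of a plain random-fern classifier (the unmodified idea, not the mobile variant this paper describes) shows where that memory goes: each fern applies `depth` binary pixel comparisons to a patch, the resulting bits index a leaf, and each fern keeps one count per leaf per class. The class name `FernClassifier`, all parameter values, and the prior of one count per bin are illustrative assumptions.

```python
import numpy as np

class FernClassifier:
    """Plain random-fern patch classifier (semi-naive Bayes); hypothetical sketch."""

    def __init__(self, n_classes, n_ferns=10, depth=8, patch_size=16, seed=0):
        rng = np.random.default_rng(seed)
        self.depth = depth
        # tests[f, d] holds two (row, col) pixel locations compared by test d of fern f.
        self.tests = rng.integers(0, patch_size, size=(n_ferns, depth, 2, 2))
        # One count per (fern, leaf, class); starting at 1 avoids zero probabilities.
        self.counts = np.ones((n_ferns, 2 ** depth, n_classes))

    def _leaves(self, patch):
        """Map a patch to one leaf index in [0, 2**depth) per fern."""
        t = self.tests
        a = patch[t[:, :, 0, 0], t[:, :, 0, 1]]
        b = patch[t[:, :, 1, 0], t[:, :, 1, 1]]
        bits = (a < b).astype(np.int64)             # shape (n_ferns, depth)
        return bits @ (1 << np.arange(self.depth))  # binary digits -> integer leaf index

    def train(self, patch, label):
        leaves = self._leaves(patch)
        self.counts[np.arange(len(leaves)), leaves, label] += 1

    def classify(self, patch):
        leaves = self._leaves(patch)
        # P(leaf | class) per fern; ferns are treated as independent, so log-probabilities add.
        probs = self.counts / self.counts.sum(axis=1, keepdims=True)
        log_post = np.log(probs[np.arange(len(leaves)), leaves, :]).sum(axis=0)
        return int(np.argmax(log_post))
```

The count table has `n_ferns * 2**depth * n_classes` entries, so it grows linearly with the number of classes and exponentially with fern depth. With, say, 500 classes and 20 ferns of depth 10 at 4 bytes per entry, that is already roughly 41 MB.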
Improving the agility of keyframe-based SLAM
In Proceedings of the European Conference on Computer Vision (ECCV), 2008
Cited by 20 (0 self)
The ability to localise a camera moving in a previously unknown environment is desirable for a wide range of applications. In computer vision this problem is studied as monocular SLAM. Recent years have seen improvements to the usability and scalability of monocular SLAM systems to the point that they may soon find uses outside of laboratory conditions. However, the robustness of these systems to rapid camera motions (we refer to this quality as agility) still lags behind that of tracking systems which use known object models. In this paper we attempt to remedy this. We present two approaches to improving the agility of a keyframe-based SLAM system. Firstly, we add edge features to the map and exploit their resilience to motion blur to improve tracking under fast motion. Secondly, we implement a very simple inter-frame rotation estimator to aid tracking when the camera is rapidly panning, and demonstrate that this method also enables a trivially simple yet effective relocalisation method. Results show that a SLAM system combining points, edge features and motion initialisation allows highly agile tracking at a moderate increase in processing time.
Live dense reconstruction with a single moving camera
IEEE Conference on Computer Vision and Pattern Recognition, 2010
Cited by 18 (4 self)
We present a method which enables rapid and dense reconstruction of scenes browsed by a single live camera. We take point-based real-time structure from motion (SFM) as our starting point, generating accurate 3D camera pose estimates and a sparse point cloud. Our main novel contribution is to use an approximate but smooth base mesh generated from the SFM to predict the view at a bundle of poses around automatically selected reference frames spanning the scene, and then warp the base mesh into highly accurate depth maps based on view-predictive optical flow and a constrained scene flow update. The quality of the resulting depth maps means that a convincing global scene model can be obtained simply by placing them side by side and removing overlapping regions. We show that a cluttered indoor environment can be reconstructed from a live hand-held camera in a few seconds, with all processing performed by current desktop hardware. Real-time monocular dense reconstruction opens up many application areas, and we demonstrate both real-time novel view synthesis and advanced augmented reality where augmentations interact physically with the 3D scene and are correctly clipped by occlusions.
Unified loop closing and recovery for real time monocular slam
In British Machine Vision Conference, 2008
Cited by 14 (0 self)
We present a unified method for recovering from tracking failure and closing loops in real-time monocular simultaneous localisation and mapping. Within a graph-based map representation, we show that recovery and loop closing both reduce to the creation of a graph edge. We describe and implement a bag-of-words appearance model for ranking potential loop closures, and a robust method for using both structure and image appearance to confirm likely matches. The resulting system closes loops and recovers from failures while mapping thousands of landmarks, all in real time.
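The ranking step of a bag-of-words appearance model can be sketched with tf-idf weighting and cosine similarity, a common scoring scheme in bag-of-words image retrieval. Whether this paper scores candidates exactly this way is an assumption, and the structure-based confirmation step is not shown. Keyframes are represented as lists of visual-word ids (hypothetical data), and the function names are illustrative.

```python
import math
from collections import Counter

def _weight(words, idf):
    """Term frequency times inverse document frequency; words never seen get weight 0."""
    tf = Counter(words)
    total = len(words)
    return {w: (c / total) * idf.get(w, 0.0) for w, c in tf.items()}

def tfidf_vectors(keyframes):
    """keyframes: dict mapping keyframe id -> list of visual-word ids."""
    n = len(keyframes)
    df = Counter()
    for words in keyframes.values():
        df.update(set(words))  # document frequency: number of keyframes containing each word
    idf = {w: math.log(n / c) for w, c in df.items()}
    vecs = {k: _weight(words, idf) for k, words in keyframes.items()}
    return vecs, idf

def _cosine(a, b):
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def rank_candidates(query_words, vecs, idf, top_k=3):
    """Return keyframe ids ordered by tf-idf cosine similarity to the query image."""
    q = _weight(query_words, idf)
    ranked = sorted(vecs, key=lambda k: _cosine(q, vecs[k]), reverse=True)
    return ranked[:top_k]
```

Words that appear in every keyframe get an idf of zero and so do not influence the ranking, which is the usual motivation for idf weighting.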
ProFORMA: Probabilistic Feature-based On-line Rapid Model Acquisition
Under review for BMVC 2009. Included in supplementary material as 115-Item 1.pdf
Cited by 10 (3 self)
Off-line model reconstruction relies on an image collection phase and a slow reconstruction phase, requiring a long time to verify that a model obtained from an image sequence is acceptable. We propose a new model acquisition system, called ProFORMA, which generates a 3D model on-line as the input sequence is being collected. As the user rotates the object in front of a stationary camera, a partial model is reconstructed and displayed to the user to assist view planning. The model is also used by the system to robustly track the pose of the object. Models are rapidly produced through a Delaunay tetrahedralisation of points obtained from on-line structure from motion estimation, followed by a probabilistic tetrahedron carving step to obtain a textured surface mesh of the object.
Real-time Monocular SLAM: Why Filter?
Cited by 10 (3 self)
While the most accurate solution to off-line structure from motion (SFM) problems is undoubtedly to extract as much correspondence information as possible and perform global optimisation, sequential methods suitable for live video streams must approximate this to fit within fixed computational bounds. Two quite different approaches to real-time SFM, also called monocular SLAM (Simultaneous Localisation and Mapping), have proven successful, but they sparsify the problem in different ways. Filtering methods marginalise out past poses and summarise the information gained over time with a probability distribution. Keyframe methods retain the optimisation approach of global bundle adjustment, but computationally must select only a small number of past frames to process. In this paper we perform the first rigorous analysis of the relative advantages of filtering and sparse optimisation for sequential monocular SLAM. A series of experiments in simulation as well as using a real image SLAM system were performed by means of covariance propagation and Monte Carlo methods, and comparisons made using a combined cost/accuracy measure. With some well-discussed reservations, we conclude that while filtering may have a niche in systems with low processing resources, in most modern applications keyframe optimisation gives the most accuracy per unit of computing time.
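For a Gaussian in information form, the "marginalise out past poses" step the abstract attributes to filters is a Schur complement. This is a generic sketch under that Gaussian assumption, not the paper's experimental code; `marginalise` is a hypothetical helper. The accompanying check orders the variables as (landmark 1, landmark 2, pose): the two landmarks are linked only through the pose, yet become directly coupled once the pose is removed. This fill-in is one reason a filter's state tends to become dense.

```python
import numpy as np

def marginalise(info, eta, keep, drop):
    """Marginalise the variables `drop` out of a Gaussian in information form.

    info: information (inverse covariance) matrix; eta: information vector (info @ mean).
    Returns the information matrix and vector of the marginal over `keep`.
    """
    i_kk = info[np.ix_(keep, keep)]
    i_kd = info[np.ix_(keep, drop)]
    i_dd = info[np.ix_(drop, drop)]
    # Schur complement of the dropped block (info is symmetric, so i_dk == i_kd.T).
    gain = i_kd @ np.linalg.inv(i_dd)
    info_m = i_kk - gain @ i_kd.T
    eta_m = eta[keep] - gain @ eta[drop]
    return info_m, eta_m
```

The result agrees with marginalising in covariance form, where one simply keeps the corresponding block of the covariance and mean; the information form makes the cost and the fill-in explicit.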
View-Based Maps
Cited by 8 (1 self)
Robotic systems that can create and use visual maps in real time have obvious advantages in many applications, from automatic driving to mobile manipulation in the home. In this paper we describe a mapping system based on retaining stereo views of the environment that are collected as the robot moves. Connections among the views are formed by consistent geometric matching of their features. Out-of-sequence matching is the key problem: how to find connections from the current view to other corresponding views in the map. Our approach uses a vocabulary tree to propose candidate views, and a strong geometric filter to eliminate false positives; essentially, the robot continually re-recognizes where it is. We present experiments showing the utility of the approach on video data, including map building in large indoor and outdoor environments, map building without localization, and re-localization when lost.
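A vocabulary tree for proposing candidate views can be sketched as hierarchical k-means: descriptors are recursively clustered, and a descriptor's visual word is the path of nearest cluster centres from the root to a leaf. Candidate views are then ranked here by the number of visual words they share with the current view. The branching factor, depth, farthest-point initialisation, shared-word voting, and all function names are assumptions for illustration; the strong geometric filter the abstract describes is not shown.

```python
import numpy as np
from collections import Counter

def _nearest(x, centers):
    # Index of the closest centre for every row of x (squared Euclidean distance).
    return ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)

def kmeans(x, k, iters=10, seed=0):
    rng = np.random.default_rng(seed)
    # Farthest-point initialisation: a simple choice for this sketch.
    centers = [x[rng.integers(len(x))]]
    for _ in range(1, k):
        d = np.min([((x - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(x[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        labels = _nearest(x, centers)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(0)
    return centers, _nearest(x, centers)

def build_tree(x, k=2, depth=3):
    """Hierarchical k-means: each node is (centres, children), or None for a leaf."""
    if depth == 0 or len(x) < k:
        return None
    centers, labels = kmeans(x, k)
    return centers, [build_tree(x[labels == j], k, depth - 1) for j in range(k)]

def quantize(tree, d):
    """Descend greedily to a leaf; the branch path serves as the visual-word id."""
    path, node = [], tree
    while node is not None:
        centers, children = node
        j = int(((centers - d) ** 2).sum(1).argmin())
        path.append(j)
        node = children[j]
    return tuple(path)

def propose_views(tree, query_desc, view_desc, top_k=2):
    """Rank stored views by the number of visual words shared with the query view."""
    q = Counter(quantize(tree, d) for d in query_desc)
    scores = {}
    for view_id, descs in view_desc.items():
        v = Counter(quantize(tree, d) for d in descs)
        scores[view_id] = sum((q & v).values())  # multiset intersection size
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

A tree with branching factor k and depth L has up to k**L leaves but needs only about k*L distance computations to quantise one descriptor, which is what makes a large vocabulary affordable.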
KinectFusion: Real-Time Dense Surface Mapping and Tracking
Cited by 8 (1 self)
Figure 1: Example output from our system, generated in real-time with a handheld Kinect depth camera and no other sensing infrastructure. Normal maps (colour) and Phong-shaded renderings (greyscale) from our dense reconstruction system are shown. On the left for comparison is an example of the live, incomplete, and noisy data from the Kinect sensor (used as input to our system). We present a system for accurate real-time mapping of complex and arbitrary indoor scenes in variable lighting conditions, using only a moving low-cost depth camera and commodity graphics hardware. We fuse all of the depth data streamed from a Kinect sensor into a single global implicit surface model of the observed scene in real-time. The current sensor pose is simultaneously obtained by tracking the live depth frame relative to the global model using a coarse-to-fine iterative closest point (ICP) algorithm, which uses all of the observed depth data available. We demonstrate the advantages of tracking against the growing full surface model compared with frame-to-frame tracking, obtaining tracking and mapping results
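The abstract says depth frames are fused into a single global implicit surface model. Assuming that model is a truncated signed distance function (TSDF) updated with a weighted running average, which is how KinectFusion is usually described, the per-voxel update along one camera ray can be sketched as follows. The truncation distance `mu`, the weight cap `max_weight`, the uniform per-reading weight, and the helper names are assumptions; a real system updates a full 3D voxel grid on the GPU by projecting every voxel into the depth image.

```python
import numpy as np

def fuse_depth_along_ray(tsdf, weight, voxel_z, depth, mu=0.05, max_weight=64.0):
    """Fuse one depth reading into the voxels on a single camera ray (in place).

    voxel_z: distance of each voxel centre from the camera along the ray, in metres.
    depth: measured surface distance along the same ray; mu: truncation distance.
    """
    sdf = depth - voxel_z                # > 0 in front of the surface, < 0 behind it
    valid = sdf >= -mu                   # ignore voxels far behind the observed surface
    f = np.clip(sdf / mu, -1.0, 1.0)     # truncate and normalise to [-1, 1]
    w = 1.0                              # uniform per-reading weight (assumption)
    tsdf[valid] = (weight[valid] * tsdf[valid] + w * f[valid]) / (weight[valid] + w)
    weight[valid] = np.minimum(weight[valid] + w, max_weight)

def surface_crossing(tsdf, weight, voxel_z):
    """Return the first observed +/- zero crossing along the ray, linearly interpolated."""
    for i in range(len(tsdf) - 1):
        if weight[i] > 0 and weight[i + 1] > 0 and tsdf[i] > 0 >= tsdf[i + 1]:
            t = tsdf[i] / (tsdf[i] - tsdf[i + 1])
            return voxel_z[i] + t * (voxel_z[i + 1] - voxel_z[i])
    return None
```

Averaging several noisy readings moves the zero crossing toward their mean, which is how fusing every frame suppresses the sensor noise visible in the raw Kinect data.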
Kinectfusion: real-time 3D reconstruction and interaction using a moving depth camera
In Proc. UIST, 2011
Cited by 7 (1 self)
Figure 1: KinectFusion enables real-time detailed 3D reconstructions of indoor scenes using only the depth data from a standard Kinect camera. A) User points Kinect at coffee table scene. B) Phong-shaded reconstructed 3D model (the wireframe frustum shows the current tracked 3D pose of Kinect). C) 3D model texture-mapped using Kinect RGB data with real-time particles simulated on the 3D model as reconstruction occurs. D) Multi-touch interactions performed on any reconstructed surface. E) Real-time segmentation and 3D tracking of a physical object. KinectFusion enables a user holding and moving a standard Kinect camera to rapidly create detailed 3D reconstructions of an indoor scene. Only the depth data from Kinect is used to track the 3D pose of the sensor and reconstruct geometrically precise 3D models of the physical scene in real time. The capabilities of KinectFusion, as well as the novel GPU-based pipeline, are described in full. We show uses of the core system for low-cost handheld scanning, and geometry-aware augmented reality and physics-based interactions. Novel extensions to the core GPU pipeline demonstrate object segmentation and user interaction directly in front of the sensor, without degrading camera tracking or reconstruction. These extensions are used to enable real-time multi-touch interactions anywhere, allowing any planar or non-planar reconstructed physical surface to be appropriated for touch. ACM Classification: H5.2 [Information Interfaces and Presentation]:
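The KinectFusion abstracts describe tracking the sensor pose from depth data alone by aligning each live frame to the model with ICP. One linearised point-to-plane ICP update, a widely used ICP variant, is sketched here under simplifying assumptions: correspondences are already known, the rotation between frames is small, and there is no image pyramid. Whether this matches the papers' exact error metric is an assumption, and `point_to_plane_step` is a hypothetical name.

```python
import numpy as np

def point_to_plane_step(src, dst, normals):
    """One linearised point-to-plane ICP update for known correspondences.

    src, dst: (N, 3) corresponding points; normals: (N, 3) unit normals at dst.
    Returns a 4x4 rigid transform moving src towards dst (small-rotation assumption).
    """
    # Minimise sum(((R p + t - q) . n)^2) with R ~ I + [w]x, so each row is [p x n, n].
    a = np.hstack([np.cross(src, normals), normals])   # (N, 6) Jacobian rows
    b = np.einsum("ij,ij->i", normals, dst - src)       # signed point-to-plane distances
    x, *_ = np.linalg.lstsq(a, b, rcond=None)
    alpha, beta, gamma, tx, ty, tz = x
    # Small-angle rotation I + [w]x, projected back onto a true rotation via SVD.
    r = np.array([[1.0, -gamma, beta],
                  [gamma, 1.0, -alpha],
                  [-beta, alpha, 1.0]])
    u, _, vt = np.linalg.svd(r)
    transform = np.eye(4)
    transform[:3, :3] = u @ vt
    transform[:3, 3] = [tx, ty, tz]
    return transform
```

A full tracker would iterate this step, re-establishing correspondences after each update, and a coarse-to-fine scheme like the one the companion KinectFusion abstract mentions would repeat it from low to high resolution.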
Visual-Inertial Navigation, Mapping and Localization: A Scalable Real-Time Causal Approach
2010
Cited by 7 (1 self)
We present a model to estimate motion from monocular visual and inertial measurements. We analyze the model and characterize the conditions under which its state is observable, and its parameters are identifiable. These include the unknown gravity vector, and the unknown transformation between the camera coordinate frame and the inertial unit. We show that it is possible to estimate both state and parameters as part of an on-line procedure, but only provided that the motion sequence is “rich enough,” a condition that we characterize explicitly. We then describe an efficient implementation of a filter to estimate the state and parameters of this model, including gravity and camera-to-inertial calibration. It runs in real time on an embedded platform, and its performance has been tested extensively. We report experiments of continuous operation, without failures, re-initialization, or re-calibration, on paths of length up to 30 km. We also describe an integrated approach to “loop-closure,” that is, the recognition of previously-seen locations and the topological re-adjustment of the traveled path. It represents visual features relative to the global orientation reference provided by the gravity vector estimated by the filter, and relative to the scale provided by their known position within the map; these features are organized into “locations” defined by visibility constraints, represented in a topological graph, where loop closure can be performed without the need to re-compute past trajectories or perform bundle adjustment. The software infrastructure as well as the embedded platform are described in detail in a technical report (Jones and Soatto, 2009).

