Results 1 - 10 of 84
ORB: an efficient alternative to SIFT or SURF
- In ICCV
Cited by 171 (0 self)
Feature matching is at the base of many computer vision problems, such as object recognition or structure from motion. Current methods rely on costly descriptors for detection and matching. In this paper, we propose a very fast binary descriptor based on BRIEF, called ORB, which is rotation invariant and resistant to noise. We demonstrate through experiments that ORB is two orders of magnitude faster than SIFT, while performing as well in many situations. The efficiency is tested on several real-world applications, including object detection and patch-tracking on a smartphone.
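The speed-up at matching time comes from the binary form of the descriptor: comparison is a Hamming distance rather than a Euclidean distance over float vectors. A minimal sketch of that idea (illustrative only, not the authors' code; real ORB descriptors are 256-bit strings):

```python
# Illustrative sketch only (not the authors' code): binary descriptors such
# as ORB are matched with the Hamming distance, which reduces to XOR plus
# popcount on packed bit strings -- the source of the large speed-up over
# float descriptors like SIFT.

def hamming(a: int, b: int) -> int:
    """Hamming distance between two descriptors packed into integers."""
    return bin(a ^ b).count("1")

def nearest(query: int, database: list) -> int:
    """Index of the brute-force nearest neighbour under Hamming distance."""
    return min(range(len(database)), key=lambda i: hamming(query, database[i]))

d0 = 0b10110010
d1 = 0b10110011  # differs from d0 in one bit
d2 = 0b01001101  # differs from d0 in all eight bits
assert hamming(d0, d1) == 1
assert nearest(d0, [d2, d1]) == 1
```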
Scale Drift-Aware Large Scale Monocular SLAM
Cited by 65 (4 self)
State-of-the-art visual SLAM systems have recently been presented which are capable of accurate, large-scale and real-time performance, but most of these require stereo vision. Important application areas in robotics and beyond open up if similar performance can be demonstrated using monocular vision, since a single camera will always be cheaper, more compact and easier to calibrate than a multi-camera rig. With high quality estimation, a single camera moving through a static scene of course effectively provides its own stereo geometry via frames distributed over time. However, a classic issue with monocular visual SLAM is that, due to the purely projective nature of a single camera, motion estimates and map structure can only be recovered up to scale. Without the known inter-camera distance of a stereo rig to serve as an anchor, the scale of locally constructed map portions and the corresponding motion estimates is therefore liable to drift over time. In this paper we describe a new near real-time visual SLAM system which adopts the continuous keyframe optimisation approach of the best current stereo systems, but accounts for the additional challenges presented by monocular input. In particular, we present a new pose-graph optimisation technique which allows for the efficient correction of rotation, translation and scale drift at loop closures. Specifically, we describe the Lie group of similarity transformations and its relation to the corresponding Lie algebra. We also present in detail the system's new image processing front-end, which is able to accurately track hundreds of features per frame, and a filter-based approach for feature initialisation within keyframe-based SLAM. Our approach is proven via large-scale simulation and real-world experiments where a camera completes large looped trajectories.
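The similarity transforms mentioned above can be made concrete. Assuming the usual parameterisation of Sim(3) as a scale s, rotation R and translation t acting as x' = s·R·x + t (the paper's Lie-algebra derivation is not reproduced here), a hedged sketch:

```python
import numpy as np

# Hedged sketch (notation assumed, not the paper's code): a similarity
# transform in Sim(3) maps a 3-D point as x' = s * R @ x + t. The extra
# scale factor s is the degree of freedom whose drift the pose graph
# corrects at loop closure.

def sim3_apply(s, R, t, x):
    """Apply the similarity transform (s, R, t) to a 3-D point x."""
    return s * R @ x + t

def sim3_compose(a, b):
    """Compose two transforms: apply b first, then a."""
    s1, R1, t1 = a
    s2, R2, t2 = b
    return (s1 * s2, R1 @ R2, s1 * R1 @ t2 + t1)

# Composition is consistent with applying the transforms in sequence:
a = (2.0, np.eye(3), np.array([1.0, 0.0, 0.0]))
b = (0.5, np.eye(3), np.zeros(3))
x = np.array([1.0, 2.0, 3.0])
s, R, t = sim3_compose(a, b)
assert np.allclose(sim3_apply(s, R, t, x), sim3_apply(*a, sim3_apply(*b, x)))
```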
Unified real-time tracking and recognition with rotation-invariant fast features
- In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
, 2010
Cited by 33 (11 self)
We present a method that unifies tracking and video content recognition with applications to Mobile Augmented Reality (MAR). We introduce the Radial Gradient Transform (RGT) and an approximate RGT, yielding the Rotation-Invariant, Fast Feature (RIFF) descriptor. We demonstrate that RIFF is fast enough for real-time tracking, while robust enough for large-scale retrieval tasks. At 26× the speed, our tracking scheme obtains a more accurate global affine motion model than the Kanade-Lucas-Tomasi (KLT) tracker. The same descriptors can achieve 94% retrieval accuracy from a database of 10^4 images.
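The rotation invariance of a radial-gradient approach can be shown with a toy version (the actual RGT and its approximation are defined in the paper; this sketch only illustrates why expressing gradients in a radial/tangential frame cancels patch rotation):

```python
import math

# Toy illustration (the actual RGT is defined in the paper): re-expressing
# an image gradient in the radial/tangential frame of its own pixel makes
# the result unchanged when the whole patch rotates about its centre.

def radial_gradient(px, py, gx, gy, cx, cy):
    """Gradient (gx, gy) at pixel (px, py), projected onto the radial and
    tangential unit vectors defined by the patch centre (cx, cy)."""
    dx, dy = px - cx, py - cy
    norm = math.hypot(dx, dy) or 1.0
    rx, ry = dx / norm, dy / norm   # radial direction
    tx, ty = -ry, rx                # tangential direction
    return gx * rx + gy * ry, gx * tx + gy * ty

# A radially pointing gradient keeps the same coordinates after the whole
# configuration is rotated by 90 degrees about the centre:
assert radial_gradient(2, 0, 1.0, 0.0, 0, 0) == radial_gradient(0, 2, 0.0, 1.0, 0, 0)
```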
Real-Time Self-Localization from Panoramic Images on Mobile Devices
- In ISMAR
, 2011
Cited by 14 (6 self)
Self-localization in large environments is a vital task for accurately registered information visualization in outdoor Augmented Reality (AR) applications. In this work, we present a system for self-localization on mobile phones using a GPS prior and an online-generated panoramic view of the user's environment. The approach is suitable for executing entirely on current generation mobile devices, such as smartphones. Parallel execution of online incremental panorama generation and accurate 6DOF pose estimation using 3D point reconstructions allows for real-time self-localization and registration in large-scale environments. The power of our approach is demonstrated in several experimental evaluations.
Rolling shutter bundle adjustment
- in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
, 2012
Cited by 14 (0 self)
This paper introduces a bundle adjustment (BA) method that obtains accurate structure and motion from rolling shutter (RS) video sequences: RSBA. When a classical BA algorithm processes a rolling shutter video, the resulting camera trajectory is brittle, and complete failures are not uncommon. We exploit the temporal continuity of the camera motion to define residuals of image point trajectories with respect to the camera trajectory. We compare the camera trajectories from RSBA to those from classical BA, and from classical BA on rectified videos. The comparisons are done on real video sequences from an iPhone 4, with ground truth obtained from a global shutter camera, rigidly mounted to the iPhone 4. Compared to classical BA, the rolling shutter model requires just six extra parameters. It also degrades the sparsity of the system Jacobian slightly, but as we demonstrate, the increase in computation time is moderate. Decisive advantages are that RSBA succeeds in cases where competing methods diverge, and consistently produces more accurate results.
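Rolling-shutter models generally assign each image row its own capture time, so the camera pose must be evaluated along a continuous trajectory rather than once per frame. A toy sketch of that row-time idea (notation assumed, not taken from the paper):

```python
# Toy sketch of a rolling-shutter timing model (notation assumed, not the
# paper's code): each image row is exposed at its own instant, so the pose
# used to project a point depends on the row it lands in.

def row_time(frame_start, readout_time, row, num_rows):
    """Capture time of `row` in a frame whose readout spans `readout_time`."""
    return frame_start + readout_time * row / num_rows

def pose_at(t, key0, key1):
    """Linear interpolation of a scalar pose parameter between two
    (time, value) keyframe samples -- a stand-in for the paper's
    temporally continuous trajectory."""
    (t0, p0), (t1, p1) = key0, key1
    a = (t - t0) / (t1 - t0)
    return (1 - a) * p0 + a * p1

# The middle row of a 480-row frame with 30 ms readout is seen halfway
# through the frame, and the pose used for it is interpolated accordingly:
t = row_time(0.0, 0.030, 240, 480)
assert abs(t - 0.015) < 1e-12
assert abs(pose_at(t, (0.0, 0.0), (0.030, 1.0)) - 0.5) < 1e-12
```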
MEVBench: A Mobile Computer Vision Benchmarking Suite
- In Proceedings of the IEEE International Symposium on Workload Characterization
, 2011
Cited by 13 (0 self)
The growth in mobile vision applications, coupled with the performance limitations of mobile platforms, has led to a growing need to understand computer vision applications. Computationally intensive mobile vision applications, such as augmented reality or object recognition, place significant performance and power demands on existing embedded platforms, often leading to degraded application quality. With a better understanding of this growing application space, it will be possible to more effectively optimize future embedded platforms. In this work, we introduce and evaluate a custom benchmark suite for mobile embedded vision applications named MEVBench. MEVBench provides a wide range of mobile vision applications such as face detection, feature classification, object tracking and feature extraction. To better understand mobile vision processing characteristics at the architectural level, we analyze single- and multithreaded implementations of many algorithms to evaluate performance, scalability, and memory characteristics. We provide insights into the major areas where architecture can improve the performance of these applications in embedded systems.
[Figure 1 of the paper shows an augmented-reality example: a red cube frame rendered in proper perspective as though attached to a marker.]
LDB: An Ultra-Fast Feature for Scalable Augmented Reality on Mobile Devices
- In Proceedings of the International Symposium on Mixed and Augmented Reality (ISMAR)
, 2012
Cited by 10 (4 self)
The efficiency, robustness and distinctiveness of a feature descriptor are critical to the user experience and scalability of a mobile Augmented Reality (AR) system. However, existing descriptors are either too computationally expensive to achieve real-time performance on a mobile device such as a smartphone or tablet, or not sufficiently robust and distinctive to identify correct matches from a large database. As a result, current mobile AR systems still have only limited capabilities, which greatly restricts their deployment in practice. In this paper, we propose a highly efficient, robust and distinctive binary descriptor, called Local Difference Binary (LDB). LDB directly computes a binary string for an image patch using simple intensity and gradient difference tests on pairwise grid cells within the patch. A multiple-gridding strategy is applied to capture the distinct patterns of the patch at different spatial granularities. Experimental results demonstrate that LDB is extremely fast to compute and, thanks to its high robustness and distinctiveness, fast to match against a large database. Compared to the state-of-the-art binary descriptor BRIEF, which is primarily designed for speed, LDB has similar computational efficiency while achieving greater accuracy and 5x faster matching speed when matching over a large database with 1.7M+ descriptors.
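The pairwise difference tests can be sketched as follows. The grid layout and test set here are invented for illustration; the real descriptor also compares regional gradients and uses multiple grid granularities, as the abstract notes:

```python
import numpy as np

# Illustrative sketch only: one bit per pairwise comparison of grid-cell
# mean intensities. The real LDB additionally compares gradient averages
# and combines several grid granularities.

def ldb_bits(patch, grid=2):
    """Binary string from pairwise mean-intensity tests on grid cells."""
    h, w = patch.shape
    ch, cw = h // grid, w // grid
    means = [patch[i*ch:(i+1)*ch, j*cw:(j+1)*cw].mean()
             for i in range(grid) for j in range(grid)]
    return [1 if means[i] > means[j] else 0
            for i in range(len(means)) for j in range(i + 1, len(means))]

# A 4x4 patch with a bright top-left cell, dark top-right cell, and a
# uniform bottom half yields cell means [10, 0, 5, 5]:
patch = np.array([[10, 10, 0, 0],
                  [10, 10, 0, 0],
                  [ 5,  5, 5, 5],
                  [ 5,  5, 5, 5]], dtype=float)
assert ldb_bits(patch) == [1, 1, 1, 0, 0, 0]
```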
Tool Support for Prototyping Interfaces for Vision-Based Indoor Navigation
- In: Proc. of the Workshop on Mobile Vision and HCI (MobiVis), held in conjunction with Mobile HCI (2012)
Cited by 10 (8 self)
Vision-based approaches are a promising method for indoor navigation, but prototyping and evaluating them poses several challenges. These include the effort of realizing the localization component, difficulties in simulating real-world behavior, and the interaction between vision-based localization and the user interface. In this paper, we report on initial findings from the development of a tool to support this process. We identify key requirements for such a tool and use an example vision-based system to evaluate a first prototype of the tool.
Rolling shutter camera calibration
- In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
, 2013
Cited by 10 (2 self)
Rolling Shutter (RS) cameras are used across a wide range of consumer electronic devices, from smartphones to high-end cameras. It is well known that if an RS camera or the scene is moving, significant image distortions are introduced. The quality, or even success, of structure from motion on rolling shutter images requires the usual intrinsic parameters, such as focal length and distortion coefficients, as well as accurate modelling of the shutter timing. The current state-of-the-art technique for calibrating the shutter timings requires specialised hardware. We present a new method that only requires video of a known calibration pattern. Experimental results on over 60 real datasets show that our method is more accurate than the current state of the art.
Real-time Motion Tracking on a Cellphone using Inertial Sensing and a Rolling-Shutter Camera
Cited by 9 (0 self)
All existing methods for vision-aided inertial navigation assume a camera with a global shutter, in which all the pixels in an image are captured simultaneously. However, the vast majority of consumer-grade cameras use rolling-shutter sensors, which capture each row of pixels at a slightly different time instant. The effects of the rolling-shutter distortion when a camera is in motion can be very significant, and are not modelled by existing visual-inertial motion-tracking methods. In this paper we describe the first, to the best of our knowledge, method for vision-aided inertial navigation using rolling-shutter cameras. Specifically, we present an extended Kalman filter (EKF)-based method for visual-inertial odometry, which fuses the IMU measurements with observations of visual feature tracks provided by the camera. The key contribution of this work is a computationally tractable approach for taking the rolling-shutter effect into account, incurring only minimal approximations. The experimental results from the application of the method show that it is able to track, in real time, the position of a mobile phone moving in an unknown environment with an error accumulation of approximately 0.8% of the distance travelled, over hundreds of meters.