Learning local image descriptors
- In CVPR, 2007
Cited by 53 (2 self)
In this paper we study interest point descriptors for image matching and 3D reconstruction. We examine the building blocks of descriptor algorithms and evaluate numerous combinations of components. Various published descriptors such as SIFT, GLOH, and Spin Images can be cast into our framework. For each candidate algorithm we learn good choices for parameters using a training set consisting of patches from a multi-image 3D reconstruction where accurate ground-truth matches are known. The best descriptors were those with log-polar histogramming regions and feature vectors constructed from rectified outputs of steerable quadrature filters. At a 95% detection rate these gave one third of the incorrect matches produced by SIFT.
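The comparison at a fixed detection rate can be illustrated with a small sketch. It assumes descriptor distances are already available for known matching and non-matching patch pairs; the function name and the threshold rule are illustrative choices, not the evaluation code used in the paper.

```python
import math


def false_positive_rate_at_detection_rate(match_dists, nonmatch_dists,
                                          detection_rate=0.95):
    """Fraction of non-matching pairs accepted at the smallest distance
    threshold that accepts at least `detection_rate` of the true matches.

    Hypothetical helper: the inputs are descriptor distances for pairs
    whose ground-truth label (match / non-match) is known.
    """
    if not match_dists or not nonmatch_dists:
        raise ValueError("both distance lists must be non-empty")
    ordered = sorted(match_dists)
    # Number of true matches that must fall at or below the threshold.
    needed = max(1, math.ceil(detection_rate * len(ordered)))
    threshold = ordered[needed - 1]
    # Non-matching pairs that would be (incorrectly) accepted.
    accepted = sum(1 for d in nonmatch_dists if d <= threshold)
    return accepted / len(nonmatch_dists)
```

Running this for two descriptors on the same labelled pairs and taking the ratio of the two results gives the kind of "fraction of the incorrect matches produced by SIFT" figure quoted in the abstract.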
Fast Keypoint Recognition using Random Ferns
- PAMI, 2009. Accepted for publication
Cited by 22 (5 self)
While feature point recognition is a key component of modern approaches to object detection, existing approaches require computationally expensive patch preprocessing to handle perspective distortion. In this paper, we show that formulating the problem in a Naive Bayesian classification framework makes such preprocessing unnecessary and produces an algorithm that is simple, efficient, and robust. Furthermore, it scales well as the number of classes grows. To recognize the patches surrounding keypoints, our classifier uses hundreds of simple binary features and models class posterior probabilities. We make the problem computationally tractable by assuming independence between arbitrary sets of features. Even though this is not strictly true, we demonstrate that our classifier nevertheless performs remarkably well on image datasets containing very significant perspective changes. Index Terms: Image processing and computer vision, object recognition, tracking, image registration, feature matching, naive Bayesian.
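The independence assumption between sets of binary features can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the binary features have already been computed for each patch, groups them into fixed-size sets (called ferns here, after the title), uses Laplace smoothing, and assumes a uniform class prior. The class and method names are hypothetical.

```python
import math


class FernClassifier:
    """Semi-naive Bayes over groups ("ferns") of binary features.

    Features inside one fern are modelled jointly; different ferns are
    treated as independent given the class.
    """

    def __init__(self, num_features, fern_size, num_classes):
        if num_features % fern_size != 0:
            raise ValueError("num_features must be a multiple of fern_size")
        self.fern_size = fern_size
        self.num_ferns = num_features // fern_size
        self.num_classes = num_classes
        # counts[fern][class][outcome], initialised to 1 (Laplace smoothing).
        self.counts = [
            [[1] * (2 ** fern_size) for _ in range(num_classes)]
            for _ in range(self.num_ferns)
        ]

    def _fern_indices(self, bits):
        """Pack each fern's binary features into one integer outcome."""
        indices = []
        for f in range(self.num_ferns):
            idx = 0
            for b in bits[f * self.fern_size:(f + 1) * self.fern_size]:
                idx = (idx << 1) | int(b)
            indices.append(idx)
        return indices

    def train(self, bits, label):
        for f, idx in enumerate(self._fern_indices(bits)):
            self.counts[f][label][idx] += 1

    def classify(self, bits):
        """Return the class with the highest summed log-likelihood."""
        indices = self._fern_indices(bits)
        best_class, best_score = None, -math.inf
        for c in range(self.num_classes):
            score = 0.0
            for f, idx in enumerate(indices):
                row = self.counts[f][c]
                score += math.log(row[idx] / sum(row))
            if score > best_score:
                best_class, best_score = c, score
        return best_class
```

Each fern stores a table with 2^S entries per class for S features, so the fern size trades modelling power against memory and the amount of training data needed.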
Informed visual search: Combining attention and object recognition
- In Proceedings of ICRA, 2008
Cited by 9 (3 self)
This paper studies the sequential object recognition problem faced by a mobile robot searching for specific objects within a cluttered environment. In contrast to current state-of-the-art object recognition solutions which are evaluated on databases of static images, the system described in this paper employs an active strategy based on identifying potential objects using an attention mechanism and planning to obtain images of these objects from numerous viewpoints. We demonstrate the use of a bag-of-features technique for ranking potential objects, and show that this measure outperforms geometric matching for invariance across viewpoints. Our system implements informed visual search by prioritising map locations and re-examining promising locations first. Experimental results demonstrate that our system is a highly competent object recognition system that is capable of locating numerous challenging objects amongst distractors.
Maximally Stable Colour Regions for Recognition and Matching
- 2007
Cited by 8 (0 self)
This paper introduces a novel colour-based affine covariant region detector. Our algorithm is an extension of the maximally stable extremal region (MSER) to colour. The extension to colour is done by looking at successive time-steps of an agglomerative clustering of image pixels. The selection of time-steps is stabilised against intensity scalings and image blur by modelling the distribution of edge magnitudes. The algorithm contains a novel edge significance measure based on a Poisson image noise model, which we show performs better than the commonly used Euclidean distance. We compare our algorithm to the original MSER detector and a competing colour-based blob feature detector, and show through a repeatability test that our detector performs better. We also extend the state of the art in feature repeatability tests, by using scenes consisting of two planes where one is piecewise transparent. This new test is able to evaluate how stable a feature is against changing backgrounds.
Fundamental Matrix Estimation via TIP -- Transfer of Invariant Parameters
- 2006
Cited by 4 (0 self)
The fundamental matrix (FM) represents the perspective transform between two or more uncalibrated images of a stationary scene, and is traditionally estimated based on 2-parameter point-to-point correspondences between image pairs. Recent invariant correspondence techniques, however, provide robust correspondences in terms of 4- to 6-parameter invariant regions. Such correspondences contain important information regarding scene geometry, information which is lost in FM estimation techniques based solely on 2-parameter point translation. In this article, we present a method of incorporating this additional information into point-based FM estimation routines, entitled TIP (transfer of invariant parameters). The TIP method transforms invariant correspondence parameters into additional point correspondences, which can be used with FM estimation routines. Experimentation shows that the TIP method results in more robust FM estimates in the case of sparse correspondence, and allows estimation based on as few as 3 correspondences in the case of affine-invariant features.
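One way to picture the transfer of invariant parameters into point correspondences is sketched below. The construction is an assumption made for illustration, not the paper's stated procedure: each affine-invariant region is taken to be a centre plus a 2x2 shape matrix, and two extra points are generated by mapping the unit axes through that matrix. The function names are hypothetical.

```python
def region_to_points(center, shape):
    """Turn one affine region into three image points.

    `center` is (x, y); `shape` is a 2x2 matrix ((a, b), (c, d)) mapping
    the unit frame into the image (an assumed parameterisation). The
    points returned are the centre and the images of the two unit axes.
    """
    cx, cy = center
    (a, b), (c, d) = shape
    return [(cx, cy), (cx + a, cy + c), (cx + b, cy + d)]


def region_matches_to_point_matches(region_matches):
    """Expand matched regions into matched point pairs.

    `region_matches` is a list of ((center1, shape1), (center2, shape2))
    tuples, one per region correspondence between image 1 and image 2.
    """
    pairs = []
    for (c1, s1), (c2, s2) in region_matches:
        pairs.extend(zip(region_to_points(c1, s1), region_to_points(c2, s2)))
    return pairs
```

Under this assumed construction, three affine region correspondences expand to nine point correspondences, which is enough input for a standard point-based estimator such as the eight-point algorithm and is consistent with the "as few as 3 correspondences" figure in the abstract.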
The City of Sights: Design, Construction, and Measurement of an Augmented Reality Stage Set
Cited by 3 (2 self)
[Figure: views of the City of Sights, showing a virtual and a real representation of the total assembly.]
We describe the design and implementation of a physical and virtual model of an imaginary urban scene — the “City of Sights” — that can serve as a backdrop or “stage” for a variety of Augmented Reality (AR) research. We argue that the AR research community would benefit from such a standard model dataset which can be used for evaluation of such AR topics as tracking systems, modeling, spatial AR, rendering tests, collaborative AR and user interface design. By openly sharing the digital blueprints and assembly instructions for our models, we allow the proposed set to be physically replicable by anyone and permit customization and experimental changes to the stage design which enable comprehensive exploration of algorithms and methods. Furthermore, we provide an accompanying rich dataset consisting of video sequences under varying conditions with ground truth camera pose. We employed three different ground truth acquisition methods to support a broad range of use cases. The goal of our design is to enable and improve the replicability and evaluation of future augmented reality research.
On recall rate of interest point detectors
- In 3DPVT
Cited by 3 (3 self)
In this paper we provide a method for evaluating interest point detectors independently of image descriptors. This is possible because we have compiled a unique data set enabling us to determine if common interest points are found. The data contains 60 scenes of a wide range of object types, and for each scene we have 119 precisely located camera positions obtained from a camera mounted on an industrial robot arm. The scene surfaces have been scanned using structured light, providing precise 3D ground truth. We have investigated a number of the most popular interest point detectors. This is done in relation to the number of interest points, the recall rate as a function of camera position and light variation, and the sensitivity relative to model parameter change. The overall conclusion is that the Harris corner detector has a very high recall rate, but is sensitive to change in scale. The Hessian corners perform well overall, followed by MSER (Maximally Stable Extremal Regions), whereas the FAST corner detector, IBR (Intensity Based Regions) and EBR (Edge Based Regions) perform poorly. Furthermore, the repeatability of the corner detectors is quite unaffected by the parameter setting, and only the number of interest points changes.
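A descriptor-independent recall measure of this kind can be sketched as below. The definition is an assumption for illustration and may differ from the paper's: a reference interest point counts as found if its ground-truth position in the second view lies within a pixel tolerance of any detected point. The `project` callable stands in for the mapping derived from the 3D ground truth and is hypothetical, as is the default tolerance.

```python
import math


def recall_rate(reference_points, detected_points, project, tolerance=2.5):
    """Fraction of reference points re-detected in another view.

    `project` maps a reference point (x, y) to its ground-truth position
    in the other view; `detected_points` are the interest points found
    there. No descriptors are involved, only positions.
    """
    if not reference_points:
        return 0.0
    found = 0
    for point in reference_points:
        qx, qy = project(point)
        if any(math.hypot(qx - dx, qy - dy) <= tolerance
               for dx, dy in detected_points):
            found += 1
    return found / len(reference_points)
```

Evaluating this for each camera position gives recall as a function of viewpoint, which is the kind of curve the abstract refers to.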
Video-based descriptors for object recognition
- Image and Vision Computing, 29(10):639
Cited by 3 (3 self)
We describe a visual recognition system operating on a hand-held device, based on a video-based feature descriptor, and characterize its invariance and discriminative properties. Feature selection and tracking are performed in real-time, and used to train a template-based classifier during a capture phase prompted by the user. During normal operation, the system scores objects in the field of view based on their ranking. Severe resource constraints have prompted a re-evaluation of existing algorithms, improving their performance (accuracy and robustness) as well as computational efficiency. We motivate the design choices in the implementation with a characterization of the stability properties of local invariant detectors, and of the conditions under which a template-based descriptor is optimal. The analysis also highlights the role of time as “weak supervisor” during training, which we exploit in our implementation.
Keypoint Descriptors for Matching Across Multiple Image Modalities and Non-linear Intensity Variations
Cited by 3 (2 self)
In this paper, we investigate the effect of substantial inter-image intensity changes and changes in modality on the performance of keypoint detection, description, and matching algorithms in the context of image registration. In doing so, we modify widely-used keypoint descriptors such as SIFT and shape contexts, attempting to capture the insight that some structural information is indeed preserved between images despite dramatic appearance changes. These extensions include (a) pairing opposite-direction gradients in the formation of orientation histograms and (b) focusing on edge structures only. We also compare the stability of MSER, Laplacian-of-Gaussian, and Harris corner keypoint location detection and the impact of detection errors on matching results. Our experiments on multimodal image pairs and on image pairs with significant intensity differences show that indexing based on our modified descriptors produces more correct matches on difficult pairs than current techniques at the cost of a small decrease in performance on easier pairs. This extends the applicability of image registration algorithms such as the Dual-Bootstrap which rely on correctly matching only a small number of keypoints.
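Extension (a) can be illustrated with a small sketch. It assumes that "pairing opposite-direction gradients" means a gradient and its negation vote in the same histogram bin, which is achieved here by folding gradient angles modulo π. The function name, bin count, and magnitude weighting are illustrative choices, not details taken from the paper.

```python
import math


def orientation_histogram(gradients, num_bins=8, pair_opposites=True):
    """Magnitude-weighted histogram of gradient orientations.

    `gradients` is a list of (gx, gy) pairs. With `pair_opposites` set,
    angles are folded into [0, pi), so a contrast reversal (every
    gradient negated) leaves the histogram unchanged.
    """
    period = math.pi if pair_opposites else 2.0 * math.pi
    hist = [0.0] * num_bins
    for gx, gy in gradients:
        magnitude = math.hypot(gx, gy)
        if magnitude == 0.0:
            continue  # flat pixels carry no orientation
        angle = math.atan2(gy, gx) % period
        # Clamp guards against angle == period from floating-point rounding.
        bin_index = min(int(angle / period * num_bins), num_bins - 1)
        hist[bin_index] += magnitude
    return hist
```

The folded histogram gives up the ability to distinguish a dark-to-bright edge from a bright-to-dark one, which fits the small loss on easier image pairs that the abstract reports.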

