Results 1 - 10 of 52
Learning the discriminative power-invariance trade-off
- In ICCV, 2007
Cited by 80 (3 self)
We investigate the problem of learning optimal descriptors for a given classification task. Many hand-crafted descriptors have been proposed in the literature for measuring visual similarity. Looking past initial differences, what really distinguishes one descriptor from another is the trade-off that it achieves between discriminative power and invariance. Since this trade-off must vary from task to task, no single descriptor can be optimal in all situations. Our focus, in this paper, is on learning the optimal trade-off for classification given a particular training set and prior constraints. The problem is posed in the kernel learning framework. We learn the optimal, domain-specific kernel as a combination of base kernels corresponding to base features which achieve different levels of trade-off (such as no invariance, rotation invariance, scale invariance, affine invariance, etc.). This leads to a convex optimisation problem with a unique global optimum which can be solved for efficiently. The method is shown to achieve state-of-the-art performance on the UIUC textures, Oxford flowers and Caltech 101 datasets.
A Fast Local Descriptor for Dense Matching
- 2008
Cited by 35 (2 self)
We introduce a novel local image descriptor designed for dense wide-baseline matching purposes. We feed our descriptors to a graph-cuts based dense depth map estimation algorithm and this yields better wide-baseline performance than the commonly used correlation windows for which the size is hard to tune. As a result, unlike competing techniques that require many high-resolution images to produce good reconstructions, our descriptor can compute them from pairs of low-quality images such as the ones captured by video streams. Our descriptor is inspired from earlier ones such as SIFT and GLOH but can be computed much faster for our purposes. Unlike SURF which can also be computed efficiently at every pixel, it does not introduce artifacts that degrade the matching performance. Our approach was tested with ground truth laser scanned depth maps as well as on a wide variety of image pairs of different resolutions and we show that good reconstructions are achieved even with only two low quality images.
Discriminant Embedding for Local Image Descriptors
Cited by 34 (3 self)
Invariant feature descriptors such as SIFT and GLOH have been demonstrated to be very robust for image matching and visual recognition. However, such descriptors are generally parameterised in very high dimensional spaces e.g. 128 dimensions in the case of SIFT. This limits the performance of feature matching techniques in terms of speed and scalability. Furthermore, these descriptors have traditionally been carefully hand crafted by manually tuning many parameters. In this paper, we tackle both of these problems by formulating descriptor design as a nonparametric dimensionality reduction problem. In contrast to previous approaches that use only the global statistics of the inputs, we adopt a discriminative approach. Starting from a large training set of labelled match/non-match pairs, we pursue lower dimensional embeddings that are optimised for their discriminative power. Extensive comparative experiments demonstrate that we can exceed the performance of the current state of the art techniques such as SIFT with far fewer dimensions, and with virtually no parameters to be tuned by hand.
Daisy: An efficient dense descriptor applied to wide baseline stereo
- IEEE TRANS. PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2010
Cited by 27 (9 self)
In this paper, we introduce a local image descriptor, DAISY, which is very efficient to compute densely. We also present an EM-based algorithm to compute dense depth and occlusion maps from wide-baseline image pairs using this descriptor. This yields much better results in wide-baseline situations than the pixel and correlation-based algorithms that are commonly used in narrow-baseline stereo. Also, using a descriptor makes our algorithm robust against many photometric and geometric transformations. Our descriptor is inspired from earlier ones such as SIFT and GLOH but can be computed much faster for our purposes. Unlike SURF, which can also be computed efficiently at every pixel, it does not introduce artifacts that degrade the matching performance when used densely. It is important to note that our approach is the first algorithm that attempts to estimate dense depth maps from wide-baseline image pairs, and we show that it is a good one at that with many experiments for depth estimation accuracy, occlusion detection, and comparing it against other descriptors on laser-scanned ground truth scenes. We also tested our approach on a variety of indoor and outdoor scenes with different photometric and geometric transformations and our experiments support our claim to being robust against these.
PageRank for Product Image Search
- IN: WWW 2008. REFEREED TRACK: RICH MEDIA, 2008
Cited by 18 (0 self)
In this paper, we cast the image-ranking problem into the task of identifying “authority” nodes on an inferred visual similarity graph and propose an algorithm to analyze the visual link structure that can be created among a group of images. Through an iterative procedure based on the PageRank computation, a numerical weight is assigned to each image; this measures its relative importance to the other images being considered. The incorporation of visual signals in this process differs from the majority of large-scale commercial search engines in use today. Commercial search engines often rely solely on the text clues of the pages in which images are embedded to rank images, and often entirely ignore the content of the images themselves as a ranking signal. To quantify the performance of our approach in a real-world system, we conducted a series of experiments based on the task of retrieving images for 2000 of the most popular product queries. Our experimental results show significant improvement, in terms of user satisfaction and relevancy, in comparison to the most recent Google Image Search results.
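The iterative weighting described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `visual_rank`, the damping factor of 0.85, the iteration count, and the toy similarity matrix are all assumptions chosen for illustration. The sketch runs a plain PageRank-style power iteration on a column-normalised image similarity matrix.

```python
import numpy as np

def visual_rank(similarity, damping=0.85, iterations=100):
    """PageRank-style power iteration on an image similarity matrix.

    `damping` and `iterations` are illustrative defaults, not values
    taken from the paper.
    """
    s = np.asarray(similarity, dtype=float)
    n = s.shape[0]
    col_sums = s.sum(axis=0)
    col_sums[col_sums == 0] = 1.0      # guard against isolated images
    transition = s / col_sums          # each column now sums to 1
    rank = np.full(n, 1.0 / n)         # start from a uniform weighting
    for _ in range(iterations):
        rank = damping * (transition @ rank) + (1.0 - damping) / n
    return rank

# Hypothetical pairwise similarities: images 0-2 resemble each other,
# image 3 is only weakly similar to the rest.
similarity = np.array([
    [0.0, 0.9, 0.8, 0.1],
    [0.9, 0.0, 0.7, 0.1],
    [0.8, 0.7, 0.0, 0.1],
    [0.1, 0.1, 0.1, 0.0],
])
scores = visual_rank(similarity)
```

In this toy graph the weakly connected image receives the smallest weight, which matches the intuition that "authority" images are those similar to many others.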
LDAHash: Improved matching with smaller descriptors
- 2010
Cited by 14 (5 self)
SIFT-like local feature descriptors are ubiquitously employed in such computer vision applications as content-based retrieval, video analysis, copy detection, object recognition, photo-tourism and 3D reconstruction. Feature descriptors can be designed to be invariant to certain classes of photometric and geometric transformations, in particular, affine and intensity scale transformations. However, real transformations that an image can undergo can only be approximately modeled in this way, and thus most descriptors are only approximately invariant in practice. Secondly, descriptors are usually high-dimensional (e.g. SIFT is represented as a 128-dimensional vector). In large-scale retrieval and matching problems, this can pose challenges in storing and retrieving descriptor data. We map the descriptor vectors into the Hamming space, in which the Hamming metric is used to compare the resulting representations. This way, we reduce the size of the descriptors by representing them as short binary strings and learn descriptor invariance from examples. We show extensive experimental validation, demonstrating the advantage of the proposed approach.
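As a rough illustration of comparing descriptors in Hamming space, the sketch below binarises 128-dimensional vectors by thresholding a linear projection and counts differing bits. The abstract says the mapping is learned from examples; here a random projection stands in for the learned one, and the names `binarize` and `hamming_distance`, the 64-bit code length, and the zero thresholds are assumptions made only for this example.

```python
import numpy as np

def binarize(descriptors, projection, thresholds):
    """Map real-valued descriptors to binary codes, one bit per projection row."""
    return (descriptors @ projection.T + thresholds > 0).astype(np.uint8)

def hamming_distance(code_a, code_b):
    """Number of bit positions in which two binary codes differ."""
    return int(np.count_nonzero(code_a != code_b))

rng = np.random.default_rng(0)
# Stand-in for a learned projection: 128-dimensional input, 64-bit code.
projection = rng.standard_normal((64, 128))
thresholds = np.zeros(64)

descriptor = rng.standard_normal(128)
noisy_copy = descriptor + 0.05 * rng.standard_normal(128)  # slightly perturbed match
unrelated = rng.standard_normal(128)                       # non-matching descriptor

codes = binarize(np.stack([descriptor, noisy_copy, unrelated]), projection, thresholds)
match_distance = hamming_distance(codes[0], codes[1])
non_match_distance = hamming_distance(codes[0], codes[2])
```

With these assumptions a 128-dimensional floating-point descriptor becomes a 64-bit string, and the perturbed copy stays much closer to the original in Hamming distance than the unrelated vector does.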
CueFlik: Interactive Concept Learning in Image Search
Cited by 11 (5 self)
Web image search is difficult in part because a handful of keywords are generally insufficient for characterizing the visual properties of an image. Popular engines have begun to provide tags based on simple characteristics of images (such as tags for black and white images or images that contain a face), but such approaches are limited by the fact that it is unclear what tags end-users want to be able to use in examining Web image search results. This paper presents CueFlik, a Web image search application that allows end-users to quickly create their own rules for re-ranking images based on their visual characteristics. End-users can then re-rank any future Web image search results according to their rule. In an experiment we present in this paper, end-users quickly create effective rules for such concepts as “product photos”, “portraits of people”, and “clipart”. When asked to conceive of and create their own rules, participants create such rules as “sports action shot” with images from queries for “basketball” and “football”. CueFlik represents both a promising new approach to Web image search and an important study in end-user interactive machine learning.
Improving Descriptors for Fast Tree Matching by Optimal Linear Projection
Cited by 10 (0 self)
In this paper we propose to transform an image descriptor so that nearest neighbor (NN) search for correspondences becomes the optimal matching strategy under the assumption that inter-image deviations of corresponding descriptors have Gaussian distribution. The Euclidean NN in the transformed domain corresponds to the NN according to a truncated Mahalanobis metric in the original descriptor space. We provide theoretical justification for the proposed approach and show experimentally that the transformation allows a significant dimensionality reduction and improves matching performance of a state-of-the-art SIFT descriptor. We observe consistent improvement in precision-recall and speed of fast matching in tree structures at the expense of little overhead for projecting the descriptors into transformed space. In the context of SIFT vs. transformed M-SIFT comparison, tree search structures are evaluated according to different criteria and query types. All search tree experiments confirm that transformed M-SIFT performs better than the original SIFT.
Multiple Target Localisation at over 100 FPS
- 2009
Cited by 10 (1 self)
This paper presents a method for fast feature-based matching which enables 7 independent targets to be localised in a video sequence with an average total processing time of 7.46 ms per frame. We extend recent work [14] on fast matching using Histogrammed Intensity Patches (HIPs) by adding a rotation invariant framework and a tree-based lookup scheme. Compared to state-of-the-art fast localisation schemes [15] we achieve better matching robustness in under a quarter of the computation time and requiring 5-10 times less memory.
Transform Coding of Image Feature Descriptors
Cited by 8 (4 self)
We investigate transform coding to efficiently store and transmit SIFT and SURF image descriptors. We show that image and feature matching algorithms are robust to significantly compressed features. We achieve near-perfect image matching and retrieval for both SIFT and SURF using ∼2 bits/dimension. When applied to SIFT and SURF, this provides a 16× compression relative to conventional floating point representation. We establish a strong correlation between MSE and matching error for feature points and images. Feature compression enables many applications that may not otherwise be possible, especially on mobile devices.
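The 16× figure is consistent with a simple size calculation, sketched below under the assumption that "conventional floating point representation" means 32 bits per dimension (the abstract does not state the width). The helper `descriptor_bits` is a hypothetical name, and the 128-dimensional SIFT descriptor is used as the example.

```python
def descriptor_bits(dimensions, bits_per_dimension):
    """Total storage for one descriptor, in bits."""
    return dimensions * bits_per_dimension

# Assumption: uncompressed descriptors use 32-bit floats per dimension.
raw_bits = descriptor_bits(128, 32)    # 128-dimensional SIFT, uncompressed
coded_bits = descriptor_bits(128, 2)   # about 2 bits/dimension after coding
ratio = raw_bits / coded_bits
```

Under that assumption one SIFT descriptor shrinks from 4096 bits (512 bytes) to 256 bits (32 bytes), a ratio of 16.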

