A PERFORMANCE EVALUATION OF LOCAL DESCRIPTORS
2005
Cited by 775 (24 self)
In this paper we compare the performance of descriptors computed for local interest regions, as for example extracted by the Harris-Affine detector [32]. Many different descriptors have been proposed in the literature. However, it is unclear which descriptors are more appropriate and how their performance depends on the interest region detector. The descriptors should be distinctive and at the same time robust to changes in viewing conditions as well as to errors of the detector. Our evaluation uses as criterion recall with respect to precision and is carried out for different image transformations. We compare shape context [3], steerable filters [12], PCA-SIFT [19], differential invariants [20], spin images [21], SIFT [26], complex filters [37], moment invariants [43], and cross-correlation for different types of interest regions. We also propose an extension of the SIFT descriptor, and show that it outperforms the original method. Furthermore, we observe that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT based descriptors perform best. Moments and steerable filters show the best performance among the low dimensional descriptors.
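The recall-versus-precision criterion mentioned in this abstract can be illustrated with a minimal sketch. The function name `recall_and_precision`, the representation of matches as index pairs, and the exact definitions used here (recall as correct matches over ground-truth correspondences, precision as correct matches over all returned matches) are assumptions for illustration and may differ in detail from the paper's protocol.

```python
def recall_and_precision(matches, ground_truth):
    """Score a set of descriptor matches against known correspondences.

    matches:      iterable of (i, j) index pairs returned by a matcher
    ground_truth: iterable of (i, j) pairs that are true correspondences
    Both the pair representation and these definitions are illustrative
    assumptions, not taken from the paper.
    """
    matches = set(matches)
    ground_truth = set(ground_truth)
    correct = len(matches & ground_truth)
    # Recall: fraction of true correspondences that the matcher found.
    recall = correct / len(ground_truth) if ground_truth else 0.0
    # Precision: fraction of returned matches that are correct.
    precision = correct / len(matches) if matches else 1.0
    return recall, precision


# Hypothetical toy data: four true correspondences, three returned matches.
truth = {(0, 0), (1, 1), (2, 2), (3, 3)}
found = [(0, 0), (1, 1), (2, 5)]
recall, precision = recall_and_precision(found, truth)
print(recall, precision)
```

Sweeping the matching threshold and recomputing both values at each setting would trace out a curve of the kind such an evaluation compares across descriptors.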
PCA-SIFT: A more distinctive representation for local image descriptors
2004
Cited by 237 (6 self)
Stable local feature detection and representation is a fundamental component of many image registration and object recognition algorithms. Mikolajczyk and Schmid [14] recently evaluated a variety of approaches and identified the SIFT [11] algorithm as being the most resistant to common image deformations. This paper examines (and improves upon) the local image descriptor used by SIFT. Like SIFT, our descriptors encode the salient aspects of the image gradient in the feature point's neighborhood; however, instead of using SIFT's smoothed weighted histograms, we apply Principal Components Analysis (PCA) to the normalized gradient patch. Our experiments demonstrate that the PCA-based local descriptors are more distinctive, more robust to image deformations, and more compact than the standard SIFT representation. We also present results showing that using these descriptors in an image retrieval application results in increased accuracy and faster matching.
Simultaneous object recognition and segmentation by image exploration
In Proceedings of the European Conference on Computer Vision, 2004
Cited by 93 (13 self)
Methods based on local, viewpoint invariant features have proven capable of recognizing objects in spite of viewpoint changes, occlusion and clutter. However, these approaches fail when these factors are too strong, due to the limited repeatability and discriminative power of the features. As additional shortcomings, the objects need to be rigid and only their approximate location is found. We present an object recognition approach which overcomes these limitations. An initial set of feature correspondences is first generated. The method anchors on it and then gradually explores the surrounding area, trying to construct more and more matching features, increasingly farther from the initial ones. The resulting process covers the object with matches, and simultaneously separates the correct matches from the wrong ones. Hence, recognition and segmentation are achieved at the same time. Only very few correct initial matches suffice for reliable recognition. Experimental results on still images and television news broadcasts demonstrate the stronger power of the presented method in dealing with extensive clutter, dominant occlusion, large scale and viewpoint changes. Moreover, non-rigid deformations are explicitly taken into account, and the approximative contours of the object are produced. The approach can extend any viewpoint invariant feature extractor.
Wide-baseline multiple-view correspondences
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2003
Cited by 30 (4 self)
We present a novel approach for establishing multiple-view feature correspondences along an unordered set of images taken from substantially different viewpoints. While recently several wide-baseline stereo (WBS) algorithms have appeared, the N-view case is largely unexplored. In this paper, an established WBS algorithm is used to extract and match features in pairs of views. The pairwise matches are first integrated into disjoint feature tracks, each representing a single physical surface patch in several views. By exploiting the interplay between the tracks, they are extended over more views, while unrelated image features are removed. Similarity and spatial relationships between the features are simultaneously used. The output consists of many reliable and accurate feature tracks, strongly connecting the input views. Applications include 3D reconstruction and object recognition. The proposed approach is not restricted to the particular choice of features and matching criteria. It can extend any method that provides feature correspondences between pairs of images.
A Thousand Words in a Scene
2007
Cited by 14 (1 self)
This paper presents a novel approach for visual scene modeling and classification, investigating the combined use of text modeling methods and local invariant features. Our work attempts to elucidate (1) whether a text-like bag-of-visterms representation (histogram of quantized local visual features) is suitable for scene (rather than object) classification, (2) whether some analogies between discrete scene representations and text documents exist, and (3) whether unsupervised, latent space models can be used both as feature extractors for the classification task and to discover patterns of visual co-occurrence. Using several data sets, we validate our approach, presenting and discussing experiments on each of these issues. We first show, with extensive experiments on binary and multi-class scene classification tasks using a 9500-image data set, that the bag-of-visterms representation consistently outperforms classical scene classification approaches. In other data sets we show that our approach competes with or outperforms other recent, more complex, methods. We also show that Probabilistic Latent Semantic Analysis (PLSA) generates a compact scene representation, discriminative for accurate classification, and more robust than the bag-of-visterms representation when less labeled training data is available. Finally, through aspect-based image ranking experiments, we show the ability of PLSA to automatically extract visually meaningful scene patterns, making such representation useful for browsing image collections.
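The bag-of-visterms representation described in this abstract (a histogram of quantized local visual features) can be sketched as follows. This is a minimal illustration only: the function names, the use of squared Euclidean distance for quantization, and the toy two-dimensional descriptors and vocabulary are all assumptions, and the vocabulary is taken as given rather than learned by clustering.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bag_of_visterms(descriptors, vocabulary):
    """Count how many descriptors fall nearest to each vocabulary entry.

    descriptors: list of feature vectors extracted from one image
    vocabulary:  list of visterm centers (assumed precomputed)
    Returns raw counts, one per vocabulary entry.
    """
    counts = [0] * len(vocabulary)
    for d in descriptors:
        # Quantize: assign the descriptor to its nearest visterm.
        nearest = min(range(len(vocabulary)),
                      key=lambda k: squared_distance(d, vocabulary[k]))
        counts[nearest] += 1
    return counts


# Hypothetical toy data with 2-D descriptors and a 3-entry vocabulary.
vocabulary = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.0)]
descriptors = [(0.1, 0.2), (0.9, 1.1), (1.2, 0.8), (3.9, 0.1)]
histogram = bag_of_visterms(descriptors, vocabulary)
print(histogram)
```

The resulting count vector plays the role that a term-frequency vector plays for a text document, which is what allows text modeling methods such as PLSA to be applied to it.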
Determining vision graphs for distributed camera networks using feature digests
EURASIP Journal on Advances in Signal Processing, 2007
Cited by 9 (2 self)
We propose a method for obtaining the vision graph for a distributed camera network, in which each camera is represented by a node, and an edge appears between two nodes if the two cameras jointly image a sufficiently large part of the environment. The technique is decentralized, requires no ordering on the set of cameras, and assumes that cameras can only communicate a finite amount of information with each other in order to establish the vision graph. Each camera first detects a large number of feature points that are approximately scale- and viewpoint-invariant. Both the number of features and the length of each feature descriptor are substantially reduced to form a fixed-length “feature digest” that is broadcast to the rest of the network. Each receiver camera decompresses the feature digest to recover approximate feature descriptors, robustly estimates the epipolar geometry to reject incorrect matches and grow additional ones, and decides whether sufficient evidence exists to form a vision graph edge. We use receiver-operating-characteristics (ROC) curves to analyze the performance of different message formation schemes, and show that high detection rates can be achieved while maintaining low false alarm rates. Finally, we show how a camera calibration algorithm that passes messages only along vision graph edges can recover accurate 3D structure and camera positions in a distributed manner. We demonstrate the accurate performance of the vision graph generation and camera calibration algorithms using a simulated 60-node outdoor camera network. In this simulation, we achieved vision graph edge detection probabilities exceeding 0.8 while maintaining false alarm probabilities below 0.05.
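The final edge decision described in this abstract can be sketched in a simplified, centralized form. The sketch assumes that a count of surviving feature matches is already available for each camera pair and that an edge is declared when this count reaches a fixed threshold; the function name `build_vision_graph`, the threshold `min_matches`, and the camera labels are hypothetical, and the digest compression and epipolar-geometry steps are not modeled.

```python
def build_vision_graph(match_counts, min_matches):
    """Build an undirected vision graph from pairwise match counts.

    match_counts: dict mapping (camera_a, camera_b) to the number of
                  feature matches that survived verification
    min_matches:  hypothetical evidence threshold for declaring an edge
    Returns a dict mapping each camera to the set of its neighbors.
    """
    graph = {}
    for (a, b), n in match_counts.items():
        # Make sure both cameras appear as nodes even without edges.
        graph.setdefault(a, set())
        graph.setdefault(b, set())
        if n >= min_matches:
            graph[a].add(b)
            graph[b].add(a)
    return graph


# Hypothetical match counts for a three-camera network.
counts = {("cam0", "cam1"): 42, ("cam0", "cam2"): 3, ("cam1", "cam2"): 17}
graph = build_vision_graph(counts, min_matches=10)
print(graph)
```

Raising or lowering the threshold trades missed edges against false edges, which is the kind of trade-off an ROC analysis characterizes.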
What's Beyond Query By Example?
2003
Cited by 8 (2 self)
Over the last ten years, the crucial problem of information retrieval in multimedia documents has boosted research activities in the field of visual appearance indexing and retrieval by content. In the early research years, the concept of the "query by visual example" (QBVE) has been proposed and shown to be relevant for visual information retrieval. It is obvious that QBVE is not able to satisfy the multiple visual search usage requirements. In this paper, we focus on two major approaches that correspond to two different retrieval paradigms. First, we present the partial visual query that ignores the background of the images and allows a straight user expression on its visual interest without relevance feedback mechanism. The second retrieval paradigm consists in searching for the user mental target image when no starting visual example is available. A visual thesaurus is generated and allows query by logical composition of region categories. This query paradigm is closely related to that of text retrieval.
Bottom-up & top-down object detection using primal sketch features and graphical models
In Proc. Intl. Conf. on Computer Vision and Pattern Recognition, 2006
Cited by 4 (3 self)
A combination of techniques that is becoming increasingly popular is the construction of part-based object representations using the outputs of interest-point detectors. Our contributions in this paper are twofold: first, we propose a primal-sketch-based set of image tokens that are used for object representation and detection. Second, top-down information is introduced based on an efficient method for the evaluation of the likelihood of hypothesized part locations. This allows us to use graphical model techniques to complement bottom-up detection, by proposing and finding the parts of the object that were missed by the front-end feature detection stage. Detection results for four object categories validate the merits of this joint top-down and bottom-up approach.
Maximally stable local description for scale selection
In ECCV, pages IV: 504–516, 2006
Cited by 4 (0 self)
Scale and affine-invariant local features have shown excellent performance in image matching, object and texture recognition. This paper optimizes keypoint detection to achieve stable local descriptors, and therefore, an improved image representation. The technique performs scale selection based on a region descriptor, here SIFT, and chooses regions for which this descriptor is maximally stable. Maximal stability is obtained when the difference between descriptors extracted for consecutive scales reaches a minimum. This scale selection technique is applied to multi-scale Harris and Laplacian points. Affine invariance is achieved by an integrated affine adaptation process based on the second moment matrix. An experimental evaluation compares our detectors to Harris-Laplace and the Laplacian in the context of image matching as well as of category and texture classification. The comparison shows the improved performance of our detector.
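The stability criterion in this abstract (pick the scale at which descriptors from consecutive scales differ least) can be sketched as below. Several choices here are assumptions made for illustration: Euclidean distance as the descriptor difference, a global rather than local minimum, returning the lower scale of the most stable pair, and the toy two-dimensional vectors standing in for SIFT descriptors.

```python
import math


def descriptor_distance(a, b):
    """Euclidean distance between two equal-length descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def most_stable_scale(scales, descriptors):
    """Return (scale, difference) for the most stable consecutive pair.

    scales:      increasing list of candidate scales for one keypoint
    descriptors: descriptor computed at each scale, in the same order
    The lower scale of the pair with the smallest descriptor change is
    returned; this tie to the lower scale is an assumption.
    """
    if len(scales) != len(descriptors) or len(scales) < 2:
        raise ValueError("need one descriptor per scale and at least two scales")
    diffs = [descriptor_distance(descriptors[i], descriptors[i + 1])
             for i in range(len(scales) - 1)]
    best = min(range(len(diffs)), key=diffs.__getitem__)
    return scales[best], diffs[best]


# Hypothetical toy data: 2-D stand-ins for descriptors at four scales.
scales = [1.0, 1.2, 1.44, 1.73]
descriptors = [(0.0, 1.0), (0.5, 0.5), (0.6, 0.5), (1.0, 0.0)]
scale, diff = most_stable_scale(scales, descriptors)
print(scale, diff)
```

In this toy input the descriptor changes least between the second and third scales, so the second scale is selected.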

