Searching in Metric Spaces
, 1999
Cited by 321 (34 self)
The problem of searching the elements of a set which are close to a given query element under some similarity criterion has a vast number of applications in many branches of computer science, from pattern recognition to textual and multimedia information retrieval. We are interested in the rather general case where the similarity criterion defines a metric space, instead of the more restricted case of a vector space. A large number of solutions have been proposed in different areas, in many cases without cross-knowledge. Because of this, the same ideas have been reinvented several times, and very different presentations have been given for the same approaches. We ...
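To make the metric-space setting concrete, here is a minimal sketch of one common family of techniques in this area: pivot-based range search, which uses only the triangle inequality to discard objects without computing their distance to the query. The names `build_pivot_table` and `pivot_range_search` are hypothetical, pivots are chosen at random, and 2-D points with `math.dist` stand in for an arbitrary metric; this is an illustration, not the survey's own formulation.

```python
import math
import random

def build_pivot_table(objects, pivots, dist):
    """Precompute d(x, p) for every object x and pivot p (one row per object)."""
    return [[dist(x, p) for p in pivots] for x in objects]

def pivot_range_search(query, radius, objects, pivots, table, dist):
    """Return every object within `radius` of `query`, in input order."""
    q_to_p = [dist(query, p) for p in pivots]  # one distance per pivot
    result = []
    for x, row in zip(objects, table):
        # Triangle inequality: |d(q,p) - d(x,p)| <= d(q,x) for every pivot p.
        lower = max((abs(dq - dx) for dq, dx in zip(q_to_p, row)), default=0.0)
        if lower > radius:
            continue  # discarded without computing d(q, x)
        if dist(query, x) <= radius:  # survivor: verify with a real distance
            result.append(x)
    return result

random.seed(0)
points = [(random.random(), random.random()) for _ in range(500)]
pivots = random.sample(points, 4)
table = build_pivot_table(points, pivots, math.dist)
hits = pivot_range_search((0.5, 0.5), 0.1, points, pivots, table, math.dist)
```

Only objects whose lower bound does not exceed the radius cost a real distance evaluation; the answer is the same as a linear scan.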
Index-driven similarity search in metric spaces
 ACM Transactions on Database Systems
, 2003
Cited by 133 (6 self)
Similarity search is a very important operation in multimedia databases and other database applications involving complex objects, and involves finding objects in a data set S similar to a query object q, based on some similarity measure. In this article, we focus on methods for similarity search that make the general assumption that similarity is represented with a distance metric d. Existing methods for handling similarity search in this setting typically fall into one of two classes. The first directly indexes the objects based on distances (distance-based indexing), while the second is based on mapping to a vector space (mapping-based approach). The main part of this article is dedicated to a survey of distance-based indexing methods, but we also briefly outline how search occurs in mapping-based methods. We also present a general framework for performing search based on distances, and present algorithms for common types of queries that operate on an arbitrary “search hierarchy.” These algorithms can be applied on each of the methods presented, provided a suitable search hierarchy is defined.
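As an illustration of distance-based indexing, the following is a minimal sketch of a vantage-point tree, a classic structure of the kind such surveys cover. Each node splits its objects by the median distance `mu` to a vantage point, and range search prunes a side whenever the triangle inequality rules it out. The class and function names are hypothetical, the first object is used as the vantage point (real implementations choose it more carefully), and 2-D points with `math.dist` stand in for a general metric.

```python
import math
import random
import statistics

class VPNode:
    def __init__(self, vantage, mu, inside, outside):
        self.vantage = vantage  # pivot object stored at this node
        self.mu = mu            # split radius: median distance to the vantage point
        self.inside = inside    # subtree with d(x, vantage) <= mu
        self.outside = outside  # subtree with d(x, vantage) > mu

def build_vp_tree(objects, dist):
    if not objects:
        return None
    vantage, rest = objects[0], objects[1:]
    if not rest:
        return VPNode(vantage, 0.0, None, None)
    dists = [dist(vantage, x) for x in rest]
    mu = statistics.median(dists)
    inside = [x for x, d in zip(rest, dists) if d <= mu]
    outside = [x for x, d in zip(rest, dists) if d > mu]
    return VPNode(vantage, mu, build_vp_tree(inside, dist), build_vp_tree(outside, dist))

def vp_range_search(node, query, radius, dist, out):
    if node is None:
        return out
    d = dist(query, node.vantage)
    if d <= radius:
        out.append(node.vantage)
    # Inside objects satisfy d(q,x) >= d - mu, so skip them if d - mu > radius.
    if d - radius <= node.mu:
        vp_range_search(node.inside, query, radius, dist, out)
    # Outside objects satisfy d(q,x) > mu - d, so skip them if mu - d >= radius.
    if d + radius > node.mu:
        vp_range_search(node.outside, query, radius, dist, out)
    return out

random.seed(1)
pts = [(random.random(), random.random()) for _ in range(400)]
tree = build_vp_tree(pts, math.dist)
q, r = (0.4, 0.6), 0.15
hits = vp_range_search(tree, q, r, math.dist, [])
```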
Searching in Metric Spaces by Spatial Approximation
, 1999
Cited by 64 (20 self)
We propose a new data structure to search in metric spaces. A metric space is formed by a collection of objects and a distance function defined among them, which satisfies the triangle inequality. The goal is, given a set of objects and a query, to retrieve those objects close enough to the query. The complexity measure is the number of distances computed to achieve this goal. Our data structure, called sa-tree ("spatial approximation tree"), is based on approaching the searched objects spatially, that is, getting closer and closer to them, rather than on the classical divide-and-conquer approach of other data structures. We analyze our method and show that the number of distance evaluations to search among n objects is sublinear. We show experimentally that the sa-tree is the best existing technique when the metric space is hard to search or the query has low selectivity. These are the most important unsolved cases in real applications. As a practical advantage, our data structure is one of the few that do not need parameter tuning, which makes it appealing for use by non-experts.
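The spatial-approximation idea can be sketched as follows, under our reading of the sa-tree: a node's neighbors N(a) are the objects (scanned by increasing distance to a) that are closer to a than to every neighbor chosen so far; every other object descends into its closest neighbor's subtree. A range query may enter neighbor b only if d(q, b) <= d_min + 2r, where d_min is the smallest query distance to b's competitors seen so far. `SANode`, `build_sa_tree` and `sa_range_search` are hypothetical names, and 2-D points stand in for a general metric; treat this as a simplified sketch, not the authors' implementation.

```python
import math
import random

class SANode:
    def __init__(self, obj):
        self.obj = obj
        self.neighbors = []  # N(a): the node's children
        self.radius = 0.0    # R(a): covering radius of the whole subtree

def build_sa_tree(a, others, dist):
    node = SANode(a)
    rest = []
    # Scan candidates in increasing distance to a.
    for x in sorted(others, key=lambda y: dist(y, a)):
        dxa = dist(x, a)
        node.radius = max(node.radius, dxa)
        # x joins N(a) if it is closer to a than to every neighbor chosen so far.
        if all(dxa < dist(x, b.obj) for b in node.neighbors):
            node.neighbors.append(SANode(x))
        else:
            rest.append(x)
    # Every other object goes to the subtree of its closest neighbor.
    buckets = [[] for _ in node.neighbors]
    for x in rest:
        i = min(range(len(node.neighbors)), key=lambda k: dist(x, node.neighbors[k].obj))
        buckets[i].append(x)
    node.neighbors = [build_sa_tree(b.obj, bucket, dist)
                      for b, bucket in zip(node.neighbors, buckets)]
    return node

def sa_range_search(node, query, radius, dist, d_node, d_min, out):
    """d_node = d(query, node.obj); d_min = smallest query distance to this
    node, its siblings and the ancestors' neighbor sets seen so far."""
    if d_node > node.radius + radius:
        return out  # query ball cannot reach this subtree
    if d_node <= radius:
        out.append(node.obj)
    child_d = [dist(query, b.obj) for b in node.neighbors]
    d_min = min([d_min] + child_d)
    for b, db in zip(node.neighbors, child_d):
        # Objects under b are at least as close to b as to anything counted in d_min.
        if db <= d_min + 2 * radius:
            sa_range_search(b, query, radius, dist, db, d_min, out)
    return out

random.seed(3)
pts = [(random.random(), random.random()) for _ in range(400)]
tree = build_sa_tree(pts[0], pts[1:], math.dist)
q, r = (0.3, 0.7), 0.12
d0 = math.dist(q, tree.obj)
hits = sa_range_search(tree, q, r, math.dist, d0, d0, [])
```

Note that no parameter needs tuning here, which matches the practical advantage the abstract claims.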
Pivot Selection Techniques for Proximity Searching in Metric Spaces
, 2001
Cited by 52 (6 self)
With few exceptions, proximity search algorithms in metric spaces based on the use of pivots select them at random among the objects of the metric space. However, it is well known that the way in which the pivots are selected can drastically affect the performance of the algorithm. Given two sets of pivots of the same size, the better-chosen set can largely reduce the search time. Alternatively, a better-chosen small set of pivots (requiring much less space) can yield the same efficiency as a larger, randomly chosen set. We propose an efficiency measure to compare two pivot sets, combined with an optimization technique that allows us to select good sets of pivots. We obtain abundant empirical evidence showing that our technique is effective, and it is the first that we are aware of to produce consistently good results in a wide variety of cases and to be based on a formal theory. We also show that good pivots are outliers, but that selecting outliers does not ensure that good pivots are selected.
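One way to turn "an efficiency measure to compare two pivot sets" into code, offered as an assumption rather than the paper's exact definition: score a pivot set P by the mean, over sample object pairs, of D_P(x, y) = max over p in P of |d(x,p) - d(y,p)|, the lower bound that pivot filtering actually uses (larger means more pruning). The greedy incremental strategy below is only one possible optimization technique, and all function names are hypothetical.

```python
import math
import random

def pivot_bound(x, y, pivots, dist):
    """D_P(x, y) = max over p of |d(x,p) - d(y,p)|, a lower bound on d(x, y)."""
    return max(abs(dist(x, p) - dist(y, p)) for p in pivots)

def mean_bound(pivots, pairs, dist):
    """Assumed efficiency criterion: mean of D_P over sample pairs (larger is better)."""
    return sum(pivot_bound(x, y, pivots, dist) for x, y in pairs) / len(pairs)

def incremental_selection(candidates, k, pairs, dist):
    """Greedily add the candidate that most increases the mean lower bound."""
    chosen = []
    for _ in range(k):
        remaining = [c for c in candidates if c not in chosen]
        best = max(remaining, key=lambda c: mean_bound(chosen + [c], pairs, dist))
        chosen.append(best)
    return chosen

random.seed(1)
data = [(random.random(), random.random()) for _ in range(300)]
candidates = random.sample(data, 20)
pairs = [tuple(random.sample(data, 2)) for _ in range(200)]
pivots = incremental_selection(candidates, 3, pairs, math.dist)
```

Because D_P can only grow when a pivot is added, the criterion is monotone in the set size; the comparison that matters is between sets of equal size.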
A flexible image database system for content-based retrieval
 IN 17TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION
, 1999
Cited by 35 (4 self)
There is a growing need for the ability to query image databases based on similarity of image content rather than strict keyword search. As distance computations can be expensive, there is a need for indexing systems and algorithms that can eliminate candidate images without performing distance calculations. As user needs may change from session to session, there is also a need for run-time creation of distance measures. In this paper, we present FIDS, the “flexible image database system.” FIDS allows the user to query the database based on complex combinations of dozens of predefined distance measures. Using an indexing scheme and algorithms based on the triangle inequality, FIDS can often return matches to the query image without directly comparing the query image to more than a small percentage of the database. This paper describes the technical contributions of the FIDS approach to content-based image retrieval.
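A minimal sketch of how triangle-inequality filtering can survive a distance measure combined at query time. Assumption: the combined distance is a nonnegative weighted sum of metrics, so the same weighted sum of per-measure lower bounds |d_i(q,k) - d_i(x,k)| (maximized over precomputed "key" objects k) is still a lower bound. The two measures `l2_color` and `l1_texture`, the 4-number feature tuples, and the function names are all hypothetical; this is not the FIDS indexing scheme itself.

```python
import math
import random

def l2_color(a, b):    # hypothetical measure 1: Euclidean on the first two features
    return math.dist(a[:2], b[:2])

def l1_texture(a, b):  # hypothetical measure 2: Manhattan on the last two features
    return sum(abs(u - v) for u, v in zip(a[2:], b[2:]))

MEASURES = [l2_color, l1_texture]

def precompute(db, keys):
    # table[x][i][j] = d_i(x, keys[j]), computed once, before any query
    return [[[m(x, k) for k in keys] for m in MEASURES] for x in db]

def combined_query(query, weights, radius, db, keys, table):
    q_keys = [[m(query, k) for k in keys] for m in MEASURES]
    hits, computed = [], 0
    for x, row in zip(db, table):
        # Weighted sum of per-measure lower bounds bounds the weighted distance.
        lb = sum(w * max(abs(qk - xk) for qk, xk in zip(q_keys[i], row[i]))
                 for i, w in enumerate(weights))
        if lb > radius:
            continue
        computed += 1  # only survivors pay for a full comparison
        if sum(w * m(query, x) for w, m in zip(weights, MEASURES)) <= radius:
            hits.append(x)
    return hits, computed

random.seed(2)
db = [tuple(random.random() for _ in range(4)) for _ in range(1000)]
keys = random.sample(db, 5)
table = precompute(db, keys)
q = (0.5, 0.5, 0.5, 0.5)
weights = (0.7, 0.3)  # chosen by the user at query time
hits, computed = combined_query(q, weights, 0.15, db, keys, table)
```

The per-measure tables are built once; the weights can change from session to session without rebuilding anything.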
A compact space decomposition for effective metric indexing
 Pattern Recognition Letters
, 2005
Cited by 27 (6 self)
The metric space model abstracts many proximity search problems, from nearest-neighbor classifiers to textual and multimedia information retrieval. In this context, an index is a data structure that speeds up proximity queries. However, indexes lose their efficiency as the intrinsic dimensionality of the data increases. In this paper we present a simple index called list of clusters (LC), which is based on a compact partitioning of the data set. The LC is shown to require little space, to be suitable for both main and secondary memory implementations, and, most importantly, to be very resistant to the intrinsic dimensionality of the data set. In this aspect our structure is unbeaten. We finish with a discussion of the role of unbalancing in metric space searching, and how it permits trading memory space for construction time.
1 Introduction
The problem of proximity searching has received much attention in recent times, due to an increasing interest in manipulating and retrieving the increasingly common multimedia data. Multimedia data have to be classified, forecasted, filtered, organized, and so on. Their manipulation poses new challenges to classifiers and function approximators. The well-known k-nearest neighbor (k-NN) classifier is a favorite candidate for this task because it is simple and well understood. However, one of the main obstacles to using this classifier for massive data classification is its linear complexity to find the k neighbors of a given query.
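A minimal sketch of a list of clusters with fixed cluster size m: pick a center, store its m nearest remaining objects with the covering radius rc, and repeat on what is left. Since every later object is at distance >= rc from an earlier center, a range query can stop as soon as its ball lies strictly inside a cluster ball. Taking the next remaining object as the new center is an assumption (center choice is a tunable heuristic), and `build_lc` / `lc_range_search` are hypothetical names.

```python
import math
import random

def build_lc(objects, m, dist):
    """Return a list of (center, covering_radius, bucket) with |bucket| <= m."""
    clusters = []
    rest = list(objects)
    while rest:
        center = rest.pop(0)  # assumption: next center = first remaining object
        rest.sort(key=lambda x: dist(center, x))
        bucket, rest = rest[:m], rest[m:]
        radius = dist(center, bucket[-1]) if bucket else 0.0
        clusters.append((center, radius, bucket))
    return clusters

def lc_range_search(clusters, query, r, dist):
    out = []
    for center, rc, bucket in clusters:
        d = dist(query, center)
        if d <= r:
            out.append(center)
        if d <= rc + r:  # query ball may intersect the cluster ball: scan it
            out.extend(x for x in bucket if dist(query, x) <= r)
        if d + r < rc:   # query ball strictly inside: later objects are all farther
            break
    return out

random.seed(4)
pts = [(random.random(), random.random()) for _ in range(500)]
clusters = build_lc(pts, 20, math.dist)
q, r = (0.2, 0.8), 0.1
hits = lc_range_search(clusters, q, r, math.dist)
```

Making m larger or smaller unbalances the partition in different ways, which is one concrete form of the space versus construction-time trade-off mentioned above.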
Fixed Queries Array: A Fast and Economical Data Structure for Proximity Searching
, 2001
Cited by 26 (12 self)
Pivot-based algorithms are effective tools for proximity searching in metric spaces. They allow trading space overhead for the number of distance evaluations performed at query time. With additional search structures (which impose extra space overhead) they can also reduce the amount of side computations. We introduce a new data structure, the Fixed Queries Array (FQA), whose novelties are (1) it permits sublinear extra CPU time without any extra data structure; (2) it permits trading the number of pivots for their precision so as to make better use of the available memory. We show experimentally that the FQA is an efficient tool to search in metric spaces and that it compares favorably against other state-of-the-art approaches. Its simplicity makes it an effective tool for practitioners seeking a black-box method to plug into their applications.
Keywords: metric spaces, similarity search, range search, fixed queries tree.
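The "no extra data structure" point can be sketched with a single sorted array: sort objects lexicographically by their pivot-distance signature (d(x,p1), d(x,p2), ...), then narrow the candidate range pivot by pivot with binary search. Assumptions: the metric is integer-valued (Manhattan distance on integer points here), so no discretization of distances into a few bits is needed, and `build_fqa` / `fqa_range_search` are hypothetical names.

```python
import random
from bisect import bisect_left

def manhattan(a, b):
    # assumption: an integer-valued metric, so signatures need no discretization
    return sum(abs(u - v) for u, v in zip(a, b))

def build_fqa(objects, pivots, dist):
    """Sort objects by their pivot-distance signature (lexicographic order)."""
    rows = sorted((tuple(dist(x, p) for p in pivots), x) for x in objects)
    return [s for s, _ in rows], [x for _, x in rows]

def fqa_range_search(sigs, objs, pivots, query, r, dist):
    dq = [dist(query, p) for p in pivots]
    out = []

    def visit(level, prefix, lo, hi):
        # Invariant: every signature in sigs[lo:hi] starts with `prefix`.
        if level == len(pivots):
            out.extend(x for x in objs[lo:hi] if dist(query, x) <= r)
            return
        # Only signature values within [dq - r, dq + r] can hold answers.
        for v in range(max(0, dq[level] - r), dq[level] + r + 1):
            a = bisect_left(sigs, prefix + (v,), lo, hi)
            b = bisect_left(sigs, prefix + (v + 1,), lo, hi)
            if a < b:
                visit(level + 1, prefix + (v,), a, b)

    visit(0, (), 0, len(sigs))
    return out

random.seed(5)
pts = [tuple(random.randint(0, 50) for _ in range(3)) for _ in range(1000)]
pivots = random.sample(pts, 4)
sigs, objs = build_fqa(pts, pivots, manhattan)
q, r = (25, 25, 25), 6
hits = fqa_range_search(sigs, objs, pivots, q, r, manhattan)
```

The binary searches replace the tree traversal a fixed-queries tree would perform, so only the sorted array and the signatures are stored.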
Incremental Similarity Search in Multimedia Databases
, 2000
Cited by 23 (2 self)
Similarity search is a very important operation in multimedia databases and other database applications involving complex objects, and involves finding objects in a data set S similar to a query object q, based on some distance measure d, usually a distance metric. Existing methods for handling similarity search in this setting fall into one of two classes. The first is based on mapping to a low-dimensional vector space (making use of data structures such as the R-tree), while the second directly indexes the objects based on distances (making use of data structures such as the M-tree). We introduce a general framework for performing search based on distances, and present an incremental nearest neighbor algorithm that operates on an arbitrary "search hierarchy". We show how this framework can be applied in both classes of similarity search methods, by defining a suitable search hierarchy for a number of different indexing structures. Armed with an appropriate search hierarchy, our algorithm thus performs incremental similarity search, wherein the result objects are reported one by one in order of similarity to a query object, with as little effort as possible expended to produce each new result object. This is especially important in interactive database applications, as it makes it possible to display partial query results early. The incremental aspect also provides significant benefits in situations where the number of desired neighbors is unknown in advance. Furthermore, our algorithm is at least as efficient as existing k-nearest neighbor algorithms, in terms of the number of distance computations and index node accesses. In fact, provided that the search hierarchy is properly defined, our algorithm can be shown to be optimal in the sense of performing as few distance ...
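The incremental idea can be sketched as a best-first traversal: one priority queue holds both hierarchy nodes (keyed by a lower bound on the distance to anything inside them) and objects (keyed by their exact distance); whenever an object reaches the front of the queue it is the next nearest neighbor. The search hierarchy used here is a hypothetical toy ball tree over 2-D points, not one of the index structures the paper instantiates, and the function names are our own.

```python
import heapq
import itertools
import math
import random

def build_ball_tree(points, leaf_size=8):
    """Toy hierarchy: ('leaf', center, radius, pts) or ('node', center, radius, children)."""
    center = points[0]
    radius = max(math.dist(center, p) for p in points)
    if len(points) <= leaf_size or radius == 0.0:
        return ('leaf', center, radius, points)
    far = max(points, key=lambda p: math.dist(center, p))
    left = [p for p in points if math.dist(p, center) <= math.dist(p, far)]
    right = [p for p in points if math.dist(p, center) > math.dist(p, far)]
    return ('node', center, radius,
            [build_ball_tree(left, leaf_size), build_ball_tree(right, leaf_size)])

def incremental_nn(tree, query):
    """Yield (point, distance) pairs in nondecreasing distance from query."""
    tie = itertools.count()  # tie-breaker so the heap never compares payloads
    heap = [(0.0, next(tie), 'tree', tree)]
    while heap:
        key, _, kind, item = heapq.heappop(heap)
        if kind == 'point':
            yield item, key  # nothing left in the queue can be closer
        elif item[0] == 'leaf':
            for p in item[3]:
                heapq.heappush(heap, (math.dist(query, p), next(tie), 'point', p))
        else:
            for child in item[3]:
                # Ball lower bound, kept >= the parent's key so pops stay monotone.
                lower = max(key, math.dist(query, child[1]) - child[2])
                heapq.heappush(heap, (lower, next(tie), 'tree', child))

random.seed(6)
pts = [(random.random(), random.random()) for _ in range(300)]
tree = build_ball_tree(pts)
q = (0.5, 0.5)
first5 = list(itertools.islice(incremental_nn(tree, q), 5))
```

Because the traversal is a generator, the caller can stop after any number of results without fixing k in advance.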
Fast Approximate String Matching in a Dictionary
 In Proc. SPIRE'98
, 1998
Cited by 23 (8 self)
A successful technique for searching large textual databases while allowing errors relies on an online search in the vocabulary of the text. To reduce the time of that online search, we index the vocabulary as a metric space. We show that with reasonable space overhead we can improve by a factor of two over the fastest online algorithms when the tolerated error level is low (which is reasonable in text searching).
1 Introduction
Approximate string matching is a recurrent problem in many branches of computer science, with applications to text searching, computational biology, pattern recognition, signal processing, etc. The problem can be stated as follows: given a long text of length n and a (comparatively short) pattern of length m, retrieve all the segments (or "occurrences") of the text whose edit distance to the pattern is at most k. The edit distance ed() between two strings is defined as the minimum number of character insertions, deletions and replacements needed to make them equal. I...
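A minimal sketch of indexing a vocabulary as a metric space under the edit distance ed() defined above. The index shown is a BK-tree (Burkhard-Keller tree), one standard metric index for discrete distances; it is an illustrative choice and not necessarily the structure evaluated in the paper. Each child of a node is labeled with its exact edit distance to that node's word, so a query with error level k only descends into labels within [d - k, d + k]. `BKTree` and the sample vocabulary are hypothetical.

```python
def edit_distance(a, b):
    """ed(a, b): insertions, deletions and replacements each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # replace, or match for free
        prev = cur
    return prev[-1]

class BKTree:
    def __init__(self, words):
        it = iter(words)               # assumes a non-empty vocabulary
        self.root = (next(it), {})     # node = (word, {distance: child})
        for w in it:
            self.add(w)

    def add(self, word):
        node_word, children = self.root
        while True:
            d = edit_distance(word, node_word)
            if d == 0:
                return                 # word already indexed
            if d not in children:
                children[d] = (word, {})
                return
            node_word, children = children[d]

    def search(self, query, k):
        out, stack = [], [self.root]
        while stack:
            word, children = stack.pop()
            d = edit_distance(query, word)
            if d <= k:
                out.append(word)
            # Triangle inequality: a subtree labeled e can match only if |d - e| <= k.
            for e, child in children.items():
                if d - k <= e <= d + k:
                    stack.append(child)
        return out

vocab = ["search", "searching", "reach", "march", "starch", "sketch", "metric",
         "matrix", "metrics", "space", "spice", "spaces", "pace", "index", "indices"]
tree = BKTree(vocab)
matches = tree.search("serch", 2)
```

A low error level k keeps the interval [d - k, d + k] narrow, which is consistent with the abstract's observation that the gains appear when the tolerated error is low.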
Efficient Image Retrieval With Multiple Distance Measures
, 1996
Cited by 23 (2 self)
Introduction
Consider the following simple model of an image database system: the user presents the system with a query image and asks for all images in the database that are "similar" to the query. The system then uses a predefined distance measure to compare the query image to each image in the database. It then returns the images with the smallest computed distance to the query image. Some distance measures are computationally expensive to calculate. This cost can be prohibitive if the database of images is very large and the system must compare the query image to each image in the database. Thus, it is desirable to reduce the total number of distance calculations performed for each query. For certain distance measures and data sets, indexing or clustering schemes can be used to reduce the number of direct comparisons. There are also schemes in the literature based on the triangle inequality which can reduce the number of distance measure calculation...
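A minimal sketch of how a triangle-inequality scheme can cut the number of comparisons when the system returns the images with the smallest distance. Assumptions: a single precomputed "key" object, 2-D points standing in for image features, and the hypothetical function `knn_with_key`. Candidates are visited in increasing order of the bound |d(q,key) - d(x,key)|, and the scan stops once that bound reaches the current k-th best distance.

```python
import heapq
import math
import random

def knn_with_key(query, k, db, key, key_dists, dist):
    """Return the k items of db closest to query and the number of d(q, x) calls.
    key_dists[i] = dist(db[i], key) is precomputed once for the whole database."""
    dqk = dist(query, key)
    order = sorted(range(len(db)), key=lambda i: abs(dqk - key_dists[i]))
    best = []  # max-heap of the k best so far, stored as (-distance, index)
    computed = 0
    for i in order:
        lower = abs(dqk - key_dists[i])  # lower bound on dist(query, db[i])
        if len(best) == k and lower >= -best[0][0]:
            break  # every remaining item is at least as far as the k-th best
        d = dist(query, db[i])
        computed += 1
        if len(best) < k:
            heapq.heappush(best, (-d, i))
        elif d < -best[0][0]:
            heapq.heapreplace(best, (-d, i))
    ranked = sorted((-nd, i) for nd, i in best)
    return [db[i] for _, i in ranked], computed

random.seed(7)
db = [(random.random(), random.random()) for _ in range(1000)]
key = (0.0, 0.0)  # hypothetical key image
key_dists = [math.dist(x, key) for x in db]
q = (0.6, 0.3)
top, computed = knn_with_key(q, 5, db, key, key_dists, math.dist)
```

The returned `computed` count shows how many direct comparisons were actually needed; with several keys the bound would be the maximum over keys, at the cost of more precomputed distances.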