Results 1 - 10 of 196
Finding Nearest Neighbors in Growth-restricted Metrics
In 34th Annual ACM Symposium on the Theory of Computing, 2002
Cited by 123 (0 self)
Most research on nearest neighbor algorithms in the literature has been focused on the Euclidean case. In many practical search problems, however, the underlying metric is non-Euclidean. Nearest neighbor algorithms for general metric spaces are quite weak, which motivates a search for other classes of metric spaces that can be tractably searched.
Index-driven similarity search in metric spaces
ACM Transactions on Database Systems, 2003
Cited by 118 (6 self)
Similarity search is a very important operation in multimedia databases and other database applications involving complex objects, and involves finding objects in a data set S similar to a query object q, based on some similarity measure. In this article, we focus on methods for similarity search that make the general assumption that similarity is represented with a distance metric d. Existing methods for handling similarity search in this setting typically fall into one of two classes. The first directly indexes the objects based on distances (distance-based indexing), while the second is based on mapping to a vector space (mapping-based approach). The main part of this article is dedicated to a survey of distance-based indexing methods, but we also briefly outline how search occurs in mapping-based methods. We also present a general framework for performing search based on distances, and present algorithms for common types of queries that operate on an arbitrary “search hierarchy.” These algorithms can be applied on each of the methods presented, provided a suitable search hierarchy is defined.
Nearest-neighbor searching and metric space dimensions
In Nearest-Neighbor Methods for Learning and Vision: Theory and Practice, 2006
Cited by 63 (0 self)
Given a set S of n sites (points), and a distance measure d, the nearest neighbor searching problem is to build a data structure so that given a query point q, the site nearest to q can be found quickly. This paper gives a data structure for this problem; the data structure is built using the distance function as a “black box”. The structure is able to speed up nearest neighbor searching in a variety of settings, for example: points in low-dimensional or structured Euclidean space, strings under Hamming and edit distance, and bit vector data from an OCR application. The data structures are observed to need linear space, with a modest constant factor. The preprocessing time needed per site is observed to match the query time. The data structure can be viewed as an application of a “kd-tree” approach in the metric space setting, using Voronoi regions of a subset in place of axis-aligned boxes.
Searching in Metric Spaces by Spatial Approximation
1999
Cited by 62 (20 self)
We propose a new data structure to search in metric spaces. A metric space is formed by a collection of objects and a distance function defined among them, which satisfies the triangle inequality. The goal is, given a set of objects and a query, retrieve those objects close enough to the query. The complexity measure is the number of distances computed to achieve this goal. Our data structure, called sa-tree ("spatial approximation tree"), is based on approaching spatially the searched objects, that is, getting closer and closer to them, rather than the classical divide-and-conquer approach of other data structures. We analyze our method and show that the number of distance evaluations to search among n objects is sublinear. We show experimentally that the sa-tree is the best existing technique when the metric space is hard to search or the query has low selectivity. These are the most important unsolved cases in real applications. As a practical advantage, our data structure is one of the few that do not need to tune parameters, which makes it appealing for use by non-experts.
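The cost model in this abstract (a range query whose cost is the number of distance evaluations) can be illustrated with the baseline that any metric index tries to beat. The following is a minimal sketch of a linear scan, not the sa-tree itself; the names `CountingMetric` and `range_query` are hypothetical, and Hamming distance on strings is assumed purely as an example metric.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


class CountingMetric:
    """Wraps a distance function and counts how often it is evaluated."""

    def __init__(self, dist: Callable[[T, T], float]) -> None:
        self._dist = dist
        self.calls = 0

    def __call__(self, a: T, b: T) -> float:
        self.calls += 1
        return self._dist(a, b)


def hamming(a: str, b: str) -> int:
    """Hamming distance between equal-length strings (example metric)."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def range_query(objects: Sequence[T], query: T, radius: float,
                dist: Callable[[T, T], float]) -> List[T]:
    """Baseline: return every object within `radius` of `query` by linear scan."""
    return [o for o in objects if dist(query, o) <= radius]


# Illustrative data only; a linear scan always costs one evaluation per object.
words = ["karolin", "kathrin", "kerstin", "karolyn", "martina"]
metric = CountingMetric(hamming)
hits = range_query(words, "karolin", 1, metric)
```

After the call, `metric.calls` equals the number of stored objects; an index is worthwhile to the extent that it answers the same query with fewer evaluations.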
Pivot Selection Techniques for Proximity Searching in Metric Spaces
2001
Cited by 47 (6 self)
With few exceptions, proximity search algorithms in metric spaces based on the use of pivots select them at random among the objects of the metric space. However, it is well known that the way in which the pivots are selected can drastically affect the performance of the algorithm. Between two sets of pivots of the same size, better chosen pivots can largely reduce the search time. Alternatively, a better chosen small set of pivots (requiring much less space) can yield the same efficiency as a larger, randomly chosen, set. We propose an efficiency measure to compare two pivot sets, combined with an optimization technique that allows us to select good sets of pivots. We obtain abundant empirical evidence showing that our technique is effective, and it is the first that we are aware of in producing consistently good results in a wide variety of cases and in being based on a formal theory. We also show that good pivots are outliers, but that selecting outliers does not ensure that good pivots are selected.
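For readers unfamiliar with how pivots are used at query time, the following is a minimal sketch of generic pivot-based filtering with the triangle inequality. It does not implement the selection technique or the efficiency measure proposed in the paper; the function names are hypothetical, and the Euclidean grid data and the two corner pivots are assumptions chosen only for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Dist = Callable[[Point, Point], float]


def build_pivot_table(objects: Sequence[Point], pivots: Sequence[Point],
                      dist: Dist) -> List[List[float]]:
    """Precompute d(o, p) for every object o and pivot p (done once, offline)."""
    return [[dist(o, p) for p in pivots] for o in objects]


def pivot_range_query(objects: Sequence[Point], pivots: Sequence[Point],
                      table: List[List[float]], query: Point, radius: float,
                      dist: Dist) -> Tuple[List[Point], int]:
    """Return (objects within radius, distance evaluations spent at query time)."""
    q_to_p = [dist(query, p) for p in pivots]
    evaluations = len(pivots)
    results: List[Point] = []
    for o, row in zip(objects, table):
        # Triangle inequality: |d(q, p) - d(o, p)| <= d(q, o) for every pivot p,
        # so a lower bound above the radius rules o out without computing d(q, o).
        lower_bound = max(abs(qp - op) for qp, op in zip(q_to_p, row))
        if lower_bound > radius:
            continue
        evaluations += 1
        if dist(query, o) <= radius:
            results.append(o)
    return results, evaluations


# Illustrative data: a 10 x 10 integer grid with two hand-picked corner pivots.
objects = [(float(x), float(y)) for x in range(10) for y in range(10)]
pivots = [(0.0, 0.0), (9.0, 0.0)]
table = build_pivot_table(objects, pivots, math.dist)
results, evaluations = pivot_range_query(
    objects, pivots, table, (5.0, 5.0), 1.0, math.dist)
```

The number of objects that survive the filter depends on which pivots are chosen, which is the quantity a pivot selection method tries to reduce.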
Feature-based similarity search in 3D object databases
ACM Computing Surveys, 2005
Cited by 45 (10 self)
The development of effective content-based multimedia search systems is an important research issue due to the growing amount of digital audio-visual information. In the case of images and video, the growth of digital data has been observed since the introduction of 2D capture devices. A similar development is expected for 3D data as
Intrinsic Dimension Estimation Using Packing Numbers
2003
Cited by 43 (0 self)
We propose a new algorithm to estimate the intrinsic dimension of data sets. The method is based on geometric properties of the data and requires neither parametric assumptions on the data generating model nor input parameters to set. The method is compared to a similar, widely-used algorithm from the same family of geometric techniques. Experiments show that our method is more robust in terms of the data generating distribution and more reliable in the presence of noise.
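The abstract does not spell out the estimator, so the following is only a sketch under stated assumptions: the packing number M(r) is approximated by a greedy r-separated subset, and the dimension is read off as the negated slope of log M(r) against log r between two scales r1 < r2. The function names, the greedy approximation, and the test data (points on a line embedded in three dimensions) are assumptions for illustration, not details taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def greedy_packing_number(points: Sequence[Point], r: float) -> int:
    """Size of a greedily built subset whose points are pairwise at least r apart.

    This is a lower bound on the true packing number and depends on point order.
    """
    centers: List[Point] = []
    for p in points:
        if all(math.dist(p, c) >= r for c in centers):
            centers.append(p)
    return len(centers)


def packing_dimension(points: Sequence[Point], r1: float, r2: float) -> float:
    """Scale-dependent dimension estimate from packing numbers at scales r1 < r2."""
    m1 = greedy_packing_number(points, r1)
    m2 = greedy_packing_number(points, r2)
    return -(math.log(m2) - math.log(m1)) / (math.log(r2) - math.log(r1))


# Illustrative data: 1000 evenly spaced points on a line embedded in 3-D space.
line = [(float(i), 0.0, 0.0) for i in range(1000)]
estimate = packing_dimension(line, 10.0, 50.0)
```

On this data the estimate comes out close to 1, the dimension of the line, even though the points have three coordinates.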
A Review of Algorithms for Audio Fingerprinting
In Workshop on Multimedia Signal Processing, 2002
Cited by 42 (2 self)
An audio fingerprint is a content-based compact signature that summarizes an audio recording. Audio Fingerprinting technologies have recently attracted attention since they allow the monitoring of audio independently of its format and without the need of meta-data or watermark embedding. The different approaches to fingerprinting are usually described with different rationales and terminology depending on the background: Pattern matching, Multimedia (Music) Information Retrieval or Cryptography (Robust Hashing). In this paper, we review different techniques mapping functional parts to blocks of a unified framework.
On the Marriage of L_p-norms and Edit Distance
In VLDB, 2004
Cited by 36 (1 self)
Existing studies on time series are based on two categories of distance functions. The first category consists of the Lp-norms. They are metric distance functions but cannot support local time shifting. The second category consists of distance functions which are capable of handling local time shifting but are nonmetric. The first
XenoSearch: Distributed Resource Discovery in the XenoServer Open Platform
In Proceedings of HPDC, 2003
Cited by 33 (3 self)
We describe the XenoSearch system for performing expressive resource discovery searches in a distributed environment. We represent server meta-data, such as their locations and facilities, as points in a multi-dimensional space and then express queries as predicates over these points. Each XenoSearch node holds a portion of this space and the key goal of XenoSearch is to direct queries to those nodes containing the meta-data of matching XenoServers. Communication between these XenoSearch nodes is based on the self-organizing Pastry peer-to-peer routing substrate. Our initial performance evaluation on a wide-area prototype shows that queries are only a factor of 3-5 times longer than basic Pastry routing, while supporting multi-dimensional searches of arbitrary shapes.

