Results 1-10 of 140
An Optimal Algorithm for Approximate Nearest Neighbor Searching in Fixed Dimensions
 ACM-SIAM Symposium on Discrete Algorithms
, 1994
Cited by 786 (31 self)
Consider a set $S$ of $n$ data points in real $d$-dimensional space, $\mathbb{R}^d$, where distances are measured using any Minkowski metric. In nearest neighbor searching we preprocess $S$ into a data structure, so that given any query point $q \in \mathbb{R}^d$, the closest point of $S$ to $q$ can be reported quickly. Given any positive real $\epsilon$, a data point $p$ is a $(1 + \epsilon)$-approximate nearest neighbor of $q$ if its distance from $q$ is within a factor of $(1 + \epsilon)$ of the distance to the true nearest neighbor. We show that it is possible to preprocess a set of $n$ points in $\mathbb{R}^d$ in $O(dn \log n)$ time and $O(dn)$ space, so that given a query point $q \in \mathbb{R}^d$, and $\epsilon > 0$, a $(1 + \epsilon)$-approximate nearest neighbor of $q$ can be computed in $O(c_{d,\epsilon} \log n)$ time, where $c_{d,\epsilon} \le d \lceil 1 + 6d/\epsilon \rceil^d$ is a factor depending only on dimension and $\epsilon$. In general, we show that given an integer $k \ge 1$, $(1 + \epsilon)$-approximations to the $k$ nearest neighbors of $q$ can be computed in additional $O(kd \log n)$ time.
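The definition in this abstract can be made concrete with a small brute-force check. The sketch below is only an illustration, not the data structure the paper builds: the function names `minkowski`, `is_approx_nearest`, and `query_constant_bound` are hypothetical, and the last one simply evaluates the stated bound on the query-time factor.

```python
import math

def minkowski(p, q, r=2.0):
    """Minkowski (L_r) distance between two points of equal dimension."""
    if math.isinf(r):
        return max(abs(a - b) for a, b in zip(p, q))
    return sum(abs(a - b) ** r for a, b in zip(p, q)) ** (1.0 / r)

def is_approx_nearest(candidate, query, points, eps, r=2.0):
    """True if candidate lies within (1 + eps) times the true nearest-neighbor distance.

    Brute force over all points: this checks the definition only and does not
    reproduce the logarithmic query time claimed in the abstract.
    """
    true_nn_dist = min(minkowski(p, query, r) for p in points)
    return minkowski(candidate, query, r) <= (1.0 + eps) * true_nn_dist

def query_constant_bound(d, eps):
    """Evaluate the bound d * ceil(1 + 6d/eps)^d on the factor c_{d,eps}."""
    return d * math.ceil(1 + 6 * d / eps) ** d
```

Evaluating `query_constant_bound` for a few values of `d` shows how quickly the factor grows with dimension, which is why the result is stated for fixed dimensions.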
Multidimensional Access Methods
, 1998
Cited by 561 (3 self)
Search operations in databases require special support at the physical level. This is true for conventional databases as well as spatial databases, where typical search operations include the point query (find all objects that contain a given search point) and the region query (find all objects that overlap a given search region).
A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces
, 1998
Cited by 487 (12 self)
For similarity search in high-dimensional vector spaces (or `HDVSs'), researchers have proposed a number of new methods (or adaptations of existing methods) based, in the main, on data-space partitioning. However, the performance of these methods generally degrades as dimensionality increases. Although this phenomenon, known as the `dimensional curse', is well known, little or no quantitative analysis of the phenomenon is available. In this paper, we provide a detailed analysis of partitioning and clustering techniques for similarity search in HDVSs. We show formally that these methods exhibit linear complexity at high dimensionality, and that existing methods are outperformed on average by a simple sequential scan if the number of dimensions exceeds around 10. Consequently, we come up with an alternative organization based on approximations to make the unavoidable sequential scan as fast as possible. We describe a simple vector approximation scheme, called VA-file, and report on an ...
Similarity Indexing with the SS-tree
 In Proceedings of the 12th International Conference on Data Engineering
, 1996
"... jain0ece.ucsd.edu ..."
When Is "Nearest Neighbor" Meaningful?
 In Int. Conf. on Database Theory
, 1999
Cited by 292 (1 self)
We explore the effect of dimensionality on the "nearest neighbor" problem. We show that under a broad set of conditions (much broader than independent and identically distributed dimensions), as dimensionality increases, the distance to the nearest data point approaches the distance to the farthest data point. To provide a practical perspective, we present empirical results on both real and synthetic data sets that demonstrate that this effect can occur for as few as 10-15 dimensions. These results should not be interpreted to mean that high-dimensional indexing is never meaningful; we illustrate this point by identifying some high-dimensional workloads for which this effect does not occur. However, our results do emphasize that the methodology used almost universally in the database literature to evaluate high-dimensional indexing techniques is flawed, and should be modified. In particular, most such techniques proposed in the literature are not evaluated versus simple...
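The effect described in this abstract can be observed with a small experiment. The sketch below assumes points drawn independently and uniformly from the unit cube, which is only one of the settings the abstract covers. The function name `relative_contrast` and its parameters are hypothetical. It measures how much farther the farthest point is than the nearest one, relative to the nearest distance.

```python
import math
import random

def relative_contrast(dim, n_points=500, seed=0):
    """Return (d_max - d_min) / d_min for one random query against random points.

    Assumption: query and data are i.i.d. uniform on [0, 1]^dim. A value near
    zero means the nearest and farthest points are almost equally far away.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min

if __name__ == "__main__":
    for dim in (2, 10, 50, 200):
        print(dim, round(relative_contrast(dim), 3))
```

Under this assumption the contrast shrinks as `dim` grows, which is the sense in which "nearest" loses meaning. Workloads with strong structure (for example, clustered data) may behave differently, as the abstract notes.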
On Packing R-trees
 In ACM CIKM
, 1993
Cited by 220 (16 self)
– main idea; file structure – algorithms: insertion/split – deletion – search: range, nn, spatial joins – performance analysis – variations (packed; Hilbert; ...) 15-721 Copyright: C. Faloutsos (2001). Problem: • Given a collection of geometric objects (points, lines, polygons, ...) • organize them on disk, to answer spatial queries (range, nn, etc.) (Who cares?)
Generalized Search Trees for Database Systems
 In Proc. 21st International Conference on VLDB
, 1995
Cited by 205 (19 self)
This paper introduces the Generalized Search Tree (GiST), an index structure supporting an extensible set of queries and data types. The GiST allows new data types to be indexed in a manner supporting queries natural to the types; this is in contrast to previous work on tree extensibility which only supported the traditional set of equality and range predicates. In a single data structure, the GiST provides all the basic search tree logic required by a database system, thereby unifying disparate structures such as B+-trees and R-trees in a single piece of code, and opening the application of search trees to general extensibility. To illustrate the flexibility of the GiST, we provide simple method implementations that allow it to behave like a B+-tree, an R-tree, and an RD-tree, a new index for data with set-valued attributes. We also present a preliminary performance analysis of RD-trees, which leads to discussion on the nature of tree indices and how they behave for various datasets.
Dimensionality Reduction for Similarity Searching in Dynamic Databases
, 1998
Cited by 100 (5 self)
Databases are increasingly being used to store multimedia objects such as maps, images, audio and video. Storage and retrieval of these objects is accomplished using multi-dimensional index structures such as R*-trees and SS-trees. As dimensionality increases, query performance in these index structures degrades. This phenomenon, generally referred to as the dimensionality curse, can be circumvented by reducing the dimensionality of the data. Such a reduction is however accompanied by a loss of precision of query results. Current techniques such as QBIC use SVD transform-based dimensionality reduction to ensure high query precision. The drawback of this approach is that SVD is expensive to compute, and therefore not readily applicable to dynamic databases. In this paper, we propose novel techniques for performing SVD-based dimensionality reduction in dynamic databases. When the data distribution changes considerably so as to degrade query precision, we recompute the SVD transform a...
Indexing Large Metric Spaces for Similarity Search Queries
, 1999
Cited by 66 (0 self)
In many database applications, one of the common queries is to find approximate matches to a given query item from a collection of data items. For example, given an image database, one may want to retrieve all images that are similar to a given query image. Distance-based index structures are proposed for applications where the distance computations between objects of the data domain are expensive (such as high-dimensional data), and the distance function used is metric. In this paper, we consider using distance-based index structures for similarity queries on large metric spaces. We elaborate on the approach of using reference points (vantage points) to partition the data space into spherical shell-like regions in a hierarchical manner. We introduce the multi-vantage point tree structure (mvp-tree) that uses more than one vantage point to partition the space into spherical cuts at each level. In answering similarity-based queries, the mvp-tree also utilizes the precomputed (at con...
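The vantage-point idea in this abstract can be sketched for a single level and a single vantage point. This is not the mvp-tree itself, which uses several vantage points per level and a hierarchy. The names `build_shells` and `range_query` are hypothetical, and Euclidean distance stands in for an arbitrary metric. Shells are pruned with the triangle inequality, so distances are computed only for points in shells that could contain a match.

```python
import math

def build_shells(points, vantage, n_shells=2):
    """Partition points into spherical shells around one vantage point.

    Returns a list of (lo, hi, members), where lo and hi are the smallest and
    largest distance from the vantage point among the shell's members.
    """
    if not points:
        return []
    ranked = sorted(((math.dist(vantage, p), p) for p in points), key=lambda t: t[0])
    size = math.ceil(len(ranked) / n_shells)
    shells = []
    for start in range(0, len(ranked), size):
        chunk = ranked[start:start + size]
        shells.append((chunk[0][0], chunk[-1][0], [p for _, p in chunk]))
    return shells

def range_query(shells, vantage, query, radius):
    """Return all indexed points within `radius` of `query`."""
    d_qv = math.dist(query, vantage)
    hits = []
    for lo, hi, members in shells:
        # Triangle inequality: a match p satisfies |d(q, v) - d(p, v)| <= radius,
        # so a shell entirely outside [d_qv - radius, d_qv + radius] is skipped.
        if hi < d_qv - radius or lo > d_qv + radius:
            continue
        hits.extend(p for p in members if math.dist(query, p) <= radius)
    return hits
```

The pruning test relies only on the metric axioms, which is why the abstract restricts attention to distance functions that are metric.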