Results 1-10 of 217
An Optimal Algorithm for Approximate Nearest Neighbor Searching in Fixed Dimensions
 ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS
, 1994
Cited by 984 (32 self)
Consider a set $S$ of $n$ data points in real $d$-dimensional space, $\mathbb{R}^d$, where distances are measured using any Minkowski metric. In nearest neighbor searching we preprocess $S$ into a data structure, so that given any query point $q \in \mathbb{R}^d$, the closest point of $S$ to $q$ can be reported quickly. Given any positive real $\epsilon$, a data point $p$ is a $(1+\epsilon)$-approximate nearest neighbor of $q$ if its distance from $q$ is within a factor of $(1+\epsilon)$ of the distance to the true nearest neighbor. We show that it is possible to preprocess a set of $n$ points in $\mathbb{R}^d$ in $O(dn \log n)$ time and $O(dn)$ space, so that given a query point $q \in \mathbb{R}^d$ and $\epsilon > 0$, a $(1+\epsilon)$-approximate nearest neighbor of $q$ can be computed in $O(c_{d,\epsilon} \log n)$ time, where $c_{d,\epsilon} \le d \lceil 1 + 6d/\epsilon \rceil^d$ is a factor depending only on dimension and $\epsilon$. In general, we show that given an integer $k \ge 1$, $(1+\epsilon)$-approximations to the $k$ nearest neighbors of $q$ can be computed in additional $O(kd \log n)$ time.
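To make the definition in this abstract concrete, the following is a minimal sketch that checks by brute force whether a point is a (1 + eps)-approximate nearest neighbor, and evaluates the stated upper bound d * ceil(1 + 6d/eps)^d on the query-time factor. It is not the paper's data structure; the function names and the sample points are hypothetical.

```python
import math

def dist(p, q, m=2):
    """Minkowski L_m distance between two points of equal dimension."""
    return sum(abs(a - b) ** m for a, b in zip(p, q)) ** (1.0 / m)

def is_approx_nn(p, q, points, eps, m=2):
    """True if p is a (1 + eps)-approximate nearest neighbor of q among points.

    Brute-force check of the definition only (hypothetical helper).
    """
    true_nn_dist = min(dist(s, q, m) for s in points)
    return dist(p, q, m) <= (1.0 + eps) * true_nn_dist

def query_constant_bound(d, eps):
    """Upper bound d * ceil(1 + 6d/eps)^d on the factor c_{d,eps}."""
    return d * math.ceil(1 + 6 * d / eps) ** d

# Hypothetical data: the true nearest neighbor of q is (0, 0) at distance 1.5.
S = [(0.0, 0.0), (3.0, 0.0), (0.0, 3.3)]
q = (0.0, 1.5)

if __name__ == "__main__":
    print(is_approx_nn((0.0, 3.3), q, S, eps=0.25))  # distance 1.8 vs. limit 1.875
    print(is_approx_nn((0.0, 3.3), q, S, eps=0.10))  # distance 1.8 vs. limit 1.65
    print(query_constant_bound(2, 1.0))
```

The bound grows exponentially in d, which is consistent with the abstract's restriction to fixed dimensions.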
Multidimensional Access Methods
, 1998
Cited by 686 (3 self)
Search operations in databases require special support at the physical level. This is true for conventional databases as well as spatial databases, where typical search operations include the point query (find all objects that contain a given search point) and the region query (find all objects that overlap a given search region).
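The two search operations named here can be illustrated with a linear scan over axis-aligned rectangles. This is only a sketch of the query semantics, assuming two-dimensional rectangular objects and treating touching boundaries as overlap; the `Rect` type and the sample data are hypothetical, and a real system would use an access method rather than a scan.

```python
from typing import List, NamedTuple

class Rect(NamedTuple):
    """Axis-aligned rectangle standing in for a spatial object (hypothetical type)."""
    name: str
    xmin: float
    ymin: float
    xmax: float
    ymax: float

def point_query(objects: List[Rect], x: float, y: float) -> List[str]:
    """Names of all objects that contain the search point (x, y)."""
    return [o.name for o in objects
            if o.xmin <= x <= o.xmax and o.ymin <= y <= o.ymax]

def region_query(objects: List[Rect], region: Rect) -> List[str]:
    """Names of all objects that overlap the search region."""
    return [o.name for o in objects
            if o.xmin <= region.xmax and region.xmin <= o.xmax
            and o.ymin <= region.ymax and region.ymin <= o.ymax]

# Hypothetical sample data.
parcels = [Rect("A", 0, 0, 2, 2), Rect("B", 1, 1, 4, 3), Rect("C", 5, 5, 6, 6)]

if __name__ == "__main__":
    print(point_query(parcels, 1.5, 1.5))
    print(region_query(parcels, Rect("search", 3, 2, 5, 5)))
```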
A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces
 Proceedings of the 24th VLDB International Conference on Very Large Data Bases
, 1998
When Is "Nearest Neighbor" Meaningful?
 In Int. Conf. on Database Theory
, 1999
Cited by 408 (2 self)
We explore the effect of dimensionality on the "nearest neighbor" problem. We show that under a broad set of conditions (much broader than independent and identically distributed dimensions), as dimensionality increases, the distance to the nearest data point approaches the distance to the farthest data point. To provide a practical perspective, we present empirical results on both real and synthetic data sets that demonstrate that this effect can occur for as few as 10-15 dimensions. These results should not be interpreted to mean that high-dimensional indexing is never meaningful; we illustrate this point by identifying some high-dimensional workloads for which this effect does not occur. However, our results do emphasize that the methodology used almost universally in the database literature to evaluate high-dimensional indexing techniques is flawed, and should be modified. In particular, most such techniques proposed in the literature are not evaluated versus simple...
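The effect described in this abstract can be observed with a small experiment. The sketch below assumes uniformly distributed, independent coordinates in the unit cube (a narrower setting than the conditions the abstract mentions) and reports the ratio of the farthest to the nearest distance from a random query point; the function name and parameters are hypothetical.

```python
import math
import random

def contrast(dim, n_points=200, seed=0):
    """Ratio of farthest to nearest distance from one random query point.

    Assumes i.i.d. uniform coordinates in [0, 1); a ratio close to 1 means
    the nearest and farthest points are almost equally far away.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance, Python 3.8+
    return max(dists) / min(dists)

if __name__ == "__main__":
    for dim in (2, 10, 100):
        print(dim, round(contrast(dim), 2))
```

Under these assumptions the ratio shrinks toward 1 as the dimension grows, which is the loss of contrast the abstract refers to.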
Similarity Indexing with the SS-tree
 In Proceedings of the 12th International Conference on Data Engineering
, 1996
Generalized Search Trees for Database Systems
 IN PROC. 21ST INTERNATIONAL CONFERENCE ON VLDB
, 1995
Cited by 237 (18 self)
This paper introduces the Generalized Search Tree (GiST), an index structure supporting an extensible set of queries and data types. The GiST allows new data types to be indexed in a manner supporting queries natural to the types; this is in contrast to previous work on tree extensibility which only supported the traditional set of equality and range predicates. In a single data structure, the GiST provides all the basic search tree logic required by a database system, thereby unifying disparate structures such as B+-trees and R-trees in a single piece of code, and opening the application of search trees to general extensibility. To illustrate the flexibility of the GiST, we provide simple method implementations that allow it to behave like a B+-tree, an R-tree, and an RD-tree, a new index for data with set-valued attributes. We also present a preliminary performance analysis of RD-trees, which leads to discussion on the nature of tree indices and how they behave for various datasets.
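The idea of one piece of search-tree code parameterized by type-specific methods can be sketched as follows. This is a heavily simplified illustration, not the paper's interface: it only supports bulk construction and search, and the names `build`, `search`, `union`, and `consistent` are assumptions chosen for this example. Insertion and node splitting are omitted.

```python
class Node:
    """Tree node holding (key, child-or-value) entries."""
    def __init__(self, entries, leaf):
        self.entries = entries
        self.leaf = leaf

def build(items, union, fanout=2):
    """Pack a non-empty list of (key, value) items bottom-up into a tree.

    `union` combines the keys of a node into one bounding key for its parent.
    """
    level = [Node(items[i:i + fanout], True) for i in range(0, len(items), fanout)]
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), fanout):
            group = level[i:i + fanout]
            entries = [(union([k for k, _ in n.entries]), n) for n in group]
            parents.append(Node(entries, False))
        level = parents
    return level[0]

def search(node, query, consistent):
    """Collect values whose keys are consistent with the query."""
    hits = []
    for key, ref in node.entries:
        if consistent(key, query):
            if node.leaf:
                hits.append(ref)
            else:
                hits.extend(search(ref, query, consistent))
    return hits

# Interval keys (lo, hi): range search over ordered values.
def interval_union(keys):
    return (min(k[0] for k in keys), max(k[1] for k in keys))

def interval_overlap(key, query):
    return key[0] <= query[1] and query[0] <= key[1]

# Set keys: overlap search over set-valued attributes.
def set_union(keys):
    return frozenset().union(*keys)

def set_overlap(key, query):
    return bool(key & query)

if __name__ == "__main__":
    rows = [((k, k), f"row{k}") for k in (1, 4, 7, 9, 12)]
    print(search(build(rows, interval_union), (4, 9), interval_overlap))
    tags = [(frozenset("ab"), "r1"), (frozenset("c"), "r2"), (frozenset("bd"), "r3")]
    print(search(build(tags, set_union), frozenset("b"), set_overlap))
```

The same `build` and `search` functions serve both key types; only the two small key methods change.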
An Introduction to Spatial Database Systems
 THE VLDB JOURNAL
, 1994
Cited by 214 (9 self)
We propose a definition of a spatial database system as a database system that offers spatial data types in its data model and query language, and supports
Dimensionality Reduction for Similarity Searching in Dynamic Databases
, 1998
Cited by 112 (6 self)
Databases are increasingly being used to store multimedia objects such as maps, images, audio and video. Storage and retrieval of these objects is accomplished using multidimensional index structures such as R*-trees and SS-trees. As dimensionality increases, query performance in these index structures degrades. This phenomenon, generally referred to as the dimensionality curse, can be circumvented by reducing the dimensionality of the data. Such a reduction is, however, accompanied by a loss of precision of query results. Current techniques such as QBIC use SVD transform-based dimensionality reduction to ensure high query precision. The drawback of this approach is that SVD is expensive to compute, and therefore not readily applicable to dynamic databases. In this paper, we propose novel techniques for performing SVD-based dimensionality reduction in dynamic databases. When the data distribution changes considerably so as to degrade query precision, we recompute the SVD transform a...
iDistance: An Adaptive B+-tree Based Indexing Method for Nearest Neighbor Search
, 2005
Cited by 93 (10 self)
In this article, we present an efficient B+-tree based indexing method, called iDistance, for K-nearest neighbor (KNN) search in a high-dimensional metric space. iDistance partitions the data based on a space- or data-partitioning strategy, and selects a reference point for each partition. The data points in each partition are transformed into a single dimensional value based on their similarity with respect to the reference point. This allows the points to be indexed using a B+-tree structure and KNN search to be performed using one-dimensional range search. The choice of partition and reference points adapts the index structure to the data distribution. We conducted extensive experiments to evaluate the iDistance technique, and report results demonstrating its effectiveness. We also present a cost model for iDistance KNN search, which can be exploited in query optimization.
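The mapping described in this abstract can be sketched as follows. This is an illustration under several assumptions, not the article's implementation: a sorted Python list stands in for the B+-tree, each point is assigned to its closest reference point, the one-dimensional key is taken to be `i * c + dist(p, ref_i)` with a constant `c` assumed larger than any distance in the data, and KNN search simply doubles the search radius until enough points are found. The class name and parameters are hypothetical.

```python
import bisect
import math

class IDistanceSketch:
    """Points keyed by (partition index * c + distance to the reference point)."""

    def __init__(self, points, refs, c=1000.0):
        self.refs, self.c = refs, c
        self.maxd = [0.0] * len(refs)   # radius of each partition
        entries = []
        for p in points:
            # Assign p to the partition of its closest reference point.
            i = min(range(len(refs)), key=lambda j: math.dist(p, refs[j]))
            d = math.dist(p, refs[i])
            self.maxd[i] = max(self.maxd[i], d)
            entries.append((i * c + d, p))
        entries.sort()                   # sorted list as a stand-in for a B+-tree
        self.keys = [k for k, _ in entries]
        self.points = [p for _, p in entries]

    def range_search(self, q, r):
        """All points within distance r of q, via one key range per partition."""
        out = []
        for i, ref in enumerate(self.refs):
            dq = math.dist(q, ref)
            if dq - r > self.maxd[i]:
                continue                 # query sphere misses this partition
            lo = i * self.c + max(0.0, dq - r)
            hi = i * self.c + min(self.maxd[i], dq + r)
            a = bisect.bisect_left(self.keys, lo)
            b = bisect.bisect_right(self.keys, hi)
            # Triangle inequality gives candidates; filter by true distance.
            out.extend(p for p in self.points[a:b] if math.dist(p, q) <= r)
        return out

    def knn(self, q, k, r0=0.1):
        """K nearest neighbors by repeatedly enlarging the search radius."""
        r = r0
        while True:
            hits = self.range_search(q, r)
            if len(hits) >= k or len(hits) == len(self.points):
                return sorted(hits, key=lambda p: math.dist(p, q))[:k]
            r *= 2

if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
    index = IDistanceSketch(pts, refs=[(0, 0), (5, 5)])
    print(index.knn((0.9, 0.1), 2))
```

Because every point within the final radius is returned by the range search, the k closest hits are the true k nearest neighbors once at least k hits are found.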