Results 1-10 of 37
An Optimal Algorithm for Approximate Nearest Neighbor Searching in Fixed Dimensions
 ACM-SIAM Symposium on Discrete Algorithms
, 1994
Cited by 786 (31 self)
Consider a set S of n data points in real d-dimensional space, R^d, where distances are measured using any Minkowski metric. In nearest neighbor searching we preprocess S into a data structure, so that given any query point q ∈ R^d, the closest point of S to q can be reported quickly. Given any positive real ε, a data point p is a (1 + ε)-approximate nearest neighbor of q if its distance from q is within a factor of (1 + ε) of the distance to the true nearest neighbor. We show that it is possible to preprocess a set of n points in R^d in O(dn log n) time and O(dn) space, so that given a query point q ∈ R^d and ε > 0, a (1 + ε)-approximate nearest neighbor of q can be computed in O(c_{d,ε} log n) time, where c_{d,ε} ≤ d⌈1 + 6d/ε⌉^d is a factor depending only on dimension and ε. In general, we show that given an integer k ≥ 1, (1 + ε)-approximations to the k nearest neighbors of q can be computed in additional O(kd log n) time.
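The definition of a (1 + ε)-approximate nearest neighbor can be made concrete with a brute-force check. The sketch below is only an illustration of the definition, not the data structure from the paper; the function names minkowski and is_approx_nn are hypothetical.

```python
import math

def minkowski(a, b, p=2.0):
    """Minkowski (L_p) distance between two equal-length points; p=inf gives the max metric."""
    diffs = [abs(x - y) for x, y in zip(a, b)]
    if math.isinf(p):
        return max(diffs)
    return sum(t ** p for t in diffs) ** (1.0 / p)

def is_approx_nn(candidate, query, points, eps, p=2.0):
    """True if candidate lies within a factor (1 + eps) of the true nearest-neighbor distance."""
    best = min(minkowski(s, query, p) for s in points)
    return minkowski(candidate, query, p) <= (1.0 + eps) * best

# Made-up data: distances from the query are 3.0, 3.3 and 10.0.
points = [(3.0, 0.0), (0.0, 3.3), (10.0, 0.0)]
query = (0.0, 0.0)
print(is_approx_nn((0.0, 3.3), query, points, eps=0.2))
print(is_approx_nn((0.0, 3.3), query, points, eps=0.05))
```

The check scans all n points, so it is useful only for validating the output of a faster structure on small inputs.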
Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality
, 1998
Cited by 715 (33 self)
The nearest neighbor problem is the following: Given a set of n points P = {p_1, ..., p_n} in some metric space X, preprocess P so as to efficiently answer queries which require finding the point in P closest to a query point q ∈ X. We focus on the particularly interesting case of the d-dimensional Euclidean space where X = R^d under some l_p norm. Despite decades of effort, the current solutions are far from satisfactory; in fact, for large d, in theory or in practice, they provide little improvement over the brute-force algorithm which compares the query point to each data point. Of late, there has been some interest in the approximate nearest neighbors problem, which is: Find a point p ∈ P that is an ε-approximate nearest neighbor of the query q in that for all p' ∈ P, d(p, q) ≤ (1 + ε) d(p', q). We present two algorithmic results for the approximate version that significantly improve the known bounds: (a) preprocessing cost polynomial in n and d, and a trul...
When Is "Nearest Neighbor" Meaningful?
 In Int. Conf. on Database Theory
, 1999
Cited by 292 (1 self)
We explore the effect of dimensionality on the "nearest neighbor" problem. We show that under a broad set of conditions (much broader than independent and identically distributed dimensions), as dimensionality increases, the distance to the nearest data point approaches the distance to the farthest data point. To provide a practical perspective, we present empirical results on both real and synthetic data sets that demonstrate that this effect can occur for as few as 10-15 dimensions. These results should not be interpreted to mean that high-dimensional indexing is never meaningful; we illustrate this point by identifying some high-dimensional workloads for which this effect does not occur. However, our results do emphasize that the methodology used almost universally in the database literature to evaluate high-dimensional indexing techniques is flawed, and should be modified. In particular, most such techniques proposed in the literature are not evaluated versus simple...
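The effect described in this abstract can be observed with a small simulation. The sketch below assumes points drawn uniformly and independently from the unit cube, which is only one of the conditions the abstract covers, and it is not the experiment from the paper; the function name contrast is hypothetical.

```python
import math
import random

def contrast(dim, n_points=500, seed=0):
    """Ratio of farthest to nearest Euclidean distance from one random query
    to n_points points drawn uniformly from the unit cube in the given dimension."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    return max(dists) / min(dists)

# The ratio shrinks toward 1 as the dimension grows.
for dim in (2, 10, 100, 500):
    print(dim, round(contrast(dim), 2))
```

A ratio close to 1 means the nearest and farthest points are almost equally far from the query, which is the situation in which a nearest neighbor query carries little information.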
Distance Browsing in Spatial Databases
, 1999
Cited by 291 (19 self)
Two different techniques of browsing through a collection of spatial objects stored in an R-tree spatial data structure on the basis of their distances from an arbitrary spatial query object are compared. The conventional approach is one that makes use of a k-nearest neighbor algorithm where k is known prior to the invocation of the algorithm. Thus if m > k neighbors are needed, the k-nearest neighbor algorithm needs to be reinvoked for m neighbors, thereby possibly performing some redundant computations. The second approach is incremental in the sense that having obtained the k nearest neighbors, the (k+1)st neighbor can be obtained without having to calculate the k+1 nearest neighbors from scratch. The incremental approach finds use when processing complex queries where one of the conditions involves spatial proximity (e.g., the nearest city to Chicago with population greater than a million), in which case a query engine can make use of a pipelined strategy. A general incremental nearest neighbor algorithm is presented that is applicable to a large class of hierarchical spatial data structures. This algorithm is adapted to the R-tree and its performance is compared to an existing k-nearest neighbor algorithm for R-trees [45]. Experiments show that the incremental nearest neighbor algorithm significantly outperforms the k-nearest neighbor algorithm for distance browsing queries in a spatial database that uses the R-tree as a spatial index. Moreover, the incremental nearest neighbor algorithm also usually outperforms the k-nearest neighbor algorithm when applied to the k-nearest neighbor problem for the R-tree, although the improvement is not nearly as large as for distance browsing queries. In fact, we prove informally that, at any step in its execution, the incremental...
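One common way to obtain neighbors incrementally is a best-first traversal driven by a priority queue that holds both tree nodes and points. The sketch below assumes that approach and is not taken from the paper: a simple median-split bounding-box hierarchy stands in for the R-tree, and the names Node, build, box_dist and incremental_nn are hypothetical. The input point set is assumed to be non-empty.

```python
import heapq
import itertools
import math

class Node:
    """A node of a simple bounding-box hierarchy: a leaf holds points, an inner node holds children."""
    def __init__(self, points=None, children=None):
        self.points = points or []
        self.children = children or []
        members = self.points if self.points else [
            corner for c in self.children for corner in (c.lo, c.hi)]
        dims = range(len(members[0]))
        self.lo = tuple(min(m[i] for m in members) for i in dims)
        self.hi = tuple(max(m[i] for m in members) for i in dims)

def build(points, leaf_size=4, depth=0):
    """Split on the median along alternating axes until leaves are small."""
    if len(points) <= leaf_size:
        return Node(points=list(points))
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return Node(children=[build(ordered[:mid], leaf_size, depth + 1),
                          build(ordered[mid:], leaf_size, depth + 1)])

def box_dist(q, node):
    """Minimum Euclidean distance from q to the node's bounding box (0 if q is inside)."""
    return math.sqrt(sum(max(lo - x, 0.0, x - hi) ** 2
                         for x, lo, hi in zip(q, node.lo, node.hi)))

def incremental_nn(root, q):
    """Yield (distance, point) pairs in nondecreasing distance from q, one at a time."""
    tie = itertools.count()  # tie-breaker so the heap never compares Node objects
    heap = [(box_dist(q, root), next(tie), root)]
    while heap:
        d, _, item = heapq.heappop(heap)
        if isinstance(item, Node):
            # Expand the node: its children and points enter the queue with their own keys.
            for child in item.children:
                heapq.heappush(heap, (box_dist(q, child), next(tie), child))
            for p in item.points:
                heapq.heappush(heap, (math.dist(q, p), next(tie), p))
        else:
            yield d, item

# Made-up data: fetch neighbors one at a time without fixing k in advance.
pts = [(1.0, 1.0), (5.0, 5.0), (2.0, 2.0), (9.0, 0.0), (0.5, 3.0), (4.0, 1.0)]
browse = incremental_nn(build(pts, leaf_size=2), (0.0, 0.0))
print(next(browse))
print(next(browse))
```

Because the generator is lazy, a caller can stop at the first result that passes an additional filter, which matches the pipelined use in the Chicago example: no value of k has to be guessed up front.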
Approximate Range Searching
 in Proc. 11th Annu. ACM Sympos. Comput. Geom
, 1995
Cited by 86 (20 self)
The range searching problem is a fundamental problem in computational geometry, with numerous important applications. Most research has focused on solving this problem exactly, but lower bounds show that if linear space is assumed, the problem cannot be solved in polylogarithmic time, except for the case of orthogonal ranges. In this paper we show that if one is willing to allow approximate ranges, then it is possible to do much better. In particular, given a bounded range Q of diameter w and ε > 0, an approximate range query treats the range as a fuzzy object, meaning that points lying within distance εw of the boundary of Q either may or may not be counted. We show that in any fixed dimension d, a set of n points in R^d can be preprocessed in O(n log n) time and O(n) space, such that approximate queries can be answered in O(log n + (1/ε)^d) time. The only assumption we make about ranges is that the intersection of a range and a d-dimensional cube can be answered in const...
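The fuzzy-range semantics can be illustrated by computing which counts an approximate query is allowed to report. The sketch below assumes a Euclidean ball as the range Q, so w is twice the radius; it checks the definition by brute force and is not the data structure from the paper. The function name count_bounds is hypothetical.

```python
import math

def count_bounds(points, center, radius, eps):
    """Return (lo, hi): the smallest and largest counts an approximate ball query may report."""
    slack = eps * (2.0 * radius)      # eps * w, with w the diameter of the ball
    lo = hi = 0
    for p in points:
        d = math.dist(center, p)
        if d < radius - slack:        # well inside the range: must be counted
            lo += 1
            hi += 1
        elif d <= radius + slack:     # within eps * w of the boundary: optional
            hi += 1
    return lo, hi

# Made-up data on the x-axis, queried with a unit ball around the origin.
points = [(0.5, 0.0), (0.95, 0.0), (1.05, 0.0), (2.0, 0.0)]
print(count_bounds(points, (0.0, 0.0), 1.0, eps=0.05))
```

Any reported count between the two bounds is a valid answer under this definition; with eps set to 0 the two bounds coincide and the query is exact.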
Properties of embedding methods for similarity searching in metric spaces
 PAMI
, 2003
Cited by 80 (4 self)
Complex data types, such as images, documents, and DNA sequences, are becoming increasingly important in modern database applications. A typical query in many of these applications seeks to find objects that are similar to some target object, where (dis)similarity is defined by some distance function. Often, the cost of evaluating the distance between two objects is very high. Thus, the number of distance evaluations should be kept to a minimum, while (ideally) maintaining the quality of the result. One way to approach this goal is to embed the data objects in a vector space so that the distances of the embedded objects approximate the actual distances. Thus, queries can be performed (for the most part) on the embedded objects. In this paper, we are especially interested in examining the issue of whether or not the embedding methods will ensure that no relevant objects are left out (i.e., there are no false dismissals and, hence, the correct result is reported). Particular attention is paid to the SparseMap, FastMap, and MetricMap embedding methods. SparseMap is a variant of Lipschitz embeddings, while FastMap and MetricMap are inspired by dimension reduction methods for Euclidean spaces (using KLT or the related PCA and SVD). We show that, in general, none of these embedding methods guarantee that queries on the embedded objects have no false dismissals, while also demonstrating the limited cases in which the guarantee does hold. Moreover, we describe a variant of SparseMap that allows queries with no false dismissals. In addition, we show that with FastMap and MetricMap, the distances of the embedded objects can be much greater than the actual distances. This makes it impossible (or at least impractical) to modify FastMap and MetricMap to guarantee no false dismissals.
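The link between embedded distances and false dismissals can be shown with a filter-and-refine range query. The sketch below assumes a basic embedding that maps each object to its distances from a few single reference objects and compares vectors with the maximum (Chebyshev) distance; by the triangle inequality this embedded distance never exceeds the real one, so filtering cannot dismiss a true result. It is not SparseMap, FastMap, or MetricMap, and the names embed, chebyshev, range_query and hamming are hypothetical.

```python
def hamming(a, b):
    """Hamming distance between equal-length strings (a metric, used here as the costly distance)."""
    return sum(x != y for x, y in zip(a, b))

def embed(obj, refs, dist):
    """Map obj to the vector of its distances to the reference objects."""
    return [dist(obj, r) for r in refs]

def chebyshev(u, v):
    """Maximum coordinate-wise difference between two vectors."""
    return max(abs(a - b) for a, b in zip(u, v))

def range_query(query, radius, objects, refs, dist):
    """Filter on embedded vectors, then refine with the real distance.
    Returns (results, number of real distance evaluations made during refinement)."""
    fq = embed(query, refs, dist)
    results, evaluations = [], 0
    for obj in objects:
        # Triangle inequality: |d(obj, r) - d(query, r)| <= d(obj, query) for every r,
        # so an embedded distance above the radius proves obj is out of range.
        # A real system would precompute embed(obj, ...) once per object.
        if chebyshev(embed(obj, refs, dist), fq) > radius:
            continue
        evaluations += 1
        if dist(obj, query) <= radius:
            results.append(obj)
    return results, evaluations

# Made-up data: strings of length 4, two reference objects.
objects = ["aaaa", "aaab", "abbb", "bbbb", "abab", "baba"]
refs = ["aaaa", "bbbb"]
print(range_query("aabb", 1, objects, refs, hamming))
```

An embedding whose distances can exceed the real ones would break the pruning step, which is the failure the abstract describes for FastMap and MetricMap.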
Closest-Point Problems in Computational Geometry
, 1997
Cited by 65 (14 self)
This is the preliminary version of a chapter that will appear in the Handbook on Computational Geometry, edited by J.-R. Sack and J. Urrutia. A comprehensive overview is given of algorithms and data structures for proximity problems on point sets in R^D. In particular, the closest pair problem, the exact and approximate post-office problem, and the problem of constructing spanners are discussed in detail.
Contents
1 Introduction
2 The static closest pair problem
2.1 Preliminary remarks
2.2 Algorithms that are optimal in the algebraic computation tree model
2.2.1 An algorithm based on the Voronoi diagram
2.2.2 A divide-and-conquer algorithm
2.2.3 A plane sweep algorithm
2.3 A deterministic algorithm that uses indirect addressing
2.3.1 The degraded grid ...
Approximate Nearest Neighbor Queries Revisited
, 1998
Cited by 57 (3 self)
This paper proposes new methods to answer approximate nearest neighbor queries on a set of n points in d-dimensional Euclidean space. For any fixed constant d, a data structure with O(ε^((1−d)/2) n log n) preprocessing time and O(ε^((1−d)/2) log n) query time achieves approximation factor 1 + ε for any given 0 < ε < 1; a variant reduces the ε-dependence by a factor of ε^(−1/2). For any arbitrary d, a data structure with O(d^2 n log n) preprocessing time and O(d^2 log n) query time achieves approximation factor O(d^(3/2)). Applications to various proximity problems are discussed.
1 Introduction
Let P be a set of n point sites in d-dimensional space R^d. In the well-known post office problem, we want to preprocess P into a data structure so that a site closest to a given query point q (called the nearest neighbor of q) can be found efficiently. Distances are measured under the Euclidean metric. The post office problem has many applications within computational...
Query-Sensitive Ray Shooting
 In Proc. 10th Annu. ACM Sympos. Comput. Geom.
, 1994
Cited by 48 (10 self)
Ray (segment) shooting is the problem of determining the first intersection between a ray (directed line segment) and a collection of polygonal or polyhedral obstacles. In order to process queries efficiently, the set of obstacle polyhedra is usually preprocessed into a data structure. In this paper, we propose a query-sensitive data structure for ray shooting, which means that the performance of our data structure depends on the "local" geometry of obstacles near the query segment. We measure the complexity of the local geometry near the segment by a parameter called the simple cover complexity, denoted by scc(s) for a segment s. Our data structure consists of a subdivision that partitions the space into a collection of polyhedral cells of O(1) complexity. We answer a segment shooting query by walking along the segment through the subdivision. Our first result is that, for any fixed dimension d, there exists a simple hierarchical subdivision in which no query segment s int...
Lower bounds for high dimensional nearest neighbor search and related problems
, 1999
Cited by 47 (2 self)
In spite of extensive and continuing research, for various geometric search problems (such as nearest neighbor search), the best algorithms known have performance that degrades exponentially in the dimension. This phenomenon is sometimes called the curse of dimensionality. Recent results [38, 37, 40] show that in some sense it is possible to avoid the curse of dimensionality for the approximate nearest neighbor search problem. But must the exact nearest neighbor search problem suffer this curse? We provide some evidence in support of the curse. Specifically we investigate the exact nearest neighbor search problem and the related problem of exact partial match within the asymmetric communication model first used by Miltersen [43] to study data structure problems. We derive nontrivial asymptotic lower bounds for the exact problem that stand in contrast to known algorithms for approximate nearest neighbor search.