Results 1-10 of 43
An Optimal Algorithm for Approximate Nearest Neighbor Searching in Fixed Dimensions
 ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS
, 1994
Cited by 786 (31 self)
Consider a set S of n data points in real d-dimensional space, $\mathbb{R}^d$, where distances are measured using any Minkowski metric. In nearest neighbor searching we preprocess S into a data structure, so that given any query point $q \in \mathbb{R}^d$, the closest point of S to q can be reported quickly. Given any positive real $\epsilon$, a data point p is a $(1 + \epsilon)$-approximate nearest neighbor of q if its distance from q is within a factor of $(1 + \epsilon)$ of the distance to the true nearest neighbor. We show that it is possible to preprocess a set of n points in $\mathbb{R}^d$ in $O(dn \log n)$ time and $O(dn)$ space, so that given a query point $q \in \mathbb{R}^d$ and $\epsilon > 0$, a $(1 + \epsilon)$-approximate nearest neighbor of q can be computed in $O(c_{d,\epsilon} \log n)$ time, where $c_{d,\epsilon} \le d \lceil 1 + 6d/\epsilon \rceil^d$ is a factor depending only on dimension and $\epsilon$. In general, we show that given an integer $k \ge 1$, $(1 + \epsilon)$-approximations to the k nearest neighbors of q can be computed in additional $O(kd \log n)$ time.
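The definition of a (1 + ε)-approximate nearest neighbor in this abstract can be checked directly against a brute-force scan. The following is a minimal sketch for illustration only, not the data structure from the paper: the function names `minkowski` and `is_approx_nn`, the parameter `r` for the Minkowski exponent, and the sample points are all assumptions made here.

```python
import math

def minkowski(p, q, r=2.0):
    """Minkowski (L_r) distance between two equal-length points.

    r=1 is the Manhattan metric, r=2 the Euclidean metric, and
    r=math.inf the max (Chebyshev) metric.
    """
    diffs = [abs(a - b) for a, b in zip(p, q)]
    if math.isinf(r):
        return max(diffs)
    return sum(x ** r for x in diffs) ** (1.0 / r)

def is_approx_nn(points, q, candidate, eps, r=2.0):
    """Return True if candidate is a (1 + eps)-approximate nearest neighbor of q.

    The true nearest distance is found by a linear scan over all points,
    which costs O(dn) per query; this is the baseline, not the fast method.
    """
    nearest = min(minkowski(p, q, r) for p in points)
    return minkowski(candidate, q, r) <= (1.0 + eps) * nearest

# Hypothetical sample data: the true nearest neighbor of q is (1.0, 0.0).
S = [(1.0, 0.0), (0.0, 1.2), (5.0, 5.0)]
q = (0.0, 0.0)
print(is_approx_nn(S, q, (0.0, 1.2), eps=0.25))  # distance 1.2 vs. bound 1.25
print(is_approx_nn(S, q, (0.0, 1.2), eps=0.10))  # distance 1.2 vs. bound 1.10
```

A larger ε admits more candidates as acceptable answers, which is the slack that approximate search structures trade for faster queries.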
Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality
, 1998
Cited by 715 (33 self)
The nearest neighbor problem is the following: Given a set of n points $P = \{p_1, \ldots, p_n\}$ in some metric space X, preprocess P so as to efficiently answer queries which require finding the point in P closest to a query point $q \in X$. We focus on the particularly interesting case of the d-dimensional Euclidean space where $X = \mathbb{R}^d$ under some $l_p$ norm. Despite decades of effort, the current solutions are far from satisfactory; in fact, for large d, in theory or in practice, they provide little improvement over the brute-force algorithm which compares the query point to each data point. Of late, there has been some interest in the approximate nearest neighbors problem, which is: Find a point $p \in P$ that is an $\epsilon$-approximate nearest neighbor of the query q in that for all $p' \in P$, $d(p, q) \le (1 + \epsilon) d(p', q)$. We present two algorithmic results for the approximate version that significantly improve the known bounds: (a) preprocessing cost polynomial in n and d, and a trul...
Efficient Search for Approximate Nearest Neighbor in High Dimensional Spaces
, 1998
Cited by 188 (9 self)
We address the problem of designing data structures that allow efficient search for approximate nearest neighbors. More specifically, given a database consisting of a set of vectors in some high dimensional Euclidean space, we want to construct a space-efficient data structure that would allow us to search, given a query vector, for the closest or nearly closest vector in the database. We also address this problem when distances are measured by the $L_1$ norm, and in the Hamming cube. Significantly improving and extending recent results of Kleinberg, we construct data structures whose size is polynomial in the size of the database, and search algorithms that run in time nearly linear or nearly quadratic in the dimension (depending on the case; the extra factors are polylogarithmic in the size of the database). Computer Science Department, Technion - IIT, Haifa 32000, Israel. Email: eyalk@cs.technion.ac.il † Bell Communications Research, MCC1C365B, 445 South Street, Morristown, NJ ...
Two Algorithms for Nearest-Neighbor Search in High Dimensions
, 1997
Cited by 169 (0 self)
Representing data as points in a high-dimensional space, so as to use geometric methods for indexing, is an algorithmic technique with a wide array of uses. It is central to a number of areas such as information retrieval, pattern recognition, and statistical data analysis; many of the problems arising in these applications can involve several hundred or several thousand dimensions. We consider the nearest-neighbor problem for d-dimensional Euclidean space: we wish to preprocess a database of n points so that given a query point, one can efficiently determine its nearest neighbors in the database. There is a large literature on algorithms for this problem, in both the exact and approximate cases. The more sophisticated algorithms typically achieve a query time that is logarithmic in n at the expense of an exponential dependence on the dimension d; indeed, even the average-case analysis of heuristics such as k-d trees reveals an exponential dependence on d in the query time. In this wor...
Nearest Neighbors In High-Dimensional Spaces
, 2004
Cited by 76 (2 self)
In this chapter we consider the following problem: given a set P of points in a high-dimensional space, construct a data structure which, given any query point q, finds the point in P closest to q. This problem, called nearest neighbor search, is of significant importance to several areas of computer science, including pattern recognition, searching in multimedia data, vector compression [GG91], computational statistics [DW82], and data mining. Many of these applications involve data sets which are very large (e.g., a database containing Web documents could contain over one billion documents). Moreover, the dimensionality of the points is usually large as well (e.g., on the order of a few hundred). Therefore, it is crucial to design algorithms which scale well with the database size as well as with the dimension. The nearest-neighbor problem is an example of a large class of proximity problems, which, roughly speaking, are problems whose definitions involve the notion of...
Multi-Probe LSH: Efficient indexing for high-dimensional similarity search
 in Proc. 33rd Int. Conf. Very Large Data Bases
Cited by 51 (3 self)
Similarity indices for high-dimensional data are very desirable for building content-based search systems for feature-rich data such as audio, images, videos, and other sensor data. Recently, locality sensitive hashing (LSH) and its variations have been proposed as indexing techniques for approximate similarity search. A significant drawback of these approaches is the requirement for a large number of hash tables in order to achieve good search quality. This paper proposes a new indexing scheme called multi-probe LSH that overcomes this drawback. Multi-probe LSH is built on the well-known LSH technique, but it intelligently probes multiple buckets that are likely to contain query results in a hash table. Our method is inspired by and improves upon recent theoretical work on entropy-based LSH designed to reduce the space requirement of the basic LSH method. We have implemented the multi-probe LSH method and evaluated the implementation with two different high-dimensional datasets. Our evaluation shows that the multi-probe LSH method substantially improves upon previously proposed methods in both space and time efficiency. To achieve the same search quality, multi-probe LSH has similar time efficiency to the basic LSH method while reducing the number of hash tables by an order of magnitude. In comparison with the entropy-based LSH method, to achieve the same search quality, multi-probe LSH uses less query time and 5 to 8 times fewer hash tables.
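The idea of probing several buckets in one hash table, instead of adding more tables, can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the paper's method: the class name `MultiProbeLSHTable`, its parameters, and the probing rule (the home bucket plus every bucket whose key differs by ±1 in one coordinate) are choices made here, and the random-projection hash `floor((a · v + b) / width)` is an assumed hash family that the abstract does not specify. The paper's own choice of which buckets are "likely to contain query results" is not reproduced.

```python
import math
import random
from collections import defaultdict

class MultiProbeLSHTable:
    """A single hash table keyed by quantized random projections (illustrative)."""

    def __init__(self, dim, num_hashes=4, width=4.0, seed=0):
        rng = random.Random(seed)
        self.width = width
        # Each hash function is floor((a . v + b) / width), with Gaussian a
        # and b drawn uniformly from [0, width).
        self.a = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_hashes)]
        self.b = [rng.uniform(0.0, width) for _ in range(num_hashes)]
        self.buckets = defaultdict(list)

    def _key(self, v):
        """Bucket key: one quantized projection per hash function."""
        return tuple(
            math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / self.width)
            for a, b in zip(self.a, self.b)
        )

    def insert(self, v):
        self.buckets[self._key(v)].append(v)

    def _probe_keys(self, v):
        """Yield the home bucket, then each bucket one step away in one coordinate."""
        home = self._key(v)
        yield home
        for i in range(len(home)):
            for delta in (-1, 1):
                yield home[:i] + (home[i] + delta,) + home[i + 1:]

    def query(self, q):
        """Return the closest point found in the probed buckets, or None."""
        candidates = [p for key in self._probe_keys(q) for p in self.buckets.get(key, [])]
        return min(candidates, key=lambda p: math.dist(p, q), default=None)

# Hypothetical usage with three 3-dimensional points.
table = MultiProbeLSHTable(dim=3)
for point in [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (10.0, -5.0, 3.0)]:
    table.insert(point)
print(table.query((10.0, -5.0, 3.0)))
```

With `num_hashes` set to M, each query inspects 1 + 2M buckets of the same table, so near neighbors that fall just across a quantization boundary can still be found without building additional tables.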
Locality-Preserving Hashing in Multidimensional Spaces
 In Proceedings of the 29th ACM Symposium on Theory of Computing
, 1997
Cited by 50 (4 self)
This paper was published in Proceedings of the 29th Annual ACM Symposium on Theory of Computing, pages 618-625, 1997.
Lower bounds for high dimensional nearest neighbor search and related problems
, 1999
Cited by 47 (2 self)
In spite of extensive and continuing research, for various geometric search problems (such as nearest neighbor search), the best algorithms known have performance that degrades exponentially in the dimension. This phenomenon is sometimes called the curse of dimensionality. Recent results [38, 37, 40] show that in some sense it is possible to avoid the curse of dimensionality for the approximate nearest neighbor search problem. But must the exact nearest neighbor search problem suffer this curse? We provide some evidence in support of the curse. Specifically, we investigate the exact nearest neighbor search problem and the related problem of exact partial match within the asymmetric communication model first used by Miltersen [43] to study data structure problems. We derive non-trivial asymptotic lower bounds for the exact problem that stand in contrast to known algorithms for approximate nearest neighbor search.
On Approximate Nearest Neighbors in Non-Euclidean Spaces
 In FOCS
, 1998
Cited by 33 (8 self)
The nearest neighbor search (NNS) problem is the following: Given a set of n points $P = \{p_1, \ldots, p_n\}$ in some metric space X, preprocess P so as to efficiently answer queries which require finding a point in P closest to a query point $q \in X$. The approximate nearest neighbor search (c-NNS) is a relaxation of NNS which allows returning any point within c times the distance to the nearest neighbor (called a c-nearest neighbor). This problem is of major and growing importance to a variety of applications. In this paper, we give an algorithm for $(4\lceil \log_{1+\rho} \log 4d \rceil + 3)$-NNS in $l_\infty^d$ with $O(dn^{1+\rho} \log n)$ storage and $O(d \log n)$ query time. In particular, this yields the first algorithm for $O(1)$-NNS for $l_\infty$ with subexponential storage. The preprocessing time is close to linear in the size of the data structure. The algorithm can also be used (after simple modifications) to output the exact nearest neighbor in time bounded by $O(d \log n)$ plus the number of $(4\lceil \log_{1+\rho} \log 4d$...
Reductions Among High Dimensional Proximity Problems
, 2000
Cited by 32 (4 self)
We present improved running times for a wide range of approximate high dimensional proximity problems. We obtain subquadratic running time for each of these problems. These improved running times are obtained by reduction to Nearest Neighbour queries. The problems we consider in this paper are Approximate Diameter, Approximate Furthest Neighbours, Approximate Discrete Center, Approximate Line Center, Approximate Metric Facility Location, Approximate Bottleneck Matching, and Approximate Minimum Weight Matching. University of Southern California. Email: agoel@cs.usc.edu. † Stanford University. Email: indyk@cs.stanford.edu. ‡ University of Iowa. Email: kvaradar@cs.uiowa.edu.
Problem | Ref | Approx. | Time | Comments
Diameter | [10] | $\sqrt{3}$ | $O(dn)$ |
Diameter | [12] | $1 + \epsilon$ | $O(dn \log n + n^2)$ |
Diameter | [2] | $1 + \epsilon$ | $\tilde{O}(n^{2 - O(\epsilon^2)} + dn)$ |
Diameter | [18] | $1 + \epsilon$ | $\tilde{O}(n^{1 + 1/(1 + \epsilon/6)} + dn)$ |
Diameter | here | $1 + \epsilon$ | $\tilde{O}(n^{1 + 1/(1 + \epsilon)} + dn)$ | $\tilde{O}(n)$ $(1 + \epsilon)$-NNS queries
Diameter | here | $\sqrt{2}$ | $\tilde{O}(dn)$ | see Section 3 for some e...