Results 1  10
of
90
An Optimal Algorithm for Approximate Nearest Neighbor Searching in Fixed Dimensions
 ACMSIAM SYMPOSIUM ON DISCRETE ALGORITHMS
, 1994
"... Consider a set S of n data points in real ddimensional space, R d , where distances are measured using any Minkowski metric. In nearest neighbor searching we preprocess S into a data structure, so that given any query point q 2 R d , the closest point of S to q can be reported quickly. Given any po ..."
Abstract

Cited by 786 (31 self)
 Add to MetaCart
Consider a set S of n data points in real ddimensional space, R d , where distances are measured using any Minkowski metric. In nearest neighbor searching we preprocess S into a data structure, so that given any query point q 2 R d , the closest point of S to q can be reported quickly. Given any positive real ffl, a data point p is a (1 + ffl)approximate nearest neighbor of q if its distance from q is within a factor of (1 + ffl) of the distance to the true nearest neighbor. We show that it is possible to preprocess a set of n points in R d in O(dn log n) time and O(dn) space, so that given a query point q 2 R d , and ffl ? 0, a (1 + ffl)approximate nearest neighbor of q can be computed in O(c d;ffl log n) time, where c d;ffl d d1 + 6d=ffle d is a factor depending only on dimension and ffl. In general, we show that given an integer k 1, (1 + ffl)approximations to the k nearest neighbors of q can be computed in additional O(kd log n) time.
Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality
, 1998
"... The nearest neighbor problem is the following: Given a set of n points P = fp 1 ; : : : ; png in some metric space X, preprocess P so as to efficiently answer queries which require finding the point in P closest to a query point q 2 X. We focus on the particularly interesting case of the ddimens ..."
Abstract

Cited by 715 (33 self)
 Add to MetaCart
The nearest neighbor problem is the following: Given a set of n points P = fp 1 ; : : : ; png in some metric space X, preprocess P so as to efficiently answer queries which require finding the point in P closest to a query point q 2 X. We focus on the particularly interesting case of the ddimensional Euclidean space where X = ! d under some l p norm. Despite decades of effort, the current solutions are far from satisfactory; in fact, for large d, in theory or in practice, they provide little improvement over the bruteforce algorithm which compares the query point to each data point. Of late, there has been some interest in the approximate nearest neighbors problem, which is: Find a point p 2 P that is an fflapproximate nearest neighbor of the query q in that for all p 0 2 P , d(p; q) (1 + ffl)d(p 0 ; q). We present two algorithmic results for the approximate version that significantly improve the known bounds: (a) preprocessing cost polynomial in n and d, and a trul...
Data Structures and Algorithms for Nearest Neighbor Search in General Metric Spaces
, 1993
"... We consider the computational problem of finding nearest neighbors in general metric spaces. Of particular interest are spaces that may not be conveniently embedded or approximated in Euclidian space, or where the dimensionality of a Euclidian representation is very high. Also relevant are highdim ..."
Abstract

Cited by 273 (4 self)
 Add to MetaCart
We consider the computational problem of finding nearest neighbors in general metric spaces. Of particular interest are spaces that may not be conveniently embedded or approximated in Euclidian space, or where the dimensionality of a Euclidian representation is very high. Also relevant are highdimensional Euclidian settings in which the distribution of data is in some sense of lower dimension and embedded in the space. The vptree (vantage point tree) is introduced in several forms, together with associated algorithms, as an improved method for these difficult search problems. Tree construction executes in O(n log(n)) time, and search is under certain circumstances and in the limit, O(log(n)) expected time. The theoretical basis for this approach is developed and the results of several experiments are reported. In Euclidian cases, kdtree performance is compared.
Geometric Range Searching and Its Relatives
 CONTEMPORARY MATHEMATICS
"... ... process a set S of points in so that the points of S lying inside a query R region can be reported or counted quickly. Wesurvey the known techniques and data structures for range searching and describe their application to other related searching problems. ..."
Abstract

Cited by 256 (40 self)
 Add to MetaCart
... process a set S of points in so that the points of S lying inside a query R region can be reported or counted quickly. Wesurvey the known techniques and data structures for range searching and describe their application to other related searching problems.
Efficient Search for Approximate Nearest Neighbor in High Dimensional Spaces
, 1998
"... We address the problem of designing data structures that allow efficient search for approximate nearest neighbors. More specifically, given a database consisting of a set of vectors in some high dimensional Euclidean space, we want to construct a spaceefficient data structure that would allow us to ..."
Abstract

Cited by 188 (9 self)
 Add to MetaCart
We address the problem of designing data structures that allow efficient search for approximate nearest neighbors. More specifically, given a database consisting of a set of vectors in some high dimensional Euclidean space, we want to construct a spaceefficient data structure that would allow us to search, given a query vector, for the closest or nearly closest vector in the database. We also address this problem when distances are measured by the L 1 norm, and in the Hamming cube. Significantly improving and extending recent results of Kleinberg, we construct data structures whose size is polynomial in the size of the database, and search algorithms that run in time nearly linear or nearly quadratic in the dimension (depending on the case; the extra factors are polylogarithmic in the size of the database). Computer Science Department, Technion  IIT, Haifa 32000, Israel. Email: eyalk@cs.technion.ac.il y Bell Communications Research, MCC1C365B, 445 South Street, Morristown, NJ ...
Two Algorithms for NearestNeighbor Search in High Dimensions
, 1997
"... Representing data as points in a highdimensional space, so as to use geometric methods for indexing, is an algorithmic technique with a wide array of uses. It is central to a number of areas such as information retrieval, pattern recognition, and statistical data analysis; many of the problems aris ..."
Abstract

Cited by 169 (0 self)
 Add to MetaCart
Representing data as points in a highdimensional space, so as to use geometric methods for indexing, is an algorithmic technique with a wide array of uses. It is central to a number of areas such as information retrieval, pattern recognition, and statistical data analysis; many of the problems arising in these applications can involve several hundred or several thousand dimensions. We consider the nearestneighbor problem for ddimensional Euclidean space: we wish to preprocess a database of n points so that given a query point, one can efficiently determine its nearest neighbors in the database. There is a large literature on algorithms for this problem, in both the exact and approximate cases. The more sophisticated algorithms typically achieve a query time that is logarithmic in n at the expense of an exponential dependence on the dimension d; indeed, even the averagecase analysis of heuristics such as kd trees reveals an exponential dependence on d in the query time. In this wor...
Combinatorial Geometry
, 1995
"... Abstract. Let P be a set of n points in ~d (where d is a small fixed positive integer), and let F be a collection of subsets of ~d, each of which is defined by a constant number of bounded degree polynomial inequalities. We consider the following Frange searching problem: Given P, build a data stru ..."
Abstract

Cited by 164 (26 self)
 Add to MetaCart
Abstract. Let P be a set of n points in ~d (where d is a small fixed positive integer), and let F be a collection of subsets of ~d, each of which is defined by a constant number of bounded degree polynomial inequalities. We consider the following Frange searching problem: Given P, build a data structure for efficient answering of queries of the form, "Given a 7 ~ F, count (or report) the points of P lying in 7." Generalizing the simplex range searching techniques, we give a solution with nearly linear space and preprocessing time and with O(n 1 x/b+~) query time, where d < b < 2d 3 and ~> 0 is an arbitrarily small constant. The acutal value of b is related to the problem of partitioning arrangements of algebraic surfaces into cells with a constant description complexity. We present some of the applications of Frange searching problem, including improved ray shooting among triangles in ~3 1.
RAY SHOOTING AND PARAMETRIC SEARCH
, 1993
"... Efficient algorithms for the ray shooting problem are presented: Given a collection F of objects in d, build a data structure so that, for a query ray, the first object of F hit by the ray can be quickly determined. Using the parametric search technique, this problem is reduced to the segment emptin ..."
Abstract

Cited by 127 (25 self)
 Add to MetaCart
Efficient algorithms for the ray shooting problem are presented: Given a collection F of objects in d, build a data structure so that, for a query ray, the first object of F hit by the ray can be quickly determined. Using the parametric search technique, this problem is reduced to the segment emptiness problem. For various ray shooting problems, space/querytime tradeoffs of the following type are achieved: For some integer b and a parameter m (n _< m < n b) the queries are answered in time O((n/m /b) log <) n), with O(m!+) space and preprocessing time (t> 0 is arbitrarily small but fixed constant), b Ld/2J is obtained for ray shooting in a convex dpolytope defined as an intersection of n half spaces, b d for an arrangement of n hyperplanes in d, and b 3 for an arrangement of n half planes in 3. This approach also yields fast procedures for finding the first k objects hit by a query ray, for searching nearest and farthest neighbors, and for the hidden surface removal. All the data structures can be maintained dynamically in amortized time O (m + / n) per insert/delete operation.
A Simple Algorithm for Nearest Neighbor Search in High Dimensions
 IEEE Transactions on Pattern Analysis and Machine Intelligence
, 1997
"... Abstract—The problem of finding the closest point in highdimensional spaces is common in pattern recognition. Unfortunately, the complexity of most existing search algorithms, such as kd tree and Rtree, grows exponentially with dimension, making them impractical for dimensionality above 15. In ne ..."
Abstract

Cited by 126 (1 self)
 Add to MetaCart
Abstract—The problem of finding the closest point in highdimensional spaces is common in pattern recognition. Unfortunately, the complexity of most existing search algorithms, such as kd tree and Rtree, grows exponentially with dimension, making them impractical for dimensionality above 15. In nearly all applications, the closest point is of interest only if it lies within a userspecified distance e. We present a simple and practical algorithm to efficiently search for the nearest neighbor within Euclidean distance e. The use of projection search combined with a novel data structure dramatically improves performance in high dimensions. A complexity analysis is presented which helps to automatically determine e in structured problems. A comprehensive set of benchmarks clearly shows the superiority of the proposed algorithm for a variety of structured and unstructured search problems. Object recognition is demonstrated as an example application. The simplicity of the algorithm makes it possible to construct an inexpensive hardware search engine which can be 100 times faster than its software equivalent. A C++ implementation of our algorithm is available upon request to search@cs.columbia.edu/CAVE/.
Nearest neighbor queries in metric spaces
 Discrete Comput. Geom
, 1997
"... Given a set S of n sites (points), and a distance measure d, the nearest neighbor searching problem is to build a data structure so that given a query point q, the site nearest to q can be found quickly. This paper gives data structures for this problem when the sites and queries are in a metric spa ..."
Abstract

Cited by 115 (1 self)
 Add to MetaCart
Given a set S of n sites (points), and a distance measure d, the nearest neighbor searching problem is to build a data structure so that given a query point q, the site nearest to q can be found quickly. This paper gives data structures for this problem when the sites and queries are in a metric space. One data structure, D(S), uses a divideandconquer recursion. The other data structure, M(S, Q), is somewhat like a skiplist. Both are simple and implementable. The data structures are analyzed when the metric space obeys a certain spherepacking bound, and when the sites and query points are random and have distributions with an exchangeability property. This property implies, for example, that query point q is a random element of S ∪ {q}. Under these conditions, the preprocessing and space bounds for the algorithms are close to linear in n. They depend also on the spherepacking bound, and on the logarithm of the distance ratio Υ(S) of S, the ratio of the distance between the farthest pair of points in S to the distance between the closest pair. The data structure M(S, Q) requires as input data an additional set Q, taken to be representative of the query points. The resource bounds of M(S, Q) have a dependence on the distance ratio of S ∪ Q. While M(S, Q) can return wrong answers, its failure probability can be bounded, and is decreasing in a parameter K. Here K ≤ Q/n is chosen when building M(S, Q). The expected query time for M(S, Q) is O(K log n) log Υ(S ∪ Q), and the resource bounds increase linearly in K. The data structure D(S) has expected O(log n) O(1) query time, for fixed distance ratio. The preprocessing algorithm for M(S, Q) can be used to solve the allnearestneighbor problem for S in O(n(log n) 2 (log Υ(S)) 2) expected time. 1