Two Algorithms for Nearest-Neighbor Search in High Dimensions
Jon M. Kleinberg
IBM Almaden Research Center, San Jose CA 95120, on leave from Department of Computer Science, Cornell University, Ithaca NY 14853.
Representing data as points in a high-dimensional space, so as to use geometric methods for indexing, is an algorithmic technique with a wide array of uses. It is central to a number of areas such as information retrieval, pattern recognition, and statistical data analysis; many of the problems arising in these applications can involve several hundred or several thousand dimensions. We consider the nearest-neighbor problem for d-dimensional Euclidean space: we wish to pre-process a database of n points so that, given a query point, one can efficiently determine its nearest neighbors in the database. There is a large literature on algorithms for this problem, in both the exact and approximate cases. The more sophisticated algorithms typically achieve a query time that is logarithmic in n at the expense of an exponential dependence on the dimension d; indeed, even the average-case analysis of heuristics such as k-d trees reveals an exponential dependence on d in the query time. In this wor...
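As a point of reference for the query-time bounds discussed above, the naive alternative to a sophisticated index is a linear scan over the database, which avoids any exponential dependence on d but pays O(nd) time per query. A minimal sketch (illustrative only, not an algorithm from this paper):

```python
import math

def nearest_neighbor(database, query):
    """Linear-scan baseline: return the database point closest to the
    query under the Euclidean metric. O(n*d) time per query, with no
    pre-processing and no exponential dependence on the dimension d."""
    best_point, best_dist = None, float("inf")
    for point in database:
        dist = math.sqrt(sum((p - q) ** 2 for p, q in zip(point, query)))
        if dist < best_dist:
            best_point, best_dist = point, dist
    return best_point

# Example: three points in the plane (d = 2).
points = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.5)]
print(nearest_neighbor(points, (0.9, 0.8)))  # → (1.0, 1.0)
```

The trade-off the abstract describes is precisely between this d-friendly but n-linear scan and tree-based indexes whose logarithmic-in-n query time hides constants exponential in d.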