Results 1 - 3 of 3
Data Structures and Algorithms for Nearest Neighbor Search in General Metric Spaces, 1993
Abstract

Cited by 273 (4 self)
We consider the computational problem of finding nearest neighbors in general metric spaces. Of particular interest are spaces that may not be conveniently embedded or approximated in Euclidean space, or where the dimensionality of a Euclidean representation is very high. Also relevant are high-dimensional Euclidean settings in which the distribution of data is in some sense of lower dimension and embedded in the space. The vp-tree (vantage point tree) is introduced in several forms, together with associated algorithms, as an improved method for these difficult search problems. Tree construction executes in O(n log n) time, and search, under certain circumstances and in the limit, runs in O(log n) expected time. The theoretical basis for this approach is developed and the results of several experiments are reported. In Euclidean cases, performance is compared with that of the kd-tree.
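As a rough illustration of the data structure this abstract describes — not the paper's exact construction — a minimal vp-tree can be sketched as follows. Each node stores a vantage point and the median distance mu to the remaining points; search prunes branches with the triangle inequality. All names here are ours.

```python
class VPNode:
    def __init__(self, point, radius, inside, outside):
        self.point, self.radius = point, radius
        self.inside, self.outside = inside, outside

def build_vp_tree(points, dist):
    """Pick a vantage point, split the rest at the median distance mu
    into an inside ball and an outside shell, and recurse."""
    if not points:
        return None
    vantage, rest = points[0], points[1:]
    if not rest:
        return VPNode(vantage, 0.0, None, None)
    ds = sorted(dist(vantage, p) for p in rest)
    mu = ds[len(ds) // 2]
    inside = [p for p in rest if dist(vantage, p) < mu]
    outside = [p for p in rest if dist(vantage, p) >= mu]
    return VPNode(vantage, mu,
                  build_vp_tree(inside, dist),
                  build_vp_tree(outside, dist))

def nearest(node, q, dist, best=None, best_d=float("inf")):
    """Branch-and-bound nearest neighbor search: descend the side
    containing q first, and visit the other side only if the query
    ball of radius best_d can cross the mu boundary."""
    if node is None:
        return best, best_d
    d = dist(node.point, q)
    if d < best_d:
        best, best_d = node.point, d
    if d < node.radius:
        best, best_d = nearest(node.inside, q, dist, best, best_d)
        if d + best_d >= node.radius:  # ball may reach the outside shell
            best, best_d = nearest(node.outside, q, dist, best, best_d)
    else:
        best, best_d = nearest(node.outside, q, dist, best, best_d)
        if d - best_d <= node.radius:  # ball may reach the inside ball
            best, best_d = nearest(node.inside, q, dist, best, best_d)
    return best, best_d
```

For example, over the 1-D metric space with dist(a, b) = |a - b|, building from [1.0, 5.0, 9.0, 3.0, 7.0] and querying 4.1 returns 5.0 at distance 0.9.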
Excluded Middle Vantage Point Forests for Nearest Neighbor Search
In DIMACS Implementation Challenge, ALENEX'99, 1999
Abstract

Cited by 40 (1 self)
The excluded middle vantage point forest is a new data structure that supports worst-case sublinear-time searches in a metric space for nearest neighbors within a fixed radius of arbitrary queries. Worst-case performance depends on the dataset but is not affected by the distribution of queries. Our analysis predicts vp-forest performance in simple settings such as L_p spaces with uniform random datasets, and experiments confirm these predictions. Another contribution of the analysis is a new perspective on the curse of dimensionality in the context of our methods, and of kd-trees as well. In our idealized setting the dataset is organized into a forest of O(N^(1-ρ)) trees, each of depth O(log N). Here ρ may be viewed as depending on the distance function and on the dataset. The radius of interest is an input to the organization process, and the result is a linear-space data structure specialized to answer queries within this distance. Searches then require O(N^(1-ρ) log N) time, or...
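The excluded-middle idea can be sketched roughly as follows (a toy sketch under our own assumptions, not the paper's implementation; the names build_tree, tau, etc. are ours). Points within tau of a node's median split are excluded from that tree and seed the next one; for a fixed query radius r < tau, each tree can then be searched along a single root-to-leaf path with no backtracking, since any point within r of the query on the unvisited side would have fallen into the excluded middle band.

```python
def build_tree(points, dist, tau):
    """Build one vp-tree, setting aside the 'middle': points whose
    distance to the vantage point is within tau of the median mu are
    returned as leftovers to seed the next tree."""
    if len(points) <= 1:
        return (points[0] if points else None, [])
    v, rest = points[0], points[1:]
    ds = sorted(dist(v, p) for p in rest)
    mu = ds[len(ds) // 2]
    inside = [p for p in rest if dist(v, p) <= mu - tau]
    outside = [p for p in rest if dist(v, p) >= mu + tau]
    middle = [p for p in rest if mu - tau < dist(v, p) < mu + tau]
    left, lex = build_tree(inside, dist, tau)
    right, rex = build_tree(outside, dist, tau)
    return ({"v": v, "mu": mu, "in": left, "out": right}, middle + lex + rex)

def build_forest(points, dist, tau):
    """Repeatedly build trees until every point lands in some tree."""
    forest = []
    while points:
        tree, points = build_tree(points, dist, tau)
        if tree is not None:
            forest.append(tree)
    return forest

def search(forest, q, dist, r):
    """Fixed-radius search (requires r < tau): one root-to-leaf path
    per tree, no backtracking."""
    hits = []
    for root in forest:
        node = root
        while node is not None:
            if isinstance(node, dict):
                v = node["v"]
                if dist(v, q) <= r:
                    hits.append(v)
                node = node["in"] if dist(v, q) <= node["mu"] else node["out"]
            else:  # leaf: a bare point
                if dist(node, q) <= r:
                    hits.append(node)
                node = None
    return hits
```

On ten evenly spaced 1-D points with tau = 0.5, a radius-0.4 query at 4.2 finds exactly the point 4.0, touching only one short path in each of the forest's trees.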
Locally Lifting the Curse of Dimensionality for Nearest Neighbor Search (Extended Abstract)
In Proc. 11th ACM-SIAM Symposium on Discrete Algorithms (SODA'00), 1999
Abstract

Cited by 25 (1 self)
We consider the problem of nearest neighbor search in the Euclidean hypercube [-1, +1]^d with uniform distributions, and the additional natural assumption that the nearest neighbor is located within a constant fraction R of the maximum interpoint distance in this space, i.e. within distance 2R√d of the query. We introduce the idea of aggressive pruning, give a family of practical algorithms and an idealized analysis, and describe experiments. Our main result is that search complexity, measured in terms of d-dimensional inner-product operations, is i) strongly sublinear with respect to the dataset size n for moderate R, and ii) asymptotically, and as a practical matter, independent of dimension. Given a random dataset, a random query within distance 2R√d of some database element, and a randomly constructed data structure, the search succeeds with a specified probability, which is a parameter of the search algorithm. On average a search performs...
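To illustrate the flavor of aggressive pruning — as a hedged sketch only, not the paper's algorithm or its analysis — one can take an ordinary exact vp-tree search and tighten its pruning test by a slack term, discarding the far branch even when the query ball slightly overlaps the boundary. With slack = 0 the search is exact; larger slack skips more branches in exchange for a chance of missing the true nearest neighbor. The tuple layout and the name slack are ours.

```python
def build(points, dist):
    """Plain vp-tree stored as (vantage, mu, inside, outside) tuples."""
    if not points:
        return None
    v, rest = points[0], points[1:]
    if not rest:
        return (v, 0.0, None, None)
    ds = sorted(dist(v, p) for p in rest)
    mu = ds[len(ds) // 2]
    return (v, mu,
            build([p for p in rest if dist(v, p) < mu], dist),
            build([p for p in rest if dist(v, p) >= mu], dist))

def search(node, q, dist, slack, best=(None, float("inf"))):
    """Nearest-neighbor search with aggressive pruning: the far branch
    is visited only when the query ball overlaps the mu boundary by
    more than `slack` (the exact rule would be abs(d - mu) <= best[1])."""
    if node is None:
        return best
    v, mu, inside, outside = node
    d = dist(v, q)
    if d < best[1]:
        best = (v, d)
    near, far = (inside, outside) if d < mu else (outside, inside)
    best = search(near, q, dist, slack, best)
    if abs(d - mu) < best[1] - slack:  # aggressive: demand extra overlap
        best = search(far, q, dist, slack, best)
    return best
```

On the 1-D points [0, 2, 4, 6, 8, 10] with query 5.1, slack 0.0 returns the true neighbor 6.0, while slack 0.5 prunes the boundary-crossing branch and settles for 4.0 — the accuracy-for-work trade the abstract's success probability quantifies in its own setting.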