Results 1  10
of
118
Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality
, 1998
"... The nearest neighbor problem is the following: Given a set of n points P = fp 1 ; : : : ; png in some metric space X, preprocess P so as to efficiently answer queries which require finding the point in P closest to a query point q 2 X. We focus on the particularly interesting case of the ddimens ..."
Abstract

Cited by 715 (33 self)
 Add to MetaCart
The nearest neighbor problem is the following: Given a set of n points P = fp 1 ; : : : ; png in some metric space X, preprocess P so as to efficiently answer queries which require finding the point in P closest to a query point q 2 X. We focus on the particularly interesting case of the ddimensional Euclidean space where X = ! d under some l p norm. Despite decades of effort, the current solutions are far from satisfactory; in fact, for large d, in theory or in practice, they provide little improvement over the bruteforce algorithm which compares the query point to each data point. Of late, there has been some interest in the approximate nearest neighbors problem, which is: Find a point p 2 P that is an fflapproximate nearest neighbor of the query q in that for all p 0 2 P , d(p; q) (1 + ffl)d(p 0 ; q). We present two algorithmic results for the approximate version that significantly improve the known bounds: (a) preprocessing cost polynomial in n and d, and a trul...
Searching in Metric Spaces
, 1999
"... The problem of searching the elements of a set which are close to a given query element under some similarity criterion has a vast number of applications in many branches of computer science, from pattern recognition to textual and multimedia information retrieval. We are interested in the rather ge ..."
Abstract

Cited by 321 (34 self)
 Add to MetaCart
The problem of searching the elements of a set which are close to a given query element under some similarity criterion has a vast number of applications in many branches of computer science, from pattern recognition to textual and multimedia information retrieval. We are interested in the rather general case where the similarity criterion defines a metric space, instead of the more restricted case of a vector space. A large number of solutions have been proposed in different areas, in many cases without crossknowledge. Because of this, the same ideas have been reinvented several times, and very different presentations have been given for the same approaches. We
Distance Browsing in Spatial Databases
, 1999
"... Two different techniques of browsing through a collection of spatial objects stored in an Rtree spatial data structure on the basis of their distances from an arbitrary spatial query object are compared. The conventional approach is one that makes use of a knearest neighbor algorithm where k is kn ..."
Abstract

Cited by 291 (19 self)
 Add to MetaCart
Two different techniques of browsing through a collection of spatial objects stored in an Rtree spatial data structure on the basis of their distances from an arbitrary spatial query object are compared. The conventional approach is one that makes use of a knearest neighbor algorithm where k is known prior to the invocation of the algorithm. Thus if m#kneighbors are needed, the knearest neighbor algorithm needs to be reinvoked for m neighbors, thereby possibly performing some redundant computations. The second approach is incremental in the sense that having obtained the k nearest neighbors, the k +1 st neighbor can be obtained without having to calculate the k +1nearest neighbors from scratch. The incremental approach finds use when processing complex queries where one of the conditions involves spatial proximity (e.g., the nearest city to Chicago with population greater than a million), in which case a query engine can make use of a pipelined strategy. A general incremental nearest neighbor algorithm is presented that is applicable to a large class of hierarchical spatial data structures. This algorithm is adapted to the Rtree and its performance is compared to an existing knearest neighbor algorithm for Rtrees [45]. Experiments show that the incremental nearest neighbor algorithm significantly outperforms the knearest neighbor algorithm for distance browsing queries in a spatial database that uses the Rtree as a spatial index. Moreover, the incremental nearest neighbor algorithm also usually outperforms the knearest neighbor algorithm when applied to the knearest neighbor problem for the Rtree, although the improvement is not nearly as large as for distance browsing queries. In fact, we prove informally that, at any step in its execution, the incremental...
Data Structures and Algorithms for Nearest Neighbor Search in General Metric Spaces
, 1993
"... We consider the computational problem of finding nearest neighbors in general metric spaces. Of particular interest are spaces that may not be conveniently embedded or approximated in Euclidian space, or where the dimensionality of a Euclidian representation is very high. Also relevant are highdim ..."
Abstract

Cited by 273 (4 self)
 Add to MetaCart
We consider the computational problem of finding nearest neighbors in general metric spaces. Of particular interest are spaces that may not be conveniently embedded or approximated in Euclidian space, or where the dimensionality of a Euclidian representation is very high. Also relevant are highdimensional Euclidian settings in which the distribution of data is in some sense of lower dimension and embedded in the space. The vptree (vantage point tree) is introduced in several forms, together with associated algorithms, as an improved method for these difficult search problems. Tree construction executes in O(n log(n)) time, and search is under certain circumstances and in the limit, O(log(n)) expected time. The theoretical basis for this approach is developed and the results of several experiments are reported. In Euclidian cases, kdtree performance is compared.
Finding Nearest Neighbors in Growthrestricted Metrics
 In 34th Annual ACM Symposium on the Theory of Computing
, 2002
"... Most research on nearest neighbor algorithms in the literature has been focused on the Euclidean case. In many practical search problems however, the underlying metric is nonEuclidean. Nearest neighbor algorithms for general metric spaces are quite weak, which motivates a search for other classes o ..."
Abstract

Cited by 150 (0 self)
 Add to MetaCart
Most research on nearest neighbor algorithms in the literature has been focused on the Euclidean case. In many practical search problems however, the underlying metric is nonEuclidean. Nearest neighbor algorithms for general metric spaces are quite weak, which motivates a search for other classes of metric spaces that can be tractably searched.
Cover trees for nearest neighbor
 In Proceedings of the 23rd international conference on Machine learning
, 2006
"... ABSTRACT. We present a tree data structure for fast nearest neighbor operations in generalpoint metric spaces. The data structure requires space regardless of the metric’s structure. If the point set has an expansion constant � in the sense of Karger and Ruhl [KR02], the data structure can be const ..."
Abstract

Cited by 139 (0 self)
 Add to MetaCart
ABSTRACT. We present a tree data structure for fast nearest neighbor operations in generalpoint metric spaces. The data structure requires space regardless of the metric’s structure. If the point set has an expansion constant � in the sense of Karger and Ruhl [KR02], the data structure can be constructed in � time. Nearest neighbor queries obeying the expansion bound require � time. In addition, the nearest neighbor of points can be queried in time. We experimentally test the algorithm showing speedups over the brute force search varying between 1 and 2000 on natural machine learning datasets. 1.
Indexdriven similarity search in metric spaces
 ACM Transactions on Database Systems
, 2003
"... Similarity search is a very important operation in multimedia databases and other database applications involving complex objects, and involves finding objects in a data set S similar to a query object q, based on some similarity measure. In this article, we focus on methods for similarity search th ..."
Abstract

Cited by 133 (6 self)
 Add to MetaCart
Similarity search is a very important operation in multimedia databases and other database applications involving complex objects, and involves finding objects in a data set S similar to a query object q, based on some similarity measure. In this article, we focus on methods for similarity search that make the general assumption that similarity is represented with a distance metric d. Existing methods for handling similarity search in this setting typically fall into one of two classes. The first directly indexes the objects based on distances (distancebased indexing), while the second is based on mapping to a vector space (mappingbased approach). The main part of this article is dedicated to a survey of distancebased indexing methods, but we also briefly outline how search occurs in mappingbased methods. We also present a general framework for performing search based on distances, and present algorithms for common types of queries that operate on an arbitrary “search hierarchy. ” These algorithms can be applied on each of the methods presented, provided a suitable search hierarchy is defined.
A Review of Statistical Data Association Techniques for Motion Correspondence
 International Journal of Computer Vision
, 1993
"... Motion correspondence is a fundamental problem in computer vision and many other disciplines. This article describes statistical data association techniques originally developed in the context of target tracking and surveillance and now beginning to be used in dynamic motion analysis by the computer ..."
Abstract

Cited by 119 (3 self)
 Add to MetaCart
Motion correspondence is a fundamental problem in computer vision and many other disciplines. This article describes statistical data association techniques originally developed in the context of target tracking and surveillance and now beginning to be used in dynamic motion analysis by the computer vision community. The Mahalanobis distance measure is first introduced before discussing the limitations of nearest neighbor algorithms. Then, the tracksplitting, joint likelihood, multiple hypothesis algorithms are described, each method solving an increasingly more complicated optimization. Realtime constraints may prohibit the application of these optimal methods. The suboptimal joint probabilistic data association algorithm is therefore described. The advantages, limitations, and relationships between the approaches are discussed. 1
Distancebased indexing for highdimensional metric spaces
 In Proc. ACM SIGMOD International Conference on Management of Data
, 1997
"... In many database applications, one of the common queries is to find approximate matches to a given query item from a collection of data items. For example, given an image database, one may want to retrieve all images that are similar to a given query image. Distance based index structures are propos ..."
Abstract

Cited by 116 (3 self)
 Add to MetaCart
In many database applications, one of the common queries is to find approximate matches to a given query item from a collection of data items. For example, given an image database, one may want to retrieve all images that are similar to a given query image. Distance based index structures are proposed for applications where the data domain is high dimensional, or the distance function used to compute distances between data objects is nonEuclidean. In this paper, we introduce a distance based index structure called multivantage point (mvp) tree for similarity queries on highdimensional metric spaces. The mvptree uses more than one vantage point to partition the space into spherical cuts at each level. It also utilizes the precomputed (at construction time) distances between the data points and the vantage points. We have done experiments to compare mvptrees with vptrees which have a similar partitioning strategy, but use only one vantage point at each level, and do not make use of the precomputed distances. Empirical studies show that mvptree outperforms the vptree 20 % to 80 % for varying query ranges and different distance distributions. 1.
Nearest neighbor queries in metric spaces
 Discrete Comput. Geom
, 1997
"... Given a set S of n sites (points), and a distance measure d, the nearest neighbor searching problem is to build a data structure so that given a query point q, the site nearest to q can be found quickly. This paper gives data structures for this problem when the sites and queries are in a metric spa ..."
Abstract

Cited by 115 (1 self)
 Add to MetaCart
Given a set S of n sites (points), and a distance measure d, the nearest neighbor searching problem is to build a data structure so that given a query point q, the site nearest to q can be found quickly. This paper gives data structures for this problem when the sites and queries are in a metric space. One data structure, D(S), uses a divideandconquer recursion. The other data structure, M(S, Q), is somewhat like a skiplist. Both are simple and implementable. The data structures are analyzed when the metric space obeys a certain spherepacking bound, and when the sites and query points are random and have distributions with an exchangeability property. This property implies, for example, that query point q is a random element of S ∪ {q}. Under these conditions, the preprocessing and space bounds for the algorithms are close to linear in n. They depend also on the spherepacking bound, and on the logarithm of the distance ratio Υ(S) of S, the ratio of the distance between the farthest pair of points in S to the distance between the closest pair. The data structure M(S, Q) requires as input data an additional set Q, taken to be representative of the query points. The resource bounds of M(S, Q) have a dependence on the distance ratio of S ∪ Q. While M(S, Q) can return wrong answers, its failure probability can be bounded, and is decreasing in a parameter K. Here K ≤ Q/n is chosen when building M(S, Q). The expected query time for M(S, Q) is O(K log n) log Υ(S ∪ Q), and the resource bounds increase linearly in K. The data structure D(S) has expected O(log n) O(1) query time, for fixed distance ratio. The preprocessing algorithm for M(S, Q) can be used to solve the allnearestneighbor problem for S in O(n(log n) 2 (log Υ(S)) 2) expected time. 1