Results 1–10 of 11
Data Structures and Algorithms for Nearest Neighbor Search in General Metric Spaces
, 1993
Abstract

Cited by 273 (4 self)
We consider the computational problem of finding nearest neighbors in general metric spaces. Of particular interest are spaces that may not be conveniently embedded or approximated in Euclidean space, or where the dimensionality of a Euclidean representation is very high. Also relevant are high-dimensional Euclidean settings in which the distribution of data is in some sense of lower dimension and embedded in the space. The vp-tree (vantage point tree) is introduced in several forms, together with associated algorithms, as an improved method for these difficult search problems. Tree construction executes in O(n log n) time, and search is, under certain circumstances and in the limit, O(log n) expected time. The theoretical basis for this approach is developed and the results of several experiments are reported. In Euclidean cases, kd-tree performance is compared.
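The median-ball construction and branch-and-bound search described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the names (VPNode, build, nearest) are ours, and the vantage point is chosen naively rather than by the sampling the paper discusses.

```python
# Minimal vp-tree sketch; names are illustrative, not from the paper,
# and the vantage point is chosen naively.
class VPNode:
    def __init__(self, point, mu, inside, outside):
        self.point, self.mu = point, mu
        self.inside, self.outside = inside, outside

def build(points, dist):
    """Recursively split the data at the median distance from a vantage point."""
    if not points:
        return None
    vp, rest = points[0], points[1:]
    if not rest:
        return VPNode(vp, 0.0, None, None)
    pairs = [(dist(vp, p), p) for p in rest]
    mu = sorted(d for d, _ in pairs)[len(pairs) // 2]   # median radius
    inside  = [p for d, p in pairs if d <  mu]
    outside = [p for d, p in pairs if d >= mu]
    return VPNode(vp, mu, build(inside, dist), build(outside, dist))

def nearest(node, q, dist, best=None):
    """Branch-and-bound search; a subtree is pruned only when the triangle
    inequality proves it cannot contain a closer point."""
    if node is None:
        return best
    d = dist(q, node.point)
    if best is None or d < best[0]:
        best = (d, node.point)
    near, far = ((node.inside, node.outside) if d < node.mu
                 else (node.outside, node.inside))
    best = nearest(near, q, dist, best)
    if abs(d - node.mu) < best[0]:       # far side may still hold the answer
        best = nearest(far, q, dist, best)
    return best
```

For instance, with `dist = lambda a, b: abs(a - b)` on points of the line, `nearest(build([1.0, 5.0, 9.0, 2.0, 7.0], dist), 6.2, dist)` reports 7.0 as the nearest point.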
Near neighbor search in large metric spaces
 In Proceedings of the 21st International Conference on Very Large Data Bases
, 1995
Abstract

Cited by 183 (0 self)
Given user data, one often wants to find approximate matches in a large database. A good example of such a task is finding images similar to a given image in a large collection of images. We focus on the important and technically difficult case where each data element is high dimensional, or more generally, is represented by a point in a large metric space, and distance calculations are computationally expensive. In this paper we introduce a data structure to solve this problem called a GNAT (Geometric Near-neighbor Access Tree). It is based on the philosophy that the data structure should act as a hierarchical geometrical model of the data, as opposed to a simple decomposition of the data that does not use its intrinsic geometry. In experiments, we find that GNATs outperform previous data structures in a number of applications.
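The core geometric idea can be sketched with a single two-level layer rather than the full recursive tree. This is a rough, hedged illustration under our own naming (build_level, range_search); it is not the paper's API, only the pruning principle: each pivot remembers the range of its distances to every group, and the triangle inequality eliminates whole groups per query.

```python
# Two-level GNAT-style sketch; the real structure applies this recursively.
# Names and interface are illustrative assumptions, not the paper's API.
def build_level(points, pivots, dist):
    """Assign points to their nearest pivot; record, for each pivot i,
    the (min, max) distance from pivot i to every group j."""
    groups = [[] for _ in pivots]
    for p in points:
        i = min(range(len(pivots)), key=lambda k: dist(p, pivots[k]))
        groups[i].append(p)
    ranges = []
    for piv in pivots:
        row = []
        for g in groups:
            ds = [dist(piv, x) for x in g]
            row.append((min(ds), max(ds)) if ds else (0.0, 0.0))
        ranges.append(row)
    return groups, ranges

def range_search(q, r, pivots, groups, ranges, dist):
    """Report all points within distance r of q, pruning whole groups via
    the stored distance ranges and the triangle inequality."""
    alive = set(range(len(pivots)))
    for i, piv in enumerate(pivots):
        if i not in alive:
            continue
        dq = dist(q, piv)
        # Group j can contain an answer only if [dq - r, dq + r]
        # intersects the stored range of d(piv, .) over group j.
        alive = {j for j in alive
                 if ranges[i][j][0] <= dq + r and dq - r <= ranges[i][j][1]}
    hits = [piv for piv in pivots if dist(q, piv) <= r]
    hits += [p for j in sorted(alive) for p in groups[j] if dist(q, p) <= r]
    return hits
```

Only the surviving groups are scanned, so the expensive distance function is evaluated against a small fraction of the data.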
Nearest neighbor queries in metric spaces
 Discrete Comput. Geom
, 1997
Abstract

Cited by 115 (1 self)
Given a set S of n sites (points), and a distance measure d, the nearest neighbor searching problem is to build a data structure so that given a query point q, the site nearest to q can be found quickly. This paper gives data structures for this problem when the sites and queries are in a metric space. One data structure, D(S), uses a divide-and-conquer recursion. The other data structure, M(S, Q), is somewhat like a skip list. Both are simple and implementable. The data structures are analyzed when the metric space obeys a certain sphere-packing bound, and when the sites and query points are random and have distributions with an exchangeability property. This property implies, for example, that query point q is a random element of S ∪ {q}. Under these conditions, the preprocessing and space bounds for the algorithms are close to linear in n. They depend also on the sphere-packing bound, and on the logarithm of the distance ratio Υ(S) of S, the ratio of the distance between the farthest pair of points in S to the distance between the closest pair. The data structure M(S, Q) requires as input data an additional set Q, taken to be representative of the query points. The resource bounds of M(S, Q) have a dependence on the distance ratio of S ∪ Q. While M(S, Q) can return wrong answers, its failure probability can be bounded, and is decreasing in a parameter K. Here K ≤ |Q|/n is chosen when building M(S, Q). The expected query time for M(S, Q) is O((K log n) log Υ(S ∪ Q)), and the resource bounds increase linearly in K. The data structure D(S) has expected (log n)^O(1) query time, for fixed distance ratio. The preprocessing algorithm for M(S, Q) can be used to solve the all-nearest-neighbor problem for S in O(n (log n)^2 (log Υ(S))^2) expected time.
New Techniques for Best-Match Retrieval
 ACM Transactions on Information Systems
, 1990
Abstract

Cited by 53 (5 self)
A scheme to answer best-match queries from a file containing a collection of objects is described. A best-match query is to find the objects in the file that are closest (according to some (dis)similarity measure) to a given target. Previous work [5, 33] suggests that one can reduce the number of comparisons required to achieve the desired results using the triangle inequality, starting with a data structure for the file that reflects some precomputed intra-file distances. We generalize the technique to allow the optimum use of any given set of precomputed intra-file distances. Some empirical results are presented which illustrate the effectiveness of our scheme, and its performance relative to previous algorithms.
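The triangle-inequality pruning can be sketched as follows. This is our hedged illustration of the basic idea only (a flat set of reference objects with all distances precomputed), not the paper's generalized scheme; all names are ours.

```python
# Best-match search with precomputed intra-file distances to a few
# reference objects; a sketch of the pruning idea, not the paper's
# generalized scheme. All names are illustrative.
def best_match(query, objects, refs, dist, pre):
    """pre[i][j] must hold the precomputed distance dist(objects[i], refs[j])."""
    dq = [dist(query, r) for r in refs]        # one distance call per reference
    best_d, best_x = float("inf"), None
    for i, x in enumerate(objects):
        # Triangle inequality: dist(query, x) >= |dq[j] - pre[i][j]| for all j.
        lower = max(abs(dq[j] - pre[i][j]) for j in range(len(refs)))
        if lower >= best_d:
            continue                           # x cannot beat the current best
        d = dist(query, x)
        if d < best_d:
            best_d, best_x = d, x
    return best_d, best_x
```

With well-placed references, most objects are rejected by the cheap lower bound and never compared to the query directly.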
Excluded Middle Vantage Point Forests for Nearest Neighbor Search
 In DIMACS Implementation Challenge, ALENEX'99
, 1999
Abstract

Cited by 40 (1 self)
The excluded middle vantage point forest is a new data structure that supports worst-case sublinear time searches in a metric space for nearest neighbors within a fixed radius of arbitrary queries. Worst-case performance depends on the dataset but is not affected by the distribution of queries. Our analysis predicts vp-forest performance in simple settings such as L_p spaces with uniform random datasets, and experiments confirm these predictions. Another contribution of the analysis is a new perspective on the curse of dimensionality in the context of our methods, and of kd-trees as well. In our idealized setting the dataset is organized into a forest of O(N^(1−ρ)) trees, each of depth O(log N). Here ρ may be viewed as depending on the distance function and on the dataset. The radius of interest is an input to the organization process, and the result is a linear-space data structure specialized to answer queries within this distance. Searches then require O(N^(1−ρ) log N) time, or...
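The split that yields the forest can be sketched as follows, assuming a fixed query radius tau. The name and interface are ours, not the paper's: the real structure applies the split recursively and defers the excluded middle shell to the next tree of the forest.

```python
# Sketch of one excluded-middle split; illustrative names, not the paper's.
def excluded_middle_split(points, vp, dist, tau):
    """Split around the median distance mu to the vantage point, excluding
    a middle shell of width 2*tau. A radius-tau query then never needs to
    descend both sides of the split; shell points go to the next tree."""
    ds = sorted(dist(vp, p) for p in points)
    mu = ds[len(ds) // 2]
    left   = [p for p in points if dist(vp, p) <= mu - tau]
    right  = [p for p in points if dist(vp, p) >= mu + tau]
    middle = [p for p in points if mu - tau < dist(vp, p) < mu + tau]
    return left, right, middle
```

By the triangle inequality, any point within tau of a query on the left side of the shell cannot lie on the right side, and vice versa; that one-sided descent is what makes the worst-case search time sublinear.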
Locally Lifting the Curse of Dimensionality for Nearest Neighbor Search (Extended Abstract)
 In Proc. 11th ACM-SIAM Symposium on Discrete Algorithms (SODA'00)
, 1999
Abstract

Cited by 25 (1 self)
We consider the problem of nearest neighbor search in the Euclidean hypercube [−1, +1]^d with uniform distributions, and the additional natural assumption that the nearest neighbor is located within a constant fraction R of the maximum interpoint distance in this space, i.e. within distance 2R√d of the query. We introduce the idea of aggressive pruning and give a family of practical algorithms, an idealized analysis, and describe experiments. Our main result is that search complexity, measured in terms of d-dimensional inner product operations, is (i) strongly sublinear with respect to the data set size n for moderate R, and (ii) asymptotically, and as a practical matter, independent of dimension. Given a random data set, a random query within distance 2R√d of some database element, and a randomly constructed data structure, the search succeeds with a specified probability, which is a parameter of the search algorithm. On average a search performs...
Fast Nearest Neighbor Search of Entropy-Constrained Vector Quantization
 IEEE Trans. Image Processing
, 2000
Abstract

Cited by 2 (0 self)
Entropy-constrained vector quantization (ECVQ) [3] offers substantially improved image quality over vector quantization (VQ) at the cost of additional encoding complexity. We extend results in the literature for fast nearest neighbor search of VQ to ECVQ. We use a new, easily computed distance that successfully eliminates most codewords from consideration.
Keywords: Entropy-constrained vector quantization, ECVQ, fast full search of ECVQ, fast full search of vector quantization, Voronoi diagram
I. Introduction
Full search vector quantization [8] encodes an input vector by choosing its nearest neighbor from a codebook according to a distortion measure such as the Euclidean distance. The nearest neighbor is found by computing the distortion from the input vector to each codeword, which is a computationally intensive operation. Algorithms to reduce search complexity [9], [5] narrow the field of candidate codewords for which the distortion must be calculated. These techniques have the...
Efficient Matching of Dynamically Changing Graphs
, 1991
Abstract

Cited by 1 (1 self)
Subgraph isomorphism detection is a fundamental technique in computer vision. In this paper we propose a new subgraph matching procedure that is particularly useful if the number of prototype graphs is large, and if the graph representation of the image to be interpreted is dynamically changing. Our procedure is derived from the RETE matching algorithm that has been developed for forward-chaining rule-based systems [1]. We introduce our new method and discuss its computational complexity. It will be shown that the computational complexity of the proposed approach is not better than that of a naive, straightforward solution to the problem. In the best case, however, a significant speedup can be achieved. Finally, we show experimental results which confirm our theoretical complexity analysis.
1 Introduction
Graph matching is a fundamental technique in computer vision and image understanding. In many vision systems a graph extracted from an image is matched to stored model gra...
Nearest Neighbour Search in Hausdorff Distance Pattern Spaces
, 2001
Abstract
We devise the first data structure that, with a sufficient amount of preprocessing, answers nearest-neighbour queries among point patterns in d-dimensional Euclidean space with respect to (various variants of) the Hausdorff distance in near-optimal O(m log² n) time, if the query set consists of m points and the preprocessed patterns contain a total of n points.
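For reference, the quantity being queried is the Hausdorff distance between point sets; a brute-force Python sketch (function names are ours) makes the definition concrete, in contrast to the near-optimal query time above.

```python
# Brute-force Hausdorff distance between two point patterns, the quantity
# whose nearest-neighbour queries the data structure accelerates.
# Function names are illustrative.
def euclidean(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def directed_hausdorff(A, B, dist=euclidean):
    """max over a in A of the distance from a to its nearest point of B."""
    return max(min(dist(a, b) for b in B) for a in A)

def hausdorff(A, B, dist=euclidean):
    """Symmetric (undirected) Hausdorff distance."""
    return max(directed_hausdorff(A, B, dist), directed_hausdorff(B, A, dist))
```

Note the asymmetry of the directed variant: a pattern close to every point of another may still be far from it in the reverse direction, which is why several variants of the distance exist.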
Fuzzy Clustering for Content-based Indexing in Multimedia Database
, 2001
Abstract
In this information age, how to manage information is one of the important issues in our daily life. In a content-based retrieval database, the contents or features of the database objects are used for retrieval. Typically, these data exist in natural clusters. However, many current indexing methods omit this cluster information when constructing the indexing structure, which leads to performance degradation.