Results 1  10
of
18
When Is "Nearest Neighbor" Meaningful?
 In Int. Conf. on Database Theory
, 1999
"... . We explore the effect of dimensionality on the "nearest neighbor " problem. We show that under a broad set of conditions (much broader than independent and identically distributed dimensions), as dimensionality increases, the distance to the nearest data point approaches the distance to the fa ..."
Abstract

Cited by 292 (1 self)
 Add to MetaCart
. We explore the effect of dimensionality on the "nearest neighbor " problem. We show that under a broad set of conditions (much broader than independent and identically distributed dimensions), as dimensionality increases, the distance to the nearest data point approaches the distance to the farthest data point. To provide a practical perspective, we present empirical results on both real and synthetic data sets that demonstrate that this effect can occur for as few as 1015 dimensions. These results should not be interpreted to mean that highdimensional indexing is never meaningful; we illustrate this point by identifying some highdimensional workloads for which this effect does not occur. However, our results do emphasize that the methodology used almost universally in the database literature to evaluate highdimensional indexing techniques is flawed, and should be modified. In particular, most such techniques proposed in the literature are not evaluated versus simple...
Shape Indexing Using Approximate NearestNeighbour Search in HighDimensional Spaces
, 1997
"... Shape indexing is a way of making rapid associations between features detected in an image and object models that could have produced them. When model databases are large, the use of highdimensional features is critical, due to the improved level of discrimination they can provide. Unfortunately, f ..."
Abstract

Cited by 187 (12 self)
 Add to MetaCart
Shape indexing is a way of making rapid associations between features detected in an image and object models that could have produced them. When model databases are large, the use of highdimensional features is critical, due to the improved level of discrimination they can provide. Unfortunately, finding the nearest neighbour to a query point rapidly becomes inefficient as the dimensionality of the feature space increases. Past indexing methods have used hash tables for hypothesis recovery, but only in lowdimensional situations. In this paper, we show that a new variant of the kd tree search algorithm makes indexing in higherdimensional spaces practical. This Best Bin First, or BBF, search is an approximate algorithm which finds the nearest neighbour for a large fraction of the queries, and a very close neighbour in the remaining cases. The technique has been integrated into a fully developed recognition system, which is able to detect complex objects in real, cluttered scenes in just a few seconds.
Two Algorithms for NearestNeighbor Search in High Dimensions
, 1997
"... Representing data as points in a highdimensional space, so as to use geometric methods for indexing, is an algorithmic technique with a wide array of uses. It is central to a number of areas such as information retrieval, pattern recognition, and statistical data analysis; many of the problems aris ..."
Abstract

Cited by 169 (0 self)
 Add to MetaCart
Representing data as points in a highdimensional space, so as to use geometric methods for indexing, is an algorithmic technique with a wide array of uses. It is central to a number of areas such as information retrieval, pattern recognition, and statistical data analysis; many of the problems arising in these applications can involve several hundred or several thousand dimensions. We consider the nearestneighbor problem for ddimensional Euclidean space: we wish to preprocess a database of n points so that given a query point, one can efficiently determine its nearest neighbors in the database. There is a large literature on algorithms for this problem, in both the exact and approximate cases. The more sophisticated algorithms typically achieve a query time that is logarithmic in n at the expense of an exponential dependence on the dimension d; indeed, even the averagecase analysis of heuristics such as kd trees reveals an exponential dependence on d in the query time. In this wor...
A Simple Algorithm for Nearest Neighbor Search in High Dimensions
 IEEE Transactions on Pattern Analysis and Machine Intelligence
, 1997
"... Abstract—The problem of finding the closest point in highdimensional spaces is common in pattern recognition. Unfortunately, the complexity of most existing search algorithms, such as kd tree and Rtree, grows exponentially with dimension, making them impractical for dimensionality above 15. In ne ..."
Abstract

Cited by 126 (1 self)
 Add to MetaCart
Abstract—The problem of finding the closest point in highdimensional spaces is common in pattern recognition. Unfortunately, the complexity of most existing search algorithms, such as kd tree and Rtree, grows exponentially with dimension, making them impractical for dimensionality above 15. In nearly all applications, the closest point is of interest only if it lies within a userspecified distance e. We present a simple and practical algorithm to efficiently search for the nearest neighbor within Euclidean distance e. The use of projection search combined with a novel data structure dramatically improves performance in high dimensions. A complexity analysis is presented which helps to automatically determine e in structured problems. A comprehensive set of benchmarks clearly shows the superiority of the proposed algorithm for a variety of structured and unstructured search problems. Object recognition is demonstrated as an example application. The simplicity of the algorithm makes it possible to construct an inexpensive hardware search engine which can be 100 times faster than its software equivalent. A C++ implementation of our algorithm is available upon request to search@cs.columbia.edu/CAVE/.
Similarity Indexing: Algorithms and Performance
 In Proceedings SPIE Storage and Retrieval for Image and Video Databases
, 1996
"... Efficient indexing support is essential to allow contentbased image and video databases using similaritybased retrieval to scale to large databases (tens of thousands up to millions of images). In this paper, we take an in depth look at this problem. One of the major difficulties in solving this pr ..."
Abstract

Cited by 111 (1 self)
 Add to MetaCart
Efficient indexing support is essential to allow contentbased image and video databases using similaritybased retrieval to scale to large databases (tens of thousands up to millions of images). In this paper, we take an in depth look at this problem. One of the major difficulties in solving this problem is the high dimension (6100) of the feature vectors that are used to represent objects. We provide an overview of the work in computational geometry on this problem and highlight the results we found are most useful in practice, including the use of approximate nearest neighbor algorithms. We also present a variant of the optimized kd tree we call the VAM kd tree, and provide algorithms to create an optimized Rtree we call the VAMSplit Rtree. We found that the VAMSplit Rtree provided better overall performance than all competing structures we tested for main memory and secondary memory applications. We observed large improvements in performance relative to the R*tree and SStree in secondary memory applications, and modest improvements relative to optimized kd tree variants.Nearest Neighbor Search
What is the Nearest Neighbor in High Dimensional Spaces?
, 2000
"... Nearest neighbor search in high dimensional spaces is an interesting and important problem which is relevant for a wide variety of novel database applications. As recent results show, however, the problem is a very difficult one, not only with regards to the performance issue but also to the quality ..."
Abstract

Cited by 110 (8 self)
 Add to MetaCart
Nearest neighbor search in high dimensional spaces is an interesting and important problem which is relevant for a wide variety of novel database applications. As recent results show, however, the problem is a very difficult one, not only with regards to the performance issue but also to the quality issue. In this paper, we discuss the quality issue and identify a new generalized notion of nearest neighbor search as the relevant problem in high dimensional space. In contrast to previous approaches, our new notion of nearest neighbor search does not treat all dimensions equally but uses a quality criterion to select relevant dimensions (projections) with respect to the given query. As an example for a useful quality criterion, we rate how well the data is clustered around the query point within the selected projection. We then propose an efficient and effective algorithm to solve the generalized nearest neighbor problem. Our experiments based on a number of real and synthetic data sets show that our new approach provides new insights into the nature of nearest neighbor search on high dimensional data.
Independent Quantization: An Index Compression Technique for HighDimensional Data Spaces
 IN ICDE
, 1999
"... Two major approaches have been proposed to efficiently process queries in databases: Speeding up the search by using index structures, and speeding up the search by operating on a compressed database, such as a signature file. Both approaches have their limitations: Indexing techniques are inefficie ..."
Abstract

Cited by 79 (21 self)
 Add to MetaCart
Two major approaches have been proposed to efficiently process queries in databases: Speeding up the search by using index structures, and speeding up the search by operating on a compressed database, such as a signature file. Both approaches have their limitations: Indexing techniques are inefficient in extreme configurations, such as highdimensional spaces, where even a simple scan may be cheaper than an indexbased search. Compression techniques are not very efficient in all other situations. We propose to combine both techniques to search for nearest neighbors in a highdimensional space. For this purpose, we develop a compressed index, called the IQtree, with a threelevel structure: The first level is a regular (flat) directory consisting of minimum bounding boxes, the second level contains data points in a compressed representation, and the third level contains the actual data. We overcome several engineering challenges in constructing an effective index structure of this type...
Fast parallel similarity search in multimedia databases
 In Proc. ACM SIGMOD Int. Conf. on Management of Data
, 1997
"... Most similarity search techniques map the data objects into some highdimensional feature space. The similarity search then corresponds to a nearestneighbor search in the feature space which is computationally very intensive. In this paper, we present a new parallel method for fast nearestneighbor ..."
Abstract

Cited by 60 (3 self)
 Add to MetaCart
Most similarity search techniques map the data objects into some highdimensional feature space. The similarity search then corresponds to a nearestneighbor search in the feature space which is computationally very intensive. In this paper, we present a new parallel method for fast nearestneighbor search in highdimensional feature spaces. The core problem of designing a parallel nearestneighbor algorithm is to find an adequate distribution of the data onto the disks. Unfortunately, the known declustering methods do not perform well for highdimensional nearestneighbor search. In contrast, our method has been optimized based on the special properties of highdimensional spaces and therefore provides a nearoptimal distribution of the data items among the disks. The basic idea of our data declustering technique is to assign the buckets corresponding to different quadrants of the data space to different disks.
A Cost Model for Query Processing in HighDimensional Data Spaces
, 2000
"... During the last decade, multimedia databases have become increasingly important in many application areas such as medicine, CAD, geography or molecular biology. An important research issue in the field of multimedia databases is similarity search in large data sets. Most current approaches addressin ..."
Abstract

Cited by 47 (0 self)
 Add to MetaCart
During the last decade, multimedia databases have become increasingly important in many application areas such as medicine, CAD, geography or molecular biology. An important research issue in the field of multimedia databases is similarity search in large data sets. Most current approaches addressing similarity search use the socalled feature approach which transforms important properties of the stored objects into points of a highdimensional space (feature vectors). Thus, the similarity search is transformed into a neighborhood search in the feature space. For the management of the feature vectors, multidimensional index structures are usually applied. The performance of query processing can be substantially improved by opti...
Fast nearest neighbor search in highdimensional space
 In ICDE
, 1998
"... Similarity search in multimedia databases requires an efficient support of nearestneighbor search on a large set of highdimensional points as a basic operation for query processing. As recent theoretical results show, state of the art approaches to nearestneighbor search are not efficient in high ..."
Abstract

Cited by 46 (0 self)
 Add to MetaCart
Similarity search in multimedia databases requires an efficient support of nearestneighbor search on a large set of highdimensional points as a basic operation for query processing. As recent theoretical results show, state of the art approaches to nearestneighbor search are not efficient in higher dimensions. In our new approach, we therefore precompute the result of any nearestneighbor search which corresponds to a computation of the voronoi cell of each data point. In a second step, we store the voronoi cells in an index structure efficient for highdimensional data spaces. As a result, nearest neighbor search corresponds to a simple point query on the index structure. Although our technique is based on a precomputation of the solution space, it is dynamic, i.e. it supports insertions of new data points. An extensive experimental evaluation of our technique demonstrates the high efficiency for uniformly distributed as well as real data. We obtained a significant reduction of the search time compared to nearest neighbor search in the Xtree (up to a factor of 4). 1Introduction An important research issue in the field of multimedia databases is the content based retrieval of similar multimedia objects such as images, text and videos [Alt+ 90]