Results 1-10 of 48
An Optimal Algorithm for Approximate Nearest Neighbor Searching in Fixed Dimensions
ACM-SIAM Symposium on Discrete Algorithms, 1994
Abstract

Cited by 776 (31 self)
Consider a set $S$ of $n$ data points in real $d$-dimensional space, $\mathbb{R}^d$, where distances are measured using any Minkowski metric. In nearest neighbor searching we preprocess $S$ into a data structure so that, given any query point $q \in \mathbb{R}^d$, the closest point of $S$ to $q$ can be reported quickly. Given any positive real $\epsilon$, a data point $p$ is a $(1+\epsilon)$-approximate nearest neighbor of $q$ if its distance from $q$ is within a factor of $(1+\epsilon)$ of the distance to the true nearest neighbor. We show that it is possible to preprocess a set of $n$ points in $\mathbb{R}^d$ in $O(dn \log n)$ time and $O(dn)$ space, so that given a query point $q \in \mathbb{R}^d$ and $\epsilon > 0$, a $(1+\epsilon)$-approximate nearest neighbor of $q$ can be computed in $O(c_{d,\epsilon} \log n)$ time, where $c_{d,\epsilon} \le d \lceil 1 + 6d/\epsilon \rceil^d$ is a factor depending only on dimension and $\epsilon$. In general, we show that given an integer $k \ge 1$, $(1+\epsilon)$-approximations to the $k$ nearest neighbors of $q$ can be computed in additional $O(kd \log n)$ time.
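The (1 + ε)-approximation criterion can be checked directly. The following Python sketch is a brute-force test of the definition under an L_m (Minkowski) metric. It is not the paper's preprocessing scheme or data structure, and the function names `minkowski` and `is_approx_nn` are hypothetical.

```python
import math
from typing import Iterable, Sequence


def minkowski(p: Sequence[float], q: Sequence[float], m: float = 2.0) -> float:
    """L_m (Minkowski) distance; m = math.inf gives the L_infinity (max) metric."""
    if math.isinf(m):
        return max(abs(a - b) for a, b in zip(p, q))
    return sum(abs(a - b) ** m for a, b in zip(p, q)) ** (1.0 / m)


def is_approx_nn(p: Sequence[float], q: Sequence[float],
                 S: Iterable[Sequence[float]], eps: float, m: float = 2.0) -> bool:
    """True if p is a (1 + eps)-approximate nearest neighbor of q with respect to S.

    Brute force over S: this only checks the definition, it does not search efficiently.
    """
    true_nn_dist = min(minkowski(s, q, m) for s in S)
    return minkowski(p, q, m) <= (1.0 + eps) * true_nn_dist
```

Take S = {(0,0), (3,0), (0,4)} and q = (1,0). The true nearest neighbor is at distance 1. The point (3,0), at distance 2, therefore qualifies for ε = 1 but not for ε = 0.5.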
Quantization
IEEE Transactions on Information Theory, 1998
Abstract

Cited by 638 (11 self)
The history of the theory and practice of quantization dates to 1948, although similar ideas had appeared in the literature as long ago as 1898. The fundamental role of quantization in modulation and analog-to-digital conversion was first recognized during the early development of pulse-code modulation systems, especially in the 1948 paper of Oliver, Pierce, and Shannon. Also in 1948, Bennett published the first high-resolution analysis of quantization and an exact analysis of quantization noise for Gaussian processes, and Shannon published the beginnings of rate distortion theory, which would provide a theory for quantization as analog-to-digital conversion and as data compression. Beginning with these three papers of fifty years ago, we trace the history of quantization from its origins through this decade, and we survey the fundamentals of the theory and many of the popular and promising techniques for quantization.
An algorithm for finding best matches in logarithmic expected time
ACM Transactions on Mathematical Software, 1977
Abstract

Cited by 585 (2 self)
An algorithm and data structure are presented for searching a file containing $N$ records, each described by $k$ real-valued keys, for the $m$ closest matches or nearest neighbors to a given query record. The computation required to organize the file is proportional to $kN \log N$. The expected number of records examined in each search is independent of the file size. The expected computation to perform each search is proportional to $\log N$. Empirical evidence suggests that, except for very small files, this algorithm is considerably faster than other methods.
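Below is a minimal k-d tree in the spirit of this abstract, assuming Euclidean distance over the k keys. Each node splits at the median record along the key with the largest spread. The search descends toward the query first. It crosses a splitting plane only when the ball around the query that holds the current m best matches reaches across that plane. The 1977 algorithm stores records in buckets at the leaves, whereas this sketch keeps one record per node. The names `build` and `m_nearest` are hypothetical.

```python
import heapq
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    point: Point
    axis: int                 # key (coordinate) this node splits on
    left: Optional["Node"]    # records with point[axis] <= split value
    right: Optional["Node"]   # records with point[axis] >= split value


def build(points: List[Point]) -> Optional[Node]:
    """Split on the key with the largest spread, at the median record."""
    if not points:
        return None
    k = len(points[0])
    axis = max(range(k), key=lambda i: max(p[i] for p in points) - min(p[i] for p in points))
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return Node(points[mid], axis, build(points[:mid]), build(points[mid + 1:]))


def m_nearest(root: Optional[Node], q: Sequence[float], m: int) -> List[Point]:
    """Return the m records closest to q (Euclidean), nearest first."""
    heap: List[Tuple[float, Point]] = []  # max-heap via negated squared distance

    def visit(node: Optional[Node]) -> None:
        if node is None:
            return
        d2 = sum((a - b) ** 2 for a, b in zip(node.point, q))
        if len(heap) < m:
            heapq.heappush(heap, (-d2, node.point))
        elif d2 < -heap[0][0]:
            heapq.heapreplace(heap, (-d2, node.point))
        diff = q[node.axis] - node.point[node.axis]
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        visit(near)
        # Every record on the far side is at least |diff| away from q along this axis.
        if len(heap) < m or diff * diff < -heap[0][0]:
            visit(far)

    visit(root)
    return [p for _, p in sorted(heap, key=lambda t: -t[0])]
```

As an example, `m_nearest(build(pts), (9, 2), 2)` on the six points (2,3), (5,4), (9,6), (4,7), (8,1), (7,2) returns the two nearest, (8,1) and (7,2).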
Problems in Computational Geometry
Packing and Covering, 1974
Abstract

Cited by 452 (2 self)
... reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, or otherwise, without the prior written permission of the author.
Data Structures and Algorithms for Nearest Neighbor Search in General Metric Spaces
1993
Abstract

Cited by 269 (4 self)
We consider the computational problem of finding nearest neighbors in general metric spaces. Of particular interest are spaces that may not be conveniently embedded or approximated in Euclidean space, or where the dimensionality of a Euclidean representation is very high. Also relevant are high-dimensional Euclidean settings in which the distribution of data is in some sense of lower dimension and embedded in the space. The vp-tree (vantage point tree) is introduced in several forms, together with associated algorithms, as an improved method for these difficult search problems. Tree construction executes in O(n log(n)) time, and search is, under certain circumstances and in the limit, O(log(n)) expected time. The theoretical basis for this approach is developed and the results of several experiments are reported. In Euclidean cases, kd-tree performance is compared.
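Here is a compact vp-tree sketch for an arbitrary metric `dist`. Each node keeps a vantage point and `mu`, the median distance from that point to the remaining items. Items closer than `mu` go inside and the rest go outside. The search then uses the triangle inequality to skip any side that cannot hold anything closer than the best distance found so far. As simplifying assumptions, the vantage point is just the first item and only the single nearest neighbor is returned. The paper introduces several vp-tree forms and discusses how to select vantage points, none of which is reproduced here. The names `VPNode`, `build_vp`, and `nearest` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Tuple

Metric = Callable[[Any, Any], float]


@dataclass
class VPNode:
    vp: Any                             # vantage point
    mu: float = 0.0                     # median distance from vp to the items below this node
    inside: Optional["VPNode"] = None   # items with dist(vp, x) < mu
    outside: Optional["VPNode"] = None  # items with dist(vp, x) >= mu


def build_vp(items: List[Any], dist: Metric) -> Optional[VPNode]:
    if not items:
        return None
    vp, rest = items[0], items[1:]      # naive vantage point choice (assumption)
    node = VPNode(vp)
    if rest:
        ds = [dist(vp, x) for x in rest]
        node.mu = sorted(ds)[len(ds) // 2]
        node.inside = build_vp([x for x, d in zip(rest, ds) if d < node.mu], dist)
        node.outside = build_vp([x for x, d in zip(rest, ds) if d >= node.mu], dist)
    return node


def nearest(root: Optional[VPNode], q: Any, dist: Metric) -> Tuple[Any, float]:
    """Exact nearest neighbor of q and its distance."""
    best: List[Any] = [None, math.inf]  # [item, tau], tau = best distance so far

    def visit(node: Optional[VPNode]) -> None:
        if node is None:
            return
        d = dist(q, node.vp)
        if d < best[1]:
            best[0], best[1] = node.vp, d
        # Inside items satisfy dist(q, x) > d - mu; outside items satisfy dist(q, x) >= mu - d.
        # Visit the side q falls in first, then re-test the other side with the updated tau.
        if d < node.mu:
            if d - best[1] < node.mu:
                visit(node.inside)
            if d + best[1] >= node.mu:
                visit(node.outside)
        else:
            if d + best[1] >= node.mu:
                visit(node.outside)
            if d - best[1] < node.mu:
                visit(node.inside)

    visit(root)
    return best[0], best[1]
```

Because only `dist` is ever called, the same code works for non-Euclidean metrics, which is the setting the abstract emphasizes.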
The TV-tree: an index structure for high-dimensional data
VLDB Journal, 1994
Abstract

Cited by 193 (7 self)
We propose a file structure to index high-dimensionality data, typically points in some feature space. The idea is to use only a few of the features, utilizing additional features whenever the additional discriminatory power is absolutely necessary. We present in detail the design of our tree structure and the associated algorithms that handle such 'varying length' feature vectors. Finally, we report simulation results comparing the proposed structure with the R*-tree, which is one of the most successful methods for low-dimensionality spaces. The results illustrate the superiority of our method, with up to 80% savings in disk accesses.
Type of Contribution: New index structure for high-dimensionality feature spaces. Algorithms and performance measurements.
Keywords: Spatial Index, Similarity Retrieval, Query by Content
1 Introduction
Many applications require enhanced indexing, capable of performing similarity searching on several non-traditional ('exotic') data types. The targ...
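Using only a few of the features works because of a simple property. Euclidean distance computed over a subset of the coordinates can never exceed the distance computed over all of them. A cheap partial comparison can therefore safely discard points before the full vectors are examined. The sketch below shows only that filter-and-refine step for a range query, with a hypothetical fixed number `n_active` of leading features. It is not the TV-tree itself, whose node structure and handling of active features are described in the paper. The function names are also hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def prefix_lower_bound(p: Sequence[float], q: Sequence[float], n_active: int) -> float:
    """Euclidean distance over the first n_active features only.

    Dropping non-negative squared terms can only shrink the sum, so this value
    never exceeds the full distance between p and q.
    """
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p[:n_active], q[:n_active])))


def range_query(points: List[Sequence[float]], q: Sequence[float],
                radius: float, n_active: int) -> Tuple[List[Sequence[float]], int]:
    """Return (points within radius of q, number of full-distance evaluations)."""
    matches, full_evals = [], 0
    for p in points:
        if prefix_lower_bound(p, q, n_active) > radius:
            continue  # safely pruned: the true distance is at least this lower bound
        full_evals += 1
        if math.dist(p, q) <= radius:
            matches.append(p)
    return matches, full_evals
```

Pruning never loses a true match. A poor choice of features only costs extra full evaluations, which is why selecting the most discriminating features matters.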
Index-driven similarity search in metric spaces
ACM Transactions on Database Systems, 2003
Abstract

Cited by 132 (6 self)
Similarity search is a very important operation in multimedia databases and other database applications involving complex objects; it involves finding objects in a data set S similar to a query object q, based on some similarity measure. In this article, we focus on methods for similarity search that make the general assumption that similarity is represented with a distance metric d. Existing methods for handling similarity search in this setting typically fall into one of two classes. The first directly indexes the objects based on distances (distance-based indexing), while the second is based on mapping to a vector space (mapping-based approach). The main part of this article is dedicated to a survey of distance-based indexing methods, but we also briefly outline how search occurs in mapping-based methods. We also present a general framework for performing search based on distances, and present algorithms for common types of queries that operate on an arbitrary “search hierarchy.” These algorithms can be applied to each of the methods presented, provided a suitable search hierarchy is defined.
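A best-first incremental nearest neighbor search illustrates the "search hierarchy" idea. A single priority queue holds two kinds of entries. Hierarchy elements are keyed by a lower bound on the distance from q to anything beneath them, and objects are keyed by their exact distance. Objects therefore come out in non-decreasing order of distance. The `Ball` element type is a hypothetical choice for this sketch: a center plus a radius covering everything below it, which gives the bound max(0, d(q, center) - radius) by the triangle inequality. The article's framework is more general than this one element type.

```python
import heapq
import itertools
import math
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Ball:
    """Hypothetical hierarchy element: a ball that covers all objects beneath it."""
    center: Tuple[float, ...]
    radius: float
    children: List["Ball"] = field(default_factory=list)
    objects: List[Tuple[float, ...]] = field(default_factory=list)  # stored at this node


def incremental_nn(root: Ball, q, dist=math.dist) -> Iterator[Tuple[float, tuple]]:
    """Yield (distance, object) pairs in non-decreasing distance from q."""
    tie = itertools.count()  # tie-breaker so heapq never compares payloads
    heap = [(max(0.0, dist(q, root.center) - root.radius), next(tie), False, root)]
    while heap:
        key, _, is_object, item = heapq.heappop(heap)
        if is_object:
            # No remaining entry has a smaller key, and every key bounds its contents.
            yield key, item
            continue
        for obj in item.objects:
            heapq.heappush(heap, (dist(q, obj), next(tie), True, obj))
        for child in item.children:
            lower = max(0.0, dist(q, child.center) - child.radius)
            heapq.heappush(heap, (lower, next(tie), False, child))
```

Because results are produced lazily, a caller can stop after k objects (a k-NN query), or stop once distances exceed r (a range query), without fixing either parameter in advance.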
A Simple Algorithm for Nearest Neighbor Search in High Dimensions
IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997
Abstract

Cited by 125 (1 self)
The problem of finding the closest point in high-dimensional spaces is common in pattern recognition. Unfortunately, the complexity of most existing search algorithms, such as the k-d tree and R-tree, grows exponentially with dimension, making them impractical for dimensionality above 15. In nearly all applications, the closest point is of interest only if it lies within a user-specified distance $\epsilon$. We present a simple and practical algorithm to efficiently search for the nearest neighbor within Euclidean distance $\epsilon$. The use of projection search combined with a novel data structure dramatically improves performance in high dimensions. A complexity analysis is presented which helps to automatically determine $\epsilon$ in structured problems. A comprehensive set of benchmarks clearly shows the superiority of the proposed algorithm for a variety of structured and unstructured search problems. Object recognition is demonstrated as an example application. The simplicity of the algorithm makes it possible to construct an inexpensive hardware search engine which can be 100 times faster than its software equivalent. A C++ implementation of our algorithm is available upon request to search@cs.columbia.edu/CAVE/.
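The following is a simplified projection search, assuming Euclidean distance and a non-empty point set. The points are sorted once along every coordinate. For a query, the search collects the points whose projection on each axis lies within the user-specified distance of the query's coordinate. Intersecting these slabs gives the points inside the enclosing hypercube, and a final exact-distance check finishes the search. The paper pairs projection search with its own data structure. The sorted arrays, set intersection, and the names `ProjectionIndex`, `nearest_within`, and `eps` are stand-ins for illustration.

```python
import bisect
import math
from typing import List, Optional, Sequence, Tuple


class ProjectionIndex:
    """Per-coordinate sorted projections of a non-empty point set."""

    def __init__(self, points: Sequence[Sequence[float]]):
        self.points: List[Tuple[float, ...]] = [tuple(p) for p in points]
        dim = len(self.points[0])
        # For each dimension: (sorted coordinate values, point indices in that order).
        self.axes = []
        for i in range(dim):
            order = sorted(range(len(self.points)), key=lambda j: self.points[j][i])
            self.axes.append(([self.points[j][i] for j in order], order))

    def nearest_within(self, q: Sequence[float], eps: float) -> Optional[Tuple[Tuple[float, ...], float]]:
        """Nearest point within Euclidean distance eps of q, or None if there is none."""
        candidates = None
        for i, (vals, order) in enumerate(self.axes):
            lo = bisect.bisect_left(vals, q[i] - eps)
            hi = bisect.bisect_right(vals, q[i] + eps)
            slab = set(order[lo:hi])  # points with q_i - eps <= x_i <= q_i + eps
            candidates = slab if candidates is None else candidates & slab
            if not candidates:
                return None  # hypercube is empty, so the hypersphere is too
        best = min(candidates, key=lambda j: math.dist(self.points[j], q))
        d = math.dist(self.points[best], q)
        return (self.points[best], d) if d <= eps else None
```

The hypersphere of radius eps lies inside the hypercube, so no qualifying point is trimmed away. The final distance check then discards candidates in the cube's corners that lie outside the sphere.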
Adaptive, Template Moderated, Spatially Varying Statistical Classification
Medical Image Analysis, 1998
Abstract

Cited by 85 (15 self)
A novel image segmentation algorithm was developed to allow the automatic segmentation of both normal and abnormal anatomy. The new algorithm is a form of spatially varying classification (SVC), in which an explicit anatomical template is used to moderate the segmentation obtained by k-nearest-neighbour (k-NN rule) statistical classification. The new algorithm consists of an iterated sequence of spatially varying classification and nonlinear registration, which creates an adaptive, template moderated (ATM), spatially varying classification (SVC). The ATM SVC algorithm was applied to several segmentation problems, involving different types of imaging and different locations in the body. Segmentation and validation experiments were carried out for problems involving the quantification of normal anatomy (MRI of brains of babies, MRI of knee cartilage of normal volunteers) and pathology of various types (MRI of patients with multiple sclerosis, MRI of patients with brain tumours, MRI of patients with damaged knee cartilage). In each case, the ATM SVC algorithm provided a better segmentation than statistical classification or elastic matching alone.
Keywords: template moderated segmentation, elastic matching, nearest neighbour classification, knee cartilage, neonate, brain, tumour
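The k-nearest-neighbour rule that ATM SVC builds on takes only a few lines to state. A feature vector gets the majority label among the k closest labelled training samples. This sketch shows only that generic rule with Euclidean distance. The template moderation and nonlinear registration steps are not modelled, and the function name, feature values, and labels in the example are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def knn_classify(train: List[Tuple[Sequence[float], str]], x: Sequence[float], k: int) -> str:
    """Majority vote among the k training samples nearest to x (Euclidean distance).

    Ties in the vote go to the label whose first occurrence is nearest, because
    Counter.most_common keeps first-encountered order for equal counts.
    """
    neighbours = sorted(train, key=lambda s: math.dist(s[0], x))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```

In a segmentation setting each voxel would supply one feature vector x. An odd k avoids most two-class ties.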
Using the Triangle Inequality to Reduce the Number of Comparisons Required for Similarity-Based Retrieval
Proc. of SPIE/IS&T Conf. on Storage and Retrieval for Image and Video Databases IV, 1996
Abstract

Cited by 27 (1 self)
Dissimilarity measures, the basis of similarity-based retrieval, can be viewed as a distance, and a similarity-based search as a nearest neighbor search. Though there has been extensive research on data structures and search methods to support nearest-neighbor searching, these indexing and dimension-reduction methods are generally not applicable to non-coordinate data and non-Euclidean distance measures. In this paper we reexamine and extend previous work of other researchers on best match searching based on the triangle inequality. These methods can be used to organize both non-coordinate data and non-Euclidean metric similarity measures. The effectiveness of the indexes depends on the actual dimensionality of the feature set, data, and similarity metric used. We show that these methods provide significant performance improvements and may be of practical value in real-world databases.
Keywords: image database indexing, similarity-based retrieval, best match searching, triangle inequali...
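One common way to use the triangle inequality for best match searching works in four steps. First, precompute the distances from a few key objects to every item. Then, for a query q, bound each item x from below by the maximum over keys k of |d(q, k) - d(k, x)|. Next, examine items in increasing order of that bound. Finally, stop as soon as the bound reaches the best distance found so far. This is a generic sketch under those assumptions, not the specific methods the paper reexamines. The function names and the choice of key objects are hypothetical, and at least one key is assumed.

```python
import math
from typing import Any, Callable, List, Tuple

Metric = Callable[[Any, Any], float]


def precompute(items: List[Any], keys: List[Any], dist: Metric) -> List[List[float]]:
    """table[i][j] = dist(keys[i], items[j]), computed once when the index is built."""
    return [[dist(k, x) for x in items] for k in keys]


def best_match(items: List[Any], keys: List[Any], table: List[List[float]],
               q: Any, dist: Metric) -> Tuple[int, float, int]:
    """Exact nearest item to q: (index, distance, number of item comparisons)."""
    dq = [dist(q, k) for k in keys]  # one comparison per key object
    # Triangle inequality: d(q, x) >= |d(q, k) - d(k, x)| for every key k.
    bounds = [max(abs(dq[i] - table[i][j]) for i in range(len(keys)))
              for j in range(len(items))]
    best_j, best_d, comparisons = -1, math.inf, 0
    for j in sorted(range(len(items)), key=bounds.__getitem__):
        if bounds[j] >= best_d:
            break  # every remaining item has a lower bound at least this large
        comparisons += 1
        d = dist(q, items[j])
        if d < best_d:
            best_j, best_d = j, d
    return best_j, best_d, comparisons
```

Consider items 0, 3, 7, 12, 20 on a line, with keys 0 and 20, and query 11. After the two key comparisons, only one item comparison is needed to find 12 at distance 1.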