Results 1–10 of 73
An Optimal Algorithm for Approximate Nearest Neighbor Searching in Fixed Dimensions
ACM-SIAM Symposium on Discrete Algorithms, 1994
Abstract

Cited by 923 (32 self)
Consider a set S of n data points in real d-dimensional space, R^d, where distances are measured using any Minkowski metric. In nearest neighbor searching we preprocess S into a data structure, so that given any query point q ∈ R^d, the closest point of S to q can be reported quickly. Given any positive real ε, a data point p is a (1+ε)-approximate nearest neighbor of q if its distance from q is within a factor of (1+ε) of the distance to the true nearest neighbor. We show that it is possible to preprocess a set of n points in R^d in O(dn log n) time and O(dn) space, so that given a query point q ∈ R^d and ε > 0, a (1+ε)-approximate nearest neighbor of q can be computed in O(c_{d,ε} log n) time, where c_{d,ε} ≤ d⌈1 + 6d/ε⌉^d is a factor depending only on dimension and ε. In general, we show that given an integer k ≥ 1, (1+ε)-approximations to the k nearest neighbors of q can be computed in additional O(kd log n) time.
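The (1+ε) guarantee in the abstract is easy to state operationally: a point p qualifies if dist(p, q) ≤ (1+ε)·dist(NN(q), q). A minimal brute-force check of that definition in Python (the function name and data are illustrative, not from the paper):

```python
import math

def is_approx_nn(points, q, p, eps):
    """Return True if p is a (1+eps)-approximate nearest neighbor
    of q over `points`: its distance to q is within a factor
    (1+eps) of the exact nearest-neighbor distance."""
    d_true = min(math.dist(s, q) for s in points)
    return math.dist(p, q) <= (1 + eps) * d_true

points = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
q = (0.9, 0.9)
# (1.0, 1.0) is the exact nearest neighbor, so it qualifies for
# any eps >= 0; (0.0, 0.0) is roughly 9x farther and fails at
# eps = 0.1.
```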
Quantization
IEEE Trans. Inform. Theory, 1998
Abstract

Cited by 795 (12 self)
The history of the theory and practice of quantization dates to 1948, although similar ideas had appeared in the literature as long ago as 1898. The fundamental role of quantization in modulation and analog-to-digital conversion was first recognized during the early development of pulse-code modulation systems, especially in the 1948 paper of Oliver, Pierce, and Shannon. Also in 1948, Bennett published the first high-resolution analysis of quantization and an exact analysis of quantization noise for Gaussian processes, and Shannon published the beginnings of rate distortion theory, which would provide a theory for quantization as analog-to-digital conversion and as data compression. Beginning with these three papers of fifty years ago, we trace the history of quantization from its origins through this decade, and we survey the fundamentals of the theory and many of the popular and promising techniques for quantization.
An algorithm for finding best matches in logarithmic expected time
ACM Transactions on Mathematical Software, 1977
Abstract

Cited by 724 (2 self)
An algorithm and data structure are presented for searching a file containing N records, each described by k real-valued keys, for the m closest matches or nearest neighbors to a given query record. The computation required to organize the file is proportional to kN log N. The expected number of records examined in each search is independent of the file size. The expected computation to perform each search is proportional to log N. Empirical evidence suggests that except for very small files, this algorithm is considerably faster than other methods.
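The organization and search the abstract summarizes are commonly realized as a k-d tree with branch-and-bound descent; median splitting per cyclic coordinate gives the kN log N organization cost. A simplified Python sketch (an illustration of the technique, not the paper's exact algorithm; `build_kdtree` and `nearest` are hypothetical names):

```python
import math

def build_kdtree(points, depth=0):
    """Build a k-d tree: split on coordinates cyclically at the
    median, so each level roughly halves the point set."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {"point": points[mid], "axis": axis,
            "left": build_kdtree(points[:mid], depth + 1),
            "right": build_kdtree(points[mid + 1:], depth + 1)}

def nearest(node, q, best=None):
    """Branch-and-bound search: descend into the half containing q
    first, then visit the far half only if the splitting plane is
    closer than the best match found so far."""
    if node is None:
        return best
    if best is None or math.dist(node["point"], q) < math.dist(best, q):
        best = node["point"]
    diff = q[node["axis"]] - node["point"][node["axis"]]
    near, far = ((node["left"], node["right"]) if diff < 0
                 else (node["right"], node["left"]))
    best = nearest(near, q, best)
    if abs(diff) < math.dist(best, q):  # far half may still help
        best = nearest(far, q, best)
    return best
```

For example, on the six points below, the far subtree is pruned at both internal nodes visited, which is the mechanism behind the file-size-independent expected search cost:

```python
tree = build_kdtree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
nearest(tree, (9, 2))  # → (8, 1)
```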
Data Structures and Algorithms for Nearest Neighbor Search in General Metric Spaces
1993
Abstract

Cited by 336 (4 self)
We consider the computational problem of finding nearest neighbors in general metric spaces. Of particular interest are spaces that may not be conveniently embedded or approximated in Euclidean space, or where the dimensionality of a Euclidean representation is very high. Also relevant are high-dimensional Euclidean settings in which the distribution of data is in some sense of lower dimension and embedded in the space. The vp-tree (vantage point tree) is introduced in several forms, together with associated algorithms, as an improved method for these difficult search problems. Tree construction executes in O(n log(n)) time, and search is, under certain circumstances and in the limit, O(log(n)) expected time. The theoretical basis for this approach is developed and the results of several experiments are reported. In Euclidean cases, kd-tree performance is compared.
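The vantage-point idea needs only a distance function, which is why it applies to general metric spaces: pick a vantage point, split the remaining points at the median distance μ into an inner ball and an outer shell, and prune with the triangle inequality during search. A rough Python sketch (an illustration of the structure, not Yianilos's exact construction; all names are hypothetical):

```python
import math
import random

def build_vptree(points, dist):
    """Split at the median distance mu from a randomly chosen
    vantage point; recurse on the inner and outer halves."""
    if not points:
        return None
    points = list(points)
    vp = points.pop(random.randrange(len(points)))
    if not points:
        return {"vp": vp, "mu": 0.0, "inner": None, "outer": None}
    dists = sorted(dist(vp, p) for p in points)
    mu = dists[len(dists) // 2]
    return {"vp": vp, "mu": mu,
            "inner": build_vptree([p for p in points if dist(vp, p) < mu], dist),
            "outer": build_vptree([p for p in points if dist(vp, p) >= mu], dist)}

def search(node, q, dist, best=(None, math.inf)):
    """Exact NN search. With tau the best distance so far, the
    triangle inequality lets us skip the outer shell when
    d(q, vp) + tau < mu and the inner ball when d(q, vp) - tau >= mu."""
    if node is None:
        return best
    d = dist(node["vp"], q)
    if d < best[1]:
        best = (node["vp"], d)
    if d < node["mu"]:
        best = search(node["inner"], q, dist, best)
        if d + best[1] >= node["mu"]:
            best = search(node["outer"], q, dist, best)
    else:
        best = search(node["outer"], q, dist, best)
        if d - best[1] < node["mu"]:
            best = search(node["inner"], q, dist, best)
    return best
```

Note the search stays exact for any tree shape; the random vantage-point choice only affects how much gets pruned.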
The TV-tree: an index structure for high-dimensional data
VLDB Journal, 1994
Abstract

Cited by 212 (8 self)
We propose a file structure to index high-dimensionality data, typically points in some feature space. The idea is to use only a few of the features, utilizing additional features whenever the additional discriminatory power is absolutely necessary. We present in detail the design of our tree structure and the associated algorithms that handle such 'varying length' feature vectors. Finally, we report simulation results, comparing the proposed structure with the R*-tree, which is one of the most successful methods for low-dimensionality spaces. The results illustrate the superiority of our method, with up to 80% savings in disk accesses.
Type of Contribution: New index structure for high-dimensionality feature spaces; algorithms and performance measurements.
Keywords: Spatial Index, Similarity Retrieval, Query by Content
1 Introduction. Many applications require enhanced indexing, capable of performing similarity searching on several non-traditional ('exotic') data types. The targ...
Index-driven similarity search in metric spaces
ACM Transactions on Database Systems, 2003
Abstract

Cited by 178 (7 self)
Similarity search is a very important operation in multimedia databases and other database applications involving complex objects; it involves finding objects in a data set S similar to a query object q, based on some similarity measure. In this article, we focus on methods for similarity search that make the general assumption that similarity is represented with a distance metric d. Existing methods for handling similarity search in this setting typically fall into one of two classes. The first directly indexes the objects based on distances (distance-based indexing), while the second is based on mapping to a vector space (mapping-based approach). The main part of this article is dedicated to a survey of distance-based indexing methods, but we also briefly outline how search occurs in mapping-based methods. We also present a general framework for performing search based on distances, and present algorithms for common types of queries that operate on an arbitrary "search hierarchy." These algorithms can be applied to each of the methods presented, provided a suitable search hierarchy is defined.
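A "search hierarchy" of this kind is naturally traversed best-first: expand elements in order of a lower bound on their distance to q, yielding objects as they surface. A hedged Python sketch of that pattern (the callbacks `children`, `dist_lb`, and `is_object` are hypothetical stand-ins for a concrete index, not the article's API):

```python
import heapq
import math

def incremental_nn(root, q, children, dist_lb, is_object):
    """Best-first search over an arbitrary search hierarchy.
    Elements are popped in increasing order of dist_lb(e, q); as
    long as dist_lb is a true lower bound on the distance of any
    object beneath e, objects are yielded in exact distance order."""
    counter = 0  # tie-breaker so heap never compares elements
    heap = [(dist_lb(root, q), counter, root)]
    while heap:
        d, _, e = heapq.heappop(heap)
        if is_object(e):
            yield e, d
        else:
            for c in children(e):
                counter += 1
                heapq.heappush(heap, (dist_lb(c, q), counter, c))
```

A toy hierarchy with one internal node (the whole data set, lower bound 0) is enough to see the ordering; the first yielded object is the nearest neighbor, and continuing the generator enumerates the rest by increasing distance.

```python
data = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
gen = incremental_nn(
    data, (0.9, 0.9),
    children=lambda e: e,
    dist_lb=lambda e, q: math.dist(e, q) if isinstance(e, tuple) else 0.0,
    is_object=lambda e: isinstance(e, tuple))
```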
A Simple Algorithm for Nearest Neighbor Search in High Dimensions
IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997
Abstract

Cited by 149 (1 self)
The problem of finding the closest point in high-dimensional spaces is common in pattern recognition. Unfortunately, the complexity of most existing search algorithms, such as the k-d tree and R-tree, grows exponentially with dimension, making them impractical for dimensionality above 15. In nearly all applications, the closest point is of interest only if it lies within a user-specified distance ε. We present a simple and practical algorithm to efficiently search for the nearest neighbor within Euclidean distance ε. The use of projection search combined with a novel data structure dramatically improves performance in high dimensions. A complexity analysis is presented which helps to automatically determine ε in structured problems. A comprehensive set of benchmarks clearly shows the superiority of the proposed algorithm for a variety of structured and unstructured search problems. Object recognition is demonstrated as an example application. The simplicity of the algorithm makes it possible to construct an inexpensive hardware search engine which can be 100 times faster than its software equivalent. A C++ implementation of our algorithm is available upon request to search@cs.columbia.edu/CAVE/.
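Projection-based candidate filtering of this flavor rests on a simple observation: any point within Euclidean distance e of q must lie within e of q in every single coordinate, so per-axis sorted projections can cheaply intersect candidate slabs before any full distance is computed. A simplified Python illustration (a sketch of the general technique, not the paper's exact data structure; names are hypothetical):

```python
import bisect
import math

def build_index(points):
    """One sorted list of (coordinate value, point id) per axis."""
    k = len(points[0])
    return [sorted((p[i], j) for j, p in enumerate(points))
            for i in range(k)]

def search_within(points, index, q, e):
    """Intersect per-axis slabs [q[i]-e, q[i]+e] (a necessary
    condition for Euclidean distance <= e), then verify the true
    distance on the survivors. Returns the nearest point within
    e of q, or None."""
    candidates = None
    for axis, proj in enumerate(index):
        lo = bisect.bisect_left(proj, (q[axis] - e, -1))
        hi = bisect.bisect_right(proj, (q[axis] + e, len(points)))
        ids = {j for _, j in proj[lo:hi]}
        candidates = ids if candidates is None else candidates & ids
        if not candidates:          # slab intersection already empty
            return None
    best = min(candidates, key=lambda j: math.dist(points[j], q))
    if math.dist(points[best], q) <= e:
        return points[best]
    return None
```

The early exit when a slab intersection empties out is what makes the cutoff distance e, rather than the file size, dominate the cost in high dimensions.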
Adaptive, Template Moderated, Spatially Varying Statistical Classification
Medical Image Analysis, 1998
Abstract

Cited by 104 (18 self)
A novel image segmentation algorithm was developed to allow the automatic segmentation of both normal and abnormal anatomy. The new algorithm is a form of spatially varying classification (SVC), in which an explicit anatomical template is used to moderate the segmentation obtained by k-Nearest Neighbour (k-NN) statistical classification. The new algorithm consists of an iterated sequence of spatially varying classification and nonlinear registration, which creates an adaptive, template moderated (ATM), spatially varying classification (SVC). The ATM SVC algorithm was applied to several segmentation problems, involving different types of imaging and different locations in the body. Segmentation and validation experiments were carried out for problems involving the quantification of normal anatomy (MRI of brains of babies, MRI of knee cartilage of normal volunteers) and pathology of various types (MRI of patients with multiple sclerosis, MRI of patients with brain tumours, MRI of patients with damaged knee cartilage). In each case, the ATM SVC algorithm provided a better segmentation than statistical classification or elastic matching alone.
Keywords: template moderated segmentation, elastic matching, nearest neighbour classification, knee cartilage, neonate, brain, tumour
Predicting subcellular localization of proteins in a hybridization space
Bioinformatics, 2004
Abstract

Cited by 30 (2 self)
Motivation: The localization of a protein in a cell is closely correlated with its biological function. With the number of sequences entering databanks rapidly increasing, the importance of developing a powerful high-throughput tool to determine protein subcellular location has become self-evident. In view of this, the Nearest Neighbour Algorithm was developed for predicting protein subcellular location using the strategy of hybridizing the information derived from the recent development in gene ontology with that from the functional domain composition as well as the pseudo amino acid composition.
Results: As a showcase, the same plant and non-plant protein datasets as investigated by the previous investigators were used for demonstration. The overall success rate of the jackknife test for the plant protein dataset was 86%, and that for the non-plant protein dataset 91.2%. These are the highest success rates achieved so far for the two datasets by following a rigorous cross-validation test procedure, suggesting that such a hybrid approach (particularly by incorporating the knowledge of gene ontology) may become a very useful high-throughput tool in the areas of bioinformatics, proteomics, and molecular cell biology.
Availability: The software will be made available on sending a request to the authors.
Contact: