Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality
, 1998
Cited by 759 (35 self)
The nearest neighbor problem is the following: Given a set of n points $P = \{p_1, \ldots, p_n\}$ in some metric space X, preprocess P so as to efficiently answer queries which require finding the point in P closest to a query point $q \in X$. We focus on the particularly interesting case of the d-dimensional Euclidean space where $X = \mathbb{R}^d$ under some $l_p$ norm. Despite decades of effort, the current solutions are far from satisfactory; in fact, for large d, in theory or in practice, they provide little improvement over the brute-force algorithm which compares the query point to each data point. Of late, there has been some interest in the approximate nearest neighbors problem, which is: Find a point $p \in P$ that is an $\epsilon$-approximate nearest neighbor of the query q in that for all $p' \in P$, $d(p, q) \le (1 + \epsilon)\, d(p', q)$. We present two algorithmic results for the approximate version that significantly improve the known bounds: (a) preprocessing cost polynomial in n and d, and a trul...
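The definition of an ε-approximate nearest neighbor in the abstract above can be made concrete with a small sketch. This is only the brute-force baseline the abstract mentions, not the paper's data structure; the function names and the choice of Euclidean distance are assumptions made for illustration.

```python
import math


def brute_force_nn(points, q):
    """Exact nearest neighbor: compare the query to every data point."""
    return min(points, key=lambda p: math.dist(p, q))


def is_eps_approx_nn(p, points, q, eps):
    """Check d(p, q) <= (1 + eps) * d(p', q) for every p' in points.

    It is enough to compare against the smallest distance in the set.
    """
    best = min(math.dist(x, q) for x in points)
    return math.dist(p, q) <= (1 + eps) * best


points = [(0.0, 0.0), (3.0, 0.0), (0.0, 5.0)]
q = (1.0, 0.0)
nearest = brute_force_nn(points, q)                      # distance 1 from q
loose = is_eps_approx_nn((3.0, 0.0), points, q, 1.0)     # 2 <= 2 * 1
tight = is_eps_approx_nn((3.0, 0.0), points, q, 0.5)     # 2 <= 1.5 fails
```

The brute-force scan costs time proportional to n·d per query, which is the cost the approximate schemes aim to beat.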
Efficient Search for Approximate Nearest Neighbor in High Dimensional Spaces
, 1998
Cited by 196 (9 self)
We address the problem of designing data structures that allow efficient search for approximate nearest neighbors. More specifically, given a database consisting of a set of vectors in some high dimensional Euclidean space, we want to construct a space-efficient data structure that would allow us to search, given a query vector, for the closest or nearly closest vector in the database. We also address this problem when distances are measured by the $L_1$ norm, and in the Hamming cube. Significantly improving and extending recent results of Kleinberg, we construct data structures whose size is polynomial in the size of the database, and search algorithms that run in time nearly linear or nearly quadratic in the dimension (depending on the case; the extra factors are polylogarithmic in the size of the database). Author affiliations: Computer Science Department, Technion - IIT, Haifa 32000, Israel. Email: eyalk@cs.technion.ac.il; Bell Communications Research, MCC1C365B, 445 South Street, Morristown, NJ ...
Lower bounds for high dimensional nearest neighbor search and related problems
, 1999
Cited by 47 (2 self)
In spite of extensive and continuing research, for various geometric search problems (such as nearest neighbor search), the best algorithms known have performance that degrades exponentially in the dimension. This phenomenon is sometimes called the curse of dimensionality. Recent results [38, 37, 40] show that in some sense it is possible to avoid the curse of dimensionality for the approximate nearest neighbor search problem. But must the exact nearest neighbor search problem suffer this curse? We provide some evidence in support of the curse. Specifically we investigate the exact nearest neighbor search problem and the related problem of exact partial match within the asymmetric communication model first used by Miltersen [43] to study data structure problems. We derive nontrivial asymptotic lower bounds for the exact problem that stand in contrast to known algorithms for approximate nearest neighbor search.
Approximate Dictionary Queries
, 1996
Cited by 13 (2 self)
Given a set of n binary strings of length m each, we consider the problem of answering d-queries. Given a binary query string $\alpha$ of length m, a d-query is to report if there exists a string in the set within Hamming distance d of $\alpha$. We present a data structure of size O(nm) supporting 1-queries in time O(m) and the reporting of all strings within Hamming distance 1 of $\alpha$ in time O(m). The data structure can be constructed in time O(nm). A slightly modified version of the data structure supports the insertion of new strings in amortized time O(m). 1 Introduction: Let $W = \{w_1, \ldots, w_n\}$ be a set of n binary strings of length m each, i.e. $w_i \in \{0, 1\}^m$. The set W is called the dictionary. We are interested in answering d-queries, i.e. for any query string $\alpha \in \{0, 1\}^m$ to decide if there is a string $w_i$ in W within Hamming distance at most d of $\alpha$. Minsky and Papert originally raised this problem in [12]. Recently a sequence of papers have considered how to solve thi...
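To make the d-query definition concrete, here is a naive linear-scan baseline in Python. It is not the O(nm)-space structure described in the abstract; it simply checks every dictionary string and so takes time proportional to nm per query. The function names and the use of '0'/'1' text strings are assumptions for illustration.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


def d_query(dictionary, q, d):
    """Report whether some dictionary string is within Hamming distance d of q."""
    return any(hamming(w, q) <= d for w in dictionary)


def report_within(dictionary, q, d):
    """List every dictionary string within Hamming distance d of q."""
    return [w for w in dictionary if hamming(w, q) <= d]


W = ["0000", "1111", "1010"]
hit = d_query(W, "0001", 1)            # "0000" differs in one position
miss = d_query(W, "0101", 1)           # every string differs in >= 2 positions
close = report_within(W, "1011", 1)    # "1111" and "1010" each differ in one
```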
Nearest Neighbor Search in Multidimensional Spaces
, 1999
Cited by 10 (0 self)
The Nearest Neighbor Search problem is defined as follows: given a set P of n points, preprocess the points so as to efficiently answer queries that require finding the closest point in P to a query point q. If we are willing to settle for a point that is almost as close as the nearest neighbor, then we can relax the problem to the approximate Nearest Neighbor Search. Nearest Neighbor Search (exact or approximate) is an integral component in a wide range of applications that include multimedia databases, computational biology, data mining, and information retrieval. The common thread in all these applications is similarity search: given a database of objects, we want to return the object in the database that is most similar to a query object. The objects are mapped onto points in a high dimensional metric space, and similarity search reduces to a nearest neighbor search. The dimension of the underlying space may be in the order of a few hundreds, or thousands; therefore, we r...
The Bit Vector Intersection Problem
, 1995
Cited by 10 (0 self)
This paper introduces the bit vector intersection problem: given a large collection of sparse bit vectors, find all the pairs with at least t ones in common for a given input parameter t. The assumption is that the number of ones common to any two vectors is significantly less than t, except for an unknown set of O(n) pairs. This problem has important applications in DNA physical mapping, clustering, and searching for approximate dictionary matches. We present two randomized algorithms that solve this problem with high probability and in subquadratic expected time. One of these algorithms is based on a recursive tree-searching procedure, and the other on hashing. We analyze the tree scheme in terms of branching processes, while our analysis of the hashing scheme is based on Markov chains. Since both algorithms have similar asymptotic performance, we also examine experimentally their relative merits in practical situations. We conclude by showing that a fundamental problem arising in the Human Genome Project is captured by the bit vector intersection problem described above and hence can be solved by our algorithms.
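The problem statement above can be illustrated with a simple deterministic baseline. This sketch is not either of the paper's randomized algorithms; it counts shared ones through an inverted index over bit positions, and in the worst case it still does quadratic work. Representing each sparse vector as a set of the positions holding a one, and all names used, are assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations


def pairs_with_common_ones(vectors, t):
    """Return index pairs (i, j), i < j, sharing at least t one-positions.

    Each vector is given as an iterable of the positions that hold a 1.
    """
    # Inverted index: bit position -> indices of vectors with a 1 there.
    postings = defaultdict(list)
    for idx, ones in enumerate(vectors):
        for pos in set(ones):
            postings[pos].append(idx)

    # Every position shared by two vectors adds one to that pair's count.
    counts = defaultdict(int)
    for ids in postings.values():
        for i, j in combinations(ids, 2):
            counts[(i, j)] += 1

    return sorted(pair for pair, c in counts.items() if c >= t)


vectors = [{1, 2, 3, 4}, {2, 3, 4, 9}, {1, 9}, {7}]
heavy = pairs_with_common_ones(vectors, 3)   # only vectors 0 and 1 share 3 ones
light = pairs_with_common_ones(vectors, 1)
```

The index helps when vectors are sparse and few positions are popular; a position shared by many vectors makes its posting list, and hence the pair enumeration, large.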
Improved Bounds for Dictionary Lookup with One Error
 Information Processing Letters
, 2000
Cited by 7 (0 self)
Given a dictionary S of n binary strings each of length m, we consider the problem of designing a data structure for S that supports d-queries; given a binary query string q of length m, a d-query reports if there exists a string in S within Hamming distance d of q. We construct a data structure for the case d = 1, that requires space O(n log m) and has query time O(1) in a cell probe model with word size m. This generalizes and improves the previous bounds of Yao and Yao for the problem in the bit probe model. The data structure can be constructed in randomized expected time O(nm). Key words: Data Structures, Dictionaries, Hashing, Hamming Distance. 1 Introduction: Minsky and Papert in 1969 posed the following problem, that has remained a challenge in data structure design [9]. Let S be a set of n binary strings of length m each. We want to construct a data structure for S that supports fast d-queries; that is, given a binary 1 BRICS (Basic Research in Computer Science), a Center o...
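For the case d = 1, a common textbook approach (not the cell-probe structure of the abstract above) is to keep the dictionary in a hash set and probe the query together with each of its m single-bit flips. The sketch below assumes '0'/'1' text strings and a hypothetical function name; it makes m + 1 hash lookups, each on a string of length m.

```python
def one_query(dictionary_set, q):
    """Report whether the set holds a string within Hamming distance 1 of q."""
    if q in dictionary_set:
        return True
    for i, ch in enumerate(q):
        # Flip bit i and look the resulting string up in the hash set.
        flipped = q[:i] + ("1" if ch == "0" else "0") + q[i + 1:]
        if flipped in dictionary_set:
            return True
    return False


S = frozenset({"0000", "1111"})
near = one_query(S, "0001")    # one flip away from "0000"
far = one_query(S, "0011")     # two flips away from both strings
exact = one_query(S, "1111")   # exact member
```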
Dictionary LookUp Within Small Edit Distance
 In Proc. 8th Annual Intl. Computing and Combinatorics Conference (COCOON’02)
, 2002
Cited by 4 (1 self)
Let W be a dictionary consisting of n binary strings of length m each, represented as a trie. The usual d-query asks if there exists a string in W within Hamming distance d of a given binary query string q. We present an algorithm to determine if there is a member in W within edit distance d of a given query string q of length m. The method takes time O(dm ) in the RAM model, independent of n, and requires O(dm) additional space.
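The edit-distance variant of the d-query can be sketched with the standard dynamic program for Levenshtein distance plus a linear scan over the dictionary. This is a baseline for checking answers, not the trie-based method of the abstract, and unlike that method its running time depends on n. The function names are assumptions for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance using unit-cost insert, delete, and substitute."""
    prev = list(range(len(b) + 1))   # distances from the empty prefix of a
    for i, ca in enumerate(a, 1):
        cur = [i]                    # deleting the first i characters of a
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute or match
            ))
        prev = cur
    return prev[-1]


def within_edit_distance(dictionary, q, d):
    """Report whether some dictionary string is within edit distance d of q."""
    return any(edit_distance(w, q) <= d for w in dictionary)


W = ["0000", "1111"]
found = within_edit_distance(W, "0001", 1)
not_found = within_edit_distance(W, "0011", 1)
```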
Efficient approximate dictionary lookup over small alphabets
, 2005
Cited by 3 (1 self)
Given a dictionary W consisting of n binary strings of length m each, a d-query asks if there exists a string in W within Hamming distance d of a given binary query string q. The problem was posed by Minsky and Papert in 1969 [10] as a challenge to data structure design. Efficient solutions have been developed only for the special case when d = 1 (the 1-query problem). We assume the standard RAM model of computation, and consider the case of the problem when alphabet size is arbitrary but finite, and d is small. We preprocess the dictionary, and construct an edge-labelled tree with bounded branching factor, and height. We present an algorithm to answer dictionary lookup within given distance d of a given query string q. The algorithm is efficient when the alphabet size is small, or the dictionary is sparse. In particular, for the d-query problem the algorithm takes time $O(m (\log_{4/3} n - 1)^d (\log_2 n)^{d+1})$. This is an improvement over previously known algorithms for the d-query problem when d > 1. We also generalize the results for the case of the problem when edit distances are used. The algorithm can be modified such that it allows for words of different lengths as well as different lengths of query strings.
Stereoscopic families of permutations, and their applications
 In 5th IEEE Israel Symposium on the Theory of Computing and Systems
, 1997
Cited by 2 (0 self)
A stereoscopic family of permutations maps an m-dimensional mesh into several 1-dimensional lines, in a way that jointly preserves distance information. Specifically, consider any two points and denote their distance on the m-dimensional mesh by d. Then the distance between their images, on the line on which these images are closest together, is O(dm). We initiate a systematic study of stereoscopic families of permutations. We show a construction of these families that involves the use of m+1 images. We also show that under some additional restrictions (namely, adjacent points on the image lines originate at points which are not too far away on the mesh), three images are necessary in order to construct such a family for the 2-dimensional mesh. We present two applications for stereoscopic families of permutations. One application is an algorithm for routing on the mesh that guarantees delivery of each packet within a number of steps that depends upon the distance between this packet’s source and destination, but is independent of the size of the mesh. Our algorithm is exceptionally simple, involves no queues, and can be used in dynamic settings in which packets are continuously generated. Another application is an extension of the construction of nonexpansive hash functions of Linial and Sasson (STOC 96) from the case of one dimensional metrics to arbitrary dimensions.