Results 1–10 of 91
Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions
, 2008
Abstract

Cited by 265 (5 self)
In this article, we give an overview of efficient algorithms for the approximate and exact nearest neighbor problem. The goal is to preprocess a dataset of objects (e.g., images) so that later, given a new query object, one can quickly return the dataset object that is most similar to the query. The problem is of significant interest in a wide variety of areas.
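The problem setup described above has a trivial exact baseline worth keeping in mind: a linear scan that compares the query against every dataset point in O(nd) time per query, which is exactly what the surveyed approximate algorithms aim to beat. The following minimal sketch (data and names are illustrative, not from the article) makes the preprocess-then-query setting concrete:

```python
import math

def nearest_neighbor(dataset, query):
    """Exact nearest neighbor by linear scan: O(n*d) per query."""
    best, best_dist = None, float("inf")
    for point in dataset:
        d = math.dist(point, query)  # Euclidean distance
        if d < best_dist:
            best, best_dist = point, d
    return best, best_dist

points = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
print(nearest_neighbor(points, (0.9, 1.2)))  # nearest point is (1.0, 1.0)
```

Approximate methods trade the guarantee of returning this exact minimizer for sublinear query time.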
Approximate Nearest Neighbors and the Fast Johnson-Lindenstrauss Transform
 STOC'06
, 2006
Abstract

Cited by 103 (8 self)
We introduce a new low-distortion embedding of ℓ_2^d into ℓ_p^{O(log n)} (p = 1, 2), called the Fast Johnson-Lindenstrauss Transform (FJLT). The FJLT is faster than standard random projections and just as easy to implement. It is based upon the preconditioning of a sparse projection matrix with a randomized Fourier transform. Sparse random projections are unsuitable for low-distortion embeddings. We overcome this handicap by exploiting the “Heisenberg principle” of the Fourier transform, i.e., its local-global duality. The FJLT can be used to speed up search algorithms based on low-distortion embeddings in ℓ_1 and ℓ_2. We consider the case of approximate nearest neighbors in ℓ_2^d. We provide a faster algorithm using classical projections, which we then further speed up by plugging in the FJLT. We also give a faster algorithm for searching over the hypercube.
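The "precondition, transform, then project" pipeline can be sketched in a few lines. The version below is a simplification: it applies random signs (the preconditioning), a fast Walsh-Hadamard transform as the Fourier step, and then uniformly samples coordinates, whereas the actual FJLT projects with a sparse Gaussian matrix. All names and the toy input are illustrative.

```python
import random

def fwht(a):
    """In-place fast Walsh-Hadamard transform, O(d log d); len(a) must be a power of two."""
    h, n = 1, len(a)
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                x, y = a[j], a[j + h]
                a[j], a[j + h] = x + y, x - y
        h *= 2
    return a

def fast_jl(x, k, rng):
    """Project x (dimension d, a power of two) down to k dimensions:
    random signs, Hadamard transform, then uniform coordinate sampling.
    (The real FJLT replaces the sampling with a sparse Gaussian matrix.)"""
    d = len(x)
    signs = [rng.choice((-1.0, 1.0)) for _ in range(d)]
    y = fwht([s * v for s, v in zip(signs, x)])
    # 1/sqrt(d) normalizes the Hadamard transform; sqrt(d/k) makes the
    # sampled squared norm unbiased.
    scale = (d / k) ** 0.5 / d ** 0.5
    idx = rng.sample(range(d), k)
    return [scale * y[i] for i in idx]

rng = random.Random(0)
print(fast_jl([1.0, 0.0, 2.0, -1.0, 0.5, 0.0, 0.0, 3.0], 4, rng))
```

The sign-flipped Hadamard step "smears" any sparse input across all coordinates, which is exactly why the subsequent sparse projection becomes safe.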
Fast construction of nets in low-dimensional metrics and their applications
 SIAM Journal on Computing
, 2006
Abstract

Cited by 101 (12 self)
We present a near-linear-time algorithm for constructing hierarchical nets in finite metric spaces with constant doubling dimension. This data structure is then applied to obtain improved algorithms for the following problems: approximate nearest neighbor search, well-separated pair decomposition, spanner construction, compact representation scheme, doubling measure, and computation of the (approximate) Lipschitz constant of a function. In all cases, the running (preprocessing) time is near linear and the space used is linear.
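The object being constructed can be illustrated by the standard greedy definition: an r-net is a subset of points that are pairwise more than r apart, yet cover every input point within distance r. The quadratic-time sketch below (illustrative, not the paper's method) shows the definition; the paper's contribution is building a whole hierarchy of such nets in near-linear time, which this naive version does not attempt.

```python
import math

def greedy_r_net(points, r):
    """Greedy r-net: net points are pairwise > r apart, and every input
    point lies within distance r of some net point. O(n^2) time."""
    net = []
    for p in points:
        if all(math.dist(p, c) > r for c in net):
            net.append(p)
    return net

pts = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.2, 5.1)]
print(greedy_r_net(pts, 1.0))  # keeps one representative per cluster
```

Running this for r = 2^0, 2^1, 2^2, ... yields the hierarchy of nets the abstract refers to, at the cost of an extra O(n^2) factor per level.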
Nearest Neighbors In High-Dimensional Spaces
, 2004
Abstract

Cited by 82 (2 self)
In this chapter we consider the following problem: given a set P of points in a high-dimensional space, construct a data structure which, given any query point q, finds the point in P closest to q. This problem, called nearest neighbor search, is of significant importance to several areas of computer science, including pattern recognition, searching in multimedia data, vector compression [GG91], computational statistics [DW82], and data mining. Many of these applications involve data sets which are very large (e.g., a database containing Web documents could contain over one billion documents). Moreover, the dimensionality of the points is usually large as well (e.g., on the order of a few hundred). Therefore, it is crucial to design algorithms which scale well with the database size as well as with the dimension. The nearest-neighbor problem is an example of a large class of proximity problems, which, roughly speaking, are problems whose definitions involve the notion of...
Faster Core-Set Constructions and Data-Stream Algorithms in Fixed Dimensions
 Comput. Geom. Theory Appl
, 2003
Abstract

Cited by 68 (3 self)
We speed up previous (1 + ε)-factor approximation algorithms for a number of geometric optimization problems in fixed dimensions: diameter, width, minimum-radius enclosing cylinder, minimum-width annulus, minimum-volume bounding box, minimum-width cylindrical shell, etc.
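A typical route to such (1 + ε)-approximations is to replace the input with a small core-set: snap points to a grid whose cell width is proportional to ε times a rough scale estimate, keep one point per cell, and solve the problem exactly on the survivors. The sketch below applies this generic idea to the diameter; the function name, constants, and the crude scale estimate are illustrative, not taken from the paper.

```python
import math
from itertools import combinations

def diameter_via_coreset(points, eps):
    """Approximate diameter: snap points to a grid with cell width
    proportional to eps * (scale), keep one representative per occupied
    cell, then brute-force the (much smaller) survivor set."""
    # Crude scale: the distance from the first point to the farthest
    # point is within a factor 2 of the true diameter.
    scale = max(math.dist(points[0], q) for q in points[1:])
    cell = max(eps * scale / 4.0, 1e-12)
    reps = {}
    for p in points:
        key = tuple(int(c // cell) for c in p)
        reps.setdefault(key, p)  # one representative per grid cell
    survivors = list(reps.values())
    return max(math.dist(p, q) for p, q in combinations(survivors, 2))

# The extreme pair (0,0)-(10,0) survives snapping, so this reports 10.0.
print(diameter_via_coreset([(0.0, 0.0), (10.0, 0.0), (0.01, 0.01), (9.99, 0.05)], 0.1))
```

The survivor count depends only on ε and the dimension, not on n, which is what makes the subsequent brute-force step affordable.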
Coresets for k-Means and k-Median Clustering and their Applications
 In Proc. 36th Annu. ACM Sympos. Theory Comput
, 2003
Abstract

Cited by 48 (13 self)
In this paper, we show the existence of small coresets for the problems of computing k-median and k-means clustering for points in low dimension. In other words, we show that given a point set P in R^d, one can compute a weighted set S ⊆ P, of size O(k ε^{−d} log n), such that one can compute the k-median/k-means clustering on S instead of on P, and get a (1 + ε)-approximation.
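The point of such a coreset is that the clustering cost evaluated on the weighted set S tracks the cost on all of P, for any choice of centers. The helper below shows the weighted k-median cost being evaluated on a toy example; the coreset S and its weights are hand-picked for illustration, since how to construct them is the paper's contribution and is not reproduced here.

```python
import math

def weighted_kmedian_cost(weighted_points, centers):
    """Sum over (point, weight) pairs of weight * distance to the nearest center."""
    return sum(w * min(math.dist(p, c) for c in centers)
               for p, w in weighted_points)

# Toy example: two nearby points collapsed onto one weight-2 representative.
P = [((0.0, 0.0), 1.0), ((0.1, 0.0), 1.0), ((4.0, 0.0), 1.0)]
S = [((0.05, 0.0), 2.0), ((4.0, 0.0), 1.0)]   # hypothetical coreset of P
centers = [(0.0, 0.0), (4.0, 0.0)]
print(weighted_kmedian_cost(P, centers), weighted_kmedian_cost(S, centers))
```

Because the two costs agree (up to 1 + ε) for every candidate center set, any clustering algorithm can be run on S instead of P.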
Fast dimension reduction using Rademacher series on dual BCH codes
, 2007
Abstract

Cited by 46 (10 self)
The Fast Johnson-Lindenstrauss Transform (FJLT) was recently discovered by Ailon and Chazelle as a novel technique for performing fast dimension reduction with small distortion from ℓ_2^d to ℓ_2^k in time O(max{d log d, k^3}). For k in [Ω(log d), O(d^{1/2})] this beats the O(dk) time achieved by naive multiplication by random dense matrices, an approach followed by several authors as a variant of the seminal result by Johnson and Lindenstrauss (JL) from the mid-1980s. In this work we show how to significantly improve the running time to O(d log k) for k = O(d^{1/2−δ}), for any arbitrarily small fixed δ. This beats the better of FJLT and JL. Our analysis uses a powerful measure-concentration bound due to Talagrand applied to Rademacher series in Banach spaces (sums of vectors in Banach spaces with random signs). The set of vectors used is a real embedding of dual BCH code vectors over GF(2). We also discuss the number of random bits used and reduction to ℓ_1 space. The connection between geometry and discrete coding theory discussed here is interesting in its own right and may be useful in other algorithmic applications as well.
Linear-Size Approximate Voronoi Diagrams
 In Proc. 13th ACM-SIAM Sympos. Discrete Algorithms
, 2002
Abstract

Cited by 43 (13 self)
Given a set S of n points, a (t, ε)-approximate Voronoi diagram (AVD) is a partition of space into constant-complexity cells, where each cell c is associated with t representative points of S, such that for any point in c, one of the associated representatives approximates the nearest neighbor to within a factor of (1 + ε). The goal is to minimize the number and complexity of the cells in the AVD. We show that it is possible to construct an AVD consisting of O(n/ε^d) cells for t = 1, and O(n) cells for t = O(1/ε^{(d−1)/2}). In general, for a real parameter 2 ≤ γ ≤ 1/ε, we show that it is possible to construct a (t, ε)-AVD consisting of O(nγ^d) cells for t = O(1/(εγ)^{(d−1)/2}). The cells in these AVDs are cubes or differences of two cubes. All these structures can be used to efficiently answer approximate nearest neighbor queries. Our algorithms are based on the well-separated pair decomposition and are very simple.
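The cell-plus-representative idea can be conveyed with the simplest possible variant: a uniform grid (t = 1) where each occupied cell stores the input point closest to its center, and a query is a single dictionary lookup. Real AVDs use a hierarchy of cubes and cube differences to obtain the bounds above and to cover empty regions; this uniform-grid sketch (names and parameters are illustrative) only shows the query mechanics.

```python
import math

def build_grid_avd(points, cell):
    """One representative per occupied grid cell: the point nearest the cell center."""
    avd = {}
    for p in points:
        key = tuple(int(c // cell) for c in p)
        center = tuple((k + 0.5) * cell for k in key)
        if key not in avd or math.dist(p, center) < math.dist(avd[key], center):
            avd[key] = p
    return avd

def query(avd, cell, q):
    """Return the stored representative of q's cell, if any (O(1) lookup)."""
    return avd.get(tuple(int(c // cell) for c in q))

avd = build_grid_avd([(0.2, 0.2), (0.8, 0.9), (3.5, 3.5)], 1.0)
print(query(avd, 1.0, (0.4, 0.4)))  # representative stored for cell (0, 0)
```

Answering a query is thus a point-location problem, which is why minimizing the number and complexity of cells is the central concern.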
Entropy-based nearest neighbor search in high dimensions
 In SODA ’06: Proceedings of the seventeenth annual ACM-SIAM Symposium on Discrete Algorithms
Abstract

Cited by 30 (5 self)
In this paper we study the problem of finding the approximate nearest neighbor of a query point in high-dimensional space, focusing on the Euclidean space. The earlier approaches use locality-preserving hash functions (that tend to map nearby points to the same value) to construct several hash tables, to ensure that the query point hashes to the same bucket as its nearest neighbor in at least one table. Our approach is different: we use one (or a few) hash tables and hash several randomly chosen points in the neighborhood of the query point, showing that at least one of them will hash to the bucket containing its nearest neighbor. We show that the number of randomly chosen points required in the neighborhood of the query point q depends on the entropy of the hash value h(p) of a random point p at the same distance from q as its nearest neighbor, given q and the locality-preserving hash function h chosen randomly from the hash family. Precisely, we show that if the entropy I(h(p) | q, h) = M and g is a bound on the probability that two far-off points will hash to the same bucket, then we can find the approximate nearest neighbor in O(n^ρ) time and near-linear Õ(n) space, where ρ = M / log(1/g). Alternatively, we can build a data structure of size Õ(n^{1/(1−ρ)}) to answer queries in Õ(d) time. By applying this analysis to the locality-preserving hash functions in [15, 19, 6] and adjusting the parameters, we show that the c-approximate nearest neighbor can be computed in time Õ(n^ρ) and near-linear space, where ρ ≈ 2.06/c as c becomes large.
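The query-perturbation idea is easy to sketch with a random-hyperplane hash, a locality-preserving family for angular distance: instead of many tables, keep one table and probe it with several random perturbations of the query. The hash family choice, perturbation radius, and probe count below are illustrative stand-ins, not the parameters analyzed in the paper.

```python
import random

def make_hash(dim, bits, rng):
    """Random-hyperplane LSH: each bit records the side of one hyperplane."""
    planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(bits)]
    def h(x):
        return tuple(int(sum(a * b for a, b in zip(p, x)) >= 0) for p in planes)
    return h

def build_table(points, h):
    """Single hash table mapping bucket -> points in that bucket."""
    table = {}
    for p in points:
        table.setdefault(h(p), []).append(p)
    return table

def perturbed_query(table, h, q, radius, probes, rng):
    """Probe the one table with several random points near q and
    collect all candidates seen; the nearest candidate is then checked directly."""
    candidates = []
    for _ in range(probes):
        qq = [c + rng.gauss(0, radius) for c in q]
        candidates.extend(table.get(h(qq), []))
    return candidates
```

Each perturbed probe lands in a bucket that a point near q could plausibly occupy, so a handful of probes into one table substitutes for maintaining many independent tables.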
The black-box complexity of nearest neighbor search
 In 31st International Colloquium on Automata, Languages and Programming
, 2004
Abstract

Cited by 30 (2 self)
We define a natural notion of efficiency for approximate nearest-neighbor (ANN) search in general n-point metric spaces, namely the existence of a randomized algorithm which answers (1 + ε)-approximate nearest neighbor queries in polylog(n) time using only polynomial space. We then study which families of metric spaces admit efficient ANN schemes in the black-box model, where only oracle access to the distance function is given, and any query consistent with the triangle inequality may be asked. For ε < 2/5, we offer a complete answer to this problem. Using the notion of metric dimension defined in [GKL03] (à la [Ass83]), we show that a metric space X admits an efficient (1 + ε)-ANN scheme for any ε < 2/5 if and only if dim(X) = O(log log n). For coarser approximations, clearly the upper bound continues to hold, but there is a threshold at which our lower bound breaks down; this is precisely when points in the “ambient space” may begin to affect the complexity of “hard” subspaces S ⊆ X. Indeed, we give examples which show that dim(X) does not characterize the black-box complexity of ANN above the threshold. Our scheme for ANN in low-dimensional metric spaces is the first to yield efficient algorithms without relying on any additional assumptions on the input. In previous approaches (e.g., [Cla99, KR02, KL04, HKMR04]), even spaces with dim(X) = O(1) sometimes required Ω(n) query times.