Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions
2008
Cited by 131 (1 self)
In this article, we give an overview of efficient algorithms for the approximate and exact nearest neighbor problem. The goal is to preprocess a dataset of objects (e.g., images) so that later, given a new query object, one can quickly return the dataset object that is most similar to the query. The problem is of significant interest in a wide variety of areas.
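For reference, the exact version of the problem stated above can always be solved by a linear scan, which is the baseline that sublinear data structures aim to beat. The sketch below is only an illustration under assumptions not taken from the article: Euclidean distance, points stored as tuples, and the hypothetical helper names `nearest_neighbor` and `is_c_approximate`.

```python
import math

def nearest_neighbor(dataset, query):
    """Exact nearest neighbor by linear scan: O(n * d) time per query."""
    return min(dataset, key=lambda p: math.dist(p, query))

def is_c_approximate(dataset, query, candidate, c):
    """True if `candidate` is at most c times farther from `query`
    than the exact nearest neighbor is (the c-approximate guarantee)."""
    best = math.dist(nearest_neighbor(dataset, query), query)
    return math.dist(candidate, query) <= c * best

points = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
q = (0.9, 1.2)
exact = nearest_neighbor(points, q)
```

An approximate algorithm is allowed to return any point for which `is_c_approximate` holds, which is what gives it room to avoid scanning the whole dataset.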
Fast construction of nets in low dimensional metrics, and their applications
SIAM J. Comput., 2005
Cited by 75 (7 self)
We present a near-linear-time algorithm for constructing hierarchical nets in finite metric spaces with constant doubling dimension. This data structure is then applied to obtain improved algorithms for the following problems: approximate nearest neighbor search, well-separated pair decomposition, spanner construction, compact representation scheme, doubling measure, and computation of the (approximate) Lipschitz constant of a function. In all cases, the running (preprocessing) time is near linear and the space used is linear.
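To make the notion of a net concrete: an r-net of a point set is a subset whose points are pairwise at least r apart and which covers every input point within distance r. The sketch below is a naive greedy construction, not the near-linear-time algorithm of the paper; it assumes Euclidean points as tuples, and the names `greedy_net` and `hierarchical_nets` are hypothetical.

```python
import math

def greedy_net(points, r, seed=()):
    """Naive greedy r-net, O(n * |net|) distance computations.
    Net points are pairwise >= r apart; every input point is < r
    from some net point. `seed` must already be r-separated."""
    net = list(seed)
    for p in points:
        if all(math.dist(p, c) >= r for c in net):
            net.append(p)
    return net

def hierarchical_nets(points, radii):
    """Nested nets: process radii from coarse to fine, seeding each
    finer net with the coarser one so coarse nets are subsets."""
    levels = {}
    net = []
    for r in sorted(radii, reverse=True):
        net = greedy_net(points, r, seed=net)
        levels[r] = net
    return levels

pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (5.0, 0.0)]
levels = hierarchical_nets(pts, [4.0, 2.0])
```

Seeding each finer level with the coarser net is valid because points that are at least 4 apart are also at least 2 apart.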
Nearest Neighbors In High-Dimensional Spaces
2004
Cited by 63 (2 self)
In this chapter we consider the following problem: given a set P of points in a high-dimensional space, construct a data structure which, given any query point q, finds the point in P closest to q. This problem, called nearest neighbor search, is of significant importance to several areas of computer science, including pattern recognition, searching in multimedia data, vector compression [GG91], computational statistics [DW82], and data mining. Many of these applications involve data sets which are very large (e.g., a database containing Web documents could contain over one billion documents). Moreover, the dimensionality of the points is usually large as well (e.g., on the order of a few hundred). Therefore, it is crucial to design algorithms which scale well with the database size as well as with the dimension. The nearest-neighbor problem is an example of a large class of proximity problems, which, roughly speaking, are problems whose definitions involve the notion of...
Faster Core-Set Constructions and Data Stream Algorithms in Fixed Dimensions
Comput. Geom. Theory Appl., 2003
Cited by 58 (3 self)
We speed up previous (1 + ε)-factor approximation algorithms for a number of geometric optimization problems in fixed dimensions: diameter, width, minimum-radius enclosing cylinder, minimum-width annulus, minimum-volume bounding box, minimum-width cylindrical shell, etc.
Coresets for k-Means and k-Median Clustering and their Applications
In Proc. 36th Annu. ACM Sympos. Theory Comput., 2003
Cited by 41 (13 self)
In this paper, we show the existence of small coresets for the problems of computing k-median and k-means clustering for points in low dimension. In other words, we show that given a point set P in $\mathbb{R}^d$, one can compute a weighted set S ⊆ P, of size $O(k\varepsilon^{-d}\log n)$, such that one can compute the k-median/means clustering on S instead of on P, and get a (1 + ε)-approximation.
Linear-Size Approximate Voronoi Diagrams
In Proc. 13th ACM-SIAM Sympos. Discrete Algorithms, 2002
Cited by 39 (11 self)
A (t, ε)-approximate Voronoi diagram (AVD) is a partition of space into constant complexity cells, where each cell c is associated with t representative points of S, such that for any point in c, one of the associated representatives approximates the nearest neighbor to within a factor of (1 + ε). The goal is to minimize the number and complexity of the cells in the AVD. We show that it is possible to construct an AVD consisting of $O(n/\varepsilon^{d})$ cells for t = 1, and O(n) cells for $t = O(1/\varepsilon^{(d-1)/2})$. In general, for a real parameter 2 ≤ γ ≤ 1/ε, we show that it is possible to construct a (t, ε)-AVD consisting of $O(n\gamma^{d})$ cells for $t = O(1/(\varepsilon\gamma)^{(d-1)/2})$. The cells in these AVDs are cubes or differences of two cubes. All these structures can be used to efficiently answer approximate nearest neighbor queries. Our algorithms are based on the well-separated pair decomposition and are very simple.
The black-box complexity of nearest neighbor search
In 31st International Colloquium on Automata, Languages and Programming, 2004
Cited by 27 (2 self)
We define a natural notion of efficiency for approximate nearest-neighbor (ANN) search in general n-point metric spaces, namely the existence of a randomized algorithm which answers (1 + ε)-approximate nearest neighbor queries in polylog(n) time using only polynomial space. We then study which families of metric spaces admit efficient ANN schemes in the black-box model, where only oracle access to the distance function is given, and any query consistent with the triangle inequality may be asked. For ε < 2/5, we offer a complete answer to this problem. Using the notion of metric dimension defined in [GKL03] (à la [Ass83]), we show that a metric space X admits an efficient (1 + ε)-ANN scheme for any ε < 2/5 if and only if dim(X) = O(log log n). For coarser approximations, clearly the upper bound continues to hold, but there is a threshold at which our lower bound breaks down; this is precisely when points in the “ambient space” may begin to affect the complexity of “hard” subspaces S ⊆ X. Indeed, we give examples which show that dim(X) does not characterize the black-box complexity of ANN above the threshold. Our scheme for ANN in low-dimensional metric spaces is the first to yield efficient algorithms without relying on any additional assumptions on the input. In previous approaches (e.g., [Cla99, KR02, KL04, HKMR04]), even spaces with dim(X) = O(1) sometimes required Ω(n) query times.
Fast dimension reduction using Rademacher series on dual BCH codes
2007
Cited by 26 (9 self)
The Fast Johnson-Lindenstrauss Transform (FJLT) was recently discovered by Ailon and Chazelle as a novel technique for performing fast dimension reduction with small distortion from $\ell_2^d$ to $\ell_2^k$ in time $O(\max\{d \log d, k^3\})$. For k in $[\Omega(\log d), O(d^{1/2})]$ this beats time O(dk) achieved by naive multiplication by random dense matrices, an approach followed by several authors as a variant of the seminal result by Johnson and Lindenstrauss (JL) from the mid 80’s. In this work we show how to significantly improve the running time to O(d log k) for $k = O(d^{1/2-\delta})$, for any arbitrarily small fixed δ. This beats the better of FJLT and JL. Our analysis uses a powerful measure concentration bound due to Talagrand applied to Rademacher series in Banach spaces (sums of vectors in Banach spaces with random signs). The set of vectors used is a real embedding of dual BCH code vectors over GF(2). We also discuss the number of random bits used and reduction to $\ell_1$ space. The connection between geometry and discrete coding theory discussed here is interesting in its own right and may be useful in other algorithmic applications as well.
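The O(dk) baseline mentioned above, multiplication by a dense random matrix, can be sketched as follows. This is only the naive approach that the paper improves upon, not the FJLT or the BCH-code construction; the Gaussian entries with variance 1/k and the names `dense_jl_matrix` and `project` are assumptions made for illustration.

```python
import math
import random

def dense_jl_matrix(d, k, rng):
    """k x d matrix with i.i.d. Gaussian entries of variance 1/k,
    so that squared norms are preserved in expectation."""
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]

def project(matrix, x):
    """Naive matrix-vector product: O(d * k) arithmetic operations."""
    return [sum(a * b for a, b in zip(row, x)) for row in matrix]

rng = random.Random(0)           # fixed seed for reproducibility
d, k = 256, 32
A = dense_jl_matrix(d, k, rng)
x = [rng.uniform(-1.0, 1.0) for _ in range(d)]
y = project(A, x)
# Ratio of projected norm to original norm; concentrates near 1 as k grows.
ratio = math.hypot(*y) / math.hypot(*x)
```

Each of the k output coordinates touches all d inputs, which is where the O(dk) cost comes from.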
Practical Methods for Shape Fitting and Kinetic Data Structures using Core Sets
In Proc. 20th Annu. ACM Sympos. Comput. Geom., 2004
Cited by 26 (8 self)
The notion of ε-kernel was introduced by Agarwal et al. [5] to set up a unified framework for computing various extent measures of a point set P approximately. Roughly speaking, a subset Q ⊆ P is an ε-kernel of P if for every slab W containing Q, the expanded slab (1 + ε)W contains P. They illustrated the significance of ε-kernels by showing that they yield approximation algorithms for a wide range of geometric optimization problems. We present a simpler and more practical algorithm for computing the ε-kernel of a set P of points in $\mathbb{R}^d$. We demonstrate the practicality of our algorithm by showing its empirical performance on various inputs. We then describe an incremental algorithm for fitting various shapes and use the ideas of our algorithm for computing ε-kernels to analyze the performance of this algorithm. We illustrate the versatility and practicality of this technique by implementing approximation algorithms for minimum enclosing cylinder, minimum-volume bounding box, and minimum-width annulus. Finally, we show that ε-kernels can be effectively used to expedite the algorithms for maintaining extents of moving points.
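The slab definition above can be tested numerically in the plane. The sketch below is a heuristic check, not the paper's algorithm: it assumes 2D points, samples a finite set of directions (so it can miss violations between samples), and tests only the tightest slab around Q in each direction, which suffices for that direction because any wider slab containing Q expands at least as far. The name `looks_like_eps_kernel` is hypothetical.

```python
import math

def looks_like_eps_kernel(P, Q, eps, num_directions=360, tol=1e-12):
    """Sampled check that Q behaves as an eps-kernel of P in 2D.
    For each direction u, take the tightest slab around Q, expand it
    by a factor (1 + eps) about its center, and test that P fits."""
    for i in range(num_directions):
        theta = math.pi * i / num_directions   # directions in [0, pi)
        ux, uy = math.cos(theta), math.sin(theta)
        proj_q = [qx * ux + qy * uy for qx, qy in Q]
        lo, hi = min(proj_q), max(proj_q)
        margin = eps * (hi - lo) / 2.0         # extra width on each side
        for px, py in P:
            t = px * ux + py * uy
            if t < lo - margin - tol or t > hi + margin + tol:
                return False
    return True

square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
P = square + [(0.5, 0.5)]
ok = looks_like_eps_kernel(P, square, eps=0.1)
bad = looks_like_eps_kernel(P, [(0.0, 0.0), (1.0, 0.0)], eps=0.1)
```

Here the four corners pass because the center point never extends the extent of the square, while the two-point subset fails in the vertical direction.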
Space-efficient approximate Voronoi diagrams
In Proc. 34th Annual ACM Sympos. Theory Comput., 2002
Cited by 23 (8 self)
Given a set S of n points in $\mathbb{R}^d$, a (t, ε)-approximate Voronoi diagram (AVD) is a partition of space into constant complexity cells, where each cell c is associated with t representative points of S, such that for any point in c, one of the associated representatives approximates the nearest neighbor to within a factor of (1 + ε). Like the Voronoi diagram, this structure defines a spatial subdivision. It also has the desirable properties of being easy to construct and providing a simple and practical data structure for answering approximate nearest neighbor queries. The goal is to minimize the number and complexity of the cells in the AVD. We assume that the dimension d is fixed. Given a real parameter γ, where 2 ≤ γ ≤ 1/ε, we show that it is possible to construct a (t, ε)-AVD consisting of $O(n\varepsilon^{(d-1)/2}\gamma^{3(d-1)/2}\log\gamma)$ cells for $t = O(1/(\varepsilon\gamma)^{(d-1)/2})$. This yields a data structure of $O(n\gamma^{d-1}\log\gamma)$ space (including the space for representatives) that can answer ε-NN queries in time $O(\log(n\gamma) + 1/(\varepsilon\gamma)^{(d-1)/2})$. (Hidden constants may depend exponentially on d, but do not depend on ε or γ.) In the case γ = 1/ε, we show that the additional log γ factor in space can be avoided, and so we have a data structure that answers ε-approximate nearest neighbor queries in time $O(\log(n/\varepsilon))$ with space $O(n/\varepsilon^{d-1})$, improving upon the best known space bounds for this query time. In the case γ = 2, we have a data structure that can answer approximate nearest neighbor queries in $O(\log n + 1/\varepsilon^{(d-1)/2})$ time using optimal O(n) space. This dramatically improves the ...

