Results 1  10
of
196
An Optimal Algorithm for Approximate Nearest Neighbor Searching in Fixed Dimensions
 ACMSIAM SYMPOSIUM ON DISCRETE ALGORITHMS
, 1994
"... Consider a set S of n data points in real ddimensional space, R d , where distances are measured using any Minkowski metric. In nearest neighbor searching we preprocess S into a data structure, so that given any query point q 2 R d , the closest point of S to q can be reported quickly. Given any po ..."
Abstract

Cited by 818 (31 self)
 Add to MetaCart
Consider a set S of n data points in real ddimensional space, R d , where distances are measured using any Minkowski metric. In nearest neighbor searching we preprocess S into a data structure, so that given any query point q 2 R d , the closest point of S to q can be reported quickly. Given any positive real ffl, a data point p is a (1 + ffl)approximate nearest neighbor of q if its distance from q is within a factor of (1 + ffl) of the distance to the true nearest neighbor. We show that it is possible to preprocess a set of n points in R d in O(dn log n) time and O(dn) space, so that given a query point q 2 R d , and ffl ? 0, a (1 + ffl)approximate nearest neighbor of q can be computed in O(c d;ffl log n) time, where c d;ffl d d1 + 6d=ffle d is a factor depending only on dimension and ffl. In general, we show that given an integer k 1, (1 + ffl)approximations to the k nearest neighbors of q can be computed in additional O(kd log n) time.
Nearoptimal hashing algorithms for approximate nearest neighbor in high dimensions
, 2008
"... In this article, we give an overview of efficient algorithms for the approximate and exact nearest neighbor problem. The goal is to preprocess a dataset of objects (e.g., images) so that later, given a new query object, one can quickly return the dataset object that is most similar to the query. The ..."
Abstract

Cited by 265 (5 self)
 Add to MetaCart
In this article, we give an overview of efficient algorithms for the approximate and exact nearest neighbor problem. The goal is to preprocess a dataset of objects (e.g., images) so that later, given a new query object, one can quickly return the dataset object that is most similar to the query. The problem is of significant interest in a wide variety of areas.
Navigating nets: Simple algorithms for proximity search (Extended Abstract)
, 2004
"... Robert Krauthgamer # James R. Lee + Abstract We present a simple deterministic data structure for maintaining a set S of points in a general metric space, while supporting proximity search (nearest neighbor and range queries) and updates to S (insertions and deletions). Our data structure consists ..."
Abstract

Cited by 127 (13 self)
 Add to MetaCart
Robert Krauthgamer # James R. Lee + Abstract We present a simple deterministic data structure for maintaining a set S of points in a general metric space, while supporting proximity search (nearest neighbor and range queries) and updates to S (insertions and deletions). Our data structure consists of a sequence of progressively finer #nets of S, with pointers that allow us to navigate easily from one scale to the next.
Efficient Similarity Search and Classification Via Rank Aggregation
 In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data
, 2003
"... We propose a novel approach to performing efficient similarity search and classification in high dimensional data. In this framework, the database elements are vectors in a Euclidean space. Given a query vector in the same space, the goal is to find elements of the database that are similar to the ..."
Abstract

Cited by 123 (5 self)
 Add to MetaCart
We propose a novel approach to performing efficient similarity search and classification in high dimensional data. In this framework, the database elements are vectors in a Euclidean space. Given a query vector in the same space, the goal is to find elements of the database that are similar to the query. In our approach, a small number of independent "voters" rank the database elements based on similarity to the query. These rankings are then combined by a highly efficient aggregation algorithm. Our methodology leads both to techniques for computing approximate nearest neighbors and to a conceptually rich alternative to nearest neighbors.
Approximate Nearest Neighbors and the Fast JohnsonLindenstrauss Transform
 STOC'06
, 2006
"... We introduce a new lowdistortion embedding of ℓ d 2 into O(log n) ℓp (p = 1, 2), called the FastJohnsonLindenstraussTransform. The FJLT is faster than standard random projections and just as easy to implement. It is based upon the preconditioning of a sparse projection matrix with a randomized F ..."
Abstract

Cited by 103 (8 self)
 Add to MetaCart
We introduce a new lowdistortion embedding of ℓ d 2 into O(log n) ℓp (p = 1, 2), called the FastJohnsonLindenstraussTransform. The FJLT is faster than standard random projections and just as easy to implement. It is based upon the preconditioning of a sparse projection matrix with a randomized Fourier transform. Sparse random projections are unsuitable for lowdistortion embeddings. We overcome this handicap by exploiting the “Heisenberg principle” of the Fourier transform, ie, its localglobal duality. The FJLT can be used to speed up search algorithms based on lowdistortion embeddings in ℓ1 and ℓ2. We consider the case of approximate nearest neighbors in ℓ d 2. We provide a faster algorithm using classical projections, which we then further speed up by plugging in the FJLT. We also give a faster algorithm for searching over the hypercube.
Secure multiparty computation of approximations
, 2001
"... Approximation algorithms can sometimes provide efficient solutions when no efficient exact computation is known. In particular, approximations are often useful in a distributed setting where the inputs are held by different parties and may be extremely large. Furthermore, for some applications, the ..."
Abstract

Cited by 102 (26 self)
 Add to MetaCart
Approximation algorithms can sometimes provide efficient solutions when no efficient exact computation is known. In particular, approximations are often useful in a distributed setting where the inputs are held by different parties and may be extremely large. Furthermore, for some applications, the parties want to compute a function of their inputs securely, without revealing more information than necessary. In this work we study the question of simultaneously addressing the above efficiency and security concerns via what we call secure approximations. We start by extending standard definitions of secure (exact) computation to the setting of secure approximations. Our definitions guarantee that no additional information is revealed by the approximation beyond what follows from the output of the function being approximated. We then study the complexity of specific secure approximation problems. In particular, we obtain a sublinearcommunication protocol for securely approximating the Hamming distance and a polynomialtime protocol for securely approximating the permanent and related #Phard problems. 1
Mean Shift Based Clustering in High Dimensions: A Texture Classification Example
, 2003
"... Feature space analysis is the main module in many computer vision tasks. The most popular technique, kmeans clustering, however, has two inherent limitations: the clusters are constrained to be spherically symmetric and their number has to be known a priori. In nonparametric clustering methods, lik ..."
Abstract

Cited by 100 (2 self)
 Add to MetaCart
(Show Context)
Feature space analysis is the main module in many computer vision tasks. The most popular technique, kmeans clustering, however, has two inherent limitations: the clusters are constrained to be spherically symmetric and their number has to be known a priori. In nonparametric clustering methods, like the one based on mean shift, these limitations are eliminated but the amount of computation becomes prohibitively large as the dimension of the space increases. We exploit a recently proposed approximation technique, localitysensitive hashing (LSH), to reduce the computational complexity of adaptive mean shift. In our implementation of LSH the optimal parameters of the data structure are determined by a pilot learning procedure, and the partitions are data driven. As an application, the performance of mode and kmeans based textons are compared in a texture classification study.
A Replacement for Voronoi Diagrams of Near Linear Size
 In Proc. 42nd Annu. IEEE Sympos. Found. Comput. Sci
, 2001
"... For a set P of n points in R^d, we define a new type of space decomposition. The new diagram provides an εapproximation to the distance function associated with the Voronoi diagram of P, while being of near linear size, for d ≥ 2. This contrasts with the standard Voronoi diagram ..."
Abstract

Cited by 91 (6 self)
 Add to MetaCart
(Show Context)
For a set P of n points in R^d, we define a new type of space decomposition. The new diagram provides an &epsilon;approximation to the distance function associated with the Voronoi diagram of P, while being of near linear size, for d &ge; 2. This contrasts with the standard Voronoi diagram that has complexity &Omega;(n^&lceil;d/2&rceil;) in the worst case.
An investigation of practical approximate nearest neighbor algorithms
, 2004
"... This paper concerns approximate nearest neighbor searching algorithms, which have become increasingly important, especially in high dimensional perception areas such as computer vision, with dozens of publications in recent years. Much of this enthusiasm is due to a successful new approximate neares ..."
Abstract

Cited by 86 (2 self)
 Add to MetaCart
(Show Context)
This paper concerns approximate nearest neighbor searching algorithms, which have become increasingly important, especially in high dimensional perception areas such as computer vision, with dozens of publications in recent years. Much of this enthusiasm is due to a successful new approximate nearest neighbor approach called Locality Sensitive Hashing (LSH). In this paper we ask the question: can earlier spatial data structure approaches to exact nearest neighbor, such as metric trees, be altered to provide approximate answers to proximity queries and if so, how? We introduce a new kind of metric tree that allows overlap: certain datapoints may appear in both the children of a parent. We also introduce new approximate kNN search algorithms on this structure. We show why these structures should be able to exploit the same randomprojectionbased approximations that LSH enjoys, but with a simpler algorithm and perhaps with greater efficiency. We then provide a detailed empirical evaluation on five large, high dimensional datasets which show up to 31fold accelerations over LSH. This result holds true throughout the spectrum of approximation levels.
Sublinear Time Algorithms for Metric Space Problems
"... In this paper we give approximation algorithms for the following problems on metric spaces: Furthest Pair, k median, Minimum Routing Cost Spanning Tree, Multiple Sequence Alignment, Maximum Traveling Salesman Problem, Maximum Spanning Tree and Average Distance. The key property of our algorithms i ..."
Abstract

Cited by 82 (2 self)
 Add to MetaCart
In this paper we give approximation algorithms for the following problems on metric spaces: Furthest Pair, k median, Minimum Routing Cost Spanning Tree, Multiple Sequence Alignment, Maximum Traveling Salesman Problem, Maximum Spanning Tree and Average Distance. The key property of our algorithms is that their running time is linear in the number of metric space points. As the full specification o`f an npoint metric space is of size \Theta(n 2 ), the complexity of our algorithms is sublinear with respect to the input size. All previous algorithms (exact or approximate) for the problems we consider have running time\Omega\Gamma n 2 ). We believe that our techniques can be applied to get similar bounds for other problems. 1 Introduction In recent years there has been a dramatic growth of interest in algorithms operating on massive data sets. This poses new challenges for algorithm design, as algorithms quite efficient on small inputs (for example, having quadratic running time) ...