Results 1  10
of
88
Cover trees for nearest neighbor
 In Proceedings of the 23rd international conference on Machine learning
, 2006
"... ABSTRACT. We present a tree data structure for fast nearest neighbor operations in generalpoint metric spaces. The data structure requires space regardless of the metric’s structure. If the point set has an expansion constant � in the sense of Karger and Ruhl [KR02], the data structure can be const ..."
Abstract

Cited by 143 (0 self)
 Add to MetaCart
ABSTRACT. We present a tree data structure for fast nearest neighbor operations in generalpoint metric spaces. The data structure requires space regardless of the metric’s structure. If the point set has an expansion constant � in the sense of Karger and Ruhl [KR02], the data structure can be constructed in � time. Nearest neighbor queries obeying the expansion bound require � time. In addition, the nearest neighbor of points can be queried in time. We experimentally test the algorithm showing speedups over the brute force search varying between 1 and 2000 on natural machine learning datasets. 1.
Fast construction of nets in lowdimensional metrics and their applications
 SIAM Journal on Computing
, 2006
"... We present a near linear time algorithm for constructing hierarchical nets in finite metric spaces with constant doubling dimension. This datastructure is then applied to obtain improved algorithms for the following problems: approximate nearest neighbor search, wellseparated pair decomposition, s ..."
Abstract

Cited by 98 (11 self)
 Add to MetaCart
We present a near linear time algorithm for constructing hierarchical nets in finite metric spaces with constant doubling dimension. This datastructure is then applied to obtain improved algorithms for the following problems: approximate nearest neighbor search, wellseparated pair decomposition, spanner construction, compact representation scheme, doubling measure, and computation of the (approximate) Lipschitz constant of a function. In all cases, the running (preprocessing) time is near linear and the space being used is linear. 1
An investigation of practical approximate nearest neighbor algorithms
, 2004
"... This paper concerns approximate nearest neighbor searching algorithms, which have become increasingly important, especially in high dimensional perception areas such as computer vision, with dozens of publications in recent years. Much of this enthusiasm is due to a successful new approximate neares ..."
Abstract

Cited by 82 (2 self)
 Add to MetaCart
This paper concerns approximate nearest neighbor searching algorithms, which have become increasingly important, especially in high dimensional perception areas such as computer vision, with dozens of publications in recent years. Much of this enthusiasm is due to a successful new approximate nearest neighbor approach called Locality Sensitive Hashing (LSH). In this paper we ask the question: can earlier spatial data structure approaches to exact nearest neighbor, such as metric trees, be altered to provide approximate answers to proximity queries and if so, how? We introduce a new kind of metric tree that allows overlap: certain datapoints may appear in both the children of a parent. We also introduce new approximate kNN search algorithms on this structure. We show why these structures should be able to exploit the same randomprojectionbased approximations that LSH enjoys, but with a simpler algorithm and perhaps with greater efficiency. We then provide a detailed empirical evaluation on five large, high dimensional datasets which show up to 31fold accelerations over LSH. This result holds true throughout the spectrum of approximation levels.
LocalitySensitive Binary Codes from ShiftInvariant Kernels,” Advances in neural information processing systems
, 2009
"... This paper addresses the problem of designing binary codes for highdimensional data such that vectors that are similar in the original space map to similar binary strings. We introduce a simple distributionfree encoding scheme based on random projections, such that the expected Hamming distance be ..."
Abstract

Cited by 43 (1 self)
 Add to MetaCart
This paper addresses the problem of designing binary codes for highdimensional data such that vectors that are similar in the original space map to similar binary strings. We introduce a simple distributionfree encoding scheme based on random projections, such that the expected Hamming distance between the binary codes of two vectors is related to the value of a shiftinvariant kernel (e.g., a Gaussian kernel) between the vectors. We present a full theoretical analysis of the convergence properties of the proposed scheme, and report favorable experimental performance as compared to a recent stateoftheart method, spectral hashing. 1
Wireless scheduling with power control
 In Proc. 17th European Symposium on Algorithms (ESA
, 2009
"... We consider the scheduling of arbitrary wireless links in the physical model of interference to minimize the time for satisfying all requests. We study here the combined problem of scheduling and power control, where we seek both an assignment of power settings and a partition of the links so that e ..."
Abstract

Cited by 23 (3 self)
 Add to MetaCart
We consider the scheduling of arbitrary wireless links in the physical model of interference to minimize the time for satisfying all requests. We study here the combined problem of scheduling and power control, where we seek both an assignment of power settings and a partition of the links so that each set satisfies the signaltointerferenceplusnoise (SINR) constraints. We give an algorithm that attains an approximation ratio of O(log n · log log Λ), where Λ is the ratio between the longest and the shortest linklength. Under the natural assumption that lengths are represented in binary, this gives the first polylog(n)approximation. The algorithm has the desirable property of using an oblivious power assignment, where the power assigned to a sender depends only on the length of the link. We show this dependence on Λ to be unavoidable, giving a construction for which any oblivious power assignment results in a Ω(log log Λ)approximation. We also give a simple online algorithm that yields a O(log Λ)approximation, by a reduction to the coloring of unitdisc graphs. In addition, we obtain improved approximation for a bidirectional variant of the scheduling problem, give partial answers to questions about the utility of graphs for modeling physical interference, and generalize the setting from the standard 2dimensional Euclidean plane to doubling metrics. 1
Reconstruction using witness complexes
 in Proc. 18th ACMSIAM Sympos. on Discrete Algorithms
, 2007
"... We present a novel reconstruction algorithm that, given an input point set sampled from an object S, builds a oneparameter family of complexes that approximate S at different scales. At a high level, our method is very similar in spirit to Chew’s surface meshing algorithm, with one notable differen ..."
Abstract

Cited by 20 (9 self)
 Add to MetaCart
We present a novel reconstruction algorithm that, given an input point set sampled from an object S, builds a oneparameter family of complexes that approximate S at different scales. At a high level, our method is very similar in spirit to Chew’s surface meshing algorithm, with one notable difference though: the restricted Delaunay triangulation is replaced by the witness complex, which makes our algorithm applicable in any metric space. To prove its correctness on curves and surfaces, we highlight the relationship between the witness complex and the restricted Delaunay triangulation in 2d and in 3d. Specifically, we prove that both complexes are equal in 2d and closely related in 3d, under some mild sampling assumptions. 1
Disorder inequality: A combinatorial approach to nearest neighbor search
 In WSDM’08
"... We say that an algorithm for nearest neighbor search is combinatorial if only direct comparisons between two pairwise similarity values are allowed. Combinatorial algorithms for nearest neighbor search have two important advantages: (1) they do not map similarity values to artificial distance values ..."
Abstract

Cited by 17 (4 self)
 Add to MetaCart
We say that an algorithm for nearest neighbor search is combinatorial if only direct comparisons between two pairwise similarity values are allowed. Combinatorial algorithms for nearest neighbor search have two important advantages: (1) they do not map similarity values to artificial distance values and do not use the triangle inequality for the latter, and (2) they work for arbitrarily complicated data representations and similarity functions. In this paper we introduce a special property of the similarity function on a set S that leads to efficient combinatorial algorithms for S. The disorder constant D(S) of a set S is defined to ensure the following inequality: if x is the a’th most similar object to z and y is the b’th most similar object to z, then x is among the D(S) · (a + b) most similar objects to y. Assuming that disorder is small we present the first two known combinatorial algorithms for nearest neighbors whose query time has logarithmic dependence on the size of S. The first one, called Ranwalk, is a randomized zeroerror algorithm that always returns the exact nearest neighbor. It uses space quadratic in the input size in preprocessing, but is very efficient in query processing. The second algorithm, called Arwalk, uses nearlinear space. It uses random choices in preprocessing, but the query processing is essentially deterministic. For an arbitrary query q, there is only a small probability that the chosen data structure does not support q. Finally, we show that for the Reuters corpus average disorder is indeed quite small and that Ranwalk efficiently computes the nearest neighbor in most cases.
Clustering Billions of Images with Large Scale Nearest Neighbor Search
 in IEEE Workshop on Applications of Computer Vision
, 2007
"... The proliferation of the web and digital photography have made large scale image collections containing billions of images a reality. Image collections on this scale make performing even the most common and simple computer vision, image processing, and machine learning tasks nontrivial. An example ..."
Abstract

Cited by 17 (0 self)
 Add to MetaCart
The proliferation of the web and digital photography have made large scale image collections containing billions of images a reality. Image collections on this scale make performing even the most common and simple computer vision, image processing, and machine learning tasks nontrivial. An example is nearest neighbor search, which not only serves as a fundamental subproblem in many more sophisticated algorithms, but also has direct applications, such as image retrieval and image clustering. In this paper, we address the nearest neighbor problem as the first step towards scalable image processing. We describe a scalable version of an approximate nearest neighbor search algorithm and discuss how it can be used to find near duplicates among over a billion images. 1.
Ramsey partitions and proximity data structures
 J. European Math. Soc 9
"... This paper addresses two problems lying at the intersection of geometric analysis and theoretical computer science: The nonlinear isomorphic Dvoretzky theorem and the design of good approximate distance oracles for large distortion. We introduce the notion of Ramsey partitions of a finite metric sp ..."
Abstract

Cited by 13 (2 self)
 Add to MetaCart
This paper addresses two problems lying at the intersection of geometric analysis and theoretical computer science: The nonlinear isomorphic Dvoretzky theorem and the design of good approximate distance oracles for large distortion. We introduce the notion of Ramsey partitions of a finite metric space, and show that the existence of good Ramsey partitions implies a solution to the metric Ramsey problem for large distortion (a.k.a. the nonlinear version of the isomorphic Dvoretzky theorem, as introduced by Bourgain, Figiel, and Milman in [8]). We then proceed to construct optimal Ramsey partitions, and use them to show that for everyε∈(0, 1), any npoint metric space has a subset of size n 1−ε which embeds into Hilbert space with distortion O(1/ε). This result is best possible and improves part of the metric Ramsey theorem of Bartal, Linial, Mendel and Naor [5], in addition to considerably simplifying its proof. We use our new Ramsey partitions to design the best known approximate distance oracles when the distortion is large, closing a gap left open by Thorup and Zwick in [31]. Namely, we show that for any n point metric space X, and k≥1, there exists an O(k)approximate distance oracle whose storage requirement is O ( n 1+1/k) , and whose query time is a universal constant. We also discuss applications of Ramsey partitions to various other geometric data structure problems, such as the design of efficient data structures for approximate ranking.
Combinatorial algorithms for nearest neighbors, nearduplicates and smallworld design
 In Proceedings of the 20th Annual ACMSIAM Symposium on Discrete Algorithms, SODA’09
, 2009
"... We study the so called combinatorial framework for algorithmic problems in similarity spaces. Namely, the input dataset is represented by a comparison oracle that given three points x, y, y ′ answers whether y or y ′ is closer to x. We assume that the similarity order of the dataset satisfies the fo ..."
Abstract

Cited by 11 (1 self)
 Add to MetaCart
We study the so called combinatorial framework for algorithmic problems in similarity spaces. Namely, the input dataset is represented by a comparison oracle that given three points x, y, y ′ answers whether y or y ′ is closer to x. We assume that the similarity order of the dataset satisfies the four variations of the following disorder inequality: if x is the a’th most similar object to y and y is the b’th most similar object to z, then x is among the D(a + b) most similar objects to z, where D is a relatively small disorder constant. Though the oracle gives much less information compared to the standard general metric space model where distance values are given, one can still design very efficient algorithms for various fundamental computational tasks. For nearest neighbor search we present deterministic and exact algorithm with almost linear time and space complexity of preprocessing, and nearlogarithmic time complexity of search. Then, for nearduplicate detection we present the first known deterministic algorithm that requires just nearlinear time + time proportional to the size of output. Finally, we show that for any dataset satisfying the disorder inequality a visibility graph can be constructed: all outdegrees are nearlogarithmic and greedy routing deterministically converges to the nearest neighbor of a target in logarithmic number of steps. The later result is the first known workaround for Navarro’s impossibility of generalizing Delaunay graphs. The technical contribution of the paper consists of handling “false positives ” in data structures and an algorithmic technique upasidedownfilter.