Results 1 - 10 of 34
Cover trees for nearest neighbor
 In Proceedings of the 23rd international conference on Machine learning
, 2006
Cited by 186 (0 self)
We present a tree data structure for fast nearest neighbor operations in general n-point metric spaces. The data structure requires O(n) space regardless of the metric’s structure. If the point set has an expansion constant c in the sense of Karger and Ruhl [KR02], the data structure can be constructed in O(c^6 n log n) time. Nearest neighbor queries obeying the expansion bound require O(c^12 log n) time. In addition, the nearest neighbor of O(n) points can be queried in O(c^16 n) time. We experimentally test the algorithm, showing speedups over brute-force search varying between 1 and 2000 on natural machine learning datasets.
Meridian: A Lightweight Network Location Service without Virtual Coordinates
 In SIGCOMM
, 2005
Cited by 178 (8 self)
This paper introduces a lightweight, scalable and accurate framework, called Meridian, for performing node selection based on network location. The framework consists of an overlay network structured around multi-resolution rings, query routing with direct measurements, and gossip protocols for dissemination. We show how this framework can be used to address three commonly encountered problems, namely, closest node discovery, central leader election, and locating nodes that satisfy target latency constraints in large-scale distributed systems without having to compute absolute coordinates. We show analytically that the framework is scalable with logarithmic convergence when Internet latencies are modeled as a growth-constrained metric, a low-dimensional Euclidean metric, or a metric of low doubling dimension. Large-scale simulations, based on latency measurements from 6.25 million node pairs, as well as an implementation deployed on PlanetLab, show that the framework is accurate and effective.
Fast construction of nets in low-dimensional metrics and their applications
 SIAM Journal on Computing
, 2006
Cited by 120 (13 self)
We present a near-linear time algorithm for constructing hierarchical nets in finite metric spaces with constant doubling dimension. This data structure is then applied to obtain improved algorithms for the following problems: approximate nearest neighbor search, well-separated pair decomposition, spanner construction, compact representation scheme, doubling measure, and computation of the (approximate) Lipschitz constant of a function. In all cases, the running (preprocessing) time is near linear and the space being used is linear.
Searching dynamic point sets in spaces with bounded doubling dimension
 In the thirty-eighth annual ACM Symposium on Theory of Computing (STOC)
, 2006
Cited by 41 (14 self)
We present a new data structure that facilitates approximate nearest neighbor searches on a dynamic set of points in a metric space that has a bounded doubling dimension. Our data structure has linear size and supports insertions and deletions in O(log n) time, and finds a (1 + ε)-approximate nearest neighbor in time O(log n) + (1/ε)^O(1). The search and update times hide multiplicative factors that depend on the doubling dimension; the space does not. These performance times are independent of the aspect ratio (or spread) of the points. Categories and Subject Descriptors: F.2.2 [Nonnumerical Algorithms and Problems]: Sorting and searching, computations on discrete structures; E.1 [Data Structures]: Graphs and networks, trees.
Distributed approaches to triangulation and embedding
 Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2005)
Cited by 30 (7 self)
A number of recent papers in the networking community study the distance matrix defined by the node-to-node latencies in the Internet and, in particular, provide a number of quite successful distributed approaches that embed this distance into a low-dimensional Euclidean space. In such algorithms it is feasible to measure distances among only a linear or near-linear number of node pairs; the rest of the distances are simply not available. Moreover, for applications it is desirable to spread the load evenly among the participating nodes. Indeed, several recent studies use this 'fully distributed' approach and achieve, empirically, a low distortion for all but a small fraction of node pairs. This is concurrent with the large body of theoretical work on metric embeddings, but there is a fundamental distinction: in the theoretical approaches to metric embeddings, full and centralized access to the distance matrix is assumed and heavily used. In this paper we present the first fully distributed embedding algorithm with provable distortion guarantees for doubling metrics (which have been proposed as a reasonable abstraction of Internet latencies), thus providing some insight into the empirical success of the recent Vivaldi algorithm [5]. The main ingredient of our embedding algorithm is an improved fully distributed algorithm for a more basic problem of triangulation, where the triangle inequality is used to infer the distances that have not been measured; this problem has received considerable attention in the networking community, and has also been studied theoretically in [19]. We use our techniques to extend ε-relaxed embeddings and triangulations to infinite metrics and arbitrary measures, and to improve on the approximate distance labeling scheme of Talwar [33].
Small hop-diameter sparse spanners for doubling metrics
 In SODA ’06: Proceedings of the seventeenth annual ACM-SIAM Symposium on Discrete Algorithms
, 2006
Cited by 22 (3 self)
Given a metric M = (V, d), a graph G = (V, E) is a t-spanner for M if every pair of nodes in V has a “short” path (i.e., of length at most t times their actual distance) between them in the spanner. Furthermore, this spanner has a hop diameter bounded by D if every such short path also uses at most D edges. We consider the problem of constructing sparse (1 + ε)-spanners with small hop diameter for metrics of low doubling dimension. In this paper, we show that given any metric with constant doubling dimension k, and any 0 < ε < 1, one can find a (1 + ε)-spanner for the metric with a nearly linear number of edges (i.e., only O(n log* n + n ε^{-O(k)}) edges) and a constant hop diameter, and also a (1 + ε)-spanner with a linear number of edges (i.e., only n ε^{-O(k)} edges) which achieves a hop diameter that grows like the functional inverse of Ackermann’s function. Moreover, we prove that such tradeoffs between the number of edges and the hop diameter are asymptotically optimal.
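The t-spanner definition in the abstract above can be checked directly on small inputs. The following is a minimal sketch, not the construction from the paper: it assumes distinct points, edges weighted by the metric distance between their endpoints, and uses Floyd-Warshall for shortest paths. The names `spanner_stretch` and `is_t_spanner` are hypothetical, and the hop-diameter condition is not checked.

```python
from itertools import combinations
import math


def spanner_stretch(points, edges, dist):
    """Largest ratio (graph distance / metric distance) over all point pairs.

    points: list of distinct points; edges: list of index pairs (i, j);
    dist: the metric d. Each edge is weighted by the metric distance
    between its endpoints (an assumption of this sketch).
    """
    n = len(points)
    # Graph distances start at infinity, except along the given edges.
    g = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        g[i][i] = 0.0
    for i, j in edges:
        w = dist(points[i], points[j])
        g[i][j] = g[j][i] = min(g[i][j], w)
    # Floyd-Warshall all-pairs shortest paths.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if g[i][k] + g[k][j] < g[i][j]:
                    g[i][j] = g[i][k] + g[k][j]
    return max(g[i][j] / dist(points[i], points[j])
               for i, j in combinations(range(n), 2))


def is_t_spanner(points, edges, dist, t):
    """True if every pair is joined by a path of length at most t times d."""
    return spanner_stretch(points, edges, dist) <= t + 1e-12


# Usage: the 4-cycle on the corners of the unit square under Euclidean distance.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(is_t_spanner(square, cycle, math.dist, 1.5))
```

A disconnected graph yields an infinite stretch, so it is never reported as a t-spanner for finite t.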
Disorder inequality: A combinatorial approach to nearest neighbor search
 In WSDM’08
Cited by 19 (4 self)
We say that an algorithm for nearest neighbor search is combinatorial if only direct comparisons between two pairwise similarity values are allowed. Combinatorial algorithms for nearest neighbor search have two important advantages: (1) they do not map similarity values to artificial distance values and do not use the triangle inequality for the latter, and (2) they work for arbitrarily complicated data representations and similarity functions. In this paper we introduce a special property of the similarity function on a set S that leads to efficient combinatorial algorithms for S. The disorder constant D(S) of a set S is defined to ensure the following inequality: if x is the a’th most similar object to z and y is the b’th most similar object to z, then x is among the D(S) · (a + b) most similar objects to y. Assuming that disorder is small, we present the first two known combinatorial algorithms for nearest neighbors whose query time has logarithmic dependence on the size of S. The first one, called Ranwalk, is a randomized zero-error algorithm that always returns the exact nearest neighbor. It uses space quadratic in the input size in preprocessing, but is very efficient in query processing. The second algorithm, called Arwalk, uses near-linear space. It uses random choices in preprocessing, but the query processing is essentially deterministic. For an arbitrary query q, there is only a small probability that the chosen data structure does not support q. Finally, we show that for the Reuters corpus average disorder is indeed quite small and that Ranwalk efficiently computes the nearest neighbor in most cases.
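The disorder inequality in the abstract above can be evaluated by brute force on a small set. The sketch below is an illustration under stated assumptions, not the paper's algorithm: ranks are taken among the other elements of S (the reference object itself is excluded), ties are broken by index, and only triples of distinct objects are considered. The function names are hypothetical.

```python
from itertools import permutations


def ranks(S, sim):
    """rank[z][x] = position of x (1 = most similar) among the other
    elements of S, ordered by decreasing similarity to S[z]."""
    n = len(S)
    rank = [dict() for _ in range(n)]
    for z in range(n):
        others = [x for x in range(n) if x != z]
        # Ties are broken by index; this is an assumption of the sketch.
        others.sort(key=lambda x: (-sim(S[z], S[x]), x))
        for pos, x in enumerate(others, start=1):
            rank[z][x] = pos
    return rank


def disorder_constant(S, sim):
    """Smallest D with rank_y(x) <= D * (rank_z(x) + rank_z(y))
    for all triples of distinct objects x, y, z (cubic time)."""
    rank = ranks(S, sim)
    return max(rank[y][x] / (rank[z][x] + rank[z][y])
               for x, y, z in permutations(range(len(S)), 3))


# Usage: four points on a line, similarity = negative distance.
points = [0, 1, 2, 3]
print(disorder_constant(points, lambda a, b: -abs(a - b)))
```

Note that the code only compares similarity values against each other when sorting, which matches the combinatorial setting described in the abstract.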
Lower bounds on near neighbor search via metric expansion
 CoRR
Cited by 15 (1 self)
In this paper we show how the complexity of performing nearest neighbor search (NNS) on a metric space is related to the expansion of the metric space. Given a metric space we look at the graph obtained by connecting every pair of points within a certain distance r. We then look at various notions of expansion in this graph, relating them to the cell probe complexity of NNS for randomized and deterministic, exact and approximate algorithms. For example, if the graph has node expansion Φ then we show that any deterministic t-probe data structure for n points must use space S where (St/n)^t > Φ. We show similar results for randomized algorithms as well. These relationships can be used to derive most of the known lower bounds in the well-known metric spaces such as ℓ1, ℓ2, ℓ∞ by simply computing their expansion. In the process, we strengthen and generalize our previous results [19]. Additionally, we unify the approach in [19] and the communication complexity based approach. Our work reduces the problem of proving cell probe lower bounds of near neighbor search to computing the appropriate expansion parameter. In our results, as in all previous results, the dependence on t is weak; that is, the bound drops exponentially in t. We show a much stronger (tight) time-space tradeoff for the class of dynamic low contention data structures. These are data structures that support updates in the data set and that do not look up any single cell too often.
Combinatorial algorithms for nearest neighbors, near-duplicates and small-world design
 In Proceedings of the 20th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA’09
, 2009
Cited by 13 (1 self)
We study the so-called combinatorial framework for algorithmic problems in similarity spaces. Namely, the input dataset is represented by a comparison oracle that, given three points x, y, y′, answers whether y or y′ is closer to x. We assume that the similarity order of the dataset satisfies the four variations of the following disorder inequality: if x is the a’th most similar object to y and y is the b’th most similar object to z, then x is among the D(a + b) most similar objects to z, where D is a relatively small disorder constant. Though the oracle gives much less information compared to the standard general metric space model where distance values are given, one can still design very efficient algorithms for various fundamental computational tasks. For nearest neighbor search we present a deterministic and exact algorithm with almost linear time and space complexity of preprocessing, and near-logarithmic time complexity of search. Then, for near-duplicate detection we present the first known deterministic algorithm that requires just near-linear time + time proportional to the size of output. Finally, we show that for any dataset satisfying the disorder inequality a visibility graph can be constructed: all out-degrees are near-logarithmic and greedy routing deterministically converges to the nearest neighbor of a target in a logarithmic number of steps. The latter result is the first known workaround for Navarro’s impossibility of generalizing Delaunay graphs. The technical contribution of the paper consists of handling “false positives” in data structures and an algorithmic technique, up-aside-down-filter.
A QPTAS for TSP with Fat Weakly Disjoint Neighborhoods in Doubling Metrics
Cited by 8 (0 self)
We consider the Traveling Salesman Problem with Neighborhoods (TSPN) in doubling metrics. The goal is to find a shortest tour that visits each of a collection of