Cover trees for nearest neighbor
 In Proceedings of the 23rd international conference on Machine learning
, 2006
Cited by 139 (0 self)
We present a tree data structure for fast nearest neighbor operations in general n-point metric spaces. The data structure requires O(n) space regardless of the metric’s structure. If the point set has an expansion constant c in the sense of Karger and Ruhl [KR02], the data structure can be constructed in O(c^6 n log n) time. Nearest neighbor queries obeying the expansion bound require O(c^12 log n) time. In addition, the nearest neighbor of O(n) points can be queried in O(c^16 n) time. We experimentally test the algorithm, showing speedups over brute-force search varying between 1 and 2000 on natural machine learning datasets.
Nearest-neighbor searching and metric space dimensions
 In Nearest-Neighbor Methods for Learning and Vision: Theory and Practice
, 2006
Cited by 87 (0 self)
Given a set S of n sites (points), and a distance measure d, the nearest neighbor searching problem is to build a data structure so that given a query point q, the site nearest to q can be found quickly. This paper gives a data structure for this problem; the data structure is built using the distance function as a “black box”. The structure is able to speed up nearest neighbor searching in a variety of settings, for example: points in low-dimensional or structured Euclidean space, strings under Hamming and edit distance, and bit vector data from an OCR application. The data structures are observed to need linear space, with a modest constant factor. The preprocessing time needed per site is observed to match the query time. The data structure can be viewed as an application of a “kd-tree” approach in the metric space setting, using Voronoi regions of a subset in place of axis-aligned boxes.
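The problem statement in the entry above can be made concrete with the brute-force baseline that such data structures aim to beat. The following is only a minimal sketch, not the paper's data structure: the names `nearest_site` and `hamming` are hypothetical, and the distance is used purely as a black box, one call per site.

```python
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def hamming(a: str, b: str) -> int:
    """Hamming distance between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def nearest_site(sites: Sequence[T], q: T,
                 d: Callable[[T, T], float]) -> Tuple[T, float]:
    """Linear scan: evaluate the black-box distance d(q, s) for every site s."""
    if not sites:
        raise ValueError("need at least one site")
    best = min(sites, key=lambda s: d(q, s))
    return best, d(q, best)


if __name__ == "__main__":
    # Hypothetical toy data: strings compared under Hamming distance.
    print(nearest_site(["karolin", "kerstin"], "kathrin", hamming))
```

The scan costs n distance evaluations per query; any metric (edit distance, Euclidean distance, and so on) can be passed in place of `hamming` without changing the search code.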
Distance Estimation and Object Location via Rings of Neighbors
 In 24th Annual ACM Symposium on Principles of Distributed Computing (PODC)
, 2005
Cited by 64 (4 self)
We consider four problems on distance estimation and object location which share the common flavor of capturing global information via informative node labels: low-stretch routing schemes [47], distance labeling [24], searchable small worlds [30], and triangulation-based distance estimation [33]. Focusing on metrics of low doubling dimension, we approach these problems with a common technique called rings of neighbors, which refers to a sparse distributed data structure that underlies all our constructions. Apart from improving the previously known bounds for these problems, our contributions include extending Kleinberg’s small world model to doubling metrics, and a short proof of the main result in Chan et al. [14]. Doubling dimension is a notion of dimensionality for general metrics that has recently become a useful algorithmic concept in the theoretical computer science literature.
On Hierarchical Routing in Doubling Metrics
, 2005
Cited by 58 (8 self)
We study the problem of routing in doubling metrics, and show how to perform hierarchical routing in such metrics with small stretch and compact routing tables (i.e., with small amount of routing information stored at each vertex). We say that a metric (X, d) has doubling dimension dim(X) at most α if every set of diameter D can be covered by 2^α sets of diameter D/2. (A doubling metric is one whose doubling dimension dim(X) is a constant.) We show how to perform (1 + τ)-stretch routing on metrics for any 0 < τ ≤ 1 with routing tables of size at most (α/τ)^O(α) log^2 ∆ bits with only (α/τ)^O(α) log ∆ entries, where ∆ is the diameter of the graph; hence the number of routing table entries is just τ^(−O(1)) log ∆ for doubling metrics. These results extend and improve on those of Talwar (2004). We also give better constructions of sparse spanners for doubling metrics than those obtained from the routing tables above; for τ > 0, we give algorithms to construct (1 + τ)-stretch spanners for a metric (X, d) with maximum degree at most (2 + 1/τ)^O(dim(X)), matching the results of Das et al. for Euclidean metrics.
Distributed Approaches to Triangulation and Embedding
 In Proceedings 16th ACM-SIAM Symposium on Discrete Algorithms (SODA)
, 2005
Cited by 30 (6 self)
A number of recent papers in the networking community study the distance matrix defined by the node-to-node latencies in the Internet and, in particular, provide a number of quite successful distributed approaches that embed this distance into a low-dimensional Euclidean space. In such algorithms it is feasible to measure distances among only a linear or near-linear number of node pairs; the rest of the distances are simply not available. Moreover, for applications it is desirable to spread the load evenly among the participating nodes. Indeed, several recent studies use this “fully distributed” approach and achieve, empirically, a low distortion for all but a small fraction of node pairs. This is concurrent with the large body of theoretical work on metric embeddings, but there is a fundamental distinction: in the theoretical approaches to metric embeddings, full and centralized access to the distance matrix is assumed and heavily used. In this paper we present the first fully distributed embedding algorithm with provable distortion guarantees for doubling metrics (which have been proposed as a reasonable abstraction of Internet latencies), thus providing some insight into the empirical success of the recent Vivaldi algorithm [7]. The main ingredient of our embedding algorithm is an improved fully distributed algorithm for a more basic problem of triangulation, where the triangle inequality is used to infer the distances that have not been measured; this problem has received considerable attention in the networking community, and has also been studied theoretically in [19]. We use our techniques to extend ε-relaxed embeddings and triangulations to infinite metrics and arbitrary measures, and to improve on the approximate distance labeling scheme of Talwar [36].
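The triangulation idea mentioned in the entry above, inferring an unmeasured distance from the triangle inequality, can be sketched as follows. This is an illustrative sketch only, not the paper's distributed algorithm: the function name `triangulation_bounds` and the notion of shared reference nodes (called "beacons" in the comments) are assumptions introduced for the example.

```python
from typing import Dict, Hashable, Tuple


def triangulation_bounds(du: Dict[Hashable, float],
                         dv: Dict[Hashable, float]) -> Tuple[float, float]:
    """Bound the unmeasured distance d(u, v) from measured distances to beacons.

    du[b] and dv[b] are the measured distances from u and from v to beacon b.
    For every shared beacon b, the triangle inequality gives
        |d(u, b) - d(v, b)| <= d(u, v) <= d(u, b) + d(v, b).
    """
    common = du.keys() & dv.keys()
    if not common:
        raise ValueError("u and v share no beacon")
    lower = max(abs(du[b] - dv[b]) for b in common)
    upper = min(du[b] + dv[b] for b in common)
    return lower, upper


if __name__ == "__main__":
    # Hypothetical latencies from two nodes to two shared beacons.
    du = {"b1": 3.0, "b2": 10.0}
    dv = {"b1": 4.0, "b2": 5.0}
    print(triangulation_bounds(du, dv))
```

The ratio between the upper and lower bound indicates how well the chosen beacons pin down the distance; a pair with a ratio close to 1 is estimated almost exactly.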
Searching dynamic point sets in spaces with bounded doubling dimension
 In The thirty-eighth annual ACM symposium on Theory of computing (STOC)
, 2006
Cited by 29 (8 self)
We present a new data structure that facilitates approximate nearest neighbor searches on a dynamic set of points in a metric space that has a bounded doubling dimension. Our data structure has linear size and supports insertions and deletions in O(log n) time, and finds a (1 + ε)-approximate nearest neighbor in time O(log n) + (1/ε)^O(1). The search and update times hide multiplicative factors that depend on the doubling dimension; the space does not. These performance times are independent of the aspect ratio (or spread) of the points. Categories and Subject Descriptors: F.2.2 [Nonnumerical Algorithms and Problems]: Sorting and searching, computations on discrete structures; E.1 [Data Structures]: Graphs and networks, trees.
Metric embeddings with relaxed guarantees
 In Proceedings of the 46th IEEE Symposium on Foundations of Computer Science
, 2005
Cited by 23 (3 self)
We consider the problem of embedding finite metrics with slack: we seek to produce embeddings with small dimension and distortion while allowing a (small) constant fraction of all distances to be arbitrarily distorted. This definition is motivated by recent research in the networking community, which achieved striking empirical success at embedding Internet latencies with low distortion into low-dimensional Euclidean space, provided that some small slack is allowed. Answering an open question of Kleinberg, Slivkins, and Wexler [29], we show that provable guarantees of this type can in fact be achieved in general: any finite metric can be embedded, with constant slack and constant distortion, into constant-dimensional Euclidean space. We then show that there exist stronger embeddings into ℓ1 which exhibit
Optimal-stretch name-independent compact routing in doubling metrics
 In PODC
, 2006
Cited by 19 (2 self)
We consider the problem of name-independent routing in doubling metrics. A doubling metric is a metric space whose doubling dimension is a constant, where the doubling dimension of a metric space is the least value α such that any ball of radius r can be covered by at most 2^α balls of radius r/2. Given any δ > 0 and a weighted undirected network G whose shortest path metric d is a doubling metric with doubling dimension α, we present a name-independent routing scheme for G with (9 + δ)-stretch, (2 + 1/δ)^O(α) (log ∆)^2 (log n)-bit routing information at each node, and packet headers of size O(log n), where ∆ is the ratio of the largest to the smallest shortest path distance in G. In addition, we prove that for any ε ∈ (0, 8), there is a doubling metric network G with n nodes, doubling dimension α ≤ 6 − log ε, and ∆ = O(2^(1/ε) n) such that any name-independent routing scheme on G with routing information at each node of size o(n^((ε/60)^2)) bits has stretch larger than 9 − ε. Therefore, assuming that ∆ is bounded by a polynomial in n, our algorithm basically achieves optimal stretch for name-independent routing in doubling metrics with packet header size and routing information at each node both bounded by a polylogarithmic function of n.
Small hop-diameter sparse spanners for doubling metrics
 In SODA ’06: Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm
, 2006
Cited by 18 (3 self)
Given a metric M = (V, d), a graph G = (V, E) is a t-spanner for M if every pair of nodes in V has a “short” path (i.e., of length at most t times their actual distance) between them in the spanner. Furthermore, this spanner has a hop diameter bounded by D if every such short path also uses at most D edges. We consider the problem of constructing sparse (1 + ε)-spanners with small hop diameter for metrics of low doubling dimension. In this paper, we show that given any metric with constant doubling dimension k, and any 0 < ε < 1, one can find a (1 + ε)-spanner for the metric with nearly linear number of edges (i.e., only O(n log* n + n ε^(−O(k))) edges) and a constant hop diameter, and also a (1 + ε)-spanner with linear number of edges (i.e., only n ε^(−O(k)) edges) which achieves a hop diameter that grows like the functional inverse of Ackermann’s function. Moreover, we prove that such tradeoffs between the number of edges and the hop diameter are asymptotically optimal.
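The two definitions in the entry above (t-spanner and hop diameter) can be checked directly on a small example. The sketch below is a verifier under stated assumptions, not a construction from the paper: the names `hop_limited_distances` and `is_spanner` are hypothetical, edges are undirected and weighted by the metric d, and the check asks whether every pair has a path of at most `max_hops` edges and length at most t times its distance.

```python
import math
from typing import Callable, Dict, Hashable, Sequence, Tuple

Node = Hashable
Edge = Tuple[Node, Node]


def hop_limited_distances(nodes: Sequence[Node], edges: Sequence[Edge],
                          d: Callable[[Node, Node], float],
                          source: Node, max_hops: int) -> Dict[Node, float]:
    """Shortest path lengths from source using at most max_hops edges.

    Runs max_hops rounds of Bellman-Ford relaxation; each round works on a
    copy so that round i only extends paths of at most i - 1 edges.
    """
    dist = {v: math.inf for v in nodes}
    dist[source] = 0.0
    for _ in range(max_hops):
        new = dist.copy()
        for u, v in edges:
            w = d(u, v)
            new[v] = min(new[v], dist[u] + w)
            new[u] = min(new[u], dist[v] + w)
        dist = new
    return dist


def is_spanner(nodes: Sequence[Node], edges: Sequence[Edge],
               d: Callable[[Node, Node], float],
               t: float, max_hops: int) -> bool:
    """True if every pair has a path of <= max_hops edges and length <= t * d."""
    for u in nodes:
        dist = hop_limited_distances(nodes, edges, d, u, max_hops)
        for v in nodes:
            if v != u and dist[v] > t * d(u, v) + 1e-12:
                return False
    return True


if __name__ == "__main__":
    # Hypothetical toy metric: four points on a line.
    pts = [0, 1, 2, 3]
    line = lambda a, b: abs(a - b)
    path = [(0, 1), (1, 2), (2, 3)]
    print(is_spanner(pts, path, line, 1.0, 3))             # path graph, 3 hops allowed
    print(is_spanner(pts, path + [(0, 3)], line, 1.0, 2))  # one shortcut edge, 2 hops
```

On the toy line metric, adding a single shortcut edge lowers the hop diameter from 3 to 2, which mirrors the trade-off between edge count and hop diameter that the abstract describes.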
Towards small world emergence
 In Proceedings of 18th ACM Symposium on Parallelism in Algorithms and Architectures
, 2006
Cited by 18 (3 self)
We investigate the problem of optimizing the routing performance of a virtual network by adding extra random links. Our asynchronous and distributed algorithm ensures, by adding a single extra link per node, that the resulting network is a navigable small world, i.e., one in which greedy routing, using the distance in the original network, computes paths of polylogarithmic length between any pair of nodes with probability 1 − O(1/n). Previously known small world augmentation processes require global knowledge of the network and centralized computations, which is unrealistic for large decentralized networks. Our algorithm, based on a careful multilayer sampling of the nodes and the construction of a light overlay network, bypasses these limitations. For bounded growth graphs, i.e., graphs where, for any node u and any radius r, the number of nodes within distance 2r from u is at most a constant times the number of nodes within distance r, our augmentation process proceeds with high probability in O(log n log D) communication rounds, with O(log n log D) messages of size O(log n) bits sent per node and requiring only O(log n log D) bits of space in each node, where n is the number of nodes, and D the diameter. In particular, with knowledge only of the original distances, greedy routing computes,