Cover trees for nearest neighbor
- ICML, 2006
- Cited by 85 (0 self)
We present a tree data structure for fast nearest neighbor operations in general n-point metric spaces (where the data set consists of n points). The data structure requires O(n) space regardless of the metric’s structure yet maintains all performance properties of a navigating net (Krauthgamer & Lee, 2004b). If the point set has a bounded expansion constant c, which is a measure of the intrinsic dimensionality, as defined in (Karger & Ruhl, 2002), the cover tree data structure can be constructed in O(c^6 n log n) time. Furthermore, nearest neighbor queries require time only logarithmic in n, in particular O(c^{12} log n) time. Our experimental results show speedups over the brute force search varying between one and several orders of magnitude on natural machine learning datasets.
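The expansion constant c that appears in these bounds can be illustrated on a small finite point set. The following is a minimal brute-force sketch, assuming the usual reading of the Karger & Ruhl definition (the smallest c with |B(p, 2r)| ≤ c |B(p, r)| for every point p and radius r, using closed balls). The function name `expansion_constant` is hypothetical and the code is unrelated to the cover tree implementation itself.

```python
def expansion_constant(points, dist):
    """Brute-force estimate of the expansion constant of a finite point set.

    Assumption: closed balls B(p, r) = {q : dist(p, q) <= r}. The ratio
    |B(p, 2r)| / |B(p, r)| can only increase when the outer ball gains a
    point, i.e. at r = d / 2 for some distance d from p, so only those
    radii are checked.
    """
    worst = 1.0
    for p in points:
        dists = [dist(p, q) for q in points]
        for d in dists:
            if d == 0:
                continue  # skip p itself (and exact duplicates of p)
            r = d / 2
            inner = sum(1 for x in dists if x <= r)      # |B(p, r)|, at least 1
            outer = sum(1 for x in dists if x <= 2 * r)  # |B(p, 2r)|
            worst = max(worst, outer / inner)
    return worst


# Four evenly spaced points on a line.
line = [0, 1, 2, 3]
c = expansion_constant(line, lambda a, b: abs(a - b))
```

This scan costs O(n^3) distance comparisons, so it is only meant for inspecting small samples, not as part of a search structure.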
Nearest-neighbor searching and metric space dimensions
- In Nearest-Neighbor Methods for Learning and Vision: Theory and Practice, 2006
- Cited by 63 (0 self)
Given a set S of n sites (points), and a distance measure d, the nearest neighbor searching problem is to build a data structure so that given a query point q, the site nearest to q can be found quickly. This paper gives a data structure for this problem; the data structure is built using the distance function as a “black box”. The structure is able to speed up nearest neighbor searching in a variety of settings, for example: points in low-dimensional or structured Euclidean space, strings under Hamming and edit distance, and bit vector data from an OCR application. The data structures are observed to need linear space, with a modest constant factor. The preprocessing time needed per site is observed to match the query time. The data structure can be viewed as an application of a “kd-tree” approach in the metric space setting, using Voronoi regions of a subset in place of axis-aligned boxes.
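To make the "black box" setting concrete, here is a minimal sketch of the baseline that such a data structure is meant to beat: a linear scan that touches the sites only through the distance function. The names `hamming` and `nearest_site` are illustrative and do not come from the paper.

```python
def hamming(a, b):
    """Hamming distance between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def nearest_site(sites, q, dist):
    """Linear-scan nearest neighbor; `dist` is used purely as a black box."""
    return min(sites, key=lambda s: dist(s, q))


sites = ["10110", "00000", "11111"]
best = nearest_site(sites, "00100", hamming)
```

The scan evaluates the distance function once per site for every query; any metric (edit distance, Euclidean distance, and so on) can be passed in place of `hamming` without changing `nearest_site`.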
Distance Estimation and Object Location via Rings of Neighbors
- In 24th Annual ACM Symposium on Principles of Distributed Computing (PODC), 2005
- Cited by 49 (3 self)
We consider four problems on distance estimation and object location which share the common flavor of capturing global information via informative node labels: low-stretch routing schemes [47], distance labeling [24], searchable small worlds [30], and triangulation-based distance estimation [33]. Focusing on metrics of low doubling dimension, we approach these problems with a common technique called rings of neighbors, which refers to a sparse distributed data structure that underlies all our constructions. Apart from improving the previously known bounds for these problems, our contributions include extending Kleinberg’s small world model to doubling metrics, and a short proof of the main result in Chan et al. [14]. Doubling dimension is a notion of dimensionality for general metrics that has recently become a useful algorithmic concept in the theoretical computer science literature.
On Hierarchical Routing in Doubling Metrics
- 2005
- Cited by 49 (8 self)
We study the problem of routing in doubling metrics, and show how to perform hierarchical routing in such metrics with small stretch and compact routing tables (i.e., with small amount of routing information stored at each vertex). We say that a metric (X, d) has doubling dimension dim(X) at most α if every set of diameter D can be covered by 2^α sets of diameter D/2. (A doubling metric is one whose doubling dimension dim(X) is a constant.) We show how to perform (1 + τ)-stretch routing on metrics for any 0 < τ ≤ 1 with routing tables of size at most (α/τ)^{O(α)} log^2 ∆ bits with only (α/τ)^{O(α)} log ∆ entries, where ∆ is the diameter of the graph; hence the number of routing table entries is just τ^{−O(1)} log ∆ for doubling metrics. These results extend and improve on those of Talwar (2004). We also give better constructions of sparse spanners for doubling metrics than those obtained from the routing tables above; for τ > 0, we give algorithms to construct (1 + τ)-stretch spanners for a metric (X, d) with maximum degree at most (2 + 1/τ)^{O(dim(X))}, matching the results of Das et al. for Euclidean metrics.
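The notion of a (1 + τ)-stretch spanner can be checked directly on a small example. The sketch below measures the stretch of a given edge set against the underlying metric; it does not construct a spanner and is not the paper's algorithm. The function name `spanner_stretch` is hypothetical, the points are assumed to be distinct, and Floyd-Warshall is used only because it is short.

```python
import itertools
import math


def spanner_stretch(points, dist, edges):
    """Largest ratio (shortest-path length in the graph) / (metric distance).

    `edges` is a list of index pairs (i, j); each edge is weighted by the
    metric distance between its endpoints. Assumes pairwise distinct points.
    """
    n = len(points)
    sp = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        sp[i][i] = 0.0
    for i, j in edges:
        w = dist(points[i], points[j])
        sp[i][j] = sp[j][i] = min(sp[i][j], w)
    # Floyd-Warshall all-pairs shortest paths.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if sp[i][k] + sp[k][j] < sp[i][j]:
                    sp[i][j] = sp[i][k] + sp[k][j]
    return max(
        sp[i][j] / dist(points[i], points[j])
        for i, j in itertools.combinations(range(n), 2)
    )


# Corners of the unit square connected only along the four sides.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
sides = [(0, 1), (1, 2), (2, 3), (3, 0)]
stretch = spanner_stretch(square, math.dist, sides)
```

In this example the diagonals must be routed around two sides, so the stretch is 2 / sqrt(2) = sqrt(2); the edge set is therefore a (1 + τ)-stretch spanner only for τ ≥ sqrt(2) − 1.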
Distributed Approaches to Triangulation and Embedding
- In Proceedings 16th ACM-SIAM Symposium on Discrete Algorithms (SODA), 2005
- Cited by 26 (5 self)
A number of recent papers in the networking community study the distance matrix defined by the node-to-node latencies in the Internet and, in particular, provide a number of quite successful distributed approaches that embed this distance into a low-dimensional Euclidean space. In such algorithms it is feasible to measure distances among only a linear or near-linear number of node pairs; the rest of the distances are simply not available. Moreover, for applications it is desirable to spread the load evenly among the participating nodes. Indeed, several recent studies use this 'fully distributed' approach and achieve, empirically, a low distortion for all but a small fraction of node pairs. This is concurrent with the large body of theoretical work on metric embeddings, but there is a fundamental distinction: in the theoretical approaches to metric embeddings, full and centralized access to the distance matrix is assumed and heavily used. In this paper we present the first fully distributed embedding algorithm with provable distortion guarantees for doubling metrics (which have been proposed as a reasonable abstraction of Internet latencies), thus providing some insight into the empirical success of the recent Vivaldi algorithm [7]. The main ingredient of our embedding algorithm is an improved fully distributed algorithm for a more basic problem of triangulation, where the triangle inequality is used to infer the distances that have not been measured; this problem has received considerable attention in the networking community, and has also been studied theoretically in [19]. We use our techniques to extend ε-relaxed embeddings and triangulations to infinite metrics and arbitrary measures, and to improve on the approximate distance labeling scheme of Talwar [36].
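The basic triangulation step, inferring an unmeasured distance from the triangle inequality, can be sketched as follows. This shows only the elementary bound from shared beacons, not the paper's distributed algorithm; the function name `triangulate` and the beacon labels are hypothetical.

```python
def triangulate(dist_u, dist_v):
    """Bound the unmeasured distance d(u, v) from distances to shared beacons.

    `dist_u` and `dist_v` map beacon identifiers to measured distances.
    For every common beacon b the triangle inequality gives
        |d(u, b) - d(v, b)| <= d(u, v) <= d(u, b) + d(v, b),
    so the tightest bounds are the max of the left sides and the min of
    the right sides.
    """
    common = dist_u.keys() & dist_v.keys()
    if not common:
        raise ValueError("u and v share no beacon")
    lower = max(abs(dist_u[b] - dist_v[b]) for b in common)
    upper = min(dist_u[b] + dist_v[b] for b in common)
    return lower, upper


# Nodes at positions 0 and 10 on a line, beacons at positions 3 and 12.
lower, upper = triangulate({"b1": 3.0, "b2": 12.0}, {"b1": 7.0, "b2": 2.0})
```

Here the two bounds coincide at 10, the true distance, because one beacon lies between the nodes and the other lies beyond one of them; in general the gap between `lower` and `upper` depends on how the beacons are placed.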
Metric embeddings with relaxed guarantees
- In Proceedings of the 46th IEEE Symposium on Foundations of Computer Science, 2005
- Cited by 23 (3 self)
We consider the problem of embedding finite metrics with slack: we seek to produce embeddings with small dimension and distortion while allowing a (small) constant fraction of all distances to be arbitrarily distorted. This definition is motivated by recent research in the networking community, which achieved striking empirical success at embedding Internet latencies with low distortion into low-dimensional Euclidean space, provided that some small slack is allowed. Answering an open question of Kleinberg, Slivkins, and Wexler [29], we show that provable guarantees of this type can in fact be achieved in general: any finite metric can be embedded, with constant slack and constant distortion, into constant-dimensional Euclidean space. We then show that there exist stronger embeddings into ℓ1 which exhibit
Searching dynamic point sets in spaces with bounded doubling dimension
- In ACM Symposium on Theory of Computing, 2006
- Cited by 21 (5 self)
We present a new data structure that facilitates approximate nearest neighbor searches on a dynamic set of points in a metric space that has a bounded doubling dimension. Our data structure has linear size and supports insertions and deletions in O(log n) time, and finds a (1 + ε)-approximate nearest neighbor in time O(log n) + (1/ε)^{O(1)}. The search and update times hide multiplicative factors that depend on the doubling dimension; the space does not. These performance times are independent of the aspect ratio (or spread) of the points. Categories and Subject Descriptors: F.2.2 [Nonnumerical Algorithms and Problems]: Sorting and searching, computations on discrete structures; E.1 [Data Structures]: Graphs and networks, trees.
Towards small world emergence
- In Proceedings of 18th ACM Symposium on Parallelism in Algorithms and Architectures, 2006
- Cited by 14 (0 self)
We investigate the problem of optimizing the routing performance of a virtual network by adding extra random links. Our asynchronous and distributed algorithm ensures, by adding a single extra link per node, that the resulting network is a navigable small world, i.e., one in which greedy routing, using the distance in the original network, computes paths of polylogarithmic length between any pair of nodes with probability 1 − O(1/n). Previously known small world augmentation processes require global knowledge of the network and centralized computations, which is unrealistic for large decentralized networks. Our algorithm, based on a careful multi-layer sampling of the nodes and the construction of a light overlay network, bypasses these limitations. For bounded growth graphs, i.e., graphs where, for any node u and any radius r, the number of nodes within distance 2r from u is at most a constant times the number of nodes within distance r, our augmentation process proceeds with high probability in O(log n log D) communication rounds, with O(log n log D) messages of size O(log n) bits sent per node and requiring only O(log n log D) bits of space in each node, where n is the number of nodes, and D the diameter. In particular, with the only knowledge of original distances, greedy routing computes,
Optimal-stretch name-independent compact routing in doubling metrics
- In PODC, 2006
- Cited by 13 (2 self)
We consider the problem of name-independent routing in doubling metrics. A doubling metric is a metric space whose doubling dimension is a constant, where the doubling dimension of a metric space is the least value α such that any ball of radius r can be covered by at most 2^α balls of radius r/2. Given any δ > 0 and a weighted undirected network G whose shortest path metric d is a doubling metric with doubling dimension α, we present a name-independent routing scheme for G with (9 + δ)-stretch, (2 + 1/δ)^{O(α)} (log ∆)^2 (log n)-bit routing information at each node, and packet headers of size O(log n), where ∆ is the ratio of the largest to the smallest shortest path distance in G. In addition, we prove that for any ε ∈ (0, 8), there is a doubling metric network G with n nodes, doubling dimension α ≤ 6 − log ε, and ∆ = O(2^{1/ε} n) such that any name-independent routing scheme on G with routing information at each node of size o(n^{(ε/60)^2}) bits has stretch larger than 9 − ε. Therefore, assuming that ∆ is bounded by a polynomial in n, our algorithm basically achieves optimal stretch for name-independent routing in doubling metrics with packet header size and routing information at each node both bounded by a polylogarithmic function of n.
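The covering condition in the definition of doubling dimension can be explored on a finite point set with a greedy cover. The sketch below is only an illustration under stated assumptions: the function name `greedy_half_radius_cover` is hypothetical, centers are restricted to data points, and the greedy count is an upper bound on the number of half-radius balls needed for one particular ball, not the exact minimum.

```python
def greedy_half_radius_cover(points, dist, center, r):
    """Greedily pick centers so that balls of radius r/2 cover B(center, r).

    A point becomes a new center only if it is farther than r/2 from every
    center chosen so far, so each point of the ball ends up within r/2 of
    some center. len(result) upper-bounds the cover size for this ball.
    """
    ball = [p for p in points if dist(center, p) <= r]
    centers = []
    for p in ball:
        if all(dist(p, c) > r / 2 for c in centers):
            centers.append(p)
    return centers


# Integers 0..8 on a line; cover the ball of radius 4 around the point 4.
line = list(range(9))
centers = greedy_half_radius_cover(line, lambda a, b: abs(a - b), 4, 4)
```

Taking log2 of the largest cover size over many centers and radii gives a rough empirical upper estimate of α for the sample; it says nothing about points outside the sample.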
A doubling dimension threshold Θ(log log n) for augmented graph navigability
- In 14th European Symposium on Algorithms (ESA), LNCS 4168, 2006
- Cited by 12 (3 self)
In his seminal work, Kleinberg showed how to augment meshes using random edges, so that they become navigable; that is, greedy routing computes paths of polylogarithmic expected length between any pair of nodes. This raises the crucial question of determining whether such an augmentation is possible for all graphs. In this paper, we answer this question negatively by exhibiting a threshold on the doubling dimension, above which an infinite family of graphs cannot be augmented to become navigable, whatever the distribution of random edges is. Precisely, it was known that graphs of doubling dimension at most O(log log n) are navigable. We show that for doubling dimension ≫ log log n, an infinite family of graphs cannot be augmented to become navigable. Finally, we complete our result by studying the special case of square meshes, which we prove to always be augmentable to become navigable.
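Greedy routing on an augmented graph can be sketched on the simplest possible base graph. The example below assumes a ring of n nodes (a one-dimensional stand-in for a mesh) and takes the extra long-range links as a fixed dictionary rather than drawing them from any particular distribution; the names `ring_distance` and `greedy_route` are hypothetical.

```python
def ring_distance(u, v, n):
    """Distance between nodes u and v in the original (unaugmented) ring."""
    d = abs(u - v) % n
    return min(d, n - d)


def greedy_route(source, target, n, long_links):
    """Greedy routing on a ring of n nodes with optional long-range links.

    At each step the message moves to the neighbor (ring neighbor or
    long-range contact) that is closest to the target in the ORIGINAL ring
    distance. A ring neighbor always reduces that distance by one, so the
    loop terminates.
    """
    path = [source]
    current = source
    while current != target:
        neighbors = [(current - 1) % n, (current + 1) % n]
        if current in long_links:
            neighbors.append(long_links[current])
        current = min(neighbors, key=lambda v: ring_distance(v, target, n))
        path.append(current)
    return path


with_link = greedy_route(0, 5, 8, {0: 4})  # uses the shortcut 0 -> 4
without_link = greedy_route(0, 5, 8, {})   # walks around the ring
```

Navigability is a statement about the expected length of such paths when the long-range links are random; this sketch only shows the routing rule for one fixed set of links.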

