Results 1-10 of 116
Cover trees for nearest neighbor
In Proceedings of the 23rd International Conference on Machine Learning, 2006
Cited by 139 (0 self)
We present a tree data structure for fast nearest neighbor operations in general $n$-point metric spaces. The data structure requires $O(n)$ space regardless of the metric’s structure. If the point set has an expansion constant $c$ in the sense of Karger and Ruhl [KR02], the data structure can be constructed in $O(c^6 n \log n)$ time. Nearest neighbor queries obeying the expansion bound require $O(c^{12} \log n)$ time. In addition, the nearest neighbor of $O(n)$ points can be queried in $O(c^{16} n)$ time. We experimentally test the algorithm, showing speedups over brute-force search varying between 1 and 2000 on natural machine learning datasets.
Meridian: A Lightweight Network Location Service without Virtual Coordinates
In SIGCOMM, 2005
Cited by 139 (7 self)
This paper introduces a lightweight, scalable and accurate framework, called Meridian, for performing node selection based on network location. The framework consists of an overlay network structured around multiresolution rings, query routing with direct measurements, and gossip protocols for dissemination. We show how this framework can be used to address three commonly encountered problems, namely, closest node discovery, central leader election, and locating nodes that satisfy target latency constraints in large-scale distributed systems without having to compute absolute coordinates. We show analytically that the framework is scalable with logarithmic convergence when Internet latencies are modeled as a growth-constrained metric, a low-dimensional Euclidean metric, or a metric of low doubling dimension. Large-scale simulations, based on latency measurements from 6.25 million node pairs, as well as an implementation deployed on PlanetLab, show that the framework is accurate and effective.
Fast construction of nets in low-dimensional metrics and their applications
SIAM Journal on Computing, 2006
Cited by 98 (10 self)
We present a near-linear-time algorithm for constructing hierarchical nets in finite metric spaces with constant doubling dimension. This data structure is then applied to obtain improved algorithms for the following problems: approximate nearest neighbor search, well-separated pair decomposition, spanner construction, compact representation scheme, doubling measure, and computation of the (approximate) Lipschitz constant of a function. In all cases, the running (preprocessing) time is near linear and the space used is linear.
Near-optimal sensor placements: Maximizing information while minimizing communication cost
In IPSN, 2006
Cited by 89 (16 self)
When monitoring spatial phenomena with wireless sensor networks, selecting the best sensor placements is a fundamental task. Not only should the sensors be informative, but they should also be able to communicate efficiently. In this paper, we present a data-driven approach that addresses the three central aspects of this problem: measuring the predictive quality of a set of sensor locations (regardless of whether sensors were ever placed at these locations), predicting the communication cost involved with these placements, and designing an algorithm with provable quality guarantees that optimizes the NP-hard tradeoff. Specifically, we use data from a pilot deployment to build nonparametric probabilistic models called Gaussian Processes (GPs) both for the spatial phenomena of interest and for the spatial variability of link qualities, which allows us to estimate the predictive power and communication cost of unsensed locations. Surprisingly, uncertainty in the representation of link qualities plays an important role in estimating communication costs. Using these models, we present a novel, polynomial-time, data-driven algorithm, pSPIEL, which selects Sensor Placements at Informative and cost-Effective Locations. Our approach exploits two important properties of this problem: submodularity, formalizing the intuition that adding a node to a small deployment can help more than adding a node to a large deployment; and locality, under which nodes that are far from each other provide almost independent information. Exploiting these properties, we prove strong approximation guarantees for our pSPIEL approach. We also provide extensive experimental validation of this practical approach on several real-world placement problems, and build a complete system implementation on 46 Tmote Sky motes, demonstrating significant advantages over existing methods.
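The submodularity property mentioned in this abstract (diminishing returns when adding a node to a larger deployment) can be illustrated with a toy example. The sketch below is not pSPIEL and does not use Gaussian Processes; it substitutes a simple set-coverage objective, and the names `coverage`, `marginal_gain`, and `reach` are hypothetical, chosen only for illustration.

```python
def coverage(placements, reach):
    """f(S): number of distinct locations observed by the sensors in S.

    Assumption: a stand-in objective, not the information measure used in the paper.
    """
    covered = set()
    for sensor in placements:
        covered |= reach[sensor]
    return len(covered)


def marginal_gain(placements, new_sensor, reach):
    """Increase in f when new_sensor is added to the deployment."""
    return coverage(placements | {new_sensor}, reach) - coverage(placements, reach)


# Hypothetical sensors and the locations each one can observe.
reach = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6}}

small = {"a"}        # small deployment
large = {"a", "b"}   # superset of the small deployment

gain_small = marginal_gain(small, "c", reach)  # "c" contributes locations 4, 5, 6
gain_large = marginal_gain(large, "c", reach)  # location 4 is already covered by "b"

# Diminishing returns: the same sensor helps the smaller deployment at least as much.
print(gain_small, gain_large)
```

Coverage functions of this kind are a standard example of submodular set functions, which is why the gain of adding "c" can only shrink as the deployment grows.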
Measured descent: A new embedding method for finite metrics
In Proc. 45th FOCS, 2004
Cited by 84 (26 self)
We devise a new embedding technique, which we call measured descent, based on decomposing a metric space locally, at varying speeds, according to the density of some probability measure. This provides a refined and unified framework for the two primary methods of constructing Fréchet embeddings for finite metrics, due to [Bourgain, 1985] and [Rao, 1999]. We prove that any $n$-point metric space $(X, d)$ embeds in Hilbert space with distortion $O(\sqrt{\alpha_X \cdot \log n})$, where $\alpha_X$ is a geometric estimate on the decomposability of $X$. As an immediate corollary, we obtain an $O(\sqrt{(\log \lambda_X) \log n})$ distortion embedding, where $\lambda_X$ is the doubling constant of $X$. Since $\lambda_X \le n$, this result recovers Bourgain’s theorem, but when the metric $X$ is, in a sense, “low-dimensional,” improved bounds are achieved. Our embeddings are volume-respecting for subsets of arbitrary size. One consequence is the existence of $(k, O(\log n))$ volume-respecting embeddings for all $1 \le k \le n$, which is the best possible, and answers positively a question posed by U. Feige. Our techniques are also used to answer positively a question of Y. Rabinovich, showing that any weighted $n$-point planar graph embeds in $\ell_\infty^{O(\log n)}$ with $O(1)$ distortion. The $O(\log n)$ bound on the dimension is optimal, and improves upon the previously known bound of $O((\log n)^2)$.
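The remark that the doubling-constant bound recovers Bourgain’s theorem follows in one step. As a worked check, taking Bourgain’s theorem in its usual form (every $n$-point metric embeds in Hilbert space with $O(\log n)$ distortion) and using the abstract’s notation:

```latex
\[
  \lambda_X \le n
  \;\Longrightarrow\;
  \sqrt{(\log \lambda_X)\,\log n}
  \;\le\; \sqrt{(\log n)^2}
  \;=\; \log n ,
\]
\[
  \text{so the distortion } O\!\left(\sqrt{(\log \lambda_X)\,\log n}\right)
  \text{ is never worse than } O(\log n).
\]
```

When $\lambda_X$ is a constant, the same expression gives $O(\sqrt{\log n})$, which is the improvement for “low-dimensional” metrics that the abstract refers to.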
Complex Networks and Decentralized Search Algorithms
In Proceedings of the International Congress of Mathematicians (ICM), 2006
Cited by 73 (1 self)
The study of complex networks has emerged over the past several years as a theme spanning many disciplines, ranging from mathematics and computer science to the social and biological sciences. A significant amount of recent work in this area has focused on the development of random graph models that capture some of the qualitative properties observed in large-scale network data; such models have the potential to help us reason, at a general level, about the ways in which real-world networks are organized. We survey one particular line of network research, concerned with small-world phenomena and decentralized search algorithms, that illustrates this style of analysis. We begin by describing a well-known experiment that provided the first empirical basis for the "six degrees of separation" phenomenon in social networks; we then discuss some probabilistic network models motivated by this work, illustrating how these models lead to novel algorithmic and graph-theoretic questions, and how they are supported by recent empirical studies of large social networks.
Bypassing the embedding: Algorithms for low-dimensional metrics
In Proceedings of the 36th ACM Symposium on the Theory of Computing (STOC), 2004
Cited by 65 (4 self)
The doubling dimension of a metric is the smallest $k$ such that any ball of radius $2r$ can be covered using $2^k$ balls of radius $r$. This concept for abstract metrics has been proposed as a natural analog to the dimension of a Euclidean space. If we could embed metrics with low doubling dimension into low-dimensional Euclidean spaces, they would inherit several algorithmic and structural properties of the Euclidean spaces. Unfortunately, however, such a restriction on dimension does not suffice to guarantee embeddability in a normed space. In this paper we explore the option of bypassing the embedding. In particular, we show the following for low-dimensional metrics:
• Quasi-polynomial-time $(1+\epsilon)$-approximation algorithms for various optimization problems such as TSP, $k$-median and facility location.
• A $(1+\epsilon)$-approximate distance labeling scheme with optimal label length.
• A $(1+\epsilon)$-stretch polylogarithmic-storage routing scheme.
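The definition of doubling dimension at the start of this abstract can be made concrete on a finite point set. The following is a rough sketch, not an algorithm from the paper: it greedily covers each ball of radius 2r with balls of radius r centered at data points, which gives only an upper estimate of the cover size for the sampled centers and radii. The function names and parameters are hypothetical.

```python
import math


def greedy_cover_size(points, center, radius, dist):
    """Count the radius-r balls a greedy pass uses to cover B(center, 2r).

    Ball centers are restricted to data points inside B(center, 2r), so the
    result is an upper bound on the minimum cover size, not the exact value.
    """
    ball = [p for p in points if dist(center, p) <= 2 * radius]
    centers = []
    for p in ball:
        # p becomes a new center only if no chosen center already covers it.
        if all(dist(p, c) > radius for c in centers):
            centers.append(p)
    return len(centers)


def estimate_doubling_dimension(points, radii, dist):
    """log2 of the largest greedy cover found over all centers and given radii."""
    worst = max(
        greedy_cover_size(points, c, r, dist) for c in points for r in radii
    )
    return math.log2(worst)


# Toy example: 16 evenly spaced points on a line with the absolute-value metric.
line = list(range(16))
abs_dist = lambda a, b: abs(a - b)

print(greedy_cover_size(line, 8, 2, abs_dist))
print(estimate_doubling_dimension(line, [1, 2], abs_dist))
```

On the line example the estimate stays small, matching the intuition that a one-dimensional point set has low doubling dimension; the exact value depends on the greedy order and the radii sampled.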
Distance Estimation and Object Location via Rings of Neighbors
In 24th Annual ACM Symposium on Principles of Distributed Computing (PODC), 2005
Cited by 64 (4 self)
We consider four problems on distance estimation and object location which share the common flavor of capturing global information via informative node labels: low-stretch routing schemes [47], distance labeling [24], searchable small worlds [30], and triangulation-based distance estimation [33]. Focusing on metrics of low doubling dimension, we approach these problems with a common technique called rings of neighbors, which refers to a sparse distributed data structure that underlies all our constructions. Apart from improving the previously known bounds for these problems, our contributions include extending Kleinberg’s small world model to doubling metrics, and a short proof of the main result in Chan et al. [14]. Doubling dimension is a notion of dimensionality for general metrics that has recently become a useful algorithmic concept in the theoretical computer science literature.
Routing in networks with low doubling dimension
In 26th International Conference on Distributed Computing Systems (ICDCS). IEEE Computer, 2006
Cited by 63 (8 self)
This paper studies compact routing schemes for networks with low doubling dimension. Two variants are explored, name-independent routing and labeled routing. The key results obtained for this model are the following. First, we provide the first name-independent solution. Specifically, we achieve constant stretch and polylogarithmic storage. Second, we obtain the first truly scale-free solutions, namely, the network’s aspect ratio is not a factor in the stretch. Scale-free schemes are given for three problem models: name-independent routing on graphs, labeled routing on metric spaces, and labeled routing on graphs. Third, we prove a lower bound requiring linear storage for stretch < 3 schemes. This has the important ramification of separating for the first time the name-independent problem model from the labeled model for these networks, since compact stretch-(1+ε) labeled schemes are known to be possible.
On Hierarchical Routing in Doubling Metrics
2005
Cited by 58 (8 self)
We study the problem of routing in doubling metrics, and show how to perform hierarchical routing in such metrics with small stretch and compact routing tables (i.e., with a small amount of routing information stored at each vertex). We say that a metric $(X, d)$ has doubling dimension $\dim(X)$ at most $\alpha$ if every set of diameter $D$ can be covered by $2^\alpha$ sets of diameter $D/2$. (A doubling metric is one whose doubling dimension $\dim(X)$ is a constant.) We show how to perform $(1+\tau)$-stretch routing on metrics for any $0 < \tau \le 1$ with routing tables of size at most $(\alpha/\tau)^{O(\alpha)} \log^2 \Delta$ bits with only $(\alpha/\tau)^{O(\alpha)} \log \Delta$ entries, where $\Delta$ is the diameter of the graph; hence the number of routing table entries is just $\tau^{-O(1)} \log \Delta$ for doubling metrics. These results extend and improve on those of Talwar (2004). We also give better constructions of sparse spanners for doubling metrics than those obtained from the routing tables above; for $\tau > 0$, we give algorithms to construct $(1+\tau)$-stretch spanners for a metric $(X, d)$ with maximum degree at most $(2 + 1/\tau)^{O(\dim(X))}$, matching the results of Das et al. for Euclidean metrics.