Results 1  10
of
209
Cover trees for nearest neighbor
 In Proceedings of the 23rd international conference on Machine learning
, 2006
"... ABSTRACT. We present a tree data structure for fast nearest neighbor operations in generalpoint metric spaces. The data structure requires space regardless of the metric’s structure. If the point set has an expansion constant � in the sense of Karger and Ruhl [KR02], the data structure can be const ..."
Abstract

Cited by 212 (0 self)
 Add to MetaCart
(Show Context)
ABSTRACT. We present a tree data structure for fast nearest neighbor operations in generalpoint metric spaces. The data structure requires space regardless of the metric’s structure. If the point set has an expansion constant � in the sense of Karger and Ruhl [KR02], the data structure can be constructed in � time. Nearest neighbor queries obeying the expansion bound require � time. In addition, the nearest neighbor of points can be queried in time. We experimentally test the algorithm showing speedups over the brute force search varying between 1 and 2000 on natural machine learning datasets. 1.
Meridian: A Lightweight Network Location Service without Virtual Coordinates
 In SIGCOMM
, 2005
"... This paper introduces a lightweight, scalable and accurate framework, called Meridian, for performing node selection based on network location. The framework consists of an overlay network structured around multiresolution rings, query routing with direct measurements, and gossip protocols for diss ..."
Abstract

Cited by 187 (8 self)
 Add to MetaCart
(Show Context)
This paper introduces a lightweight, scalable and accurate framework, called Meridian, for performing node selection based on network location. The framework consists of an overlay network structured around multiresolution rings, query routing with direct measurements, and gossip protocols for dissemination. We show how this framework can be used to address three commonly encountered problems, namely, closest node discovery, central leader election, and locating nodes that satisfy target latency constraints in largescale distributed systems without having to compute absolute coordinates. We show analytically that the framework is scalable with logarithmic convergence when Internet latencies are modeled as a growthconstrained metric, a lowdimensional Euclidean metric, or a metric of low doubling dimension. Large scale simulations, based on latency measurements from 6.25 million nodepairs as well as an implementation deployed on PlanetLab show that the framework is accurate and effective.
Navigating nets: Simple algorithms for proximity search (Extended Abstract)
, 2004
"... Robert Krauthgamer # James R. Lee + Abstract We present a simple deterministic data structure for maintaining a set S of points in a general metric space, while supporting proximity search (nearest neighbor and range queries) and updates to S (insertions and deletions). Our data structure consists ..."
Abstract

Cited by 154 (18 self)
 Add to MetaCart
(Show Context)
Robert Krauthgamer # James R. Lee + Abstract We present a simple deterministic data structure for maintaining a set S of points in a general metric space, while supporting proximity search (nearest neighbor and range queries) and updates to S (insertions and deletions). Our data structure consists of a sequence of progressively finer #nets of S, with pointers that allow us to navigate easily from one scale to the next.
Nearoptimal sensor placements: Maximizing information while minimizing communication cost
 In IPSN
, 2006
"... When monitoring spatial phenomena with wireless sensor networks, selecting the best sensor placements is a fundamental task. Not only should the sensors be informative, but they should also be able to communicate efficiently. In this paper, we present a datadriven approach that addresses the three ..."
Abstract

Cited by 149 (19 self)
 Add to MetaCart
(Show Context)
When monitoring spatial phenomena with wireless sensor networks, selecting the best sensor placements is a fundamental task. Not only should the sensors be informative, but they should also be able to communicate efficiently. In this paper, we present a datadriven approach that addresses the three central aspects of this problem: measuring the predictive quality of a set of sensor locations (regardless of whether sensors were ever placed at these locations), predicting the communication cost involved with these placements, and designing an algorithm with provable quality guarantees that optimizes the NPhard tradeoff. Specifically, we use data from a pilot deployment to build nonparametric probabilistic models called Gaussian Processes (GPs) both for the spatial phenomena of interest and for the spatial variability of link qualities, which allows us to estimate predictive power and communication cost of unsensed locations. Surprisingly, uncertainty in the representation of link qualities plays an important role in estimating communication costs. Using these models, we present a novel, polynomialtime, datadriven algorithm, pSPIEL, which selects Sensor Placements at Informative and costEffective Locations. Our approach exploits two important properties of this problem: submodularity, formalizing the intuition that adding a node to a small deployment can help more than adding a node to a large deployment; and locality, under which nodes that are far from each other provide almost independent information. Exploiting these properties, we prove strong approximation guarantees for our pSPIEL approach. We also provide extensive experimental validation of this practical approach on several realworld placement problems, and built a complete system implementation on 46 Tmote Sky motes, demonstrating significant advantages over existing methods.
Fast construction of nets in lowdimensional metrics and their applications
 SIAM Journal on Computing
, 2006
"... We present a near linear time algorithm for constructing hierarchical nets in finite metric spaces with constant doubling dimension. This datastructure is then applied to obtain improved algorithms for the following problems: approximate nearest neighbor search, wellseparated pair decomposition, s ..."
Abstract

Cited by 130 (14 self)
 Add to MetaCart
(Show Context)
We present a near linear time algorithm for constructing hierarchical nets in finite metric spaces with constant doubling dimension. This datastructure is then applied to obtain improved algorithms for the following problems: approximate nearest neighbor search, wellseparated pair decomposition, spanner construction, compact representation scheme, doubling measure, and computation of the (approximate) Lipschitz constant of a function. In all cases, the running (preprocessing) time is near linear and the space being used is linear. 1
Complex Networks and Decentralized Search Algorithms
 In Proceedings of the International Congress of Mathematicians (ICM
, 2006
"... The study of complex networks has emerged over the past several years as a theme spanning many disciplines, ranging from mathematics and computer science to the social and biological sciences. A significant amount of recent work in this area has focused on the development of random graph models that ..."
Abstract

Cited by 112 (1 self)
 Add to MetaCart
(Show Context)
The study of complex networks has emerged over the past several years as a theme spanning many disciplines, ranging from mathematics and computer science to the social and biological sciences. A significant amount of recent work in this area has focused on the development of random graph models that capture some of the qualitative properties observed in largescale network data; such models have the potential to help us reason, at a general level, about the ways in which realworld networks are organized. We survey one particular line of network research, concerned with smallworld phenomena and decentralized search algorithms, that illustrates this style of analysis. We begin by describing a wellknown experiment that provided the first empirical basis for the "six degrees of separation" phenomenon in social networks; we then discuss some probabilistic network models motivated by this work, illustrating how these models lead to novel algorithmic and graphtheoretic questions, and how they are supported by recent empirical studies of large social networks.
Measured descent: A new embedding method for finite metrics
 In Proc. 45th FOCS
, 2004
"... We devise a new embedding technique, which we call measured descent, based on decomposing a metric space locally, at varying speeds, according to the density of some probability measure. This provides a refined and unified framework for the two primary methods of constructing Fréchet embeddings for ..."
Abstract

Cited by 98 (32 self)
 Add to MetaCart
(Show Context)
We devise a new embedding technique, which we call measured descent, based on decomposing a metric space locally, at varying speeds, according to the density of some probability measure. This provides a refined and unified framework for the two primary methods of constructing Fréchet embeddings for finite metrics, due to [Bourgain, 1985] and [Rao, 1999]. We prove that any npoint metric space (X, d) embeds in Hilbert space with distortion O ( √ αX · log n), where αX is a geometric estimate on the decomposability of X. As an immediate corollary, we obtain an O ( √ (log λX)log n) distortion embedding, where λX is the doubling constant of X. Since λX ≤ n, this result recovers Bourgain’s theorem, but when the metric X is, in a sense, “lowdimensional, ” improved bounds are achieved. Our embeddings are volumerespecting for subsets of arbitrary size. One consequence is the existence of (k, O(log n)) volumerespecting embeddings for all 1 ≤ k ≤ n, which is the best possible, and answers positively a question posed by U. Feige. Our techniques are also used to answer positively a question of Y. Rabinovich, showing that any weighted npoint planar graph O(log n) embeds in ℓ∞ with O(1) distortion. The O(log n) bound on the dimension is optimal, and improves upon the previously known bound of O((log n) 2). 1
Triangulation and Embedding using Small Sets of Beacons
, 2008
"... Concurrent with recent theoretical interest in the problem of metric embedding, a growing body of research in the networking community has studied the distance matrix defined by nodetonode latencies in the Internet, resulting in a number of recent approaches that approximately embed this distance ..."
Abstract

Cited by 98 (11 self)
 Add to MetaCart
Concurrent with recent theoretical interest in the problem of metric embedding, a growing body of research in the networking community has studied the distance matrix defined by nodetonode latencies in the Internet, resulting in a number of recent approaches that approximately embed this distance matrix into lowdimensional Euclidean space. There is a fundamental distinction, however, between the theoretical approaches to the embedding problem and this recent Internetrelated work: in addition to computational limitations, Internet measurement algorithms operate under the constraint that it is only feasible to measure distances for a linear (or nearlinear) number of node pairs, and typically in a highly structured way. Indeed, the most common framework for Internet measurements of this type is a beaconbased approach: one chooses uniformly at random a constant number of nodes (‘beacons’) in the network, each node measures its distance to all beacons, and one then has access to only these measurements for the remainder of the algorithm. Moreover, beaconbased algorithms are often designed not for embedding but for the more basic problem of triangulation, in which one uses the triangle inequality to infer the distances that have not been measured. Here we give algorithms with provable performance guarantees for beaconbased triangulation and
Bypassing the embedding: Algorithms for lowdimensional metrics
 In Proceedings of the 36th ACM Symposium on the Theory of Computing (STOC
, 2004
"... The doubling dimension of a metric is the smallest k such that any ball of radius 2r can be covered using 2 k balls of radius r. This concept for abstract metrics has been proposed as a natural analog to the dimension of a Euclidean space. If we could embed metrics with low doubling dimension into l ..."
Abstract

Cited by 82 (3 self)
 Add to MetaCart
(Show Context)
The doubling dimension of a metric is the smallest k such that any ball of radius 2r can be covered using 2 k balls of radius r. This concept for abstract metrics has been proposed as a natural analog to the dimension of a Euclidean space. If we could embed metrics with low doubling dimension into low dimensional Euclidean spaces, they would inherit several algorithmic and structural properties of the Euclidean spaces. Unfortunately however, such a restriction on dimension does not suffice to guarantee embeddibility in a normed space. In this paper we explore the option of bypassing the embedding. In particular we show the following for low dimensional metrics: • Quasipolynomial time (1+ɛ)approximation algorithm for various optimization problems such as TSP, kmedian and facility location. • (1 + ɛ)approximate distance labeling scheme with optimal label length. • (1+ɛ)stretch polylogarithmic storage routing scheme.