Results 1-10 of 351
Similarity estimation techniques from rounding algorithms
In Proc. of 34th STOC, 2002
Cited by 449 (6 self)
A locality sensitive hashing scheme is a distribution on a family F of hash functions operating on a collection of objects, such that for two objects x, y, Pr_{h∈F}[h(x) = h(y)] = sim(x, y), where sim(x, y) ∈ [0, 1] is some similarity function defined on the collection of objects. Such a scheme leads to a compact representation of objects so that similarity of objects can be estimated from their compact sketches, and also leads to efficient algorithms for approximate nearest neighbor search and clustering. Minwise independent permutations provide an elegant construction of such a locality sensitive hashing scheme for a collection of subsets with the set similarity measure sim(A, B) = |A ∩ B| / |A ∪ B|. We show that rounding algorithms for LPs and SDPs used in the context of approximation algorithms can be viewed as locality sensitive hashing schemes for several interesting collections of objects. Based on this insight, we construct new locality sensitive hashing schemes for:
1. A collection of vectors with the distance between ⃗u and ⃗v measured by θ(⃗u,⃗v)/π, where θ(⃗u,⃗v) is the angle between ⃗u and ⃗v. This yields a sketching scheme for estimating the cosine similarity measure between two vectors, as well as a simple alternative to minwise independent permutations for estimating set similarity.
2. A collection of distributions on n points in a metric space, with distance between distributions measured by the Earth Mover Distance (EMD), a popular distance measure in graphics and vision. Our hash functions map distributions to points in the metric space such that, for distributions P and Q,
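The angle-based scheme in item 1 can be illustrated with random hyperplane hashing, sketched below. This is a minimal illustration, not the paper's code: it assumes each hash bit is the sign of the inner product with a random Gaussian vector, so that two vectors disagree on a bit with probability θ/π. The function names, the sketch length `num_bits`, and the sample vectors are all hypothetical.

```python
import math
import random

def random_hyperplane_sketch(vec, planes):
    """Return one bit per hyperplane: 1 if vec lies on its non-negative side."""
    return [1 if sum(r * x for r, x in zip(plane, vec)) >= 0 else 0
            for plane in planes]

def estimate_angle(sketch_u, sketch_v):
    """Estimate theta(u, v) as pi times the fraction of disagreeing bits."""
    disagree = sum(a != b for a, b in zip(sketch_u, sketch_v))
    return math.pi * disagree / len(sketch_u)

# Assumed setup: Gaussian normal vectors, fixed seed for reproducibility.
rng = random.Random(0)
dim, num_bits = 3, 4000
planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

u = [1.0, 0.0, 0.0]
v = [1.0, 1.0, 0.0]  # the true angle between u and v is pi/4

theta_hat = estimate_angle(random_hyperplane_sketch(u, planes),
                           random_hyperplane_sketch(v, planes))
cos_hat = math.cos(theta_hat)  # estimate of the cosine similarity
```

The estimate sharpens as `num_bits` grows, since each bit is an independent sample of the collision event.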
Approximation algorithms for metric facility location and k-median problems using the . . .
A Tight Bound on Approximating Arbitrary Metrics by Tree Metrics
In Proceedings of the 35th Annual ACM Symposium on Theory of Computing, 2003
Cited by 306 (8 self)
In this paper, we show that any n-point metric space can be embedded into a distribution over dominating tree metrics such that the expected stretch of any edge is O(log n). This improves upon the result of Bartal, who gave a bound of O(log n log log n). Moreover, our result is existentially tight; there exist metric spaces where any tree embedding must have Ω(log n) distortion. This problem lies at the heart of numerous approximation and online algorithms, including ones for group Steiner tree, metric labeling, buy-at-bulk network design and metrical task system. Our result improves the performance guarantees for all of these problems.
IDMaps: A Global Internet Host Distance Estimation Service
In Proceedings of IEEE INFOCOM, 2000
Cited by 302 (13 self)
There is an increasing need to quickly and efficiently learn network distances, in terms of metrics such as latency or bandwidth, between Internet hosts. For example, Internet content providers often place data and server mirrors throughout the Internet to improve access latency for clients, and it is necessary to direct clients to the closest mirrors based on some distance metric in order to realize the benefit of mirrors. We suggest a scalable Internet-wide architecture, called IDMaps, which measures and disseminates distance information on the global Internet. Higher-level services can collect such distance information to build a virtual distance map of the Internet and estimate the distance between any pair of IP addresses. We present our solutions to the measurement server placement and distance map construction problems in IDMaps. We show that IDMaps can indeed provide useful distance estimations to applications such as closest-mirror selection.
On Approximating Arbitrary Metrics by Tree Metrics
In Proceedings of the 30th Annual ACM Symposium on Theory of Computing, 1998
Cited by 266 (14 self)
This paper is concerned with probabilistic approximation of metric spaces. In previous work we introduced the method of efficient approximation of metrics by simpler families of metrics in a probabilistic fashion. In particular we study probabilistic approximations of arbitrary metric spaces by "hierarchically well-separated tree" metric spaces. This has proved to be a useful technique for simplifying the solutions to various problems.
A constant-factor approximation algorithm for the k-median problem
In Proceedings of the 31st Annual ACM Symposium on Theory of Computing, 1999
Cited by 249 (13 self)
We present the first constant-factor approximation algorithm for the metric k-median problem. The k-median problem is one of the most well-studied clustering problems, i.e., those problems in which the aim is to partition a given set of points into clusters so that the points within a cluster are relatively close with respect to some measure. For the metric k-median problem, we are given n points in a metric space. We select k of these to be cluster centers, and then assign each point to its closest selected center. If point j is assigned to a center i, the cost incurred is proportional to the distance between i and j. The goal is to select the k centers that minimize the sum of the assignment costs. We give a 6 2/3-approximation algorithm for this problem. This improves upon the best previously known result of O(log k log log k), which was obtained by refining and derandomizing a randomized O(log n log log n)-approximation algorithm of Bartal.
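The objective described in this abstract can be made concrete with a small sketch. The code below only evaluates the k-median cost and finds the optimum by exhaustive search on a toy instance; it is not the approximation algorithm from the paper. The function names and the sample points on a line are assumptions chosen for illustration.

```python
from itertools import combinations

def kmedian_cost(points, centers, dist):
    """Sum over all points of the distance to the closest chosen center."""
    return sum(min(dist(p, c) for c in centers) for p in points)

def brute_force_kmedian(points, k, dist):
    """Try every k-subset of the points as centers (exponential; toy use only)."""
    return min(combinations(points, k),
               key=lambda centers: kmedian_cost(points, centers, dist))

# Hypothetical instance: six points on a line with the absolute-difference metric.
points = [0, 1, 2, 10, 11, 12]
line_dist = lambda a, b: abs(a - b)

best = brute_force_kmedian(points, 2, line_dist)
best_cost = kmedian_cost(points, best, line_dist)
print(best, best_cost)  # prints (1, 11) 4
```

Exhaustive search examines every k-subset, which is why approximation algorithms with provable guarantees matter for larger inputs.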
Improved Combinatorial Algorithms for the Facility Location and k-Median Problems
In Proceedings of the 40th Annual IEEE Symposium on Foundations of Computer Science, 1999
Cited by 225 (12 self)
We present improved combinatorial approximation algorithms for the uncapacitated facility location and k-median problems. Two central ideas in most of our results are cost scaling and greedy improvement. We present a simple greedy local search algorithm which achieves an approximation ratio of 2.414 + ε in Õ(n^2/ε) time. This also yields a bicriteria approximation tradeoff of (1 + γ, 1 + 2/γ) for facility cost versus service cost which is better than previously known tradeoffs and close to the best possible. Combining greedy improvement and cost scaling with a recent primal-dual algorithm for facility location due to Jain and Vazirani, we get an approximation ratio of 1.853 in Õ(n^3) time. This is already very close to the approximation guarantee of the best known algorithm, which is LP-based. Further, combined with the best known LP-based algorithm for facility location, we get a very slight improvement in the approximation factor for facility location, achieving 1.728....
Network Topology Generators: Degree-Based vs. Structural
2002
Cited by 207 (17 self)
Following the long-held belief that the Internet is hierarchical, the network topology generators most widely used by the Internet research community, Transit-Stub and Tiers, create networks with a deliberately hierarchical structure. However, in 1999 a seminal paper by Faloutsos et al. revealed that the Internet's degree distribution is a power-law. Because the degree distributions produced by the Transit-Stub and Tiers generators are not power-laws, the research community has largely dismissed them as inadequate and proposed new network generators that attempt to generate graphs with power-law degree distributions.
Approximation Algorithms for Classification Problems with Pairwise Relationships: Metric Labeling and Markov Random Fields
In IEEE Symposium on Foundations of Computer Science, 1999
Cited by 197 (2 self)
In a traditional classification problem, we wish to assign one of k labels (or classes) to each of n objects, in a way that is consistent with some observed data that we have about the problem. An active line of research in this area is concerned with classification when one has information about pairwise relationships among the objects to be classified; this issue is one of the principal motivations for the framework of Markov random fields, and it arises in areas such as image processing, biometry, and document analysis. In its most basic form, this style of analysis seeks a classification that optimizes a combinatorial function consisting of assignment costs (based on the individual choice of label we make for each object) and separation costs (based on the pair of choices we make for two "related" objects). We formulate a general classification problem of this type, the metric labeling problem; we show that it contains as special cases a number of standard classification f...
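The two cost terms named in this abstract can be sketched as a small objective function. The abstract is truncated before it gives a formula, so the form used here is an assumption: each related pair carries a weight, and its separation cost is that weight times a distance between the two chosen labels. All names and numbers below are hypothetical.

```python
def labeling_cost(labels, assign_cost, edges, label_dist):
    """Total cost of a labeling: assignment costs plus separation costs."""
    # Assignment cost: one term per object, based on its own label.
    assignment = sum(assign_cost[obj][lab] for obj, lab in labels.items())
    # Separation cost: one term per related pair, weighted by the edge weight.
    separation = sum(w * label_dist(labels[p], labels[q]) for p, q, w in edges)
    return assignment + separation

# Hypothetical instance: two related objects, two labels, uniform label metric.
assign_cost = {"a": {0: 1.0, 1: 4.0}, "b": {0: 2.0, 1: 1.0}}
edges = [("a", "b", 2.0)]
uniform = lambda x, y: 0.0 if x == y else 1.0

split = labeling_cost({"a": 0, "b": 1}, assign_cost, edges, uniform)
merged = labeling_cost({"a": 0, "b": 0}, assign_cost, edges, uniform)
print(split, merged)  # prints 4.0 3.0
```

Here giving both objects the same label is cheaper overall, even though object "b" individually prefers label 1, which is the tension between the two cost terms.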
Approximation Algorithms for Directed Steiner Problems
Journal of Algorithms, 1998
Cited by 178 (8 self)
We give the first non-trivial approximation algorithms for the Steiner tree problem and the generalized Steiner network problem on general directed graphs. These problems have several applications in network design and multicast routing. For both problems, the best ratios known before our work were the trivial O(k)-approximations. For the directed Steiner tree problem, we design a family of algorithms that achieves an approximation ratio of i(i - 1)k^{1/i} in time O(n^i k^{2i}) for any fixed i > 1, where k is the number of terminals. Thus, an O(k^ε) approximation ratio can be achieved in polynomial time for any fixed ε > 0. Setting i = log k, we obtain an O(log^2 k) approximation ratio in quasi-polynomial time. For the directed generalized Steiner network problem, we give an algorithm that achieves an approximation ratio of O(k^{2/3} log^{1/3} k), where k is the number of pairs of vertices that are to be connected. Related problems including the group Steiner...
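The step from the general ratio to the O(log^2 k) bound for the directed Steiner tree problem can be checked by direct substitution. The short derivation below assumes the logarithm is base 2 and that k ≤ n (terminals are vertices of the graph); neither assumption is stated in the abstract.

```latex
\[
  k^{1/i}\Big|_{i=\log_2 k} = 2^{\log_2 k / \log_2 k} = 2,
  \qquad
  i(i-1)\,k^{1/i} = 2\log_2 k\,(\log_2 k - 1) = O(\log^2 k),
\]
\[
  O\!\left(n^{i} k^{2i}\right)
  = O\!\left(n^{\log_2 k}\, k^{2\log_2 k}\right)
  \le O\!\left(n^{3\log_2 k}\right)
  = n^{O(\log k)} .
\]
```

The running time is quasi-polynomial rather than polynomial because the exponent grows with log k.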