Results 1 - 10 of 46
Improved Combinatorial Algorithms for the Facility Location and k-Median Problems
In Proceedings of the 40th Annual IEEE Symposium on Foundations of Computer Science, 1999
Cited by 187 (12 self)
We present improved combinatorial approximation algorithms for the uncapacitated facility location and k-median problems. Two central ideas in most of our results are cost scaling and greedy improvement. We present a simple greedy local search algorithm which achieves an approximation ratio of $2.414 + \epsilon$ in $\tilde{O}(n^2/\epsilon)$ time. This also yields a bicriteria approximation tradeoff of $(1 + \gamma, 1 + 2/\gamma)$ for facility cost versus service cost, which is better than previously known tradeoffs and close to the best possible. Combining greedy improvement and cost scaling with a recent primal-dual algorithm for facility location due to Jain and Vazirani, we get an approximation ratio of 1.853 in $\tilde{O}(n^3)$ time. This is already very close to the approximation guarantee of the best known algorithm, which is LP-based. Further, combined with the best known LP-based algorithm for facility location, we get a very slight improvement in the approximation factor for facility location, achieving 1.728....
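The greedy improvement idea above can be illustrated with a generic local search for uncapacitated facility location. This is a minimal sketch under stated assumptions, not the authors' algorithm: it uses plain add/drop/swap moves without their cost scaling, and the names `fl_cost` and `local_search_fl` are hypothetical. `f[i]` is the opening cost of facility `i` and `d[i][j]` is its distance to client `j`.

```python
from typing import FrozenSet, List, Sequence, Tuple


def fl_cost(open_set: FrozenSet[int], f: Sequence[float],
            d: Sequence[Sequence[float]]) -> float:
    """Opening cost of the open facilities plus each client's distance to its nearest one."""
    opening = sum(f[i] for i in open_set)
    service = sum(min(d[i][j] for i in open_set) for j in range(len(d[0])))
    return opening + service


def local_search_fl(f: Sequence[float], d: Sequence[Sequence[float]],
                    eps: float = 0.0) -> Tuple[FrozenSet[int], float]:
    """Add/drop/swap local search (hypothetical helper, not the paper's exact procedure).

    A move is taken only if it lowers the cost by more than a (1 + eps) factor,
    so the loop stops at an (approximate) local optimum.
    """
    m = len(f)
    # Start from the best single open facility.
    current = min((frozenset([i]) for i in range(m)), key=lambda s: fl_cost(s, f, d))
    cost = fl_cost(current, f, d)
    improved = True
    while improved:
        improved = False
        closed = [i for i in range(m) if i not in current]
        moves: List[FrozenSet[int]] = [current | {i} for i in closed]        # add
        if len(current) > 1:
            moves += [current - {i} for i in current]                       # drop
        moves += [(current - {i}) | {j} for i in current for j in closed]   # swap
        for cand in moves:
            c = fl_cost(cand, f, d)
            if c < cost / (1 + eps):
                current, cost, improved = cand, c, True
                break
    return current, cost
```

Requiring a $(1 + \epsilon)$-factor improvement per move is the usual way to bound the number of iterations; with `eps = 0.0` the search simply stops at an exact local optimum.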
Approximate Distance Oracles
2001
Cited by 154 (6 self)
Let $G = (V, E)$ be an undirected weighted graph with $|V| = n$ and $|E| = m$. Let $k \ge 1$ be an integer. We show that $G = (V, E)$ can be preprocessed in $O(kmn^{1/k})$ expected time, constructing a data structure of size $O(kn^{1+1/k})$, such that any subsequent distance query can be answered, approximately, in $O(k)$ time. The approximate distance returned is of stretch at most $2k - 1$, i.e., the quotient obtained by dividing the estimated distance by the actual distance lies between 1 and $2k - 1$. We show that a 1963 girth conjecture of Erdős implies that $\Omega(n^{1+1/k})$ space is needed in the worst case for any real stretch strictly smaller than $2k + 1$. The space requirement of our algorithm is, therefore, essentially optimal.
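A minimal sketch of the query procedure behind the stretch bound above, assuming a connected graph with non-negative edge weights. To stay short it computes exact all-pairs distances during preprocessing, so it does not achieve the stated preprocessing time or space; only the sampled hierarchy $A_0 = V \supseteq A_1 \supseteq \dots \supseteq A_k = \emptyset$, the bunches, and the $O(k)$-step query follow the construction. The names `dijkstra` and `DistanceOracle` are hypothetical.

```python
import heapq
import random
from typing import Dict, List, Tuple

Graph = Dict[int, List[Tuple[int, float]]]  # node -> [(neighbour, edge weight)]


def dijkstra(g: Graph, src: int) -> Dict[int, float]:
    """Exact single-source shortest-path distances (non-negative weights)."""
    dist = {src: 0.0}
    pq = [(0.0, src)]
    while pq:
        du, u = heapq.heappop(pq)
        if du > dist[u]:
            continue  # stale queue entry
        for v, w in g[u]:
            nd = du + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist


class DistanceOracle:
    """Thorup-Zwick-style oracle with stretch <= 2k - 1 (naive preprocessing)."""

    def __init__(self, g: Graph, k: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        nodes = list(g)
        p = len(nodes) ** (-1.0 / k)
        # Simplification: exact all-pairs distances; assumes g is connected.
        self.d = {u: dijkstra(g, u) for u in nodes}
        # A_0 = V; A_i keeps each node of A_{i-1} with probability p.
        # Resample until A_{k-1} is non-empty.
        while True:
            levels = [set(nodes)]
            for _ in range(1, k):
                levels.append({v for v in levels[-1] if rng.random() < p})
            if levels[-1]:
                break
        levels.append(set())  # A_k is empty
        # nearest[i][v] = (d(A_i, v), p_i(v)): closest node of A_i to v.
        self.nearest = [{v: min((self.d[a][v], a) for a in levels[i]) for v in nodes}
                        for i in range(k)]
        # Bunch B(v): nodes w in A_i \ A_{i+1} strictly closer to v than A_{i+1} is.
        self.bunch: Dict[int, Dict[int, float]] = {}
        for v in nodes:
            b: Dict[int, float] = {}
            for i in range(k):
                bound = self.nearest[i + 1][v][0] if i + 1 < k else float("inf")
                for w in levels[i] - levels[i + 1]:
                    if self.d[w][v] < bound:
                        b[w] = self.d[w][v]
            self.bunch[v] = b

    def query(self, u: int, v: int) -> float:
        """Return an estimate in [d(u, v), (2k - 1) * d(u, v)]."""
        w, i = u, 0
        while w not in self.bunch[v]:
            i += 1
            u, v = v, u
            w = self.nearest[i][u][1]
        return self.nearest[i][u][0] + self.bunch[v][w]
```

Each pass of the query loop raises the level $i$, and every node of $A_{k-1}$ lies in every bunch, so the loop runs at most $k - 1$ times; with `k = 2` the returned value is at most three times the true distance.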
Clustering data streams: Theory and practice
IEEE TKDE, 2003
Cited by 75 (2 self)
The data stream model has recently attracted attention for its applicability to numerous types of data, including telephone records, Web documents, and clickstreams. For analysis of such data, the ability to process the data in a single pass, or a small number of passes, while using little memory, is crucial. We describe such a streaming algorithm that effectively clusters large data streams. We also provide empirical evidence of the algorithm's performance on synthetic and real data streams. Index Terms: clustering, data streams, approximation algorithms.
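The single-pass idea can be sketched as divide and conquer: cluster each chunk of the stream down to k weighted centres, keep only those centres, and cluster the retained centres at the end. This is a simplified sketch under stated assumptions, not the algorithm from the paper: the per-chunk subroutine is a plain single-swap local search for weighted k-median, points are 2-D tuples, and the names `weighted_kmedian`, `compress`, and `stream_kmedian` are hypothetical.

```python
import math
from typing import Iterable, List, Sequence, Tuple

Point = Tuple[float, float]


def weighted_kmedian(pts: Sequence[Point], wts: Sequence[float], k: int) -> List[int]:
    """Indices of k centres chosen among pts by single-swap local search."""
    n = len(pts)
    if n <= k:
        return list(range(n))

    def cost(cs: List[int]) -> float:
        return sum(w * min(math.dist(p, pts[c]) for c in cs) for p, w in zip(pts, wts))

    # Farthest-first initialisation, then swap while the weighted cost drops.
    cs = [0]
    while len(cs) < k:
        cs.append(max((j for j in range(n) if j not in cs),
                      key=lambda j: min(math.dist(pts[j], pts[c]) for c in cs)))
    best = cost(cs)
    improved = True
    while improved:
        improved = False
        for a in range(k):
            for b in range(n):
                if b in cs:
                    continue
                cand = cs[:a] + [b] + cs[a + 1:]
                c = cost(cand)
                if c < best - 1e-9:
                    cs, best, improved = cand, c, True
    return cs


def compress(pts: List[Point], wts: List[float], k: int) -> Tuple[List[Point], List[float]]:
    """Replace weighted points by k centres carrying the weight assigned to them."""
    idx = weighted_kmedian(pts, wts, k)
    mass = [0.0] * len(idx)
    for p, w in zip(pts, wts):
        s = min(range(len(idx)), key=lambda t: math.dist(p, pts[idx[t]]))
        mass[s] += w
    return [pts[i] for i in idx], mass


def stream_kmedian(stream: Iterable[Point], k: int, chunk: int):
    """One pass over the stream; holds roughly 2 * chunk weighted points at most."""
    raw_p: List[Point] = []
    raw_w: List[float] = []      # current chunk of unit-weight input points
    kept_p: List[Point] = []
    kept_w: List[float] = []     # weighted centres summarising earlier chunks
    for x in stream:
        raw_p.append(x)
        raw_w.append(1.0)
        if len(raw_p) == chunk:
            c, w = compress(raw_p, raw_w, k)
            kept_p += c
            kept_w += w
            raw_p, raw_w = [], []
            if len(kept_p) >= chunk:  # the summaries themselves grew too large
                kept_p, kept_w = compress(kept_p, kept_w, k)
    return compress(kept_p + raw_p, kept_w + raw_w, k)
```

Re-compressing the retained centres whenever they reach `chunk` points keeps memory bounded, but a point's weight may then pass through several rounds of compression, and each round can add approximation error.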
The Online Median Problem
In Proceedings of the 41st Annual IEEE Symposium on Foundations of Computer Science, 2000
Cited by 69 (2 self)
We introduce a natural variant of the (metric uncapacitated) k-median problem that we call the online median problem. Whereas the k-median problem involves optimizing the simultaneous placement of k facilities, the online median problem imposes the following additional constraints: the facilities are placed one at a time; a facility cannot be moved once it is placed, and the total number of facilities to be placed, k, is not known in advance. The objective of an online median algorithm is to minimize the competitive ratio, that is, the worst-case ratio of the cost of an online placement to that of an optimal offline placement. Our main result is a linear-time constant-competitive algorithm for the online median problem. In addition, we present a related, though substantially simpler, linear-time constant-factor approximation algorithm for the (metric uncapacitated) facility location problem. The latter algorithm is similar in spirit to the recent primal-dual-based facility location algorithm of Jain and Vazirani, but our approach is more elementary and yields an improved running time.
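To make the objective concrete, the sketch below evaluates a facility ordering the way the definition above does: for every prefix size k it compares the cost of the first k facilities with the optimal k-median cost and reports the worst ratio. The ordering used is a naive greedy baseline, not the authors' constant-competitive algorithm, and the optimum is found by brute force, so this only suits tiny instances. Facilities are placed at the input points, with `d[i][j]` the distance between points `i` and `j`; all function names are hypothetical.

```python
import itertools
from typing import List, Sequence


def median_cost(placed: Sequence[int], d: Sequence[Sequence[float]]) -> float:
    """Sum over all points of the distance to the nearest placed facility."""
    return sum(min(d[f][j] for f in placed) for j in range(len(d)))


def greedy_order(d: Sequence[Sequence[float]]) -> List[int]:
    """Naive baseline: repeatedly place the facility that lowers the cost most."""
    order: List[int] = []
    for _ in range(len(d)):
        nxt = min((i for i in range(len(d)) if i not in order),
                  key=lambda i: median_cost(order + [i], d))
        order.append(nxt)
    return order


def competitive_ratio(order: Sequence[int], d: Sequence[Sequence[float]]) -> float:
    """Worst ratio, over every prefix size k, of online cost to optimal k-median cost.

    Brute force over all k-subsets, so only usable on tiny instances.
    """
    n = len(d)
    worst = 1.0
    for k in range(1, n + 1):
        opt = min(median_cost(c, d) for c in itertools.combinations(range(n), k))
        online = median_cost(list(order[:k]), d)
        if opt == 0:
            if online > 0:
                return float("inf")
            continue
        worst = max(worst, online / opt)
    return worst
```

On the six points `0, 1, 2, 10, 11, 12` on a line, the greedy ordering starts with the point at 2 and then adds 11; that two-facility prefix costs 5 against an optimum of 4, so the ratio returned is 1.25.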
Better Streaming Algorithms for Clustering Problems
In Proc. of 35th ACM Symposium on Theory of Computing (STOC), 2003
Cited by 63 (1 self)
We study clustering problems in the streaming model, where the goal is to cluster a set of points by making one pass (or a few passes) over the data using a small amount of storage space. Our main result is a randomized algorithm for the k-Median problem which produces a constant factor approximation in one pass using storage space $O(k \operatorname{polylog} n)$. This is a significant improvement of the previous best algorithm, which yielded a $2^{O(1/\epsilon)}$ approximation using $O(n^\epsilon)$ space.
Maintaining Variance and k-Medians over Data Stream Windows
In PODS, 2003
Cited by 60 (0 self)
The sliding window model is useful for discounting stale data in data stream applications. In this model, data elements arrive continually and only the most recent N elements are used when answering queries. We present a novel technique for solving two important and related problems in the sliding window model: maintaining variance and maintaining a k-median clustering. Our solution to the problem of maintaining variance provides a continually updated estimate of the variance of the last N values in a data stream with relative error of at most $\epsilon$ using $O(\epsilon^{-2} \log N)$ memory. We present a constant-factor approximation algorithm which maintains an approximate k-median solution for the last N data points using O( N) memory, where $\tau < 1/2$ is a parameter which trades off the space bound with the approximation factor of $O(2^{O(1/\tau)})$.
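A bucket-based scheme for sliding-window variance has to combine summaries of adjacent groups of stream elements without revisiting the raw values. A minimal sketch of that combining step, assuming each bucket stores its count, mean, and sum of squared deviations: merging uses the standard exact rule $M_2 = M_2^{(a)} + M_2^{(b)} + \delta^2 n_a n_b / (n_a + n_b)$ with $\delta = \mu_b - \mu_a$. The names `Bucket`, `summarise`, and `merge` are hypothetical, and the bucket-expiry and error-control logic behind the $\epsilon$ guarantee above is not shown.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Bucket:
    n: int        # number of values summarised
    mean: float   # their mean
    m2: float     # sum of squared deviations from the mean (n * variance)


def summarise(values: Sequence[float]) -> Bucket:
    """Summary of a non-empty group of values."""
    n = len(values)
    mean = sum(values) / n
    return Bucket(n, mean, sum((x - mean) ** 2 for x in values))


def merge(a: Bucket, b: Bucket) -> Bucket:
    """Exact summary of the union of two buckets, computed from the summaries alone."""
    n = a.n + b.n
    delta = b.mean - a.mean
    mean = a.mean + delta * b.n / n
    m2 = a.m2 + b.m2 + delta * delta * a.n * b.n / n
    return Bucket(n, mean, m2)
```

The variance of the merged group is `m2 / n`; because the merge is exact, any error in such a scheme comes from how buckets straddling the window boundary are handled, not from combining them.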
Coresets for k-Means and k-Median Clustering and their Applications
In Proc. 36th Annu. ACM Sympos. Theory Comput., 2003
Cited by 41 (13 self)
In this paper, we show the existence of small coresets for the problems of computing k-median and k-means clustering for points in low dimension. In other words, we show that given a point set $P$ in $\mathbb{R}^d$, one can compute a weighted set $S \subseteq P$, of size $O(k \epsilon^{-d} \log n)$, such that one can compute the k-median/means clustering on $S$ instead of on $P$, and get a $(1 + \epsilon)$-approximation.
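The coreset idea (cluster a small weighted set instead of the full input) can be illustrated with uniform grid snapping. This is a deliberately simplified sketch, not the construction above: it assumes 2-D points and a fixed cell width `cell`, and it gives only an additive guarantee, since each point moves by at most half a cell diagonal, so for any centres the k-median cost changes by at most $n \cdot \mathit{cell}/\sqrt{2}$. A relative $(1 + \epsilon)$ bound needs a finer, non-uniform grid. The names `grid_coreset` and `kmedian_cost` are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Weighted = List[Tuple[Point, int]]


def grid_coreset(points: Sequence[Point], cell: float) -> Weighted:
    """Snap points to the centre of their grid cell; weight = points in that cell."""
    counts = Counter((math.floor(x / cell), math.floor(y / cell)) for x, y in points)
    return [(((i + 0.5) * cell, (j + 0.5) * cell), w) for (i, j), w in counts.items()]


def kmedian_cost(weighted: Weighted, centres: Sequence[Point]) -> float:
    """Weighted sum of distances to the nearest centre."""
    return sum(w * min(math.dist(p, c) for c in centres) for p, w in weighted)
```

The coreset size depends only on the number of occupied cells, not on $n$, which is what makes clustering it cheaper than clustering the input.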
A Sublinear Time Approximation Scheme for Clustering in Metric Spaces
In Proc. 40th IEEE FOCS
Cited by 34 (1 self)
The metric 2-clustering problem is defined as follows: given a metric $(X, d)$, partition $X$ into two sets $S_1$ and $S_2$ in order to minimize the value of $\sum_i \sum_{\{u,v\} \subseteq S_i} d(u, v)$. In this paper we show an approximation scheme for this problem. We consider the following k-clustering problem: given a weighted graph $G = (X, d)$ on $N$ vertices, where $d(\cdot, \cdot)$ is a weight function, partition $X$ into $k$ sets $S_1, \ldots, S_k$ such that the value of $\sum_i \sum_{\{u,v\} \subseteq S_i} d(u, v)$ is minimized. This problem was first formally posed by Sahni and Gonzalez [7]. They observed that the problem is NP-complete (for $k \ge 2$) and, by reduction from k-coloring, showed that it cannot be approximated up to any constant (for $k \ge 3$). Instead, they proposed a $1/k$-approximation algorithm for the dual version of this problem, where the goal is to maximize the weight of edges which do not belong to any cluster (i.e., k-max cut). Unfortunately, the latter result does not (and cannot) impl...
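The objective above can be evaluated directly. The sketch below computes it and solves the 2-clustering case exactly by enumerating every partition, which takes exponential time and is only meant to make the definition concrete for tiny inputs; it is not the approximation scheme the paper describes. The function names are hypothetical, and `d` is assumed to be a symmetric distance matrix.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def min_sum_cost(parts: Sequence[Sequence[int]], d: Sequence[Sequence[float]]) -> float:
    """Sum over clusters S_i of d(u, v) over all unordered pairs {u, v} inside S_i."""
    return sum(d[u][v] for part in parts for u, v in combinations(part, 2))


def brute_force_2_clustering(d: Sequence[Sequence[float]]) -> Tuple[float, List[List[int]]]:
    """Exact metric 2-clustering by enumeration: O(2^n n^2), tiny inputs only."""
    n = len(d)
    best_cost, best_parts = float("inf"), [list(range(n)), []]
    # Vertex n - 1 always goes to S_2, so each partition is enumerated once.
    for mask in range(1 << (n - 1)):
        s1 = [i for i in range(n) if (mask >> i) & 1]
        s2 = [i for i in range(n) if not (mask >> i) & 1]
        c = min_sum_cost([s1, s2], d)
        if c < best_cost:
            best_cost, best_parts = c, [s1, s2]
    return best_cost, best_parts
```

For the points `0, 1, 2, 100, 101, 102` on a line, the optimum splits them into the two obvious groups, with cost $2 \times (1 + 2 + 1) = 8$.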
Optimal Time Bounds for Approximate Clustering
2002
Cited by 26 (2 self)
Clustering is a fundamental problem in unsupervised learning, and has been studied widely both as a problem of learning mixture models and as an optimization problem. In this paper, we study clustering with respect to the k-median objective function, a natural formulation of clustering in which we attempt to minimize the average distance to cluster centers. One of the main contributions of this paper is a simple but powerful sampling technique that we call successive sampling that could be of independent interest. We show that our sampling procedure can rapidly identify a small set of points (of size just $O(k \log(n/k))$) that summarize the input points for the purpose of clustering. Using successive sampling, we develop an algorithm for the k-median problem that runs in $O(nk)$ time for a wide range of values of k and is guaranteed, with high probability, to return a solution with cost at most a constant factor times optimal. We also establish a lower bound of $\Omega(nk)$ on any randomized constant-factor approximation algorithm for the k-median problem that succeeds with even a negligible (say ...
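A simplified sketch of successive sampling, assuming 2-D points: in each round, draw a uniform sample of $\alpha k$ remaining points, keep it, and discard the $\beta$ fraction of the other remaining points that lie closest to the sample; stop once at most $\alpha k$ points are left and keep those too. The constants `alpha` and `beta`, the function name, and the exact discard rule are illustrative assumptions rather than the paper's choices, and the later step that turns the sample into a k-median solution is not shown.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def successive_sampling(points: Sequence[Point], k: int, alpha: int = 2,
                        beta: float = 0.5, seed: int = 0) -> List[int]:
    """Return indices of a small summary set (hypothetical simplified variant)."""
    rng = random.Random(seed)
    size = alpha * k                       # sample size per round
    remaining = list(range(len(points)))
    chosen: List[int] = []
    while len(remaining) > size:
        sample = rng.sample(remaining, size)
        chosen.extend(sample)
        in_sample = set(sample)
        rest = [i for i in remaining if i not in in_sample]
        # Discard the beta fraction of the rest that lies closest to the sample.
        rest.sort(key=lambda i: min(math.dist(points[i], points[j]) for j in sample))
        remaining = rest[math.ceil(beta * len(rest)):]
    chosen.extend(remaining)               # few enough points left: keep them all
    return chosen
```

Each round shrinks the remaining set by at least a factor of $1 - \beta$, so for constant $\alpha$ and $\beta$ there are $O(\log(n/k))$ rounds and $O(k \log(n/k))$ indices in total. With `n = 1000`, `k = 5` and the defaults, the rounds leave 495, 242, 116, 53, 21 and then 5 points, so 65 indices are returned.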

