Results 1 - 10 of 69
Improved Combinatorial Algorithms for the Facility Location and k-Median Problems
In Proceedings of the 40th Annual IEEE Symposium on Foundations of Computer Science, 1999
Cited by 187 (12 self)
We present improved combinatorial approximation algorithms for the uncapacitated facility location and k-median problems. Two central ideas in most of our results are cost scaling and greedy improvement. We present a simple greedy local search algorithm which achieves an approximation ratio of $2.414 + \epsilon$ in $\tilde{O}(n^2/\epsilon)$ time. This also yields a bicriteria approximation tradeoff of $(1 + \gamma, 1 + 2/\gamma)$ for facility cost versus service cost which is better than previously known tradeoffs and close to the best possible. Combining greedy improvement and cost scaling with a recent primal dual algorithm for facility location due to Jain and Vazirani, we get an approximation ratio of 1.853 in $\tilde{O}(n^3)$ time. This is already very close to the approximation guarantee of the best known algorithm which is LP-based. Further, combined with the best known LP-based algorithm for facility location, we get a very slight improvement in the approximation factor for facility location, achieving 1.728....
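To make the idea of greedy local search for uncapacitated facility location concrete, here is a minimal sketch. It is a generic add/drop/swap local search, not the paper's exact improvement step, and it omits cost scaling entirely. The names `total_cost`, `local_search`, `facility_cost`, and `dist` are hypothetical; `dist[j][i]` is assumed to be the distance from client `j` to facility `i`.

```python
def total_cost(open_set, facility_cost, dist):
    """Opening cost plus service cost, each client using its nearest open facility."""
    if not open_set:
        return float("inf")  # every client needs at least one open facility
    opening = sum(facility_cost[i] for i in open_set)
    service = sum(min(row[i] for i in open_set) for row in dist)
    return opening + service


def local_search(facility_cost, dist, eps=1e-9):
    """Illustrative add/drop/swap local search (an assumption, not the paper's step)."""
    n = len(facility_cost)
    current = {0}  # arbitrary starting solution: open facility 0 only
    best = total_cost(current, facility_cost, dist)
    improved = True
    while improved:
        improved = False
        candidates = []
        for i in range(n):
            if i in current:
                candidates.append(current - {i})  # drop facility i
            else:
                candidates.append(current | {i})  # add facility i
                for j in current:
                    candidates.append((current - {j}) | {i})  # swap j for i
        for cand in candidates:
            c = total_cost(cand, facility_cost, dist)
            if c < best - eps:  # accept the first strictly improving move
                current, best, improved = cand, c, True
                break
    return current, best


if __name__ == "__main__":
    costs = [3, 3, 3]
    d = [[0, 10, 10], [10, 0, 10], [10, 10, 0], [1, 10, 10]]
    print(local_search(costs, d))
```

The search stops at a local optimum, which need not be a global one; the approximation guarantees quoted in the abstract depend on the paper's specific move and analysis, which this sketch does not reproduce.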
Greedy Facility Location Algorithms analyzed using Dual Fitting with Factor-Revealing LP
Journal of the ACM, 2001
Cited by 83 (12 self)
We present a natural greedy algorithm for the metric uncapacitated facility location problem and use the method of dual fitting to analyze its approximation ratio, which turns out to be 1.861. The running time of our algorithm is O(m log m), where m is the total number of edges in the underlying complete bipartite graph between cities and facilities. We use our algorithm to improve recent results for some variants of the problem, such as the fault tolerant and outlier versions. In addition, we introduce a new variant which can be seen as a special case of the concave cost version of this problem.
Clustering data streams: Theory and practice
IEEE TKDE, 2003
Cited by 75 (2 self)
The data stream model has recently attracted attention for its applicability to numerous types of data, including telephone records, Web documents, and clickstreams. For analysis of such data, the ability to process the data in a single pass, or a small number of passes, while using little memory, is crucial. We describe such a streaming algorithm that effectively clusters large data streams. We also provide empirical evidence of the algorithm's performance on synthetic and real data streams. Index Terms: Clustering, data streams, approximation algorithms.
Packing Steiner trees
Cited by 71 (5 self)
The Steiner packing problem is to find the maximum number of edge-disjoint subgraphs of a given graph G that connect a given set of required points S. This problem is motivated by practical applications in VLSI-layout and broadcasting, as well as theoretical reasons. In this paper, we study this problem and present an algorithm with an asymptotic approximation factor of |S|/4. This gives a sufficient condition for the existence of k edge-disjoint Steiner trees in a graph in terms of the edge-connectivity of the graph. We will show that this condition is the best possible if the number of terminals is 3. At the end, we consider the fractional version of this problem, and observe that it can be reduced to the minimum Steiner tree problem via the ellipsoid algorithm.
Adwords and generalized on-line matching
In FOCS ’05: Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science, 2005
Cited by 68 (3 self)
How does a search engine company decide what ads to display with each query so as to maximize its revenue? This turns out to be a generalization of the online bipartite matching problem. We introduce the notion of a tradeoff-revealing LP and use it to derive two optimal algorithms achieving competitive ratios of 1 − 1/e for this problem.
Allocating online advertisement space with unreliable estimates
In Proceedings of the 8th ACM Conference on Electronic Commerce (EC), 2007
Cited by 35 (6 self)
We study the problem of optimally allocating online advertisement space to budget-constrained advertisers. This problem was defined and studied from the perspective of worst-case online competitive analysis by Mehta et al. Our objective is to find an algorithm that takes advantage of the given estimates of the frequencies of keywords to compute a near-optimal solution when the estimates are accurate, while at the same time maintaining a good worst-case competitive ratio in case the estimates are totally incorrect. This is motivated by real-world situations where search engines have stochastic information that provides reasonably accurate estimates of the frequency of search queries except in certain highly unpredictable yet economically valuable spikes in the search pattern. Our approach is a black-box approach: we assume we have access to an oracle that uses the given estimates to recommend an advertiser every time a query arrives. We use this oracle to design an algorithm that provides two performance guarantees: the performance guarantee in the case that the oracle gives an accurate estimate, and its worst-case performance guarantee. Our algorithm can be fine-tuned by adjusting a parameter α, giving a tradeoff curve between the two performance measures with the best competitive ratio for the worst-case scenario at one end of the curve and the optimal solution for the scenario where estimates are accurate at the other end. Finally, we demonstrate the applicability of our framework by applying it to two classical online problems, namely the lost cow and the ski rental problems.
Approximate k-MSTs and k-Steiner trees via the primal-dual method and Lagrangean relaxation
Mathematical Programming, 2001
Cited by 34 (4 self)
Garg [10] gives two approximation algorithms for the minimum-cost tree spanning k vertices in an undirected graph. Recently Jain and Vazirani [16] discovered primal-dual approximation
Strategyproof Cost-sharing Mechanisms for Set Cover and Facility Location Games
2003
Cited by 29 (0 self)
In this paper, we obtain strategyproof cost allocations for two fundamental games whose underlying optimization problems are NP-hard, the set cover game and the facility location game. For the latter game, this is made possible by new approximation algorithms for the underlying optimization problem using the technique of dual fitting [7]. In retrospect, the natural greedy algorithm for the set cover problem (see [17]) can also be analyzed using this technique; we utilize this viewpoint for handling the set cover game. The facility location game was studied in [9, 4], which left open the problem of obtaining a group strategyproof mechanism based on a constant factor approximation algorithm. Our paper partially answers this question. We give a strategyproof mechanism, but cannot achieve group strategyproofness. More recently, Pal and Tardos [15] have announced a 3-approximately budget balanced cross-monotonic cost-sharing method for the facility location problem. This gives a group strategyproof mechanism for the facility location game that recovers 1/3 of the cost
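The natural greedy algorithm for set cover mentioned in this abstract can be sketched as follows. This is the textbook greedy rule (repeatedly pick the set with the lowest cost per newly covered element), not the paper's cost-sharing mechanism. The function name `greedy_set_cover` and its arguments are hypothetical. The per-element `price` it records is the quantity that a dual-fitting analysis works with: the prices sum to the cost of the greedy solution, and a standard argument shows that scaling them down by the harmonic number H_n gives a feasible dual solution.

```python
def greedy_set_cover(universe, sets, cost):
    """Textbook greedy set cover; returns the chosen set names and element prices."""
    uncovered = set(universe)
    chosen, price = [], {}
    while uncovered:
        # Only sets that cover something new are candidates (sorted for determinism).
        useful = [name for name in sorted(sets) if sets[name] & uncovered]
        if not useful:
            raise ValueError("some elements cannot be covered")
        # Pick the set with the lowest cost per newly covered element.
        best = min(useful, key=lambda name: cost[name] / len(sets[name] & uncovered))
        newly = sets[best] & uncovered
        for element in newly:
            price[element] = cost[best] / len(newly)  # share the set's cost evenly
        chosen.append(best)
        uncovered -= newly
    return chosen, price


if __name__ == "__main__":
    sets = {"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5}, "D": {1, 2, 3, 4, 5}}
    cost = {"A": 1, "B": 1, "C": 1, "D": 4}
    print(greedy_set_cover({1, 2, 3, 4, 5}, sets, cost))
```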
The pipelined set cover problem
2003
Cited by 26 (5 self)
A classical problem in query optimization is to find the optimal ordering of a set of possibly correlated selections. We provide an abstraction of this problem as a generalization of set cover called pipelined set cover, where the sets are applied sequentially to the elements to be covered and the elements covered at each stage are discarded. We show that several natural heuristics for this NP-hard problem, such as the greedy set-cover heuristic and a local-search heuristic, can be analyzed using a linear-programming framework. These heuristics lead to efficient algorithms for pipelined set cover that can be applied to order possibly correlated selections in conventional database systems as well as data stream processing systems. We use our linear-programming framework to show that the greedy and local-search algorithms are 4-approximations for pipelined set cover. We extend our analysis to minimize the $L_p$-norm of the costs paid by the sets, where p ≥ 2 is an integer, to examine the improvement in performance when the total cost has increasing contribution from initial sets in the pipeline. Finally, we consider the online version of pipelined set cover and present a competitive algorithm with a logarithmic performance guarantee. Our analysis framework may be applicable to other problems in query optimization where it is important to account for correlations.
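The pipelined model described above can be illustrated with a small sketch. It assumes one plausible reading of the cost model: each set is charged its cost once for every element that is still uncovered when the set is applied. It also assumes the greedy heuristic picks, at each stage, the unused set covering the most remaining elements per unit cost. The names `pipeline_cost` and `greedy_pipeline` are hypothetical, and the paper's exact definitions may differ.

```python
def pipeline_cost(order, sets, cost, elements):
    """Cost of applying sets in the given order (assumed model, see lead-in)."""
    remaining = set(elements)
    total = 0
    for name in order:
        total += cost[name] * len(remaining)  # every surviving element is processed
        remaining -= sets[name]               # covered elements are discarded
    return total


def greedy_pipeline(sets, cost, elements):
    """Greedy ordering: most newly covered elements per unit cost comes first."""
    remaining = set(elements)
    unused = set(sets)
    order = []
    while remaining and unused:
        # sorted() makes tie-breaking deterministic; max() keeps the first maximum.
        best = max(sorted(unused), key=lambda s: len(sets[s] & remaining) / cost[s])
        order.append(best)
        unused.remove(best)
        remaining -= sets[best]
    return order


if __name__ == "__main__":
    sets = {"A": {1, 2, 3, 4}, "B": {4, 5}, "C": {5, 6}}
    cost = {"A": 1, "B": 1, "C": 1}
    elements = {1, 2, 3, 4, 5, 6}
    order = greedy_pipeline(sets, cost, elements)
    print(order, pipeline_cost(order, sets, cost, elements))
```

Comparing `pipeline_cost` for the greedy order against other permutations on small inputs is a quick way to see why placing highly selective sets early reduces the total work.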
Optimal Time Bounds for Approximate Clustering
2002
Cited by 26 (2 self)
Clustering is a fundamental problem in unsupervised learning, and has been studied widely both as a problem of learning mixture models and as an optimization problem. In this paper, we study clustering with respect to the k-median objective function, a natural formulation of clustering in which we attempt to minimize the average distance to cluster centers. One of the main contributions of this paper is a simple but powerful sampling technique that we call successive sampling that could be of independent interest. We show that our sampling procedure can rapidly identify a small set of points (of size just $O(k \log(n/k))$) that summarize the input points for the purpose of clustering. Using successive sampling, we develop an algorithm for the k-median problem that runs in $O(nk)$ time for a wide range of values of k and is guaranteed, with high probability, to return a solution with cost at most a constant factor times optimal. We also establish a lower bound of $\Omega(nk)$ on any randomized constant-factor approximation algorithm for the k-median problem that succeeds with even a negligible (say

