Greedy Facility Location Algorithms analyzed using Dual Fitting with Factor-Revealing LP
- Journal of the ACM, 2001
Cited by 83 (12 self)
We present a natural greedy algorithm for the metric uncapacitated facility location problem and use the method of dual fitting to analyze its approximation ratio, which turns out to be 1.861. The running time of our algorithm is O(m log m), where m is the total number of edges in the underlying complete bipartite graph between cities and facilities. We use our algorithm to improve recent results for some variants of the problem, such as the fault tolerant and outlier versions. In addition, we introduce a new variant which can be seen as a special case of the concave cost version of this problem.
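To make the objective behind this abstract concrete, the following is a minimal sketch of how the cost of an uncapacitated facility location solution can be evaluated. It is not the paper's greedy algorithm or its dual-fitting analysis; the function name `facility_location_cost` and the layout of `dist` (one row per city, one column per facility) are assumptions made for illustration.

```python
from typing import Iterable, Sequence


def facility_location_cost(
    open_facilities: Iterable[int],
    opening_cost: Sequence[float],
    dist: Sequence[Sequence[float]],
) -> float:
    """Cost of an uncapacitated facility location solution.

    Assumed layout (hypothetical, for illustration only):
      opening_cost[i] is the cost of opening facility i,
      dist[j][i] is the distance from city j to facility i.
    """
    chosen = sorted(set(open_facilities))
    if not chosen:
        raise ValueError("at least one facility must be open")
    # Pay the opening cost of every chosen facility once.
    opening = sum(opening_cost[i] for i in chosen)
    # Each city connects to its nearest open facility.
    connection = sum(min(row[i] for i in chosen) for row in dist)
    return opening + connection


# Two candidate facilities, three cities.
opening_cost = [3, 4]
dist = [[1, 5], [2, 1], [6, 1]]
print(facility_location_cost([0], opening_cost, dist))     # 12
print(facility_location_cost([1], opening_cost, dist))     # 11
print(facility_location_cost([0, 1], opening_cost, dist))  # 10
```

In this toy instance opening both facilities is cheapest, because the saved connection cost outweighs the extra opening cost.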
Clustering data streams: Theory and practice
- IEEE TKDE, 2003
Cited by 75 (2 self)
The data stream model has recently attracted attention for its applicability to numerous types of data, including telephone records, Web documents, and clickstreams. For analysis of such data, the ability to process the data in a single pass, or a small number of passes, while using little memory, is crucial. We describe such a streaming algorithm that effectively clusters large data streams. We also provide empirical evidence of the algorithm’s performance on synthetic and real data streams. Index Terms: clustering, data streams, approximation algorithms.
A local search approximation algorithm for k-means clustering
- 2004
Cited by 47 (1 self)
In k-means clustering we are given a set of n data points in d-dimensional space ℝ^d and an integer k, and the problem is to determine a set of k points in ℝ^d, called centers, to minimize the mean squared distance from each data point to its nearest center. No exact polynomial-time algorithms are known for this problem. Although asymptotically efficient approximation algorithms exist, these algorithms are not practical due to the very high constant factors involved. There are many heuristics that are used in practice, but we know of no bounds on their performance. We consider the question of whether there exists a simple and practical approximation algorithm for k-means clustering. We present a local improvement heuristic based on swapping centers in and out. We prove that this yields a (9 + ε)-approximation algorithm. We present an example showing that any approach based on performing a fixed number of swaps achieves an approximation factor of at least (9 − ε) in all sufficiently high dimensions. Thus, our approximation factor is almost tight for algorithms based on performing a fixed number of swaps. To establish the practical value of the heuristic, we present an empirical study that shows that, when combined with
Optimal Time Bounds for Approximate Clustering
- 2002
Cited by 26 (2 self)
Clustering is a fundamental problem in unsupervised learning, and has been studied widely both as a problem of learning mixture models and as an optimization problem. In this paper, we study clustering with respect to the k-median objective function, a natural formulation of clustering in which we attempt to minimize the average distance to cluster centers. One of the main contributions of this paper is a simple but powerful sampling technique that we call successive sampling that could be of independent interest. We show that our sampling procedure can rapidly identify a small set of points (of size just O(k log(n/k))) that summarize the input points for the purpose of clustering. Using successive sampling, we develop an algorithm for the k-median problem that runs in O(nk) time for a wide range of values of k and is guaranteed, with high probability, to return a solution with cost at most a constant factor times optimal. We also establish a lower bound of Ω(nk) on any randomized constant-factor approximation algorithm for the k-median problem that succeeds with even a negligible (say
Self-improving algorithms
- in SODA ’06: Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms
Cited by 14 (1 self)
We investigate ways in which an algorithm can improve its expected performance by fine-tuning itself automatically with respect to an arbitrary, unknown input distribution. We give such self-improving algorithms for sorting and computing Delaunay triangulations. The highlights of this work: (i) an algorithm to sort a list of numbers with optimal expected limiting complexity; and (ii) an algorithm to compute the Delaunay triangulation of a set of points with optimal expected limiting complexity. In both cases, the algorithm begins with a training phase during which it adjusts itself to the input distribution, followed by a stationary regime in which the algorithm settles to its optimized incarnation.
On the Implementation of a Swap-Based Local Search Procedure for the p-Median Problem
- 2002
Cited by 11 (5 self)
We present a new implementation of a widely used swap-based local search procedure for the p-median problem. It produces the same output as the best implementation described in the literature and has the same worst-case complexity, but, through the use of extra memory, it can be significantly faster in practice: speedups of up to three orders of magnitude were observed.
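For readers unfamiliar with the procedure being accelerated, here is a minimal sketch of a swap-based local search for the p-median problem. It is deliberately naive: it recomputes the full cost for every candidate swap and accepts the first improving one, so it does not reproduce the paper's faster implementation or its use of extra memory. The names `pmedian_cost` and `swap_local_search`, and the assumption that every customer is also a candidate median (a square `dist` matrix), are illustrative choices only.

```python
from typing import Iterable, Sequence, Set, Tuple


def pmedian_cost(medians: Iterable[int], dist: Sequence[Sequence[float]]) -> float:
    """Sum over customers of the distance to the nearest chosen median."""
    chosen = list(medians)
    return sum(min(row[m] for m in chosen) for row in dist)


def swap_local_search(
    initial: Iterable[int], dist: Sequence[Sequence[float]]
) -> Tuple[Set[int], float]:
    """Naive first-improvement swap search (illustrative sketch).

    Repeatedly tries to replace one chosen median with one unchosen
    candidate, keeping the swap if it lowers the total cost, and stops
    when no single swap improves the solution (a local optimum).
    """
    n = len(dist)
    medians = set(initial)
    best = pmedian_cost(medians, dist)
    improved = True
    while improved:
        improved = False
        for out in sorted(medians):
            for cand in range(n):
                if cand in medians:
                    continue
                trial = (medians - {out}) | {cand}
                cost = pmedian_cost(trial, dist)
                if cost < best:
                    medians, best = trial, cost
                    improved = True
                    break
            if improved:
                break
    return medians, best


# Six points on a line forming two groups; choose p = 2 medians.
positions = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in positions] for a in positions]
print(swap_local_search({0, 1}, dist))  # ({1, 4}, 4)
```

The result is only guaranteed to be a local optimum with respect to single swaps; in this small example it happens to coincide with the best placement, the middle point of each group.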
A 2-Approximation Algorithm for the Soft-Capacitated Facility Location Problem
- Proceedings of the 6th International Workshop on Approximation Algorithms for Combinatorial Optimization (APPROX), LNCS 2764, 2003
Cited by 11 (4 self)
This paper is divided into two parts. In the first part, we present a 2-approximation algorithm for the soft-capacitated facility location problem. This achieves the integrality gap of the natural LP relaxation of the problem. The algorithm is based on an improved analysis of an algorithm for the linear facility location problem, and a bifactor approximate reduction from this problem to the soft-capacitated facility location problem. We will define and use the concept of bifactor approximate reductions to improve the approximation factor of several other variants of the facility location problem. In the second part of the paper, we present an alternative analysis of the authors' 1.52-approximation algorithm for the uncapacitated facility location problem, using a single factor-revealing LP. This answers an open question of [18]. Furthermore, this analysis, combined with a recent result of Thorup [25], shows that our algorithm can be implemented in quasi-linear time, achieving the best known approximation factor in the best possible running time.
Hierarchical Traffic Grooming in Large-Scale WDM Networks
- 2005
Cited by 3 (2 self)
The advances in fiber optics and wavelength division multiplexing (WDM) technology are viewed as the key to satisfying the data-driven bandwidth demand of today’s Internet. The mismatch between the bandwidth users need and the capacity of a wavelength makes it clear that some multiplexing is required to use the wavelength capacity efficiently, which in turn reduces the cost of line terminating equipment (LTE). This technique is referred to as traffic grooming. Previous studies have concentrated on different objectives, or on special network topologies such as rings. In our study, we aim to minimize the LTE cost as a direct way of minimizing the network cost. We look into the grooming problem in elemental topologies as a starting point. First, we prove that traffic grooming in path, ring, and star topology networks is NP-complete under the cost function we consider. We also show the same complexity results for a Min-Max objective that has not been considered before, on the two elementary topologies. We then design polynomial-time heuristic algorithms for the grooming problem in rings (thus implicitly paths) and stars for networks of larger size. Experiments on various network sizes and traffic patterns
Categories and Subject Descriptors: F.2.2 [Theory of Computation]: Analysis of Algorithms and Problem Complexity -- computations on discrete structures. General Terms: Algorithms, Theory
We study clustering problems in the streaming model, where the goal is to cluster a set of points by making one pass (or a few passes) over the data using a small amount of storage space. Our main result is a randomized algorithm for the k-Median problem which produces a constant factor approximation in one pass using storage space O(k polylog n). This is a significant improvement of the previous best algorithm which yielded a 2^{O(1/ε)} approximation using O(n^ε) space. Next we give a streaming algorithm for the k-Median problem with an arbitrary distance function. We also study algorithms for clustering problems with outliers in the streaming model. Here, we give bicriterion guarantees, producing constant factor approximations by increasing the allowed fraction of outliers slightly.

