Improved Combinatorial Algorithms for the Facility Location and k-Median Problems
In Proceedings of the 40th Annual IEEE Symposium on Foundations of Computer Science, 1999
We present improved combinatorial approximation algorithms for the uncapacitated facility location and k-median problems. Two central ideas in most of our results are cost scaling and greedy improvement. We present a simple greedy local search algorithm which achieves an approximation ratio of 2.414 + ε in Õ(n²/ε) time. This also yields a bicriteria approximation tradeoff of (1 + γ, 1 + 2/γ) for facility cost versus service cost, which is better than previously known tradeoffs and close to the best possible. Combining greedy improvement and cost scaling with a recent primal-dual algorithm for facility location due to Jain and Vazirani, we get an approximation ratio of 1.853 in Õ(n³) time. This is already very close to the approximation guarantee of the best known algorithm, which is LP-based. Further, combined with the best known LP-based algorithm for facility location, we get a very slight improvement in the approximation factor for facility location, achieving 1.728.
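The general shape of local search for uncapacitated facility location can be illustrated with a small Python sketch. It assumes a hypothetical input format: opening costs `f` and a matrix `d[i][j]` giving the distance from facility i to city j. It applies plain add, drop, and swap moves and accepts one only when it strictly lowers the total cost. It does not implement the cost scaling or the specific greedy-improvement rule that yield the 2.414 + ε bound, so treat it as an illustration of the move structure only.

```python
from itertools import product

def total_cost(open_set, f, d):
    """Opening costs plus each city's distance to its nearest open facility."""
    if not open_set:
        return float("inf")  # no facility open: infeasible
    n_cities = len(d[0])
    service = sum(min(d[i][j] for i in open_set) for j in range(n_cities))
    return sum(f[i] for i in open_set) + service

def local_search_ufl(f, d):
    """Hypothetical sketch: apply improving add/drop/swap moves until none helps."""
    n_fac = len(f)
    # Start from the single facility that is cheapest on its own.
    current = {min(range(n_fac), key=lambda i: total_cost({i}, f, d))}
    cost = total_cost(current, f, d)
    improved = True
    while improved:
        improved = False
        candidates = []
        for i in range(n_fac):
            if i in current:
                candidates.append(current - {i})  # drop move
            else:
                candidates.append(current | {i})  # add move
        for a, b in product(current, range(n_fac)):
            if b not in current:
                candidates.append((current - {a}) | {b})  # swap move
        for cand in candidates:
            c = total_cost(cand, f, d)
            if c < cost:  # take the first strictly improving move
                current, cost = cand, c
                improved = True
                break
    return current, cost
```

For example, with `f = [3, 3, 10]` and `d = [[0, 1, 5, 6], [5, 6, 0, 1], [2, 2, 2, 2]]`, the search opens facilities 0 and 1 for a total cost of 8. Each accepted move strictly decreases the cost, so the loop terminates. It may, however, stop at a local optimum.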
Improved Approximation Algorithms for Metric Facility Location Problems
In Proceedings of the 5th International Workshop on Approximation Algorithms for Combinatorial Optimization, 2002
In this paper we present a 1.52-approximation algorithm for the metric uncapacitated facility location problem, and a 2-approximation algorithm for the metric capacitated facility location problem with soft capacities. Both algorithms improve the best previously known approximation factor for the corresponding problem, and our soft-capacitated facility location algorithm achieves the integrality gap of the standard LP relaxation of the problem. Furthermore, we show, using a result of Thorup, that our algorithms can be implemented in quasi-linear time.
Clustering data streams: Theory and practice
IEEE TKDE, 2003
Abstract—The data stream model has recently attracted attention for its applicability to numerous types of data, including telephone records, Web documents, and clickstreams. For analysis of such data, the ability to process the data in a single pass, or a small number of passes, while using little memory, is crucial. We describe such a streaming algorithm that effectively clusters large data streams. We also provide empirical evidence of the algorithm's performance on synthetic and real data streams. Index Terms—Clustering, data streams, approximation algorithms.
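The abstract does not spell out the algorithm. A common pattern for clustering a stream in one pass is divide and conquer: read a chunk, reduce it to k centers weighted by how many points they absorb, and finally cluster the weighted centers. The sketch below shows only that pattern, on 1-D points. The subroutine `weighted_kmedian` is a hypothetical stand-in (a simple swap-based local search) and is not the authors' method.

```python
def weighted_kmedian(points, weights, k):
    """Hypothetical stand-in: swap local search over candidate centers drawn from the points."""
    def cost(centers):
        return sum(w * min(abs(p - c) for c in centers) for p, w in zip(points, weights))

    centers = list(dict.fromkeys(points))[:k]  # first k distinct points as a start
    best = cost(centers)
    improved = True
    while improved:
        improved = False
        for idx in range(len(centers)):
            for cand in sorted(set(points) - set(centers)):
                trial = centers[:idx] + [cand] + centers[idx + 1:]
                c = cost(trial)
                if c < best:
                    centers, best, improved = trial, c, True
    return centers

def assign_weights(points, weights, centers):
    """Give each center the total weight of the points nearest to it."""
    totals = {c: 0 for c in centers}
    for p, w in zip(points, weights):
        nearest = min(centers, key=lambda c: abs(p - c))
        totals[nearest] += w
    return list(totals.keys()), list(totals.values())

def stream_kmedian(stream, k, chunk_size):
    """One pass: summarize each chunk by k weighted centers, then cluster the summaries."""
    reps, rep_weights, chunk = [], [], []

    def summarize(block):
        ones = [1] * len(block)
        centers = weighted_kmedian(block, ones, k)
        c, w = assign_weights(block, ones, centers)
        reps.extend(c)
        rep_weights.extend(w)

    for x in stream:
        chunk.append(x)
        if len(chunk) == chunk_size:
            summarize(chunk)
            chunk = []
    if chunk:
        summarize(chunk)  # leftover partial chunk
    return weighted_kmedian(reps, rep_weights, k)
```

In this sketch the list of representatives still grows with the stream length. A bounded-memory version would re-cluster the representatives whenever they exceed a fixed budget.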
Greedy Facility Location Algorithms analyzed using Dual Fitting with Factor-Revealing LP
Journal of the ACM, 2001
We present a natural greedy algorithm for the metric uncapacitated facility location problem and use the method of dual fitting to analyze its approximation ratio, which turns out to be 1.861. The running time of our algorithm is O(m log m), where m is the total number of edges in the underlying complete bipartite graph between cities and facilities. We use our algorithm to improve recent results for some variants of the problem, such as the fault-tolerant and outlier versions. In addition, we introduce a new variant which can be seen as a special case of the concave-cost version of this problem.
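One way to picture the greedy algorithm is as repeatedly choosing the most cost-effective "star" and paying for it. A star is a facility together with a set of still-unconnected cities, and its cost per city is the opening cost plus connection costs, divided by the number of cities. After a star is chosen, the facility's opening cost is treated as already paid. The sketch below follows that reading under assumed inputs: opening costs `f` and connection costs `c[i][j]`. It is a naive polynomial-time version, not the O(m log m) implementation, and it does not reproduce the dual-fitting analysis.

```python
def greedy_star_ufl(f, c):
    """Star-greedy sketch: f[i] opening costs, c[i][j] facility-to-city costs."""
    f = list(f)  # local copy; an opened facility's cost is reset to 0
    n_fac, n_city = len(c), len(c[0])
    unconnected = set(range(n_city))
    assignment, opened = {}, set()
    while unconnected:
        best = None  # (cost per city, facility, cities in the star)
        for i in range(n_fac):
            # For a fixed star size, the cheapest choice is the closest cities,
            # so it suffices to scan prefixes of the sorted order.
            order = sorted(unconnected, key=lambda j: c[i][j])
            running = f[i]
            for size, j in enumerate(order, start=1):
                running += c[i][j]
                ratio = running / size
                if best is None or ratio < best[0]:
                    best = (ratio, i, order[:size])
        _, i, cities = best
        opened.add(i)
        f[i] = 0  # opening cost already paid
        for j in cities:
            assignment[j] = i
            unconnected.discard(j)
    return opened, assignment
```

With `f = [1, 1]` and `c = [[1, 1, 10], [10, 10, 1]]`, the first star opens facility 0 for cities 0 and 1 at 1.5 per city. The second star opens facility 1 for city 2.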
A local search approximation algorithm for k-means clustering
2004
In k-means clustering we are given a set of n data points in d-dimensional space ℜ^d and an integer k, and the problem is to determine a set of k points in ℜ^d, called centers, to minimize the mean squared distance from each data point to its nearest center. No exact polynomial-time algorithms are known for this problem. Although asymptotically efficient approximation algorithms exist, these algorithms are not practical due to the very high constant factors involved. There are many heuristics that are used in practice, but we know of no bounds on their performance. We consider the question of whether there exists a simple and practical approximation algorithm for k-means clustering. We present a local improvement heuristic based on swapping centers in and out. We prove that this yields a (9 + ε)-approximation algorithm. We present an example showing that any approach based on performing a fixed number of swaps achieves an approximation factor of at least (9 − ε) in all sufficiently high dimensions. Thus, our approximation factor is almost tight for algorithms based on performing a fixed number of swaps. To establish the practical value of the heuristic, we present an empirical study that shows that, when combined with ...
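The idea of swapping centers in and out can be sketched as a single-swap local search. This sketch assumes that candidate centers are restricted to the data points themselves, that one center is exchanged at a time, and that a swap is kept only if it lowers the total squared distance. It is not claimed to meet the (9 + ε) bound, which depends on details not given in the abstract.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_cost(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

def swap_local_search(points, k):
    """Single-swap sketch: replace one center by a data point while the cost drops."""
    centers = list(points[:k])  # assumes k <= len(points)
    best = kmeans_cost(points, centers)
    improved = True
    while improved:
        improved = False
        for idx in range(k):
            for cand in points:
                if cand in centers:
                    continue
                trial = centers[:idx] + [cand] + centers[idx + 1:]
                c = kmeans_cost(points, trial)
                if c < best:
                    centers, best, improved = trial, c, True
    return centers, best
```

On the points `(0, 0), (0, 1), (10, 10), (10, 11)` with k = 2, the start `[(0, 0), (0, 1)]` costs 381. A single swap brings the cost to 2 with centers `(10, 10)` and `(0, 1)`.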
Better Streaming Algorithms for Clustering Problems
2003
We study clustering problems in the streaming model, where the goal is to cluster a set of points by making one pass (or a few passes) over the data using a small amount of storage space. Our main result is a randomized algorithm for the k-Median problem which produces a constant-factor approximation in one pass using storage space O(k polylog n). This is a significant improvement over the previous best algorithm, which yielded a 2^O(1/ε) approximation using O(n^ε) space. Next we give a streaming algorithm for the k-Median problem with an arbitrary distance function. We also study algorithms for clustering problems with outliers in the streaming model. Here, we give bicriterion guarantees, producing constant-factor approximations by increasing the allowed fraction of outliers slightly.
On hierarchical traffic grooming in WDM networks
IEEE/ACM Transactions on Networking, 2008
Abstract—The traffic grooming problem is of high practical importance in emerging wide-area wavelength division multiplexing (WDM) optical networks, yet it is intractable for any but trivial network topologies. In this work, we present an effective and efficient hierarchical traffic grooming framework for WDM networks of general topology, with the objective of minimizing the total number of electronic ports. At the first level of hierarchy, we decompose the network into clusters and designate one node in each cluster as the hub for grooming traffic. At the second level, the hubs form another cluster for grooming inter-cluster traffic. We view each (first- or second-level) cluster as a virtual star, and we present an efficient near-optimal algorithm for determining the logical topology of lightpaths to carry the traffic within each cluster. Routing and wavelength assignment is then performed directly on the underlying physical topology. We demonstrate the effectiveness of our approach by applying it to two networks of realistic size, a 32-node, 53-link topology and a 47-node, 96-link network. Comparisons to lower bounds indicate that hierarchical grooming is efficient in its use of the network resources of interest, namely, electronic ports and wavelengths. In addition to scaling to large network sizes, our hierarchical approach also facilitates the control and management of multi-granular networks. Index Terms—Hierarchical traffic grooming, K-center, optical networks, wavelength division multiplexing (WDM).
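The index terms mention K-center, which suggests forming clusters around chosen hub nodes. A standard way to pick K centers is the farthest-point greedy heuristic (Gonzalez's 2-approximation for metric K-center). The sketch below uses it on an assumed all-pairs distance matrix `dist`, such as hop counts, purely for illustration. It is not necessarily how the authors decompose the network.

```python
def farthest_point_hubs(dist, k, first=0):
    """Greedy K-center: repeatedly add the node farthest from the current hubs."""
    n = len(dist)
    hubs = [first]
    nearest = list(dist[first])  # distance from each node to its closest hub so far
    while len(hubs) < min(k, n):
        far = max(range(n), key=lambda v: nearest[v])
        hubs.append(far)
        nearest = [min(nearest[v], dist[far][v]) for v in range(n)]
    # Form clusters by attaching every node to its closest hub.
    clusters = {h: [] for h in hubs}
    for v in range(n):
        clusters[min(hubs, key=lambda h: dist[h][v])].append(v)
    return hubs, clusters
```

On a 6-node path with hop distances `abs(i - j)` and k = 2, starting from node 0, the sketch picks hubs 0 and 5 and splits the path into `{0, 1, 2}` and `{3, 4, 5}`.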
A New Conceptual Clustering Framework
Machine Learning, 2004
We propose a new formulation of the conceptual clustering problem where the goal is to explicitly output a collection of simple and meaningful conjunctions of attributes that define the clusters. The formulation differs from previous approaches since the clusters discovered may overlap and also may not cover all the points. In addition, a point may be assigned to a cluster description even if it only satisfies most, and not necessarily all, of the attributes in the conjunction. We establish connections between this conceptual clustering problem and the maximum edge biclique problem. We give simple randomized algorithms that discover a collection of approximate conjunctive cluster descriptions in sublinear time.
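The relaxed membership rule, under which a point joins a cluster if it satisfies most of a conjunction's attributes, can be made concrete with a threshold. The sketch assumes binary attribute vectors, a conjunction given as a set of attribute indices, and a hypothetical tolerance parameter `eps`. It shows only the membership test, not the authors' sublinear-time discovery algorithm.

```python
def matches(point, conjunction, eps):
    """True if the point has at least a (1 - eps) fraction of the conjunction's attributes set."""
    if not conjunction:
        return True  # the empty conjunction is satisfied by every point
    satisfied = sum(1 for a in conjunction if point[a])
    return satisfied >= (1 - eps) * len(conjunction)

def cluster_members(points, conjunction, eps):
    """Indices of the points assigned to the cluster described by the conjunction."""
    return [idx for idx, p in enumerate(points) if matches(p, conjunction, eps)]
```

With points `[1, 1, 1, 1]`, `[1, 1, 1, 0]`, `[1, 0, 0, 1]`, the conjunction `{0, 1, 2, 3}` with `eps = 0.25` selects points 0 and 1. The conjunction `{0, 3}` with `eps = 0` selects points 0 and 2. Point 0 therefore lies in both clusters, which shows how clusters can overlap.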
Experimental Evaluation of a New Shortest Path Algorithm (Extended Abstract)
2002
We evaluate the practical efficiency of a new shortest path algorithm for undirected graphs which was developed by the first two authors. This algorithm works in the fundamental comparison-addition model. Theoretically, this new algorithm outperforms Dijkstra's algorithm on sparse graphs for the all-pairs shortest path problem, and more generally, for the problem of computing single-source shortest paths from ω(1) different sources. Our extensive experimental analysis demonstrates that this is also the case in practice. We present results which show the new algorithm to run faster than Dijkstra's on a variety of sparse graphs when the number of vertices ranges from a few thousand to a few million, and when computing single-source shortest paths from as few as three different sources.
Sublinear-time algorithms
In Oded Goldreich, editor, Property Testing, volume 6390 of Lecture Notes in Computer Science, 2010
In this paper we survey recent (up to end of 2009) advances in the area of sublinear-time algorithms.