Analysis of a local search heuristic for facility location problems
 IN PROCEEDINGS OF THE 9TH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS
, 1998
Cited by 148 (5 self)
In this paper, we study approximation algorithms for several NP-hard facility location problems. We prove that a simple local search heuristic yields polynomial-time constant-factor approximation bounds for the metric versions of the uncapacitated k-median problem and the uncapacitated facility location problem. (For the k-median problem, our algorithms require a constant-factor blowup in the parameter k.) This local search heuristic was first proposed several decades ago, and has been shown to exhibit good practical performance in empirical studies. We also extend the above results to obtain constant-factor approximation bounds for the metric versions of capacitated k-median and facility location problems.
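A minimal sketch of a swap-based local search for k-median, in Python. It is an illustration under assumptions, not the procedure analyzed in the paper: the function names (`kmedian_cost`, `local_search_kmedian`), the restriction to single swaps, the arbitrary starting solution, and the rule of accepting any strictly improving swap are all hypothetical choices, and the sketch carries none of the paper's approximation guarantees.

```python
from itertools import product


def kmedian_cost(points, medians, dist):
    """Sum over all points of the distance to the nearest chosen median."""
    return sum(min(dist(p, m) for m in medians) for p in points)


def local_search_kmedian(points, k, dist):
    """Single-swap local search over distinct points.

    Starts from the first k points (an arbitrary choice) and keeps
    swapping one median for one non-median while the cost strictly drops.
    """
    medians = list(points[:k])
    best = kmedian_cost(points, medians, dist)
    improved = True
    while improved:
        improved = False
        for i, candidate in product(range(k), points):
            if candidate in medians:
                continue
            trial = medians[:i] + [candidate] + medians[i + 1:]
            cost = kmedian_cost(points, trial, dist)
            if cost < best:
                medians, best = trial, cost
                improved = True
                break  # restart the scan from the new solution
    return medians, best


if __name__ == "__main__":
    # Two well-separated groups on a line; distance is absolute difference.
    pts = [0, 1, 2, 10, 11, 12]
    meds, cost = local_search_kmedian(pts, 2, lambda a, b: abs(a - b))
    print(sorted(meds), cost)
```

The loop terminates because each accepted swap strictly lowers the cost and there are finitely many candidate solutions. Without a minimum-improvement threshold, though, the number of swaps is not guaranteed to be polynomial.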
Clustering data streams: Theory and practice
 IEEE TKDE
, 2003
Cited by 106 (2 self)
The data stream model has recently attracted attention for its applicability to numerous types of data, including telephone records, Web documents, and clickstreams. For analysis of such data, the ability to process the data in a single pass, or a small number of passes, while using little memory, is crucial. We describe such a streaming algorithm that effectively clusters large data streams. We also provide empirical evidence of the algorithm's performance on synthetic and real data streams. Index Terms: Clustering, data streams, approximation algorithms.
An Impossibility Theorem for Clustering
, 2002
Cited by 93 (0 self)
Although the study of clustering is centered around an intuitively compelling goal, it has been very difficult to develop a unified framework for reasoning about it at a technical level, and profoundly diverse approaches to clustering abound in the research community. Here we suggest a formal perspective on the difficulty in finding such a unification, in the form of an impossibility theorem: for a set of three simple properties, we show that there is no clustering function satisfying all three. Relaxations of these properties expose some of the interesting (and unavoidable) tradeoffs at work in well-studied clustering techniques such as single-linkage, sum-of-pairs, k-means, and k-median.
Active Semi-Supervision for Pairwise Constrained Clustering
 Proc. 4th SIAM Intl. Conf. on Data Mining (SDM2004
Cited by 90 (10 self)
Semi-supervised clustering uses a small amount of supervised data to aid unsupervised learning. One typical approach specifies a limited number of must-link and cannot-link constraints between pairs of examples. This paper presents a pairwise constrained clustering framework and a new method for actively selecting informative pairwise constraints to get improved clustering performance. The clustering and active learning methods are both easily scalable to large datasets, and can handle very high dimensional data. Experimental and theoretical results confirm that this active querying of pairwise constraints significantly improves the accuracy of clustering when given a relatively small amount of supervision.
The Online Median Problem
 In Proceedings of the 41st Annual IEEE Symposium on Foundations of Computer Science
, 2000
Cited by 75 (2 self)
We introduce a natural variant of the (metric uncapacitated) k-median problem that we call the online median problem. Whereas the k-median problem involves optimizing the simultaneous placement of k facilities, the online median problem imposes the following additional constraints: the facilities are placed one at a time; a facility cannot be moved once it is placed; and the total number of facilities to be placed, k, is not known in advance. The objective of an online median algorithm is to minimize the competitive ratio, that is, the worst-case ratio of the cost of an online placement to that of an optimal offline placement. Our main result is a linear-time constant-competitive algorithm for the online median problem. In addition, we present a related, though substantially simpler, linear-time constant-factor approximation algorithm for the (metric uncapacitated) facility location problem. The latter algorithm is similar in spirit to the recent primal-dual-based facility location algorithm of Jain and Vazirani, but our approach is more elementary and yields an improved running time.
Energy-Efficient Algorithms for . . .
, 2007
Cited by 59 (2 self)
We study scheduling problems in battery-operated computing devices, aiming at schedules with low total energy consumption. While most of the previous work has focused on finding feasible schedules in deadline-based settings, in this article we are interested in schedules that guarantee good response times. More specifically, our goal is to schedule a sequence of jobs on a variable-speed processor so as to minimize the total cost consisting of the energy consumption and the total flow time of all jobs. We first show that when the amount of work, for any job, may take an arbitrary value, then no online algorithm can achieve a constant competitive ratio. Therefore, most of the article is concerned with unit-size jobs. We devise a deterministic constant-competitive online algorithm and show that
Placement Algorithms for Hierarchical Cooperative Caching
, 1999
Cited by 54 (7 self)
Consider a hierarchical network in which each node periodically issues a request for an object drawn from a fixed set of unit-size objects. Suppose further that the following conditions are satisfied: the frequency with which each node accesses each object is known; each node has a cache of known capacity; any cache can be accessed by any node; any request is satisfied by the closest node with a copy of the desired object, at a cost proportional to the distance between the accessing node and the closest copy. In such an environment, it is desirable to fill the available cache space with copies of objects in such a way that the average access cost is minimized. We provide both exact and approximate polynomial-time algorithms for this hierarchical placement problem. Our exact algorithm is based on a reduction to min-cost flow, and does not appear to be practical for large problem sizes. Thus we are motivated to search for a faster approximation algorithm. Our main result is a simple constant-factor approximation algorithm for the hierarchical placement problem that admits an efficient distributed implementation.
Cooperative Facility Location Games
 Journal of Algorithms
, 2000
Cited by 50 (1 self)
The location of facilities in order to provide service for customers is a well-studied problem in the operations research literature. In the basic model, there is a predefined cost for opening a facility and also for connecting a customer to a facility, the goal being to minimize the total cost. Often, both in the case of public facilities (such as libraries, municipal swimming pools, fire stations, ...) and private facilities (such as distribution centers, switching stations, ...), we may want to find a 'fair' allocation of the total cost to the customers; this is known as the cost allocation problem. A central question in cooperative game theory is whether the total cost can be allocated to the customers such that no coalition of customers has any incentive to build their own facility or to ask a competitor to service them.
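The "no coalition has an incentive to build their own facility" condition can be checked by brute force on tiny instances. The sketch below is an illustration under assumptions, not the paper's method: the names `coalition_cost` and `in_core`, the dictionary-based input format, and the exhaustive enumeration are all hypothetical, and only the build-your-own deviation is modeled (not the option of asking a competitor for service).

```python
from itertools import combinations


def coalition_cost(coalition, open_cost, connect_cost):
    """Cheapest way for a coalition to serve itself.

    Tries every nonempty set of facilities (exponential; tiny inputs only).
    open_cost[f] is the opening cost of facility f;
    connect_cost[c][f] is the cost of connecting customer c to facility f.
    """
    facilities = list(open_cost)
    best = float("inf")
    for r in range(1, len(facilities) + 1):
        for chosen in combinations(facilities, r):
            cost = sum(open_cost[f] for f in chosen)
            cost += sum(min(connect_cost[c][f] for f in chosen) for c in coalition)
            best = min(best, cost)
    return best


def in_core(allocation, open_cost, connect_cost, tol=1e-9):
    """True if the allocation recovers the total cost and no proper
    coalition is charged more than its own stand-alone cost."""
    customers = list(allocation)
    total = coalition_cost(customers, open_cost, connect_cost)
    if abs(sum(allocation.values()) - total) > tol:
        return False
    for r in range(1, len(customers)):
        for coalition in combinations(customers, r):
            charged = sum(allocation[c] for c in coalition)
            if charged > coalition_cost(coalition, open_cost, connect_cost) + tol:
                return False
    return True


if __name__ == "__main__":
    open_cost = {"f": 2}
    connect_cost = {"a": {"f": 1}, "b": {"f": 1}}
    print(in_core({"a": 2, "b": 2}, open_cost, connect_cost))
    print(in_core({"a": 4, "b": 0}, open_cost, connect_cost))
```

In the usage example the total cost is 4 and each customer alone could serve itself for 3, so an even split is stable, while charging one customer the full 4 is not.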
Optimal Allocation of Electronic Content
, 2002
Cited by 48 (3 self)
The delivery of large files to individual users, such as video on demand or application programs to the envisioned network computers, is expected by many to be one of the main tasks of broadband communication networks. This requires high bandwidth capacity as well as fast and dense storage servers. This motivates multimedia service providers to optimize the delivery network as well as the electronic content allocation. A hierarchical architecture for the distribution of multimedia content was introduced by Nussbaumer, Patel, Schaffa, and Sterbenz (INFOCOM 94). They addressed the tradeoff between bandwidth and storage requirements that results from the placement of the content servers in the hierarchy tree. They presented a centralized algorithm to compute the best level of the hierarchy for the server location to minimize the combined cost of communication and storage. In this work, we solve a general case where servers can be placed at different levels of the hierarchy. We develop a distributed optimal location algorithm that requires small memory capacity and computation power. Previous results for related problems for caching system design are of higher complexity. Previous results for related classic operations research problems are limited to centralized algorithms, based on linear programming, that are not easy to convert into distributed algorithms. In order to obtain our results, we observed that the use of dynamic programming lends itself to a distributed implementation. For the specific problem at hand, we also managed to find an [...] function (a generalization of the problem) that simplifies the combining operation used in the design of a dynamic program.
Selfish Caching in Distributed Systems: A Game-Theoretic Analysis
 in Proc. ACM Symposium on Principles of Distributed Computing (ACM PODC
, 2004
Cited by 47 (2 self)
We analyze replication of resources by server nodes that act selfishly, using a game-theoretic approach. We refer to this as the selfish caching problem. In our model, nodes incur either cost for replicating resources or cost for access to a remote replica. We show the existence of pure strategy Nash equilibria and investigate the price of anarchy, which is the relative cost of the lack of coordination. The price of anarchy can be high due to undersupply problems, but with certain network topologies it has better bounds. With a payment scheme the game can always implement the social optimum in the best case by giving servers incentive to replicate.
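A small sketch of what a pure strategy Nash equilibrium means in a caching game of this kind. The concrete model here is an assumption made for illustration, not taken from the abstract: a single object, a uniform placement cost `alpha` for any node that replicates, an access cost equal to the distance to the nearest replica, and an infinite cost when no replica exists. The names `node_cost` and `is_pure_nash` are hypothetical.

```python
import math


def node_cost(i, caches, dist, alpha):
    """Cost paid by node i: alpha if it holds a replica, otherwise the
    distance to the nearest replica (infinite if there is none)."""
    if i in caches:
        return alpha
    return min((dist[i][j] for j in caches), default=math.inf)


def is_pure_nash(caches, dist, alpha):
    """True if no single node can lower its own cost by unilaterally
    switching between replicating and not replicating."""
    for i in range(len(dist)):
        current = node_cost(i, caches, dist, alpha)
        flipped = set(caches) ^ {i}  # node i toggles its own decision
        if node_cost(i, flipped, dist, alpha) < current:
            return False
    return True


if __name__ == "__main__":
    # Three nodes on a line, one unit apart.
    dist = [[0, 1, 2],
            [1, 0, 1],
            [2, 1, 0]]
    print(is_pure_nash({1}, dist, alpha=1.5))
    print(is_pure_nash({0}, dist, alpha=1.5))
```

In the usage example, a replica at the middle node is stable because both ends pay 1, which is less than the placement cost of 1.5. A single replica at one end is not stable, since the far node pays 2 and would rather replicate.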