Results 1-10 of 88
Approximation algorithms for metric facility location and k-median problems using the . . .
Incremental Clustering and Dynamic Information Retrieval
, 1997

Cited by 188 (4 self)
Motivated by applications such as document and image classification in information retrieval, we consider the problem of clustering dynamic point sets in a metric space. We propose a model called incremental clustering which is based on a careful analysis of the requirements of the information retrieval application, and which should also be useful in other applications. The goal is to efficiently maintain clusters of small diameter as new points are inserted. We analyze several natural greedy algorithms and demonstrate that they perform poorly. We propose new deterministic and randomized incremental clustering algorithms which have a provably good performance. We complement our positive results with lower bounds on the performance of incremental algorithms. Finally, we consider the dual clustering problem where the clusters are of fixed diameter, and the goal is to minimize the number of clusters.
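To make the incremental model concrete, here is a minimal sketch of one natural greedy rule: each new point starts as its own cluster, and whenever more than k clusters exist, the two clusters whose union has the smallest diameter are merged. The function names, the Euclidean metric, and the choice of this particular merge rule are assumptions for illustration; the abstract does not specify which greedy rules the paper analyzes, and this is not one of its proposed algorithms.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def diameter(cluster):
    """Largest pairwise distance inside a cluster (0 for a singleton)."""
    return max((dist(p, q) for p, q in combinations(cluster, 2)), default=0.0)


def insert_point(clusters, point, k):
    """Add `point` as a singleton, then merge greedily while more than k clusters exist.

    Hypothetical helper: merges the pair whose union has the smallest diameter.
    """
    clusters = clusters + [[point]]
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: diameter(clusters[ij[0]] + clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters


# Points arrive one at a time; at most k = 2 clusters are maintained.
clusters = []
for p in [(0.0, 0.0), (10.0, 0.0), (1.0, 0.0), (11.0, 0.0)]:
    clusters = insert_point(clusters, p, k=2)
```

Because merges are never undone, an early merge can force a large diameter later, which is the kind of weakness a lower-bound argument for greedy rules would exploit.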
Greedy Facility Location Algorithms Analyzed Using Dual Fitting with Factor-Revealing LP
 Journal of the ACM
, 2001

Cited by 149 (12 self)
We present a natural greedy algorithm for the metric uncapacitated facility location problem and use the method of dual fitting to analyze its approximation ratio, which turns out to be 1.861. The running time of our algorithm is O(m log m), where m is the total number of edges in the underlying complete bipartite graph between cities and facilities. We use our algorithm to improve recent results for some variants of the problem, such as the fault tolerant and outlier versions. In addition, we introduce a new variant which can be seen as a special case of the concave cost version of this problem.
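As an illustration of what a greedy algorithm for uncapacitated facility location can look like, the sketch below repeatedly picks the most cost-effective "star" (a facility plus a set of still-unconnected cities, scored by cost per city), opens that facility, and treats it as free to reuse afterwards. This is a plain quadratic-time sketch under assumptions: the names `open_cost` and `connect_cost` are hypothetical, it is not the O(m log m) implementation, and it is not claimed to reproduce the paper's exact rule or its 1.861 guarantee.

```python
def greedy_facility_location(open_cost, connect_cost):
    """Star-based greedy sketch.

    open_cost[i]       : cost of opening facility i
    connect_cost[i][j] : cost of connecting city j to facility i
    Returns (opened facilities, city -> facility assignment, total cost).
    """
    n_cities = len(connect_cost[0])
    cost = list(open_cost)          # working copy; set to 0 once a facility is open
    unconnected = set(range(n_cities))
    opened, assignment = set(), {}

    while unconnected:
        best = None                 # (cost per city, facility, cities in the star)
        for i, f in enumerate(cost):
            # For a fixed facility, the best star of size s uses its s nearest
            # unconnected cities, so it suffices to scan prefixes of this order.
            near = sorted(unconnected, key=lambda j: (connect_cost[i][j], j))
            total = f
            for s, j in enumerate(near, start=1):
                total += connect_cost[i][j]
                ratio = total / s
                if best is None or ratio < best[0]:
                    best = (ratio, i, near[:s])
        _, i, star = best
        opened.add(i)
        cost[i] = 0                 # an open facility can serve more cities for free
        for j in star:
            assignment[j] = i
        unconnected -= set(star)

    total_cost = sum(open_cost[i] for i in opened)
    total_cost += sum(connect_cost[i][j] for j, i in assignment.items())
    return opened, assignment, total_cost


# Two facilities, three cities: facility 0 is near cities 0 and 1, facility 1 near city 2.
opened, assignment, total = greedy_facility_location(
    open_cost=[3, 3],
    connect_cost=[[1, 1, 5], [5, 5, 1]],
)
```

On this small instance the greedy opens both facilities, which is also the cheapest solution by inspection.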
Better Streaming Algorithms for Clustering Problems
 In Proc. of 35th ACM Symposium on Theory of Computing (STOC)
, 2003

Cited by 91 (1 self)
We study clustering problems in the streaming model, where the goal is to cluster a set of points by making one pass (or a few passes) over the data using a small amount of storage space. Our main result is a randomized algorithm for the k-Median problem which produces a constant factor approximation in one pass using storage space O(k polylog n). This is a significant improvement of the previous best algorithm, which yielded a 2^{O(1/ε)} approximation using O(n^ε) space.
Fairness Measures for Resource Allocation
 Proceedings of 41st IEEE Symposium on Foundations of Computer Science
, 2000

Cited by 40 (1 self)
In many optimization problems, one seeks to allocate a limited set of resources to a set of individuals with demands. Thus, such allocations can naturally be viewed as vectors, with one coordinate representing each individual. Motivated by work in network routing and bandwidth assignment, we consider the problem of producing solutions that simultaneously approximate all feasible allocations in a coordinate-wise sense. This is a very strong type of "global" approximation guarantee, and we explore its consequences in a range of discrete optimization problems, including facility location, scheduling, and bandwidth assignment in networks. A fundamental issue, one not encountered in the traditional design of approximation algorithms, is that good approximations in this global sense need not exist for every problem instance; there is no a priori reason why there should be an allocation that simultaneously approximates all others. As a result, the existential questions concerning such g...
Approximation algorithms for clustering uncertain data
 in PODS Conference
, 2008

Cited by 37 (1 self)
There is an increasing quantity of data with uncertainty arising from applications such as sensor network measurements, record linkage, and as output of mining algorithms. This uncertainty is typically formalized as probability density functions over tuple values. Beyond storing and processing such data in a DBMS, it is necessary to perform other data analysis tasks such as data mining. We study the core mining problem of clustering on uncertain data, and define appropriate natural generalizations of standard clustering optimization criteria. Two variations arise, depending on whether a point is automatically associated with its optimal center, or whether it must be assigned to a fixed cluster no matter where it is actually located. For uncertain versions of k-means and k-median, we show reductions to their corresponding weighted versions on data with no uncertainties. These are simple in the unassigned case, but require some care for the assigned version. Our most interesting results are for uncertain k-center, which generalizes both traditional k-center and k-median objectives. We show a variety of bicriteria approximation algorithms. One picks O(kε^{-1} log^2 n) centers and achieves a (1 + ε)-approximation to the best uncertain k-centers. Another picks 2k centers and achieves a constant factor approximation. Collectively, these results are the first known guaranteed approximation algorithms for the problems of clustering uncertain data.
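The difference between the two variations can be sketched for the k-median objective. The sketch assumes each uncertain point is a discrete distribution given as a list of (location, probability) pairs, and uses Euclidean distance; the function names are hypothetical and this is not the paper's code. In the unassigned version each realized location goes to its nearest center, so by linearity of expectation the cost equals a weighted k-median cost over all possible locations with the probabilities as weights. In the assigned version each uncertain point is bound to one center before its location is known.

```python
from math import dist  # Euclidean distance, Python 3.8+


def unassigned_cost(points, centers):
    """Expected k-median cost when every realized location uses its nearest center.

    Equivalent to a weighted k-median cost where each possible location is a
    deterministic point whose weight is its probability.
    """
    return sum(
        prob * min(dist(loc, c) for c in centers)
        for point in points
        for loc, prob in point
    )


def assigned_cost(points, centers):
    """Expected k-median cost when each uncertain point is fixed to a single center."""
    return sum(
        min(sum(prob * dist(loc, c) for loc, prob in point) for c in centers)
        for point in points
    )


# One uncertain point located at x = 0 or x = 10 with equal probability.
points = [[((0.0,), 0.5), ((10.0,), 0.5)]]
centers = [(0.0,), (10.0,)]

u = unassigned_cost(points, centers)
a = assigned_cost(points, centers)
```

Here the unassigned cost is 0, since each realization sits on a center, while the assigned cost is 5, since a fixed center is wrong half the time. In general the assigned cost is never smaller than the unassigned one.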
Shape Fitting with Outliers
 SIAM J. Comput
, 2003

Cited by 34 (10 self)
We present an algorithm that ε-approximates the extent between the top and bottom k levels of the arrangement of H in time O(n + (k/ε)^c), where c is a constant depending on d. The algorithm relies on computing a subset of H of size O(k/ε^{d-1}), in near linear time, such that the k-level of the arrangement of the subset approximates that of the original arrangement. Using this algorithm, we propose efficient approximation algorithms for shape fitting with outliers for various shapes. These are the first algorithms to handle outliers efficiently for the shape fitting problems considered.
Asymmetric k-center is log* n-hard to Approximate
 In Proc. SODA
, 2005

Cited by 34 (4 self)
In the Asymmetric k-Center problem, the input is an integer k and a complete digraph over n points together with a distance function obeying the directed triangle inequality. The goal is to choose a set of k points to serve as centers and to assign all the points to the centers, so that the maximum distance of any point to its center is as small as possible. We show that the Asymmetric k-Center problem is hard to approximate up to a factor of log* n − Θ(1) unless NP ⊆ DTIME(n^{log log n}). Since an O(log* n)-approximation algorithm is known for this problem, this essentially resolves the approximability of this problem. This is the first natural problem whose approximability threshold does not polynomially relate to the known approximation classes. We also resolve the approximability threshold of the metric k-Center problem with costs.
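The objective and the log* n threshold can both be made concrete with a short sketch. Two assumptions are made here: distances are measured from a center to the point it serves (the abstract does not fix the direction), and the iterated logarithm is taken base 2. The function names are hypothetical.

```python
from math import log2


def k_center_cost(d, centers):
    """Asymmetric k-center objective for a chosen set of centers.

    d[u][v] is the directed distance from u to v. Each point is served by the
    center that reaches it most cheaply (assumed direction: center -> point),
    and the cost is the largest such distance over all points.
    """
    n = len(d)
    return max(min(d[c][v] for c in centers) for v in range(n))


def log_star(n):
    """Iterated logarithm: how many times log2 must be applied until the value is <= 1."""
    count = 0
    while n > 1:
        n = log2(n)
        count += 1
    return count


# Shortest-path distances on the directed 3-cycle 0 -> 1 -> 2 -> 0 (unit edges),
# which satisfy the directed triangle inequality but are not symmetric.
d = [
    [0, 1, 2],
    [2, 0, 1],
    [1, 2, 0],
]

cost_one = k_center_cost(d, centers=[0])
cost_two = k_center_cost(d, centers=[0, 1])
```

With one center at point 0 the cost is 2, and adding point 1 as a second center lowers it to 1. The iterated logarithm grows extremely slowly: `log_star(16)` is 3 and `log_star(65536)` is 4, so a log* n hardness threshold sits just above constant-factor approximability.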