A Threshold of ln n for Approximating Set Cover
Journal of the ACM, 1998
Cited by 658 (5 self)
Given a collection F of subsets of S = {1, ..., n}, set cover is the problem of selecting as few subsets as possible from F such that their union covers S, and max k-cover is the problem of selecting k subsets from F such that their union has maximum cardinality. Both these problems are NP-hard. We prove that (1 - o(1)) ln n is a threshold below which set cover cannot be approximated efficiently, unless NP has slightly superpolynomial time algorithms. This closes the gap (up to low order terms) between the ratio of approximation achievable by the greedy algorithm (which is (1 - o(1)) ln n), and previous results of Lund and Yannakakis, that showed hardness of approximation within a ratio of (log_2 n)/2 ≈ 0.72 ln n. For max k-cover we show an approximation threshold of (1 - 1/e) (up to low order terms), under the assumption that P ≠ NP.
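The greedy algorithm mentioned in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper: the function name `greedy_set_cover` and the small instance `S`, `F` are assumptions made for the example.

```python
def greedy_set_cover(universe, subsets):
    """Return indices of subsets chosen greedily until `universe` is covered."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Pick the subset that covers the most still-uncovered elements.
        best = max(range(len(subsets)), key=lambda i: len(uncovered & subsets[i]))
        gained = uncovered & subsets[best]
        if not gained:
            raise ValueError("the subsets do not cover the universe")
        chosen.append(best)
        uncovered -= gained
    return chosen


# Hypothetical instance: S = {1, ..., 7} and a family F of five subsets.
S = set(range(1, 8))
F = [{1, 2, 3, 4}, {4, 5, 6}, {6, 7}, {1, 5}, {2, 7}]
cover = greedy_set_cover(S, F)
print(cover)
```

Ties are broken by the lowest index here; any tie-breaking rule gives a valid greedy run.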
A constant-factor approximation algorithm for the k-median problem
In Proceedings of the 31st Annual ACM Symposium on Theory of Computing, 1999
Cited by 220 (13 self)
We present the first constant-factor approximation algorithm for the metric k-median problem. The k-median problem is one of the most well-studied clustering problems, i.e., those problems in which the aim is to partition a given set of points into clusters so that the points within a cluster are relatively close with respect to some measure. For the metric k-median problem, we are given n points in a metric space. We select k of these to be cluster centers, and then assign each point to its closest selected center. If point j is assigned to a center i, the cost incurred is proportional to the distance between i and j. The goal is to select the k centers that minimize the sum of the assignment costs. We give a 6 2/3-approximation algorithm for this problem. This improves upon the best previously known result of O(log k log log k), which was obtained by refining and derandomizing a randomized O(log n log log n)-approximation algorithm of Bartal.
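The k-median objective described in this abstract can be written down directly. The sketch below only evaluates the objective and, for tiny inputs, finds an optimum by exhaustive search; it is not the approximation algorithm of the paper. The names `kmedian_cost` and `brute_force_kmedian`, the Euclidean metric, and the sample points are assumptions for illustration.

```python
import math
from itertools import combinations


def kmedian_cost(points, centers):
    """Sum, over all points, of the distance to the closest selected center."""
    return sum(min(math.dist(p, c) for c in centers) for p in points)


def brute_force_kmedian(points, k):
    """Try every k-subset of the points as centers (exponential; tiny inputs only)."""
    return min(combinations(points, k), key=lambda cs: kmedian_cost(points, cs))


# Hypothetical instance: four points in the Euclidean plane, k = 2.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 3.0)]
total = kmedian_cost(points, [(0.0, 0.0), (4.0, 0.0)])
best_centers = brute_force_kmedian(points, 2)
print(total, best_centers)
```

The exhaustive search examines all C(n, k) center sets, which is why approximation algorithms matter for realistic n and k.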
Improved Combinatorial Algorithms for the Facility Location and k-Median Problems
In Proceedings of the 40th Annual IEEE Symposium on Foundations of Computer Science, 1999
Cited by 209 (13 self)
We present improved combinatorial approximation algorithms for the uncapacitated facility location and k-median problems. Two central ideas in most of our results are cost scaling and greedy improvement. We present a simple greedy local search algorithm which achieves an approximation ratio of 2.414 + ε in Õ(n^2/ε) time. This also yields a bicriteria approximation tradeoff of (1 + γ, 1 + 2/γ) for facility cost versus service cost which is better than previously known tradeoffs and close to the best possible. Combining greedy improvement and cost scaling with a recent primal-dual algorithm for facility location due to Jain and Vazirani, we get an approximation ratio of 1.853 in Õ(n^3) time. This is already very close to the approximation guarantee of the best known algorithm, which is LP-based. Further, combined with the best known LP-based algorithm for facility location, we get a very slight improvement in the approximation factor for facility location, achieving 1.728.
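To make the idea of local search for uncapacitated facility location concrete, here is a generic hill-climbing sketch. It is not the paper's algorithm: it omits cost scaling and the paper's specific improvement rule, and no approximation ratio is claimed for it. The names `ufl_cost` and `local_search`, the starting solution, and the instance are assumptions; `dist[j][i]` is taken to be the distance from client j to facility i.

```python
def ufl_cost(open_set, facility_cost, dist):
    """Facility cost plus service cost (each client uses its nearest open facility)."""
    if not open_set:
        return float("inf")  # every client needs at least one open facility
    service = sum(min(row[i] for i in open_set) for row in dist)
    return sum(facility_cost[i] for i in open_set) + service


def local_search(facility_cost, dist):
    """Open or close one facility at a time while the total cost strictly drops."""
    current = {0}  # arbitrary starting solution (an assumption of this sketch)
    best = ufl_cost(current, facility_cost, dist)
    improved = True
    while improved:
        improved = False
        for i in range(len(facility_cost)):
            candidate = current ^ {i}  # open i if closed, close it if open
            cost = ufl_cost(candidate, facility_cost, dist)
            if cost < best:
                current, best, improved = candidate, cost, True
    return current, best


# Hypothetical instance: three facilities, four clients.
facility_cost = [3, 3, 10]
dist = [[1, 5, 2], [1, 5, 2], [6, 1, 2], [6, 1, 2]]
opened, total = local_search(facility_cost, dist)
print(opened, total)
```

The loop terminates because the cost strictly decreases with every accepted move and there are finitely many facility sets.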
Achieving Anonymity via Clustering
In PODS, 2006
Cited by 83 (2 self)
Publishing data for analysis from a table containing personal records, while maintaining individual privacy, is a problem of increasing importance today. The traditional approach of de-identifying records is to remove identifying fields such as social security number, name, etc. However, recent research has shown that a large fraction of the US population can be identified using non-key attributes (called quasi-identifiers) such as date of birth, gender, and zip code [15]. Sweeney [16] proposed the k-anonymity model for privacy where non-key attributes that leak information are suppressed or generalized so that, for every record in the modified table, there are at least k−1 other records having exactly the same values for quasi-identifiers. We propose a new method for anonymizing data records, where quasi-identifiers of data records are first clustered and then cluster centers are published. To ensure privacy of the data records, we impose the constraint
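The k-anonymity condition stated in this abstract (every record shares its quasi-identifier values with at least k−1 others) is easy to check mechanically. The sketch below covers only that check, not the clustering method the paper proposes; the function name `is_k_anonymous`, the column names, and the sample table are assumptions for illustration.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs at least k times."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(c >= k for c in counts.values())


# Hypothetical table whose quasi-identifiers have already been generalized.
records = [
    {"birth_year": "198*", "gender": "F", "zip": "940**", "diagnosis": "flu"},
    {"birth_year": "198*", "gender": "F", "zip": "940**", "diagnosis": "asthma"},
    {"birth_year": "197*", "gender": "M", "zip": "941**", "diagnosis": "flu"},
    {"birth_year": "197*", "gender": "M", "zip": "941**", "diagnosis": "diabetes"},
]
qi = ["birth_year", "gender", "zip"]
print(is_k_anonymous(records, qi, 2), is_k_anonymous(records, qi, 3))
```

Only the quasi-identifier columns enter the check; sensitive columns such as `diagnosis` are ignored.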
Better Streaming Algorithms for Clustering Problems
2003
Cited by 77 (1 self)
We study clustering problems in the streaming model, where the goal is to cluster a set of points by making one pass (or a few passes) over the data using a small amount of storage space. Our main result is a randomized algorithm for the k-median problem which produces a constant-factor approximation in one pass using storage space O(k polylog n). This is a significant improvement over the previous best algorithm, which yielded a 2^{O(1/ε)} approximation using O(n^ε) space. Next we give a streaming algorithm for the k-median problem with an arbitrary distance function. We also study algorithms for clustering problems with outliers in the streaming model. Here, we give bicriterion guarantees, producing constant-factor approximations by increasing the allowed fraction of outliers slightly.
Clustering to Minimize the Sum of Cluster Diameters
2001
Cited by 34 (2 self)
We study the problem of clustering points in a metric space so as to minimize the sum of cluster diameters or the sum of cluster radii. Significantly improving on previous results, we present
Computing Near-Optimal Solutions to Combinatorial Optimization Problems
In Combinatorial Optimization, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, 1995
Cited by 32 (0 self)
In the past few years, there has been significant progress in our understanding of the extent to which near-optimal solutions can be efficiently computed for NP-hard combinatorial optimization problems. This paper surveys these recent developments, while concentrating on the advances made in the design and analysis of approximation algorithms, and in particular, on those results that rely on linear programming and its generalizations.
Topology-Invariant Similarity of Nonrigid Shapes
2009
Cited by 21 (3 self)
This paper explores the problem of similarity criteria between nonrigid shapes. Broadly speaking, such criteria are divided into intrinsic and extrinsic, the first referring to the metric structure of the object and the latter to how it is laid out in the Euclidean space. Both criteria have their advantages and disadvantages: extrinsic similarity is sensitive to nonrigid deformations, while intrinsic similarity is sensitive to topological noise. In this paper, we approach the problem from the perspective of metric geometry. We show that by unifying the extrinsic and intrinsic similarity criteria, it is possible to obtain a stronger topology-invariant similarity, suitable for comparing deformed shapes with different topology. We construct this new joint criterion as a tradeoff between the extrinsic and intrinsic similarity and use it as a set-valued distance. Numerical results demonstrate the efficiency of our approach in cases where using either extrinsic or intrinsic criteria alone would fail.
Detecting Anomalous Access Patterns in Relational Databases
Cited by 17 (3 self)
A considerable effort has been recently devoted