Results 1-10 of 322
A Threshold of ln n for Approximating Set Cover
Journal of the ACM, 1998
Cited by 776 (5 self)
Given a collection $F$ of subsets of $S = \{1, \ldots, n\}$, set cover is the problem of selecting as few as possible subsets from $F$ such that their union covers $S$, and max $k$-cover is the problem of selecting $k$ subsets from $F$ such that their union has maximum cardinality. Both these problems are NP-hard. We prove that $(1 - o(1)) \ln n$ is a threshold below which set cover cannot be approximated efficiently, unless NP has slightly superpolynomial time algorithms. This closes the gap (up to low order terms) between the ratio of approximation achievable by the greedy algorithm (which is $(1 - o(1)) \ln n$), and previous results of Lund and Yannakakis, that showed hardness of approximation within a ratio of $(\log_2 n)/2 \approx 0.72 \ln n$. For max $k$-cover we show an approximation threshold of $(1 - 1/e)$ (up to low order terms), under the assumption that $P \neq NP$.
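As an illustration of the greedy algorithm this abstract refers to, here is a minimal Python sketch. The function name `greedy_set_cover` and the toy instance are assumptions made for this example and do not come from the paper; the rule shown (repeatedly pick the subset covering the most still-uncovered elements) is the standard greedy heuristic for set cover.

```python
def greedy_set_cover(universe, subsets):
    """Return indices of subsets chosen greedily so that their union covers `universe`.

    `subsets` is a list of Python sets (hypothetical input format for this sketch).
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Pick the subset that covers the most still-uncovered elements
        # (ties go to the lowest index, because max() keeps the first maximum).
        best = max(range(len(subsets)), key=lambda i: len(uncovered & subsets[i]))
        gain = uncovered & subsets[best]
        if not gain:
            raise ValueError("the given subsets do not cover the universe")
        chosen.append(best)
        uncovered -= gain
    return chosen


# Toy instance: S = {1, ..., 6} and four candidate subsets.
S = range(1, 7)
F = [{1, 2, 3, 4}, {1, 2, 5}, {3, 4, 6}, {5, 6}]
cover = greedy_set_cover(S, F)
print(cover)  # indices into F
```

On this toy instance the greedy choice happens to be optimal; in general it is only guaranteed to be within a logarithmic factor of the optimum, which is the ratio the abstract discusses.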
Automatic Subspace Clustering of High Dimensional Data
Data Mining and Knowledge Discovery, 2005
Cited by 724 (12 self)
Data mining applications place special requirements on clustering algorithms including: the ability to find clusters embedded in subspaces of high dimensional data, scalability, end-user comprehensibility of the results, non-presumption of any canonical data distribution, and insensitivity to the order of input records. We present CLIQUE, a clustering algorithm that satisfies each of these requirements. CLIQUE identifies dense clusters in subspaces of maximum dimensionality. It generates cluster descriptions in the form of DNF expressions that are minimized for ease of comprehension. It produces identical results irrespective of the order in which input records are presented and does not presume any specific mathematical form for data distribution. Through experiments, we show that CLIQUE efficiently finds accurate clusters in large high dimensional datasets.
Zero Knowledge and the Chromatic Number
Journal of Computer and System Sciences, 1996
Cited by 196 (6 self)
We present a new technique, inspired by zero-knowledge proof systems, for proving lower bounds on approximating the chromatic number of a graph. To illustrate this technique we present simple reductions from max-3-coloring and max-3-sat, showing that it is hard to approximate the chromatic number within $\Omega(N^{\delta})$, for some $\delta > 0$. We then apply our technique in conjunction with the probabilistically checkable proofs of Håstad, and show that it is hard to approximate the chromatic number to within $\Omega(N^{1-\epsilon})$ for any $\epsilon > 0$, assuming $NP \not\subseteq ZPP$. Here, ZPP denotes the class of languages decidable by a random expected polynomial-time algorithm that makes no errors. Our result matches (up to low order terms) the known gap for approximating the size of the largest independent set. Previous $O(N^{\delta})$ gaps for approximating the chromatic number (such as those by Lund and Yannakakis, and by Fürer) did not match the gap for independent set, and do not extend...
Learning in the Presence of Malicious Errors
SIAM Journal on Computing, 1993
Cited by 184 (12 self)
In this paper we study an extension of the distribution-free model of learning introduced by Valiant [23] (also known as the probably approximately correct or PAC model) that allows the presence of malicious errors in the examples given to a learning algorithm. Such errors are generated by an adversary with unbounded computational power and access to the entire history of the learning algorithm's computation. Thus, we study a worst-case model of errors. Our results include general methods for bounding the rate of error tolerable by any learning algorithm, efficient algorithms tolerating nontrivial rates of malicious errors, and equivalences between problems of learning with errors and standard combinatorial optimization problems.
1 Introduction
In this paper, we study a practical extension to Valiant's distribution-free model of learning: the presence of errors (possibly maliciously generated by an adversary) in the sample data. The distribution-free model typically makes the idealize...
Approximation Algorithms for Disjoint Paths Problems
1996
Cited by 166 (0 self)
The construction of disjoint paths in a network is a basic issue in combinatorial optimization: given a network, and specified pairs of nodes in it, we are interested in finding disjoint paths between as many of these pairs as possible. This leads to a variety of classical NP-complete problems for which very little is known from the point of view of approximation algorithms. It has recently been brought into focus in work on problems such as VLSI layout and routing in high-speed networks; in these settings, the current lack of understanding of the disjoint paths problem is often an obstacle to the design of practical heuristics.
Reachability and Distance Queries via 2-Hop Labels
2002
Cited by 148 (1 self)
Reachability and distance queries in graphs are fundamental to numerous applications, ranging from geographic navigation systems to Internet routing. Some of these applications involve huge graphs and yet require fast query answering. We propose a new data structure for representing all distances in a graph. The data structure is distributed in the sense that it may be viewed as assigning labels to the vertices, such that a query involving vertices u and v may be answered using only the labels of u and v.
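To make the idea of answering a query from two labels alone more concrete, here is a minimal Python sketch. It assumes that each vertex label maps a few "hub" vertices to the distance from the vertex to that hub, and that every shortest path passes through some hub shared by its two endpoints. The names `query_distance` and `labels`, and the small path graph, are hypothetical and chosen for this example only. How to construct small labels is the hard part and is not shown.

```python
def query_distance(label_u, label_v):
    """Distance between u and v computed from their labels only.

    Each label is a dict {hub: distance to hub}. The answer is the minimum,
    over hubs present in both labels, of d(u, hub) + d(hub, v).
    Returns float('inf') when the labels share no hub.
    """
    best = float("inf")
    for hub, d_uh in label_u.items():
        d_hv = label_v.get(hub)
        if d_hv is not None:
            best = min(best, d_uh + d_hv)
    return best


# Undirected unit-weight path a - b - c - d, with hand-built labels
# (an assumed labeling in which b and c act as hubs).
labels = {
    "a": {"a": 0, "b": 1},
    "b": {"b": 0},
    "c": {"b": 1, "c": 0},
    "d": {"b": 2, "c": 1, "d": 0},
}
print(query_distance(labels["a"], labels["d"]))
print(query_distance(labels["c"], labels["d"]))
```

The query touches only the two labels, never the graph itself, which is what makes the structure usable in a distributed setting.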
Greedy Facility Location Algorithms analyzed using Dual Fitting with Factor-Revealing LP
Journal of the ACM, 2001
Cited by 146 (12 self)
We present a natural greedy algorithm for the metric uncapacitated facility location problem and use the method of dual fitting to analyze its approximation ratio, which turns out to be 1.861. The running time of our algorithm is O(m log m), where m is the total number of edges in the underlying complete bipartite graph between cities and facilities. We use our algorithm to improve recent results for some variants of the problem, such as the fault tolerant and outlier versions. In addition, we introduce a new variant which can be seen as a special case of the concave cost version of this problem.
A nearly best-possible approximation algorithm for node-weighted Steiner trees
1993
Cited by 138 (9 self)
We give the first approximation algorithm for the node-weighted Steiner tree problem. Its performance guarantee is within a constant factor of the best possible unless $\tilde{P} \supseteq NP$. Our algorithm generalizes to handle other network design problems.
Constant-Time Distributed Dominating Set Approximation
In Proc. of the 22nd ACM Symposium on the Principles of Distributed Computing (PODC), 2003
Cited by 134 (22 self)
Finding a small dominating set is one of the most fundamental problems of traditional graph theory. In this paper, we present a new fully distributed approximation algorithm based on LP relaxation techniques. For an arbitrary parameter $k$ and maximum degree $\Delta$, our algorithm computes a dominating set of expected size $O(k \Delta^{2/k} \log \Delta \cdot |DS_{OPT}|)$ in $O(k^2)$ rounds where each node has to send $O(k^2 \Delta)$ messages of size $O(\log \Delta)$. This is the first algorithm which achieves a nontrivial approximation ratio in a constant number of rounds.
A Tight Analysis of the Greedy Algorithm for Set Cover
1995
Cited by 122 (0 self)
We establish significantly improved bounds on the performance of the greedy algorithm for approximating set cover. In particular, we provide the first substantial improvement of the 20-year-old classical harmonic upper bound, $H(m)$, of Johnson, Lovász, and Chvátal, by showing that the performance ratio of the greedy algorithm is, in fact, exactly $\ln m - \ln \ln m + \Theta(1)$, where $m$ is the size of the ground set. The difference between the upper and lower bounds turns out to be less than 1.1. This provides the first tight analysis of the greedy algorithm, as well as the first upper bound that lies below $H(m)$ by a function going to infinity with $m$. We also show that the approximation guarantee for the greedy algorithm is better than the guarantee recently established by Srinivasan for the randomized rounding technique, thus improving the bounds on the integrality gap. Our improvements result from a new approach which might be generally useful for attacking other similar problems. ...
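To get a feel for how far the leading terms of the tight bound sit below the harmonic bound, here is a small numeric sketch in Python. The function names are assumptions made for this example, and the additive constant term of the tight bound is deliberately omitted, so the numbers show only the trend and not the exact performance ratio.

```python
import math


def harmonic(m):
    """H(m) = 1 + 1/2 + ... + 1/m, the classical harmonic upper bound."""
    return sum(1.0 / i for i in range(1, m + 1))


def tight_bound_leading_terms(m):
    """ln m - ln ln m, the leading terms of the tight bound (constant term omitted)."""
    return math.log(m) - math.log(math.log(m))


for m in (10, 1000, 100000):
    gap = harmonic(m) - tight_bound_leading_terms(m)
    print(f"m={m:>6}  H(m)={harmonic(m):.3f}  "
          f"ln m - ln ln m={tight_bound_leading_terms(m):.3f}  gap={gap:.3f}")
```

The printed gap keeps growing with $m$ (roughly like $\ln \ln m$), in line with the abstract's remark that the new upper bound lies below $H(m)$ by a function going to infinity.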