Automatic Subspace Clustering of High Dimensional Data
 Data Mining and Knowledge Discovery
, 2005
Cited by 564 (12 self)
Data mining applications place special requirements on clustering algorithms, including: the ability to find clusters embedded in subspaces of high-dimensional data, scalability, end-user comprehensibility of the results, non-presumption of any canonical data distribution, and insensitivity to the order of input records. We present CLIQUE, a clustering algorithm that satisfies each of these requirements. CLIQUE identifies dense clusters in subspaces of maximum dimensionality. It generates cluster descriptions in the form of DNF expressions that are minimized for ease of comprehension. It produces identical results irrespective of the order in which input records are presented and does not presume any specific mathematical form for data distribution. Through experiments, we show that CLIQUE efficiently finds accurate clusters in large high-dimensional datasets.
Zero Knowledge and the Chromatic Number
 Journal of Computer and System Sciences
, 1996
Cited by 177 (8 self)
We present a new technique, inspired by zero-knowledge proof systems, for proving lower bounds on approximating the chromatic number of a graph. To illustrate this technique we present simple reductions from max-3-coloring and max-3-sat, showing that it is hard to approximate the chromatic number within $\Omega(N^{\delta})$, for some $\delta > 0$. We then apply our technique in conjunction with the probabilistically checkable proofs of Håstad, and show that it is hard to approximate the chromatic number to within $\Omega(N^{1-\epsilon})$ for any $\epsilon > 0$, assuming $NP \not\subseteq ZPP$. Here, ZPP denotes the class of languages decidable by a random expected polynomial-time algorithm that makes no errors. Our result matches (up to low order terms) the known gap for approximating the size of the largest independent set. Previous $O(N^{\delta})$ gaps for approximating the chromatic number (such as those by Lund and Yannakakis, and by Fürer) did not match the gap for independent set, and do not extend...
Efficient probabilistically checkable proofs and applications to approximation
 In Proceedings of STOC93
, 1993
Learning in the Presence of Malicious Errors
 SIAM Journal on Computing
, 1993
Cited by 167 (12 self)
In this paper we study an extension of the distribution-free model of learning introduced by Valiant [23] (also known as the probably approximately correct or PAC model) that allows the presence of malicious errors in the examples given to a learning algorithm. Such errors are generated by an adversary with unbounded computational power and access to the entire history of the learning algorithm's computation. Thus, we study a worst-case model of errors. Our results include general methods for bounding the rate of error tolerable by any learning algorithm, efficient algorithms tolerating nontrivial rates of malicious errors, and equivalences between problems of learning with errors and standard combinatorial optimization problems.
1 Introduction
In this paper, we study a practical extension to Valiant's distribution-free model of learning: the presence of errors (possibly maliciously generated by an adversary) in the sample data. The distribution-free model typically makes the idealize...
Approximation Algorithms for Disjoint Paths Problems
, 1996
Cited by 139 (0 self)
The construction of disjoint paths in a network is a basic issue in combinatorial optimization: given a network, and specified pairs of nodes in it, we are interested in finding disjoint paths between as many of these pairs as possible. This leads to a variety of classical NP-complete problems for which very little is known from the point of view of approximation algorithms. It has recently been brought into focus in work on problems such as VLSI layout and routing in high-speed networks; in these settings, the current lack of understanding of the disjoint paths problem is often an obstacle to the design of practical heuristics.
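To make the problem statement above concrete, here is a minimal Python sketch of one simple heuristic: route the pairs one at a time along a shortest path in the remaining graph and delete the edges each path uses. This is only an illustration of the edge-disjoint variant on an unweighted, undirected graph; it is not claimed to be the algorithm analyzed in the paper, and the function names and the small example graph are hypothetical.

```python
from collections import deque


def bfs_path(adj, s, t):
    """Return a shortest s-t path (fewest edges) in adj, or None if t is unreachable."""
    parent = {s: None}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return None


def greedy_edge_disjoint_paths(edges, pairs):
    """Greedy heuristic (an assumption for illustration): route pairs in the
    given order and remove every edge a routed path uses."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    routed = {}
    for s, t in pairs:
        if s not in adj or t not in adj:
            continue
        path = bfs_path(adj, s, t)
        if path is None:
            continue  # this pair cannot be connected with the edges that remain
        routed[(s, t)] = path
        for a, b in zip(path, path[1:]):
            adj[a].discard(b)
            adj[b].discard(a)
    return routed


# Hypothetical example network and terminal pairs.
edges = [(1, 2), (2, 3), (3, 4), (1, 5), (5, 4)]
pairs = [(1, 4), (2, 3)]
routed = greedy_edge_disjoint_paths(edges, pairs)
print(len(routed), "of", len(pairs), "pairs routed")
```

The result of such a heuristic depends on the order in which pairs are processed, which is one reason the approximation question raised in the abstract is non-trivial.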
Compact Routing with Minimum Stretch
 Journal of Algorithms
Cited by 111 (5 self)
We present the first universal compact routing algorithm with maximum stretch bounded by 3 that uses sublinear space at every vertex. The algorithm uses local routing tables of size $O(n^{2/3} \log^{4/3} n)$ and achieves paths that are at most 3 times the length of the shortest path distances for all nodes in an arbitrary weighted undirected network. This answers an open question of Gavoille and Gengler, who showed that any universal compact routing algorithm with maximum stretch strictly less than 3 must use $\Omega(n)$ local space at some vertex.
1 Introduction
Let $G = (V, E)$ with $|V| = n$ be a labeled undirected network. Assuming that a positive cost, or distance, is assigned to each edge, the stretch of a path $p(u, v)$ from node $u$ to node $v$ is defined as $|p(u, v)| / |d(u, v)|$, where $|d(u, v)|$ is the length of the shortest $u$-$v$ path. The approximate all-pairs shortest path problem involves a trade-off of stretch against time: short paths with stretch bounded by a constant are com...
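The stretch definition quoted above (routed path length divided by shortest-path length) can be checked numerically. The following Python sketch computes it with Dijkstra's algorithm on a weighted undirected graph stored as a dictionary of dictionaries; the graph, the routed path, and all function names are hypothetical and chosen only for illustration, and the sketch says nothing about how the paper's routing tables are built.

```python
import heapq


def shortest_distances(graph, source):
    """Dijkstra's algorithm; graph maps node -> {neighbor: positive edge weight}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def path_length(graph, path):
    """Sum of edge weights along consecutive nodes of path."""
    return sum(graph[a][b] for a, b in zip(path, path[1:]))


def stretch(graph, path):
    """Length of the given u-v path divided by the shortest u-v distance."""
    u, v = path[0], path[-1]
    return path_length(graph, path) / shortest_distances(graph, u)[v]


# Hypothetical 4-node weighted undirected network.
graph = {
    "a": {"b": 1, "c": 4},
    "b": {"a": 1, "c": 1, "d": 5},
    "c": {"a": 4, "b": 1, "d": 1},
    "d": {"b": 5, "c": 1},
}
routed_path = ["a", "c", "d"]  # length 4 + 1 = 5; the shortest a-d path a-b-c-d has length 3
print(stretch(graph, routed_path))
```

In this example the routed path has stretch 5/3, which is within the bound of 3 discussed in the abstract.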
Constant-Time Distributed Dominating Set Approximation
 In Proc. of the 22nd ACM Symposium on the Principles of Distributed Computing (PODC)
, 2003
Cited by 111 (24 self)
Finding a small dominating set is one of the most fundamental problems of traditional graph theory. In this paper, we present a new fully distributed approximation algorithm based on LP relaxation techniques. For an arbitrary parameter $k$ and maximum degree $\Delta$, our algorithm computes a dominating set of expected size $O(k \Delta^{2/k} \log \Delta \cdot |DS_{OPT}|)$ in $O(k^2)$ rounds, where each node has to send $O(k^2 \Delta)$ messages of size $O(\log \Delta)$. This is the first algorithm which achieves a nontrivial approximation ratio in a constant number of rounds.
Greedy Facility Location Algorithms Analyzed Using Dual Fitting with Factor-Revealing LP
 Journal of the ACM
, 2001
Cited by 101 (13 self)
We present a natural greedy algorithm for the metric uncapacitated facility location problem and use the method of dual fitting to analyze its approximation ratio, which turns out to be 1.861. The running time of our algorithm is O(m log m), where m is the total number of edges in the underlying complete bipartite graph between cities and facilities. We use our algorithm to improve recent results for some variants of the problem, such as the fault tolerant and outlier versions. In addition, we introduce a new variant which can be seen as a special case of the concave cost version of this problem.