Automatic Subspace Clustering of High Dimensional Data
- Data Mining and Knowledge Discovery, 2005
Cited by 461 (11 self)
Data mining applications place special requirements on clustering algorithms including: the ability to find clusters embedded in subspaces of high dimensional data, scalability, end-user comprehensibility of the results, non-presumption of any canonical data distribution, and insensitivity to the order of input records. We present CLIQUE, a clustering algorithm that satisfies each of these requirements. CLIQUE identifies dense clusters in subspaces of maximum dimensionality. It generates cluster descriptions in the form of DNF expressions that are minimized for ease of comprehension. It produces identical results irrespective of the order in which input records are presented and does not presume any specific mathematical form for data distribution. Through experiments, we show that CLIQUE efficiently finds accurate clusters in large high dimensional datasets.
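The abstract does not say how dense subspaces are located. In the CLIQUE approach, each dimension is split into equal-width intervals, a grid unit is dense when the fraction of points inside it exceeds a threshold, and dense units are grown bottom-up one dimension at a time, using the fact that every lower-dimensional projection of a dense unit must itself be dense. The Python below is a minimal sketch of that dense-unit search only. It assumes data normalized to [0, 1); the function name and the parameters `xi` (intervals per dimension) and `tau` (density threshold) are illustrative, and the later steps (joining dense units into clusters and minimizing the DNF descriptions) are omitted.

```python
from collections import Counter
from itertools import combinations


def dense_units(points, xi, tau):
    """Bottom-up search for dense grid units (illustrative sketch).

    points: list of equal-length tuples, every coordinate assumed in [0, 1)
    xi:     number of equal-width intervals per dimension (assumed parameter)
    tau:    density threshold as a fraction of all points (assumed parameter)
    Returns {k: set of dense k-dimensional units}, where a unit is a
    frozenset of (dimension, interval_index) pairs.
    """
    n, d = len(points), len(points[0])
    # Interval index of each point in every dimension.
    cells = [tuple(min(int(x * xi), xi - 1) for x in p) for p in points]

    def support(unit):
        # Number of points lying inside every interval of the unit.
        return sum(all(c[dim] == idx for dim, idx in unit) for c in cells)

    # Dense 1-dimensional units.
    counts = Counter((dim, c[dim]) for c in cells for dim in range(d))
    current = {frozenset([u]) for u, cnt in counts.items() if cnt / n > tau}
    result, k = {}, 1
    while current:
        result[k] = current
        candidates = set()
        for a, b in combinations(current, 2):
            u = a | b
            # Join two k-dim units into a (k+1)-dim unit on distinct dimensions.
            if len(u) == k + 1 and len({dim for dim, _ in u}) == k + 1:
                # Prune: every k-dimensional projection must already be dense.
                if all(u - {pair} in current for pair in u):
                    candidates.add(u)
        current = {u for u in candidates if support(u) / n > tau}
        k += 1
    return result
```

Because the search only counts points per grid cell, shuffling the input records leaves the output unchanged, which is consistent with the order-insensitivity claimed in the abstract.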
Efficient probabilistically checkable proofs and applications to approximation
- In Proceedings of STOC '93, 1993
Learning in the Presence of Malicious Errors
- SIAM Journal on Computing, 1993
Cited by 158 (12 self)
In this paper we study an extension of the distribution-free model of learning introduced by Valiant [23] (also known as the probably approximately correct or PAC model) that allows the presence of malicious errors in the examples given to a learning algorithm. Such errors are generated by an adversary with unbounded computational power and access to the entire history of the learning algorithm's computation. Thus, we study a worst-case model of errors. Our results include general methods for bounding the rate of error tolerable by any learning algorithm, efficient algorithms tolerating nontrivial rates of malicious errors, and equivalences between problems of learning with errors and standard combinatorial optimization problems.
1 Introduction
In this paper, we study a practical extension to Valiant's distribution-free model of learning: the presence of errors (possibly maliciously generated by an adversary) in the sample data. The distribution-free model typically makes the idealize...
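To make the error model concrete, the sketch below simulates an example oracle with malicious error rate `beta`: with probability 1 - beta it returns a point drawn from the distribution and labeled correctly by the target concept, and with probability beta it lets an adversary, which sees the full history of examples so far, return any labeled point it likes. The names `draw_examples`, `sample_x`, and `adversary` are hypothetical and not taken from the paper; this only illustrates the setting, not any of the paper's algorithms or bounds.

```python
import random
from typing import List, Tuple

Example = Tuple[float, bool]


def draw_examples(m, target, sample_x, adversary, beta, seed=0):
    """Draw m examples from a malicious-error oracle (illustrative sketch).

    target:    the unknown concept, x -> bool
    sample_x:  draws x from the (unknown) distribution D using the given rng
    adversary: receives the history so far and returns any (x, label) pair
    beta:      probability that a given example is handed to the adversary
    """
    rng = random.Random(seed)
    history: List[Example] = []
    for _ in range(m):
        if rng.random() < beta:
            # Malicious example: arbitrary point and arbitrary label,
            # chosen with full knowledge of everything seen so far.
            ex = adversary(history)
        else:
            # Ordinary PAC example: x ~ D, labeled correctly by the target.
            x = sample_x(rng)
            ex = (x, target(x))
        history.append(ex)
    return history
```

For example, with `target = lambda x: x > 0.5` and an adversary that always returns `(0.9, False)`, roughly a `beta` fraction of the drawn examples carry a wrong label, and a learner must cope with them without knowing which ones they are.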
Zero Knowledge and the Chromatic Number
- Journal of Computer and System Sciences, 1996
Cited by 152 (7 self)
We present a new technique, inspired by zero-knowledge proof systems, for proving lower bounds on approximating the chromatic number of a graph. To illustrate this technique we present simple reductions from max-3-coloring and max-3-sat, showing that it is hard to approximate the chromatic number within $\Omega(N^{\delta})$, for some $\delta > 0$. We then apply our technique in conjunction with the probabilistically checkable proofs of Håstad, and show that it is hard to approximate the chromatic number to within $\Omega(N^{1-\epsilon})$ for any $\epsilon > 0$, assuming $\mathrm{NP} \not\subseteq \mathrm{ZPP}$. Here, ZPP denotes the class of languages decidable by a random expected polynomial-time algorithm that makes no errors. Our result matches (up to low order terms) the known gap for approximating the size of the largest independent set. Previous $O(N^{\delta})$ gaps for approximating the chromatic number (such as those by Lund and Yannakakis, and by Fürer) did not match the gap for independent set, and do not extend...
Approximation Algorithms for Disjoint Paths Problems
- 1996
Cited by 122 (0 self)
The construction of disjoint paths in a network is a basic issue in combinatorial optimization: given a network, and specified pairs of nodes in it, we are interested in finding disjoint paths between as many of these pairs as possible. This leads to a variety of classical NP-complete problems for which very little is known from the point of view of approximation algorithms. The disjoint paths problem has recently been brought into focus in work on problems such as VLSI layout and routing in high-speed networks; in these settings, the current lack of understanding of the disjoint paths problem is often an obstacle to the design of practical heuristics.
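As a concrete picture of the problem, here is a simple greedy heuristic for the edge-disjoint variant: process the pairs in a fixed order, route each one along a fewest-edge path in the graph of still-unused edges, and delete the edges it uses. This is an illustrative baseline under the assumption of an undirected graph and edge-disjointness; it is not presented as the approximation algorithms of the work above, and the function name is hypothetical.

```python
from collections import deque


def greedy_edge_disjoint_paths(edges, pairs):
    """Route (s, t) pairs greedily on edge-disjoint paths (illustrative sketch).

    edges: list of undirected edges (u, v)
    pairs: list of (s, t) terminal pairs, processed in the given order
    Returns {pair index: list of nodes on its path} for the routed pairs.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    routed = {}
    for i, (s, t) in enumerate(pairs):
        if s not in adj or t not in adj:
            continue
        # Breadth-first search from s over edges that are still unused.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for w in adj[u]:
                if w not in parent:
                    parent[w] = u
                    queue.append(w)
        if t not in parent:
            continue  # this pair can no longer be connected
        # Recover the s-t path and remove its edges from the graph.
        path = [t]
        while parent[path[-1]] is not None:
            path.append(parent[path[-1]])
        path.reverse()
        for a, b in zip(path, path[1:]):
            adj[a].discard(b)
            adj[b].discard(a)
        routed[i] = path
    return routed
```

On a 4-cycle with pairs `(0, 2)`, `(1, 3)`, `(0, 2)`, the first pair consumes half of the cycle, which disconnects the second pair, while the third pair still fits on the other half. Small cases like this show why the order and path choices of a greedy rule matter for how many pairs get routed.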
Constant-Time Distributed Dominating Set Approximation
- In Proc. of the 22nd ACM Symposium on the Principles of Distributed Computing (PODC), 2003
Cited by 100 (23 self)
Finding a small dominating set is one of the most fundamental problems of traditional graph theory. In this paper, we present a new fully distributed approximation algorithm based on LP relaxation techniques. For an arbitrary parameter $k$ and maximum degree $\Delta$, our algorithm computes a dominating set of expected size $O(k\Delta^{2/k}\log\Delta \cdot |DS_{OPT}|)$ in $O(k^2)$ rounds, where each node has to send $O(k^2)$ messages of size $O(\log\Delta)$. This is the first algorithm which achieves a non-trivial approximation ratio in a constant number of rounds.
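For reference, the standard LP relaxation of minimum dominating set is shown below, with one variable $x_v$ per node and $N[v]$ denoting the closed neighborhood of $v$ (the node together with its neighbors). The abstract only says the algorithm is "based on LP relaxation techniques", so treat this as the usual textbook formulation rather than a statement of the exact program the paper solves.

```latex
\begin{aligned}
\text{minimize}\quad   & \sum_{v \in V} x_v \\
\text{subject to}\quad & \sum_{u \in N[v]} x_u \ge 1 \qquad \text{for all } v \in V, \\
                       & x_v \ge 0 \qquad \text{for all } v \in V.
\end{aligned}
```

Restricting $x_v \in \{0, 1\}$ recovers the minimum dominating set exactly, since each node must then be selected or have a selected neighbor; the LP optimum is therefore a lower bound on $|DS_{OPT}|$, and a rounding step turns the fractional solution into an actual dominating set.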
Compact Routing with Minimum Stretch
- Journal of Algorithms
Cited by 90 (5 self)
We present the first universal compact routing algorithm with maximum stretch bounded by 3 that uses sublinear space at every vertex. The algorithm uses local routing tables of size $O(n^{2/3} \log^{4/3} n)$ and achieves paths that are at most 3 times the length of the shortest path distances for all nodes in an arbitrary weighted undirected network. This answers an open question of Gavoille and Gengler, who showed that any universal compact routing algorithm with maximum stretch strictly less than 3 must use $\Omega(n)$ local space at some vertex.
1 Introduction
Let $G = (V, E)$ with $|V| = n$ be a labeled undirected network. Assuming that a positive cost, or distance, is assigned to each edge, the stretch of path $p(u, v)$ from node $u$ to node $v$ is defined as $|p(u,v)| / |d(u,v)|$, where $|d(u, v)|$ is the length of the shortest $u$-$v$ path. The approximate all-pairs shortest path problem involves a tradeoff of stretch against time: short paths with stretch bounded by a constant are com...
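The stretch definition above can be checked directly: compute the length of the route a scheme actually takes, and divide by the shortest-path distance obtained with Dijkstra's algorithm. The sketch below assumes a simple undirected graph stored as an adjacency dict with both edge directions listed; the helper names are hypothetical, and it only measures stretch, it does not build the paper's routing tables.

```python
import heapq


def dijkstra(adj, src):
    """Shortest-path distances from src; adj maps node -> list of (nbr, cost)."""
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry, a shorter distance was already found
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def stretch(adj, path):
    """Stretch |p(u,v)| / |d(u,v)| of a route given as a node list (u != v)."""
    cost = {(u, v): w for u in adj for v, w in adj[u]}
    length = sum(cost[(a, b)] for a, b in zip(path, path[1:]))
    return length / dijkstra(adj, path[0])[path[-1]]
```

In a triangle with edge costs a-b = 1, b-c = 1, and a-c = 3, the direct route `["a", "c"]` has stretch 3/2, while `["a", "b", "c"]` has stretch 1. A scheme with maximum stretch 3, as claimed above, keeps this ratio at most 3 for every source and destination pair.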
Greedy Facility Location Algorithms analyzed using Dual Fitting with Factor-Revealing LP
- Journal of the ACM, 2001
Cited by 83 (12 self)
We present a natural greedy algorithm for the metric uncapacitated facility location problem and use the method of dual fitting to analyze its approximation ratio, which turns out to be 1.861. The running time of our algorithm is O(m log m), where m is the total number of edges in the underlying complete bipartite graph between cities and facilities. We use our algorithm to improve recent results for some variants of the problem, such as the fault-tolerant and outlier versions. In addition, we introduce a new variant which can be seen as a special case of the concave cost version of this problem.
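A natural greedy rule for this problem works with "stars": a facility together with a set of still-unconnected cities. Repeatedly pick the star with the smallest cost per city, (opening cost + connection costs) / (number of cities), open that facility, connect those cities, and then treat the facility's opening cost as 0 so later stars can reuse it for free. The Python below is a brute-force sketch of that rule, written for illustration; it is not claimed to be the paper's exact algorithm, it carries no approximation guarantee of its own, and its naive star search does not achieve the O(m log m) running time stated above. The function name and input layout (`f[i]` opening costs, `c[i][j]` connection costs) are assumptions.

```python
def greedy_facility_location(f, c):
    """Star-greedy sketch for uncapacitated facility location.

    f: f[i] is the opening cost of facility i
    c: c[i][j] is the cost of connecting city j to facility i
    Returns (opened facilities, {city: facility}, total cost).
    """
    f = list(f)  # local copy: an opened facility's cost is reset to 0
    unconnected = set(range(len(c[0])))
    opened, assign, total = set(), {}, 0.0
    while unconnected:
        best = None  # (cost per city, facility, cities in the star)
        for i in range(len(f)):
            # For a fixed facility, the cheapest star of size k uses
            # its k nearest unconnected cities.
            near = sorted(unconnected, key=c[i].__getitem__)
            cost = f[i]
            for k, j in enumerate(near, start=1):
                cost += c[i][j]
                if best is None or cost / k < best[0]:
                    best = (cost / k, i, near[:k])
        _, i, star = best
        total += f[i]  # 0 if facility i was opened earlier
        opened.add(i)
        f[i] = 0
        for j in star:
            assign[j] = i
            total += c[i][j]
            unconnected.discard(j)
    return opened, assign, total
```

With two facilities of opening cost 3, where cities 0 and 1 are cheap to serve from facility 0 and city 2 is cheap to serve from facility 1, the rule first opens facility 0 for cities 0 and 1 (cost 2.5 per city), then opens facility 1 for city 2. Resetting the opening cost to 0 is what lets later iterations attach more cities to an already opened facility without paying for it twice.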

