Results 1  10
of
57
Approximation Algorithms for Projective Clustering
 Proceedings of the ACM SIGMOD International Conference on Management of data, Philadelphia
, 2000
"... We consider the following two instances of the projective clustering problem: Given a set S of n points in R d and an integer k ? 0; cover S by k hyperstrips (resp. hypercylinders) so that the maximum width of a hyperstrip (resp., the maximum diameter of a hypercylinder) is minimized. Let w ..."
Abstract

Cited by 251 (21 self)
 Add to MetaCart
We consider the following two instances of the projective clustering problem: Given a set S of n points in R d and an integer k ? 0; cover S by k hyperstrips (resp. hypercylinders) so that the maximum width of a hyperstrip (resp., the maximum diameter of a hypercylinder) is minimized. Let w be the smallest value so that S can be covered by k hyperstrips (resp. hypercylinders), each of width (resp. diameter) at most w : In the plane, the two problems are equivalent. It is NPHard to compute k planar strips of width even at most Cw ; for any constant C ? 0 [50]. This paper contains four main results related to projective clustering: (i) For d = 2, we present a randomized algorithm that computes O(k log k) strips of width at most 6w that cover S. Its expected running time is O(nk 2 log 4 n) if k 2 log k n; it also works for larger values of k, but then the expected running time is O(n 2=3 k 8=3 log 4 n). We also propose another algorithm that computes a c...
A Minmax Cut Algorithm for Graph Partitioning and Data Clustering
, 2001
"... An important application of graph partitioning is data clustering using a graph model  the pairwise similarities between all data objects form a weighted graph adjacency matrix that contains all necessary information for clustering. Here we propose a new algorithm for graph partition with an object ..."
Abstract

Cited by 158 (12 self)
 Add to MetaCart
An important application of graph partitioning is data clustering using a graph model  the pairwise similarities between all data objects form a weighted graph adjacency matrix that contains all necessary information for clustering. Here we propose a new algorithm for graph partition with an objective function that follows the minmax clustering principle. The relaxed version of the optimization of the minmax cut objective function leads to the Fiedler vector in spectral graph partition. Theoretical analyses of minmax cut indicate that it leads to balanced partitions, and lower bonds are derived. The minmax cut algorithm is tested on newsgroup datasets and is found to outperform other current popular partitioning/clustering methods. The linkagebased re nements in the algorithm further improve the quality of clustering substantially. We also demonstrate that the linearized search order based on linkage di erential is better than that based on the Fiedler vector, providing another e ective partition method.
Approximate clustering via coresets
 In Proc. 34th Annu. ACM Sympos. Theory Comput
, 2002
"... In this paper, we show that for several clustering problems one can extract a small set of points, so that using those coresets enable us to perform approximate clustering efficiently. The surprising property of those coresets is that their size is independent of the dimension. Using those, we pre ..."
Abstract

Cited by 112 (14 self)
 Add to MetaCart
In this paper, we show that for several clustering problems one can extract a small set of points, so that using those coresets enable us to perform approximate clustering efficiently. The surprising property of those coresets is that their size is independent of the dimension. Using those, we present a ¡ 1 ¢ ε £approximation algorithms for the kcenter clustering and kmedian clustering problems in Euclidean space. The running time of the new algorithms has linear or near linear dependency on the number of points and the dimension, and exponential dependency on 1 ¤ ε and k. As such, our results are a substantial improvement over what was previously known. We also present some other clustering results including ¡ 1 ¢ ε £approximate 1cylinder clustering, and kcenter clustering with outliers. 1
A local search approximation algorithm for kmeans clustering
, 2004
"... In kmeans clustering we are given a set of n data points in ddimensional space ℜd and an integer k, and the problem is to determine a set of k points in ℜd, called centers, to minimize the mean squared distance from each data point to its nearest center. No exact polynomialtime algorithms are kno ..."
Abstract

Cited by 72 (1 self)
 Add to MetaCart
In kmeans clustering we are given a set of n data points in ddimensional space ℜd and an integer k, and the problem is to determine a set of k points in ℜd, called centers, to minimize the mean squared distance from each data point to its nearest center. No exact polynomialtime algorithms are known for this problem. Although asymptotically efficient approximation algorithms exist, these algorithms are not practical due to the very high constant factors involved. There are many heuristics that are used in practice, but we know of no bounds on their performance. We consider the question of whether there exists a simple and practical approximation algorithm for kmeans clustering. We present a local improvement heuristic based on swapping centers in and out. We prove that this yields a (9 + ε)approximation algorithm. We present an example showing that any approach based on performing a fixed number of swaps achieves an approximation factor of at least (9 − ε) in all sufficiently high dimensions. Thus, our approximation factor is almost tight for algorithms based on performing a fixed number of swaps. To establish the practical value of the heuristic, we present an empirical study that shows that, when combined with
Geometric approximation via coresets
 COMBINATORIAL AND COMPUTATIONAL GEOMETRY, MSRI
, 2005
"... The paradigm of coresets has recently emerged as a powerful tool for efficiently approximating various extent measures of a point set P. Using this paradigm, one quickly computes a small subset Q of P, called a coreset, that approximates the original set P and and then solves the problem on Q usin ..."
Abstract

Cited by 62 (8 self)
 Add to MetaCart
The paradigm of coresets has recently emerged as a powerful tool for efficiently approximating various extent measures of a point set P. Using this paradigm, one quickly computes a small subset Q of P, called a coreset, that approximates the original set P and and then solves the problem on Q using a relatively inefficient algorithm. The solution for Q is then translated to an approximate solution to the original point set P. This paper describes the ways in which this paradigm has been successfully applied to various optimization and extent measure problems.
Polynomial Time Approximation Schemes for Geometric kClustering
 J. OF THE ACM
, 2001
"... The JohnsonLindenstrauss lemma states that n points in a high dimensional Hilbert space can be embedded with small distortion of the distances into an O(log n) dimensional space by applying a random linear transformation. We show that similar (though weaker) properties hold for certain random linea ..."
Abstract

Cited by 31 (5 self)
 Add to MetaCart
The JohnsonLindenstrauss lemma states that n points in a high dimensional Hilbert space can be embedded with small distortion of the distances into an O(log n) dimensional space by applying a random linear transformation. We show that similar (though weaker) properties hold for certain random linear transformations over the Hamming cube. We use these transformations to solve NPhard clustering problems in the cube as well as in geometric settings. More specifically, we address the following clustering problem. Given n points in a larger set (for example, R^d) endowed with a distance function (for example, L² distance), we would like to partition the data set into k disjoint clusters, each with a "cluster center", so as to minimize the sum over all data points of the distance between the point and the center of the cluster containing the point. The problem is provably NPhard in some high dimensional geometric settings, even for k = 2. We give polynomial time approximation schemes for this problem in several settings, including the binary cube {0, 1}^d with Hamming distance, and R^d either with L¹ distance, or with L² distance, or with the square of L&sup2; distance. In all these settings, the best previous results were constant factor approximation guarantees. We note that our problem is similar in flavor to the kmedian problem (and the related facility location problem), which has been considered in graphtheoretic and fixed dimensional geometric settings, where it becomes hard when k is part of the input. In contrast, we study the problem when k is fixed, but the dimension is part of the input.
Fixedparameter algorithms for the (k, r)center in planar graphs and map graphs
 ACM TRANSACTIONS ON ALGORITHMS
, 2003
"... ..."
Parameterized complexity and approximation algorithms
 Comput. J
, 2006
"... Approximation algorithms and parameterized complexity are usually considered to be two separate ways of dealing with hard algorithmic problems. In this paper, our aim is to investigate how these two fields can be combined to achieve better algorithms than what any of the two theories could offer. We ..."
Abstract

Cited by 26 (1 self)
 Add to MetaCart
Approximation algorithms and parameterized complexity are usually considered to be two separate ways of dealing with hard algorithmic problems. In this paper, our aim is to investigate how these two fields can be combined to achieve better algorithms than what any of the two theories could offer. We discuss the different ways parameterized complexity can be extended to approximation algorithms, survey results of this type and propose directions for future research. 1.
The Analysis of a Simple kMeans Clustering Algorithm
, 2000
"... Kmeans clustering is a very popular clustering technique, which is used in numerous applications. Given a set of n data points in R d and an integer k, the problem is to determine a set of k points R d , called centers, so as to minimize the mean squared distance from each data point to its nea ..."
Abstract

Cited by 20 (1 self)
 Add to MetaCart
Kmeans clustering is a very popular clustering technique, which is used in numerous applications. Given a set of n data points in R d and an integer k, the problem is to determine a set of k points R d , called centers, so as to minimize the mean squared distance from each data point to its nearest center. A popular heuristic for kmeans clustering is Lloyd's algorithm. In this paper we present a simple and efficient implementation of Lloyd's kmeans clustering algorithm, which we call the filtering algorithm. This algorithm is very easy to implement. It differs from most other approaches in that it precomputes a kdtree data structure for the data points rather than the center points. We establish the practical efficiency of the filtering algorithm in two ways. First, we present a datasensitive analysis of the algorithm's running time. Second, we have implemented the algorithm and performed a number of empirical studies, both on synthetically generated data and on real data from...