## Clustering for Edge-Cost Minimization

Citations: 30 (4 self-citations)

### BibTeX

@MISC{Schulman_clusteringfor,

author = {Leonard J. Schulman},

title = {Clustering for Edge-Cost Minimization},

year = {}

}

### Abstract

Leonard J. Schulman, College of Computing, Georgia Institute of Technology, Atlanta GA 30332-0280

We address the problem of partitioning a set of $n$ points into clusters so as to minimize the sum, over all intracluster pairs of points, of the cost associated with each pair. We obtain a randomized approximation algorithm for this problem for the cost functions $\ell_2^2$, $\ell_1$, and $\ell_2$, as well as for any cost function isometrically embeddable in $\ell_2^2$.
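
The objective in the abstract can be made concrete with an exhaustive sketch. This is not the paper's randomized approximation algorithm; it simply evaluates every labeling, assuming points are coordinate tuples with nonnegative weights $w_u$ and that a clustering's cost is the sum of $w_u w_v \phi_{u,v}$ over intracluster unordered pairs. The helper names (`cluster_cost`, `best_partition`, `l22`, `l1`, `l2`) are hypothetical.

```python
import math
from itertools import product


def l22(u, v):
    """Squared Euclidean distance (the l_2^2 cost)."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def l1(u, v):
    """Manhattan distance (the l_1 cost)."""
    return sum(abs(a - b) for a, b in zip(u, v))


def l2(u, v):
    """Euclidean distance (the l_2 cost)."""
    return math.sqrt(l22(u, v))


def cluster_cost(points, weights, labels, cost):
    """Sum of w_u * w_v * cost(u, v) over unordered pairs in the same cluster."""
    total = 0.0
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            if labels[i] == labels[j]:
                total += weights[i] * weights[j] * cost(points[i], points[j])
    return total


def best_partition(points, weights, k, cost):
    """Try all k**n labelings; exponential, so only for tiny inputs.

    Labelings that leave a cluster empty are allowed; with a nonnegative
    cost this does not change the optimum when n >= k, because splitting
    a cluster never increases the objective.
    """
    best = None
    for labels in product(range(k), repeat=len(points)):
        c = cluster_cost(points, weights, labels, cost)
        if best is None or c < best[0]:
            best = (c, labels)
    return best


pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
print(best_partition(pts, [1, 1, 1, 1], 2, l22))  # (2.0, (0, 0, 1, 1))
```

The search is exponential in $n$ and is only useful as a reference for checking faster methods on tiny inputs.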

### Citations

2745 | Authoritative Sources in a Hyperlinked Environment
- Kleinberg
- 1998
Citation Context: ...throughout that $\phi$ is nonnegative. For general references in the field of clustering see [10, 38, 33, 69, 27, 36, 53, 54, 2, 7]; for discussions of a variety of interesting methods and application areas see **[24, 68, 65, 58, 60, 67, 48, 26, 46]**. A key role in our method is played by a random sampling process which, given $T$, picks a very small weighted collection of points. We show that for a range of cost functions, the cost of this collec...

2627 | Normalized cuts and image segmentation
- Shi, Malik
Citation Context: ...throughout that $\phi$ is nonnegative. For general references in the field of clustering see [10, 38, 33, 69, 27, 36, 53, 54, 2, 7]; for discussions of a variety of interesting methods and application areas see **[24, 68, 65, 58, 60, 67, 48, 26, 46]**. A key role in our method is played by a random sampling process which, given $T$, picks a very small weighted collection of points. We show that for a range of cost functions, the cost of this collec...

2175 | Algorithms for Clustering Data
- Jain, Dubes
- 1988

1885 | Some methods for classification and analysis of multivariate observations
- MacQueen
- 1967
Citation Context: ...(sometimes known as "sum of squares minimization") appears to have been discussed first for one dimension by Fisher in 1958 [23] and for higher dimensions by Ward in 1963 [72] and Shlezinger in 1965 [66]; a partial list of subsequent literature is **[20, 37, 28, 51, 63, 6, 41, 11, 35, 31, 16]**, and some surveys touching on the subject are [27, 55]. (Point weights appear not to have been discussed in this literature, but can be accommodated without harm to any existing result.) The regions c...

1520 | Clustering algorithms
- Hartigan
- 1975
Citation Context: ...symmetric and that $\phi_{u,u} = 0$ for all $u \in T$; hence equivalently $\phi(S) = \frac{1}{2}\sum_{u,v \in S} w_u w_v \phi_{u,v}$. We will also assume throughout that $\phi$ is nonnegative. For general references in the field of clustering see **[10, 38, 33, 69, 27, 36, 53, 54, 2, 7]**; for discussions of a variety of interesting methods and application areas see [24, 68, 65, 58, 60, 67, 48, 26, 46]. A key role in our method is played by a random sampling process which, given $T$, p...

1444 | Reducibility among combinatorial problems
- Karp
- 1972
Citation Context: ...max cut (i.e. $k = 2$) has been provided in recent independent work by Fernandez de la Vega and Kenyon [14] (building upon [4, 12, 13]). Max cut (maximization of $\phi(T) - \phi(S, \bar{S})$) is NP-complete **[42, 25]** (even for metric spaces); min cluster (minimization of $\phi(S, \bar{S})$) is equivalent, but multiplicative approximation of these quantities is not equivalent, with min cluster being harder since there is...

1146 | Geometric algorithms and combinatorial optimization
- Grötschel, Lovász, et al.
- 1993

948 | On the uniform convergence of relative frequencies of events to their probabilities, Theory of Probability and its Applications
- Vapnik, Chervonenkis
- 1971
Citation Context: ...of the lifted set by hyperplanes of dimension $d$. Almost the same result can be obtained from some broader considerations (which, however, rest upon the same proof technique). Vapnik and Chervonenkis **[71]**, Sauer [61] and Perles and Shelah [64] showed that the number of subsets of a set of size $n$ that can be defined by intersections with elements of a range space of VC dimension $d$ is at most $\Phi_d(n)$...

810 | Estimation of dependences based on empirical data
- Vapnik
- 1982
Citation Context: ...intersections with elements of a range space of VC dimension $d$ is at most $\Phi_d(n)$. (For a discussion of applications in learning theory, statistics, and combinatorial and computational geometry see **[43, 70, 52]**.) Dudley has shown that the range space of balls in dimension $d$ has VC dimension $d + 1$ [17], and the bound $\Phi_{d+1}(n)$ follows. □ The 2-stable partitions are even further restricted: Proposition 10 ...

691 | Algorithms in Combinatorial Geometry
- Edelsbrunner
- 1987
Citation Context: ...constructing the incidence graph of this arrangement; the size of the arrangement (total number of faces of all dimensions) is $O(n^{d+1})$, and the arrangement can be constructed in the same amount of time (**[18]** §7). From this arrangement, a graph (represented by its adjacency list) can be constructed whose nodes are the cells of the arrangement, connected by an edge if the corresponding cells share a hyper...

596 | An Introduction to computational learning theory
- Kearns, Vazirani
- 1994
Citation Context: ...intersections with elements of a range space of VC dimension $d$ is at most $\Phi_d(n)$. (For a discussion of applications in learning theory, statistics, and combinatorial and computational geometry see **[43, 70, 52]**.) Dudley has shown that the range space of balls in dimension $d$ has VC dimension $d + 1$ [17], and the bound $\Phi_{d+1}(n)$ follows. □ The 2-stable partitions are even further restricted: Proposition 10 ...

518 | Hierarchical grouping to optimize an objective function
- Ward
- 1963
Citation Context: ...so as to minimize $\sum_i \psi(S_i)$ (sometimes known as "sum of squares minimization") appears to have been discussed first for one dimension by Fisher in 1958 [23] and for higher dimensions by Ward in 1963 **[72]** and Shlezinger in 1965 [66]; a partial list of subsequent literature is [20, 37, 28, 51, 63, 6, 41, 11, 35, 31, 16], and some surveys touching on the subject are [27, 55]. (Point weights appear not t...

474 | Mixture Models: Inference and Applications to Clustering
- McLachlan, Basford
- 1988
Citation Context: ...symmetric and that $\phi_{u,u} = 0$ for all $u \in T$; hence equivalently $\phi(S) = \frac{1}{2}\sum_{u,v \in S} w_u w_v \phi_{u,v}$. We will also assume throughout that $\phi$ is nonnegative. For general references in the field of clustering see **[10, 38, 33, 69, 27, 36, 53, 54, 2, 7]**; for discussions of a variety of interesting methods and application areas see [24, 68, 65, 58, 60, 67, 48, 26, 46]. A key role in our method is played by a random sampling process which, given $T$, p...

453 | The geometry of graphs and some of its algorithmic applications
- Linial, London, et al.
- 1995
Citation Context: ...throughout that $\phi$ is nonnegative. For general references in the field of clustering see [10, 38, 33, 69, 27, 36, 53, 54, 2, 7]; for discussions of a variety of interesting methods and application areas see **[24, 68, 65, 58, 60, 67, 48, 26, 46]**. A key role in our method is played by a random sampling process which, given $T$, picks a very small weighted collection of points. We show that for a range of cost functions, the cost of this collec...

409 | Extensions of Lipschitz mappings into a Hilbert space
- Johnson, Lindenstrauss
- 1984
Citation Context: ...Euclidean space is mapped under a random orthogonal projection $M$ to an $O(\log n / \epsilon^2)$-dimensional subspace, then with high probability the distortion of the metric on these points is no more than $1 + \epsilon$ **[40]** (a constant of 4 is achievable in this theorem). The distortion is $\max_{a,b,c,d \in T} \frac{\rho_{Ma,Mb}\,\rho_{c,d}}{\rho_{a,b}\,\rho_{Mc,Md}}$. Such a mapping may be found efficiently (in time $\tilde{O}(n^2)$) by trial and error. O...
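
The trial-and-error search in the passage above can be sketched as follows. This is a minimal illustration, not the paper's procedure: it assumes $\rho$ is Euclidean distance, builds a random orthogonal projection by Gram-Schmidt on Gaussian vectors, takes the target dimension as $\min(d, \lceil c \log n / \epsilon^2 \rceil)$ with $c = 4$, and retries until the distortion, as defined in the passage, is at most $1 + \epsilon$. All function names and the `max_tries` cutoff are hypothetical.

```python
import math
import random


def random_orthonormal_rows(d, k, rng):
    """k x d matrix with orthonormal rows (Gram-Schmidt on Gaussian vectors)."""
    rows = []
    while len(rows) < k:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for r in rows:  # remove components along rows already chosen
            dot = sum(a * b for a, b in zip(v, r))
            v = [a - dot * b for a, b in zip(v, r)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-9:  # skip (unlikely) near-dependent draws
            rows.append([a / norm for a in v])
    return rows


def project(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def distortion(points, images):
    """max over a,b,c,d of (rho(Ma,Mb) rho(c,d)) / (rho(a,b) rho(Mc,Md)).

    Equals (largest ratio rho(Ma,Mb)/rho(a,b)) / (smallest such ratio), so it
    is scale-free and no sqrt(d/k) rescaling is needed. Needs at least two
    distinct points.
    """
    ratios = [math.dist(images[i], images[j]) / math.dist(points[i], points[j])
              for i in range(len(points)) for j in range(i + 1, len(points))]
    return math.inf if min(ratios) == 0 else max(ratios) / min(ratios)


def find_projection(points, eps, c=4, seed=0, max_tries=1000):
    """Draw random projections until the distortion is at most 1 + eps."""
    rng = random.Random(seed)
    n, d = len(points), len(points[0])
    k = max(1, min(d, math.ceil(c * math.log(n) / eps ** 2)))
    for _ in range(max_tries):
        M = random_orthonormal_rows(d, k, rng)
        images = [project(M, p) for p in points]
        if distortion(points, images) <= 1 + eps:
            return M, images
    raise RuntimeError("no projection within 1 + eps found; raise max_tries")
```

Because this distortion compares the largest and smallest stretch over all pairs, it is a two-sided requirement, so in high dimension several draws may be needed; `max_tries` bounds the search.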

379 | A polynomial algorithm in linear programming
- Khachiyan
- 1979

377 | Some simplified NP-complete graph problems
- Garey, Johnson, et al.
- 1976
Citation Context: ...max cut (i.e. $k = 2$) has been provided in recent independent work by Fernandez de la Vega and Kenyon [14] (building upon [4, 12, 13]). Max cut (maximization of $\phi(T) - \phi(S, \bar{S})$) is NP-complete **[42, 25]** (even for metric spaces); min cluster (minimization of $\phi(S, \bar{S})$) is equivalent, but multiplicative approximation of these quantities is not equivalent, with min cluster being harder since there is...

366 | A simple parallel algorithm for the maximal independent set problem
- Luby
- 1986
Citation Context: ...marginals (distributions of each variable $K_u$). (By slight modification of the marginals, earlier and somewhat simpler methods based on linear error-correcting codes can likely be used as well; see **[39, 50, 1, 9]**. This would require modification of the analysis of the sampling process.) 3. The case $\phi = \ell_2^2$. 3.1. Preliminaries and related literature. We focus now on the central special case in which the cost fu...

320 | Polynomial time approximation schemes for Euclidean TSP and other geometric problems
- Arora
- 1996

318 | Approximation algorithms for metric facility location and k-median problems using the primal-dual scheme and Lagrangian relaxation
- Jain, Vazirani
- 2001

316 | Probabilistic approximation of metric spaces and its algorithmic applications
- Bartal
- 1996

240 | On the density of families of sets
- Sauer
- 1972
Citation Context: ...lifted set by hyperplanes of dimension $d$. Almost the same result can be obtained from some broader considerations (which, however, rest upon the same proof technique). Vapnik and Chervonenkis [71], Sauer **[61]** and Perles and Shelah [64] showed that the number of subsets of a set of size $n$ that can be defined by intersections with elements of a range space of VC dimension $d$ is at most $\Phi_d(n)$. (For a di...

218 | A fast and simple randomized parallel algorithm for the maximal independent set problem
- Alon, Babai, et al.
- 1986
Citation Context: ...marginals (distributions of each variable $K_u$). (By slight modification of the marginals, earlier and somewhat simpler methods based on linear error-correcting codes can likely be used as well; see **[39, 50, 1, 9]**. This would require modification of the analysis of the sampling process.) 3. The case $\phi = \ell_2^2$. 3.1. Preliminaries and related literature. We focus now on the central special case in which the cost fu...

211 | A constant-factor approximation algorithm for the k-median problem
- Charikar, Guha, et al.
- 1999

197 | On the combinatorial and algebraic complexity of quantifier elimination
- Basu, Pollack, et al.
- 1996

172 | Polynomial time approximation schemes for dense instances of NP-hard problems
- Arora, Karger, et al.
- 1999
Citation Context: ...efficient algorithm. For the case of metric spaces, an approximation algorithm for max cut (i.e. $k = 2$) has been provided in recent independent work by Fernandez de la Vega and Kenyon [14] (building upon **[4, 12, 13]**). Max cut (maximization of $\phi(T) - \phi(S, \bar{S})$) is NP-complete [42, 25] (even for metric spaces); min cluster (minimization of $\phi(S, \bar{S})$) is equivalent, but multiplicative approximation of th...

155 | The bit extraction problem or t-resilient functions
- Chor, Goldreich, et al.
- 1985
Citation Context: ...marginals (distributions of each variable $K_u$). (By slight modification of the marginals, earlier and somewhat simpler methods based on linear error-correcting codes can likely be used as well; see **[39, 50, 1, 9]**. This would require modification of the analysis of the sampling process.) 3. The case $\phi = \ell_2^2$. 3.1. Preliminaries and related literature. We focus now on the central special case in which the cost fu...

137 | Central limit theorems for empirical measures
- Dudley
- 1978
Citation Context: ...discussion of applications in learning theory, statistics, and combinatorial and computational geometry see [43, 70, 52].) Dudley has shown that the range space of balls in dimension $d$ has VC dimension $d + 1$ **[17]**, and the bound $\Phi_{d+1}(n)$ follows. □ The 2-stable partitions are even further restricted: Proposition 10: If $S \subset R$ then $j(R) \subset j(S)$. Furthermore, $(R) \cap (S)$ can contain at most one point. Proof: A c...

129 | Mathematical Classification and Clustering
- Mirkin
- 1996
Citation Context: ...symmetric and that $\phi_{u,u} = 0$ for all $u \in T$; hence equivalently $\phi(S) = \frac{1}{2}\sum_{u,v \in S} w_u w_v \phi_{u,v}$. We will also assume throughout that $\phi$ is nonnegative. For general references in the field of clustering see **[10, 38, 33, 69, 27, 36, 53, 54, 2, 7]**; for discussions of a variety of interesting methods and application areas see [24, 68, 65, 58, 60, 67, 48, 26, 46]. A key role in our method is played by a random sampling process which, given $T$, p...

97 | A combinatorial problem; stability and order for models and theories in infinitary languages
- Shelah
- 1972
Citation Context: ...dimension $d$. Almost the same result can be obtained from some broader considerations (which, however, rest upon the same proof technique). Vapnik and Chervonenkis [71], Sauer [61] and Perles and Shelah **[64]** showed that the number of subsets of a set of size $n$ that can be defined by intersections with elements of a range space of VC dimension $d$ is at most $\Phi_d(n)$. (For a discussion of applications in...

87 | Clustering in large graphs and matrices
- Drineas, Frieze, et al.
- 1999
Citation Context: ...(sometimes known as "sum of squares minimization") appears to have been discussed first for one dimension by Fisher in 1958 [23] and for higher dimensions by Ward in 1963 [72] and Shlezinger in 1965 [66]; a partial list of subsequent literature is **[20, 37, 28, 51, 63, 6, 41, 11, 35, 31, 16]**, and some surveys touching on the subject are [27, 55]. (Point weights appear not to have been discussed in this literature, but can be accommodated without harm to any existing result.) The regions c...

79 | Approximation algorithms for geometric problems, in Approximation Algorithms for NP-hard Problems
- Bern, Eppstein
- 1997

76 | Efficient algorithms for agglomerative hierarchical clustering methods
- Day, Edelsbrunner
- 1984
Citation Context: ...(sometimes known as "sum of squares minimization") appears to have been discussed first for one dimension by Fisher in 1958 [23] and for higher dimensions by Ward in 1963 [72] and Shlezinger in 1965 [66]; a partial list of subsequent literature is **[20, 37, 28, 51, 63, 6, 41, 11, 35, 31, 16]**, and some surveys touching on the subject are [27, 55]. (Point weights appear not to have been discussed in this literature, but can be accommodated without harm to any existing result.) The regions c...

73 | Lower bound for approximation by nonlinear manifolds
- Warren
- 1968
Citation Context: ...As suggested in [35], the partitions defined by these regions can be examined exhaustively, and that partition which gives the least-cost clustering can be identified as optimal. A theorem of Warren **[73]** shows that the number of components in $\mathbb{R}^N - E$, where $E$ is the union of the zero-sets of $M$ polynomials in $N$ variables, each of degree at most $D$, is less than $(4eDM/N)^N$. Therefore the number...

65 | Applications of weighted Voronoi diagrams and randomization to variance-based k-clustering (extended abstract)
- Inaba, Katoh, et al.
- 1994
Citation Context: ...$\ell_2^2$ cost function is $\phi(S) = w_S^2 \operatorname{Var}(S)$. Weak separation was shown previously by Boros and Hammer [8]. The distinction between the kinds of separation was overlooked. Later Inaba, Katoh and Imai **[35]** proposed examining all sphere partitions to find an optimal partition. That proposal is justified only on the basis of the present work, because weak separation does not imply a sub-exponential time ...
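
The identity $\phi(S) = w_S^2 \operatorname{Var}(S)$ in the passage above can be checked numerically. This sketch assumes $w_S = \sum_{u \in S} w_u$ and takes $\operatorname{Var}(S)$ to be the weighted variance $\frac{1}{w_S}\sum_{u \in S} w_u \lVert u - c_S \rVert^2$ about the weighted centroid $c_S$; the passage does not state these definitions, so they are assumptions. The function names are hypothetical.

```python
import math
import random


def phi_l22(points, weights):
    """phi(S) = 1/2 * sum over ordered pairs (u, v) of w_u w_v ||u - v||^2."""
    total = 0.0
    for p, wp in zip(points, weights):
        for q, wq in zip(points, weights):
            total += wp * wq * sum((a - b) ** 2 for a, b in zip(p, q))
    return total / 2


def weighted_variance(points, weights):
    """(1/w_S) * sum_u w_u ||u - c||^2, with c the weighted centroid."""
    w_s = sum(weights)
    dim = len(points[0])
    c = [sum(w * p[i] for p, w in zip(points, weights)) / w_s for i in range(dim)]
    return sum(w * sum((p[i] - c[i]) ** 2 for i in range(dim))
               for p, w in zip(points, weights)) / w_s


rng = random.Random(0)
pts = [[rng.uniform(-5, 5) for _ in range(3)] for _ in range(6)]
ws = [rng.uniform(0.1, 2.0) for _ in range(6)]
lhs = phi_l22(pts, ws)
rhs = sum(ws) ** 2 * weighted_variance(pts, ws)
print(math.isclose(lhs, rhs))  # True
```

Expanding $\lVert u - v \rVert^2$ around $c_S$ makes the cross terms vanish, which is why the pairwise sum collapses to $w_S$ times the weighted sum of squared deviations.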

53 | On Grouping for Maximum Homogeneity
- Fisher
- 1958
Citation Context: ...in this section we let $\phi = \ell_2^2$). Clustering so as to minimize $\sum_i \psi(S_i)$ (sometimes known as "sum of squares minimization") appears to have been discussed first for one dimension by Fisher in 1958 **[23]** and for higher dimensions by Ward in 1963 [72] and Shlezinger in 1965 [66]; a partial list of subsequent literature is [20, 37, 28, 51, 63, 6, 41, 11, 35, 31, 16], and some surveys touching on the su...

51 | A review of classification
- Cormack

49 | On the number of halving lines
- Lovász
- 1971
Citation Context: ...intersections with spheres would be required; the existence of such a bound seems to be an open question. (The related "$k$-sets" question for intersections with halfspaces is a long-standing challenge; see **[52, 49, 22, 19, 5, 3, 57, 21, 15]**.) It would also be necessary to implement an efficient search of sphere partitions which did not expend much effort on spheres not in the relevant 2-family. 3.5 Exact deterministic algorithm for k-pa...

47 | Theorie der vielfachen Kontinuität (1852), in
- Schläfli
- 1950
Citation Context: ...a detour into the literature concerning partitions of space by hyperplanes. Schläfli showed in the last century that the number of cells in a partition of $\mathbb{R}^d$ by $n$ hyperplanes is at most $\Phi_d(n)$ (**[62]** p. 209), and that this bound is achieved for hyperplanes in general position. The number of partitions of $n$ points in general position in $\mathbb{R}^d$ by hyperplanes is deduced to be $\Phi_d(n-1)$ by an...
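
The counting function in the passage above can be made concrete. This sketch assumes the standard definition $\Phi_d(n) = \sum_{i=0}^{d} \binom{n}{i}$, the form of Schläfli's bound for $n$ hyperplanes in general position in $\mathbb{R}^d$ and of the Sauer, Perles and Shelah bound; the passage itself does not spell it out. As a sanity check of the $\Phi_d(n-1)$ count in the simplest case $d = 1$, it enumerates the unordered partitions of $n$ distinct reals cut by a single threshold (counting the trivial partition $\{\emptyset, T\}$). The function names are hypothetical.

```python
from math import comb


def phi_count(d, n):
    """Phi_d(n) = sum_{i=0}^{d} C(n, i); math.comb returns 0 when i > n."""
    return sum(comb(n, i) for i in range(d + 1))


def threshold_partitions(xs):
    """Distinct unordered partitions {L, R} of distinct reals by one threshold."""
    xs = sorted(xs)
    parts = set()
    for i in range(len(xs) + 1):
        # L = the i smallest values, R = the rest; {empty, all} arises twice
        parts.add(frozenset({frozenset(xs[:i]), frozenset(xs[i:])}))
    return parts


print(phi_count(2, 3))  # 7: three lines in general position cut the plane into 7 cells
xs = [3.0, 1.0, 4.0, 1.5, 9.0]
print(len(threshold_partitions(xs)), phi_count(1, len(xs) - 1))  # 5 5
```

The same function satisfies $\Phi_d(n) = \Phi_d(n-1) + \Phi_{d-1}(n-1)$, the recurrence usually used to prove the hyperplane bound inductively.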

46 | On a set of almost deterministic k-independent random variables
- Joffe
- 1974

37 | Dissection graphs of planar point sets
- Erdős, Lovász, et al.
- 1973
Citation Context: ...intersections with spheres would be required; the existence of such a bound seems to be an open question. (The related "$k$-sets" question for intersections with halfspaces is a long-standing challenge; see **[52, 49, 22, 19, 5, 3, 57, 21, 15]**.) It would also be necessary to implement an efficient search of sphere partitions which did not expend much effort on spheres not in the relevant 2-family. 3.5 Exact deterministic algorithm for k-pa...

36 | Proof techniques in the theory of finite sets
- Greene, Kleitman
- 1978
Citation Context: ...one point. Proof: A consequence of the equation $\phi(S \cup \{v\}) - \phi(S) = w_v \sum_{u \in S} w_u \rho_{u,v}^2$. □ A collection of elements of a poset is termed a $j$-family if it contains no chains of length $j + 1$ **[29]**. (A 1-family is an antichain.) Corollary 11: (a) The collection of sets which occur as clusters in 1-stable partitions of a set $T$ is a 2-family in the poset of subsets of $T$. (b) The collection of se...
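
The equation in the proof above says that, under the $\ell_2^2$ cost, adding a point $v$ to $S$ raises $\phi$ by exactly $w_v \sum_{u \in S} w_u \rho_{u,v}^2$. A small numerical check follows, assuming $\rho$ is Euclidean distance and that $\phi(S)$ sums $w_u w_v \rho_{u,v}^2$ over unordered pairs (equivalently, half the sum over ordered pairs); the function names are hypothetical.

```python
import math
import random


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def phi(points, weights):
    """Sum of w_u w_v rho_{u,v}^2 over unordered pairs {u, v} in S."""
    total = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            total += weights[i] * weights[j] * sq_dist(points[i], points[j])
    return total


def insertion_cost(points, weights, v, w_v):
    """w_v * sum_{u in S} w_u rho_{u,v}^2, the increase phi(S + {v}) - phi(S)."""
    return w_v * sum(w * sq_dist(p, v) for p, w in zip(points, weights))


rng = random.Random(42)
S = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(5)]
W = [rng.uniform(0.5, 1.5) for _ in range(5)]
v, w_v = [0.3, -0.7], 2.0
delta = phi(S + [v], W + [w_v]) - phi(S, W)
print(math.isclose(delta, insertion_cost(S, W, v, w_v)))  # True
```

The only new pairs created by the insertion are $\{u, v\}$ for $u \in S$, which is why the increase is a single weighted sum rather than a full recomputation.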

31 | Points and triangles in the plane and halving planes in space
- Aronov, Chazelle, et al.
- 1991
Citation Context: ...intersections with spheres would be required; the existence of such a bound seems to be an open question. (The related "$k$-sets" question for intersections with halfspaces is a long-standing challenge; see **[52, 49, 22, 19, 5, 3, 57, 21, 15]**.) It would also be necessary to implement an efficient search of sphere partitions which did not expend much effort on spheres not in the relevant 2-family. 3.5 Exact deterministic algorithm for k-pa...

31 | Max-Cut has a randomized approximation scheme in dense graphs, Random Struct. Algorithms 8
- Fernandez de la Vega
- 1996
Citation Context: ...efficient algorithm. For the case of metric spaces, an approximation algorithm for max cut (i.e. $k = 2$) has been provided in recent independent work by Fernandez de la Vega and Kenyon [14] (building upon **[4, 12, 13]**). Max cut (maximization of $\phi(T) - \phi(S, \bar{S})$) is NP-complete [42, 25] (even for metric spaces); min cluster (minimization of $\phi(S, \bar{S})$) is equivalent, but multiplicative approximation of th...

31 | Constructing small sample spaces satisfying given constraints
- Koller, Megiddo
- 1994
Citation Context: ...$P(K_u = j) = \beta^j_u$, while $P(K_u = j') = 0$ for $j' > j$, then we are guaranteed to introduce an additional error probability of no more than $\delta$ into the analysis of the algorithm. Koller and Megiddo **[47]** provide a deterministic algorithm, running in time polynomial in $n$, which produces a 4-wise independent sample space of size $O((nj)^4) \subseteq \tilde{O}((n \log(n \varepsilon^{-1} \delta^{-1}))^4)$, with the assumed ma...

30 | Global convergence and empirical consistency of the generalized Lloyd algorithm
- Sabin, Gray
- 1986

29 | A randomized approximation scheme for metric max-cut
- Fernandez de la Vega, Kenyon
- 1998
Citation Context: ...the sake of an efficient algorithm. For the case of metric spaces, an approximation algorithm for max cut (i.e. $k = 2$) has been provided in recent independent work by Fernandez de la Vega and Kenyon **[14]** (building upon [4, 12, 13]). Max cut (maximization of $\phi(T) - \phi(S, \bar{S})$) is NP-complete [42, 25] (even for metric spaces); min cluster (minimization of $\phi(S, \bar{S})$) is equivalent, but multipl...

29 | Quantization and the method of k-means
- Pollard
- 1982

28 | A method for cluster analysis
- Edwards, Cavalli-Sforza
- 1965

25 | Multidimensional group analysis
- Jancey
- 1966