## Fast Approximate Spectral Clustering (2009)


### Download Links

- [stat-www.berkeley.edu]
- [stat-ftp.berkeley.edu]
- [www.stat.berkeley.edu]
- [ftp.stat.berkeley.edu]
- [www.cs.berkeley.edu]
- [www2.berkeley.intel-research.net]
- [esp.intel-research.net]

Citations: 39 (1 self)

### BibTeX

    @MISC{Yan09fastapproximate,
      author = {Donghui Yan and Ling Huang and Michael I. Jordan},
      title = {Fast Approximate Spectral Clustering},
      year = {2009}
    }


### Abstract

Spectral clustering refers to a flexible class of clustering procedures that can produce high-quality clusterings on small data sets but which has limited applicability to large-scale problems due to its computational complexity of O(n^3), with n the number of data points. We extend the range of spectral clustering by developing a general framework for fast approximate spectral clustering in which a distortion-minimizing local transformation is first applied to the data. This framework is based on a theoretical analysis that provides a statistical characterization of the effect of local distortion on the mis-clustering rate. We develop two concrete instances of our general framework, one based on local k-means clustering (KASP) and one based on random projection trees (RASP). Extensive experiments show that these algorithms can achieve significant speedups with little degradation in clustering accuracy. Specifically, our algorithms outperform k-means by a large margin in terms of accuracy, and run several times faster than approximate spectral clustering based on the Nyström method, with comparable accuracy and significantly smaller memory footprint. Remarkably, our algorithms make it possible for a single machine to spectral cluster data sets with a million observations within several minutes.
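The KASP recipe the abstract describes (coarsen the data with k-means, spectrally cluster the k representatives, then give each original point its representative's cluster label) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the Gaussian affinity, the choice of σ, and the deterministic farthest-point k-means initialization are all assumptions made for this sketch.

```python
import numpy as np

def kmeans(X, k, iters=50):
    """Lloyd's algorithm with deterministic farthest-point init.
    Returns (centroids, assignment of each row of X)."""
    C = [X[0]]
    for _ in range(k - 1):
        d = ((X[:, None, :] - np.array(C)[None]) ** 2).sum(-1).min(1)
        C.append(X[d.argmax()])            # farthest point from chosen set
    C = np.array(C, dtype=float)
    for _ in range(iters):
        a = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if (a == j).any():
                C[j] = X[a == j].mean(0)   # move centroid to cell mean
    return C, a

def spectral_cluster(Y, m, sigma=1.0):
    """Cluster the k representatives: Gaussian affinity, normalized
    Laplacian, embed into the m smallest eigenvectors, then k-means."""
    d2 = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    A = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    Ds = 1.0 / np.sqrt(A.sum(1))
    L = np.eye(len(Y)) - (Ds[:, None] * A) * Ds[None, :]  # I - D^{-1/2} A D^{-1/2}
    _, V = np.linalg.eigh(L)               # eigenvalues in ascending order
    U = V[:, :m]
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    _, labels = kmeans(U, m)
    return labels

def kasp(X, k, m):
    """KASP: coarsen with k-means, spectrally cluster the k centroids,
    propagate cluster labels back to the original points."""
    C, assign = kmeans(X, k)
    return spectral_cluster(C, m)[assign]
```

Note that only the k×k affinity matrix of the representatives is ever formed, rather than the full n×n matrix; that data reduction is the source of the speedup the abstract reports.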

### Citations

2736 | Normalized cuts and image segmentation
- Shi, Malik
- 2000
Citation Context: ...es to solve clustering problems [15, 20]. A relatively recent area of focus has been spectral clustering, a class of methods based on eigendecompositions of affinity, dissimilarity or kernel matrices [21, 29, 33]. Whereas many clustering methods are strongly tied to Euclidean geometry, making explicit or implicit assumptions that clusters form convex regions in Euclidean space, spectral methods are more flexi...

1609 | A k-means clustering algorithm
- Hartigan, Wong
- 1979
Citation Context: ...primary importance in data mining, statistical machine learning and scientific discovery. An enormous variety of methods have been developed over the past several decades to solve clustering problems [15, 20]. A relatively recent area of focus has been spectral clustering, a class of methods based on eigendecompositions of affinity, dissimilarity or kernel matrices [21, 29, 33]. Whereas many clustering me...

1574 | Random Forests
- Breiman
- 2001
Citation Context: ...ad, to provide a rough upper bound, we treat the clustering problem as a classification problem and present results from a state-of-the-art classification algorithm, the Random Forests (RF) algorithm [7]. These results suggest that the data reduction in KASP and RASP has not seriously degraded the clustering accuracy. We also performed a further comparison of k-means and our methods in which we incr...

1374 | Data clustering: a review
- Jain, Murty, et al.
- 1999
Citation Context: ...primary importance in data mining, statistical machine learning and scientific discovery. An enormous variety of methods have been developed over the past several decades to solve clustering problems [15, 20]. A relatively recent area of focus has been spectral clustering, a class of methods based on eigendecompositions of affinity, dissimilarity or kernel matrices [21, 29, 33]. Whereas many clustering me...

1166 | On spectral clustering: Analysis and an algorithm
- Ng, Jordan, et al.
- 2002
Citation Context: ...es to solve clustering problems [15, 20]. A relatively recent area of focus has been spectral clustering, a class of methods based on eigendecompositions of affinity, dissimilarity or kernel matrices [21, 29, 33]. Whereas many clustering methods are strongly tied to Euclidean geometry, making explicit or implicit assumptions that clusters form convex regions in Euclidean space, spectral methods are more flexi...

835 | A fast and high quality multilevel scheme for partitioning irregular graphs
- Karypis, Kumar
- 1998
Citation Context: ...ches the partitioning problem by reducing the size of the graph by collapsing vertices and edges, partitioning the smaller graph, and then uncoarsening to construct a partition for the original graph [17, 23]. Our work is similar in spirit to this multiscale approach; we provide rigorous theoretical analysis for a particular kind of coarsening and uncoarsening methodology. More generally, our work is re...

803 | An optimal algorithm for approximate nearest neighbor searching
- Arya, Mount, et al.
- 1994
Citation Context: ...representative for the data points in that cell. RP trees are based on k-d trees, which are spatial data structures that partition a data space by recursively splitting along one coordinate at a time [2]. Rather than splitting along coordinate directions, RP tree splits are made according to randomly chosen directions. All points in the current cell are projected along the random direction and the ce...
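The split rule described in this context can be sketched in a few lines: project the cell's points onto a random unit direction and split at the median projection value. This is a simplified sketch; the actual RP-tree rule in [8] perturbs the split point and has a separate rule for cells of large aspect ratio, both of which are omitted here.

```python
import numpy as np

def rp_split(points, rng):
    """One random-projection split: project onto a random unit
    direction and partition at the median projection value."""
    u = rng.normal(size=points.shape[1])
    u /= np.linalg.norm(u)                 # random unit direction
    proj = points @ u
    threshold = np.median(proj)
    return points[proj <= threshold], points[proj > threshold]

def rp_tree_leaves(points, max_leaf, rng):
    """Recursively split until cells hold at most max_leaf points;
    the leaf cells play the role of representative groups in RASP."""
    if len(points) <= max_leaf:
        return [points]
    left, right = rp_split(points, rng)
    if len(left) == 0 or len(right) == 0:  # degenerate split; stop here
        return [points]
    return (rp_tree_leaves(left, max_leaf, rng)
            + rp_tree_leaves(right, max_leaf, rng))
```

Because each split costs O(nd) regardless of dimension, the tree gives a cheap distortion-minimizing coarsening when the data lie near a low-dimensional manifold.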

652 | Approximation Theorems in Mathematical Statistics - Serfling - 1980

456 | A Multilevel Algorithm for Partitioning Graphs
- Hendrickson, Leland
- 1993
Citation Context: ...ches the partitioning problem by reducing the size of the graph by collapsing vertices and edges, partitioning the smaller graph, and then uncoarsening to construct a partition for the original graph [17, 23]. Our work is similar in spirit to this multiscale approach; we provide rigorous theoretical analysis for a particular kind of coarsening and uncoarsening methodology. More generally, our work is re...

358 | Measurement error models
- Fuller
- 1987
Citation Context: ... constraints, computational efficiency or privacy considerations. Data perturbation can be modeled in several different ways, including contaminated distribution models [36], measurement error models [12] and mixture modeling. We choose to work with an additive noise model, due to its simplicity and its proven value in a number of problem areas such as data filtering, quantization and compression. We ...

356 | Meschach: Matrix Computation in C
- Stewart, Leyk
- 1994
Citation Context: ...sis allows us to bound the perturbation of the Laplacian matrix expressed in terms of a Frobenius norm. To connect that analysis to Theorem 1, we make use of the following standard lemma. Lemma 2 ([35]). Let g denote the eigengap between the second and the third eigenvalues of L. Then the following holds: $\|\tilde{v}_2 - v_2\| \le \frac{1}{g}\|\tilde{L} - L\| + O\big(\|\tilde{L} - L\|^2\big)$. With these links in the chain of the argumen...

312 | A tutorial on spectral clustering
- von Luxburg
Citation Context: ...n successfully deployed in numerous applications in areas such as computer vision, bioinformatics, and robotics. Moreover, there is a substantial theoretical literature supporting spectral clustering [21, 37]. Despite these virtues, spectral clustering is not widely viewed as a competitor to classical algorithms such as hierarchical clustering and k-means for large-scale data mining problems. The reason i...

301 | Using the Nyström method to speed up kernel machines. NIPS
- Williams, Seeger
- 2000
Citation Context: ...e on low-rank matrix approximations. This last approach has been particularly prominent in the literature; in particular, several researchers have proposed using the Nyström method for rank reduction [9, 38, 11]. While it is useful to define such preprocessors, simply possessing a knob that can adjust computational complexity does not constitute a solution to the problem of fast spectral clustering. What is ...
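The Nyström rank reduction mentioned here works by sampling l columns of the n×n affinity matrix and reconstructing the remaining entries from them. Below is a minimal sketch using uniform sampling without replacement and a pseudo-inverse of the sampled block; this is one common variant, not necessarily the exact scheme of the cited works.

```python
import numpy as np

def nystrom_approx(K, l, rng):
    """Rank-l Nystrom approximation of a symmetric PSD matrix K:
    sample l columns, then K ~= C @ pinv(W) @ C.T, where W is the
    l x l block at the intersection of sampled rows and columns."""
    n = K.shape[0]
    idx = rng.choice(n, size=l, replace=False)
    C = K[:, idx]               # n x l sampled columns
    W = K[np.ix_(idx, idx)]     # l x l intersection block
    return C @ np.linalg.pinv(W) @ C.T
```

When K has rank at most l and the sampled block captures its range, the reconstruction is exact; in general the error depends on how well the sampled columns cover the spectrum, which is why sampling schemes matter in the cited analyses.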

264 | On clustering: Good, bad and spectral
- Kannan, Vempala, et al.
- 2000
Citation Context: ...es to solve clustering problems [15, 20]. A relatively recent area of focus has been spectral clustering, a class of methods based on eigendecompositions of affinity, dissimilarity or kernel matrices [21, 29, 33]. Whereas many clustering methods are strongly tied to Euclidean geometry, making explicit or implicit assumptions that clusters form convex regions in Euclidean space, spectral methods are more flexi...

241 | Refining initial points for K-means clustering
- Bradley, Fayyad
- 1998
Citation Context: ...averaged over 100 runs. The performance of k-means can vary significantly depending on the initialization method. Recently a variety of approaches have been proposed for the initialization of k-means [22, 1, 31, 25, 6]. We chose to study three initialization methods, based on their documented favorable performance [6, 30, 31, 25], as well as their relatively straightforward implementation: the Hartigan-Wong algorit...

234 | An efficient k-means clustering algorithm: analysis and implementation
- Kanungo, Mount, et al.
- 2002
Citation Context: ...averaged over 100 runs. The performance of k-means can vary significantly depending on the initialization method. Recently a variety of approaches have been proposed for the initialization of k-means [22, 1, 31, 25, 6]. We chose to study three initialization methods, based on their documented favorable performance [6, 30, 31, 25], as well as their relatively straightforward implementation: the Hartigan-Wong algorit...

230 | k-means++: The Advantages of Careful Seeding
- Arthur, Vassilvitskii
- 2007
Citation Context: ...averaged over 100 runs. The performance of k-means can vary significantly depending on the initialization method. Recently a variety of approaches have been proposed for the initialization of k-means [22, 1, 31, 25, 6]. We chose to study three initialization methods, based on their documented favorable performance [6, 30, 31, 25], as well as their relatively straightforward implementation: the Hartigan-Wong algorit...

206 | Spectral grouping using the Nyström method
- Fowlkes, Belongie, et al.
- 2004
Citation Context: ...e on low-rank matrix approximations. This last approach has been particularly prominent in the literature; in particular, several researchers have proposed using the Nyström method for rank reduction [9, 38, 11]. While it is useful to define such preprocessors, simply possessing a knob that can adjust computational complexity does not constitute a solution to the problem of fast spectral clustering. What is ...

198 | Efficient SVM training using low-rank kernel representations
- Fine, Scheinberg
- 2001
Citation Context: ... linear algebra in which a matrix is replaced with a low-rank approximation. These methods have been widely adopted, particularly in the context of approximations for the support vector machine (SVM) [9, 38, 10, 34]. The affinity matrix of spectral clustering is a natural target for rank reduction. In particular, [11] have used the Nyström approximation, which samples columns of the affinity matrix and approxima...

189 | Multiclass spectral clustering
- Yu, Shi
Citation Context: ...ng is to partition the data into m disjoint classes such that each xi belongs to one and only one class. Different spectral clustering algorithms formalize this partitioning problem in different ways [33, 27, 29, 39]. In the current paper we adopt the normalized cuts (Ncut) formulation [33]. Define W(V1, V2) = ∑_{i∈V1, j∈V2} aij for two (possibly overlapping) subsets V1 and V2 of V. Let V = ... (Footnote 1: We use Ncut only for...)
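The quantity W(V1, V2) = ∑_{i∈V1, j∈V2} aij defined in this context, together with the standard two-way normalized-cut objective built from it, Ncut(V1, V2) = W(V1, V2)/W(V1, V) + W(V1, V2)/W(V2, V) as in [33], can be computed directly from the affinity matrix. The function names below are illustrative, not from the paper.

```python
import numpy as np

def assoc(Aff, V1, V2):
    """W(V1, V2) = sum of affinities a_ij over i in V1, j in V2."""
    return float(Aff[np.ix_(V1, V2)].sum())

def ncut(Aff, V1, V2):
    """Two-way normalized cut: the cut weight, normalized by each
    side's total association with the full vertex set V."""
    V = list(range(Aff.shape[0]))
    cut = assoc(Aff, V1, V2)
    return cut / assoc(Aff, V1, V) + cut / assoc(Aff, V2, V)
```

Normalizing by each side's total association penalizes partitions that split off tiny, weakly connected groups, which a plain minimum cut would favor.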

185 | Sparse greedy matrix approximation for machine learning
- Smola, Schölkopf
- 2000
Citation Context: ... linear algebra in which a matrix is replaced with a low-rank approximation. These methods have been widely adopted, particularly in the context of approximations for the support vector machine (SVM) [9, 38, 10, 34]. The affinity matrix of spectral clustering is a natural target for rank reduction. In particular, [11] have used the Nyström approximation, which samples columns of the affinity matrix and approxima...

121 | Asymptotic quantization error of continuous signals and the quantization dimension
- Zador
- 1982
Citation Context: ...h L0B, and (18) with (19), we verify the claims stated in Lemma 7 and Lemma 8 as well as in their proofs. 7.3 Performance analysis for KASP: Existing work from vector quantization [40, 13] allows us to characterize precisely the amount of distortion when the representative points are computed by k-means clustering if the probability distribution of the original data is given. Let a qua...

114 | Approximate clustering via core-sets
- Bădoiu, Har-Peled, et al.
- 2002
Citation Context: ...sing techniques to overcome computational bottlenecks in mining large-scale data. Examples include [28], who proposed a nonparametric data reduction scheme based on multiscale density estimation, and [5], who proposed a fast algorithm to extract small “core-sets” from the input data, based on which (1 + ǫ)-approximation algorithms for the k-center clustering have been developed. Our work is also rela...

114 | On the Nyström method for approximating a Gram matrix for improved kernel-based learning
- Drineas, Mahoney
- 2005
Citation Context: ...ix approximations. Indeed, this last approach has been the approach most commonly pursued in the literature; in particular, several researchers have proposed using the Nyström method for this purpose [10, 35, 12]. While it is useful to define such preprocessors, simply possessing a knob that can adjust computational complexity does not constitute a solution to the problem of fast spectral clustering. What is ...

103 | An empirical comparison of four initialization methods for the K-means algorithm
- Peña, Lozano, et al.
- 1999
Citation Context: ...cently a variety of approaches have been proposed for the initialization of k-means [22, 1, 31, 25, 6]. We chose to study three initialization methods, based on their documented favorable performance [6, 30, 31, 25], as well as their relatively straightforward implementation: the Hartigan-Wong algorithm (KM-1) [16], the sampling-based two-stage algorithm (KM-2) (i.e., the Matlab implementation of k-means with the...

82 | Weighted graph cuts without eigenvectors: A multilevel approach - Dhillon, Guan, et al. - 2007

57 | Random projection trees and low dimensional manifolds
- Dasgupta, Freund
- 2008
Citation Context: ... current paper we provide two examples of such preprocessors. The first is classical k-means, used in this context as a local data reduction step. The second is the Random Projection tree (RP tree) of [8]. In either case, the overall approximate spectral clustering algorithm takes the following form: (1) coarsen the affinity graph by using the preprocessor to collapse neighboring data points into a se...

48 | A Survey of Sampling from Contaminated Distributions
- Tukey
- 1960
Citation Context: ...n decisions reflecting resource constraints, computational efficiency or privacy considerations. Data perturbation can be modeled in several different ways, including contaminated distribution models [36], measurement error models [12] and mixture modeling. We choose to work with an additive noise model, due to its simplicity and its proven value in a number of problem areas such as data filtering, qu...

47 | Learning Spectral Clustering, With Application To Speech Separation
- Bach, Jordan
- 2006
Citation Context: ...igenvalue). The components of this vector are thresholded to define the class memberships of the data points. Although spectral clustering algorithms that work directly with multiway partitions exist [4, 39], in the current paper we will focus on the classical recursive Ncut algorithm. We assume that the number of clusters is given a priori and we run the recursion until the desired number of clusters is...

33 | The strong law of large numbers for U-statistics - Hoeffding - 1961

26 | Spectral clustering with perturbed data
- Huang, Yan, et al.
- 2008
Citation Context: ... given our focus on practical error bounds. It is also worth noting that this analysis has applications beyond the design of fast approximations to spectral clustering. In particular, as discussed by [19], our perturbation analysis can be used for developing distributed versions of spectral clustering and for analyzing robustness to noise. The remainder of the paper is organized as follows. We begin w...

23 | Learning segmentation with random walk
- Meila, Shi
- 2001
Citation Context: ...ng is to partition the data into m disjoint classes such that each xi belongs to one and only one class. Different spectral clustering algorithms formalize this partitioning problem in different ways [33, 27, 29, 39]. In the current paper we adopt the normalized cuts (Ncut) formulation [33]. Define W(V1, V2) = ∑_{i∈V1, j∈V2} aij for two (possibly overlapping) subsets V1 and V2 of V. Let V = ... (Footnote 1: We use Ncut only for...)

17 | Likelihood-based data squashing: a modeling approach to instance construction. Data Mining and Knowledge Discovery 6
- MADIGAN, RAGHAVAN, et al.
- 2002
Citation Context: ...tions in data mining in which a computational bottleneck is involved, we aim to find an effective preprocessor that reduces the size of the data structure that is input to that bottleneck (see, e.g., [25, 27]). There are many options that can be considered for this preprocessing step. One option is to perform various forms of subsampling of the data, selecting data points at random or according to some fo...

12 | Density-based multiscale data condensation
- Mitra, Murthy, et al.
- 2002
Citation Context: ...tions in data mining in which a computational bottleneck is involved, we aim to find an effective preprocessor that reduces the size of the data structure that is input to that bottleneck (see, e.g., [26, 28]). There are many options that can be considered for this preprocessing step. One option is to perform various forms of subsampling of the data, selecting data points at random or according to some fo...

11 | A simple linear time (1 + ǫ)-approximation algorithm for k-means clustering in any dimensions
- Kumar, Sabharwal, et al.
- 2004
Citation Context: ...Algorithm 2. The computational complexity of step 1, k-means, is O(knt), where t is the number of iterations. (There also exist approximate k-means algorithms, e.g., the (1 + ǫ) k-means in [24], with a running time of O(nt).) Algorithm 2 KASP(x1, . . . , xn, k). Input: n data points {xi}, number of representative points k. Output: m-way partition of the input data. 1. Perform k-means w...

8 | Hierarchical initialization approach for k-means clustering
- Lu, Tang, et al.
- 2008

7 | Fast Iterative Kernel Principal Component Analysis
- Günter, Schraudolph, et al.
- 2007
Citation Context: ... sets as large as one million data points; the largest experiment that we are aware of for spectral algorithms involves the MNIST data set, which consists of 60,000 handwritten digits. In particular, [14] reported experiments using this data set, where a total running time of about 30 hours was required when using a fast iterative algorithm. 5.1 Evaluation metrics: We used two quantities to assess the ...

7 | A method for initialising the K-means clustering algorithm using kd-trees
- Redmond, Heneghan
- 2007