Results 1 - 10 of 52
Consistency of spectral clustering
2004
Cited by 286 (15 self)
Consistency is a key property of statistical algorithms, when the data is drawn from some underlying probability distribution. Surprisingly, despite decades of work, little is known about consistency of most clustering algorithms. In this paper we investigate consistency of a popular family of spectral clustering algorithms, which cluster the data with the help of eigenvectors of graph Laplacian matrices. We show that one of the two major classes of spectral clustering (normalized clustering) converges under some very general conditions, while the other (unnormalized) is only consistent under strong additional assumptions, which, as we demonstrate, are not always satisfied in real data. We conclude that our analysis provides strong evidence for the superiority of normalized spectral clustering in practical applications. We believe that methods used in our analysis will provide a basis for future exploration of Laplacian-based methods in a statistical setting.
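The two families contrasted in this abstract differ in which graph Laplacian supplies the eigenvectors. The sketch below is illustrative only and is not taken from the paper: it assumes the common definitions L = D - W (unnormalized) and L_sym = I - D^{-1/2} W D^{-1/2} (one standard normalized form), and the helper names `laplacians` and `two_way_split` as well as the toy graph are hypothetical.

```python
import numpy as np

def laplacians(W):
    """Return (unnormalized, symmetric normalized) Laplacians of weight matrix W."""
    W = np.asarray(W, dtype=float)
    d = W.sum(axis=1)                       # vertex degrees
    L = np.diag(d) - W                      # unnormalized: L = D - W
    d_inv_sqrt = 1.0 / np.sqrt(d)           # assumes no isolated vertices (d > 0)
    L_sym = np.eye(len(d)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    return L, L_sym

def two_way_split(lap):
    """Split vertices by the sign of the eigenvector of the second-smallest eigenvalue."""
    _, vecs = np.linalg.eigh(lap)           # eigh returns eigenvalues in ascending order
    return vecs[:, 1] >= 0

# Toy data (an assumption for illustration): two triangles {0,1,2} and {3,4,5}
# joined by a single weak edge (2, 3).
W = np.zeros((6, 6))
for i, j, w in [(0, 1, 1), (0, 2, 1), (1, 2, 1),
                (3, 4, 1), (3, 5, 1), (4, 5, 1), (2, 3, 0.1)]:
    W[i, j] = W[j, i] = w

L, L_sym = laplacians(W)
labels_unnorm = two_way_split(L)
labels_norm = two_way_split(L_sym)
```

On a graph this small and well separated both variants recover the two triangles; the consistency question studied in the paper concerns the large-sample limit, which a toy example cannot show.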
A spectral algorithm for seriation and the consecutive ones problem
SIAM Journal on Computing, 1998
Cited by 46 (0 self)
In applications ranging from DNA sequencing through archeological dating to sparse matrix reordering, a recurrent problem is the sequencing of elements in such a way that highly correlated pairs of elements are near each other. That is, given a correlation function f reflecting the desire for each pair of elements to be near each other, find all permutations π with the property that if π(i) < π(j) < π(k) then f(i, j) ≥ f(i, k) and f(j, k) ≥ f(i, k). This seriation problem is a generalization of the well-studied consecutive ones problem. We present a spectral algorithm for this problem that has a number of interesting features. Whereas most previous applications of spectral techniques provide only bounds or heuristics, our result is an algorithm that correctly solves a nontrivial combinatorial problem. In addition, spectral methods are being successfully applied as heuristics to a variety of sequencing problems, and our result helps explain and justify these applications.
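The seriation condition stated in this abstract can be checked directly, and a common spectral approach to sequencing is to sort elements by the Fiedler vector (the eigenvector of the second-smallest Laplacian eigenvalue). The sketch below is a simplified illustration under stated assumptions, not the paper's algorithm: it returns a single ordering rather than all valid permutations, it does not handle ties in the Fiedler vector, and the names `fiedler_order`, `is_seriation`, and `hidden` are hypothetical.

```python
import numpy as np

def fiedler_order(F):
    """Order elements by the Fiedler vector of the Laplacian of similarity matrix F."""
    A = np.array(F, dtype=float)
    np.fill_diagonal(A, 0.0)                # self-similarity does not affect the Laplacian
    L = np.diag(A.sum(axis=1)) - A
    _, vecs = np.linalg.eigh(L)             # eigenvalues in ascending order
    return [int(i) for i in np.argsort(vecs[:, 1])]

def is_seriation(F, order):
    """Check the condition from the abstract for every triple placed in this order."""
    n = len(order)
    for a in range(n):
        for b in range(a + 1, n):
            for c in range(b + 1, n):
                i, j, k = order[a], order[b], order[c]
                if F[i][j] < F[i][k] or F[j][k] < F[i][k]:
                    return False
    return True

# Toy data (an assumption): similarity decays with distance along a hidden line,
# and the element labels are scrambled. hidden[p] is the element at position p.
n = 5
hidden = [3, 0, 4, 1, 2]
F = np.zeros((n, n))
for p in range(n):
    for q in range(n):
        F[hidden[p], hidden[q]] = n - abs(p - q)

order = fiedler_order(F)
```

For this toy matrix the recovered `order` is the hidden sequence or its reverse, since a seriation ordering read backwards satisfies the same condition.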
Graph Partitioning Algorithms With Applications To Scientific Computing
Parallel Numerical Algorithms, 1997
Cited by 41 (0 self)
Identifying the parallelism in a problem by partitioning its data and tasks among the processors of a parallel computer is a fundamental issue in parallel computing. This problem can be modeled as a graph partitioning problem in which the vertices of a graph are divided into a specified number of subsets such that few edges join two vertices in different subsets. Several new graph partitioning algorithms have been developed in the past few years, and we survey some of this activity. We describe the terminology associated with graph partitioning, the complexity of computing good separators, and graphs that have good separators. We then discuss early algorithms for graph partitioning, followed by three new algorithms based on geometric, algebraic, and multilevel ideas. The algebraic algorithm relies on an eigenvector of a Laplacian matrix associated with the graph to compute the partition. The algebraic algorithm is justified by formulating graph partitioning as a quadratic assignment p...
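The objective described in this abstract, dividing vertices into subsets so that few edges cross between them, can be made concrete with a small helper that scores a given partition. This is a minimal sketch; the function names and the grid example are assumptions for illustration and do not come from the survey.

```python
def edge_cut(edges, part):
    """Number of edges whose endpoints lie in different subsets."""
    return sum(1 for u, v in edges if part[u] != part[v])

def part_sizes(part, k):
    """Number of vertices assigned to each of the k subsets."""
    sizes = [0] * k
    for p in part.values():
        sizes[p] += 1
    return sizes

# 2 x 4 grid graph, vertices numbered row by row:
#   0 - 1 - 2 - 3
#   |   |   |   |
#   4 - 5 - 6 - 7
edges = [(0, 1), (1, 2), (2, 3), (4, 5), (5, 6), (6, 7),
         (0, 4), (1, 5), (2, 6), (3, 7)]

left_right = {0: 0, 1: 0, 4: 0, 5: 0, 2: 1, 3: 1, 6: 1, 7: 1}
top_bottom = {v: (0 if v < 4 else 1) for v in range(8)}

cut_lr = edge_cut(edges, left_right)   # 2 edges cross: (1, 2) and (5, 6)
cut_tb = edge_cut(edges, top_bottom)   # 4 edges cross: the vertical ones
```

Both partitions are perfectly balanced (four vertices per side), but the left/right split cuts half as many edges, which is the kind of difference a partitioning algorithm tries to find automatically.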
Web Document Clustering Using Hyperlink Structures
2001
Cited by 38 (5 self)
With the exponential growth of information on the World Wide Web, there is great demand for developing efficient and effective methods for organizing and retrieving the information available. Document clustering plays an important role in information retrieval and taxonomy management for the World Wide Web and remains an interesting and challenging problem in the field of web computing. In this paper we consider document clustering methods exploring textual information, hyperlink structure and co-citation relations. In particular, we apply the normalized-cut clustering method developed in computer vision to the task of hyperdocument clustering. We also explore some theoretical connections of the normalized-cut method to the K-means method. We then experiment with the normalized-cut method in the context of clustering query result sets for web search engines.
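For readers unfamiliar with the normalized-cut criterion mentioned here, the sketch below evaluates it for a two-way split of a small link graph. It assumes the usual definition Ncut(A, B) = cut(A, B)/vol(A) + cut(A, B)/vol(B), where vol is the sum of weighted degrees on one side; the function name `ncut` and the toy graph of six pages are hypothetical, and how the paper builds its weights from text, hyperlinks, and co-citations is not shown.

```python
def ncut(weights, A):
    """Normalized cut of the split (A, rest) for undirected weights {(u, v): w}."""
    degree = {}
    cut = 0.0
    for (u, v), w in weights.items():
        degree[u] = degree.get(u, 0.0) + w
        degree[v] = degree.get(v, 0.0) + w
        if (u in A) != (v in A):            # edge crosses the split
            cut += w
    vol_A = sum(d for x, d in degree.items() if x in A)
    vol_B = sum(d for x, d in degree.items() if x not in A)
    return cut / vol_A + cut / vol_B        # assumes both sides have positive volume

# Toy link graph (an assumption): two tightly linked groups of pages
# connected by a single link between page 2 and page 3.
links = {(0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0,
         (3, 4): 1.0, (3, 5): 1.0, (4, 5): 1.0,
         (2, 3): 1.0}

balanced = ncut(links, {0, 1, 2})   # cut 1, volumes 7 and 7
lopsided = ncut(links, {0})         # cut 2, volumes 2 and 12
```

The balanced split scores much lower than cutting off a single page, which illustrates why the criterion discourages isolating small, weakly connected groups.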
PMRSB: Parallel Multilevel Recursive Spectral Bisection
In Supercomputing, 1995
Cited by 35 (0 self)
The design of a parallel implementation of multilevel recursive spectral bisection on the Cray T3D is described. The code is intended to be fast enough to enable dynamic repartitioning of adaptive meshes and to partition meshes that are too large for workstations. Two innovations in the implementation are recursive asynchronous task teams and a parallel version of the multilevel accelerator. A performance improvement of a factor of 140 over the best available serial implementation is demonstrated.
1 Introduction
The efficient implementation of unstructured problems on distributed memory parallel computers requires that the data be distributed across the memories in a way that simultaneously balances the work load of the processors and minimizes interprocessor communication. This paper describes a tool called PMRSB, which is implemented on the Cray T3D, that performs this function. Currently, it is restricted ...
I would like to thank Horst Simon for his support and encouragement. I am ...
A Multi-Scale Algorithm for the Linear Arrangement Problem
Proc. 28th Inter. Workshop on Graph-Theoretic Concepts in Computer Science (WG’02), LNCS 2573, 2002
Cited by 26 (4 self)
Finding a linear ordering of the vertices of a graph is a common problem arising in diverse applications. In this paper we present a linear-time algorithm for this problem, based on the multi-scale paradigm. Experimental results are similar to those of the best known approaches, while the running time is significantly better, enabling it to deal with much larger graphs. The paper contains a general multi-scale construction, which may be used for a broader range of ordering problems.
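To make the objective concrete: in the usual formulation of the minimum linear arrangement problem, the cost of an ordering is the sum over edges of the distance between the endpoints' positions. The sketch below assumes that formulation (the abstract does not spell it out), and the function name and example graph are hypothetical.

```python
def linear_arrangement_cost(edges, order):
    """Sum over edges of |position(u) - position(v)| for the given vertex order."""
    pos = {v: p for p, v in enumerate(order)}
    return sum(abs(pos[u] - pos[v]) for u, v in edges)

# Path graph 0 - 1 - 2 - 3 (toy data, an assumption for illustration).
path_edges = [(0, 1), (1, 2), (2, 3)]

natural = linear_arrangement_cost(path_edges, [0, 1, 2, 3])    # every edge has length 1
scrambled = linear_arrangement_cost(path_edges, [0, 2, 1, 3])  # edge lengths 2, 1, 2
```

Evaluating the cost takes time linear in the number of edges, so it is cheap to compare candidate orderings even on large graphs; finding a good ordering is the hard part that the paper addresses.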
TWO IMPROVED ALGORITHMS FOR ENVELOPE AND WAVEFRONT REDUCTION
1997
Cited by 26 (2 self)
Two algorithms for reordering sparse, symmetric matrices or undirected graphs to reduce envelope and wavefront are considered. The first is a combinatorial algorithm introduced by Sloan and further developed by Duff, Reid, and Scott; we describe enhancements to the Sloan algorithm that improve its quality and reduce its run time. Our test problems fall into two classes with differing asymptotic behavior of their envelope parameters as a function of the weights in the Sloan algorithm. We describe an efficient O(n log n + m) time implementation of the Sloan algorithm, where n is the number of rows (vertices), and m is the number of nonzeros (edges). On a collection of test problems, the improved Sloan algorithm required, on the average, only twice the time required by the simpler Reverse Cuthill-McKee algorithm while improving the mean square wavefront by a factor of three. The second algorithm is a hybrid that combines a spectral algorithm for envelope and wavefront reduction with a refinement step that uses a modified Sloan algorithm. The hybrid algorithm reduces the envelope size and mean square wavefront obtained from the Sloan algorithm at the cost of greater running times. We illustrate how these reductions translate into tangible benefits for frontal Cholesky factorization and incomplete factorization preconditioning.
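The quantities being reduced can be computed directly from a graph and a vertex ordering. The sketch below uses one common set of definitions, which is an assumption here since the abstract does not state them: the row width of row p is p minus the column of its first nonzero (diagonal taken as nonzero), the envelope size is the sum of row widths, and the wavefront at step i counts the rows p with first(p) <= i <= p. The function name and test graph are hypothetical.

```python
def envelope_and_wavefront(n, edges, order):
    """Return (envelope size, list of wavefronts, mean square wavefront)."""
    pos = {v: p for p, v in enumerate(order)}
    # first[p]: leftmost nonzero column in row p of the reordered matrix.
    first = list(range(n))                  # diagonal entries assumed nonzero
    for u, v in edges:
        lo, hi = sorted((pos[u], pos[v]))
        first[hi] = min(first[hi], lo)
    envelope = sum(p - first[p] for p in range(n))
    wavefronts = [sum(1 for p in range(n) if first[p] <= i <= p)
                  for i in range(n)]
    mean_square = sum(w * w for w in wavefronts) / n
    return envelope, wavefronts, mean_square

# Path graph 0 - 1 - 2 - 3 under two orderings (toy data).
path_edges = [(0, 1), (1, 2), (2, 3)]
good = envelope_and_wavefront(4, path_edges, [0, 1, 2, 3])
poor = envelope_and_wavefront(4, path_edges, [0, 2, 1, 3])
```

With these definitions the wavefronts always sum to the envelope size plus n, which gives a quick sanity check on any implementation. The quadratic loop is for clarity only; it is not the efficient implementation the abstract refers to.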
Multilevel Algorithms for Wavefront Reduction
SIAM J. SCIENTIFIC COMPUTING, 2000
Cited by 17 (8 self)
Multilevel algorithms are proposed for reordering sparse symmetric matrices to reduce the wavefront and profile. A graph representation of the matrix is used and two graph coarsening methods are investigated. A multilevel algorithm that uses a maximal independent vertex set for coarsening and the Sloan algorithm on the coarsest graph is shown to produce orderings that are of a similar quality to those obtained using the best existing combinatorial algorithm (the hybrid Sloan algorithm). Advantages of the proposed algorithm over the hybrid Sloan algorithm are that it does not require any spectral information and is significantly faster, requiring on average half the CPU time.
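A maximal independent vertex set, the coarsening device named in this abstract, can be built greedily in a single pass. The sketch below shows only that selection step under simple assumptions (vertices visited in sorted order, adjacency given as a dict of sets); how the paper forms the coarse graph from the selected vertices is not described in the abstract and is not attempted here. The function name is hypothetical.

```python
def maximal_independent_set(adj):
    """Greedy maximal independent set; adj maps each vertex to its set of neighbours."""
    chosen, blocked = [], set()
    for v in sorted(adj):
        if v not in blocked:
            chosen.append(v)         # v has no chosen neighbour, so keep it
            blocked.add(v)
            blocked.update(adj[v])   # neighbours of v can no longer be chosen
    return chosen

# Path graph 0 - 1 - 2 - 3 - 4 (toy data).
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
coarse_vertices = maximal_independent_set(adj)
```

The result is independent (no two chosen vertices are adjacent) and maximal (every other vertex has a chosen neighbour), so each fine vertex lies within one edge of a coarse vertex.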