Results 1 - 10
of
44
Consistency of spectral clustering
, 2004
"... Consistency is a key property of statistical algorithms, when the data is drawn from some underlying probability distribution. Surprisingly, despite decades of work, little is known about consistency of most clustering algorithms. In this paper we investigate consistency of a popular family of spe ..."
Abstract
-
Cited by 170 (11 self)
- Add to MetaCart
Consistency is a key property of statistical algorithms, when the data is drawn from some underlying probability distribution. Surprisingly, despite decades of work, little is known about consistency of most clustering algorithms. In this paper we investigate consistency of a popular family of spectral clustering algorithms, which cluster the data with the help of eigenvectors of graph Laplacian matrices. We show that one of the two of major classes of spectral clustering (normalized clustering) converges under some very general conditions, while the other (unnormalized), is only consistent under strong additional assumptions, which, as we demonstrate, are not always satisfied in real data. We conclude that our analysis provides strong evidence for the superiority of normalized spectral clustering in practical applications. We believe that methods used in our analysis will provide a basis for future exploration of Laplacian-based methods in a statistical setting.
Web Document Clustering Using Hyperlink Structures
, 2001
"... With the exponential growth of information on the World Wide Web, there is great demand for developing efficient and effective methods for organizing and retrieving the information available. Document clustering plays an important role in information retrieval and taxonomy management for the World W ..."
Abstract
-
Cited by 37 (5 self)
- Add to MetaCart
With the exponential growth of information on the World Wide Web, there is great demand for developing efficient and effective methods for organizing and retrieving the information available. Document clustering plays an important role in information retrieval and taxonomy management for the World Wide Web and remains an interesting and challenging problem in the field of web computing. In this paper we consider document clustering methods exploring textual information, hyperlink structure and co-citation relations. In particular, we apply the normalized-cut clustering method developed in computer vision to the task of hyperdocument clustering. We also explore some theoretical connections of the normalized-cut method to K-means method. We then experiment with normalized-cut method in the context of clustering query result sets for web search engines.
A spectral algorithm for seriation and the consecutive ones problem
- SIAM Journal on Computing
, 1998
"... Abstract. In applications ranging from DNA sequencing through archeological dating to sparse matrix reordering, a recurrent problem is the sequencing of elements in such a way that highly correlated pairs of elements are near each other. That is, given a correlation function f reflecting the desire ..."
Abstract
-
Cited by 35 (0 self)
- Add to MetaCart
Abstract. In applications ranging from DNA sequencing through archeological dating to sparse matrix reordering, a recurrent problem is the sequencing of elements in such a way that highly correlated pairs of elements are near each other. That is, given a correlation function f reflecting the desire for each pair of elements to be near each other, find all permutations π with the property that if π(i) < π(j) < π(k) then f(i, j) ≥ f(i, k) and f(j, k) ≥ f(i, k). This seriation problem is a generalization of the well-studied consecutive ones problem. We present a spectral algorithm for this problem that has a number of interesting features. Whereas most previous applications of spectral techniques provide only bounds or heuristics, our result is an algorithm that correctly solves a nontrivial combinatorial problem. In addition, spectral methods are being successfully applied as heuristics to a variety of sequencing problems, and our result helps explain and justify these applications.
Graph Partitioning Algorithms With Applications To Scientific Computing
- Parallel Numerical Algorithms
, 1997
"... Identifying the parallelism in a problem by partitioning its data and tasks among the processors of a parallel computer is a fundamental issue in parallel computing. This problem can be modeled as a graph partitioning problem in which the vertices of a graph are divided into a specified number of su ..."
Abstract
-
Cited by 32 (0 self)
- Add to MetaCart
Identifying the parallelism in a problem by partitioning its data and tasks among the processors of a parallel computer is a fundamental issue in parallel computing. This problem can be modeled as a graph partitioning problem in which the vertices of a graph are divided into a specified number of subsets such that few edges join two vertices in different subsets. Several new graph partitioning algorithms have been developed in the past few years, and we survey some of this activity. We describe the terminology associated with graph partitioning, the complexity of computing good separators, and graphs that have good separators. We then discuss early algorithms for graph partitioning, followed by three new algorithms based on geometric, algebraic, and multilevel ideas. The algebraic algorithm relies on an eigenvector of a Laplacian matrix associated with the graph to compute the partition. The algebraic algorithm is justified by formulating graph partitioning as a quadratic assignment p...
PMRSB: Parallel Multilevel Recursive Spectral Bisection
- In Supercomputing
, 1995
"... The design of a parallel implementation of multilevel recursive spectral bisection on the Cray T3D is described. The code is intended to be fast enough to enable dynamic repartitioning of adaptive meshes and to partition meshes that are too large for workstations. Two innovations in the implementati ..."
Abstract
-
Cited by 30 (0 self)
- Add to MetaCart
The design of a parallel implementation of multilevel recursive spectral bisection on the Cray T3D is described. The code is intended to be fast enough to enable dynamic repartitioning of adaptive meshes and to partition meshes that are too large for workstations. Two innovations in the implementation are recursive asynchronous task teams and a parallel version of the multilevel accelerator. A performance improvement of a factor of 140 over the best available serial implementation is demonstrated. 1 Introduction The efficient implementation of unstructured problems on distributed memory parallel computers requires that the data be distributed across the memories in a way that simultaneously balances the work load of the processors and minimizes interprocessor communication. This paper describes a tool called PMRSB, which is implemented on the Cray T3D, that performs this function. Currently, it is restricted I would like to thank Horst Simon for his support and encouragement. I am ...
A Multi-Scale Algorithm for the Linear Arrangement Problem
- Proc. 28th Inter. Workshop on Graph-Theoretic Concepts in Computer Science (WG’02), LNCS 2573
, 2002
"... Finding a linear ordering of the vertices of a graph is a common problem arising in diverse applications. In this paper we present a linear-time algorithm for this problem, based on the multi-scale paradigm. Experimental results are similar to those of the best known approaches, while the running ti ..."
Abstract
-
Cited by 25 (4 self)
- Add to MetaCart
Finding a linear ordering of the vertices of a graph is a common problem arising in diverse applications. In this paper we present a linear-time algorithm for this problem, based on the multi-scale paradigm. Experimental results are similar to those of the best known approaches, while the running time is significantly better, enabling it to deal with much larger graphs. The paper contains a general multi-scale construction, which may be used for a broader range of ordering problems.
TWO IMPROVED ALGORITHMS FOR ENVELOPE AND WAVEFRONT REDUCTION
, 1997
"... Two algorithms for reordering sparse, symmetric matrices or undirected graphs to reduce envelope and wavefront are considered. The rst is a combinatorial algorithm introduced by Sloan and further developed by Du, Reid, and Scott; we describe enhancements to the Sloan algorithm that improve its quali ..."
Abstract
-
Cited by 23 (2 self)
- Add to MetaCart
Two algorithms for reordering sparse, symmetric matrices or undirected graphs to reduce envelope and wavefront are considered. The rst is a combinatorial algorithm introduced by Sloan and further developed by Du, Reid, and Scott; we describe enhancements to the Sloan algorithm that improve its quality and reduce its run time. Our test problems fall into two classes with differing asymptotic behavior of their envelope parameters as a function of the weights in the Sloan algorithm. We describe an e cient O(n log n + m) time implementation of the Sloan algorithm, where n is the number of rows (vertices), and m is the number of nonzeros (edges). On a collection of test problems, the improved Sloan algorithm required, on the average, only twice the time required by the simpler Reverse Cuthill-McKee algorithm while improving the mean square wavefront by a factor of three. The second algorithm is a hybrid that combines a spectral algorithm for envelope and wavefront reduction with a refinement step that uses a modified Sloan algorithm. The hybrid algorithm reduces the envelope size and mean square wavefront obtained from the Sloan algorithm at the cost of greater running times. We illustrate how these reductions translate into tangible bene ts for frontal Cholesky factorization and incomplete factorization preconditioning.
Sparse Numerical Linear Algebra: Direct Methods and Preconditioning
, 1996
"... Most of the current techniques for the direct solution of linear equations are based on supernodal or multifrontal approaches. An important feature of these methods is that arithmetic is performed on dense submatrices and Level 2 and Level 3 BLAS (matrixvector and matrix-matrix kernels) can be us ..."
Abstract
-
Cited by 15 (2 self)
- Add to MetaCart
Most of the current techniques for the direct solution of linear equations are based on supernodal or multifrontal approaches. An important feature of these methods is that arithmetic is performed on dense submatrices and Level 2 and Level 3 BLAS (matrixvector and matrix-matrix kernels) can be used. Both sparse LU and QR factorizations can be implemented within this framework. Partitioning and ordering techniques have seen major activity in recent years. We discuss bisection and multisection techniques, extensions to orderings to block triangular form, and recent improvements and modifications to standard orderings such as minimum degree. We also study advances in the solution of indefinite systems and sparse least-squares problems. The desire to exploit parallelism has been responsible for many of the developments in direct methods for sparse matrices over the last ten years. We examine this aspect in some detail, illustrating how current techniques have been developed or ...

