Results 1  10
of
24
METIS  Unstructured Graph Partitioning and Sparse Matrix Ordering System, Version 2.0
, 1995
"... this paper is organized as follows: Section 2 briefly describes the various ideas and algorithms implemented in METIS. Section 3 describes the user interface to the METIS graph partitioning and sparse matrix ordering packages. Sections 4 and 5 describe the formats of the input and output files used ..."
Abstract

Cited by 122 (5 self)
 Add to MetaCart
this paper is organized as follows: Section 2 briefly describes the various ideas and algorithms implemented in METIS. Section 3 describes the user interface to the METIS graph partitioning and sparse matrix ordering packages. Sections 4 and 5 describe the formats of the input and output files used by METIS. Section 6 describes the standalone library that implements the various algorithms implemented in METIS. Section 7 describes the system requirements for the METIS package. Appendix A describes and compares various graph partitioning algorithms that are extensively used.
Highly scalable parallel algorithms for sparse matrix factorization
 IEEE Transactions on Parallel and Distributed Systems
, 1994
"... In this paper, we describe a scalable parallel algorithm for sparse matrix factorization, analyze their performance and scalability, and present experimental results for up to 1024 processors on a Cray T3D parallel computer. Through our analysis and experimental results, we demonstrate that our algo ..."
Abstract

Cited by 117 (29 self)
 Add to MetaCart
In this paper, we describe a scalable parallel algorithm for sparse matrix factorization, analyze their performance and scalability, and present experimental results for up to 1024 processors on a Cray T3D parallel computer. Through our analysis and experimental results, we demonstrate that our algorithm substantially improves the state of the art in parallel direct solution of sparse linear systems—both in terms of scalability and overall performance. It is a well known fact that dense matrix factorization scales well and can be implemented efficiently on parallel computers. In this paper, we present the first algorithm to factor a wide class of sparse matrices (including those arising from two and threedimensional finite element problems) that is asymptotically as scalable as dense matrix factorization algorithms on a variety of parallel architectures. Our algorithm incurs less communication overhead and is more scalable than any previously known parallel formulation of sparse matrix factorization. Although, in this paper, we discuss Cholesky factorization of symmetric positive definite matrices, the algorithms can be adapted for solving sparse linear least squares problems and for Gaussian elimination of diagonally dominant matrices that are almost symmetric in structure. An implementation of our sparse Cholesky factorization algorithm delivers up to 20 GFlops on a Cray T3D for mediumsize structural engineering and linear programming problems. To the best of our knowledge,
Analysis of multilevel graph partitioning
, 1995
"... Recently, a number of researchers have investigated a class of algorithms that are based on multilevel graph partitioning that have moderate computational complexity, and provide excellent graph partitions. However, there exists little theoretical analysis that could explain the ability of multileve ..."
Abstract

Cited by 92 (13 self)
 Add to MetaCart
Recently, a number of researchers have investigated a class of algorithms that are based on multilevel graph partitioning that have moderate computational complexity, and provide excellent graph partitions. However, there exists little theoretical analysis that could explain the ability of multilevel algorithms to produce good partitions. In this paper we present such an analysis. We show under certain reasonable assumptions that even if no refinement is used in the uncoarsening phase, a good bisection of the coarser graph is worse than a good bisection of the finer graph by at most a small factor. We also show that the size of a good vertexseparator of the coarse graph projected to the finer graph (without performing refinement in the uncoarsening phase) is higher than the size of a good vertexseparator of the finer graph by at most a small factor.
Graph partitioning for high performance scientific simulations. Computing Reviews 45(2
, 2004
"... ..."
Graph Partitioning Algorithms With Applications To Scientific Computing
 Parallel Numerical Algorithms
, 1997
"... Identifying the parallelism in a problem by partitioning its data and tasks among the processors of a parallel computer is a fundamental issue in parallel computing. This problem can be modeled as a graph partitioning problem in which the vertices of a graph are divided into a specified number of su ..."
Abstract

Cited by 40 (0 self)
 Add to MetaCart
Identifying the parallelism in a problem by partitioning its data and tasks among the processors of a parallel computer is a fundamental issue in parallel computing. This problem can be modeled as a graph partitioning problem in which the vertices of a graph are divided into a specified number of subsets such that few edges join two vertices in different subsets. Several new graph partitioning algorithms have been developed in the past few years, and we survey some of this activity. We describe the terminology associated with graph partitioning, the complexity of computing good separators, and graphs that have good separators. We then discuss early algorithms for graph partitioning, followed by three new algorithms based on geometric, algebraic, and multilevel ideas. The algebraic algorithm relies on an eigenvector of a Laplacian matrix associated with the graph to compute the partition. The algebraic algorithm is justified by formulating graph partitioning as a quadratic assignment p...
Sparse Gaussian Elimination on High Performance Computers
, 1996
"... This dissertation presents new techniques for solving large sparse unsymmetric linear systems on high performance computers, using Gaussian elimination with partial pivoting. The efficiencies of the new algorithms are demonstrated for matrices from various fields and for a variety of high performan ..."
Abstract

Cited by 35 (6 self)
 Add to MetaCart
This dissertation presents new techniques for solving large sparse unsymmetric linear systems on high performance computers, using Gaussian elimination with partial pivoting. The efficiencies of the new algorithms are demonstrated for matrices from various fields and for a variety of high performance machines. In the first part we discuss optimizations of a sequential algorithm to exploit the memory hierarchies that exist in most RISCbased superscalar computers. We begin with the leftlooking supernodecolumn algorithm by Eisenstat, Gilbert and Liu, which includes Eisenstat and Liu's symmetric structural reduction for fast symbolic factorization. Our key contribution is to develop both numeric and symbolic schemes to perform supernodepanel updates to achieve better data reuse in cache and floatingpoint register...
Improving The Run Time And Quality Of Nested Dissection Ordering
 SIAM J. SCI. COMPUT
, 1998
"... When performing sparse matrix factorization, the ordering of matrix rows and columns has a dramatic impact on the factorization time. This paper describes an approach to the reordering problem that produces significantly better orderings than prior methods. The algorithm is a hybrid of nested dissec ..."
Abstract

Cited by 28 (0 self)
 Add to MetaCart
When performing sparse matrix factorization, the ordering of matrix rows and columns has a dramatic impact on the factorization time. This paper describes an approach to the reordering problem that produces significantly better orderings than prior methods. The algorithm is a hybrid of nested dissection and minimum degree ordering, and combines an assortment of different algorithmic advances. New or improved algorithms are described for graph compression, multilevel partitioning, and separator improvement. When these techniques are combined, the resulting orderings average 39% better than minimum degree over a suite of test matrices, while requiring roughly 2.7 times the run time of Liu's multiple minimum degree.