Results 1–10 of 13
Multifrontal Parallel Distributed Symmetric and Unsymmetric Solvers
, 1998
"... We consider the solution of both symmetric and unsymmetric systems of sparse linear equations. A new parallel distributed memory multifrontal approach is described. To handle numerical pivoting efficiently, a parallel asynchronous algorithm with dynamic scheduling of the computing tasks has been dev ..."
Abstract

Cited by 119 (32 self)
 Add to MetaCart
We consider the solution of both symmetric and unsymmetric systems of sparse linear equations. A new parallel distributed-memory multifrontal approach is described. To handle numerical pivoting efficiently, a parallel asynchronous algorithm with dynamic scheduling of the computing tasks has been developed. We discuss some of the main algorithmic choices and compare both implementation issues and the performance of the LDL^T and LU factorizations. Performance analysis on an IBM SP2 shows the efficiency and the potential of the method. The test problems used are from the Rutherford-Boeing collection and from the PARASOL end users.
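To make the LDL^T factorization mentioned in the abstract concrete, here is a minimal dense, unpivoted sketch in pure Python. This is only an illustration of the factorization itself; the paper's solver is sparse, distributed, and handles numerical pivoting. The function name `ldlt` and the 3×3 example matrix are ours, not from the paper.

```python
def ldlt(A):
    """Factor a symmetric matrix A (list of lists) as A = L D L^T,
    with unit lower-triangular L and diagonal D (no pivoting).
    Returns (L, d) where d holds the diagonal of D."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    d = [0.0] * n
    for j in range(n):
        # d[j] = A[j][j] - sum_{k<j} L[j][k]^2 * d[k]
        d[j] = A[j][j] - sum(L[j][k] ** 2 * d[k] for k in range(j))
        L[j][j] = 1.0
        for i in range(j + 1, n):
            L[i][j] = (A[i][j]
                       - sum(L[i][k] * L[j][k] * d[k] for k in range(j))) / d[j]
    return L, d

# Small symmetric example (illustrative only)
L, d = ldlt([[4.0, 2.0, 2.0],
             [2.0, 5.0, 1.0],
             [2.0, 1.0, 6.0]])
```

Without pivoting this sketch requires every d[j] to be nonzero; handling the indefinite case robustly is exactly why the paper needs dynamic pivoting and asynchronous scheduling.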
On the solution of equality constrained quadratic programming problems arising . . .
, 1998
"... ..."
A Parallel Formulation of Interior Point Algorithms
 DEPARTMENT OF COMPUTER SCIENCE, UNIVERSITY OF MINNESOTA
, 1994
"... In recent years, interior point algorithms have been used successfully for solving medium to largesize linear programming (LP) problems. In this paper we describe a highly parallel formulation of the interior point algorithm. A key component of the interior point algorithm is the solution of a s ..."
Abstract

Cited by 17 (9 self)
 Add to MetaCart
In recent years, interior point algorithms have been used successfully for solving medium- to large-size linear programming (LP) problems. In this paper we describe a highly parallel formulation of the interior point algorithm. A key component of the interior point algorithm is the solution of a sparse system of linear equations using Cholesky factorization. The performance of parallel Cholesky factorization is determined by (a) the communication overhead incurred by the algorithm, and (b) the load imbalance among the processors. In our parallel interior point algorithm, we use our recently developed parallel multifrontal algorithm that has the smallest communication overhead of all parallel algorithms for Cholesky factorization developed to date. The computation imbalance depends on the shape of the elimination tree associated with the sparse system reordered for factorization. To balance the computation, we implemented and evaluated four different ordering algorithms. Among these algorithms, Kernighan-Lin and spectral nested dissection yield the most balanced elimination trees and greatly increase the amount of parallelism that can be exploited. Our preliminary implementation achieves a speedup as high as 108 on a 256-processor nCUBE 2 on moderate-size problems.
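The Cholesky-based solve that the abstract identifies as the key kernel of each interior point iteration can be sketched with a tiny dense example. The names `cholesky` and `solve_spd` and the 2×2 system are illustrative; the paper's algorithm is sparse, multifrontal, and parallel.

```python
import math

def cholesky(A):
    """Dense Cholesky A = L L^T for a symmetric positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def solve_spd(A, b):
    """Solve A x = b via Cholesky: forward substitution with L,
    then backward substitution with L^T."""
    L = cholesky(A)
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

x = solve_spd([[4.0, 2.0], [2.0, 3.0]], [10.0, 8.0])
```

In an interior point method this solve is repeated every iteration with the same sparsity pattern, which is why the ordering (and the resulting elimination-tree shape) dominates parallel performance.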
The Design of Sparse Direct Solvers using Object-Oriented Techniques
, 1999
"... We describe our experience in designing objectoriented software for sparse direct solvers. We discuss, a library of sparse matrix ordering codes, and, a package that implements the factorization and triangular solution steps of a direct solver. We discuss the goals of our design: managing complex ..."
Abstract

Cited by 15 (4 self)
 Add to MetaCart
We describe our experience in designing object-oriented software for sparse direct solvers. We discuss a library of sparse matrix ordering codes and a package that implements the factorization and triangular solution steps of a direct solver. We discuss the goals of our design: managing complexity, simplicity of interface, flexibility, extensibility, safety, and efficiency. High performance is obtained by carefully implementing the computationally intensive kernels and by making several tradeoffs to balance the conflicting demands of efficiency and good software design. Some of the missteps that we made in the course of this work are also described.
A high performance sparse Cholesky factorization algorithm for scalable parallel computers
 Department of Computer Science, University of Minnesota
, 1994
"... Abstract This paper presents a new parallel algorithm for sparse matrix factorization. This algorithm uses subforesttosubcube mapping instead of the subtreetosubcube mapping of another recently introduced scheme by Gupta and Kumar [13]. Asymptotically, both formulations are equally scalable on a ..."
Abstract

Cited by 13 (1 self)
 Add to MetaCart
This paper presents a new parallel algorithm for sparse matrix factorization. This algorithm uses subforest-to-subcube mapping instead of the subtree-to-subcube mapping of another recently introduced scheme by Gupta and Kumar [13]. Asymptotically, both formulations are equally scalable on a wide range of architectures and a wide variety of problems. But the subtree-to-subcube mapping of the earlier formulation causes significant load imbalance among processors, limiting overall efficiency and speedup. The new mapping largely eliminates the load imbalance among processors. Furthermore, the algorithm has a number of enhancements to improve the overall performance substantially. This new algorithm achieves up to 6 GFlops on a 256-processor Cray T3D for moderately large problems. To our knowledge, this is the highest performance ever obtained on an MPP for sparse Cholesky factorization.
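The load-balancing intuition behind subforest-to-subcube mapping can be illustrated with a simple greedy partition of subtree workloads among processor groups, giving each group a set (forest) of subtrees rather than forcing one subtree per subcube. This is only a sketch of the balancing idea; the paper's actual mapping respects elimination-tree structure and hypercube topology, and the function name `assign_subforests` is ours.

```python
import heapq

def assign_subforests(work, p):
    """Partition subtree workloads among p processor groups, largest
    subtree first, always giving the next subtree to the least-loaded
    group. Returns (groups, loads): group membership and total work."""
    heap = [(0.0, g) for g in range(p)]  # (current load, group id)
    heapq.heapify(heap)
    groups = [[] for _ in range(p)]
    for t in sorted(range(len(work)), key=lambda t: -work[t]):
        load, g = heapq.heappop(heap)
        groups[g].append(t)
        heapq.heappush(heap, (load + work[t], g))
    loads = [sum(work[t] for t in grp) for grp in groups]
    return groups, loads

# six subtrees of unequal work, split over 2 processor groups
groups, loads = assign_subforests([9, 7, 6, 5, 4, 3], 2)
```

With a rigid one-subtree-per-subcube mapping, unequal subtrees translate directly into idle processors; distributing forests of subtrees lets the loads equalize.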
Distributed Solution Of Sparse Linear Systems
"... We consider the solution of a linear system Ax = b on a distributed memory machine when the matrix A is large, sparse and symmetric positive de nite. In a previous paper we developed an algorithm to compute a llreducing nested dissection ordering of A on a distributed memory machine. We now develop ..."
Abstract

Cited by 11 (1 self)
 Add to MetaCart
We consider the solution of a linear system Ax = b on a distributed-memory machine when the matrix A is large, sparse and symmetric positive definite. In a previous paper we developed an algorithm to compute a fill-reducing nested dissection ordering of A on a distributed-memory machine. We now develop algorithms for the remaining steps of the solution process. The large-grain task parallelism resulting from sparsity is identified by a tree of separators available from nested dissection. Our parallel algorithms use this separator tree to estimate the structure of the Cholesky factor L and to organize numeric computations as a sequence of dense matrix operations. We present results of an implementation on an Intel iPSC/860 parallel computer. As an alternative to estimating the structure of L using the separator tree, we develop an algorithm to compute the elimination tree on a distributed memory machine. Our algorithm uses the separator tree to achieve better time and space complexity than earlier work.
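For reference, the elimination tree that this abstract computes in distributed fashion is, in the classical sequential setting, obtained by Liu's algorithm with path compression. The sketch below is that standard sequential version, not the paper's distributed algorithm; the interface (`rows[j]` listing the below-diagonal nonzero columns of row j) is our choice.

```python
def etree(n, rows):
    """Elimination tree of an n-by-n sparse symmetric matrix, where
    rows[j] lists the column indices k < j with A[j][k] nonzero.
    Returns parent[], with parent[r] == -1 for roots.
    Classical sequential algorithm (Liu) with path compression."""
    parent = [-1] * n
    ancestor = [-1] * n  # compressed ancestor links
    for j in range(n):
        for k in rows[j]:
            r = k
            # climb from r toward the current root, compressing toward j
            while ancestor[r] != -1 and ancestor[r] != j:
                nxt = ancestor[r]
                ancestor[r] = j
                r = nxt
            if ancestor[r] == -1:
                ancestor[r] = j
                parent[r] = j
    return parent

# nonzeros A[1][0], A[2][0], A[3][1], A[3][2] below the diagonal
parent = etree(4, [[], [0], [0], [1, 2]])
```

The parent vector encodes exactly the column dependencies of the Cholesky factorization, which is what makes it the natural task graph for the parallel algorithms in the abstract.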
A Parallel Frontal Solver For Large Scale Process Simulation and Optimization
, 1996
"... For the simulation and optimization of largescale chemical processes, the overall computing time is often dominated by the time needed to solve a large sparse system of linear equations. We present here a new parallel frontal solver which can significantly reduce the wallclock time required to solv ..."
Abstract

Cited by 7 (5 self)
 Add to MetaCart
For the simulation and optimization of large-scale chemical processes, the overall computing time is often dominated by the time needed to solve a large sparse system of linear equations. We present here a new parallel frontal solver which can significantly reduce the wall-clock time required to solve these linear equation systems using parallel/vector supercomputers. The algorithm exploits both multiprocessing and vector processing by using a multilevel approach in which frontal elimination is used for the partial factorization of each front. Results on several large-scale process simulation and optimization problems are presented.
1 Introduction
The solution of realistic, industrial-scale simulation and optimization problems is computationally very intense, and may require the use of high-performance computing technology to be done in a timely manner. For example, Zitney et al. (1995) described a dynamic simulation problem at Bayer AG requiring 18 hours of CPU time on a CRAY C90 sup...
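The "partial factorization of each front" mentioned above can be illustrated on a dense matrix: eliminate the fully summed variables of a front and return the Schur complement that is passed up to the next level. This is a dense-matrix sketch of the idea only, not the paper's parallel frontal code; `partial_factor` is an illustrative name.

```python
def partial_factor(F, k):
    """Eliminate the first k (fully summed) variables of a dense front F
    by Gaussian elimination and return the Schur complement
    F22 - F21 * F11^{-1} * F12 on the remaining variables."""
    n = len(F)
    F = [row[:] for row in F]  # work on a copy
    for p in range(k):
        for i in range(p + 1, n):
            m = F[i][p] / F[p][p]
            for j in range(p, n):
                F[i][j] -= m * F[p][j]
    # rows/columns k..n-1 now hold the update (contribution) block
    return [row[k:] for row in F[k:]]

# 2x2 front, one fully summed variable
S = partial_factor([[2.0, 1.0], [1.0, 3.0]], 1)
```

In a multilevel frontal scheme, each processor runs such partial factorizations on its own fronts, and the returned update blocks are assembled into the parent front.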
Rankings of Graphs
, 1995
"... A vertex (edge) coloring c : V ! f1; 2; : : : ; tg (c 0 : E ! f1; 2; : : : ; tg) of a graph G = (V; E) is a vertex (edge) tranking if for any two vertices (edges) of the same color every path between them contains a vertex (edge) of larger color. The vertex ranking number Ø r (G) (edge ranking n ..."
Abstract

Cited by 5 (1 self)
 Add to MetaCart
A vertex (edge) coloring c : V → {1, 2, …, t} (c′ : E → {1, 2, …, t}) of a graph G = (V, E) is a vertex (edge) t-ranking if for any two vertices (edges) of the same color every path between them contains a vertex (edge) of larger color. The vertex ranking number χ_r(G) (edge ranking number χ′_r(G)) is the smallest value of t such that G has a vertex (edge) t-ranking. In this paper we study the algorithmic complexity of the vertex ranking and edge ranking problems. Among others it is shown that χ_r(G) can be computed in polynomial time when restricted to graphs with treewidth at most k for any fixed k. We characterize those graphs where the vertex ranking number χ_r and the chromatic number χ coincide on all induced subgraphs, show that χ_r(G) = χ(G) implies χ(G) = ω(G) (largest clique size) and give a formula for χ′_r(K_n).
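The vertex-ranking condition can be checked mechanically using a standard equivalent test (an assumption we introduce here, not stated in the abstract): for each color c, in the subgraph induced by vertices of color at most c, every connected component may contain at most one vertex of color c. A small Python sketch, with `adj` as an adjacency list and an illustrative function name:

```python
def is_vertex_ranking(adj, color):
    """Check whether color[] is a vertex ranking of the graph adj[],
    using the component test: for each color c, each connected
    component of the subgraph on vertices with color <= c must
    contain at most one vertex of color exactly c."""
    n = len(adj)
    for c in set(color):
        seen = set()
        for s in range(n):
            if color[s] > c or s in seen:
                continue
            # DFS over the surviving (color <= c) subgraph
            comp, stack = [], [s]
            seen.add(s)
            while stack:
                v = stack.pop()
                comp.append(v)
                for w in adj[v]:
                    if color[w] <= c and w not in seen:
                        seen.add(w)
                        stack.append(w)
            if sum(1 for v in comp if color[v] == c) > 1:
                return False
    return True

# path on 3 vertices: coloring 1-2-1 is a 2-ranking, 1-1-1 is not
ok = is_vertex_ranking([[1], [0, 2], [1]], [1, 2, 1])
```

The equivalence holds because two equal-colored vertices in one such component are joined by a path using only colors ≤ c, violating the definition; conversely, any violating path stays inside one component.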