Results 1 – 10 of 14
Applied Numerical Linear Algebra
 Society for Industrial and Applied Mathematics
, 1997
Abstract

Cited by 531 (26 self)
We survey general techniques and open problems in numerical linear algebra on parallel architectures. We first discuss basic principles of parallel processing, describing the costs of basic operations on parallel machines, including general principles for constructing efficient algorithms. We illustrate these principles using current architectures and software systems, and by showing how one would implement matrix multiplication. Then we present direct and iterative algorithms for solving linear systems of equations, linear least squares problems, the symmetric eigenvalue problem, the nonsymmetric eigenvalue problem, and the singular value decomposition. We consider dense, band, and sparse matrices.
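The survey's illustration of how one would implement matrix multiplication on a parallel machine can be mimicked with a serial blocked multiply: each block of the result is a unit of work that a processor could own, with blocks of the operands communicated to it. This is a minimal sketch, not code from the survey; the function name and block-size parameter `bs` are our own, and the loops here run serially.

```python
import numpy as np

def blocked_matmul(A, B, bs):
    """Blocked matrix multiply C = A @ B with square blocks of size bs.

    On a parallel machine, each (i, j) block of C could be owned by one
    processor, which receives the needed blocks of A and B; here the
    loops run serially, but the block structure is the same.
    """
    n = A.shape[0]
    C = np.zeros((n, n))
    for i in range(0, n, bs):          # block row of C
        for j in range(0, n, bs):      # block column of C
            for k in range(0, n, bs):  # accumulate over block products
                C[i:i+bs, j:j+bs] += A[i:i+bs, k:k+bs] @ B[k:k+bs, j:j+bs]
    return C
```

The point of the blocking is data reuse: each `bs`-by-`bs` block product is a dense Level 3 BLAS operation, which is exactly why the blocked form maps well onto both caches and distributed memories.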
Improved load distribution in parallel sparse Cholesky factorization
 In Proc. of Supercomputing'94
, 1994
Abstract

Cited by 38 (1 self)
Compared to the customary column-oriented approaches, block-oriented, distributed-memory sparse Cholesky factorization benefits from an asymptotic reduction in interprocessor communication volume and an asymptotic increase in the amount of concurrency that is exposed in the problem. Unfortunately, block-oriented approaches (specifically, the block fan-out method) have suffered from poor balance of the computational load. As a result, achieved performance can be quite low. This paper investigates the reasons for this load imbalance and proposes simple block mapping heuristics that dramatically improve it. The result is a roughly 20% increase in realized parallel factorization performance, as demonstrated by performance results from an Intel Paragon system. We have achieved performance of nearly 3.2 billion floating point operations per second with this technique on a 196-node Paragon system.
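For context, the customary mapping that such heuristics improve upon is the 2-D block-cyclic assignment of factor blocks to a processor grid; the paper's heuristics perturb this kind of mapping to even out per-processor work, and are not reproduced here. A minimal sketch of the baseline (function name and grid parameters are ours):

```python
def cyclic_block_map(i, j, pr, pc):
    """Map block (i, j) of the factor to a processor on a pr-by-pc grid.

    This is the standard 2-D block-cyclic mapping used by block fan-out
    methods: block row i goes to processor row (i mod pr), block column j
    to processor column (j mod pc).  It bounds communication well but,
    as the paper observes, can balance the computational load poorly on
    sparse factors, where work per block varies widely.
    """
    return (i % pr) * pc + (j % pc)
```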
Sparse Gaussian Elimination on High Performance Computers
, 1996
Abstract

Cited by 35 (6 self)
This dissertation presents new techniques for solving large sparse unsymmetric linear systems on high performance computers, using Gaussian elimination with partial pivoting. The efficiencies of the new algorithms are demonstrated for matrices from various fields and for a variety of high performance machines. In the first part we discuss optimizations of a sequential algorithm to exploit the memory hierarchies that exist in most RISC-based superscalar computers. We begin with the left-looking supernode-column algorithm by Eisenstat, Gilbert and Liu, which includes Eisenstat and Liu's symmetric structural reduction for fast symbolic factorization. Our key contribution is to develop both numeric and symbolic schemes to perform supernode-panel updates to achieve better data reuse in cache and floating-point register...
Sparse Numerical Linear Algebra: Direct Methods and Preconditioning
, 1996
Abstract

Cited by 17 (2 self)
Most of the current techniques for the direct solution of linear equations are based on supernodal or multifrontal approaches. An important feature of these methods is that arithmetic is performed on dense submatrices, so Level 2 and Level 3 BLAS (matrix-vector and matrix-matrix kernels) can be used. Both sparse LU and QR factorizations can be implemented within this framework. Partitioning and ordering techniques have seen major activity in recent years. We discuss bisection and multisection techniques, extensions of orderings to block triangular form, and recent improvements and modifications to standard orderings such as minimum degree. We also study advances in the solution of indefinite systems and sparse least-squares problems. The desire to exploit parallelism has been responsible for many of the developments in direct methods for sparse matrices over the last ten years. We examine this aspect in some detail, illustrating how current techniques have been developed or ...
Efficient Parallel Solutions Of Large Sparse SPD Systems On DistributedMemory Multiprocessors
 Advanced Computing Research Institute, Center for Theory and Simulation in Science and Engineering, Cornell
Abstract

Cited by 17 (2 self)
We consider several issues involved in the solution of sparse symmetric positive definite systems by the multifrontal method on distributed-memory multiprocessors. First, we present a new algorithm for computing the partial factorization of a frontal matrix on a subset of processors, which significantly improves the performance of a previously designed distributed multifrontal algorithm. Second, new parallel algorithms for computing sparse forward elimination and sparse backward substitution are described. The new algorithms solve the sparse triangular systems in a multifrontal fashion. Numerical experiments run on an Intel iPSC/860 and an Intel iPSC/2 for a set of problems with regular and irregular sparsity structure are reported. More than 180 million flops per second during the numerical factorization are achieved for a three-dimensional grid problem on an iPSC/860 machine with 32 processors. Key words. Cholesky factorization, clique tree, distributed-memory multiprocessors, multifro...
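The baseline operation being parallelized here is sparse forward elimination. A plain sequential version, assuming a row-wise list-of-(column, value) storage for the triangular factor L (our own representation for illustration, not the paper's clique-tree formulation), looks like:

```python
def sparse_forward_solve(rows, b):
    """Solve L x = b where L is lower triangular and stored row-wise:
    rows[i] is a list of (j, L_ij) pairs with j <= i, including the
    nonzero diagonal entry (i, L_ii).

    This is the plain sequential forward elimination; the paper's
    contribution is to organize this computation in a multifrontal,
    distributed-memory fashion, which is not shown here.
    """
    x = list(b)
    for i, row in enumerate(rows):
        diag = None
        for j, v in row:
            if j == i:
                diag = v              # remember the diagonal entry
            else:
                x[i] -= v * x[j]      # subtract known contributions
        x[i] /= diag
    return x
```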
On The LU Factorization Of Sequences Of Identically Structured Sparse Matrices Within A Distributed Memory Environment
, 1994
Abstract

Cited by 10 (1 self)
(The indexed text is the dissertation's table of contents rather than an abstract.) CHAPTERS: 1 Introduction — 1.1 Topic Statement, 1.2 Overview; 2 Background and Related Efforts — 2.1 LU Factorization, 2.2 Algorithm Stability and Error Analysis, 2.3 Sparse Matrix Concepts, 2.4 Multifrontal Methods, 2.5 Factorization Sequences of Matrices, 2.6 Parallel Matrix Computations, 2.7 Multiprocessor Scheduling; 3 Implementation Platform — 3.1 Hardware Ar...
Parallel Direct Methods For Sparse Linear Systems
, 1997
Abstract

Cited by 6 (0 self)
We present an overview of parallel direct methods for solving sparse systems of linear equations, focusing on symmetric positive definite systems. We examine the performance implications of the important differences between dense and sparse systems. Our main emphasis is on parallel implementation of the numerically intensive factorization process, but we also briefly consider the other major components of direct methods, such as parallel ordering.

Introduction. In this paper we present a brief overview of parallel direct methods for solving sparse linear systems. Paradoxically, sparse matrix factorization offers additional opportunities for exploiting parallelism beyond those available with dense matrices, yet it is often more difficult to attain good efficiency in the sparse case. We examine both sides of this paradox: the additional parallelism induced by sparsity, and the difficulty in achieving high efficiency in spite of it. We focus on Cholesky factorization, primarily because th...
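The "additional parallelism induced by sparsity" is usually read off the elimination tree: columns in disjoint subtrees can be factored independently. A sketch of the classic ancestor-compression algorithm (due to Liu) for computing the tree from the lower-triangular nonzero pattern; variable names and the pattern representation are ours:

```python
def etree(pattern, n):
    """Elimination tree of an n-by-n symmetric sparse matrix.

    pattern[j] lists the indices i < j with A[i, j] != 0.  Returns
    parent[], where parent[j] is the parent of column j in the
    elimination tree (-1 for roots).  Columns in disjoint subtrees
    can be factored in parallel.
    """
    parent = [-1] * n
    ancestor = [-1] * n  # path-compression shortcut to the current root
    for j in range(n):
        for i in pattern[j]:
            r = i
            # climb from i toward the root, compressing the path to j
            while ancestor[r] != -1 and ancestor[r] != j:
                nxt = ancestor[r]
                ancestor[r] = j
                r = nxt
            if ancestor[r] == -1:
                ancestor[r] = j
                parent[r] = j
    return parent
```

A tridiagonal matrix yields a chain (no tree parallelism), while an arrowhead matrix whose only off-diagonal nonzeros touch the last column yields a flat tree whose leaves can all be eliminated concurrently.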
Developments and Trends in the Parallel Solution of Linear Systems
, 1999
Abstract

Cited by 5 (0 self)
In this review paper, we consider some important developments and trends in algorithm design for the solution of linear systems, concentrating on aspects that involve the exploitation of parallelism. We briefly discuss the solution of dense linear systems, before studying the solution of sparse equations by direct and iterative methods. We consider preconditioning techniques for iterative solvers and discuss some of the present research issues in this field.
HPFIT: A Set of Integrated Tools for the Parallelization of Applications Using High Performance Fortran
, 1996
Abstract

Cited by 5 (4 self)
In this report, we present the HPFIT project, whose aim is to provide a set of interactive tools, integrated in a single environment, to help users parallelize scientific applications to be run on distributed-memory parallel computers. HPFIT is built around a restructuring tool called TransTOOL, which includes an editor, a parser, a dependence analysis tool, and an optimization kernel. Moreover, we provide users with a clean interface, so that developers of tools around High Performance Fortran can easily integrate their software within our tool.
Task Scheduling Using a Block Dependency DAG for BlockOriented Sparse Cholesky Factorization
 In Proceedings of the 14th ACM Symposium on Applied Computing
, 2000
Abstract

Cited by 1 (0 self)
Block-oriented sparse Cholesky factorization decomposes a sparse matrix into rectangular sub-blocks; each block can then be handled as a computational unit in order to increase data reuse in a hierarchical memory system. The factorization method also increases the degree of concurrency while reducing communication volume, so it performs more efficiently on a distributed-memory multiprocessor system than the customary column-oriented factorization method. But until now, the mapping of blocks to processors has been designed for load balance with restricted communication patterns. In this paper, we represent tasks using a block dependency DAG that shows the execution behavior of block sparse Cholesky factorization in a distributed-memory system. Since the characteristics of tasks for the block Cholesky factorization are different from those of the conventional parallel task model, we propose a new task scheduling algorithm using a block dependency DAG. The proposed algorithm consi...
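For comparison, a generic list-scheduling baseline over a task DAG looks as follows. This is not the paper's algorithm, which additionally models inter-block communication costs and the specific task shapes of block Cholesky; all names here are our own illustration.

```python
import heapq
from collections import deque

def list_schedule(tasks, deps, cost, nprocs):
    """Greedy list scheduling of a task DAG on nprocs identical processors.

    deps[t] is the set of tasks t depends on; cost[t] is its run time.
    Tasks are taken in topological order and each is started on the
    processor that becomes free first, no earlier than its predecessors'
    finish times.  Returns (task -> processor, makespan).
    """
    tasks = list(tasks)
    indeg = {t: len(deps[t]) for t in tasks}
    succ = {t: [] for t in tasks}
    for t in tasks:
        for d in deps[t]:
            succ[d].append(t)
    # Kahn's algorithm for a topological order of the DAG
    queue = deque(t for t in tasks if indeg[t] == 0)
    topo = []
    while queue:
        t = queue.popleft()
        topo.append(t)
        for s in succ[t]:
            indeg[s] -= 1
            if indeg[s] == 0:
                queue.append(s)
    procs = [(0.0, p) for p in range(nprocs)]  # (free time, processor id)
    heapq.heapify(procs)
    finish, assign = {}, {}
    for t in topo:
        free, p = heapq.heappop(procs)
        start = max(free, max((finish[d] for d in deps[t]), default=0.0))
        finish[t] = start + cost[t]
        assign[t] = p
        heapq.heappush(procs, (finish[t], p))
    return assign, max(finish.values())
```

On a diamond-shaped DAG (one source, two independent middle tasks, one sink) with unit costs, two processors finish in 3 time units, since the two middle tasks overlap.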