Results 1 – 10 of 16
The Design and Implementation of the Parallel Out-of-Core ScaLAPACK LU, QR, and Cholesky Factorization Routines. LAPACK Working Note 118, CS-97-247
, 1997
Abstract

Cited by 28 (5 self)
This paper describes the design and implementation of three core factorization routines — LU, QR and Cholesky — included in the out-of-core extension of ScaLAPACK. These routines allow the factorization and solution of a dense system that is too large to fit entirely in physical memory. The full matrix is stored on disk and the factorization routines transfer submatrix panels into memory. The ‘left-looking’ column-oriented variant of the factorization algorithm is implemented to reduce the disk I/O traffic. The routines are implemented using a portable I/O interface and utilize high-performance ScaLAPACK factorization routines as in-core computational kernels. We present the details of the implementation of the out-of-core ScaLAPACK factorization routines, as well as performance and scalability results on a Beowulf Linux cluster.
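The left-looking, column-oriented variant keeps disk traffic low because each block column is written only once after factorization, and earlier columns are re-read solely to apply their updates. A minimal in-memory sketch of that idea, with a Python dict standing in for the out-of-core files and a block size that divides the matrix order — both simplifying assumptions, not the paper's actual I/O interface:

```python
import numpy as np

def left_looking_cholesky(disk, nb, nblocks):
    """Left-looking blocked Cholesky on a matrix stored block-column-wise in
    `disk` (a dict standing in for disk files). Block column j holds rows
    j*nb..n-1 of columns j*nb..(j+1)*nb-1. Only the current panel plus one
    previously factored panel need to be in memory at a time."""
    for j in range(nblocks):
        panel = disk[j].copy()                        # "read" block column j
        for k in range(j):                            # apply earlier columns
            lk = disk[k]                              # "re-read" factored column k
            a = lk[(j - k) * nb:, :]                  # rows j*nb.. of column k
            b = lk[(j - k) * nb:(j - k + 1) * nb, :]  # its diagonal-row block
            panel -= a @ b.T                          # left-looking update
        d = np.linalg.cholesky(panel[:nb, :nb])       # factor diagonal block
        panel[:nb, :nb] = d
        panel[nb:, :] = panel[nb:, :] @ np.linalg.inv(d).T  # triangular solve
        disk[j] = panel                               # "write" the panel back
```

The real routines replace the dict with a portable I/O layer and the in-core kernels with ScaLAPACK calls, but the access pattern is the same.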
A User's Guide to the BLACS v1.0
, 1995
Abstract

Cited by 22 (5 self)
The BLACS (Basic Linear Algebra Communication Subprograms) project is an ongoing investigation whose purpose is to create a linear-algebra-oriented message-passing interface that is implemented efficiently and uniformly across a large range of distributed-memory platforms. The length of time required to implement efficient distributed-memory algorithms makes it impractical to rewrite programs for every new parallel machine. The BLACS exist in order to make linear algebra applications both easier to program and more portable. It is for this reason that the BLACS are used as the communication layer for the ScaLAPACK project, which involves implementing the LAPACK library on distributed-memory MIMD machines. This report describes the library which has arisen from this project. This work was supported in part by DARPA and ARO under contract number DAAL03-91-C-0047, and in part by the National Science Foundation Science and Technology Center Cooperative Agreement No. CCR-8809615.
A User's Guide to the BLACS v1.1
, 1997
Abstract

Cited by 12 (5 self)
The BLACS (Basic Linear Algebra Communication Subprograms) project is an ongoing investigation whose purpose is to create a linear-algebra-oriented message-passing interface that is implemented efficiently and uniformly across a large range of distributed-memory platforms. The length of time required to implement efficient distributed-memory algorithms makes it impractical to rewrite programs for every new parallel machine. The BLACS exist in order to make linear algebra applications both easier to program and more portable. It is for this reason that the BLACS are used as the communication layer for the ScaLAPACK project, which involves implementing the LAPACK library on distributed-memory MIMD machines. This report describes the library which has arisen from this project. This work was supported in part by DARPA and ARO under contract number DAAL03-91-C-0047, and in part by the National Science Foundation Science and Technology Center Cooperative Agreement No. CCR-8809615...
Whaley, LAPACK Working Note 94: A User's Guide to the BLACS v1.0
, 1995
Abstract

Cited by 11 (1 self)
The BLACS (Basic Linear Algebra Communication Subprograms) project is an ongoing investigation whose purpose is to create a linear-algebra-oriented message-passing interface that is implemented efficiently and uniformly across a large range of distributed-memory platforms. The length of time required to implement efficient distributed-memory algorithms makes it impractical to rewrite programs for every new parallel machine. The BLACS exist in order to make linear algebra applications both easier to program and more portable. It is for this reason that the BLACS are used as the communication layer for the ScaLAPACK project, which involves implementing the LAPACK library on distributed-memory MIMD machines.
Fault Tolerant Matrix Operations Using Checksum and Reverse Computation
, 1996
Abstract

Cited by 11 (6 self)
In this paper, we present a technique, based on checksum and reverse computation, that enables high-performance matrix operations to be fault-tolerant with low overhead. We have implemented this technique for five matrix operations: matrix multiplication, Cholesky factorization, LU factorization, QR factorization and Hessenberg reduction. The overhead of checkpointing and recovery is analyzed both theoretically and experimentally. These analyses confirm that our technique can provide fault tolerance for these high-performance matrix operations with low overhead.

1 Introduction

The price and performance of uniprocessor workstations and off-the-shelf networking have made networks of workstations (NOWs) a cost-effective parallel processing platform that is competitive with supercomputers. The popularity of NOW programming environments like PVM [14] and MPI [17, 30] and the availability of high-performance numerical libraries like ScaLAPACK (Scalable Linear Algebra PACKage) [7] for scienti...
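For matrix multiplication, the checksum part of the technique can be sketched independently of the paper's checkpointing machinery: encode a checksum row into one operand and a checksum column into the other, and the product then carries checksums from which a lost row can be reconstructed. A hypothetical minimal version (function names are mine, not the paper's):

```python
import numpy as np

def checksum_encode(A, B):
    """Append a checksum row to A and a checksum column to B, so that
    C = Ac @ Br carries checksums in its last row and last column."""
    Ac = np.vstack([A, A.sum(axis=0)])
    Br = np.hstack([B, B.sum(axis=1, keepdims=True)])
    return Ac, Br

def recover_row(C_full, lost):
    """Recover a lost row of the checksummed product: the checksum row
    minus the sum of the surviving rows reproduces the lost one."""
    m = C_full.shape[0] - 1                 # rows of the true product
    surviving = [i for i in range(m) if i != lost]
    return C_full[m] - C_full[surviving].sum(axis=0)
```

The paper additionally handles failures that strike mid-operation (via reverse computation) rather than only lost, already-computed data; that part is not shown here.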
Practical Task-Oriented Parallelism for Gaussian Elimination in Distributed Memory
 Linear Algebra and its Applications
, 1998
Abstract

Cited by 8 (8 self)
This paper discusses a methodology for easily and efficiently parallelizing sequential algorithms in linear algebra using cost-effective networks of workstations, where the algorithm lends itself to parallelism. A particular target architecture of interest is the academic student laboratory, which typically contains many networked computers that lie idle at night. A case is made for why a task-oriented approach lends itself to the twin goals of programming ease and run-time efficiency. The approach is then described in the context of TOP-C (Task-Oriented Parallel C), an example of a system to support task-oriented parallelism. In this system, the programmer is relieved of lower-level concerns such as latency, bandwidth, and message-passing protocols, so as to better concentrate on higher-level issues of task granularity and reduction of communication traffic. Gaussian elimination is chosen as the main example, since this algorithm is both widely used and sufficiently interesting to req...
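In the task-oriented view, each row update below the current pivot is an independent task that a worker can compute from just the pivot row and its own row. A rough sketch of forward elimination in that style, using a Python thread pool in place of TOP-C's master/worker runtime — an illustration of the decomposition only, with no pivoting:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def eliminate_task(args):
    """One task: subtract a multiple of the pivot row from one row below it."""
    row, pivot_row, factor = args
    return row - factor * pivot_row

def forward_elimination(A):
    """Forward elimination where each row update is dispatched as a task."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    with ThreadPoolExecutor() as pool:
        for k in range(n - 1):
            pivot = A[k].copy()
            rows = range(k + 1, n)
            # each task is self-contained: its row, the pivot row, its factor
            tasks = [(A[i].copy(), pivot, A[i, k] / pivot[k]) for i in rows]
            for i, new_row in zip(rows, list(pool.map(eliminate_task, tasks))):
                A[i] = new_row
    return A
```

The granularity question the paper raises shows up here directly: one task per row is too fine for a real network; TOP-C-style systems would batch rows into larger tasks to amortize communication.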
High Performance Fortran Interfacing to ScaLAPACK
, 1996
Abstract

Cited by 4 (0 self)
The ScaLAPACK numerical library for MIMD distributed-memory parallel computers comprises highly efficient and robust parallel dense linear algebra routines, implemented using explicit message passing. High Performance Fortran (HPF) was developed as an alternative to the message-passing paradigm. It extends Fortran 90 with directives to automatically distribute data and to parallelize loops, such that all required interprocessor communication is generated by the compiler. While HPF can ease the parallelization of many applications, it still does not make sense to reprogram existing libraries like ScaLAPACK. Rather, programmers should have the opportunity to use them from within HPF programs. HPF interfaces to routines in the ScaLAPACK library are presented, which are simplified considerably through exploitation of Fortran 90 array features. Substantial performance benefits from interfacing to efficient ScaLAPACK routines are also demonstrated via a comparison with equivalent HPF-coded fun...
A parallel distributed solver for large dense symmetric systems: applications to geodesy and electromagnetism problems
 Int. J. of High Performance Computing Applications
Abstract

Cited by 4 (2 self)
In this paper we describe the parallel distributed implementation of a linear solver for large-scale applications involving real symmetric positive definite or complex symmetric non-Hermitian dense systems. The advantage of this routine is that it performs a Cholesky factorization while requiring half the storage needed by the standard parallel libraries ScaLAPACK and PLAPACK. Our solver uses a J-variant Cholesky algorithm and a one-dimensional block-cyclic column data distribution, but gives similar Gigaflops performance when applied to problems that can be solved on moderately parallel computers with up to 32 processors. Experiments and performance comparisons with ScaLAPACK and PLAPACK on our target applications are presented. These applications arise from the Earth’s gravity field recovery and computational electromagnetics.
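The storage-halving claim comes from keeping only the lower triangle of the symmetric matrix: n(n+1)/2 entries instead of n². A small unblocked sketch of a column-oriented Cholesky operating directly on packed lower-triangular storage (distribution and blocking omitted; the indexing helper assumes column-packed order, an assumption of this sketch rather than a statement of the paper's layout):

```python
import numpy as np

def packed_index(i, j, n):
    """Position of entry (i, j), with i >= j, in column-packed
    lower-triangular storage of an n-by-n symmetric matrix."""
    return i + j * n - j * (j + 1) // 2

def cholesky_packed(ap, n):
    """Column-by-column Cholesky on a packed lower triangle, using
    n*(n+1)/2 entries instead of n*n -- the storage-halving idea."""
    ap = ap.copy()
    for j in range(n):
        for k in range(j):                       # updates from earlier columns
            ljk = ap[packed_index(j, k, n)]
            for i in range(j, n):
                ap[packed_index(i, j, n)] -= ap[packed_index(i, k, n)] * ljk
        d = np.sqrt(ap[packed_index(j, j, n)])
        for i in range(j, n):
            ap[packed_index(i, j, n)] /= d       # diagonal entry becomes d
    return ap
```

A distributed version would assign block columns of this packed layout cyclically to processes, which is the role of the one-dimensional block-cyclic column distribution mentioned above.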
Parallel image processing system on a cluster of personal computers
 in VECPAR 2000, 4th Int. Conf
, 2001
Abstract

Cited by 3 (0 self)
The most demanding image processing applications require real-time processing, often using special-purpose hardware. The work presented here refers to the application of cluster computing to off-line image processing, where the end user benefits from the operation of otherwise idle processors on the local network. The virtual parallel computer is composed of off-the-shelf personal computers connected by a low-cost network, such as a 10 Mbit/s Ethernet. The aim is to minimise the processing time of a high-level image processing package. The system developed to manage the parallel execution is described, and some results obtained for the parallelisation of high-level image processing algorithms are discussed, namely for active contour and modal analysis methods, which require the computation of the eigenvectors of a symmetric matrix.
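The modal-analysis step mentioned above reduces to a symmetric eigenproblem, for which a solver that exploits symmetry is the natural choice. A sketch of that single step, with NumPy standing in for whatever numerical library the cluster system actually used:

```python
import numpy as np

def modal_shapes(K, n_modes):
    """Return the n_modes smallest eigenvalues and their eigenvectors of a
    symmetric matrix; eigh exploits symmetry and sorts ascending."""
    w, v = np.linalg.eigh(K)
    return w[:n_modes], v[:, :n_modes]
```

In the cluster setting described in the paper, it is this eigensolve over large symmetric matrices that is distributed across the otherwise idle machines.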
An MPI Implementation of the BLACS
 in Proc. 2nd MPI Developers Conf., (MPIDC'96, Notre
, 1996
Abstract

Cited by 1 (0 self)
An MPI implementation of the Basic Linear Algebra Communication Subprograms (BLACS) is presented. A wide spectrum of MPI functionality has been used to implement the BLACS as succinctly as possible, making the implementation concise while still yielding good performance. We discuss some of the implementation details and present performance results for several parallel architectures with different MPI libraries. Finally, we gather our experiences in using MPI and make some suggestions for future functionality.

1. Introduction

In this paper an MPI [9] implementation of the Basic Linear Algebra Communication Subprograms (BLACS) is presented. The BLACS are message-passing routines that communicate matrices among processes arranged in a two-dimensional virtual process topology. They form the basic communication layer for ScaLAPACK [2, 1]. MPI provides the most suitable message-passing layer for the BLACS, since it is widely available and has high-level functionality to support the BLACS communication ...
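The two-dimensional virtual process topology works by mapping each linear rank to grid coordinates; row broadcasts and column reductions are then defined over processes sharing a coordinate. A toy version of that mapping — the function names are mine, mimicking what BLACS_GRIDINFO reports rather than calling the real API:

```python
def grid_coords(rank, nprow, npcol, row_major=True):
    """Map a linear process rank to (row, col) in an nprow-by-npcol grid."""
    if row_major:
        return rank // npcol, rank % npcol
    return rank % nprow, rank // nprow

def row_peers(rank, nprow, npcol):
    """Ranks in the same grid row -- the scope of a BLACS-style row broadcast."""
    myrow, _ = grid_coords(rank, nprow, npcol)
    return [r for r in range(nprow * npcol)
            if grid_coords(r, nprow, npcol)[0] == myrow]
```

In an actual MPI-based BLACS, these scopes become MPI communicators (one per grid row and column), which is largely why MPI is such a natural fit as the underlying layer.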