The design and implementation of the parallel out-of-core ScaLAPACK LU, QR, and Cholesky factorization routines. LAPACK Working Note 118, CS-97-247
, 1997
Cited by 26 (5 self)
Abstract
This paper describes the design and implementation of three core factorization routines (LU, QR and Cholesky) included in the out-of-core extension of ScaLAPACK. These routines allow the factorization and solution of a dense system that is too large to fit entirely in physical memory. The full matrix is stored on disk, and the factorization routines transfer submatrix panels into memory. The 'left-looking' column-oriented variant of the factorization algorithm is implemented to reduce disk I/O traffic. The routines are implemented using a portable I/O interface and use high-performance ScaLAPACK factorization routines as in-core computational kernels. We present the details of the implementation of the out-of-core ScaLAPACK factorization routines, as well as performance and scalability results on a Beowulf Linux cluster.
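The abstract credits the left-looking, column-oriented variant with reducing disk I/O. Below is a minimal Python sketch of that access pattern for the Cholesky case only, assuming a matrix small enough to hold in a list of lists: "disk" and "memory" are simulated, the panel width `nb` and the name `left_looking_cholesky` are illustrative, and nothing here is the ScaLAPACK out-of-core code or its I/O interface.

```python
import math


def left_looking_cholesky(a, nb):
    """Return (L, stats) with A = L L^T, processing panels in left-looking order.

    a  -- symmetric positive definite matrix as a list of lists
          (only the lower triangle is read).
    nb -- panel width: number of columns brought "into memory" at once.
    stats counts simulated panel transfers between "disk" and memory.
    This is an illustrative sketch, not the ScaLAPACK out-of-core routine.
    """
    n = len(a)
    l = [[0.0] * n for _ in range(n)]  # factored panels, standing in for disk
    stats = {"panel_reads": 0, "panel_writes": 0}
    for j0 in range(0, n, nb):
        j1 = min(j0 + nb, n)
        # Bring the lower part of the current panel (columns j0..j1-1) into memory.
        panel = {(i, j): float(a[i][j])
                 for i in range(j0, n) for j in range(j0, j1) if j <= i}
        stats["panel_reads"] += 1
        # Left-looking update: each previously factored panel is read once
        # and its contribution is subtracted from the current panel.
        for k0 in range(0, j0, nb):
            k1 = min(k0 + nb, n)
            stats["panel_reads"] += 1
            for (i, j) in list(panel):
                panel[(i, j)] -= sum(l[i][k] * l[j][k] for k in range(k0, k1))
        # Factor the updated panel in core, column by column.
        for j in range(j0, j1):
            diag = panel[(j, j)] - sum(l[j][k] ** 2 for k in range(j0, j))
            if diag <= 0.0:
                raise ValueError("matrix is not positive definite")
            l[j][j] = math.sqrt(diag)
            for i in range(j + 1, n):
                s = sum(l[i][k] * l[j][k] for k in range(j0, j))
                l[i][j] = (panel[(i, j)] - s) / l[j][j]
        stats["panel_writes"] += 1  # the factored panel is written back once
    return l, stats
```

For `p` panels the sketch performs p + p(p-1)/2 panel reads but writes each factored panel back only once; a 4 x 4 matrix with `nb=2` gives 3 reads and 2 writes. A right-looking order would typically rewrite the whole trailing submatrix after every panel, which is the disk traffic the left-looking order avoids.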
A User's Guide to the BLACS v1.0
, 1995
Cited by 20 (6 self)
Abstract
The BLACS (Basic Linear Algebra Communication Subprograms) project is an ongoing investigation whose purpose is to create a linear algebra oriented message passing interface that is implemented efficiently and uniformly across a large range of distributed memory platforms. The length of time required to implement efficient distributed memory algorithms makes it impractical to rewrite programs for every new parallel machine. The BLACS exist in order to make linear algebra applications both easier to program and more portable. It is for this reason that the BLACS are used as the communication layer for the ScaLAPACK project, which involves implementing the LAPACK library on distributed memory MIMD machines. This report describes the library which has arisen from this project. This work was supported in part by DARPA and ARO under contract number DAAL03-91-C-0047, and in part by the National Science Foundation Science and Technology Center Cooperative Agreement No. CCR-8809615.
A User's Guide to the BLACS v1.1
, 1997
Cited by 12 (5 self)
Abstract
The BLACS (Basic Linear Algebra Communication Subprograms) project is an ongoing investigation whose purpose is to create a linear algebra oriented message passing interface that is implemented efficiently and uniformly across a large range of distributed memory platforms. The length of time required to implement efficient distributed memory algorithms makes it impractical to rewrite programs for every new parallel machine. The BLACS exist in order to make linear algebra applications both easier to program and more portable. It is for this reason that the BLACS are used as the communication layer for the ScaLAPACK project, which involves implementing the LAPACK library on distributed memory MIMD machines. This report describes the library which has arisen from this project. This work was supported in part by DARPA and ARO under contract number DAAL03-91-C-0047, and in part by the National Science Foundation Science and Technology Center Cooperative Agreement No. CCR-8809615...
Whaley, LAPACK Working Note 94: A User's Guide to the BLACS v1.0
, 1995
Cited by 8 (1 self)
Abstract
The BLACS (Basic Linear Algebra Communication Subprograms) project is an ongoing investigation whose purpose is to create a linear algebra oriented message passing interface that is implemented efficiently and uniformly across a large range of distributed memory platforms. The length of time required to implement efficient distributed memory algorithms makes it impractical to rewrite programs for every new parallel machine. The BLACS exist in order to make linear algebra applications both easier to program and more portable. It is for this reason that the BLACS are used as the communication layer for the ScaLAPACK project, which involves implementing the LAPACK library on distributed memory MIMD machines.
Fault Tolerant Matrix Operations Using Checksum and Reverse Computation
, 1996
Cited by 7 (3 self)
Abstract
In this paper, we present a technique, based on checksum and reverse computation, that enables high-performance matrix operations to be fault-tolerant with low overhead. We have implemented this technique on five matrix operations: matrix multiplication, Cholesky factorization, LU factorization, QR factorization and Hessenberg reduction. The overhead of checkpointing and recovery is analyzed both theoretically and experimentally. These analyses confirm that our technique can provide fault tolerance for these high-performance matrix operations with low overhead.
1 Introduction
The price and performance of uniprocessor workstations and off-the-shelf networking have made networks of workstations (NOWs) a cost-effective parallel processing platform that is competitive with supercomputers. The popularity of NOW programming environments like PVM [14] and MPI [17, 30] and the availability of high-performance numerical libraries like ScaLAPACK (Scalable Linear Algebra PACKage) [7] for scienti...
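The abstract does not spell out the encoding, so the sketch below only illustrates the general checksum idea behind this kind of algorithm-based fault tolerance, assuming each data row lives on a different process and one extra row holds their column-wise sum. The helper names are hypothetical, and the paper's reverse-computation step is not modelled here.

```python
def add_checksum_row(rows):
    """Return a copy of `rows` with an extra row holding the column-wise sums."""
    ncols = len(rows[0])
    checksum = [sum(r[j] for r in rows) for j in range(ncols)]
    return [list(r) for r in rows] + [checksum]


def matmul(a, b):
    """Plain triple-loop product of list-of-lists matrices a (m x k) and b (k x n)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def recover_row(encoded, lost):
    """Rebuild data row `lost` of an encoded matrix whose last row is the checksum.

    The lost row equals the checksum minus the sum of all surviving data rows.
    """
    data, checksum = encoded[:-1], encoded[-1]
    survivors = [r for i, r in enumerate(data) if i != lost]
    return [checksum[j] - sum(r[j] for r in survivors) for j in range(len(checksum))]
```

Because matrix multiplication is linear, multiplying the checksum-extended A by B yields the checksum row of C = AB at no extra bookkeeping, so a row of C lost with its process can be rebuilt from the surviving rows with `recover_row`.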
Practical Task-Oriented Parallelism for Gaussian Elimination in Distributed Memory
- in Linear Algebra and its Applications
, 1998
Cited by 6 (6 self)
Abstract
This paper discusses a methodology for easily and efficiently parallelizing sequential algorithms in linear algebra using cost-effective networks of workstations, where the algorithm lends itself to parallelism. A particular target architecture of interest is the academic student laboratory, which typically contains many networked computers that lie idle at night. A case is made for why a task-oriented approach lends itself to the twin goals of programming ease and run-time efficiency. The approach is then described in the context of TOP-C (Task-Oriented Parallel C), an example of a system to support task-oriented parallelism. In this system, the programmer is relieved of lower-level concerns such as latency, bandwidth, and message-passing protocols, so as to better concentrate on higher-level issues of task granularity and reduction of communication traffic. Gaussian elimination is chosen as the main example, since this algorithm is both widely used and sufficiently interesting to req...
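To make the task-oriented idea concrete, here is a minimal Python sketch of Gaussian elimination organized as master/worker tasks: at each pivot step the master generates one task per row to be eliminated, workers compute the updated rows independently, and only the master writes the shared matrix. Threads from `concurrent.futures` stand in for networked workstations, exact `Fraction` arithmetic keeps the example deterministic, and the function names are hypothetical; they only loosely echo the TOP-C style of separating task generation, task execution and shared-data updates, and are not the TOP-C API.

```python
from concurrent.futures import ThreadPoolExecutor
from fractions import Fraction


def do_task(pivot_row, row, col):
    """Worker task (hypothetical): eliminate entry `col` of `row` using `pivot_row`."""
    factor = row[col] / pivot_row[col]
    return [r - factor * p for r, p in zip(row, pivot_row)]


def gaussian_solve(a, b, workers=4):
    """Solve a x = b with task-parallel forward elimination and back substitution."""
    n = len(a)
    # Shared data: the augmented matrix [A | b], updated only by the master.
    m = [[Fraction(v) for v in a[i]] + [Fraction(b[i])] for i in range(n)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for col in range(n):
            # Master: partial pivoting on the current column.
            p = max(range(col, n), key=lambda i: abs(m[i][col]))
            if m[p][col] == 0:
                raise ValueError("matrix is singular")
            m[col], m[p] = m[p], m[col]
            pivot = m[col]
            # Master: generate one independent task per row below the pivot.
            futures = {i: pool.submit(do_task, pivot, m[i], col)
                       for i in range(col + 1, n)}
            # Master: fold each task result back into the shared matrix.
            for i, fut in futures.items():
                m[i] = fut.result()
    # Sequential back substitution on the upper-triangular system.
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x
```

Because CPython threads share one interpreter lock, this shows the structure rather than a speedup; the point, as in the abstract, is that the programmer reasons about task granularity (here, one row per task) rather than about message passing.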
High Performance Fortran Interfacing to ScaLAPACK
, 1996
Cited by 4 (0 self)
Abstract
The ScaLAPACK numerical library for MIMD distributed-memory parallel computers comprises highly efficient and robust parallel dense linear algebra routines, implemented using explicit message passing. High Performance Fortran (HPF) was developed as an alternative to the message-passing paradigm. It extends Fortran 90 with directives to automatically distribute data and to parallelize loops, such that all required inter-processor communication is generated by the compiler. While HPF can ease parallelization of many applications, it still does not make sense to re-program existing libraries like ScaLAPACK. Rather, programmers should have the opportunity to use them from within HPF programs. HPF interfaces to routines in the ScaLAPACK library are presented, which are simplified considerably through exploitation of Fortran 90 array features. Substantial performance benefits from interfacing to efficient ScaLAPACK routines are also demonstrated via a comparison with equivalent HPF-coded fun...
A parallel distributed solver for large dense symmetric systems: applications to geodesy and electromagnetism problems
- in Int. J. of High Performance Computing Applications
Cited by 2 (1 self)
Abstract
In this paper we describe the parallel distributed implementation of a linear solver for large-scale applications involving real symmetric positive definite or complex symmetric non-Hermitian dense systems. The advantage of this routine is that it performs a Cholesky factorization while requiring half the storage needed by the standard parallel libraries ScaLAPACK and PLAPACK. Our solver uses a J-variant Cholesky algorithm and a one-dimensional block-cyclic column data distribution, yet gives similar gigaflops performance when applied to problems that can be solved on moderately parallel computers with up to 32 processors. Experiments and performance comparisons with ScaLAPACK and PLAPACK on our target applications are presented. These applications arise from the Earth's gravity field recovery and computational electromagnetics.
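The one-dimensional block-cyclic column distribution mentioned above can be illustrated with a short index-mapping sketch, assuming 0-based indices, column blocks of width `nb`, `nprocs` processes and the first block placed on process 0. The function names are illustrative; in ScaLAPACK itself the `NUMROC` tool routine computes the equivalent local count.

```python
def owner_and_local(j, nb, nprocs):
    """Map global column j to (owning process, local column index).

    Blocks of nb consecutive columns are dealt to processes round-robin,
    starting at process 0 (an assumption of this sketch).
    """
    block = j // nb
    owner = block % nprocs
    local = (block // nprocs) * nb + j % nb
    return owner, local


def local_columns(n, nb, p, nprocs):
    """Global indices of the columns (out of n) stored on process p, in local order."""
    return [j for j in range(n) if owner_and_local(j, nb, nprocs)[0] == p]
```

With 10 columns, `nb=2` and 3 processes, process 0 holds columns 0, 1, 6, 7, process 1 holds 2, 3, 8, 9, and process 2 holds 4, 5; each process stores its share contiguously under local indices 0, 1, 2, ...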
Parallel image processing system on a cluster of personal computers
- in VECPAR 2000, 4th Int. Conf
, 2001
Cited by 2 (0 self)
Abstract
The most demanding image processing applications require real-time processing, often using special-purpose hardware. The work presented here concerns the application of cluster computing to off-line image processing, where the end user benefits from the operation of otherwise idle processors on the local LAN. The virtual parallel computer is composed of off-the-shelf personal computers connected by a low-cost network, such as a 10 Mbit/s Ethernet. The aim is to minimise the processing time of a high-level image processing package. The system developed to manage the parallel execution is described, and some results obtained for the parallelisation of high-level image processing algorithms are discussed, namely for active contour and modal analysis methods, which require the computation of the eigenvectors of a symmetric matrix.
An MPI Implementation of the BLACS
- in Proc. 2nd MPI Developers Conf., (MPIDC'96, Notre
, 1996
Cited by 1 (0 self)
Abstract
An MPI implementation of the Basic Linear Algebra Communication Subprograms (BLACS) is presented. A wide spectrum of MPI functionality has been used to implement the BLACS as succinctly as possible, making the implementation concise while still yielding good performance. We discuss some of the implementation details and present performance results for several parallel architectures with different MPI libraries. Finally, we summarize our experiences in using MPI and make some suggestions for future functionality.
1. Introduction
In this paper an MPI [9] implementation of the Basic Linear Algebra Communication Subprograms (BLACS) is presented. The BLACS are message-passing routines that communicate matrices among processes arranged in a two-dimensional virtual process topology. They form the basic communication layer for ScaLAPACK [2, 1]. MPI provides the most suitable message-passing layer for the BLACS, since it is widely available and has high-level functionality to support the BLACS communication ...
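The two-dimensional virtual process topology can be sketched as a pure index mapping, assuming row-major (`"R"`) or column-major (`"C"`) rank ordering in the spirit of the ordering argument to `BLACS_GRIDINIT`. The function names are hypothetical, and the scopes returned are the process sets that BLACS row, column and all-process operations involve. In an MPI-based implementation, one plausible (not necessarily the paper's) way to realize a row or column scope is an `MPI_Comm_split` with the grid row or column as the color.

```python
def grid_coords(rank, nprow, npcol, order="R"):
    """Map a process rank to (row, col) in an nprow x npcol process grid."""
    if order == "R":  # row-major: consecutive ranks fill a grid row
        return divmod(rank, npcol)
    if order == "C":  # column-major: consecutive ranks fill a grid column
        col, row = divmod(rank, nprow)
        return row, col
    raise ValueError("order must be 'R' or 'C'")


def grid_rank(row, col, nprow, npcol, order="R"):
    """Inverse of grid_coords: the rank at grid position (row, col)."""
    return row * npcol + col if order == "R" else col * nprow + row


def scope_members(rank, nprow, npcol, scope, order="R"):
    """Ranks taking part in an operation with the given scope ('row', 'column', 'all')."""
    row, col = grid_coords(rank, nprow, npcol, order)
    if scope == "row":
        return [grid_rank(row, c, nprow, npcol, order) for c in range(npcol)]
    if scope == "column":
        return [grid_rank(r, col, nprow, npcol, order) for r in range(nprow)]
    if scope == "all":
        return list(range(nprow * npcol))
    raise ValueError("scope must be 'row', 'column' or 'all'")
```

For a 2 x 3 grid in row-major order, rank 4 sits at (1, 1), shares a row scope with ranks 3, 4, 5 and a column scope with ranks 1, 4.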

