## A Comparison of Parallel Solvers for Diagonally Dominant and General Narrow-Banded Linear Systems II (1999)

### Download Links

- [www.netlib.org]
- [www.inf.ethz.ch]
- DBLP

Citations: 4 (1 self)

### BibTeX

@MISC{Arbenz99acomparison,
  author = {Peter Arbenz and Andrew Cleary and Jack Dongarra and Markus Hegland},
  title = {A Comparison of Parallel Solvers for Diagonally Dominant and General Narrow-Banded Linear Systems II},
  year = {1999}
}

### Abstract

We continue the comparison of parallel algorithms for solving diagonally dominant and general narrow-banded linear systems of equations that we started in [2]. The solvers compared are the banded system solvers of ScaLAPACK [6] and those investigated by Arbenz and Hegland [1, 5]. We present the numerical experiments that we conducted on the IBM SP/2.

1 Introduction. In this note we continue the comparison of direct parallel solvers for narrow-banded systems of linear equations Ax = b (1) that we started in [2]. The n-by-n matrix A has a narrow band if its lower half-bandwidth k_l and upper half-bandwidth k_u are much smaller than the order of A, k_l + k_u ≪ n. We separately compare implementations of an algorithm for solving diagonally dominant and of an algorithm for solving arbitrary band systems. The algorithm for the diagonally dominant band system can be interpreted as a generalization of the well-known tridiagonal cyclic reduction (CR), or more usefully, as Gaussian elimination...
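The tridiagonal cyclic reduction (CR) that the abstract takes as its starting point can be sketched in a few lines. The serial sketch below is illustrative only and is not the paper's parallel implementation; it assumes a system of order n = 2^m − 1 with zero-padded sub- and super-diagonals, and diagonal dominance so that the pivots stay nonzero.

```python
import numpy as np

def cyclic_reduction_solve(a, b, c, d):
    """Solve a tridiagonal system by (serial) cyclic reduction.

    a: sub-diagonal   (a[0] is unused and treated as 0)
    b: main diagonal
    c: super-diagonal (c[-1] is unused and treated as 0)
    d: right-hand side
    Requires n = 2**m - 1; diagonal dominance keeps the pivots b[i] nonzero.
    """
    a, b, c, d = (np.asarray(v, dtype=float).copy() for v in (a, b, c, d))
    n = len(b)
    m = int(round(np.log2(n + 1)))
    assert 2**m - 1 == n, "this sketch requires n = 2**m - 1"
    a[0] = 0.0
    c[-1] = 0.0
    # Forward reduction: at each level, eliminate the neighbours of every
    # second remaining equation, halving the number of active unknowns.
    for level in range(1, m):
        s = 2 ** (level - 1)
        for i in range(2 * s - 1, n, 2 * s):
            alpha = a[i] / b[i - s]
            beta = c[i] / b[i + s]
            b[i] -= alpha * c[i - s] + beta * a[i + s]
            d[i] -= alpha * d[i - s] + beta * d[i + s]
            a[i] = -alpha * a[i - s]   # now couples to x[i - 2s]
            c[i] = -beta * c[i + s]    # now couples to x[i + 2s]
    # Back substitution: the middle equation decouples completely, then
    # each level fills in the unknowns eliminated at the matching stride.
    x = np.zeros(n)
    mid = (n - 1) // 2
    x[mid] = d[mid] / b[mid]
    for level in range(m - 1, 0, -1):
        s = 2 ** (level - 1)
        for i in range(s - 1, n, 2 * s):
            left = x[i - s] if i - s >= 0 else 0.0
            right = x[i + s] if i + s < n else 0.0
            x[i] = (d[i] - a[i] * left - c[i] * right) / b[i]
    return x
```

The block recurrences of the paper's banded solvers generalize exactly these scalar recurrences; this sketch is not the ScaLAPACK or Arbenz/Hegland code.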

### Citations

1087 | A Practical Guide to Splines - Boor - 1978
Citation Context: ...a canonical form of recursive equations, as well as having direct applications. The latter include the solution of eigenvalue problems with inverse iteration [17], spline interpolation and smoothing [9], and the solution of boundary value problems for ordinary differential equations using finite difference or finite element methods [27]. For these † Institute of Scientific Computing, Swiss Federal I...

847 | Accuracy and Stability of Numerical Algorithms - Higham - 2002

603 | Introduction to Numerical Analysis - Stoer, Bulirsch - 1980
Citation Context: ...with inverse iteration [17], spline interpolation and smoothing [9], and the solution of boundary value problems for ordinary differential equations using finite difference or finite element methods [27]. For these † Institute of Scientific Computing, Swiss Federal Institute of Technology (ETH), 8092 Zurich, Switzerland (arbenz@inf.ethz.ch) ‡ Center for Applied Scientific Computing, Lawrence Livermor...

140 | Introduction to Parallel and Vector Solutions of Linear Systems - Ortega - 1988

84 | Introduction to Parallel Computing - Kumar, Grama, et al. - 1994
Citation Context: ...ve assumed that the time for the transmission of a message of length n floating point numbers from one to another processor is independent of the processor distance and can be represented in the form [24] t_s + n·t_w: t_s denotes the startup time relative to the time of a floating point operation, i.e. the number of flops that can be executed during the startup time. t_w denotes the number of floating...
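The communication-cost model quoted in this excerpt can be made concrete in a few lines. The parameter values below are hypothetical, chosen only to illustrate why bundling two messages into one saves a startup cost; they are not measurements from the paper.

```python
def message_time(n_words, t_s, t_w):
    # Transmission time for a message of n_words floating-point numbers,
    # following the model t_s + n * t_w from the excerpt: t_s is the startup
    # (latency) cost and t_w the per-word transfer cost, both expressed in
    # units of one floating-point operation (flop).
    return t_s + n_words * t_w

# Hypothetical machine parameters (flop-equivalents), for illustration only.
t_s, t_w = 1000.0, 10.0
# Sending one combined message beats sending two halves: t_s is paid once.
combined = message_time(2000, t_s, t_w)      # 1000 + 2000*10 = 21000 flops
separate = 2 * message_time(1000, t_s, t_w)  # 2 * (1000 + 10000) = 22000
```

This is why the algorithms compared here try to aggregate interprocessor traffic into as few messages as possible.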

35 | On stable parallel linear systems solvers - Sameh, Kuck - 1978
Citation Context: ...e same algorithm and its implementation on the Thinking Machine CM-2 which required a different model for the complexity of the interprocessor communication. Related algorithms have been presented in [26, 15, 14, 7, 12, 28] for shared memory multiprocessors with a small number of processors. The algorithm that we consider here can be interpreted as a generalization of cyclic reduction (CR), or more usefully, as Gaussian...

28 | Multifrontal QR Factorization in a Multiprocessor environment - Amestoy, Duff, et al. - 1996
Citation Context: ...mark 2. In (3.4) instead of the LU factorization a QR factorization could be computed [6, 20]. This doubles the computational effort but enhances stability. Similar ideas are pursued by Amestoy et al. [1] for the parallel computation of the QR factorization of large sparse matrices. 4. Numerical experiments on the Intel Paragon. We compared the algorithms described in the previous two sections by mean...

21 | Parallel Algorithms for Banded Linear Systems - Wright - 1991
Citation Context: ...e same algorithm and its implementation on the Thinking Machine CM-2 which required a different model for the complexity of the interprocessor communication. Related algorithms have been presented in [26, 15, 14, 7, 12, 28] for shared memory multiprocessors with a small number of processors. The algorithm that we consider here can be interpreted as a generalization of cyclic reduction (CR), or more usefully, as Gaussian...

19 | A survey of direct parallel algorithms for banded linear systems - Arbenz, Gander - 1994
Citation Context: ...g the arrays to just the needed size. When compiling we chose the highest optimization level and turned off IEEE arithmetic. With IEEE arithmetic turned on, we observed erratic, non-reproducible execution times [4]. We begin with the discussion of the diagonally dominant case (α = 100). In Tab. 4.2 the execution times are listed for all problem sizes. For the ScaLAPACK and the Arbenz/Hegland (AH) implementatio...

14 | Solving banded systems on a parallel processor - Dongarra, Johnsson - 1987
Citation Context: ...e same algorithm and its implementation on the Thinking Machine CM-2 which required a different model for the complexity of the interprocessor communication. Related algorithms have been presented in [26, 15, 14, 7, 12, 28] for shared memory multiprocessors with a small number of processors. The algorithm that we consider here can be interpreted as a generalization of cyclic reduction (CR), or more usefully, as Gaussian...

14 | Solving narrow banded systems on ensemble architectures - Johnsson - 1985
Citation Context: ...ssed in detail in [3, 11] where the performance of implementations of this algorithm on distributed memory multicomputers like the Intel Paragon [3] or the IBM SP/2 [11] is analyzed as well. Johnsson [23] considered the same algorithm and its implementation on the Thinking Machine CM-2 which required a different model for the complexity of the interprocessor communication. Related algorithms have been...

10 | Multiprocessor schemes for solving block tridiagonal linear systems - Berry, Sameh - 1988

10 | On some parallel banded system solvers - Dongarra, Sameh - 1984

9 | The design, implementation and evaluation of a symmetric banded linear solver for distributed-memory parallel computers - Gupta, Gustavson, et al. - 1998
Citation Context: ...full system solvers. In particular, parallel algorithms using two-dimensional mappings (such as the torus-wrap mapping) and Gaussian elimination with partial pivoting have achieved reasonable success [16, 10, 18]. The parallelism of these algorithms is the same as that of dense matrix algorithms applied to matrices of size min{k_l, k_u}, independent of n, from which it is obvious that small bandwidths sever...

8 | Parallel algorithms for the solution of narrow banded systems - Conroy - 1989

7 | On experiments with a parallel direct solver for diagonally dominant banded linear systems - Arbenz - 1996
Citation Context: ...idth is very small compared with the matrix order and is typically between 1 and 100. The solvers compared are the banded system solvers of ScaLAPACK [11] and those investigated by Arbenz and Hegland [3, 6]. For the diagonally dominant case, the algorithms are analogs of the well-known tridiagonal cyclic reduction algorithm, while the inspiration for the general case is the lesser-known bidiagonal cycli...

6 | Implementation and performance of scalable scientific library subroutines on Fujitsu's VPP500 parallel-vector supercomputer - Brent, Cleary, et al. - 1994
Citation Context: ...full system solvers. In particular, parallel algorithms using two-dimensional mappings (such as the torus-wrap mapping) and Gaussian elimination with partial pivoting have achieved reasonable success [16, 10, 18]. The parallelism of these algorithms is the same as that of dense matrix algorithms applied to matrices of size min{k_l, k_u}, independent of n, from which it is obvious that small bandwidths sever...

5 | Implementation in ScaLAPACK of Divide-and-Conquer Algorithms for Banded and Tridiagonal Systems - Cleary, Dongarra - 1997
Citation Context: ...s of equations. Narrow-banded means that the bandwidth is very small compared with the matrix order and is typically between 1 and 100. The solvers compared are the banded system solvers of ScaLAPACK [11] and those investigated by Arbenz and Hegland [3, 6]. For the diagonally dominant case, the algorithms are analogs of the well-known tridiagonal cyclic reduction algorithm, while the inspiration for t...

5 | Divide and conquer for the solution of banded linear systems of equations - Hegland - 1996
Citation Context: ...pplied to a permuted (nonsymmetrically for the general case) system of equations (PAQ^T)Qx = Pb. Block bidiagonal cyclic reduction for the solution of banded linear systems was introduced by Hegland [19]. In section 4 we compare the ScaLAPACK implementations [11] of the two algorithms above with the implementations by Arbenz [3] and Arbenz and Hegland [6], respectively, by means of numerical experime...

3 | Software and guide are available from Netlib at URL http://www.netlib.org/lapack - Anderson, Bai, et al. - 1994
Citation Context: ...d by partial pivoting. Is it necessary to have a pivoting as well as a non-pivoting algorithm for nonsymmetric band systems in ScaLAPACK? In LAPACK [2], for instance, there are only pivoting subroutines for solving dense and banded systems of equations, respectively. 2. Parallel Gaussian elimination for the diagonally dominant case. In this section w...

3 | Factorization of band matrices using level-3 BLAS - Mayes, Radicati - 1990
Citation Context: ...full system solvers. In particular, parallel algorithms using two-dimensional mappings (such as the torus-wrap mapping) and Gaussian elimination with partial pivoting have achieved reasonable success [16, 10, 18]. The parallelism of these algorithms is the same as that of dense matrix algorithms applied to matrices of size min{k_l, k_u}, independent of n, from which it is obvious that small bandwidths sever...

2 | Scalable stable solvers for non-symmetric narrow-banded linear systems - Arbenz, Hegland - 1997
Citation Context: ...ian elimination applied to a symmetrically permuted system of equations (PAP^T)Px = Pb. The latter interpretation has important consequences: for instance, it implies that the algorithm is backward stable [5]. It can also be used to show that the permutation necessarily causes Gaussian elimination to generate fill-in, which in turn increases the computational complexity as well as the memory requirements o...

2 | The use of computational kernels in full and sparse linear solvers, efficient code design on high-performance - Dayde, Duff - 1997
Citation Context: ...s. In the AH implementation the above mentioned auxiliary arrays are stored as `lying' blocks to further improve the scalability and to better exploit the RISC architecture of the underlying hardware [13]. The speedups of the AH implementation relative to the 2-processor performance are very close to ideal up to at least 64 processors. The ScaLAPACK implementation does not scale so well. For large proc...

2 | Algorithms for block bidiagonal systems on vector and parallel computers - Hegland, Osborne - 1998
Citation Context: ..., as in this case, also the serial code generates fill-in. Remark 2. In (3.4) instead of the LU factorization a QR factorization could be computed [6, 20]. This doubles the computational effort but enhances stability. Similar ideas are pursued by Amestoy et al. [1] for the parallel computation of the QR factorization of large sparse matrices. 4. Numeric...

2 | Software and guide are available from Netlib at URL - Blackford, Choi, et al. - 1997 |

1 | Software and guide are available from Netlib at URL - Blackford, Choi, et al. - 1997
Citation Context: ...i.e. the maximal number of processors p that can be exploited for parallel execution, p ≤ (n+k)/(2k). The structure of A and its submatrices is depicted in Fig. 2.1(a) for the case p = 4. In ScaLAPACK [11, 8], the local portions of A on each processor are stored in the LAPACK scheme as depicted in Fig. 2.2. This input scheme requires a preliminary step of moving the triangular block D^L_i from processor i...
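The degree-of-parallelism bound quoted in this excerpt (reading the garbled inequality as p ≤ (n + k)/(2k), with k the half-bandwidth) translates directly into a one-line helper; the function name is ours, not the paper's.

```python
def max_usable_processors(n, k):
    # Largest integer p satisfying p <= (n + k) / (2 * k): for a band
    # matrix of order n with half-bandwidth k, at most this many
    # processors can be exploited by the partitioned elimination
    # (under our reading of the excerpt's bound).
    return (n + k) // (2 * k)

# Narrow bands cap parallelism: order 10000 with half-bandwidth 10
# admits at most (10000 + 10) // 20 = 500 processors, while the same
# order with half-bandwidth 100 admits only about 50.
```

This is the quantitative reason the paper restricts attention to narrow-banded systems: the usable processor count grows with n/k, not with n alone.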