## Scientific Computing on Bulk Synchronous Parallel Architectures

Citations: 72 (13 self)

### BibTeX

@MISC{Bisseling_scientificcomputing,

author = {R. H. Bisseling and W. F. McColl},

title = {Scientific Computing on Bulk Synchronous Parallel Architectures},

year = {}

}

### Abstract

We theoretically and experimentally analyse the efficiency with which a wide range of important scientific computations can be performed on bulk synchronous parallel architectures.

### Citations

1489 | GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems
- Saad, Schultz
- 1986
Citation Context ...oximate solution vectors $x^{(k)}$. Two important iterative algorithms are the conjugate gradient algorithm [19] for symmetric positive definite matrices $A$ and the generalised minimal residual algorithm [29] for general matrices. Iterative methods use the matrix $A$ mainly in a multiplicative manner, by computing matrix-vector products of the form $u := Av$. Iterative methods are increasingly becoming popula...

1184 | A bridging model for parallel computation
- Valiant
- 1990
Citation Context ..., models of parallel computation, scalable parallel computing, scientific computing, sparse matrices, sparse matrix-vector multiplication. 1 Introduction Bulk synchronous parallel (BSP) architectures [26] offer the prospect of achieving both scalable parallel performance and architecture independent parallel software. They provide a robust model on which to base the future development of general purpo...

905 | Linear Programming and Extensions
- Dantzig
- 1963
Citation Context ...traint Ax b (to be interpreted component-wise), where $A$ is an $m \times n$ matrix, $c$ and $x$ are vectors of length $n$, and $b$ is a vector of length $m$. This optimisation problem can be solved by the simplex method [8], which involves a sequence of rank-one updates of the form $A := A + uv^T$, or by an interior-point method [21], which involves multiplying the matrix $A$ by its transpose and solving a symmetric positi...

779 | Methods of conjugate gradients for solving linear systems
- Hestenes, Stiefel
- 1952
Citation Context ..., see [13]. An alternative approach is to use an iterative algorithm, based on successive improvements of approximate solution vectors $x^{(k)}$. An important example is the conjugate gradient algorithm [18] for symmetric positive definite matrices. (A variety of iterative algorithms are available, in the form of a high-level implementation, in the TEMPLATES collection [3].) Iterative methods use the mat...

687 | A new polynomial-time algorithm for linear programming
- Karmarkar
- 1984
Citation Context ... $b$ is a vector of length $m$. This optimisation problem can be solved by the simplex method [8], which involves a sequence of rank-one updates of the form $A := A + uv^T$, or by an interior-point method [21], which involves multiplying the matrix $A$ by its transpose and solving a symmetric positive definite linear system of the form $AA^T u = v$. This system is usually solved by Cholesky factorisation (see ...

573 | Direct Methods for Sparse Matrices
- Duff, Erisman, et al.
- 1989
Citation Context ... MLIB is a collection of sparse matrices and their generating programs. Each matrix is represented by a file which contains the nonzero elements of the matrix stored by the coordinate scheme given in [7], i.e. element $a_{ij} \neq 0$ is stored as a triple $(i, j, x)$, where $i$ is the row index, $j$ the column index, and $x = a_{ij}$ the numerical value. Presently, the numerical values of MLIB are dummies, except in...
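The coordinate scheme described in this context stores each nonzero as a triple $(i, j, x)$. As a minimal sketch of how such triples can drive a matrix-vector product $u := Av$, the following Python fragment may help; the function name `coo_matvec` and the use of zero-based indices are assumptions for illustration and are not taken from the paper or from MLIB.

```python
def coo_matvec(n, triples, v):
    """Compute u := A v for an n x n matrix A given as (i, j, x) triples.

    Assumptions (not from the source): indices are zero-based, and
    duplicate (i, j) entries, if any, are summed.
    """
    u = [0.0] * n
    for i, j, x in triples:
        # Each stored nonzero a_ij = x contributes x * v_j to u_i.
        u[i] += x * v[j]
    return u


# A small 3 x 3 example with four nonzeros.
triples = [(0, 0, 2.0), (0, 2, 1.0), (1, 1, 3.0), (2, 0, 4.0)]
print(coo_matvec(3, triples, [1.0, 2.0, 3.0]))  # [5.0, 6.0, 4.0]
```

The work is proportional to the number of stored triples rather than to $n^2$, which is the point of a sparse storage scheme.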

529 | Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, Society for Industrial and Applied Mathematics
- Barrett, Berry, et al.
- 1994
Citation Context ...e conjugate gradient algorithm [18] for symmetric positive definite matrices. (A variety of iterative algorithms are available, in the form of a high-level implementation, in the TEMPLATES collection [3].) Iterative methods use the matrix $A$ mainly in a multiplicative manner, computing matrix-vector products of the form $u := Av$ or $u := A^T v$. Iterative methods are increasingly becoming popular, becaus...

519 | Partitioning sparse matrices with eigenvectors of graphs
- Pothen, Simon, et al.
- 1990
Citation Context ...roblems is a first step towards efficient parallel scientific computing, but to make further progress, this should be combined with developing algorithms that find structure in the problems, see e.g. [27] and [17]. The BSP model facilitates developing such algorithms, because it focuses attention on the partitioning of the problem to be solved and not on the mapping to any particular hardware. The ini...

446 | LAPACK Users’ Guide
- Anderson, Bai, et al.
- 1995
Citation Context ...a wide range of application areas. Often, applications require the solution of large linear systems or large eigensystems. This has led to the extensive use of linear algebra libraries such as LAPACK [2]. To achieve portability, many scientific computer programs rely on using the Basic Linear Algebra Subprograms (BLAS) for their matrix and vector operations. Today, efficient BLAS implementations are ...

408 | An iteration method for the solution of the eigenvalue problem of linear differential and integral operators
- Lanczos
- 1950
Citation Context ...the matrix $A$ (and in certain cases $A^T$) is multiplied by a vector, and the resulting vector is used to update the current approximate solution. Similarly, it is also the basis for the Lanczos method [20], which is often used to solve sparse symmetric eigenproblems $Ax = \lambda x$, see [13]. 2. Sparse matrix-vector multiplication represents the execution of the finite difference operator in certain PDE solve...

343 | Computational Frameworks for the Fast Fourier Transform (SIAM)
- Van Loan
- 1992
Citation Context ...ns that are not commonly thought of as linear algebra computations. A prime example of the benefit of this approach is the use of matrix-vector notation to formulate Fast Fourier Transform algorithms [28]. An important application area of linear algebra is the solution of partial differential equations (PDEs) by finite difference, finite element, or finite volume methods. These methods require the r...

294 | Sparse matrix test problems
- Duff, Grimes, et al.
- 1989
Citation Context ...the underlying neighbour structure of the graph. This may eliminate unnecessary communication of grid variables. At present, there exists a library of sparse matrices, the Harwell-Boeing (HB) library [8], which is widely being used to test sparse matrix algorithms. It contains a number of examples of matrices that occur in practical applications. We have included a small subset of the HB library in M...

238 | Users’ guide for the Harwell-Boeing sparse matrix collection (Release I)
- Duff, Grimes, et al.
- 1992
Citation Context ...andom number generator ran1 from [25]. (All random numbers used in this paper were produced by this generator.) • hb.x, the matrix x from the HB collection [8]. For a description of the matrix, see [9]. The subset of the HB collection that is included in MLIB consists of five matrices from various application fields. • md.$n$.$r_c^{-1}$, an $n \times n$ matrix which corresponds to a random configur...

208 | General purpose parallel architectures
- Valiant
- 1990

188 | An improved spectral graph partitioning algorithm for mapping parallel computations
- Hendrickson, Leland
- 1995
Citation Context ...s a first step towards efficient parallel scientific computing, but to make further progress, this should be combined with developing algorithms that find structure in the problems, see e.g. [27] and [17]. The BSP model facilitates developing such algorithms, because it focuses attention on the partitioning of the problem to be solved and not on the mapping to any particular hardware. The initial tech...

171 | Direct Bulk-Synchronous Parallel Algorithms
- Gerbessiotis, Valiant
- 1994
Citation Context ...ny processor during $S$, and $h_r$ be the maximum number of messages received by any processor during $S$. In the original BSP model [26], the cost of $S$ is $\max\{w, g h_s, g h_r, l\}$ time steps. An alternative [12] is to charge $\max\{w + g h_s, w + g h_r, l\}$ time steps for superstep $S$. In this paper, we will charge $w + g \cdot \max\{h_s, h_r\} + l$ time steps for $S$. The cost of a BSP algorithm is simply the sum of...
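The three superstep charging rules quoted in this context can be compared side by side with a small Python sketch. The excerpt is truncated, so two readings here are assumptions: that $w$ is the maximum local computation and $h_s$ the maximum number of messages sent by any processor, and that the cost of a whole algorithm is the sum of its superstep costs. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Superstep:
    w: int    # assumed: maximum local computation on any processor
    h_s: int  # assumed: maximum number of messages sent by any processor
    h_r: int  # maximum number of messages received by any processor


def cost_original(s, g, l):
    # Rule attributed to the original BSP model: max{w, g*h_s, g*h_r, l}.
    return max(s.w, g * s.h_s, g * s.h_r, l)


def cost_alternative(s, g, l):
    # Alternative rule: max{w + g*h_s, w + g*h_r, l}.
    return max(s.w + g * s.h_s, s.w + g * s.h_r, l)


def cost_additive(s, g, l):
    # Rule adopted in the quoted paper: w + g*max{h_s, h_r} + l.
    return s.w + g * max(s.h_s, s.h_r) + l


def algorithm_cost(supersteps, g, l, rule=cost_additive):
    # Assumption: total cost is the sum of the per-superstep costs.
    return sum(rule(s, g, l) for s in supersteps)


step = Superstep(w=100, h_s=10, h_r=20)
print(cost_original(step, g=2, l=50))     # 100
print(cost_alternative(step, g=2, l=50))  # 140
print(cost_additive(step, g=2, l=50))     # 190
```

For non-negative inputs the additive rule is never smaller than the other two, so it acts as the most conservative of the three estimates.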

89 | Matrix Computations: Second Edition
- Golub, Van Loan
- 1989
Citation Context .... Usually, the matrix $A$ is sparse, i.e., only $O(n)$ of its $n^2$ elements are nonzero. Such systems can be solved by a direct algorithm, using for example Cholesky factorisation or LU decomposition, see [13]. An alternative approach is to use an iterative algorithm, based on successive improvements of approximate solution vectors $x^{(k)}$. An important example is the conjugate gradient algorithm [18] for s...

87 | Communication efficient basic linear algebra computations on hypercube architectures
- Johnsson
- 1987
Citation Context ... p. This distribution is optimal for linear algebra computations such as dense LU decomposition [6, 17]. It is also known under other names such as scattered square decomposition [11], cyclic storage [19], and torus-wrap mapping [17]. (The term "grid distribution" should not be confused with the term "grid" used in the context of { $A : n \times n$; $\mathrm{distr}(A) = (\phi_0, \phi_1)$; $v : n$; distr($v$) = distr(...

77 | General purpose parallel computing
- McColl
- 1993
Citation Context ...hould not be aware of any hierarchical memory organisation based on network locality in the particular physical interconnect structure that is currently used, as in special purpose parallel computing [21]. Instead, performance of the communications network is described only in terms of its global properties, using the parameters $l$ and $g$. The complexity of a superstep $S$ in a BSP algorithm is determined...

66 | The torus-wrap mapping for dense matrix calculations on massively parallel computers
- Hendrickson, Womble
- 1994
Citation Context ...the matrix distribution defined by $\phi_0(i) = \phi_1(i) = i \bmod \sqrt{p}$, for $0 \le i < n$, (6) where $q_0 = q_1 = \sqrt{p}$. This distribution is optimal for linear algebra computations such as dense LU decomposition [6, 17]. It is also known under other names such as scattered square decomposition [11], cyclic storage [19], and torus-wrap mapping [17]. (The term "grid distribution" should not be confused with the term "...
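The square cyclic distribution of equation (6) in this context maps an index $i$ to $i \bmod \sqrt{p}$. A short Python sketch can show which processor owns each matrix element under that mapping. It assumes that $p$ is a perfect square and that element $a_{ij}$ is assigned to processor $(\phi_0(i), \phi_1(j))$, which the excerpt does not spell out; the function names are hypothetical.

```python
import math
from collections import Counter


def grid_owner(i, j, p):
    """Return the assumed owner (s, t) of element a_ij on p processors."""
    q = math.isqrt(p)
    if q * q != p:
        raise ValueError("p must be a perfect square for the square grid")
    # phi_0(i) = i mod sqrt(p), phi_1(j) = j mod sqrt(p)
    return (i % q, j % q)


def load_per_processor(n, p):
    """Count how many elements of a dense n x n matrix each processor owns."""
    return Counter(grid_owner(i, j, p) for i in range(n) for j in range(n))


print(grid_owner(5, 7, 4))              # (1, 1)
print(sorted(load_per_processor(4, 4).items()))
```

For a dense matrix whose order $n$ is a multiple of $\sqrt{p}$, every processor receives exactly $n^2 / p$ elements, which illustrates the load balance of the cyclic mapping.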

45 | Solving problems on concurrent processors
- Fox, et al.
- 1988
Citation Context ... $q_0 = q_1 = \sqrt{p}$. This distribution is optimal for linear algebra computations such as dense LU decomposition [6]. This distribution is known under various names, such as scattered square decomposition [13] and cyclic storage [20]. (The grid distribution of a matrix should not be confused with discrete grids used to model e.g. PDE’s.) Our general distribution scheme leaves much freedom in choosing parti...

38 | An efficient parallel algorithm for matrix-vector multiplication
- Hendrickson, Leland, et al.
- 1995
Citation Context ...th high probability, the random/random distribution has a good load balance. (Our algorithm differs from the algorithm of [24], and also from a similar algorithm of Hendrickson, Leland, and Plimpton [15], in that we reduce communication by exploiting sparsity and by choosing a vector distribution that matches the distribution of the matrix diagonal. The algorithms of [15, 24], however, require the sa...

30 | Sparse matrix computations on parallel processor arrays
- Ogielski, Aiello
- 1993
Citation Context ...alent to randomly permuting the rows and columns, and then distributing the matrix according to the square grid distribution (6). This random permutation procedure was proposed by Ogielski and Aiello [24] for use in a parallel algorithm for sparse matrix-vector multiplication. Ogielski and Aiello [24] also show that, with high probability, the random/random distribution has a good load balance. (Our al...

27 | An architecture independent programming model for scalable parallel computing
- McColl
- 1993
Citation Context ...by $l$ and $g$. This can indeed be done, because the network performance of a BSP computer is captured in global terms using the values $l$ and $g$. (A language that supports this style of programming is GPL [23].) The resulting algorithms will therefore be efficiently implementable on a range of BSP architectures with widely differing $l$ and $g$ values. A systematic study of direct bulk synchronous algorithms r...

15 | Efficient parallel implementation of molecular dynamics on a toroidal network: I. Parallelizing strategy
- Esselink, Smith, et al.
- 1993
Citation Context ...f force computations. Furthermore, distributed memory parallel algorithms based on geometric parallelism exploit the sparsity also to reduce the number of communications of current particle positions [10]. Simplifying molecular dynamics simulations by modelling their essence in matrix terms may give remarkable new insights, and may even lead to new ways of performing these simulations. A recent exam...

15 | Parallel Many-Body Simulations Without All-to-All Communication
- Hendrickson, Plimpton
- 1994
Citation Context ...ir essence in matrix terms may give remarkable new insights, and may even lead to new ways of performing these simulations. A recent example of this approach is the work of Hendrickson and Plimpton [16] on parallel many-body simulations. They achieve a reduction in communication volume by an order of $\sqrt{p}$, compared to all-to-all communication, by using techniques from dense linear algebra and careful...

13 | Principles for a direct SCF approach to LCAO-MO ab initio calculations
- Almlöf, Faegri, et al.
- 1982
Citation Context ... of computer resources. Another application area of linear algebra is ab initio quantum chemistry, and in particular the solution of the time-independent Schrödinger equation by the direct SCF method [1]. The dominant part of this computation is the calculation of two-electron integrals and their incorporation into a matrix (the Fock matrix); other important parts are the computation of the eigenvalu...

13 | A parallel interior point algorithm for linear programming on a network of 400 transputers
- Bisseling, Doup, et al.
- 1993

12 | Parallel LU Decomposition on a Transputer Network
- Bisseling, van der Vorst
- 1989
Citation Context ...a dense vector $u$. The matrix $A = (a_{ij}, 0 \le i, j < n)$ has size $n \times n$ and the vectors $u = (u_i, 0 \le i < n)$ and $v = (v_i, 0 \le i < n)$ have length $n$. We assume that the matrix is distributed by a Cartesian distribution [6]. This means that the processors are numbered by two-dimensional identifiers $(s, t)$, with $0 \le s < q_0$ and $0 \le t < q_1$, where $p = q_0 q_1$ is the number of processors, and that there are mappings $\phi_0 : \{0, 1, \ldots, n$ ...

8 | Parallel iterative solution of sparse linear systems on a transputer network
- Bisseling
- 1993
Citation Context ... a consequence, communication is often needed only within a subset of $q_0$ or $q_1$ processors. The two-dimensional numbering of the processors originates in special purpose algorithms for mesh networks [4, 6]. In the present work, however, the two-dimensional numbering reflects a property of the problem to be solved and not of any particular network topology: the BSP model is topology-independent. We assu...

5 | GL: An architecture independent programming language for scalable parallel computing
- McColl
- 1993
Citation Context ... by $l$ and $g$. This can indeed be done, because the network performance of a BSP computer is captured in global terms using the values $l$ and $g$. (A language that supports this style of programming is GL [25].) The resulting algorithms can therefore be efficiently implemented on a range of BSP architectures with widely differing $l$ and $g$ values. A systematic study of direct bulk synchronous algorithms rema...

2 | "Parallel LU decomposition on a transputer network," in Parallel Computing
- Bisseling, van de Vorst
- 1988
Citation Context ...matrix-vector multiplication imposes the constraint that the vectors $u$ and $v$ and the diagonal of $A$ are distributed in the same way and that the matrix $A$ is distributed in a so-called Cartesian manner [6]. This means that the $p^2$ processors used by the algorithm are numbered by two-dimensional Cartesian coordinates $(s, t)$, and that each matrix row is assigned to a set of processors with the same first...

2 | Optimal choice of grid points in multidimensional pseudospectral Fourier methods
- Bisseling, Kosloff
- 1988
Citation Context ...phere packing lattice [7] and assigning particles to the nearest centre of a sphere. (Sphere packing lattices have been used in other areas of scientific computing; for instance, it has been proposed [5] to use them to decrease anisotropy in pseudo-spectral PDE solving on multidimensional grids.) This method splits the universe into Voronoi cells, each of which corresponds to a processor. Figure 4 sh...

1 | Parallelism in computational chemistry
- Guest, Sherwood, et al.
- 1993
Citation Context ...ultiplication of matrices. Since the latter parts are more difficult to parallelise than the trivially parallel integral calculations, they may well dominate the computing time on a parallel computer [14]. These examples suggest that a first approach to achieving general purpose parallel computing for scientific applications may be based on developing BSP algorithms for linear algebra computations. Fo...

1 | Concurrent supercomputing at
- Guest, Sherwood, et al.
- 1990
Citation Context ...ultiplication of matrices. Since the latter parts are more difficult to parallelise than the trivially parallel integral calculations, they may well dominate the computing time on a parallel computer [16]. The examples above suggest that a first approach to achieving general purpose parallel computing for scientific applications may be based on developing BSP algorithms for linear algebra computations...

1 | LAPACK Users' Guide
- Chem
- 1982