Results 1 - 6 of 6
Special Purpose Parallel Computing
 Lectures on Parallel Computation
, 1993
Cited by 77 (5 self)
A vast amount of work has been done in recent years on the design, analysis, implementation and verification of special purpose parallel computing systems. This paper presents a survey of various aspects of this work. A long, but by no means complete, bibliography is given.
1. Introduction
Turing [365] demonstrated that, in principle, a single general purpose sequential machine could be designed which would be capable of efficiently performing any computation which could be performed by a special purpose sequential machine. The importance of this universality result for subsequent practical developments in computing cannot be overstated. It showed that, for a given computational problem, the additional efficiency advantages which could be gained by designing a special purpose sequential machine for that problem would not be great. Around 1944, von Neumann produced a proposal [66, 389] for a general purpose stored-program sequential computer which captured the fundamental principles of...
The Multicomputer Toolbox: Scalable Parallel Libraries for Large-Scale Concurrent Applications
, 1994
Cited by 19 (11 self)
In this paper, we consider what is required to develop parallel algorithms for engineering applications on message-passing concurrent computers (multicomputers). At Caltech, the first author studied the concurrent dynamic simulation of distillation column networks [19, 21, 20, 14]. This research was accomplished with attention to portability, high performance and reusability of the underlying algorithms. Emerging from this work are several key results: first, a methodology for explicit parallelization of algorithms and for the evaluation of parallel algorithms in the distributed-memory context; second, a set of portable, reusable numerical algorithms constituting a "Multicomputer Toolbox," suitable for use on both existing and future medium-grain concurrent computers; third, a working prototype simulation system, Cdyn, for distillation problems, that can be enhanced (with additional work) to address more complex flowsheeting problems in chemical engineering; fourth, ideas for how to a...
Parallel Pivots LU Algorithm on the Cray T3E
, 1999
Solving large nonsymmetric sparse linear systems on distributed memory multiprocessors is an active research area. We present a loop-level parallelized generic algorithm which comprises analyse-factorize and solve stages. To further exploit matrix sparsity and parallelism, the analyse step looks for a set of compatible pivots. Sparse techniques are applied until the reduced submatrix reaches a threshold density. At this point, a switch to dense routines takes place in both analyse-factorize and solve stages. The SPMD code follows a sparse cyclic distribution to map the system matrix onto a P × Q processor mesh. Experimental results show a good behavior of our sequential algorithm compared with a standard generic solver: the MA48 routine. Additionally, a parallel version on the Cray T3E exhibits high performance in terms of speedup and efficiency.
1 Introduction
The kernel of many computer-assisted scientific applications is to solve large sparse linear systems. We find example...
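The mapping of a sparse matrix onto a P × Q processor mesh mentioned in this abstract can be illustrated with a minimal sketch. It assumes the common two-dimensional cyclic (scatter) rule, in which entry (i, j) is owned by processor (i mod P, j mod Q); the paper's own "sparse cyclic distribution" may differ in detail, and the names `cyclic_owner` and `distribute` are hypothetical.

```python
from collections import defaultdict


def cyclic_owner(i, j, P, Q):
    """Mesh coordinates of the processor that owns entry (i, j).

    Assumption: plain 2D cyclic mapping, row index modulo P and
    column index modulo Q.
    """
    return (i % P, j % Q)


def distribute(entries, P, Q):
    """Group the nonzeros {(i, j): value} by owning processor."""
    local = defaultdict(dict)
    for (i, j), value in entries.items():
        local[cyclic_owner(i, j, P, Q)][(i, j)] = value
    return dict(local)


# Illustrative data only: five nonzeros of a 5 x 4 matrix on a 2 x 2 mesh.
entries = {(0, 0): 4.0, (1, 2): -1.0, (2, 1): 3.0, (3, 3): 2.0, (4, 0): 5.0}
parts = distribute(entries, P=2, Q=2)
```

A cyclic rule of this kind tends to spread the nonzeros of any contiguous submatrix over all processors, which helps balance the load as the reduced matrix shrinks during elimination.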
PARALLEL SPARSE LU DECOMPOSITION ON A MESH NETWORK OF TRANSPUTERS
Abstract. A parallel algorithm is presented for the LU decomposition of a general sparse matrix on a distributed-memory MIMD multiprocessor with a square mesh communication network. In the algorithm, matrix elements are assigned to processors according to the grid distribution. Each processor represents the nonzero elements of its part of the matrix by a local, ordered, two-dimensional linked-list data structure. The complexity of important operations on this data structure and on several others is analysed. At each step of the algorithm, a parallel search for a set of m compatible pivot elements is performed. The Markowitz counts of the pivot elements are close to minimum, to preserve the sparsity of the matrix. The pivot elements also satisfy a threshold criterion, to ensure numerical stability. The compatibility of the m pivots enables the simultaneous elimination of m pivot rows and m pivot columns in a rank-m update of the reduced matrix. Experimental results on a network of 400 transputers are presented for a set of test matrices from the Harwell-Boeing sparse matrix collection.
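The pivot search described in this abstract can be sketched sequentially as follows. This is an illustration under stated assumptions, not the paper's parallel algorithm: the Markowitz count of a nonzero a_ij is taken as (r_i - 1)(c_j - 1), with r_i and c_j the nonzero counts of row i and column j; the threshold test is assumed to be column-wise, |a_ij| >= u * max_k |a_kj|; and two pivots (i, j) and (r, c) are treated as compatible when a_ic and a_rj are both zero. The function name `compatible_pivots`, the parameter `u`, and the greedy selection order are hypothetical.

```python
def compatible_pivots(A, u=0.1, m=2):
    """Greedily pick up to m mutually compatible pivots.

    A is a dict {(i, j): value} holding the nonzeros of the matrix.
    """
    row_cnt, col_cnt, col_max = {}, {}, {}
    for (i, j), v in A.items():
        row_cnt[i] = row_cnt.get(i, 0) + 1
        col_cnt[j] = col_cnt.get(j, 0) + 1
        col_max[j] = max(col_max.get(j, 0.0), abs(v))

    # Threshold criterion (assumed column-wise) for numerical stability.
    cands = [(i, j) for (i, j), v in A.items() if abs(v) >= u * col_max[j]]

    # Lowest Markowitz count first, to limit fill-in; ties broken by index.
    cands.sort(key=lambda p: ((row_cnt[p[0]] - 1) * (col_cnt[p[1]] - 1), p))

    chosen = []
    for i, j in cands:
        if len(chosen) == m:
            break
        # Compatible with every chosen pivot (r, c): distinct row and
        # column, and both cross entries a_ic and a_rj are zero.
        if all(i != r and j != c and (i, c) not in A and (r, j) not in A
               for r, c in chosen):
            chosen.append((i, j))
    return chosen


# Illustrative 4 x 4 pattern; the values are made up.
A = {(0, 0): 4.0, (0, 2): 1.0, (1, 1): 3.0, (2, 0): 1.0,
     (2, 2): 5.0, (3, 1): 0.1, (3, 3): 2.0}
pivots = compatible_pivots(A, u=0.1, m=2)
```

Because the cross entries of compatible pivots are zero, the pivot block is diagonal, so the eliminations for all selected pivots do not interfere with one another and can be combined into a single rank-m update.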