Results 11 - 16 of 16
Parallel Prefix Computation with Few Processors
1992
Cited by 2 (0 self)
We present a parallel prefix algorithm which uses...
Generalized Scans and Tri-Diagonal Systems
Motivated by the analysis of known parallel techniques for the solution of linear tridiagonal systems, we introduce generalized scans, a class of recursively defined, length-preserving, sequence-to-sequence transformations that generalize the well-known prefix computations (scans). Generalized scan functions are described in terms of three algorithmic phases: the reduction phase, which saves data for the third (expansion) phase and prepares data for the second phase, which is a recursive invocation of the same function on one fewer variable. Both the reduction and expansion phases operate on a bounded number of variables, a key feature for their parallelization. Generalized scans enjoy a property, called here protoassociativity, that gives rise to ordinary associativity when generalized scans are specialized to ordinary scans. We show that the solution of positive definite block tridiagonal linear systems can be cast as a generalized scan, thereby shedding light on the underlying structure enabling k...
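The three-phase shape described in this abstract can be illustrated on an ordinary inclusive scan. The following is a minimal sequential sketch, not the paper's definition of a generalized scan: the function name `scan` and the way the phases are split are assumptions made for illustration, and it covers only the special case of an ordinary prefix computation.

```python
from operator import add


def scan(xs, op):
    """Inclusive prefix scan written in a reduce / recurse / expand shape."""
    if len(xs) <= 1:
        return list(xs)
    # Reduction: save the first element for the expansion phase and fold it
    # into its neighbour, which leaves a sequence with one fewer variable.
    saved = xs[0]
    reduced = [op(xs[0], xs[1])] + list(xs[2:])
    # Recursion: the same function applied to the shorter sequence.
    partial = scan(reduced, op)
    # Expansion: restore the saved element in front of the partial result.
    return [saved] + partial


print(scan([1, 2, 3, 4], add))  # [1, 3, 6, 10]
```

Each reduction and expansion step here touches only two values, which mirrors the "bounded number of variables" remark in the abstract. The sketch runs sequentially; associativity of `op` is what would allow the combinations to be regrouped for parallel execution.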
Optimal Parallel Prefix on Mesh Architectures
Algorithms for efficient implementation of computation of prefix products on mesh-connected...
Fast Computation of Divided Differences and Parallel Hermite Interpolation
We present parallel algorithms for fast polynomial interpolation. These algorithms can be used for constructing and evaluating polynomials that interpolate a function's values and its derivatives of arbitrary order (Hermite interpolation). For interpolation, the parallel arithmetic complexity is O(log² M + log N) for large M and N...
On Fast Computation of Continued Fractions
1991
We give an O(log n) algorithm to compute the nth convergent of a periodic continued fraction. The algorithm is based on the matrix representation of continued fractions, due to Milne-Thomson. This approach also allows for the computation of the first n convergents of a general continued fraction in O(log n) time using O(n/log n) processors.
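A small sketch of why a matrix representation yields a logarithmic algorithm is given here. It uses the standard identity for simple continued fractions (unit numerators), under which the product of the matrices [[a_k, 1], [1, 0]] holds the convergent's numerator and denominator in its first column. The restriction to purely periodic simple continued fractions, and the names `matmul`, `matpow` and `convergent_periodic`, are assumptions made for illustration; the paper's own formulation may differ.

```python
from fractions import Fraction

IDENTITY = ((1, 0), (0, 1))


def matmul(a, b):
    """Product of two 2x2 integer matrices given as nested tuples."""
    return (
        (a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]),
        (a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]),
    )


def matpow(m, k):
    """m raised to the k-th power by repeated squaring: O(log k) products."""
    result = IDENTITY
    while k > 0:
        if k & 1:
            result = matmul(result, m)
        m = matmul(m, m)
        k >>= 1
    return result


def convergent_periodic(period, repeats):
    """Convergent of the purely periodic simple continued fraction
    [period; period; ...] after `repeats` full periods (repeats >= 1)."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    block = IDENTITY
    for a in period:
        block = matmul(block, ((a, 1), (1, 0)))
    m = matpow(block, repeats)
    # The first column of the product is (numerator, denominator).
    return Fraction(m[0][0], m[1][0])


print(convergent_periodic([1], 5))     # 8/5, a convergent of the golden ratio
print(convergent_periodic([1, 2], 2))  # 11/8, i.e. [1; 2, 1, 2]
```

Because matrix multiplication is associative, the running products of these matrices form a prefix computation, and each running product gives one convergent. That is the structure a parallel prefix algorithm can exploit to obtain the first n convergents together.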
Integrating synchronous and asynchronous paradigms: the Fork95 parallel programming language
The SB-PRAM is a lock-step-synchronous, massively parallel multiprocessor currently being built at Saarbrücken University, with up to 4096 RISC-style processing elements and with a (from the programmer's view) physically shared memory of up to 2 GByte with uniform memory access time. Fork95 is a redesign of the PRAM language FORK, based on ANSI C, with additional constructs to create parallel processes, hierarchically dividing processor groups into subgroups, managing shared and private address subspaces. Fork95 makes the assembly-level synchronicity of the underlying hardware available to the programmer at the language level. Nevertheless, it provides comfortable facilities for locally asynchronous computation where desired by the programmer. We show that Fork95 offers full expressibility for the implementation of practically relevant parallel algorithms. We do this by examining all known parallel programming paradigms used for the parallel solution of real-world problems, such as strictly synchronous execution, asynchronous processes, pipelining and systolic algorithms, parallel divide and conquer, parallel prefix computation, data parallelism, etc., and show how these parallel programming paradigms are supported by the Fork95 language and run-time system.

