Results 1  10
of
16
Powerlist: a structure for parallel recursion
 ACM Transactions on Programming Languages and Systems
, 1994
"... Many data parallel algorithms – Fast Fourier Transform, Batcher’s sorting schemes and prefixsum – exhibit recursive structure. We propose a data structure, powerlist, that permits succinct descriptions of such algorithms, highlighting the roles of both parallelism and recursion. Simple algebraic pro ..."
Abstract

Cited by 60 (2 self)
 Add to MetaCart
Many data parallel algorithms – Fast Fourier Transform, Batcher’s sorting schemes and prefixsum – exhibit recursive structure. We propose a data structure, powerlist, that permits succinct descriptions of such algorithms, highlighting the roles of both parallelism and recursion. Simple algebraic properties of this data structure can be exploited to derive properties of these algorithms and establish equivalence of different algorithms that solve the same problem.
ArchitectureCognizant Divide and Conquer Algorithms
, 1999
"... Divide and conquer programs can achieve good performance on parallel computers and computers with deep memory hierarchies. We introduce architecturecognizant divide and conquer algorithms, and explore how they can achieve even better performance. An architecturecognizant algorithm has functionall ..."
Abstract

Cited by 27 (5 self)
 Add to MetaCart
Divide and conquer programs can achieve good performance on parallel computers and computers with deep memory hierarchies. We introduce architecturecognizant divide and conquer algorithms, and explore how they can achieve even better performance. An architecturecognizant algorithm has functionallyequivalent variants of the divide and/or combine functions, and a variant policy that specifies which variant to use at each level of recursion. An optimal variant policy is chosen for each target computer via experimentation. With h levels of recursion, an exhaustive search requires (v h ) experiments (where v is the number of variants). We present a method based on dynamic programming that reduces this to (h c ) (where c is typically a small constant) experiments for a class of architecturecognizant programs. We verify our technique on two kernels (matrix multiply and 2D Point Jacobi) using three architectures. Our technique improves performance by up to a factor of two, compared...
Architecture Independent Massive Parallelization of DivideandConquer Algorithms
 Mathematics of Program Construction, Lecture Notes in Computer Science 947
, 1995
"... . We present a strategy to develop, in a functional setting, correct, efficient and portable DivideandConquer (DC) programs for massively parallel architectures. Starting from an operational DC program, mapping sequences to sequences, we apply a set of semantics preserving transformation rules, wh ..."
Abstract

Cited by 9 (1 self)
 Add to MetaCart
. We present a strategy to develop, in a functional setting, correct, efficient and portable DivideandConquer (DC) programs for massively parallel architectures. Starting from an operational DC program, mapping sequences to sequences, we apply a set of semantics preserving transformation rules, which transform the parallel control structure of DC into a sequential control flow, thereby making the implicit data parallelism in a DC scheme explicit. In the next phase of our strategy, the parallel architecture is fully expressed, where `architecture dependent' higherorder functions are introduced. Then  due to the rising communication complexities on particular architectures  topology dependent communication patterns are optimized in order to reduce the overall communication costs. The advantages of this approach are manifold and are demonstrated with a set of nontrivial examples. 1 Introduction It is wellknown that the main problems in exploiting the power of modern parallel sys...
Formal Derivation of DivideandConquer Programs: A Case Study in the Multidimensional FFT's
 Formal Methods for Parallel Programming: Theory and Applications. Workshop at IPPS'97
, 1997
"... This paper reports a case study in the development of parallel programs in the BirdMeertens formalism (BMF), starting from divideandconquer algorithm specifications. The contribution of the paper is twofold: (1) we classify divideandconquer algorithms and formally derive a parameterized family ..."
Abstract

Cited by 5 (3 self)
 Add to MetaCart
This paper reports a case study in the development of parallel programs in the BirdMeertens formalism (BMF), starting from divideandconquer algorithm specifications. The contribution of the paper is twofold: (1) we classify divideandconquer algorithms and formally derive a parameterized family of parallel implementations for an important subclass of divideandconquer, called DH (distributable homomorphisms); (2) we systematically adjust the mathematical specification of the Fast Fourier Transform (FFT) to the DH format and thereby obtain a generic SPMD program, well suited for implementation under MPI. The target program includes the efficient FFT solutions used in practice the binaryexchange and the 2D and 3Dtranspose implementations as its special cases.
Massive parallelization of divideandconquer algorithms over powerlists. Science of Computer Programming, 26:5978
 In 4th Principles and Practice of Parallel Programming
, 1996
"... It contains all proofs of the introduced transformation rules as well as programming examples on a SIMD computer. ..."
Abstract

Cited by 4 (0 self)
 Add to MetaCart
It contains all proofs of the introduced transformation rules as well as programming examples on a SIMD computer.
Parallelization of DivideandConquer in the BirdMeertens Formalism
, 1995
"... . An SPMD parallel implementation schema for divideandconquer specifications is proposed and derived by formal refinement (transformation) of the specification. The specification is in the form of a mutually recursive functional definition. In a first phase, a parallel functional program schema is ..."
Abstract

Cited by 4 (0 self)
 Add to MetaCart
. An SPMD parallel implementation schema for divideandconquer specifications is proposed and derived by formal refinement (transformation) of the specification. The specification is in the form of a mutually recursive functional definition. In a first phase, a parallel functional program schema is constructed which consists of a communication tree and a functional program that is shared by all nodes of the tree. The fact that this phase proceeds by semanticspreserving transformations in the BirdMeertens formalism of higherorder functions guarantees the correctness of the resulting functional implementation. A second phase yields an imperative distributed messagepassing implementation of this schema. The derivation process is illustrated with an example: a twodimensional numerical integration algorithm. 1. Introduction One of the main problems in exploiting modern multiprocessor systems is how to develop correct and efficient programs for them. We address this problem using the ap...
Parallel Divide and Conquer on Meshes
 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS
, 1996
"... We address the problem of mapping divideandconquer programs to mesh connected multicomputers with wormhole or storeandforward routing. We propose the binomial tree as an efficient model of parallel divideandconquer and present two mappings of the binomial tree to the 2D mesh. Our mappings expl ..."
Abstract

Cited by 3 (0 self)
 Add to MetaCart
We address the problem of mapping divideandconquer programs to mesh connected multicomputers with wormhole or storeandforward routing. We propose the binomial tree as an efficient model of parallel divideandconquer and present two mappings of the binomial tree to the 2D mesh. Our mappings exploit regularity in the communication structure of the divideandconquer computation and are also sensitive to the underlying flow control scheme of the target architecture. We evaluate these mappings using new metrics which are extensions of the classical notions of dilation and contention. We introduce the notion of communication slowdown as a measure of the total communication overhead incurred by a parallel computation. We conclude that significant performance gains can be realized when the mapping is sensitive to the flow control scheme of the target architecture.
A personal, historical perspective of parallel programming for high performance
 CommunicationBased Systems (CBS 2000
, 2000
"... ..."
Formal Derivation and Implementation of DivideandConquer on a Transputer Network
 Transputer Applications and Systems '94
, 1994
"... This paper considers parallel program development based on functional mutually recursive specifications. The development yields a communication structure linking an arbitrary fixed number of processors and an SPMD program executable on the structure. There are two steps in the development proces ..."
Abstract

Cited by 2 (2 self)
 Add to MetaCart
This paper considers parallel program development based on functional mutually recursive specifications. The development yields a communication structure linking an arbitrary fixed number of processors and an SPMD program executable on the structure. There are two steps in the development process: first, a parallel functional implementation is obtained through formal transformations in the BirdMeertens formalism; it is then systematically transformed into an imperative target program with message passing. The approach is illustrated with a divideandconquer algorithm for numerical twodimensional sparse grid integration. The optimization of the target program and the results of experimental performance measurements on a 64transputer network under OS Parix are presented. 1 Introduction We take the following approach to parallelization: we try to identify certain standard patterns of highlevel functional specifications and to associate equivalent parallel programs to them...