Powerlist: a structure for parallel recursion
ACM Transactions on Programming Languages and Systems, 1994
Cited by 59 (2 self)
Many data parallel algorithms – Fast Fourier Transform, Batcher’s sorting schemes and prefix sum – exhibit recursive structure. We propose a data structure, powerlist, that permits succinct descriptions of such algorithms, highlighting the roles of both parallelism and recursion. Simple algebraic properties of this data structure can be exploited to derive properties of these algorithms and establish equivalence of different algorithms that solve the same problem.
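
As a rough illustration of the idea, the sketch below models the two powerlist constructors known from the powerlist literature, tie (concatenating two equal-length halves) and zip (interleaving them), and defines list reversal once by each decomposition. Plain Python lists of power-of-two length stand in for powerlists, and all function names are hypothetical; this is not code from the paper.

```python
def untie(p):
    """Split p into its left and right halves (inverse of tie)."""
    half = len(p) // 2
    return p[:half], p[half:]


def unzip(p):
    """Split p into its even- and odd-indexed elements (inverse of zip)."""
    return p[0::2], p[1::2]


def zip_lists(u, v):
    """Interleave two equal-length lists: u0, v0, u1, v1, ..."""
    out = []
    for a, b in zip(u, v):
        out.extend([a, b])
    return out


def rev_tie(p):
    """Reversal by tie: rev(u | v) = rev(v) | rev(u)."""
    if len(p) == 1:
        return list(p)
    u, v = untie(p)
    return rev_tie(v) + rev_tie(u)


def rev_zip(p):
    """Reversal by zip: rev(u zip v) = rev(v) zip rev(u)."""
    if len(p) == 1:
        return list(p)
    u, v = unzip(p)
    return zip_lists(rev_zip(v), rev_zip(u))


# Both definitions should agree on any list whose length is a power of two.
sample = list(range(8))
assert rev_tie(sample) == rev_zip(sample)
```

In each definition the two recursive calls are independent of one another, which is where the parallelism would come from; the agreement of `rev_tie` and `rev_zip` is a small instance of the kind of equivalence the abstract refers to.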
Systematic Efficient Parallelization of Scan and Other List Homomorphisms
In Annual European Conference on Parallel Processing, LNCS 1124, 1996
Cited by 27 (7 self)
Homomorphisms are functions which can be parallelized by the divide-and-conquer paradigm. A class of distributable homomorphisms (DH) is introduced and an efficient parallel implementation schema for all functions of the class is derived by transformations in the Bird-Meertens formalism. The schema can be directly mapped on the hypercube with an unlimited or an arbitrary fixed number of processors, providing provable correctness and predictable performance. The popular scan function (parallel prefix) illustrates the presentation: the systematically derived implementation for scan coincides with the practically used "folklore" algorithm for distributed-memory machines.
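
To make the terminology concrete, here is a minimal Python sketch of the defining property of a list homomorphism, h(xs ++ ys) = h(xs) ⊕ h(ys), together with an inclusive scan computed by divide and conquer. It is only an illustration of the general idea: it is neither the DH schema nor the hypercube implementation derived in the paper, and the names `hom` and `scan_dc` are hypothetical.

```python
import operator


def hom(f, op, xs):
    """List homomorphism on a non-empty list.

    h([x]) = f(x) and h(xs ++ ys) = op(h(xs), h(ys)), with op associative.
    """
    if len(xs) == 1:
        return f(xs[0])
    mid = len(xs) // 2
    # The two recursive calls are independent and could run in parallel.
    return op(hom(f, op, xs[:mid]), hom(f, op, xs[mid:]))


def scan_dc(op, xs):
    """Inclusive scan of a non-empty list by divide and conquer."""
    if len(xs) == 1:
        return list(xs)
    mid = len(xs) // 2
    left = scan_dc(op, xs[:mid])
    right = scan_dc(op, xs[mid:])
    carry = left[-1]  # combined value of the whole left half
    return left + [op(carry, r) for r in right]


print(hom(lambda x: x, operator.add, [1, 2, 3, 4, 5]))  # sum as a homomorphism
print(scan_dc(operator.add, [1, 2, 3, 4, 5]))           # prefix sums
```

The carry is always applied on the left, so the sketch also works for associative operators that are not commutative, such as string concatenation.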
Parallelization of Divide-and-Conquer by Translation to Nested Loops
J. Functional Programming, 1997
Cited by 12 (6 self)
We propose a sequence of equational transformations and specializations which turns a divide-and-conquer skeleton in Haskell into a parallel loop nest in C. Our initial skeleton is often viewed as general divide-and-conquer. The specializations impose a balanced call tree, a fixed degree of the problem division, and elementwise operations. Our goal is to select parallel implementations of divide-and-conquer via a space-time mapping, which can be determined at compile time. The correctness of our transformations is proved by equational reasoning in Haskell; recursion and iteration are handled by induction. Finally, we demonstrate the practicality of the skeleton by expressing Strassen's matrix multiplication in it.
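
For readers unfamiliar with skeletons, the following Python sketch shows what a general divide-and-conquer skeleton can look like: a higher-order function parameterized by a base-case test, a base-case solver, a divide function and a combine function. The paper's skeleton is written in Haskell; the parameter names and the merge sort instance here are assumptions chosen for illustration only.

```python
import heapq


def dc(is_basic, basic, divide, combine, problem):
    """Generic divide-and-conquer skeleton (illustrative sketch)."""
    if is_basic(problem):
        return basic(problem)
    # The subproblems are independent, so this loop is the parallel part.
    results = [dc(is_basic, basic, divide, combine, sub)
               for sub in divide(problem)]
    return combine(problem, results)


def merge_sort(xs):
    """Merge sort expressed as an instance of the skeleton, with degree 2."""
    return dc(
        is_basic=lambda p: len(p) <= 1,
        basic=list,
        divide=lambda p: [p[:len(p) // 2], p[len(p) // 2:]],
        combine=lambda _, rs: list(heapq.merge(*rs)),
        problem=xs,
    )


print(merge_sort([5, 2, 9, 1, 5, 6]))
```

Fixing the division degree and keeping the call tree balanced, as the abstract describes, is what makes the shape of the recursion known in advance and therefore mappable to a loop nest.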
Extracting and Implementing List Homomorphisms in Parallel Program Development
Science of Computer Programming, 1997
Cited by 12 (0 self)
In this paper, we study functions called list homomorphisms, which represent a particular pattern of parallelism.
Derivation of Efficient Data Parallel Programs
In 17th Australasian Computer Science Conference, 1993
Cited by 6 (0 self)
This paper considers the expression and derivation of efficient data parallel programs for SIMD and MIMD machines. It is shown that efficient parallel programs must utilise both sequential and parallel computation; these are termed hybrid programs. The Bird-Meertens formalism, a calculus of higher order functions, is used to derive and express programs. Our goal is to derive efficient parallel programs for a variety of machines by: starting with an abstract specification, deriving an abstract algorithm and successively refining this to more efficient and machine dependent algorithms incorporating greater implementation detail. Nested data structures are used to express hybrid algorithms. Using this technique efficient accumulate (scan/parallel prefix) algorithms are derived for SIMD and MIMD machines. 1 Introduction The main reason for parallel programming is to achieve high performance. Unfortunately designing and writing efficient parallel programs, especially for MIMD machines, i...
List Processing Primitives for Parallel Computation
Computer Languages, 1993
Cited by 6 (0 self)
A new model of list processing is proposed which is more suitable as a basic data structure for architecture-independent programming languages than the traditional model of lists. Its main primitive functions are: concatenate, which concatenates two lists; split, which partitions a list into two parts; and length, which gives the number of elements in a list. This model contains a degree of nondeterminism which allows greater freedom to the implementation to achieve high performance on both parallel and serial architectures. Keywords: data structures, functional programming, list processing, parallel programming. 1 Introduction Lists have been used as basic data structures within programming languages since the 1950s. The most elegant and successful formulation was in Lisp [9] with its primitive functions car, cdr and cons, often now referred to by the more meaningful names of head, tail and cons respectively. Lisp and its model of list processing based on the head, tail and cons ...
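
The three primitives named in the abstract can be mimicked in a few lines of Python to show why a nondeterministic split is harmless for well-behaved functions. This is a sketch under stated assumptions: Python lists stand in for the paper's lists, split is assumed to return two non-empty parts for any list of length at least two, and the random choice of split point merely simulates the implementation's freedom. The function `total` is a hypothetical example, not taken from the paper.

```python
import random


def concatenate(xs, ys):
    """Join two lists into one."""
    return xs + ys


def length(xs):
    """Number of elements in the list."""
    return len(xs)


def split(xs):
    """Partition a list of length >= 2 into two non-empty parts.

    The split point is arbitrary; a random choice stands in for whatever
    the implementation finds convenient (assumption of this sketch).
    """
    k = random.randint(1, len(xs) - 1)
    return xs[:k], xs[k:]


def total(xs):
    """Sum a list using only split and length to take it apart."""
    if length(xs) == 0:
        return 0
    if length(xs) == 1:
        return xs[0]
    left, right = split(xs)
    # Addition is associative, so the result does not depend on the split.
    return total(left) + total(right)


print(total(list(range(10))))
```

A serial implementation could always split off a single element, while a parallel one could split near the middle; a function written this way produces the same result in both cases.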
Formal Derivation and Implementation of Divide-and-Conquer on a Transputer Network
Transputer Applications and Systems '94, 1994
Cited by 2 (2 self)
This paper considers parallel program development based on functional mutually recursive specifications. The development yields a communication structure linking an arbitrary fixed number of processors and an SPMD program executable on the structure. There are two steps in the development process: first, a parallel functional implementation is obtained through formal transformations in the Bird-Meertens formalism; it is then systematically transformed into an imperative target program with message passing. The approach is illustrated with a divide-and-conquer algorithm for numerical two-dimensional sparse grid integration. The optimization of the target program and the results of experimental performance measurements on a 64-transputer network under OS Parix are presented. 1 Introduction We take the following approach to parallelization: we try to identify certain standard patterns of high-level functional specifications and to associate equivalent parallel programs to them...
Notes on the Space-Time Mapping of Divide-and-Conquer Recursions
In GI/ITG FG PARS'95, number 14 in PARS Mitteilungen, 1995
Cited by 2 (2 self)
We propose a functional program skeleton for balanced fixed-degree divide-and-conquer and a method for its parallel implementation on message-passing multiprocessors. In the method, the operations of the skeleton are first mapped to a geometric computational model which is then mapped to space-time in order to expose the inherent parallelism. This approach is inspired by the method of parallelizing nested loops in the polytope model. Keywords: divide-and-conquer, functional programming, parallelization, polytope model, skeleton, space-time mapping. 1 Introduction The divide-and-conquer (D&C) paradigm is a special case of cascading recursion which enables efficient solutions to many practical problems like the multiplication of matrices or large integers, fast Fourier transform, sorting, etc. We are interested in the parallelization of D&C recursions with the goal of sublinear execution times on a mesh. Sublinearity can only be achieved if input data are read in parallel and pro...
A personal, historical perspective of parallel programming for high performance
Communication-Based Systems (CBS 2000), 2000