Results 1–10 of 11
Powerlist: a structure for parallel recursion
 ACM Transactions on Programming Languages and Systems
, 1994
Abstract

Cited by 59 (2 self)
Many data parallel algorithms – Fast Fourier Transform, Batcher’s sorting schemes and prefix-sum – exhibit recursive structure. We propose a data structure, powerlist, that permits succinct descriptions of such algorithms, highlighting the roles of both parallelism and recursion. Simple algebraic properties of this data structure can be exploited to derive properties of these algorithms and establish equivalence of different algorithms that solve the same problem.
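As a rough sketch (my own rendering, not Misra's actual notation), the powerlist constructors can be modeled on ordinary lists of length 2^n: `tie` concatenates two similar powerlists and `zipp` interleaves them. The `ps` function below gives the prefix-sum the abstract mentions, via a Ladner–Fischer-style powerlist recursion; all names are illustrative.

```haskell
-- A minimal powerlist sketch over ordinary lists of length 2^n.
-- Names (tie, zipp, unzipp, ps) are illustrative, not the paper's notation.

-- tie: concatenate two similar powerlists.
tie :: [a] -> [a] -> [a]
tie = (++)

-- zipp: interleave two similar powerlists element by element.
zipp :: [a] -> [a] -> [a]
zipp xs ys = concat [[x, y] | (x, y) <- zip xs ys]

-- unzipp: split a powerlist into its even- and odd-indexed halves.
unzipp :: [a] -> ([a], [a])
unzipp []       = ([], [])
unzipp [x]      = ([x], [])
unzipp (x:y:zs) = let (es, os) = unzipp zs in (x : es, y : os)

-- Prefix sum by powerlist recursion: scan the pairwise sums,
-- then interleave a right-shifted copy back in.
ps :: Num a => [a] -> [a]
ps [x] = [x]
ps xs  = zipp (zipWith (+) (shiftR t) p) t
  where (p, q)    = unzipp xs
        t         = ps (zipWith (+) p q)
        shiftR ys = 0 : init ys
```

In GHCi, `ps [1,2,3,4]` yields `[1,3,6,10]`, matching the sequential `scanl1 (+)`.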
Architecture-Cognizant Divide and Conquer Algorithms
, 1999
Abstract

Cited by 26 (5 self)
Divide and conquer programs can achieve good performance on parallel computers and computers with deep memory hierarchies. We introduce architecture-cognizant divide and conquer algorithms, and explore how they can achieve even better performance. An architecture-cognizant algorithm has functionally equivalent variants of the divide and/or combine functions, and a variant policy that specifies which variant to use at each level of recursion. An optimal variant policy is chosen for each target computer via experimentation. With h levels of recursion, an exhaustive search requires O(v^h) experiments (where v is the number of variants). We present a method based on dynamic programming that reduces this to O(h^c) (where c is typically a small constant) experiments for a class of architecture-cognizant programs. We verify our technique on two kernels (matrix multiply and 2D Point Jacobi) using three architectures. Our technique improves performance by up to a factor of two, compared...
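Under the simplifying assumption that the measured cost of a variant at one level is independent of the variants chosen at other levels, the policy search collapses to a single bottom-up pass over the levels. The sketch below illustrates that dynamic-programming idea only; the hypothetical `costOf` table stands in for the paper's per-level experiments, and all names are mine.

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)

-- Choose one variant per recursion level, minimizing total cost.
-- costOf level variant is assumed measured by per-level experiments;
-- cost independence across levels is a simplifying assumption here.
bestPolicy :: Int -> Int -> (Int -> Int -> Double) -> (Double, [Int])
bestPolicy h v costOf = foldr step (0, []) [0 .. h - 1]
  where
    -- extend the best policy for the deeper levels by the cheapest
    -- variant at this level
    step level (restCost, restPolicy) =
      minimumBy (comparing fst)
        [ (costOf level var + restCost, var : restPolicy)
        | var <- [0 .. v - 1] ]
```

With two variants where variant 1 is always cheaper, `bestPolicy 3 2 (\_ var -> if var == 1 then 1 else 2)` returns `(3.0, [1,1,1])` after only h·v cost lookups rather than v^h.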
Architecture Independent Massive Parallelization of Divide-and-Conquer Algorithms
 Mathematics of Program Construction, Lecture Notes in Computer Science 947
, 1995
Abstract

Cited by 9 (1 self)
We present a strategy to develop, in a functional setting, correct, efficient and portable Divide-and-Conquer (DC) programs for massively parallel architectures. Starting from an operational DC program, mapping sequences to sequences, we apply a set of semantics-preserving transformation rules, which transform the parallel control structure of DC into a sequential control flow, thereby making the implicit data parallelism in a DC scheme explicit. In the next phase of our strategy, the parallel architecture is fully expressed, where `architecture dependent' higher-order functions are introduced. Then, due to the rising communication complexities on particular architectures, topology-dependent communication patterns are optimized in order to reduce the overall communication costs. The advantages of this approach are manifold and are demonstrated with a set of non-trivial examples.
1 Introduction
It is well-known that the main problems in exploiting the power of modern parallel sys...
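The operational DC scheme such a derivation starts from can be captured as a single higher-order function; the generic skeleton below is my own illustration, not the paper's transformation rules. The `map` over subproblems is where the implicit data parallelism lives.

```haskell
-- A generic divide-and-conquer skeleton; the map over the subproblems
-- is the implicitly data-parallel step.
dc :: (a -> Bool)   -- is the input trivial?
   -> (a -> b)      -- solve a trivial input directly
   -> (a -> [a])    -- divide into subproblems
   -> ([b] -> b)    -- combine subresults
   -> a -> b
dc trivial solve divide combine = go
  where go x | trivial x = solve x
             | otherwise = combine (map go (divide x))

-- Mergesort as an instance of the skeleton.
msort :: Ord a => [a] -> [a]
msort = dc ((<= 1) . length) id halves combine2
  where
    halves xs = let (l, r) = splitAt (length xs `div` 2) xs in [l, r]
    combine2 [l, r] = merge l r
    combine2 parts  = concat parts
    merge [] ys = ys
    merge xs [] = xs
    merge (x:xs) (y:ys)
      | x <= y    = x : merge xs (y:ys)
      | otherwise = y : merge (x:xs) ys
```

Instances like `msort` map sequences to sequences, which is exactly the form the transformation rules in the abstract take as their starting point.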
Formal Derivation of Divide-and-Conquer Programs: A Case Study in the Multidimensional FFT's
 Formal Methods for Parallel Programming: Theory and Applications. Workshop at IPPS'97
, 1997
Abstract

Cited by 5 (3 self)
This paper reports a case study in the development of parallel programs in the Bird-Meertens formalism (BMF), starting from divide-and-conquer algorithm specifications. The contribution of the paper is twofold: (1) we classify divide-and-conquer algorithms and formally derive a parameterized family of parallel implementations for an important subclass of divide-and-conquer, called DH (distributable homomorphisms); (2) we systematically adjust the mathematical specification of the Fast Fourier Transform (FFT) to the DH format and thereby obtain a generic SPMD program, well suited for implementation under MPI. The target program includes the efficient FFT solutions used in practice – the binary-exchange and the 2D- and 3D-transpose implementations – as its special cases.
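The DH combinator can be sketched as follows (my own rendering on plain lists of length 2^n; names are illustrative). A DH combines the results of the two halves with one operator for the first half of the output and another for the second half, the butterfly pattern underlying FFT-like algorithms; instantiated with (+) and (-) it gives the Walsh–Hadamard transform.

```haskell
-- Distributable homomorphism (DH) sketch on lists of length 2^n:
-- the butterfly pattern underlying FFT-like algorithms.
dh :: (a -> a -> a) -> (a -> a -> a) -> [a] -> [a]
dh _ _ [x] = [x]
dh f g xs  = zipWith f u v ++ zipWith g u v
  where (l, r) = splitAt (length xs `div` 2) xs
        u = dh f g l
        v = dh f g r

-- The (unnormalized) Walsh–Hadamard transform as a DH instance.
wht :: Num a => [a] -> [a]
wht = dh (+) (-)
```

For example, `wht [1,1,1,1]` gives `[4,0,0,0]`. The FFT itself additionally threads twiddle factors through the two operators, which this sketch omits.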
Parallelization of Divide-and-Conquer in the Bird-Meertens Formalism
, 1995
Abstract

Cited by 4 (0 self)
An SPMD parallel implementation schema for divide-and-conquer specifications is proposed and derived by formal refinement (transformation) of the specification. The specification is in the form of a mutually recursive functional definition. In a first phase, a parallel functional program schema is constructed which consists of a communication tree and a functional program that is shared by all nodes of the tree. The fact that this phase proceeds by semantics-preserving transformations in the Bird-Meertens formalism of higher-order functions guarantees the correctness of the resulting functional implementation. A second phase yields an imperative distributed message-passing implementation of this schema. The derivation process is illustrated with an example: a two-dimensional numerical integration algorithm.
1. Introduction
One of the main problems in exploiting modern multiprocessor systems is how to develop correct and efficient programs for them. We address this problem using the ap...
Massive Parallelization of Divide-and-Conquer Algorithms over Powerlists
 Science of Computer Programming, 26:59–78
 In 4th Principles and Practice of Parallel Programming
, 1996
Abstract

Cited by 4 (0 self)
It contains all proofs of the introduced transformation rules as well as programming examples on a SIMD computer.
Formal Derivation and Implementation of Divide-and-Conquer on a Transputer Network
 Transputer Applications and Systems '94
, 1994
Abstract

Cited by 2 (2 self)
This paper considers parallel program development based on functional mutually recursive specifications. The development yields a communication structure linking an arbitrary fixed number of processors and an SPMD program executable on the structure. There are two steps in the development process: first, a parallel functional implementation is obtained through formal transformations in the Bird-Meertens formalism; it is then systematically transformed into an imperative target program with message passing. The approach is illustrated with a divide-and-conquer algorithm for numerical two-dimensional sparse grid integration. The optimization of the target program and the results of experimental performance measurements on a 64-transputer network under OS Parix are presented.
1 Introduction
We take the following approach to parallelization: we try to identify certain standard patterns of high-level functional specifications and to associate equivalent parallel programs to them...
Abstraction and Performance in the Design of Parallel Programs
, 1997
Abstract

Cited by 2 (0 self)
Abstraction and Performance in the Design of Parallel Programs. Summary of publications, submitted to the Faculty of Mathematics and Computer Science of the University of Passau to obtain the venia legendi, by Dr. Sergei Gorlatch, Passau, July 1997.
Contents: 1 Introduction; 2 Outline of the SAT Approach (2.1 Performance View; 2.2 Abstraction View; 2.3 Design in SAT: Stages and Transformations; 2.4 SAT and Homomorphisms); 3 List Homomorphisms; 4 Extraction and Adjustment (4.1 The CS-Method; 4.2 Mechanizing the CS-Method; 4.3 Almost-Homomorphisms: the MSS Problem); 5 Composition of Homomorphisms (5.1 Rules of Composition; 5.2 Derivation by Transformation...)
A personal, historical perspective of parallel programming for high performance
 Communication-Based Systems (CBS 2000)
, 2000
Compress-and-Conquer for Optimal Multicore Computing
Abstract
 Add to MetaCart
We propose a programming paradigm called compress-and-conquer (CC) that leads to optimal performance on multicore platforms. Given a multicore system of p cores and a problem of size n, the problem is first reduced to p smaller problems, each of which can be solved independently of the others (the compression phase). From the solutions to the p problems, a compressed version of the same problem of size O(p) is deduced and solved (the global phase). The solution to the original problem is then derived from the solution to the compressed problem together with the solutions of the smaller problems (the expansion phase). The CC paradigm reduces the complexity of multicore programming by allowing the best-known sequential algorithm for a problem to be used in each of the three phases. In this paper we apply the CC paradigm to a range of problems including scan, nested scan, difference equations, banded linear systems, and linear tridiagonal systems. The performance of CC programs is analyzed, and their optimality and linear speedup are proven. Characteristics of the problem space subject to CC are formally examined, and we show that its computational power subsumes that of scan, nested scan, and MapReduce. The CC paradigm has been implemented in Haskell as a modular, higher-order function, whose constituent functions can be shared by seemingly unrelated problems. This function is compiled into low-level Haskell threads that run on a multicore machine, and performance benchmarks confirm the theoretical analysis.
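The three phases can be illustrated on the scan example. The following is a hedged sequential sketch with illustrative names (`ccScan`, `chunkify`), not the paper's implementation; a real CC program would run the compression and expansion phases in parallel across the p cores.

```haskell
-- Compress-and-conquer scan sketch: compression (independent local
-- scans), global phase (a scan over the p chunk totals, a problem of
-- size O(p)), and expansion (rebasing each local scan).
ccScan :: (a -> a -> a) -> Int -> [a] -> [a]
ccScan op p xs = concat (zipWith rebase offsets localScans)
  where
    chunks     = chunkify p xs
    -- compression: each chunk is scanned independently
    localScans = map (scanl1 op) chunks
    -- global: scan the chunk totals; Nothing marks "no preceding chunk"
    offsets    = Nothing : map Just (scanl1 op (map last localScans))
    -- expansion: shift each local scan by the total of preceding chunks
    rebase Nothing  ys = ys
    rebase (Just o) ys = map (op o) ys

-- Split a list into at most p nearly equal chunks.
chunkify :: Int -> [a] -> [[a]]
chunkify p xs = go xs
  where size  = max 1 ((length xs + p - 1) `div` p)
        go [] = []
        go ys = let (c, rest) = splitAt size ys in c : go rest
```

For any associative `op`, `ccScan op p` agrees with the sequential `scanl1 op`; the Maybe-tagged offsets avoid requiring an identity element for `op`.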