Results 1  10
of
19
An Accumulative Parallel Skeleton for All
, 2001
"... Parallel skeletons intend to encourage programmers to build... ..."
Abstract

Cited by 14 (11 self)
 Add to MetaCart
Parallel skeletons intend to encourage programmers to build...
Parallelization of DivideandConquer by Translation to Nested Loops
 J. Functional Programming
, 1997
"... We propose a sequence of equational transformations and specializations which turns a divideandconquer skeleton in Haskell into a parallel loop nest in C. Our initial skeleton is often viewed as general divideandconquer. The specializations impose a balanced call tree, a fixed degree of the prob ..."
Abstract

Cited by 13 (7 self)
 Add to MetaCart
(Show Context)
We propose a sequence of equational transformations and specializations which turns a divideandconquer skeleton in Haskell into a parallel loop nest in C. Our initial skeleton is often viewed as general divideandconquer. The specializations impose a balanced call tree, a fixed degree of the problem division, and elementwise operations. Our goal is to select parallel implementations of divideandconquer via a spacetime mapping, which can be determined at compile time. The correctness of our transformations is proved by equational reasoning in Haskell; recursion and iteration are handled by induction. Finally, we demonstrate the practicality of the skeleton by expressing Strassen's matrix multiplication in it.
Diffusion: Calculating Efficient Parallel Programs
 IN 1999 ACM SIGPLAN WORKSHOP ON PARTIAL EVALUATION AND SEMANTICSBASED PROGRAM MANIPULATION (PEPM ’99
, 1999
"... Parallel primitives (skeletons) intend to encourage programmers to build a parallel program from readymade components for which efficient implementations are known to exist, making the parallelization process easier. However, programmers often suffer from the difficulty to choose a combination of p ..."
Abstract

Cited by 9 (7 self)
 Add to MetaCart
Parallel primitives (skeletons) intend to encourage programmers to build a parallel program from readymade components for which efficient implementations are known to exist, making the parallelization process easier. However, programmers often suffer from the difficulty to choose a combination of proper parallel primitives so as to construct efficient parallel programs. To overcome this difficulty, we shall propose a new transformation, called diffusion, which can efficiently decompose a recursive definition into several functions such that each function can be described by some parallel primitive. This allows programmers to describe algorithms in a more natural recursive form. We demonstrate our idea with several interesting examples. Our diffusion transformation should be significant not only in development of new parallel algorithms, but also in construction of parallelizing compilers.
Optimizing Compositions of Scans and Reductions in Parallel Program Derivation
, 1997
"... Introduction We study two popular programming schemas: scan (also known as prefix sums, parallel prefix, etc.) and reduction (also known as fold). Originally from the functional world [3], they are becoming increasingly popular as primitives of parallel programming. The reasons are that, first, such ..."
Abstract

Cited by 9 (2 self)
 Add to MetaCart
(Show Context)
Introduction We study two popular programming schemas: scan (also known as prefix sums, parallel prefix, etc.) and reduction (also known as fold). Originally from the functional world [3], they are becoming increasingly popular as primitives of parallel programming. The reasons are that, first, such higherorder combinators are adequate and useful for a broad class of applications [4], second, they encourage wellstructured, coarsegrained parallel programming and, third, their implementation in the MPI standard [14] makes the target programs portable across different parallel architectures with predictable performance. Our contributions are as follows:  We formally prove two optimization rules: the first rule transforms a sequential composition of scan and reduction into a single reduction, the second rule transforms a composition of two scans into a single scan.  We apply the first rule in the formal derivation of a parallel algorithm for the
Parallelizing Functional Programs by Generalization
 Journal of Functional Programming
, 1997
"... List homomorphisms are functions that are parallelizable using the divideandconquer paradigm. We study the problem of finding a homomorphic representation of a given function, based on the BirdMeertens theory of lists. A previous work proved that to each pair of leftward and rightward sequential ..."
Abstract

Cited by 9 (1 self)
 Add to MetaCart
List homomorphisms are functions that are parallelizable using the divideandconquer paradigm. We study the problem of finding a homomorphic representation of a given function, based on the BirdMeertens theory of lists. A previous work proved that to each pair of leftward and rightward sequential representations of a function, based on cons and snoclists, respectively, there is also a representation as a homomorphism. Our contribution is a mechanizable method to extract the homomorphism representation from a pair of sequential representations. The method is decomposed to a generalization problem and an inductive claim, both solvable by term rewriting techniques. To solve the former we present a sound generalization procedure which yields the required representation, and terminates under reasonable assumptions. We illustrate the method and the procedure by the parallelization of the scanfunction (parallel prefix). The inductive claim is provable automatically.
(De)Composition Rules for Parallel Scan and Reduction
 In Proc. 3rd Int. Working Conf. on Massively Parallel Programming Models (MPPM'97
, 1998
"... We study the use of welldefined building blocks for SPMD programming of machines with distributed memory. Our general framework is based on homomorphisms, functions that capture the idea of dataparallelism and have a close correspondence with collective operations of the MPI standard, e.g., scan an ..."
Abstract

Cited by 9 (1 self)
 Add to MetaCart
(Show Context)
We study the use of welldefined building blocks for SPMD programming of machines with distributed memory. Our general framework is based on homomorphisms, functions that capture the idea of dataparallelism and have a close correspondence with collective operations of the MPI standard, e.g., scan and reduction. We prove two composition rules: under certain conditions, a composition of a scan and a reduction can be transformed into one reduction, and a composition of two scans into one scan. As an example of decomposition, we transform a segmented reduction into a composition of partial reduction and allgather. The performance gain and overhead of the proposed composition and decomposition rules are assessed analytically for the hypercube and compared with the estimates for some other parallel models.
Automatic Inversion Generates DivideandConquer Parallel Programs
"... Divideandconquer algorithms are suitable for modern parallel machines, tending to have large amounts of inherent parallelism and working well with caches and deep memory hierarchies. Among others, list homomorphisms are a class of recursive functions on lists, which match very well with the divide ..."
Abstract

Cited by 7 (5 self)
 Add to MetaCart
(Show Context)
Divideandconquer algorithms are suitable for modern parallel machines, tending to have large amounts of inherent parallelism and working well with caches and deep memory hierarchies. Among others, list homomorphisms are a class of recursive functions on lists, which match very well with the divideandconquer paradigm. However, direct programming with list homomorphisms is a challenge for many programmers. In this paper, we propose and implement a novel system that can automatically derive costoptimal list homomorphisms from a pair of sequential programs, based on the third homomorphism theorem. Our idea is to reduce extraction of list homomorphisms to derivation of weak right inverses. We show that a weak right inverse always exists and can be automatically generated from a wide class of sequential programs. We demonstrate our system with several nontrivial examples, including the maximum prefix sum problem, the prefix sum computation, the maximum segment sum problem, and the lineofsight problem. The experimental results show practical efficiency of our automatic parallelization algorithm and good speedups of the generated parallel programs.
Practical Parallel DivideandConquer Algorithms
, 1997
"... Nested data parallelism has been shown to be an important feature of parallel languages, allowing the concise expression of algorithms that operate on irregular data structures such as graphs and sparse matrices. However, previous nested dataparallel languages have relied on a vector PRAM impleme ..."
Abstract

Cited by 6 (2 self)
 Add to MetaCart
Nested data parallelism has been shown to be an important feature of parallel languages, allowing the concise expression of algorithms that operate on irregular data structures such as graphs and sparse matrices. However, previous nested dataparallel languages have relied on a vector PRAM implementation layer that cannot be efficiently mapped to MPPs with high interprocessor latency. This thesis shows that by restricting the problem set to that of dataparallel divideandconquer algorithms I can maintain the expressibility of full nested dataparallel languages while achieving good efficiency on current distributedmemory machines. Specifically, I define
Formal Derivation of DivideandConquer Programs: A Case Study in the Multidimensional FFT's
 Formal Methods for Parallel Programming: Theory and Applications. Workshop at IPPS'97
, 1997
"... This paper reports a case study in the development of parallel programs in the BirdMeertens formalism (BMF), starting from divideandconquer algorithm specifications. The contribution of the paper is twofold: (1) we classify divideandconquer algorithms and formally derive a parameterized family ..."
Abstract

Cited by 5 (3 self)
 Add to MetaCart
(Show Context)
This paper reports a case study in the development of parallel programs in the BirdMeertens formalism (BMF), starting from divideandconquer algorithm specifications. The contribution of the paper is twofold: (1) we classify divideandconquer algorithms and formally derive a parameterized family of parallel implementations for an important subclass of divideandconquer, called DH (distributable homomorphisms); (2) we systematically adjust the mathematical specification of the Fast Fourier Transform (FFT) to the DH format and thereby obtain a generic SPMD program, well suited for implementation under MPI. The target program includes the efficient FFT solutions used in practice the binaryexchange and the 2D and 3Dtranspose implementations as its special cases.
Parallelizing Functional Programs by Term Rewriting
, 1997
"... List homomorphisms are functions that can be computed in parallel using the divideandconquer paradigm. We study the problem of finding a homomorphic representation of a given function, based on the BirdMeertens theory of lists. A previous work proved that to each pair of leftward and rightward se ..."
Abstract

Cited by 2 (2 self)
 Add to MetaCart
(Show Context)
List homomorphisms are functions that can be computed in parallel using the divideandconquer paradigm. We study the problem of finding a homomorphic representation of a given function, based on the BirdMeertens theory of lists. A previous work proved that to each pair of leftward and rightward sequential representations of a function, based on cons and snoclists, respectively, there is also a representation as a homomorphism. Our contribution is a mechanizable method to extract the homomorphism representation from a pair of sequential representations. The method is decomposed to a generalization problem and an inductive claim, both solvable by term rewriting techniques. To solve the former we present a sound generalization procedure which yields the required representation, and terminates under reasonable assumptions. We illustrate the method and the procedure by the parallelization of the scanfunction (parallel prefix). The inductive claim is provable automatically. Keywords: P...