Results 1 - 10
of
14
Parallelization of Divide-and-Conquer by Translation to Nested Loops
- J. Functional Programming
, 1997
"... We propose a sequence of equational transformations and specializations which turns a divide-and-conquer skeleton in Haskell into a parallel loop nest in C. Our initial skeleton is often viewed as general divide-and-conquer. The specializations impose a balanced call tree, a fixed degree of the prob ..."
Abstract
-
Cited by 12 (6 self)
- Add to MetaCart
We propose a sequence of equational transformations and specializations which turns a divide-and-conquer skeleton in Haskell into a parallel loop nest in C. Our initial skeleton is often viewed as general divide-and-conquer. The specializations impose a balanced call tree, a fixed degree of the problem division, and elementwise operations. Our goal is to select parallel implementations of divide-and-conquer via a space-time mapping, which can be determined at compile time. The correctness of our transformations is proved by equational reasoning in Haskell; recursion and iteration are handled by induction. Finally, we demonstrate the practicality of the skeleton by expressing Strassen's matrix multiplication in it.
An Accumulative Parallel Skeleton for All
, 2001
"... Parallel skeletons intend to encourage programmers to build... ..."
Abstract
-
Cited by 11 (9 self)
- Add to MetaCart
Parallel skeletons intend to encourage programmers to build...
Optimizing Compositions of Scans and Reductions in Parallel Program Derivation
, 1997
"... Introduction We study two popular programming schemas: scan (also known as prefix sums, parallel prefix, etc.) and reduction (also known as fold). Originally from the functional world [3], they are becoming increasingly popular as primitives of parallel programming. The reasons are that, first, such ..."
Abstract
-
Cited by 9 (2 self)
- Add to MetaCart
Introduction We study two popular programming schemas: scan (also known as prefix sums, parallel prefix, etc.) and reduction (also known as fold). Originally from the functional world [3], they are becoming increasingly popular as primitives of parallel programming. The reasons are that, first, such higher-order combinators are adequate and useful for a broad class of applications [4], second, they encourage well-structured, coarse-grained parallel programming and, third, their implementation in the MPI standard [14] makes the target programs portable across different parallel architectures with predictable performance. Our contributions are as follows: -- We formally prove two optimization rules: the first rule transforms a sequential composition of scan and reduction into a single reduction, the second rule transforms a composition of two scans into a single scan. -- We apply the first rule in the formal derivation of a parallel algorithm for the
(De)Composition Rules for Parallel Scan and Reduction
- In Proc. 3rd Int. Working Conf. on Massively Parallel Programming Models (MPPM'97
, 1998
"... We study the use of well-defined building blocks for SPMD programming of machines with distributed memory. Our general framework is based on homomorphisms, functions that capture the idea of dataparallelism and have a close correspondence with collective operations of the MPI standard, e.g., scan an ..."
Abstract
-
Cited by 8 (1 self)
- Add to MetaCart
We study the use of well-defined building blocks for SPMD programming of machines with distributed memory. Our general framework is based on homomorphisms, functions that capture the idea of dataparallelism and have a close correspondence with collective operations of the MPI standard, e.g., scan and reduction. We prove two composition rules: under certain conditions, a composition of a scan and a reduction can be transformed into one reduction, and a composition of two scans into one scan. As an example of decomposition, we transform a segmented reduction into a composition of partial reduction and all-gather. The performance gain and overhead of the proposed composition and decomposition rules are assessed analytically for the hypercube and compared with the estimates for some other parallel models.
Diffusion: Calculating Efficient Parallel Programs
- IN 1999 ACM SIGPLAN WORKSHOP ON PARTIAL EVALUATION AND SEMANTICS-BASED PROGRAM MANIPULATION (PEPM ’99
, 1999
"... Parallel primitives (skeletons) intend to encourage programmers to build a parallel program from ready-made components for which efficient implementations are known to exist, making the parallelization process easier. However, programmers often suffer from the difficulty to choose a combination of p ..."
Abstract
-
Cited by 8 (7 self)
- Add to MetaCart
Parallel primitives (skeletons) intend to encourage programmers to build a parallel program from ready-made components for which efficient implementations are known to exist, making the parallelization process easier. However, programmers often suffer from the difficulty to choose a combination of proper parallel primitives so as to construct efficient parallel programs. To overcome this difficulty, we shall propose a new transformation, called diffusion, which can efficiently decompose a recursive definition into several functions such that each function can be described by some parallel primitive. This allows programmers to describe algorithms in a more natural recursive form. We demonstrate our idea with several interesting examples. Our diffusion transformation should be significant not only in development of new parallel algorithms, but also in construction of parallelizing compilers.
Parallelizing Functional Programs by Generalization
- Journal of Functional Programming
, 1997
"... List homomorphisms are functions that are parallelizable using the divide-and-conquer paradigm. We study the problem of finding a homomorphic representation of a given function, based on the Bird-Meertens theory of lists. A previous work proved that to each pair of leftward and rightward sequential ..."
Abstract
-
Cited by 7 (1 self)
- Add to MetaCart
List homomorphisms are functions that are parallelizable using the divide-and-conquer paradigm. We study the problem of finding a homomorphic representation of a given function, based on the Bird-Meertens theory of lists. A previous work proved that to each pair of leftward and rightward sequential representations of a function, based on cons- and snoc-lists, respectively, there is also a representation as a homomorphism. Our contribution is a mechanizable method to extract the homomorphism representation from a pair of sequential representations. The method is decomposed to a generalization problem and an inductive claim, both solvable by term rewriting techniques. To solve the former we present a sound generalization procedure which yields the required representation, and terminates under reasonable assumptions. We illustrate the method and the procedure by the parallelization of the scan-function (parallel prefix). The inductive claim is provable automatically.
Practical Parallel Divide-and-Conquer Algorithms
, 1997
"... Nested data parallelism has been shown to be an important feature of parallel languages, allowing the concise expression of algorithms that operate on irregular data structures such as graphs and sparse matrices. However, previous nested dataparallel languages have relied on a vector PRAM impleme ..."
Abstract
-
Cited by 5 (2 self)
- Add to MetaCart
Nested data parallelism has been shown to be an important feature of parallel languages, allowing the concise expression of algorithms that operate on irregular data structures such as graphs and sparse matrices. However, previous nested dataparallel languages have relied on a vector PRAM implementation layer that cannot be efficiently mapped to MPPs with high inter-processor latency. This thesis shows that by restricting the problem set to that of data-parallel divide-and-conquer algorithms I can maintain the expressibility of full nested data-parallel languages while achieving good efficiency on current distributed-memory machines. Specifically, I define
Formal Derivation of Divide-and-Conquer Programs: A Case Study in the Multidimensional FFT's
- Formal Methods for Parallel Programming: Theory and Applications. Workshop at IPPS'97
, 1997
"... This paper reports a case study in the development of parallel programs in the Bird-Meertens formalism (BMF), starting from divide-and-conquer algorithm specifications. The contribution of the paper is two-fold: (1) we classify divide-and-conquer algorithms and formally derive a parameterized family ..."
Abstract
-
Cited by 4 (3 self)
- Add to MetaCart
This paper reports a case study in the development of parallel programs in the Bird-Meertens formalism (BMF), starting from divide-and-conquer algorithm specifications. The contribution of the paper is two-fold: (1) we classify divide-and-conquer algorithms and formally derive a parameterized family of parallel implementations for an important subclass of divide-and-conquer, called DH (distributable homomorphisms); (2) we systematically adjust the mathematical specification of the Fast Fourier Transform (FFT) to the DH format and thereby obtain a generic SPMD program, well suited for implementation under MPI. The target program includes the efficient FFT solutions used in practice the binary-exchange and the 2D- and 3D-transpose implementations as its special cases.
Parallel Implementations of Combinations of Broadcast, Reduction and Scan
- Proc. 2nd Int. Workshop on Software Engineering for Parallel and Distributed Systems (PDSE'97
, 1997
"... Broadcast, Reduction and Scan are popular functional skeletons which are used in distributed algorithms to distribute and gather data. We derive new parallel implementations of combinations of Broadcast, Reduction and Scan via a tabular classification of linearly recursive functions. The trick in th ..."
Abstract
-
Cited by 2 (2 self)
- Add to MetaCart
Broadcast, Reduction and Scan are popular functional skeletons which are used in distributed algorithms to distribute and gather data. We derive new parallel implementations of combinations of Broadcast, Reduction and Scan via a tabular classification of linearly recursive functions. The trick in the derivation is to not simply combine the individual parallel implementations of Broadcast, Reduction and Scan, but to transform these combinations to skeletons with a better performance. These skeletons are also linearly recursive. Keywords: functional programming, linear recursion, parallelization, skeletons 1. Introduction Functional programming offers a very high-level approach to specifying executable problem solutions. For example, the scheme of linear recursion can be expressed concisely as a higher-order function. In the data-parallel world, higher-order functions are used which represent classes of parallel algorithms on data structures; these higher-order functions are also call...
Towards polytypic parallel programming
, 1998
"... Data parallelism is currently one of the most successful models for programming massively parallel computers. The central idea is to evaluate a uniform collection of data in parallel by simultaneously manipulating each data element in the collection. Despite many of its promising features, the curre ..."
Abstract
-
Cited by 2 (2 self)
- Add to MetaCart
Data parallelism is currently one of the most successful models for programming massively parallel computers. The central idea is to evaluate a uniform collection of data in parallel by simultaneously manipulating each data element in the collection. Despite many of its promising features, the current approach suffers from two problems. First, the main parallel data structures that most data parallel languages currently support are restricted to simple collection data types like lists, arrays or similar structures. But other useful data structures like trees have not been well addressed. Second, parallel programming relies on a set of parallel primitives that capture parallel skeletons of interest. However, these primitives are not well structured, and efficient parallel programming with these primitives is difficult. In this paper, we propose a polytypic framework for developing efficient parallel programs on most data structures. We showhow a set of polytypic parallel primitives can be formally defined for manipulating most data structures, how these primitives can be successfully structured into a uniform recursive definition, and how an efficient combination of primitives can be derived from a naive specification program. Our framework should be significant not only in development of new parallel algorithms, but also in construction of parallelizing compilers.

