Results 1–10 of 19
Systematic Efficient Parallelization of Scan and Other List Homomorphisms
 In Annual European Conference on Parallel Processing, LNCS 1124
, 1996
Abstract

Cited by 26 (7 self)
Homomorphisms are functions which can be parallelized by the divide-and-conquer paradigm. A class of distributable homomorphisms (DH) is introduced and an efficient parallel implementation schema for all functions of the class is derived by transformations in the Bird-Meertens formalism. The schema can be directly mapped onto the hypercube with an unlimited or an arbitrary fixed number of processors, providing provable correctness and predictable performance. The popular scan function (parallel prefix) illustrates the presentation: the systematically derived implementation for scan coincides with the practically used "folklore" algorithm for distributed-memory machines.
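To make the homomorphism idea concrete: a list homomorphism h satisfies h (xs ++ ys) = h xs ⊕ h ys for some associative ⊕, which is exactly what licenses divide-and-conquer evaluation. A minimal sequential sketch (the names and the naive splitting strategy are illustrative, not the paper's DH schema):

```haskell
-- Evaluate a list homomorphism by divide-and-conquer: 'combine' is the
-- associative operator, 'f' handles singletons, 'e' is the unit for [].
-- In a parallel setting the two recursive calls run on disjoint processors.
hom :: (b -> b -> b) -> (a -> b) -> b -> [a] -> b
hom _       _ e []  = e
hom _       f _ [x] = f x
hom combine f e xs  = hom combine f e ls `combine` hom combine f e rs
  where (ls, rs) = splitAt (length xs `div` 2) xs

-- Scan (parallel prefix) in the same pattern: each half's prefix results
-- are computed independently; the right half is then offset by the left
-- half's total.
scanHom :: (a -> a -> a) -> [a] -> [a]
scanHom _  []  = []
scanHom _  [x] = [x]
scanHom op xs  = ls' ++ map (op (last ls')) rs'
  where (ls, rs) = splitAt (length xs `div` 2) xs
        ls'      = scanHom op ls
        rs'      = scanHom op rs
```

For example, scanHom (+) computes the same inclusive prefix sums as scanl1 (+); the paper's contribution is to map this recursion systematically onto a hypercube topology.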
Optimization Rules for Programming with Collective Operations
 IPPS/SPDP'99. 13th Int. Parallel Processing Symp. & 10th Symp. on Parallel and Distributed Processing
, 1999
Abstract

Cited by 21 (6 self)
We study how several collective operations, such as broadcast, reduction, and scan, can be composed efficiently in complex parallel programs. Our specific contributions are: (1) a formal framework for reasoning about collective operations; (2) a set of optimization rules which save communication by fusing several collective operations into one; (3) performance estimates, which guide the application of the optimization rules depending on the machine characteristics; (4) a simple case study with the first results of machine experiments.
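One representative rule of this kind fuses a reduction followed by a broadcast into a single all-reduce, saving a full communication phase. A toy one-value-per-processor model makes the rule precise (all names are illustrative; the paper's framework and rule set are richer):

```haskell
-- Model a distributed value as a list with one element per processor.
type Distributed a = [a]

-- Reduction: combine all per-processor values; the result lives at a root.
reduceAll :: (a -> a -> a) -> Distributed a -> a
reduceAll = foldr1

-- Broadcast: the root's value is replicated to all p processors.
broadcast :: Int -> a -> Distributed a
broadcast p x = replicate p x

-- Fused collective: reduction result left on every processor, in one
-- collective operation instead of two.
allReduce :: (a -> a -> a) -> Distributed a -> Distributed a
allReduce op xs = broadcast (length xs) (reduceAll op xs)
```

The optimization rule is then the equation broadcast p (reduceAll op xs) = allReduce op xs, applied left to right; on a real machine the fused version needs roughly half the communication rounds.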
Program Development for Computational Grids Using Skeletons and Performance Prediction
 Letters
, 2002
Abstract

Cited by 13 (1 self)
We address the challenging problem of algorithm and program design for the Computational Grid by providing the application user with a set of high-level, parameterized components called skeletons. We describe a Java-based Grid programming system in which algorithms are composed of skeletons and the computational resources for executing individual skeletons are chosen using performance prediction. The advantage of our approach is that skeletons are reusable for different applications and that skeletons' implementations can be tuned to particular machines. The focus of this paper is on predicting performance for Grid applications constructed using skeletons.
Extracting and Implementing List Homomorphisms in Parallel Program Development
 Science of Computer Programming
, 1997
Abstract

Cited by 12 (0 self)
In this paper, we study functions called list homomorphisms, which represent a particular pattern of parallelism.
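A standard example of extracting a list homomorphism is maximum segment sum: the function itself is not compositional, but tupling it with the best prefix sum, best suffix sum, and total yields a homomorphism that can be evaluated by divide-and-conquer. A sketch of this common formulation (not necessarily the paper's exact derivation; empty segments are allowed, so the result is never negative):

```haskell
-- Each sublist is summarized by a quadruple:
--   (best segment sum, best prefix sum, best suffix sum, total sum).
-- The quadruple for xs ++ ys is computable from the two halves' quadruples,
-- which is exactly the homomorphism property.
mss :: [Int] -> Int
mss = sel . go
  where
    sel (m, _, _, _) = m
    go []  = (0, 0, 0, 0)
    go [x] = (max x 0, max x 0, max x 0, x)
    go xs  = combine (go ls) (go rs)
      where (ls, rs) = splitAt (length xs `div` 2) xs
    combine (m1, p1, s1, t1) (m2, p2, s2, t2) =
      ( maximum [m1, m2, s1 + p2]  -- best segment may straddle the split
      , max p1 (t1 + p2)           -- prefix may extend into the right half
      , max s2 (s1 + t2)           -- suffix may extend into the left half
      , t1 + t2 )
```

For instance, mss [-1, 2, 3, -4, 5] is 6, coming from the segment [2, 3, -4, 5].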
Comparing Parallel Functional Languages: Programming and Performance
, 2002
Abstract

Cited by 8 (2 self)
This paper presents a practical evaluation and comparison of three state-of-the-art parallel functional languages. The evaluation is based on implementations of three typical symbolic computation programs, with performance measured on a Beowulf-class parallel architecture. We assess ...
Tuning Task Granularity and Data Locality of Data Parallel GpH Programs
, 2001
Abstract

Cited by 7 (3 self)
The performance of data parallel programs often hinges on two key coordination aspects: the computational costs of the parallel tasks relative to their management overhead (task granularity), and the communication costs induced by the distance between tasks and their data (data locality). In data parallel programs, both granularity and locality can be improved by clustering, i.e., arranging for parallel tasks to operate on related subcollections of data.
Parallel Functional Programming At Two Levels Of Abstraction
, 2001
Abstract

Cited by 6 (2 self)
The parallel functional language Eden extends Haskell with expressions to define and instantiate process systems. These extensions also allow the easy definition of skeletons as higher-order functions. Parallel programming is possible in Eden at two levels: recursive programming and higher-order programming. At the lower level, processes are explicitly created using recursive definitions, allowing the definition of skeletons. This is novel, as most skeleton-based languages use an imperative language to create new skeletons. At the higher level, available skeletons are used to create applications or to define new skeletons on top of existing ones. In this paper, we present five skeletons, most of them well-known, covering a wide range of parallel structures. For each one, several Eden implementations are given, together with their corresponding cost models. Finally, examples of application programming are shown, including predicted and actual results on a Beowulf cluster.
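As an illustration of the skeleton idea (not Eden's actual library code), here is the sequential semantics of a simple farm skeleton: split the work into one chunk per worker, apply the worker function to every chunk, and recombine. In Eden, each chunk would be handed to a separately instantiated process:

```haskell
-- Sequential meaning of a farm with 'np' workers. In Eden the inner
-- 'map f' over each chunk would run in its own process; the function's
-- observable result is the same as a plain 'map f'.
farm :: Int -> (a -> b) -> [a] -> [b]
farm np f xs = concat (map (map f) (chunk size xs))
  where
    size = max 1 ((length xs + np - 1) `div` np)  -- ceiling division
    chunk _ [] = []
    chunk n ys = let (c, rest) = splitAt n ys in c : chunk n rest
```

The chunk size is the granularity knob: fewer, larger chunks mean less process-management overhead, at the cost of coarser load balancing.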
A Compiler for HDC
 Fakultät für Mathematik und Informatik
, 1999
Abstract

Cited by 5 (3 self)
We present a compiler for the functional language HDC, which aims at the generation of efficient code from high-level programs. HDC, which is syntactically a subset of the widely used language Haskell, facilitates the clean integration of skeletons with a predefined efficient parallel implementation into a functional program. Skeletons are higher-order functions which represent program schemata that can be specialized by providing customizing functions as parameters. The only restriction on customizing functions is their type. Skeletons can again be composed of skeletons. With HDC, we focus on the divide-and-conquer paradigm, which has a high potential for efficient parallelization. We describe the most important phases of the compiler: desugaring, elimination of higher-order functions, generation of an optimized directed acyclic graph, and code generation, with a focus on the integration of skeletons. The effect of the transformations on the target code is demonstrated on the examp...
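The divide-and-conquer pattern that HDC specializes can be captured as a single higher-order function whose customizing functions are constrained only by their types, as the abstract notes. A generic sketch with mergesort as the customizing instance (illustrative, not HDC's actual skeleton interface):

```haskell
-- Generic divide-and-conquer skeleton: 'trivial' tests for a base case,
-- 'solve' handles it, 'divide' splits a problem into subproblems, and
-- 'combine' merges the subresults (it may also inspect the original input).
dc :: (a -> Bool) -> (a -> b) -> (a -> [a]) -> (a -> [b] -> b) -> a -> b
dc trivial solve divide combine = go
  where go x | trivial x = solve x
             | otherwise = combine x (map go (divide x))

-- Mergesort as an instance: the four customizing functions are ordinary
-- list functions, and the subproblems in 'map go' are independent,
-- which is what the compiler exploits for parallelization.
msort :: [Int] -> [Int]
msort = dc ((<= 1) . length) id halves comb
  where
    halves xs = let (l, r) = splitAt (length xs `div` 2) xs in [l, r]
    comb _ [l, r] = merge l r
    comb _ _      = []
    merge [] ys = ys
    merge xs [] = xs
    merge (x:xs) (y:ys)
      | x <= y    = x : merge xs (y:ys)
      | otherwise = y : merge (x:xs) ys
```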
Formal Derivation of Divide-and-Conquer Programs: A Case Study in the Multidimensional FFT's
 Formal Methods for Parallel Programming: Theory and Applications. Workshop at IPPS'97
, 1997
Abstract

Cited by 5 (3 self)
This paper reports a case study in the development of parallel programs in the Bird-Meertens formalism (BMF), starting from divide-and-conquer algorithm specifications. The contribution of the paper is twofold: (1) we classify divide-and-conquer algorithms and formally derive a parameterized family of parallel implementations for an important subclass of divide-and-conquer, called DH (distributable homomorphisms); (2) we systematically adjust the mathematical specification of the Fast Fourier Transform (FFT) to the DH format and thereby obtain a generic SPMD program, well suited for implementation under MPI. The target program includes the efficient FFT solutions used in practice, the binary-exchange and the 2D- and 3D-transpose implementations, as its special cases.
From Transformations to Methodology in Parallel Program Development: A Case Study
 Microprocessing and Microprogramming
, 1996
Abstract

Cited by 4 (1 self)
The Bird-Meertens formalism (BMF) of higher-order functions over lists is a mathematical framework supporting the formal derivation of algorithms from functional specifications. This paper reports the results of a case study on the systematic use of BMF in the process of parallel program development. We develop a parallel program for polynomial multiplication, starting with a straightforward mathematical specification and arriving at a target processor topology together with a program for each of its processors. The development process is based on formal transformations; design decisions concerning data partitioning, processor interconnections, etc., are governed by formal type analysis and performance estimation rather than made ad hoc. The parallel target implementation is parameterized for an arbitrary number of processors; for each particular number, the target program is both time- and cost-optimal. We compare our results with systolic solutions to polynomial multiplication.
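The starting point of such a derivation is the straightforward specification itself; for polynomial multiplication over coefficient lists it can be written directly from the convolution formula. A naive O(n²) specification of the kind a BMF derivation would begin from (the name and representation are illustrative, not the paper's notation):

```haskell
-- Polynomials as coefficient lists, lowest degree first: [1,0,2] is 1 + 2x².
-- Result coefficient k is the sum of all products a_i * b_j with i + j == k
-- (the convolution of the two lists).
polyMult :: [Int] -> [Int] -> [Int]
polyMult as bs =
  [ sum [ a * b | (i, a) <- zip [0 ..] as
                , (j, b) <- zip [0 ..] bs
                , i + j == k ]
  | k <- [0 .. length as + length bs - 2] ]
```

For example, polyMult [1,1] [1,1] is [1,2,1], i.e. (1 + x)² = 1 + 2x + x². The derivation's job is to transform this specification into a distributed program where each processor computes a block of result coefficients.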