Results 1 
4 of
4
The Loop Parallelizer LooPo
 Proc. Sixth Workshop on Compilers for Parallel Computers, volume 21 of Konferenzen des Forschungszentrums Jülich
, 1996
"... . We report on a prototype for testing different methods of spacetime mapping loop nests. LooPo admits perfect or imperfect loop nests in a number of imperative languages, takes data dependences from the user or derives them itself from the source code, provides a choice of strategies for sched ..."
Abstract

Cited by 11 (2 self)
 Add to MetaCart
. We report on a prototype for testing different methods of spacetime mapping loop nests. LooPo admits perfect or imperfect loop nests in a number of imperative languages, takes data dependences from the user or derives them itself from the source code, provides a choice of strategies for scheduling and allocating the loop nest's iterations, and produces synchronous or asynchronous parallel target code for sharedmemory or distributedmemory machines. 1 Why LooPo? LooPo is not meant to be yet another parallelizing compiler. It is a prototype system whose purpose is to assist us in the research on and evaluation of spacetime mapping methods for loop parallelization. To that end, it implements the complete path from executable source code to executable target code, with switches for choosing alternative methods. At present, we provide several inequality solving methods, several schedulers and several methods of code generation, one dependence analyzer (we are working on a second...
Architecture Independent Massive Parallelization of DivideandConquer Algorithms
 Mathematics of Program Construction, Lecture Notes in Computer Science 947
, 1995
"... . We present a strategy to develop, in a functional setting, correct, efficient and portable DivideandConquer (DC) programs for massively parallel architectures. Starting from an operational DC program, mapping sequences to sequences, we apply a set of semantics preserving transformation rules, wh ..."
Abstract

Cited by 9 (1 self)
 Add to MetaCart
. We present a strategy to develop, in a functional setting, correct, efficient and portable DivideandConquer (DC) programs for massively parallel architectures. Starting from an operational DC program, mapping sequences to sequences, we apply a set of semantics preserving transformation rules, which transform the parallel control structure of DC into a sequential control flow, thereby making the implicit data parallelism in a DC scheme explicit. In the next phase of our strategy, the parallel architecture is fully expressed, where `architecture dependent' higherorder functions are introduced. Then  due to the rising communication complexities on particular architectures  topology dependent communication patterns are optimized in order to reduce the overall communication costs. The advantages of this approach are manifold and are demonstrated with a set of nontrivial examples. 1 Introduction It is wellknown that the main problems in exploiting the power of modern parallel sys...
From Transformations to Methodology in Parallel Program Development: A Case Study
 Microprocessing and Microprogramming
, 1996
"... The BirdMeertens formalism (BMF) of higherorder functions over lists is a mathematical framework supporting formal derivation of algorithms from functional specifications. This paper reports results of a case study on the systematic use of BMF in the process of parallel program development. We dev ..."
Abstract

Cited by 4 (1 self)
 Add to MetaCart
The BirdMeertens formalism (BMF) of higherorder functions over lists is a mathematical framework supporting formal derivation of algorithms from functional specifications. This paper reports results of a case study on the systematic use of BMF in the process of parallel program development. We develop a parallel program for polynomial multiplication, starting with a straightforward mathematical specification and arriving at the target processor topology together with a program for each processor of it. The development process is based on formal transformations; design decisions concerning data partitioning, processor interconnections, etc. are governed by formal type analysis and performance estimation rather than made ad hoc. The parallel target implementation is parameterized for an arbitrary number of processors; for the particular number, the target program is both time and costoptimal. We compare our results with systolic solutions to polynomial multiplication.
Transformational Derivation of (parallel) Programs Using Skeletons
 Katholieke Universiteit Nijmegen
"... We describe a framework for the derivation of programs for arbitrary (in particular, parallel) architectures, motivated by a generalization of the derivation process for sequential algorithms. The central concept in this approach is that of a skeleton: on the one hand, a higherorder function for ta ..."
Abstract

Cited by 2 (0 self)
 Add to MetaCart
We describe a framework for the derivation of programs for arbitrary (in particular, parallel) architectures, motivated by a generalization of the derivation process for sequential algorithms. The central concept in this approach is that of a skeleton: on the one hand, a higherorder function for targeting transformational derivations at, on the other hand representing an elementary computation on the architecture aimed at. Skeletons thus form a basis for intermediate languages, that can be implemented once and for all, as a process separate from individual program developments. The available knowledge on the derivation of (higherorder) functional programs can be used for deriving parallel ones. This paper presents an overview of the method, illustrated with an example (trapezoidal rule on SIMD processor array), and ideas for future research. 1 Introduction and overview The introduction of various computer networks and parallel computers in recent years has led to a large increase in...