Results 1 - 4 of 4
The Loop Parallelizer LooPo
- Proc. Sixth Workshop on Compilers for Parallel Computers, volume 21 of Konferenzen des Forschungszentrums Jülich, 1996
Cited by 10 (2 self)
Abstract
We report on a prototype for testing different methods of space-time mapping loop nests. LooPo admits perfect or imperfect loop nests in a number of imperative languages, takes data dependences from the user or derives them itself from the source code, provides a choice of strategies for scheduling and allocating the loop nest's iterations, and produces synchronous or asynchronous parallel target code for shared-memory or distributed-memory machines.
1 Why LooPo?
LooPo is not meant to be yet another parallelizing compiler. It is a prototype system whose purpose is to assist us in research on, and evaluation of, space-time mapping methods for loop parallelization. To that end, it implements the complete path from executable source code to executable target code, with switches for choosing alternative methods. At present, we provide several inequality-solving methods, several schedulers and several methods of code generation, one dependence analyzer (we are working on a second...
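The space-time mapping idea that LooPo automates can be illustrated on a toy loop nest. The following is a minimal Python sketch, not LooPo's implementation or input language: the loop nest, its dependence vectors (1, 0) and (0, 1), the linear schedule t(i, j) = i + j and the allocation p(i, j) = j are all hypothetical choices made for illustration. The sketch checks that the schedule respects every dependence, checks that no processor receives two iterations in the same time step, and then executes the iterations wavefront by wavefront, comparing the result with the sequential loop.

```python
from collections import defaultdict

# Hypothetical example loop nest (not taken from the LooPo paper):
#   for i in 1..n: for j in 1..m: A[i][j] = A[i-1][j] + A[i][j-1]
# Its dependence vectors are (1, 0) and (0, 1).
DEPENDENCES = [(1, 0), (0, 1)]


def schedule(i, j):
    """Assumed linear schedule: iteration (i, j) runs at time step i + j."""
    return i + j


def allocation(i, j):
    """Assumed allocation: iteration (i, j) runs on virtual processor j."""
    return j


def schedule_is_legal(deps):
    # A linear schedule is legal if every dependence advances time by >= 1.
    return all(schedule(*d) >= 1 for d in deps)


def space_time_map(n, m):
    """Group iterations by time step; fail if a processor gets two in one step."""
    steps = defaultdict(dict)  # time step -> {processor: (i, j)}
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            t, p = schedule(i, j), allocation(i, j)
            assert p not in steps[t], "space-time map is not injective"
            steps[t][p] = (i, j)
    return dict(sorted(steps.items()))


def run_sequential(n, m):
    A = [[1] * (m + 1) for _ in range(n + 1)]  # row 0 and column 0 are boundary values
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            A[i][j] = A[i - 1][j] + A[i][j - 1]
    return A


def run_space_time(n, m):
    A = [[1] * (m + 1) for _ in range(n + 1)]
    for t, procs in space_time_map(n, m).items():
        # Iterations of one time step read only values from step t - 1,
        # so they are independent; here they are simulated one after another.
        for p, (i, j) in procs.items():
            A[i][j] = A[i - 1][j] + A[i][j - 1]
    return A
```

With this schedule an n × m nest needs n + m - 1 time steps instead of n·m sequential iterations, and the iterations within one step could run in parallel on the m virtual processors.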
Architecture Independent Massive Parallelization of Divide-and-Conquer Algorithms
- Mathematics of Program Construction, Lecture Notes in Computer Science 947, 1995
Cited by 9 (1 self)
Abstract
We present a strategy to develop, in a functional setting, correct, efficient and portable Divide-and-Conquer (DC) programs for massively parallel architectures. Starting from an operational DC program that maps sequences to sequences, we apply a set of semantics-preserving transformation rules which transform the parallel control structure of DC into a sequential control flow, thereby making the implicit data parallelism in a DC scheme explicit. In the next phase of our strategy, the parallel architecture is fully expressed by introducing 'architecture-dependent' higher-order functions. Then, because of the growing communication complexity on particular architectures, topology-dependent communication patterns are optimized in order to reduce the overall communication costs. The advantages of this approach are manifold and are demonstrated with a set of non-trivial examples.
1 Introduction
It is well-known that the main problems in exploiting the power of modern parallel sys...
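To make the starting point and the goal of such a transformation concrete, the sketch below contrasts a generic recursive DC skeleton with a level-by-level formulation of the same computation, in which a sequential loop runs over the levels of the division tree and each level is a plain map over independent subproblems. It is a hand-written Python illustration in that spirit, using merge sort as an assumed example; the names `dc`, `merge_sort_dc` and `merge_sort_levels` are hypothetical, and it does not reproduce the paper's transformation rules or its architecture-dependent functions.

```python
def dc(trivial, solve, divide, combine, xs):
    """Generic recursive divide-and-conquer skeleton (parallel control structure)."""
    if trivial(xs):
        return solve(xs)
    parts = divide(xs)
    return combine([dc(trivial, solve, divide, combine, p) for p in parts])


def merge(a, b):
    """Merge two sorted lists into one sorted list."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    return out + a[i:] + b[j:]


def merge_sort_dc(xs):
    """Merge sort as an instance of the recursive DC skeleton."""
    return dc(
        trivial=lambda s: len(s) <= 1,
        solve=lambda s: list(s),
        divide=lambda s: [s[: len(s) // 2], s[len(s) // 2 :]],
        combine=lambda parts: merge(parts[0], parts[1]),
        xs=xs,
    )


def merge_sort_levels(xs):
    """Same result as a sequential loop over levels; each level is a map."""
    level = [[x] for x in xs]  # start from the leaves of the division tree
    while len(level) > 1:
        pairs = [level[k : k + 2] for k in range(0, len(level), 2)]
        # Every merge in this list comprehension is independent of the others:
        # this is the data parallelism made explicit.
        level = [merge(*p) if len(p) == 2 else p[0] for p in pairs]
    return level[0] if level else []
```

For input lengths that are not powers of two, the bottom-up pairing groups elements differently from the top-down split, but both versions return the same sorted list.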
From Transformations to Methodology in Parallel Program Development: A Case Study
- Microprocessing and Microprogramming, 1996
Cited by 4 (1 self)
Abstract
The Bird-Meertens formalism (BMF) of higher-order functions over lists is a mathematical framework supporting formal derivation of algorithms from functional specifications. This paper reports results of a case study on the systematic use of BMF in the process of parallel program development. We develop a parallel program for polynomial multiplication, starting with a straightforward mathematical specification and arriving at the target processor topology together with a program for each of its processors. The development process is based on formal transformations; design decisions concerning data partitioning, processor interconnections, etc. are governed by formal type analysis and performance estimation rather than made ad hoc. The parallel target implementation is parameterized for an arbitrary number of processors; for a particular number of processors, the target program is both time- and cost-optimal. We compare our results with systolic solutions to polynomial multiplication.
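The specification the derivation starts from, and the kind of data partitioning the abstract refers to, can be sketched in a few lines of Python. This is an illustration under assumptions, not the program derived in the paper: polynomials are coefficient lists with the constant term first, `poly_mul_blocked` and its block partitioning of the first operand are hypothetical, and the p processors are only simulated. It relies on the identity that multiplying each block of a by b, shifting each partial product by its block's offset, and summing the partial products gives a · b, i.e. a map followed by a reduction with element-wise addition.

```python
from functools import reduce


def poly_mul_spec(a, b):
    """Specification: c[k] is the sum of a[i] * b[j] over all i + j == k."""
    if not a or not b:
        return []
    n = len(a) + len(b) - 1
    return [
        sum(a[i] * b[k - i] for i in range(len(a)) if 0 <= k - i < len(b))
        for k in range(n)
    ]


def add_poly(c, d):
    """Element-wise addition of coefficient lists (shorter one zero-padded)."""
    size = max(len(c), len(d))
    c = c + [0] * (size - len(c))
    d = d + [0] * (size - len(d))
    return [x + y for x, y in zip(c, d)]


def poly_mul_blocked(a, b, p):
    """Hypothetical partitioning: split a into at most p contiguous blocks."""
    if not a or not b:
        return []
    size = -(-len(a) // p)  # ceiling division: block length
    blocks = [(off, a[off : off + size]) for off in range(0, len(a), size)]
    # map: each (simulated) processor multiplies its block by b, shifted by its offset
    partial = [[0] * off + poly_mul_spec(blk, b) for off, blk in blocks]
    # reduce: combine the partial products with element-wise addition
    return reduce(add_poly, partial)
```

For example, (1 + 2x)(3 + 4x) = 3 + 10x + 8x², so `poly_mul_spec([1, 2], [3, 4])` yields `[3, 10, 8]`, and the blocked version agrees for any p ≥ 1.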
Transformational Derivation of (parallel) Programs Using Skeletons
- Katholieke Universiteit Nijmegen
Cited by 2 (0 self)
Abstract
We describe a framework for the derivation of programs for arbitrary (in particular, parallel) architectures, motivated by a generalization of the derivation process for sequential algorithms. The central concept in this approach is that of a skeleton: on the one hand, a higher-order function at which transformational derivations can be targeted; on the other hand, a representation of an elementary computation on the target architecture. Skeletons thus form a basis for intermediate languages that can be implemented once and for all, in a process separate from individual program developments. The available knowledge on the derivation of (higher-order) functional programs can then be used for deriving parallel ones. This paper presents an overview of the method, illustrated with an example (the trapezoidal rule on a SIMD processor array), and ideas for future research.
1 Introduction and overview
The introduction of various computer networks and parallel computers in recent years has led to a large increase in...
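The trapezoidal-rule example mentioned above suggests how a program can be written purely in terms of skeletons. The Python sketch below is an assumption-laden illustration, not the paper's derivation or its SIMD implementation: the skeleton names `map_skel`, `zip_with_skel` and `reduce_skel` are hypothetical, and `reduce_skel` only simulates, as sequential rounds, a tree reduction in which all neighbouring pairs of a round could be combined simultaneously on a processor array. The rule itself is h times the weighted sum of f over the n + 1 grid points, with weight 1/2 at the two end points and 1 elsewhere.

```python
def map_skel(f, xs):
    """Hypothetical 'map' skeleton: apply f to every element (one per processor)."""
    return [f(x) for x in xs]


def zip_with_skel(f, xs, ys):
    """Hypothetical 'zipWith' skeleton: combine corresponding elements pairwise."""
    return [f(x, y) for x, y in zip(xs, ys)]


def reduce_skel(op, xs):
    """Hypothetical 'reduce' skeleton for an associative op, as a tree reduction.

    Each round combines neighbouring pairs; on a SIMD array all pairs of a
    round could be combined at once, giving about log2(len(xs)) rounds.
    """
    if not xs:
        raise ValueError("reduce of an empty list")
    while len(xs) > 1:
        pairs = [xs[k : k + 2] for k in range(0, len(xs), 2)]
        xs = [op(p[0], p[1]) if len(p) == 2 else p[0] for p in pairs]
    return xs[0]


def trapezoid(f, a, b, n):
    """Trapezoidal rule with n intervals, expressed only through skeletons."""
    h = (b - a) / n
    xs = map_skel(lambda k: a + k * h, list(range(n + 1)))  # grid points
    ys = map_skel(f, xs)                                     # function values
    weights = [0.5] + [1.0] * (n - 1) + [0.5]
    weighted = zip_with_skel(lambda w, y: w * y, weights, ys)
    return h * reduce_skel(lambda u, v: u + v, weighted)
```

Because only the skeletons touch the architecture, moving this program to another machine would in principle mean reimplementing `map_skel`, `zip_with_skel` and `reduce_skel`, which is the separation of concerns the abstract describes.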

