## From Transformations to Methodology in Parallel Program Development: A Case Study (1996)

Venue: | Microprocessing and Microprogramming |

Citations: | 4 - 1 self |

### BibTeX

@ARTICLE{Gorlatch96fromtransformations,

author = {Sergei Gorlatch},

title = {From Transformations to Methodology in Parallel Program Development: A Case Study},

journal = {Microprocessing and Microprogramming},

year = {1996},

volume = {41},

pages = {58--8}

}

### Abstract

The Bird-Meertens formalism (BMF) of higher-order functions over lists is a mathematical framework supporting formal derivation of algorithms from functional specifications. This paper reports results of a case study on the systematic use of BMF in the process of parallel program development. We develop a parallel program for polynomial multiplication, starting with a straightforward mathematical specification and arriving at the target processor topology together with a program for each of its processors. The development process is based on formal transformations; design decisions concerning data partitioning, processor interconnections, etc. are governed by formal type analysis and performance estimation rather than made ad hoc. The parallel target implementation is parameterized for an arbitrary number of processors; for a particular number of processors, the target program is both time- and cost-optimal. We compare our results with systolic solutions to polynomial multiplication.
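To make the problem concrete, the following is a minimal Python sketch of polynomial multiplication, first as a direct specification and then in a block-partitioned form in which each pair of segments could be handled by one processor of a square grid. It is an illustration under stated assumptions, not the paper's derived program: the names `poly_mul`, `split`, `red_shift_add`, and `blocked_poly_mul` are hypothetical, the grid is simulated sequentially, and both input lengths are assumed to be multiples of the segment length `k`.

```python
def poly_mul(a, b):
    """Direct specification: coefficient c[i + j] accumulates a[i] * b[j]."""
    c = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            c[i + j] += x * y
    return c


def split(xs, k):
    """Partition a coefficient list into consecutive segments of length k."""
    return [xs[i:i + k] for i in range(0, len(xs), k)]


def red_shift_add(parts, k):
    """Sum partial results, placing the i-th one at offset i * k."""
    width = (len(parts) - 1) * k + len(parts[-1])
    out = [0] * width
    for i, part in enumerate(parts):
        for j, v in enumerate(part):
            out[i * k + j] += v
    return out


def blocked_poly_mul(a, b, k):
    """Block-partitioned product (assumes len(a), len(b) are multiples of k)."""
    a_segs, b_segs = split(a, k), split(b, k)
    # grid[r][c] stands in for the processor holding (a_segs[c], b_segs[r]).
    grid = [[poly_mul(sa, sb) for sa in a_segs] for sb in b_segs]
    # Combine along each row first, then combine the row results.
    rows = [red_shift_add(row, k) for row in grid]
    return red_shift_add(rows, k)


if __name__ == "__main__":
    print(blocked_poly_mul([1, 3, 5, 7], [2, 4, 6, 8], 2))
```

For the small instance a = [1, 3, 5, 7], b = [2, 4, 6, 8] with k = 2, the four local products are [2, 10, 12], [10, 34, 28], [6, 26, 24], and [30, 82, 56]; both functions return the same coefficient list. The local products are independent of one another, which is what makes the partitioned form a candidate for parallel execution.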

### Citations

1337 |
Introduction to parallel algorithms and architectures: arrays, trees, hypercubes
- Leighton
- 1992
Citation Context ...o cost linear time which is not satisfactory. One possible way to enable fast broadcasting, satisfying at the same time the practical requirement of a fixed fan-in topology, is to use a mesh of trees [19] where processors in one row (column) are connected in a balanced binary tree. Sixth design decision. Processors are connected in the p × p mesh of trees. Note that we do not need usual 2D-mesh connecti... |

1335 | The essence of functional programming
- Wadler
- 1992
Citation Context ...een carried out by transformations together with a suitable choice of the output data partitioning. We see the novelty of our work in that, in contrast to previous work on the Bird-Meertens formalism [3, 17, 24], we do not restrict the consideration to formal derivation of a BMF expression with apparently good time complexity. In addition, we concentrate on the methodological aspects of the transition from ... |

150 | Parallel programming using skeleton functions
- Darlington, Field, et al.
- 1993
Citation Context ...sequently leads to algorithmic skeletons which encapsulate typical templates of parallelism. Skeletons were introduced by Cole [7] and have been studied by the group of Darlington at Imperial College [9], the group around Pepper [24, 25], Partsch and Geerling [23, 11], etc. Closely related work is the Ruby system by Jones and Sheeran [18], the research carried out at Belfast [6], the P3L project ... |

97 | Loop parallelization in the polytope model
- Lengauer
- 1993
Citation Context ...is example and its analogue, convolution, resemble structures of parallelism typical for many numerical applications; they have been studied extensively in the polytope theory of loop parallelization [20] and systolic design. We take the opportunity to compare the results obtained by two formal approaches: BMF and the polytope method. We start with the mathematical specification of the polynomial prod... |

71 | The design of a standard message passing interface for distributed memory concurrent computers
- Walker
- 1994
Citation Context ...oadcasts segments of a along the columns, the second broadcasts segments of b along the rows of the matrix. We introduce broadcast functions which can be directly expressed, e.g., in the MPI standard [32]: bcast-row p = map (copy p), for broadcasting a list of segments along the rows of the processor matrix; bcast-col p = copy p, for broadcasting along the columns of the matrix. Thus, our p... |

58 | A Cost Calculus for Parallel Functional Programming
- Skillicorn, Cai
- 1995
Citation Context ...hich denotes a list of length n with elements a_i which are of type α. The list length is used explicitly mostly for complexity estimation (similar assumptions were made by Jones [17] and Skillicorn [27]). We omit the length of a list if it is not important in the given context, e.g., lists of arbitrary length whose elements are lists of length k with elements of type α constitute type [[α]_k].... |

55 |
Algorithmic skeletons: a structured approach to the management of parallel computation
- Cole
- 1988
Citation Context ... calculus for parallel functional programs. The higher-order approach consequently leads to algorithmic skeletons which encapsulate typical templates of parallelism. Skeletons were introduced by Cole [7] and have been studied by the group of Darlington at Imperial College [9], the group around Pepper [24, 25], Partsch and Geerling [23, 11], etc. Closely related work is the Ruby system by Jones and ... |

48 | Algorithm Theories and Design Tactics
- Smith, Lowry
- 1990
Citation Context ...rtsch and Geerling [23, 11], etc. Closely related work is the Ruby system by Jones and Sheeran [18], the research carried out at Belfast [6], the P3L project at Pisa [1], the KIDS system by Smith [30] and the functional approach by O'Donnell [22]. This paper reports results of a case study on the systematic use of BMF in the process of parallel program development. We are trying "to go all the way... |

28 |
Upwards and downwards accumulations on trees
- Gibbons
- 1993
Citation Context ... list processing and optimization problems by Bird and de Moor [2], the Fast Fourier Transform (FFT) by G. Jones [17], calculation of recurrences by Cai and Skillicorn [4], tree algorithms by Gibbons [13], parsing by Cole [8], image processing by Harrison and Grant-Duff [15], divide-and-conquer by the author jointly with Lengauer [14], etc. BMF seems especially promising in the area of parallel algori... |

26 | Designing arithmetic circuits by refinement in Ruby
- Jones, G, et al.
- 1993
Citation Context ...e been studied by the group of Darlington at Imperial College [9], the group around Pepper [24, 25], Partsch and Geerling [23, 11], etc. Closely related work is the Ruby system by Jones and Sheeran [18], the research carried out at Belfast [6], the P3L project at Pisa [1], the KIDS system by Smith [30] and the functional approach by O'Donnell [22]. This paper reports results of a case study on the... |

23 |
Parallel Computing
- Quinn
- 1994
Citation Context ... a = [[1, 3], [5, 7]], b = [[2, 4], [6, 8]]; distribute ⇒ $\begin{pmatrix} ([1,3],[2,4]) & ([5,7],[2,4]) \\ ([1,3],[6,8]) & ([5,7],[6,8]) \end{pmatrix}$; compute ⇒ $\begin{pmatrix} [2,10,12] & [10,34,28] \\ [6,26,24] & [30,82,56] \end{pmatrix}$; map (red-zipl-shift) ⇒ $\begin{pmatrix} [2,10,22,34,28] \\ [6,26,54,82,56] \end{pmatrix}$; red-zipl-shift ⇒ [2, 10, 28, 60, 82, 82, 56]. 6 First Estimation of Complexity. In this section, w... |

17 |
Deductive Derivation of Parallel Programs
- Pepper
- 1993
Citation Context ...c skeletons which encapsulate typical templates of parallelism. Skeletons were introduced by Cole [7] and have been studied by the group of Darlington at Imperial College [9], the group around Pepper [24, 25], Partsch and Geerling [23, 11], etc. Closely related work is the Ruby system by Jones and Sheeran [18], the research carried out at Belfast [6], the P3L project at Pisa [1], the KIDS system by Sm... |

17 |
Deriving Parallel Programs from Specifications using Cost
- Skillicorn
- 1993
Citation Context ..., etc. BMF seems especially promising in the area of parallel algorithms because many of its functionals have a natural parallel implementation. This aspect has been extensively studied by Skillicorn [28], who proposed a cost calculus for parallel functional programs. The higher-order approach consequently leads to algorithmic skeletons which encapsulate typical templates of parallelism. Skeletons wer... |

13 |
Calculating Recurrences Using the Bird-Meertens Formalism
- Cai, Skillicorn
- 1992
Citation Context ...l application domains including: list processing and optimization problems by Bird and de Moor [2], the Fast Fourier Transform (FFT) by G. Jones [17], calculation of recurrences by Cai and Skillicorn [4], tree algorithms by Gibbons [13], parsing by Cole [8], image processing by Harrison and Grant-Duff [15], divide-and-conquer by the author jointly with Lengauer [14], etc. BMF seems especially promisi... |

11 |
A Higher-Order Approach to Parallel Algorithms
- Harrison
- 1992
Citation Context ... processors in which it is located are those on the north-east border. For our concrete example, we have: $\begin{pmatrix} [2,10,12] & [10,34,28] \\ [6,26,24] & [30,82,56] \end{pmatrix} \overset{\text{combine}_1}{\Longrightarrow} \big(\, [2,10,12],\ [16,60,52],\ [30,82,56] \,\big)$. It remains to compute $\text{combine}_2 = \text{red}\,(\text{zipl}\ +) \circ \text{shift}_k$, which has type $[[\alpha]_{2k-1}]_{2p-1} \to [\alpha]_{2pk-1}$, i.e., it increases the length of the lists to be transmitte... |

11 |
Constructing a calculus of programs
- Meertens
- 1989
Citation Context ...tures. We address this problem using the Bird-Meertens formalism (BMF) which is essentially a collection of higher-order functions (functionals) over lists together with a set of algebraic identities [3, 21]. Algorithms on lists are specified as expressions of BMF, usually functional compositions. Using equational reasoning, a specification can be transformed into a form suitable for an efficient impleme... |

9 | List homomorphic parallel algorithms for bracket matching
- Cole
- 1993
Citation Context ...ptimization problems by Bird and de Moor [2], the Fast Fourier Transform (FFT) by G. Jones [17], calculation of recurrences by Cai and Skillicorn [4], tree algorithms by Gibbons [13], parsing by Cole [8], image processing by Harrison and Grant-Duff [15], divide-and-conquer by the author jointly with Lengauer [14], etc. BMF seems especially promising in the area of parallel algorithms because many of ... |

9 | Deriving the fast Fourier algorithm by calculation
- Jones
- 1989
Citation Context ...on. BMF has been used for deriving algorithms in several application domains including: list processing and optimization problems by Bird and de Moor [2], the Fast Fourier Transform (FFT) by G. Jones [17], calculation of recurrences by Cai and Skillicorn [4], tree algorithms by Gibbons [13], parsing by Cole [8], image processing by Harrison and Grant-Duff [15], divide-and-conquer by the author jointly... |

8 | Data distribution algebras --- a formal basis for programming using skeletons
- Sudholt
Citation Context ...l specification of an algorithm to a provably correct, predictably efficient multiprocessor implementation of this algorithm. Our work continues efforts by Skillicorn and Cai [29], Pepper and Sudholt [24, 31] and others to incorporate data distribution into the development process. A new aspect is the use of transformations with curried and uncurried functional forms; this leads to a data distribution wit... |

7 | Functional development of massively parallel programs - Pepper, Exner, et al. - 1993 |

5 |
Some experiments in transforming towards parallel executability
- Partsch
- 1993
Citation Context ...ypical templates of parallelism. Skeletons were introduced by Cole [7] and have been studied by the group of Darlington at Imperial College [9], the group around Pepper [24, 25], Partsch and Geerling [23, 11], etc. Closely related work is the Ruby system by Jones and Sheeran [18], the research carried out at Belfast [6], the P3L project at Pisa [1], the KIDS system by Smith [30] and the functional app... |

4 |
Solving optimization problems with catamorphisms
- Moor, Bird
- 1992
Citation Context ...s provably correct with respect to the specification. BMF has been used for deriving algorithms in several application domains including: list processing and optimization problems by Bird and de Moor [2], the Fast Fourier Transform (FFT) by G. Jones [17], calculation of recurrences by Cai and Skillicorn [4], tree algorithms by Gibbons [13], parsing by Cole [8], image processing by Harrison and Grant-... |

4 |
Equational code generation: Implementing categorical data types for data parallelism
- Skillicorn, Cai
- 1994
Citation Context ...ion from a very high-level specification of an algorithm to a provably correct, predictably efficient multiprocessor implementation of this algorithm. Our work continues efforts by Skillicorn and Cai [29], Pepper and Sudholt [24, 31] and others to incorporate data distribution into the development process. A new aspect is the use of transformations with curried and uncurried functional forms; this lea... |

3 |
Systematic development of an SPMD implementation schema for mutually recursive divide-and-conquer specifications
- Gorlatch, Lengauer
- 1994
Citation Context ...f recurrences by Cai and Skillicorn [4], tree algorithms by Gibbons [13], parsing by Cole [8], image processing by Harrison and Grant-Duff [15], divide-and-conquer by the author jointly with Lengauer [14], etc. BMF seems especially promising in the area of parallel algorithms because many of its functionals have a natural parallel implementation. This aspect has been extensively studied by Skillicorn ... |

2 |
Mapping of uniform dependence algorithm onto fixed size processor arrays
- Chen, Shang
- 1993
Citation Context ...nce between them, we present our square lists of lists [[α]_p]_p in a two-dimensional setting for the following example which we will use throughout the paper. Let us multiply two polynomials: a = [1, 3, 5, 7] and b = [2, 4, 6, 8]. For (n-p-k)-partitioning with n = 4, p = 2, k = 2, the construction part yields the following pair of "matrices": (a, b) → ( [[[1, 3], [5, 7]], [[1, 3], [5, 7]]]... |

2 | A family of dataparallel derivations
- Clint, Fitzpatrick, et al.
- 1994
Citation Context ...at Imperial College [9], the group around Pepper [24, 25], Partsch and Geerling [23, 11], etc. Closely related work is the Ruby system by Jones and Sheeran [18], the research carried out at Belfast [6], the P3L project at Pisa [1], the KIDS system by Smith [30] and the functional approach by O'Donnell [22]. This paper reports results of a case study on the systematic use of BMF in the process of ... |

2 |
Formal derivation of SIMD parallelism from non-linear recursive specifications
- Geerling
- 1994
Citation Context ...ypical templates of parallelism. Skeletons were introduced by Cole [7] and have been studied by the group of Darlington at Imperial College [9], the group around Pepper [24, 25], Partsch and Geerling [23, 11], etc. Closely related work is the Ruby system by Jones and Sheeran [18], the research carried out at Belfast [6], the P3L project at Pisa [1], the KIDS system by Smith [30] and the functional app... |

2 |
Program transformations and skeletons: formal derivation of parallel programs
- Geerling
- 1994
Citation Context ... distribute ⇒ $\begin{pmatrix} ([1,3],[2,4]) & ([5,7],[2,4]) \\ ([1,3],[6,8]) & ([5,7],[6,8]) \end{pmatrix}$; compute ⇒ $\begin{pmatrix} [2,10,12] & [10,34,28] \\ [6,26,24] & [30,82,56] \end{pmatrix}$; map (red-zipl-shift) ⇒ $\begin{pmatrix} [2,10,22,34,28] \\ [6,26,54,82,56] \end{pmatrix}$; red-zipl-shift ⇒ [2, 10, 28, 60, 82, 82, 56]. 6 First Estimation of Complexity. In this section, we estimate the complexity of our parallel program.... |

1 |
Efficient compilation of structured parallel programs for distributed memory MIMD machines
- Bacci, Danelutto, et al.
- 1994
Citation Context ...oup around Pepper [24, 25], Partsch and Geerling [23, 11], etc. Closely related work is the Ruby system by Jones and Sheeran [18], the research carried out at Belfast [6], the P3L project at Pisa [1], the KIDS system by Smith [30] and the functional approach by O'Donnell [22]. This paper reports results of a case study on the systematic use of BMF in the process of parallel program development. W... |

1 |
Skeletons, list homomorphisms and parallel program transformation
- Harrison, Grant-Duff
- 1994
Citation Context ... Fast Fourier Transform (FFT) by G. Jones [17], calculation of recurrences by Cai and Skillicorn [4], tree algorithms by Gibbons [13], parsing by Cole [8], image processing by Harrison and Grant-Duff [15], divide-and-conquer by the author jointly with Lengauer [14], etc. BMF seems especially promising in the area of parallel algorithms because many of its functionals have a natural parallel implementa... |

1 |
A correctness proof of parallel scan. Parallel Processing Letters
- O'Donnell
Citation Context ...lated work is the Ruby system by Jones and Sheeran [18], the research carried out at Belfast [6], the P3L project at Pisa [1], the KIDS system by Smith [30] and the functional approach by O'Donnell [22]. This paper reports results of a case study on the systematic use of BMF in the process of parallel program development. We are trying "to go all the way" from a mathematical specification of the alg... |