## Optimizing Compositions of Scans and Reductions in Parallel Program Derivation (1997)

Citations: 9 (2 self)

### BibTeX

@TECHREPORT{Gorlatch97optimizingcompositions,

author = {Sergei Gorlatch},

title = {Optimizing Compositions of Scans and Reductions in Parallel Program Derivation},

institution = {},

year = {1997}

}

### Abstract

We study two popular programming schemas: scan (also known as prefix sums, parallel prefix, etc.) and reduction (also known as fold). Originally from the functional world [3], they are becoming increasingly popular as primitives of parallel programming. The reasons are that, first, such higher-order combinators are adequate and useful for a broad class of applications [4]; second, they encourage well-structured, coarse-grained parallel programming; and, third, their implementation in the MPI standard [14] makes the target programs portable across different parallel architectures with predictable performance. Our contributions are as follows:

- We formally prove two optimization rules: the first rule transforms a sequential composition of scan and reduction into a single reduction; the second rule transforms a composition of two scans into a single scan.
- We apply the first rule in the formal derivation of a parallel algorithm for the ...
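The two optimization rules can be illustrated with a small Python sketch. It assumes the usual side conditions for this kind of fusion: both operators are associative and `otimes` distributes over `oplus`. The helper names (`fuse_op`, `fused_reduce`, `fused_scan`, and so on) are hypothetical, and the pair operator shown is one common way to realize such a fusion, not a transcription of the paper's theorems.

```python
from functools import reduce
from itertools import accumulate
import operator


def fuse_op(oplus, otimes):
    """Build the pair operator shared by both fused forms (hypothetical helper).

    A pair (s, r) summarises a segment: s is the oplus-reduction of the
    otimes-scan of the segment, r is the otimes-reduction of the segment.
    """
    def combine(left, right):
        s1, r1 = left
        s2, r2 = right
        # Assumption: otimes distributes over oplus, so r1 can be
        # factored out of every prefix value of the right segment.
        return (oplus(s1, otimes(r1, s2)), otimes(r1, r2))
    return combine


def scan_then_reduce(oplus, otimes, xs):
    """Unfused form: a scan followed by a reduction (two passes)."""
    return reduce(oplus, accumulate(xs, otimes))


def fused_reduce(oplus, otimes, xs):
    """Fused form: one reduction over pairs, then take the first component."""
    return reduce(fuse_op(oplus, otimes), [(x, x) for x in xs])[0]


def scan_then_scan(oplus, otimes, xs):
    """Unfused form: two scans in sequence."""
    return list(accumulate(accumulate(xs, otimes), oplus))


def fused_scan(oplus, otimes, xs):
    """Fused form: one scan over pairs, then take first components."""
    pairs = accumulate([(x, x) for x in xs], fuse_op(oplus, otimes))
    return [s for s, _ in pairs]


# Multiplication distributes over addition: 1 + 2 + 6 + 24
print(fused_reduce(operator.add, operator.mul, [1, 2, 3, 4]))  # 33
```

The fused versions perform one collective-style operation instead of two, at the cost of working on pairs rather than single values.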

### Citations

1311 | Introduction to Functional Programming
- Bird, Wadler
- 1988
Citation Context ... segment sum. 1 Introduction. We study two popular programming schemas: scan (also known as prefix sums, parallel prefix, etc.) and reduction (also known as fold). Originally from the functional world [3], they are becoming increasingly popular as primitives of parallel programming. The reasons are that, first, such higher-order combinators are adequate and useful for a broad class of applications [4]...

301 | Functional Programming with Bananas, Lenses, Envelopes and Barbed Wire
- Meijer, Fokkinga, et al.
- 1991
Citation Context ...ons demonstrate the elegance and power of the calculational BMF approach in parallel programming. A common ground for homomorphism fusion is given by the category-theoretical results on catamorphisms [10]. Our Theorems 4 and 5 provide a direct method for testing the applicability of the transformations and computing their results. A version of Theorem 4 in the sequential setting was proved and used in...

161 | Programming Pearls
- Bentley
- 2000
Citation Context ...Gamma , $\langle \oplus, \otimes \rangle$ are defined by (17), (18), (20) respectively. 5 Application to the Maximum Segment Sum Problem. We consider the famous maximum segment sum (mss) problem, a programming pearl [1], studied by many authors [3,6,11-13]. Given a list of numbers, function mss finds a contiguous list segment whose members have the largest sum among all such segments and returns this sum. For examp...
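As an executable reading of the problem statement above, here is a minimal brute-force specification of mss in Python. The name `mss_spec`, the sample inputs, and the restriction to non-empty segments are assumptions made for illustration; this is a reference definition, not the parallel algorithm derived in the paper.

```python
def mss_spec(xs):
    """Maximum segment sum by exhaustive search over non-empty segments.

    Quadratic time; intended only as a specification to test faster
    versions against. Assumes a non-empty input list.
    """
    if not xs:
        raise ValueError("mss_spec is defined here for non-empty lists only")
    best = xs[0]
    for i in range(len(xs)):
        total = 0
        for j in range(i, len(xs)):
            total += xs[j]          # running sum of the segment xs[i..j]
            best = max(best, total)
    return best


# Illustrative input (not the paper's example): best segment is [4, -1, 2]
print(mss_spec([3, -5, 4, -1, 2]))  # 5
```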

157 | Scans as Primitive Parallel Operations
- Blelloch
- 1989
Citation Context ...[3], they are becoming increasingly popular as primitives of parallel programming. The reasons are that, first, such higher-order combinators are adequate and useful for a broad class of applications [4], second, they encourage well-structured, coarse-grained parallel programming and, third, their implementation in the MPI standard [14] makes the target programs portable across different parallel arc...

69 | Foundations of Parallel Programming
- Skillicorn
- 1994
Citation Context ...minating extra synchronization and improving performance. Preprint submitted to Elsevier Preprint 30 April 1997. 2 Notation and Basic Results. We adopt the notation of the Bird-Meertens formalism (BMF) [3,11] and restrict ourselves to the total functions over non-empty lists. Such functions are defined by giving their value on a one-element list, say [a], and then inductively, using list concatenation ++ ...

58 | Algebraic identities for program calculation
- Bird
- 1989
Citation Context ... Our Theorems 4 and 5 provide a direct method for testing the applicability of the transformations and computing their results. A version of Theorem 4 in the sequential setting was proved and used in [2,13]. In [5], a version of $\mathit{red}(\langle \oplus, \otimes \rangle)$, called recur-reduce, is used for parallelizing linear recurrences. An approach dual to ours is taken in our own group [15]: a foldl with an analogue of h\...

42 | Parallel programming with list homomorphisms
- Cole
- 1995
Citation Context ...efined by (17), (18), (20) respectively. 5 Application to the Maximum Segment Sum Problem. We consider the famous maximum segment sum (mss) problem, a programming pearl [1], studied by many authors [3,6,11-13]. Given a list of numbers, function mss finds a contiguous list segment whose members have the largest sum among all such segments and returns this sum. For example: mss [2, -4, 2, -1, 6, ...

27 | Systematic efficient parallelization of scan and other list homomorphisms
- Gorlatch
- 1996

24 | Applications of a strategy for designing divide-and-conquer algorithms
- Smith
- 1987
Citation Context ...allel solutions: an application of Theorem 4 for suffix leads to a solution similar to [5], which, by a further application of Theorem 4 for prefix, is improved and transformed into the solution from [6,12]. Our derivation combines the advantages of both solutions: it is carried out without thinking about the meaning of the involved operators, and results in just one global parallel operation. The deriv...

17 | Systematic extraction and implementation of divide-and-conquer parallelism
- Gorlatch
- 1996
Citation Context ...ormat (4), because functions different from prefred are applied to x and y. This indicates that prefred is not directly a homomorphism. To cure that, we "massage" prefred into an almost-homomorphism [8], by tupling it with red as an auxiliary function: $\mathit{prefred}'(\oplus, \otimes) \stackrel{\mathrm{def}}{=} (\mathit{prefred}(\oplus, \otimes),\ \mathit{red}(\otimes))$ (14). Note that $\mathit{prefred}'$ yields a pair, where prefred is the first component:...

14 | Virtual data structures
- Swierstra, Moor
- 1992
Citation Context ... Our Theorems 4 and 5 provide a direct method for testing the applicability of the transformations and computing their results. A version of Theorem 4 in the sequential setting was proved and used in [2,13]. In [5], a version of $\mathit{red}(\langle \oplus, \otimes \rangle)$, called recur-reduce, is used for parallelizing linear recurrences. An approach dual to ours is taken in our own group [15]: a foldl with an analogue of h\...

13 | Calculating Recurrences Using the Bird-Meertens Formalism
- Cai, Skillicorn
- 1992
Citation Context ...ir = { Eq. (7) } $\mathit{red}(\uparrow) \circ \mathit{map}\ \pi_1 \circ \mathit{pref}$ i $\langle \uparrow, + \rangle$ j $\circ\ \mathit{map}\ \mathit{pair}$. We have arrived at an expression for mss which can be implemented in time O(n) sequentially and O(log n) in parallel. Derivation [5] stops here, but we can actually advance further: Extend operator $\uparrow$ to pairs: $(a, b) \Uparrow (c, d) \stackrel{\mathrm{def}}{=} (a \uparrow c,\ b \uparrow d)$ and note that $\Uparrow$ is associative and also that: $\pi_1 \circ \mathit{red}(\Uparrow) = \mathit{red}(\uparrow) \circ \mathit{map}\ \pi_1$ (...
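The O(n) sequential and O(log n) parallel bounds follow from expressing mss as a single reduction with an associative operator. The sketch below uses the well-known four-component tupling (best segment, best prefix, best suffix, total), which is an assumed stand-in and not the operator derived in the paper; `lift`, `combine`, and `tree_reduce` are hypothetical names. The balanced `tree_reduce` only mimics the shape of a parallel reduction and runs sequentially here.

```python
from functools import reduce


def lift(x):
    # Summary of the one-element list [x]:
    # (best segment sum, best prefix sum, best suffix sum, total sum)
    return (x, x, x, x)


def combine(left, right):
    """Associative combination of the summaries of two adjacent segments."""
    m1, p1, s1, t1 = left
    m2, p2, s2, t2 = right
    return (
        max(m1, m2, s1 + p2),  # best segment: left, right, or spanning the cut
        max(p1, t1 + p2),      # best prefix of the concatenation
        max(s2, s1 + t2),      # best suffix of the concatenation
        t1 + t2,               # total sum
    )


def mss(xs):
    """Sequential O(n) version: one left-to-right reduction."""
    return reduce(combine, map(lift, xs))[0]


def tree_reduce(op, items):
    """Balanced reduction with O(log n) rounds; element order is preserved."""
    items = list(items)
    while len(items) > 1:
        paired = [op(items[i], items[i + 1]) for i in range(0, len(items) - 1, 2)]
        if len(items) % 2:
            paired.append(items[-1])  # odd element carries over to next round
        items = paired
    return items[0]


def mss_tree(xs):
    return tree_reduce(combine, map(lift, xs))[0]


print(mss([3, -5, 4, -1, 2]))       # 5
print(mss_tree([3, -5, 4, -1, 2]))  # 5
```

Because `combine` is associative, each round of `tree_reduce` could in principle be executed in parallel, which is what gives the logarithmic depth.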

4 | Introduction to Parallel Computing
- Kumar, et al.
- 1994
Citation Context ...at can provide a substantial performance improvement on contemporary parallel machines, especially taking into account recent advances in the hardware implementation of the MPI-like global operations [9]. We have given yet another derivation for the maximum segment sum problem. In the sequential setting, related work was done in [2,13]. We have established a connection between two parallel solutions:...