## Parallelization of Divide-and-Conquer by Translation to Nested Loops (1997)

Venue: J. Functional Programming

Citations: 12 (6 self)

### BibTeX

@ARTICLE{Herrmann97parallelizationof,

author = {Christoph A. Herrmann and Christian Lengauer},

title = {Parallelization of Divide-and-Conquer by Translation to Nested Loops},

journal = {J. Functional Programming},

year = {1997},

volume = {9},

pages = {9--3}

}

### Abstract

We propose a sequence of equational transformations and specializations which turns a divide-and-conquer skeleton in Haskell into a parallel loop nest in C. Our initial skeleton is often viewed as general divide-and-conquer. The specializations impose a balanced call tree, a fixed degree of the problem division, and elementwise operations. Our goal is to select parallel implementations of divide-and-conquer via a space-time mapping, which can be determined at compile time. The correctness of our transformations is proved by equational reasoning in Haskell; recursion and iteration are handled by induction. Finally, we demonstrate the practicality of the skeleton by expressing Strassen's matrix multiplication in it.
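To make the notion of a divide-and-conquer skeleton concrete, here is a minimal sketch in Python rather than the paper's Haskell. The function name `dc` and the four customizing parameters (`is_basic`, `solve_basic`, `divide`, `combine`) are illustrative assumptions, not the paper's definitions, and merge sort is used as the instance only because it is shorter than Strassen's algorithm.

```python
from typing import Callable, List, TypeVar

P = TypeVar("P")  # problem type (assumed name)
S = TypeVar("S")  # solution type (assumed name)


def dc(is_basic: Callable[[P], bool],
       solve_basic: Callable[[P], S],
       divide: Callable[[P], List[P]],
       combine: Callable[[P, List[S]], S],
       problem: P) -> S:
    """Generic divide-and-conquer skeleton (hypothetical parameter names)."""
    if is_basic(problem):
        return solve_basic(problem)
    subproblems = divide(problem)
    # The recursive calls do not depend on each other, which is what a
    # parallel implementation would exploit; here they run sequentially.
    subsolutions = [dc(is_basic, solve_basic, divide, combine, sub)
                    for sub in subproblems]
    return combine(problem, subsolutions)


def merge(left: List[int], right: List[int]) -> List[int]:
    """Merge two sorted lists into one sorted list."""
    out: List[int] = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def merge_sort(xs: List[int]) -> List[int]:
    """Merge sort as an instance of the skeleton with division degree 2."""
    return dc(
        lambda p: len(p) <= 1,                           # basic case
        lambda p: list(p),                               # solve basic case
        lambda p: [p[: len(p) // 2], p[len(p) // 2:]],   # divide into halves
        lambda _p, subs: merge(subs[0], subs[1]),        # combine
        xs,
    )
```

Splitting into halves gives a balanced call tree with a fixed division degree of 2, which matches two of the specializations the abstract lists; the elementwise restriction is not modelled here.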

### Citations

2432 | The Design and Analysis of Computer Algorithms - Aho, Hopcroft, et al. - 1974 |

1307 | Lectures on constructive functional programming - Bird - 1989 |
Citation Context: ...gh to handle Strassen's matrix multiplication with distributed I/O data, aside from ours. There has been related work in our own group. First, there is work on the parallelization of the homomorphism [4], a basic DC skeleton somewhat more restrictive than ours. There exists a theory for the transformational parallelization of homomorphisms [24, 10]. The class of distributable homomorphisms (DH) [9] ...

415 | Algorithmic Skeletons: Structured Management of Parallel Computation - Cole - 1989 |
Citation Context: ...ng the solutions of the subproblems to get the solution of the original problem. Because of the wide applicability of the DC paradigm, it has often been formulated as a so-called algorithmic skeleton [6, 7], which can be used as a basic building block for programming. One purpose of the skeleton concept is to provide the user with efficient implementations of popular paradigms. In this approach, the alg...

372 | Gaussian elimination is not optimal - Strassen - 1969 |
Citation Context: ...ed. In most cases, the degree of the problem division is 2. Examples of higher degrees are Karatsuba's polynomial product [2, Sect. 2.6] with a degree of 3 and Strassen's matrix multiplication [25, 14] with a degree of 7. For some algorithms, the degree is not fixed. One example is the multiplication of large integers by Schönhage and Strassen using Fermat's numbers [22], where the division degree ...
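As an illustration of a division degree of 3, the following Python sketch implements Karatsuba's polynomial product on coefficient lists. It is a textbook formulation under stated assumptions (both inputs have the same length, a power of two, lowest-degree coefficient first); the helper names are hypothetical and the code is not taken from the paper.

```python
from typing import List


def add(a: List[int], b: List[int]) -> List[int]:
    """Coefficientwise sum of two equally long coefficient lists."""
    return [x + y for x, y in zip(a, b)]


def sub(a: List[int], b: List[int]) -> List[int]:
    """Coefficientwise difference of two equally long coefficient lists."""
    return [x - y for x, y in zip(a, b)]


def karatsuba(a: List[int], b: List[int]) -> List[int]:
    """Product of two polynomials with n coefficients each (n a power of two).

    Each call spawns exactly three recursive calls on half-size inputs,
    so the division degree is 3. The result has 2n - 1 coefficients.
    """
    n = len(a)
    if n == 1:
        return [a[0] * b[0]]
    h = n // 2
    a_lo, a_hi = a[:h], a[h:]
    b_lo, b_hi = b[:h], b[h:]
    low = karatsuba(a_lo, b_lo)                          # subproblem 1
    high = karatsuba(a_hi, b_hi)                         # subproblem 2
    mid = karatsuba(add(a_lo, a_hi), add(b_lo, b_hi))    # subproblem 3
    cross = sub(sub(mid, low), high)                     # a_lo*b_hi + a_hi*b_lo
    result = [0] * (2 * n - 1)
    for i, c in enumerate(low):
        result[i] += c
    for i, c in enumerate(cross):
        result[i + h] += c
    for i, c in enumerate(high):
        result[i + 2 * h] += c
    return result
```

For example, `karatsuba([1, 2], [3, 4])` multiplies (1 + 2x) by (3 + 4x). Strassen's matrix multiplication follows the same pattern with seven recursive calls on quarter-size blocks.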

145 | Parallel Programming Using Skeleton Functions - Darlington, Field, et al. - 1992 |
Citation Context: ...ng the solutions of the subproblems to get the solution of the original problem. Because of the wide applicability of the DC paradigm, it has often been formulated as a so-called algorithmic skeleton [6, 7], which can be used as a basic building block for programming. One purpose of the skeleton concept is to provide the user with efficient implementations of popular paradigms. In this approach, the alg...

94 | Loop Parallelization in the Polytope Model - Lengauer - 1993 |
Citation Context: ...ory) space. Because the time component of all data dependence vectors is 1 and nests of an outer sequential and an inner parallel loop require global synchronization after each step of the outer loop [18], it is sufficient to keep memory space just for two successive steps of the outer loop. 4 Instantiation with balanced data division and elementwise operations In this section, we instantiate the call...
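The two-buffer argument in this context can be sketched with a small loop nest in Python. The example below is an assumption chosen for illustration (a butterfly-style all-reduce), not the paper's loop program: every value computed in one outer step depends only on values from the immediately preceding step, so one buffer for the previous step and one for the current step are enough. The operator `op` is assumed to be associative and commutative.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def butterfly_allreduce(values: List[T], op: Callable[[T, T], T]) -> List[T]:
    """Combine all values so that every position ends up holding the total.

    The outer loop is sequential in time; the iterations of the inner loop
    are independent of each other and could run in parallel, with a global
    synchronization between outer steps.
    """
    n = len(values)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    prev = list(values)        # results of the previous outer step
    cur = list(values)         # buffer for the current outer step
    step = 1
    while step < n:            # outer loop: log2(n) sequential steps
        for j in range(n):     # inner loop: independent iterations
            cur[j] = op(prev[j], prev[j ^ step])
        prev, cur = cur, prev  # swap the two buffers, no further storage
        step *= 2
    return prev
```

Updating a single buffer in place would overwrite `prev[j ^ step]` before its partner iteration has read it, which is why the second buffer is needed.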

92 | Schnelle Multiplikation grosser Zahlen - Schönhage, Strassen - 1971 |
Citation Context: ...s matrix multiplication [25, 14] with a degree of 7. For some algorithms, the degree is not fixed. One example is the multiplication of large integers by Schönhage and Strassen using Fermat's numbers [22], where the division degree is approximately the square root of the input vector size. Whereas, for a particular division degree, a DC skeleton can be defined in Haskell (using tuples) and checked at ...

69 | Foundations of Parallel Programming - Skillicorn - 1994 |
Citation Context: ... there is work on the parallelization of the homomorphism [4], a basic DC skeleton somewhat more restrictive than ours. There exists a theory for the transformational parallelization of homomorphisms [24, 10]. The class of distributable homomorphisms (DH) [9] corresponds to the combine phase of our skeleton dc4 with a binary divide function (this class is called C-algorithms in [11]). For all functions o...

63 | Report on the Programming Language Haskell 98: A Non-strict, Purely Functional Language - Jones, Hughes (editors) - 1999 |
Citation Context: ...alanced call tree. 2 Specializing DC In this section, we propose a sequence of specializations of a skeleton for general divide-and-conquer. We denote our skeletons in the functional language Haskell [16]. First, we present a general form of DC which is then specialized to enforce a balanced call tree, and subsequently further to enforce a fixed degree of the problem division. 2.1 General DC (dc0) U...

59 | Powerlist: A structure for parallel recursion - Misra - 1994 |
Citation Context: ...nterface, one might use our fast, parallel C program for the skeleton and still keep its parameters, the customizing functions, in Haskell. Aside from [15], there is other work related to ours: Misra [19], and Achatz and Schulte [1] restrict themselves to a binary division of data and problems. Mou's [20] approach allows an arbitrary division of problems and a division of multidimensional data into...

54 | Algorithmic Language and Program Development - Bauer, Wössner - 1982 |
Citation Context: ...ep new input data (for the subproblems) is generated, the input data is stored in a stack. In the world of sequential processing, a lot of effort has been put into the theory of recursion elimination [21, 3]. One of the reasons is to avoid the overhead of copying data onto a stack. Here we do not have this problem: the data to be stored in each processor is very small because the "stack" is distributed!...

27 | Systematic efficient parallelization of scan and other list homomorphisms - Gorlatch - 1996 |
Citation Context: ... [4], a basic DC skeleton somewhat more restrictive than ours. There exists a theory for the transformational parallelization of homomorphisms [24, 10]. The class of distributable homomorphisms (DH) [9] corresponds to the combine phase of our skeleton dc4 with a binary divide function (this class is called C-algorithms in [11]). For all functions of the DH class, a common hypercube implementation ca...

19 | On the synthesis of parallel programs from tensor product formulas for block recursive algorithms - Gupta, Huang, et al. |
Citation Context: ...lel nested loop program for a class of DC algorithms. From dc2 on, each specialized skeleton can be implemented by a parallel loop program, representing a different class of DC problems. Huang et al. [15] have presented a derivation of a parallel implementation of Strassen's matrix multiplication algorithm using tensor product formulas. The result is a loop program similar to ours, but with a nesting ...

17 | Systematic extraction and implementation of divide-and-conquer parallelism - Gorlatch - 1996 |
Citation Context: ... there is work on the parallelization of the homomorphism [4], a basic DC skeleton somewhat more restrictive than ours. There exists a theory for the transformational parallelization of homomorphisms [24, 10]. The class of distributable homomorphisms (DH) [9] corresponds to the combine phase of our skeleton dc4 with a binary divide function (this class is called C-algorithms in [11]). For all functions o...

17 | Deriving Parallel Programs from Specifications using Cost - Skillicorn - 1993 |
Citation Context: ... function. The (Haskell) semantics of the given source skeleton dc3 and the (functional) target skeleton it3 are equal; the difference is in the efficiency, e.g., with respect to a cost calculus like [23]. 3.5 Transformation to C In this subsection, we transform the functional target skeleton it3 into an imperative skeleton in C. We use correspondences of data structures resp. control structures betwe...

12 | Divacon: A parallel language for scientific computing based on divide-and-conquer - Mou - 1990 |
Citation Context: ...customizing functions, in Haskell. Aside from [15], there is other work related to ours: Misra [19], and Achatz and Schulte [1] restrict themselves to a binary division of data and problems. Mou's [20] approach allows an arbitrary division of problems and a division of multidimensional data into two parts per dimension, but does not say anything about a higher division degree. Cole [6] restricts hi...

10 | Compile-time transformations and optimization of parallel divide-and-conquer algorithms - Carpentieri, Mou - 1991 |
Citation Context: ... $b_0$ to $b_{m-1}$, where $b_0 = a_0$ and $(\forall i : 0 < i < m : b_i = b_{i-1} \;\text{`op`}\; a_i)$. The scan function is a useful auxiliary function in many parallel algorithms, especially sorting algorithms. In [5], a parallel algorithm for scan is presented which fits into skeleton dc4 after applying a method called "broadcast elimination". We present the program for scan below with the additional parameter n ...
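The scan specification quoted above can be written out directly, together with a simple divide-and-conquer variant. This Python sketch is an assumption for illustration: `scan_dc` keeps the broadcast of the left half's last value to the whole right half, so it is not the broadcast-eliminated algorithm of [5] and not the paper's dc4 instance. The operator `op` is assumed to be associative and the input non-empty.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def scan_seq(op: Callable[[T, T], T], a: List[T]) -> List[T]:
    """Sequential scan: b[0] = a[0] and b[i] = op(b[i-1], a[i]) for 0 < i < m."""
    b = [a[0]]
    for i in range(1, len(a)):
        b.append(op(b[i - 1], a[i]))
    return b


def scan_dc(op: Callable[[T, T], T], a: List[T]) -> List[T]:
    """Divide-and-conquer scan with binary division (requires associative op)."""
    if len(a) == 1:
        return [a[0]]
    h = len(a) // 2
    left = scan_dc(op, a[:h])     # independent subproblem 1
    right = scan_dc(op, a[h:])    # independent subproblem 2
    carry = left[-1]              # total of the left half
    # Broadcast step: prepend the left total to every right-half prefix.
    return left + [op(carry, x) for x in right]
```

Both functions return the same list for any associative operator, including non-commutative ones such as string concatenation.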

9 | Architecture independent massive parallelization of divide-and-conquer algorithms - Achatz, Schulte - 1995 |
Citation Context: ...ast, parallel C program for the skeleton and still keep its parameters, the customizing functions, in Haskell. Aside from [15], there is other work related to ours: Misra [19], and Achatz and Schulte [1] restrict themselves to a binary division of data and problems. Mou's [20] approach allows an arbitrary division of problems and a division of multidimensional data into two parts per dimension, bu...

6 | Pipelines for divide-and-conquer functions - Harrison, Medina - 1993 |
Citation Context: ...rst traversal [21]. This method is not very useful in a parallelization, where a breadth-first traversal is called for. The parallelization technique of Harrison and Khoshnevisan [12] can be extended [8] to handle DC if certain conditions are fulfilled. We start with similar considerations and develop a method for translating DC into a nested linear recursive skeleton, which can easily be interpreted...

5 | Formal Derivation of Divide-and-Conquer Programs: A Case Study in the Multidimensional FFT’s - Gorlatch, Bischof - 1997 |
Citation Context: ... of homomorphisms [24, 10]. The class of distributable homomorphisms (DH) [9] corresponds to the combine phase of our skeleton dc4 with a binary divide function (this class is called C-algorithms in [11]). For all functions of the DH class, a common hypercube implementation can be derived by transformation in the Bird-Meertens formalism [9]. The class of "static DC" [11] is an analog of our dc3 skele...

4 | On the space-time mapping of a class of divide-and-conquer recursions - Herrmann, Lengauer - 1996 |
Citation Context: ...j can be merged. Then, the multiplication of the number of calls (enumerated by p) is compensated by a diminishing number of data elements involved in each call (enumerated by j). In a previous paper [13], we have presented a geometrical model which illustrated this compensation. E.g., in the case of a binary problem division, the parallel execution of our target skeleton can be interpreted geometrica...

3 | A new approach to recursion removal - Harrison, Khoshnevisan - 1992 |
Citation Context: ...s based on a depth-first traversal [21]. This method is not very useful in a parallelization, where a breadth-first traversal is called for. The parallelization technique of Harrison and Khoshnevisan [12] can be extended [8] to handle DC if certain conditions are fulfilled. We start with similar considerations and develop a method for translating DC into a nested linear recursive skeleton, which can e...

2 | Flexible program and architecture specification for massively parallel systems - Kindermann - 1994 |
Citation Context: ...to provide the user with efficient implementations of popular paradigms. In this approach, the algorithmic skeleton for a paradigm corresponds to an executable, but unintuitive architectural skeleton [17]. To make the correspondence between the algorithmic and the architectural skeleton formally precise, we work in the domain of functional programming, in which skeletons are predefined higher-order po...

1 | A family of rules for recursion removal - Partsch, Pepper - 1976 |
Citation Context: ...rameters of this application as well as the other functions in E cannot express B. Early work on transforming recursion with dependent calls into sequential loops was based on a depth-first traversal [21]. This method is not very useful in a parallelization, where a breadth-first traversal is called for. The parallelization technique of Harrison and Khoshnevisan [12] can be extended [8] to handle DC i...