Optimization Rules for Programming with Collective Operations
- IPPS/SPDP'99. 13th Int. Parallel Processing Symp. & 10th Symp. on Parallel and Distributed Processing, 1999
Cited by 19 (6 self)
We study how several collective operations like broadcast, reduction, scan, etc. can be composed efficiently in complex parallel programs. Our specific contributions are: (1) a formal framework for reasoning about collective operations; (2) a set of optimization rules which save communications by fusing several collective operations into one; (3) performance estimates, which guide the application of optimization rules depending on the machine characteristics; (4) a simple case study with the first results of machine experiments.
(De)Composition Rules for Parallel Scan and Reduction
- In Proc. 3rd Int. Working Conf. on Massively Parallel Programming Models (MPPM'97), 1998
Cited by 8 (1 self)
We study the use of well-defined building blocks for SPMD programming of machines with distributed memory. Our general framework is based on homomorphisms, functions that capture the idea of data parallelism and have a close correspondence with collective operations of the MPI standard, e.g., scan and reduction. We prove two composition rules: under certain conditions, a composition of a scan and a reduction can be transformed into one reduction, and a composition of two scans into one scan. As an example of decomposition, we transform a segmented reduction into a composition of partial reduction and all-gather. The performance gain and overhead of the proposed composition and decomposition rules are assessed analytically for the hypercube and compared with the estimates for some other parallel models.
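The scan-reduction composition mentioned in this abstract can be illustrated with a small sequential sketch. The abstract does not state the conditions under which the rule applies, so the example below assumes one common case: the scan operator (multiplication) distributes over the reduction operator (addition). The function names `scan_then_reduce`, `fused_op`, and `fused_reduce` are hypothetical and chosen only for illustration; this is not the paper's formulation.

```python
from functools import reduce
from itertools import accumulate
import operator


def scan_then_reduce(xs):
    """Unfused version: prefix products (scan with *), then their sum (reduce with +)."""
    return reduce(operator.add, accumulate(xs, operator.mul))


def fused_op(left, right):
    """Combine two (sum_of_prefix_products, total_product) pairs.

    Assumption: * distributes over +, so the prefix products of the right
    segment can be rescaled by the total product of the left segment.
    """
    s1, r1 = left
    s2, r2 = right
    return (s1 + r1 * s2, r1 * r2)


def fused_reduce(xs):
    """Fused version: a single reduction over pairs, then take the first component."""
    pairs = [(x, x) for x in xs]  # each element starts as (sum, product) = (x, x)
    return reduce(fused_op, pairs)[0]


if __name__ == "__main__":
    data = [2, 3, 4]  # must be non-empty; reduce has no initial value here
    print(scan_then_reduce(data), fused_reduce(data))
```

Because `fused_op` is associative, the single reduction could in principle be evaluated as a tree across processes, which is what makes replacing two collective operations with one attractive; the cost is that each message carries a pair instead of a single value.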

