## On Optimizing A Class Of Multi-Dimensional Loops With Reductions For Parallel Execution (1997)

Venue: Parallel Processing Letters

Citations: 29 (23 self)

### BibTeX

@ARTICLE{Lam97onoptimizing,

author = {Chi-Chung Lam and P. Sadayappan and Rephael Wenger},

title = {On Optimizing A Class Of Multi-Dimensional Loops With Reductions For Parallel Execution},

journal = {Parallel Processing Letters},

year = {1997},

volume = {7},

pages = {157--168}

}

### Abstract

This paper addresses the compile-time optimization of a form of nested-loop computation that is motivated by a computational physics application. The computations involve multi-dimensional surface and volume integrals where the integrand is a product of a number of array terms. Besides the issue of optimal distribution of the arrays among the processors, there is also scope for reordering the operations using the commutativity and associativity properties of addition and multiplication, and for applying the distributive law to significantly reduce the number of operations executed. A formalization of the operation minimization problem and a proof of its NP-completeness are provided. A pruning search strategy for determining an optimal form is developed. An analysis of the communication requirements and a polynomial-time algorithm for determining the optimal distribution of the arrays are also provided.

Keywords: loop parallelization, operation minimization, communication op...
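To make the operation-count argument concrete, here is a minimal Python sketch under stated assumptions: a hypothetical sum $S(t) = \sum_i \sum_j \sum_k A(i,j,t)\,B(j,k,t)$ stands in for the paper's integrals, and the array names, extents, and cost model (each multiply or add counted as one operation) are illustrative choices, not taken from the paper. Because $A$ does not depend on $k$ and $B$ does not depend on $i$, the distributive law allows the rewrite $S(t) = \sum_j \big(\sum_i A(i,j,t)\big)\big(\sum_k B(j,k,t)\big)$, which lowers the cost from $2N_iN_jN_kN_t$ to $N_jN_t(N_i + N_k + 2)$ operations.

```python
import random

def direct(A, B, Ni, Nj, Nk, Nt):
    """Hypothetical example: S[t] = sum over i, j, k of A[i][j][t] * B[j][k][t], term by term."""
    S = [0.0] * Nt
    ops = 0
    for t in range(Nt):
        for i in range(Ni):
            for j in range(Nj):
                for k in range(Nk):
                    S[t] += A[i][j][t] * B[j][k][t]
                    ops += 2  # one multiply + one add per term
    return S, ops

def factored(A, B, Ni, Nj, Nk, Nt):
    """Same S[t] after applying the distributive law:
    S[t] = sum_j (sum_i A[i][j][t]) * (sum_k B[j][k][t])."""
    S = [0.0] * Nt
    ops = 0
    for t in range(Nt):
        for j in range(Nj):
            a = 0.0
            for i in range(Ni):   # partial sum over i (A has no k index)
                a += A[i][j][t]
                ops += 1
            b = 0.0
            for k in range(Nk):   # partial sum over k (B has no i index)
                b += B[j][k][t]
                ops += 1
            S[t] += a * b
            ops += 2              # one multiply + one add
    return S, ops

# Hypothetical extents and random data, for illustration only.
random.seed(0)
Ni, Nj, Nk, Nt = 4, 3, 5, 2
A = [[[random.random() for _ in range(Nt)] for _ in range(Nj)] for _ in range(Ni)]
B = [[[random.random() for _ in range(Nt)] for _ in range(Nk)] for _ in range(Nj)]

S1, ops1 = direct(A, B, Ni, Nj, Nk, Nt)
S2, ops2 = factored(A, B, Ni, Nj, Nk, Nt)
print(ops1, ops2)  # 240 66
```

Both versions produce the same $S(t)$ up to floating-point rounding. According to the abstract, the paper searches for the best such rewrite automatically with a pruning search; this sketch only checks one hand-picked factorization.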

### Citations

11508 | Computers and Intractability: A Guide to the Theory of NP-Completeness
- Garey, Johnson
- 1979
Citation Context ...oblem can be defined as follows. Given a finite set $A$, a size $s(a) \in \mathbb{Z}^+$ for each $a \in A$, and a positive integer $y$, determine whether there exists a subset $A' \subseteq A$ such that $\sum_{a \in A'} s(a) = y$. This problem is known to be NP-complete [7]. The Product Partition problem is similar. Given a finite set $B$ and a size $s'(b) \in \mathbb{Z}^+$ for each $b \in B$, the Product Partition problem asks whether there exists a subset $B' \subseteq B$ such that $\prod_{b \in B'} s'(b) = \prod_{b \in B \setminus B'} s'(b)$. The Subset Product problem...
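The two decision problems defined in this excerpt can be checked by brute force. The sketch below only illustrates the definitions. It enumerates all $2^n$ subsets, so it takes exponential time, and no polynomial-time algorithm is known for NP-complete problems. The function names `subset_sum` and `product_partition` are hypothetical, not from the paper.

```python
from itertools import combinations
from math import prod

def subset_sum(sizes, y):
    """Hypothetical helper: is there a subset A' whose sizes sum to exactly y?"""
    n = len(sizes)
    for r in range(n + 1):
        for idx in combinations(range(n), r):  # indices, so repeated sizes are distinct items
            if sum(sizes[i] for i in idx) == y:
                return True
    return False

def product_partition(sizes):
    """Hypothetical helper: is there a subset B' with prod(B') == prod(B minus B')?"""
    n = len(sizes)
    for r in range(n + 1):
        for idx in combinations(range(n), r):
            chosen = set(idx)
            inside = prod(sizes[i] for i in chosen)          # prod of empty set is 1
            outside = prod(sizes[i] for i in range(n) if i not in chosen)
            if inside == outside:
                return True
    return False

print(subset_sum([3, 5, 7], 12))     # True  (5 + 7)
print(subset_sum([3, 5, 7], 11))     # False
print(product_partition([2, 3, 6]))  # True  (2 * 3 == 6)
print(product_partition([2, 3, 5]))  # False
```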

701 | High Performance Compilers for Parallel Computing
- Wolfe
- 1996
Citation Context ...$N_j N_k N_t$ operations. The nested loop computes the discrete function (For simplicity, we use to denote ): If the above loop were input to an optimizing compiler, it would perform dependence analysis [13] on the loop and determine that the innermost (t-loop) was an independent loop and that the other three loops involved dependences due to reduction operations. Although the loop could be parallelized,...
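A minimal sketch of the dependence structure described in this excerpt, under stated assumptions: the four-deep loop body `S[t] += A[i][j][t] * B[j][k][t]`, the array names, and the use of `ThreadPoolExecutor` are hypothetical choices for illustration, not the paper's loop. Each iteration of the innermost t-loop writes a different `S[t]`, so the t-loop is independent, while the i-, j- and k-loops all accumulate into the same `S[t]` (a reduction). The independent t iterations can therefore be handed to separate workers. CPython threads will not speed up this pure-Python arithmetic; the point is the decomposition, not performance.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def sequential(A, B, Ni, Nj, Nk, Nt):
    # Hypothetical loop with t innermost. S[t] is updated for every (i, j, k),
    # so the i-, j- and k-loops carry a reduction dependence on S[t].
    S = [0.0] * Nt
    for i in range(Ni):
        for j in range(Nj):
            for k in range(Nk):
                for t in range(Nt):
                    S[t] += A[i][j][t] * B[j][k][t]
    return S

def one_t(A, B, Ni, Nj, Nk, t):
    # All work for a single S[t]; no other t iteration reads or writes it.
    acc = 0.0
    for i in range(Ni):
        for j in range(Nj):
            for k in range(Nk):
                acc += A[i][j][t] * B[j][k][t]
    return acc

def parallel_over_t(A, B, Ni, Nj, Nk, Nt, workers=4):
    # Distribute the independent t iterations; each reduction stays within one task.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda t: one_t(A, B, Ni, Nj, Nk, t), range(Nt)))

# Hypothetical extents and random data, for illustration only.
random.seed(1)
Ni, Nj, Nk, Nt = 3, 4, 2, 5
A = [[[random.random() for _ in range(Nt)] for _ in range(Nj)] for _ in range(Ni)]
B = [[[random.random() for _ in range(Nt)] for _ in range(Nk)] for _ in range(Nj)]

S_seq = sequential(A, B, Ni, Nj, Nk, Nt)
S_par = parallel_over_t(A, B, Ni, Nj, Nk, Nt)
```

Keeping the (i, j, k) summation order identical in both versions means each `S[t]` is accumulated in the same order, so the results agree.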

99 | Optimizing for parallelism and data locality
- Kennedy, McKinley
- 1992
Citation Context ...xpression elimination [6]. Other approaches to reduce operation count can be found in [11,15]. Loop transformations that improve locality and parallelism have been studied extensively in recent years [9,14]. The optimal alignment of arrays in evaluating array expressions on data parallel architectures is considered in [4,5]. However, we are not aware of any work that considers loop transformation togethe...

91 | Automatic array alignment in data-parallel programs
- Chatterjee, Gilbert, et al.
- 1993
Citation Context ...t improve locality and parallelism have been studied extensively in recent years [9,14]. The optimal alignment of arrays in evaluating array expressions on data parallel architectures is considered in [4,5]. However, we are not aware of any work that considers loop transformation together with the application of the distributive law in order to minimize the amount of computation in nested loops. Section...

89 | Complete register allocation problems
- Sethi
- 1973
Citation Context ... that the operation minimization problem formalized in the previous section is NP-complete. NP-completeness results for optimal code generation of arithmetic expressions are discussed in [1], [3] and [16]. However, those results focus on the problem of minimizing load operations to registers. In addition, the arithmetic expressions considered there are much more general than the restricted class of su...

52 | Code generation for expressions with common subexpressions
- Aho, Johnson
Citation Context ...tion we prove that the operation minimization problem formalized in the previous section is NP-complete. NP-completeness results for optimal code generation of arithmetic expressions are discussed in [1], [3] and [16]. However, those results focus on the problem of minimizing load operations to registers. In addition, the arithmetic expressions considered there are much more general than the restrict...

51 | Arithmetic complexity of computations
- Winograd
- 1980
Citation Context ...er. Reduction of arithmetic operations has been traditionally done by compilers using the technique of common subexpression elimination [6]. Other approaches to reduce operation count can be found in [11,15]. Loop transformations that improve locality and parallelism have been studied extensively in recent years [9,14]. The optimal alignment of arrays in evaluating array expressions on data parallel archi...

50 | Optimal Evaluation of Array Expressions on Massively Parallel Machines
- Chatterjee, Gilbert, et al.
- 1995
Citation Context ...t improve locality and parallelism have been studied extensively in recent years [9,14]. The optimal alignment of arrays in evaluating array expressions on data parallel architectures is considered in [4,5]. However, we are not aware of any work that considers loop transformation together with the application of the distributive law in order to minimize the amount of computation in nested loops. Section...

32 | Code generation for a one-register machine
- Bruno, Sethi
- 1976
Citation Context ...we prove that the operation minimization problem formalized in the previous section is NP-complete. NP-completeness results for optimal code generation of arithmetic expressions are discussed in [1], [3] and [16]. However, those results focus on the problem of minimizing load operations to registers. In addition, the arithmetic expressions considered there are much more general than the restricted cl...

12 | Parallel Implementation of Quasiparticle Calculations of Semiconductors and Insulators
- Aulbur
- 1996
Citation Context ...imization of a particular form of nested loop computations motivated by certain multi-dimensional integral calculations in some computational physics codes modeling electronic properties of materials [2, 8, 12]. In addition to the issue of mapping of data and computations to optimize performance, there is also a need to optimize the total number of arithmetic operations by judiciously applying the distribut...

11 | A Data Locality Algorithm
- Wolf, Lam
- 1991
Citation Context ...xpression elimination [6]. Other approaches to reduce operation count can be found in [11,15]. Loop transformations that improve locality and parallelism have been studied extensively in recent years [9,14]. The optimal alignment of arrays in evaluating array expressions on data parallel architectures is considered in [4,5]. However, we are not aware of any work that considers loop transformation togethe...

7 | Solving bigger problems by decreasing the operation count and increasing the computation bandwidth
- Miller
Citation Context ...er. Reduction of arithmetic operations has been traditionally done by compilers using the technique of common subexpression elimination [6]. Other approaches to reduce operation count can be found in [11,15]. Loop transformations that improve locality and parallelism have been studied extensively in recent years [9,14]. The optimal alignment of arrays in evaluating array expressions on data parallel archi...

3 | Space-Time Method for Ab-Initio
- Rojas, Godby, et al.
- 1995
Citation Context ...imization of a particular form of nested loop computations motivated by certain multi-dimensional integral calculations in some computational physics codes modeling electronic properties of materials [2, 8, 12]. In addition to the issue of mapping of data and computations to optimize performance, there is also a need to optimize the total number of arithmetic operations by judiciously applying the distribut...

2 | Crafting a Compiler (Menlo Park)
- Fischer, LeBlanc Jr.
- 1991
Citation Context ...lel machine is desirable. These issues are addressed in this paper. Reduction of arithmetic operations has been traditionally done by compilers using the technique of common subexpression elimination [6]. Other approaches to reduce operation count can be found in [11,15]. Loop transformations that improve locality and parallelism have been studied extensively in recent years [9,14]. The optimal align...