## Optimal Tiling (1994)

Citations: 5 (1 self)

### BibTeX

@MISC{Andonov94optimaltiling,

author = {Rumen Andonov and Sanjay Rajopadhye},

title = {Optimal Tiling},

year = {1994}

}

### Abstract

Iteration space tiling is a common strategy used by parallelizing compilers to reduce communication overhead. We address the problem of determining the optimal tile size (which minimizes the total execution time of the program), for a particular program schema. We use a realistic model of the architecture which accounts for coprocessors that permit overlapping of communication and computation, context switching times, etc. Determining the optimal tile size is shown to reduce to a non-linear optimization problem. We solve this analytically, yielding a closed form solution that involves only parameters of the architecture and program that are easily determined at compile time. It can thus be used by a compiler before code generation. Although we solve the problem for a particular schema of programs, our results can be generalized to uniform dependence loops and also to certain classes of loop programs with dynamic dependence vectors.
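The trade-off described in the abstract can be illustrated with a small sketch. It first enumerates rectangular tiles over a two-dimensional iteration space, then searches for the tile size that minimizes a toy pipelined cost function. The cost model, the function names (`tiles`, `toy_time`, `continuous_optimum`), and all parameter values are assumptions made for illustration. This is not the paper's model: in particular, it ignores the overlap of communication and computation that the paper accounts for.

```python
import math


def tiles(c, m, r, s):
    """Yield half-open tiles (j_lo, j_hi, k_lo, k_hi) that cover the domain
    {(j, k) | 0 <= j <= c, 0 <= k <= m} with blocks of at most r x s points."""
    for j_lo in range(0, c + 1, r):
        for k_lo in range(0, m + 1, s):
            yield j_lo, min(j_lo + r, c + 1), k_lo, min(k_lo + s, m + 1)


def toy_time(n, x, procs, startup, per_point):
    """Hypothetical cost model (NOT the paper's): a pipeline of `procs`
    processors, each handling ceil(n / x) tiles of x points. Every tile costs
    one message startup plus x * per_point, with no overlap."""
    steps = math.ceil(n / x) + procs - 1
    return steps * (startup + x * per_point)


def continuous_optimum(n, procs, startup, per_point):
    """Real-valued minimizer of (n/x + procs - 1) * (startup + per_point * x).
    Requires procs > 1; obtained by setting the derivative in x to zero."""
    return math.sqrt(n * startup / ((procs - 1) * per_point))


if __name__ == "__main__":
    n, procs, startup, per_point = 10_000, 11, 50.0, 1.0
    best_x = min(
        range(1, n + 1),
        key=lambda x: toy_time(n, x, procs, startup, per_point),
    )
    print("brute-force tile size:", best_x)
    print("continuous estimate:  ", continuous_optimum(n, procs, startup, per_point))
```

Small tiles pay the startup cost many times, while large tiles lengthen the pipeline fill time, so even this simplified objective is non-linear in the tile size and has an interior minimum.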

### Citations

91 | Partitioning and mapping of algorithms into fixed size systolic arrays - Moldovan, Fortes - 1986 |
Citation Context ...t on distributed memory architectures because of the startup overhead of inter-processor communications. One obvious approach to improving the performance is to partition the computations into blocks [MF86] so that message startup costs are amortized. In particular, we desire a partitioning of the rectangular domain, $\{j, k \mid j = 0, \ldots, c,\ k = 0, \ldots, m\}$ into rectangular blocks of size $(r, s)$ whic...

51 | (Pen)-ultimate tiling - Boulet, Darte, et al. - 1994 |
Citation Context ... determining the shape of the tile. Indeed, Boulet et al. recently gave an eloquent argument for separating the tiling problem into two steps: first determine the tile shape, and then the tile size [BDRR93]. They show how the first problem can be reduced to a linear optimization problem using an architecture-independent performance model. The second step is not addressed. Our work focuse...

46 | Synthesizing systolic arrays from recurrence equations - Rajopadhye, Fujimoto - 1990 |
Citation Context ...complexity and corresponds to a basic block (loop body in a nested loop program). It may require multiple arguments, but each one has the form $X(Az + a + W(z))$. The $Az + a$ is a classical affine dependency [RF90], but the presence of a data variable $W(z)$ means that the dependency itself is not known statically. Such recurrences are said to have dynamic or "run-time" dependencies [Meg93]. Typical ...

35 | Tiling multidimensional iteration spaces for nonshared memory machines - Ramanujam, Sadayappan - 1991 |

30 | Pipelined data-parallel algorithms: Part II - Design - King, Chou, et al. - 1990 |
Citation Context ...nning programs on rings. Their result, however, is not in closed form but requires a simulation step to determine the optimal tile size. Our results are also closely related to the work of King et al. [KCN90], who determine a similar optimal tile size, using total completion time as the cost measure. However, their architecture model does not accurately model the overlap between computation and communicat...

25 | Matrix Factorization on a Hypercube Multiprocessor - Geist, Heath - 1985 |

12 | Networks for Parallel Processors: Measurements and Prognostications - Grunwald, Reed - 1988 |

9 | Procédures de base pour le calcul scientifique sur machines parallèles à mémoire distribuée - Desprez - 1994 |
Citation Context ...n each node (this implies that for any call to a communication routine, the copy cannot be avoided). However, Desprez notes that empirically the times are very similar to those predicted by our model [Des94], and it would be interesting to extend the model to such machines, as well as to SIMD machines. Although our results have been developed for a specific program (schema), the same approach can be used...

9 | Partitionnement de boucles imbriquées, une technique d'optimisation pour les programmes scientifiques - Irigoin - 1995 |

7 | An Optimal Algo-tech-cuit for the Knapsack Problem - Andonov, Rajopadhye - 1994 |
Citation Context ... 1: Two instances of (1). Observe how the graphs vary with the $w_i$'s. ... examples are dynamic programming algorithm(s) for the knapsack problem(s) and their variations [AR]. Many other problems such as finding longest common subsequence of strings, dynamic time warping, string comparison, etc., also have similar recurrences but without dynamic dependencies. In this pape...

7 | Mapping a class of run-time dependencies onto regular arrays - Megson - 1993 |
Citation Context ...ical affine dependency [RF90], but the presence of a data variable $W(z)$ means that the dependency itself is not known statically. Such recurrences are said to have dynamic or "run-time" dependencies [Meg93]. Typical ... [Figure residue: dependence graph with axes $j$ and $k$; nodes $f_0(0), \ldots, f_0(6)$ and $f_3(0), \ldots, f_3(6)$; $w_1 = 4$, $w_2 = 3$, $w_3 = 2$] ...

7 | Loop parallelization on distributed memory machines: Problem statement - 1993 |

5 | Path planning on a ring of processors - Miguet, Robert - 1990 |
Citation Context ...en, and is a logical extension of their work. We are interested in optimizing the global performance: the execution time for the whole program. This approach is similar to that of Miguet and Robert [MR90], who deal with implementing path planning programs on rings. Their result, however, is not in closed form but requires a simulation step to determine the optimal tile size. Our results are also closel...

2 | Parallel Algorithms for Discrete Optimization Problems - Aleksandrov - 1993 |
Citation Context ...tion of the optimization problem. Aleksandrov also studies the tile size problem, but for a restrictive model (no overlap between computation and communication) and under some simplifying constraints [Ale93]. Another aspect of our results is that, as mentioned above, our recurrence has dynamic dependencies. We will see that this affects only the tile shape, and our analytical results can be easily adapte...

1 | Border tiling: for efficient loop execution on distributed memory machines - Ding, Dongen - 1993 |

1 | Iteration space tiling for memory hierarchies. Parallel Processing for Scientific Computing (SIAM), 357--361 - Wolfe - 1987 |
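
Several citation contexts above mention recurrences with dynamic ("run-time") dependencies and name knapsack dynamic programming as a typical example. The sketch below shows what that means in code. The paper's own recurrence (1) is not reproduced in this record, so the standard unbounded knapsack recurrence is used here as an assumption, and the function name `knapsack_table` and the profit values are hypothetical. The weights 4, 3, 2 and the capacity range 0 to 6 match the values visible in the figure residue.

```python
def knapsack_table(weights, profits, capacity):
    """Fill f[k][j] = best profit using the first k item types with capacity j.

    Assumed recurrence (standard unbounded knapsack, not taken from the paper):
        f[k][j] = max(f[k-1][j], f[k][j - w_k] + p_k)   when j >= w_k
    The offset j - w_k depends on the input data w_k, so the dependence
    vector is only known once the weights are known.
    """
    f = [[0] * (capacity + 1)]  # row k = 0: no items available
    for w, p in zip(weights, profits):
        prev, row = f[-1], []
        for j in range(capacity + 1):
            best = prev[j]
            if j >= w:
                # Data-dependent reference into the current row.
                best = max(best, row[j - w] + p)
            row.append(best)
        f.append(row)
    return f


if __name__ == "__main__":
    table = knapsack_table(weights=[4, 3, 2], profits=[9, 6, 3], capacity=6)
    for k, row in enumerate(table):
        print(k, row)
```

Changing the weights changes which earlier cell each entry reads, which is why the shape of the dependence graph varies with the $w_i$ values.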