## Index Set Splitting (1999)

Venue: International Journal of Parallel Programming

Citations: 11 (1 self)

### BibTeX

@ARTICLE{Griebl99indexset,

author = {Martin Griebl and Paul Feautrier and Christian Lengauer},

title = {Index Set Splitting},

journal = {International Journal of Parallel Programming},

year = {1999},

volume = {28},

pages = {607--631}

}

### Abstract

There are many algorithms for the space-time mapping of nested loops. Some of them even make the optimal choices within their framework. We propose a preprocessing phase for algorithms in the polytope model, which extends the model and yields space-time mappings whose schedule is, in some cases, orders of magnitude faster. These are cases in which the dependence graph has small irregularities. The basic idea is to split the iteration domain of the loop nests into parts with a regular dependence structure and apply the existing space-time mapping algorithms to these parts individually. This work is based on a seminal idea in the more limited context of loop parallelization at the code level. We elevate the idea to the model level (our model is the polytope model), which increases its applicability by providing a clearer and wider range of choices at an acceptable analysis cost. Index set splitting is one facet in the effort to extend the power of the polytope model and to en...
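The splitting idea in the abstract can be illustrated with a minimal sketch. The loop, the update function `f`, and the split point below are illustrative assumptions and are not taken from the text on this page. Iteration `i` writes `a[i]` and reads the mirrored element `a[m - i]`. Lower-half iterations read only values that the upper half has not yet overwritten, and upper-half iterations read only values the lower half has already produced, so each part can run in any order (simulated here by shuffling), provided the lower part finishes first.

```python
import random


def f(x):
    # Hypothetical update function; any pure function would do.
    return 2 * x + 1


def run_sequential(a):
    """Reference semantics: for i = 0..m do a[i] = f(a[m - i])."""
    a = list(a)
    m = len(a) - 1
    for i in range(m + 1):
        a[i] = f(a[m - i])
    return a


def run_split(a, seed=0):
    """Same loop with its index set split at n = m // 2.

    Within each part the iterations are mutually independent, so they
    are executed in a shuffled order to mimic parallel execution.
    The two parts themselves are still executed one after the other.
    """
    a = list(a)
    m = len(a) - 1
    n = m // 2
    rng = random.Random(seed)
    lower = list(range(0, n + 1))      # reads a[m - i] from the upper part
    upper = list(range(n + 1, m + 1))  # reads a[m - i] from the lower part
    for part in (lower, upper):
        rng.shuffle(part)
        for i in part:
            a[i] = f(a[m - i])
    return a


if __name__ == "__main__":
    data = list(range(9))
    print(run_sequential(data) == run_split(data, seed=42))
```

Without the split, a single affine schedule has to order all iterations of this loop; with the split, each part admits a schedule in which all of its iterations share one time step, assuming enough processors.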

### Citations

876 |
An introduction to the theory of numbers
- Hardy, Wright
- 1979
Citation Context ...consider instead the obvious minorant: $\varphi'(m) = m \prod_{p \le m} (1 - 1/p)$. One can prove that $\varphi'$ is non-decreasing and that $\varphi'(x) \sim e^{-\gamma} x / \log x$, where $\gamma$ is Euler's constant (see Theorem 428 of Hardy/Wright [HW90]). Thus, we can define $L'$ by $L' = \max\{m \mid \varphi'(m) \le n\}$, and either use this value in the above algorithm or compute a tighter value of $L$: $L = \max\{m \mid m \le L',\ \varphi(m) \le n\}$. Recall that n is the number... |

447 |
Optimizing Supercompilers for Supercomputers
- Wolfe
- 1989
Citation Context ...ules found are minima of a finite set of affine functions, and most piecewise affine schedules cannot be cast in this form. Example 1 is a case in point. The idea of index set splitting goes back to Wolfe [Wol89], and further to Allen/Kennedy [AK87] and Banerjee [Ban79]. Our method expands on these seminal efforts by incorporating them into the polytope model. The work most closely related is by Jemni and Mahj... |

296 | Automatic translation of Fortran programs to vector form
- Allen, Kennedy
- 1987
Citation Context ...f affine functions, and most piecewise affine schedules cannot be cast in this form. Example 1 is a case in point. The idea of index set splitting goes back to Wolfe [Wol89], and further to Allen/Kennedy [AK87] and Banerjee [Ban79]. Our method expands on these seminal efforts by incorporating them into the polytope model. The work most closely related is by Jemni and Mahjoub [MJ95, MJ96] and deals with parti... |

97 | Loop parallelization in the polytope model - Lengauer - 1993 |

76 |
Eliminating false data dependences using the Omega test
- Pugh, Wonnacott
- 1992
Citation Context ... the vertical bar by relation union, and the Kleene star by transitive closure. In suitable cases, one can compute the composite dependence in closed form by using, for instance, the Omega calculator [PW92]. However, since the transitive closure of an affine relation is not always affine, the above computation does not always succeed. Our proposal is to ignore a composite dependence when a closed form canno... |

49 | Automatic parallelization in the polytope model - Feautrier - 1996 |

40 |
Speedup of Ordinary Programs
- Banerjee
- 1979
Citation Context ...d most piecewise affine schedules cannot be cast in this form. Example 1 is a case in point. The idea of index set splitting goes back to Wolfe [Wol89], and further to Allen/Kennedy [AK87] and Banerjee [Ban79]. Our method expands on these seminal efforts by incorporating them into the polytope model. The work most closely related is by Jemni and Mahjoub [MJ95, MJ96] and deals with partitioning the index set... |

35 | Linear scheduling is nearly optimal
- Darte, Khachiyan, et al.
- 1991
Citation Context ...ents. A short version has been presented at the Int. Conf. on Parallel Architectures and Compilation Techniques (PACT'99) [GFL99]. 1 For the case of uniform dependences, Darte, Khachiyan and Robert [DKR91] proved that there are methods which yield a schedule with (asymptotically) optimal latency. However, for the case of affine, non-uniform dependences, this optimum is sometimes missed by orders of magni... |

27 |
The systematic design of systolic arrays
- Quinton
- 1987
Citation Context ...ality condition: $\forall u, v \in \Omega : u \mathrel{\delta} v \Rightarrow \theta(u) + 1 \le \theta(v)$ (1). As a matter of fact, one can omit the integrity condition on schedules since, if $\theta$ satisfies the causality condition, then so does $\theta'(u) = \lfloor \theta(u) \rfloor$ [Qui87]. From a causal schedule, one can deduce a parallel program whose figure of merit is its latency: $L = \max_{u \in \Omega} \theta(u) - \min_{u \in \Omega} \theta(u)$. The latency can be interpreted either as the running time on a parallel com... |
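The causality condition and the latency mentioned in the context above can be checked mechanically on a small example. This is a minimal sketch under stated assumptions: the names `is_causal`, `latency`, and `mirror_dependences`, the sample dependence set, and the three schedules are all hypothetical and chosen only for illustration. A schedule is treated as causal when every dependence source is scheduled at least one time step before its target, and the latency is the difference between the largest and smallest schedule value.

```python
def is_causal(theta, deps):
    """True if theta(u) + 1 <= theta(v) for every dependence (u, v)."""
    return all(theta(u) + 1 <= theta(v) for u, v in deps)


def latency(theta, ops):
    """Largest minus smallest schedule value over all operations."""
    times = [theta(u) for u in ops]
    return max(times) - min(times)


def mirror_dependences(m):
    # Hypothetical dependence set: operation i must precede operation m - i
    # for every i in the lower half of 0..m.
    return [(i, m - i) for i in range(m // 2 + 1) if i < m - i]


def theta_sequential(u):
    # One operation per time step.
    return u


def make_theta_split(m):
    # Two time steps: the lower half first, then the upper half.
    n = m // 2
    return lambda u: 0 if u <= n else 1


def theta_flat(u):
    # Everything at once; violates the dependences.
    return 0


if __name__ == "__main__":
    m = 6
    ops = range(m + 1)
    deps = mirror_dependences(m)
    for name, theta in [("sequential", theta_sequential),
                        ("split", make_theta_split(m)),
                        ("flat", theta_flat)]:
        print(name, is_causal(theta, deps), latency(theta, ops))
```

In this toy setting the sequential and the two-step schedule are both causal, but their latencies differ, which is the kind of gap the latency figure of merit is meant to expose.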

23 |
The Languages of Machines: An Introduction to Computability and Formal Languages
- Floyd, Beigel
- 1994
Citation Context ...y by associating a finite automaton with the dependence graph. The states of this automaton are the statements and the transitions are the dependences. There are well known algorithms (e.g., by Kleene [FB94]) for associating any two states S and T with a regular expression representing all paths from S to T. The letters in this regular expression are dependence names, and the operators are the dot (conc... |

20 |
Dataflow analysis of array and scalar references
- Feautrier
- 1991
Citation Context ...gate every initial split along the path descriptions of the statement dependence graph. The first task requires no heuristics; it can be computed precisely with standard methods of the polytope model [Fea91]. The second task is solved with methods outside the classical polytope model, and incurs a possible loss in precision, for the following reason: in effect, we compute precisely the images of all paths... |

19 | Some efficient solutions to the affine scheduling problem, Part II, Multidimensional time - Feautrier - 1992 |

15 | Communication-Minimal Partitioning of Parallel Loops and Data Arrays for Cache-Coherent Distributed-Memory Multiprocessors
- Barua, Kranz, et al.
- 1996
Citation Context ...on the index sets. Tiling is still a very active research area [FGRT98]. However, the goal of tiling has been either to increase granularity (e.g., [ARY98]), or to block for cache optimization (e.g., [BKA97]), or simply to map virtual processors to real processors. In all these cases, the idea is to enumerate the given index set in a higher-dimensional space: one set of dimensions for the tiles and anoth... |

12 |
Optimal orthogonal tiling
- Andonov, Rajopadhye, et al.
- 1998
Citation Context ...ry similar to tiling [AI91]: both techniques partition the index sets. Tiling is still a very active research area [FGRT98]. However, the goal of tiling has been either to increase granularity (e.g., [ARY98]), or to block for cache optimization (e.g., [BKA97]), or simply to map virtual processors to real processors. In all these cases, the idea is to enumerate the given index set in a higher-dimensional ... |

8 | Optimal fine and medium grain parallelism detection in polyhedral reduced dependence graphs - Darte, Vivien - 1997 |

3 | The loop parallelizer LooPo --- announcement
- Griebl, Lengauer
- 1997
Citation Context ...re experiments on practical examples with our prototype will have to confirm this observation. For this purpose, we are currently implementing the described methods in the prototype parallelizer LooPo [GL97]. This implementation takes as input the results of the existing dependence analysis modules and is itself very close to the algorithm in Section 7, with the following technical modifications: In Ste... |

2 | Darte and Frédéric Vivien. Revisiting the decomposition of - Alain - 1995 |

2 | Restructuring and parallelizing a static conditional loop - Mahjoub, Jemni - 1995 |

1 |
A precise fixpoint reaching definition analysis for arrays
- Collard, Griebl
- 1999
Citation Context ...schedule with more elaborate methods only those strongly connected components of the statement dependence graph which have the highest latency. Other approaches like iterative array dataflow analysis [CG99] have a similar potential. In this sense, the polytope model can be viewed as a powerful alternative to be applied to subproblems for which simple heuristics do not yield satisfactory solutions. We... |

1 |
Wolfgang Giloi, Sanjay Rajopadhye, and Lothar Thiele, editors. Tiling for optimal resource utilization
- Ferrante
- 1998
Citation Context ...nd for finding these splits. 4 Related Work Our notion of index set splitting seems very similar to tiling [AI91]: both techniques partition the index sets. Tiling is still a very active research area [FGRT98]. However, the goal of tiling has been either to increase granularity (e.g., [ARY98]), or to block for cache optimization (e.g., [BKA97]), or simply to map virtual processors to real processors. In al... |

1 | On the parallelization of single dynamic conditional loops. Simulation Practice and Theory - Mahjoub, Jemni - 1996 |