## Constructive Methods for Scheduling Uniform Loop Nests (1994)

Venue: IEEE Transactions on Parallel and Distributed Systems

Citations: 62 (3 self)

### BibTeX

```bibtex
@ARTICLE{Darte94constructivemethods,
  author  = {Alain Darte and Yves Robert},
  title   = {Constructive Methods for Scheduling Uniform Loop Nests},
  journal = {IEEE Transactions on Parallel and Distributed Systems},
  year    = {1994},
  volume  = {5},
  pages   = {814--822}
}
```

### Abstract

This paper surveys scheduling techniques for loop nests with uniform dependences. First we introduce the hyperplane method and related variants. Then we extend it by using a different affine schedule for each statement within the nest. In both cases we present a new, constructive and efficient method to determine optimal solutions, i.e. schedules whose total execution time is minimum. 1 Introduction. Loop nests lie at the heart of parallelizing supercompilers for supercomputers. On the one hand, their importance in terms of applications is evident: in many scientific programs, the time spent executing a small number of loops represents a large fraction of the total execution time, while the potential parallelism of these loops is considerable. On the other hand, the regular and repetitive structure of loop nests greatly facilitates the use of dependence analysis techniques and of scheduling and allocation strategies. The general problem of finding the optimal scheduling for a ...

### Citations

1467 | Theory of linear and integer programming - Schrijver - 1986

441 | Optimizing Supercompilers for Supercomputers - Wolfe - 1989

Citation Context: ...i, ..., d_ni). Then one selects π = (∏_{j=2}^{n}(N_j + 1), ∏_{j=3}^{n}(N_j + 1), ..., 1) / min(true_distance(d_i), 1 ≤ i ≤ m) as scheduling vector. Unimodular transformations: techniques summarized in [20] aim at exchanging two loops ("loop interchange") or at applying a skewing of a loop by another ("loop skewing"). These transformations are unimodular: the corresponding matrix is (0 1; 1 0) or (1 0...

311 | Compiling Fortran D for MIMD distributed-memory machines - Hiranandani, Kennedy, et al. - 1992

Citation Context: ...formed at compile-time and they should render the implicit parallelism hidden in the loop nest fully explicit. This is the key to implementing some optimizations before execution: these optimizations [6] aim at reducing the communication time (e.g. through message vectorization), or at decreasing the number of synchronizations (e.g. using structural information to eliminate some tests), or at diminis...

215 | The parallel execution of do loops - Lamport - 1974

Citation Context: ...m the nest into the following scheme: for time = 0 to time_max do: execute in parallel all p = (i_1, ..., i_n) such that ⌊π·p⌋ = time; endfor. This is nothing else than Lamport's hyperplane method [7]. We explain this method and related variants in great detail, and we propose an efficient technique for determining an optimal scheduling vector π, i.e. a scheduling vector for which the total exec...
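As a rough illustration of the wavefront scheme quoted in this excerpt (a sketch, not code from the paper), the following groups the iteration points of a 2-D domain by the value of ⌊π·p⌋; points in the same group carry no mutual dependences and may run in parallel. The domain size and the vector π = (7, 1), which the survey reports as optimal for its running example, are used here purely as sample inputs.

```python
import math

def hyperplane_schedule(N, pi):
    """Group iteration points (i, j) of an (N+1) x (N+1) domain into
    wavefronts: all points with equal floor(pi . p) share one time step."""
    fronts = {}
    for i in range(N + 1):
        for j in range(N + 1):
            t = math.floor(pi[0] * i + pi[1] * j)
            fronts.setdefault(t, []).append((i, j))
    return fronts

# Sequential time steps range from min(fronts) to max(fronts):
fronts = hyperplane_schedule(3, (7, 1))
total_time = max(fronts) - min(fronts) + 1
```

The total execution time of the transformed nest is the number of distinct wavefronts, which is what the paper's optimization over π seeks to minimize.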

153 | Optimizing Synchronous Systems - Leiserson, Saxe - 1983

69 | Regular iterative algorithms and their implementation on processor arrays - Rao, Kailath - 1988

Citation Context: ...cycle shrinking techniques described in [12, 13]. Technique 3: affine-by-statement scheduling. Node p of statement S_i is scheduled at time ⌊π_i·p + c_i⌋. This technique was first proposed in [16, 11, 15] and is the most powerful. We have shown that determining the optimal solution can still be cast into a linear optimization problem (the price to pay being an increase in the number of unknowns). Exam...

66 | Time optimal linear schedules for algorithms with uniform dependencies - Shang, Fortes - 1991

Citation Context: ...schedule possible. Its total execution time is T_free = 1 + max(σ_free(p), p ∈ Dom). Next we introduce linear schedules, which have been proposed for the execution of many practical algorithms (see [7, 19] and the references therein). Definition 4: A linear schedule is a mapping σ_π : Dom → Z defined by σ_π(p) = ⌊π·p⌋ for p ∈ Dom such that dependences are preserved. The vector π ∈ Q^{1×n} (π...
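The "dependences are preserved" condition in the definition above can be checked mechanically: a standard sufficient condition for a uniform dependence vector d is π·d ≥ 1, so the source iteration fires at least one time step before the sink. A minimal sketch (the dependence vectors below are hypothetical sample data, not taken from the paper):

```python
def is_valid_scheduling_vector(pi, deps):
    """Check the sufficient condition pi . d >= 1 for every uniform
    dependence vector d, so floor(pi . p) preserves all dependences."""
    return all(sum(a * b for a, b in zip(pi, d)) >= 1 for d in deps)

# Hypothetical dependence vectors of a 2-D uniform nest:
deps = [(0, 6), (1, -3), (1, 1)]
is_valid_scheduling_vector((7, 1), deps)   # each dot product is >= 1
is_valid_scheduling_vector((0, 1), deps)   # fails on d = (1, -3)
```

The paper's contribution is to search this space of valid π efficiently, not by enumeration; the check above only verifies a candidate.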

51 | Compiler optimizations for enhancing parallelism and their impact on architecture design - Polychronopoulos - 1988

Citation Context: ...icular when there are cycles in the dependence graph (as for Example 1). Many of these methods amount to looking at a particular class of linear scheduling vectors. Selective shrinking: this technique [13] consists in looking at the dependence matrix D = (d_1, ..., d_m) row by row. As soon as a row i is found whose m components d_{i,j} are all nonzero, the vector π = (1 / min(d_{i,j}, 1 ≤ j ≤ m)) e_i is selecte...

45 | On synthesizing systolic arrays from recurrence equations with linear dependencies - Rajopadhye, Purushothaman, et al. - 1986

Citation Context: ...cycle shrinking techniques described in [12, 13]. Technique 3: affine-by-statement scheduling. Node p of statement S_i is scheduled at time ⌊π_i·p + c_i⌋. This technique was first proposed in [16, 11, 15] and is the most powerful. We have shown that determining the optimal solution can still be cast into a linear optimization problem (the price to pay being an increase in the number of unknowns). Exam...

39 | Minimum Distance: A Method for Partitioning Recurrences for Multiprocessors - Peir, Cytron - 1987

Citation Context: ...d by edges corresponding to the dependence vectors. The computation domain is represented by the integer points contained within a convex polyhedron {x : Ax ≤ b}. Here is a well-known example taken from [12], which we will use throughout the text. Example 1: for i = 0 to N do, for j = 0 to N do begin {Statement S_1} a(i, j) = b(i, j - 6) + d(i - 1, j + 3); {Statement S_2} b(i + 1, j - 1)...
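To make the polyhedral domain {x : Ax ≤ b} concrete, here is a brute-force enumeration of its integer points for the square 0 ≤ i, j ≤ N (a didactic sketch; the paper itself works on the constraint system symbolically, without enumerating points, so that N can remain a parameter):

```python
def domain_points(A, b, box):
    """All integer points (i, j) in [0, box]^2 satisfying A x <= b
    componentwise, where A is a list of rows and b the bound vector."""
    pts = []
    for i in range(box + 1):
        for j in range(box + 1):
            if all(ai * i + aj * j <= bk for (ai, aj), bk in zip(A, b)):
                pts.append((i, j))
    return pts

# The (N+1) x (N+1) square domain 0 <= i, j <= N as {x : Ax <= b}:
N = 2
A = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # rows of A
b = [0, N, 0, N]                          # b = N * (0, 1, 0, 1)^T
```

The four rows encode -i ≤ 0, i ≤ N, -j ≤ 0, and j ≤ N respectively, matching the square domain of Example 1.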

27 | The systematic design of systolic arrays - Quinton - 1987

Citation Context: ...onsists in introducing a constant for each statement: the instance p of statement S_i is executed at time ⌊π·p + c_i⌋; we keep a unique scheduling vector but we align statements with one another (see [14]). Let us state this formally: let S_1, ..., S_k be the k statements of a uniform loop nest, and let: • D_i be the matrix representing the self-dependences for the statement S_i; • d_{i,j} be th...

26 | Optimal code parallelization using unimodular transformations - Dowling - 1990

Citation Context: ...three important techniques for scheduling uniform loop nests: 1. linear scheduling; 2. linear scheduling with constants; 3. affine-by-statement scheduling. These three scheduling classes are well known [4, 7, 10, 11, 12, 13, 16, 17, 19]. Their interest is twofold: 1. they fit well into a framework that encompasses many loop parallelization techniques; 2. they are very important from a practical point of view, as they preserve the re...

26 | On the Parallelism of Nested For-Loops Using Index Shift Method - Liu, Ho, et al. - 1990

Citation Context: ...i is executed at time ⌊π·p + c_i⌋. In this section, we study the impact of introducing such constants. Owing to this simple modification, we retrieve and improve the results of the Index Shift Method [10, 17] for cycle shrinking. 5.1 The Index Shift Method: The Index Shift Method (ISM) [10] is best explained using our target example. We have found that the optimal linear scheduling vector is π = (7, 1) a...

24 | An introduction to a formal theory of dependence analysis - Banerjee

Citation Context: ...nt. We restrict ourselves to the simplest case, that of uniform loop nests, which is already complex to compile efficiently. 2.1 Definition: We choose as a program model the model proposed by Banerjee [1], which is the following: for i_1 = l_1 to u_1 do, for i_2 = l_2(i_1) to u_2(i_1) do, for i_3 = l_3(i_1, i_2) to u_3(i_1, i_2) do, ..., for i_n = l_n(i_1, i_2, ..., i_{n-1}) to u_n(...

20 | Task Scheduling Over Distributed Memory Machines - Chretienne - 1988

Citation Context: ...e of dependence analysis techniques and of scheduling and allocation strategies. The general problem of finding the optimal scheduling for a task system on a parallel machine is known to be difficult [3]. However, in the case of a uniform loop ("uniform" meaning "with a finite number of dependence vectors" until further explained), it is possible to derive transformations that lead to a parallel sc...

13 | Résolution de systèmes d'inéquations linéaires; mode d'emploi du logiciel PIP - Feautrier, Tawbi - 1990

Citation Context: ...ed to the search on the domain {Ax ≤ b}, which can be done without knowing the parameter N, thus at compile time. For domains with many parameters, one can use a parametrized simplex algorithm like PIP [5]. Example 1: In our target example, the computation domain is a square given by A = (-1 0; 1 0; 0 -1; 0 1) and b = N · b_0 = N · (0, 1, 0, 1)^T. The depende...

9 | Revisiting cycle shrinking - Robert, Song - 1994

Citation Context: ...i is executed at time ⌊π·p + c_i⌋. In this section, we study the impact of introducing such constants. Owing to this simple modification, we retrieve and improve the results of the Index Shift Method [10, 17] for cycle shrinking. 5.1 The Index Shift Method: The Index Shift Method (ISM) [10] is best explained using our target example. We have found that the optimal linear scheduling vector is π = (7, 1) a...

8 | Scheduling affine parameterized recurrences by means of variable dependent timing functions - Mauras, Quinton, Rajopadhye, Saouter - 1990

Citation Context: ...cycle shrinking techniques described in [12, 13]. Technique 3: affine-by-statement scheduling. Node p of statement S_i is scheduled at time ⌊π_i·p + c_i⌋. This technique was first proposed in [16, 11, 15] and is the most powerful. We have shown that determining the optimal solution can still be cast into a linear optimization problem (the price to pay being an increase in the number of unknowns). Exam...

2 | Linear programming methods for minimizing execution time of indexed computations - Lisper - 1990

Citation Context: ...]) propose a complex method to obtain the optimal scheduling vector of problem (), based upon a partition of the solution space into subcones on which they solve a linear fractional problem. Lisper ([9]) proposes an approach in which all pairs of extremal points of the domain are enumerated before solving a linear program for each of them. Here, we propose a more efficient method, which consists in...