## Loop Parallelization in the Polytope Model (1993)

Venue: CONCUR '93, Lecture Notes in Computer Science 715

Citations: 100 (26 self)

### BibTeX

@INPROCEEDINGS{Lengauer93loopparallelization,

author = {Christian Lengauer},

title = {Loop Parallelization in the Polytope Model},

booktitle = {CONCUR '93, Lecture Notes in Computer Science 715},

year = {1993},

pages = {398--416},

publisher = {Springer-Verlag}

}

### Abstract

During the course of the last decade, a mathematical model for the parallelization of FOR-loops has become increasingly popular. In this model, a (perfect) nest of r FOR-loops is represented by a convex polytope in $\mathbb{Z}^r$. The boundaries of each loop specify the extent of the polytope in a distinct dimension. Various ways of slicing and segmenting the polytope yield a multitude of guaranteed correct mappings of the loops' operations in space-time. These transformations have a very intuitive interpretation and can be easily quantified and automated due to their mathematical foundation in linear programming and linear algebra. With the recent availability of massively parallel computers, the idea of loop parallelization is gaining significance, since it promises execution speed-ups of orders of magnitude. The polytope model for loop parallelization has its origin in systolic design, but it applies in more general settings and methods based on it will become a part of futur...
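The idea in the abstract can be illustrated with a minimal Python sketch. It assumes a hypothetical square index space 0 ≤ i, j ≤ n and the space-time matrix with rows (1 0) and (1 1) that is quoted in the citation contexts further down this page. The function names are illustrative and not taken from the paper, the target bounds were derived by hand for this one example, and dependence checking (which a real parallelizer needs in order to validate a schedule) is omitted.

```python
from itertools import product

# Assumed space-time matrix: row 0 gives the time step, row 1 the processor.
T = ((1, 0), (1, 1))

def source_points(n):
    """Index space of a hypothetical perfect 2-deep nest: 0 <= i, j <= n."""
    return [(i, j) for i, j in product(range(n + 1), repeat=2)]

def space_time(point, mapping=T):
    """Apply the linear mapping to a source point, giving (time, place)."""
    i, j = point
    t = mapping[0][0] * i + mapping[0][1] * j
    p = mapping[1][0] * i + mapping[1][1] * j
    return (t, p)

def target_points(n):
    """Scan the target polytope: t is sequential, p is parallel in [t, t+n]."""
    return [(t, p) for t in range(n + 1) for p in range(t, t + n + 1)]

def schedule(n):
    """Group source points by time step; each group could run in parallel."""
    steps = {}
    for point in source_points(n):
        t, p = space_time(point)
        steps.setdefault(t, []).append((p, point))
    return steps
```

For this mapping the transformed loop nest visits exactly the images of the source points, and every time step holds n + 1 operations on distinct processors.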

### Citations

457 |
Optimizing supercompilers for supercomputers
- Wolfe
- 1989
Citation Context: ...pace-time mapping the identity, an example of the trivial case in which the space-time mapping is simply a permutation of the source coordinates and the source loop nest can be parallelized directly [56, 60]. To make our illustration more interesting, we choose a different allocation: $\begin{pmatrix} 1 & 1 \end{pmatrix}$. It is not processor-minimal. Figure 2 depicts the according segmentations of the index space. The schedule s...

445 |
Supercompilers for Parallel and Vector Computers
- Zima
- 1991
Citation Context: ...pace-time mapping the identity, an example of the trivial case in which the space-time mapping is simply a permutation of the source coordinates and the source loop nest can be parallelized directly [56, 60]. To make our illustration more interesting, we choose a different allocation: $\begin{pmatrix} 1 & 1 \end{pmatrix}$. It is not processor-minimal. Figure 2 depicts the according segmentations of the index space. The schedule s...

395 |
Loop transformation theory and an algorithm to maximize parallelism
- Wolf, Lam
- 1991
Citation Context: ...polytope $(AT^{-1}, b)$ to an equivalent one, $(A', b')$, whose defining inequations refer only to the indices of enclosing loops. Several algorithms for calculating $A'$ and $b'$ have been proposed [1, 16, 44, 54]; most of them are based on Fourier-Motzkin elimination [47], a technique by which one eliminates variables from the inequalities. Geometrically, this projects the polytope on the different axes of th...

236 | Dataflow analysis of array and scalar references
- Feautrier
- 1991
Citation Context: ...ed-up. The concepts of the polytope model are more explicit in single-assignment programs than in imperative programs; imperative nested loop programs can be transformed to single-assignment programs [5, 17]. In a single-assignment format, algorithms that are now well understood in the polytope model can be described by a set of recurrence equations, each of the form: ($\forall x \in IS$ : $v[f(x)] = F_v(w[g(x$ ...

229 |
The parallel execution of DO loops
- Lamport
- 1974
Citation Context: ...id in the Sixties by the seminal paper of Karp/Miller/Winograd on uniform recurrences [21]. In the Seventies, Lamport was the first to apply this approach to the question of parallelizing compilation [24]. However, only in the early Eighties, after the birth of the systolic array, was the idea picked up and developed further. Significant contributions were made by Kuhn [22], Moldovan [32], Cappello/S...

210 | Scanning polyhedra with DO loops
- Ancourt, Irigoin
- 1991
Citation Context: ...polytope $(AT^{-1}, b)$ to an equivalent one, $(A', b')$, whose defining inequations refer only to the indices of enclosing loops. Several algorithms for calculating $A'$ and $b'$ have been proposed [1, 16, 44, 54]; most of them are based on Fourier-Motzkin elimination [47], a technique by which one eliminates variables from the inequalities. Geometrically, this projects the polytope on the different axes of th...

187 | Parametric integer programming
- Feautrier
- 1988
Citation Context: ...tensively [19]. PAF. This is the automatic FORTRAN parallelizer of Paul (A.) Feautrier [16]. The system converts nested DO-loops to single-assignment form [17] and uses parametric integer programming [15] to find a time-minimal, not necessarily affine, shared-memory parallel schedule. The source loops need not be perfectly nested. Presage. The novelty of Presage [53] was that it dealt with violations ...

172 |
The Organization of Computations for Uniform Recurrence Equations
- Karp, Miller, et al.
- 1967
Citation Context: ... respect to the choices possible in the model. The basis for an automatic synthesis with the polytope model was laid in the Sixties by the seminal paper of Karp/Miller/Winograd on uniform recurrences [21]. In the Seventies, Lamport was the first to apply this approach to the question of parallelizing compilation [24]. However, only in the early Eighties, after the birth of the systolic array, was the ...

137 |
Theory of Linear and
- Schrijver
- 1986
Citation Context: ...ning inequations refer only to the indices of enclosing loops. Several algorithms for calculating $A'$ and $b'$ have been proposed [1, 16, 44, 54]; most of them are based on Fourier-Motzkin elimination [47], a technique by which one eliminates variables from the inequalities. Geometrically, this projects the polytope on the different axes of the target coordinate system to obtain the bounds in each dime...
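The elimination step mentioned in this context can be sketched in Python as follows. This is a textbook-style illustration under stated assumptions, not the algorithm of any of the cited systems: inequalities are stored as hypothetical `(coefficients, bound)` pairs meaning `sum(a[i] * x[i]) <= b`, redundant rows are not pruned, and the helper names are made up for this example.

```python
from fractions import Fraction

def eliminate(rows, k):
    """Fourier-Motzkin step: remove variable k from rows of the form a.x <= b."""
    pos = [r for r in rows if r[0][k] > 0]
    neg = [r for r in rows if r[0][k] < 0]
    out = [r for r in rows if r[0][k] == 0]
    for a_pos, b_pos in pos:
        for a_neg, b_neg in neg:
            # Positive multipliers chosen so that the x[k] terms cancel.
            c_pos, c_neg = -a_neg[k], a_pos[k]
            a = [c_pos * u + c_neg * v for u, v in zip(a_pos, a_neg)]
            out.append((a, c_pos * b_pos + c_neg * b_neg))
    return out

def bounds(rows, k):
    """Lower and upper bound of x[k], assuming it is the only variable left."""
    lo, hi = None, None
    for a, b in rows:
        if a[k] > 0:
            ub = Fraction(b, a[k])
            hi = ub if hi is None else min(hi, ub)
        elif a[k] < 0:
            lb = Fraction(b, a[k])
            lo = lb if lo is None else max(lo, lb)
    return lo, hi
```

As a usage example, take variables (t, p) with 0 ≤ t ≤ 3 and t ≤ p ≤ t + 3, written as `rows = [([-1, 0], 0), ([1, 0], 3), ([1, -1], 0), ([-1, 1], 3)]`. Calling `bounds(eliminate(rows, 0), 1)` projects the polytope onto the p axis and yields the range 0 to 6.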

130 | A singular loop transformation framework based on non-singular matrices
- Li, Pingali
- 1994
Citation Context: ...in the target polytope. Holes that are "inside" the polytope (see Figure 6) can be bridged by non-unit loop steps but holes at the boundaries require further case analyses in the boundary computation [2, 3, 30]. The aim of parallelizing compilation is to avoid the run-time case analyses by making the respective decisions before run time. One idea that researchers are working on at present is to segment the ...

109 |
The Data Alignment Phase in Compiling Programs for Distributed-Memory Machines
- Chen
- 1991
Citation Context: ...he number of processors in a dimension (partitioning, e.g., [6, 13, 33, 49]). A different reason for reducing the number of processors is to reduce the number of communications (data alignment, e.g., [20, 29, 41, 45, 60]). Other constraints recently considered are on the cache size and bus or communication bandwidth. More general communication patterns. Until recently, distributed computers offered local point-to-poi...

89 | Compile-time techniques for data distribution in distributed memory machines
- Ramanujam, Sadayappan
- 1991
Citation Context: ...he number of processors in a dimension (partitioning, e.g., [6, 13, 33, 49]). A different reason for reducing the number of processors is to reduce the number of communications (data alignment, e.g., [20, 29, 41, 45, 60]). Other constraints recently considered are on the cache size and bus or communication bandwidth. More general communication patterns. Until recently, distributed computers offered local point-to-poi...

81 |
Automatic synthesis of systolic arrays from uniform recurrent equations
- Quinton
- 1984
Citation Context: ... birth of the systolic array, was the idea picked up and developed further. Significant contributions were made by Kuhn [22], Moldovan [32], Cappello/Steiglitz [8], Miranker/Winkler [31] and Quinton [36]. The dissertation of Rao in 1985 unified these results in a theory for the automatic synthesis of all systolic arrays [42, 43]. The polytope model is increasingly being recognized as useful for paral...

77 |
Regular iterative algorithms and their implementations on processor arrays
- Rao
- 1985
Citation Context: ...], Moldovan [32], Cappello/Steiglitz [8], Miranker/Winkler [31] and Quinton [36]. The dissertation of Rao in 1985 unified these results in a theory for the automatic synthesis of all systolic arrays [42, 43]. The polytope model is increasingly being recognized as useful for parallelizing loop programs for massively parallel architectures. Note that the parallelization methods based on this model are stat...

69 | The mapping of linear recurrence equations on regular arrays
- Quinton, Dongen
- 1989
Citation Context: ...olytope are point-to-point ($x$) and local ($d$). 2. Affine recurrence equations. Here, the index functions are of the more general form $Ax + d$, where $A$ is a constant matrix and $d$ is a constant vector [38, 39]. This permits data sharing (if $A$ is singular, i.e., defines a non-injective mapping). 3. Piecewise linear/regular algorithms. Here, one permits the index domain to be not convex but partitioned into ...

48 |
On the design of algorithms for VLSI systolic arrays
- Moldovan
- 1983
Citation Context: ... compilation [24]. However, only in the early Eighties, after the birth of the systolic array, was the idea picked up and developed further. Significant contributions were made by Kuhn [22], Moldovan [32], Cappello/Steiglitz [8], Miranker/Winkler [31] and Quinton [36]. The dissertation of Rao in 1985 unified these results in a theory for the automatic synthesis of all systolic arrays [42, 43]. The po...

47 |
On synthesizing systolic arrays from recurrence equations with linear dependencies
- Rajopadhye, Purushothaman, et al.
- 1986
Citation Context: ...olytope are point-to-point ($x$) and local ($d$). 2. Affine recurrence equations. Here, the index functions are of the more general form $Ax + d$, where $A$ is a constant matrix and $d$ is a constant vector [38, 39]. This permits data sharing (if $A$ is singular, i.e., defines a non-injective mapping). 3. Piecewise linear/regular algorithms. Here, one permits the index domain to be not convex but partitioned into ...

46 |
Algorithms for VLSI Processor Arrays
- Kung, Leiserson
- 1980
Citation Context: ...es an overview and future perspective of the polytope model and methods based on it. 1 Introduction. Fifteen years ago, a first, restrictive form of massive parallelism received a name: systolic array [23]. The restrictions, motivated by the desire to simplify and parametrize the process of designing logic circuitry and borne out by the limitations of hardware technology at the time, included a regular...

41 |
Communication-free hyperplane partitioning of nested loops
- Huang, Sadayappan
- 1991
Citation Context: ...he number of processors in a dimension (partitioning, e.g., [6, 13, 33, 49]). A different reason for reducing the number of processors is to reduce the number of communications (data alignment, e.g., [20, 29, 41, 45, 60]). Other constraints recently considered are on the cache size and bus or communication bandwidth. More general communication patterns. Until recently, distributed computers offered local point-to-poi...

40 |
Crystal: Theory and Pragmatics of Generating Efficient Parallel Code
- Chen, Choo, et al.
- 1991
Citation Context: ...niversity started out in the mid-Eighties with a strong orientation towards the polytope model [10], but has since evolved away from it [11]. Now, Crystal is based on a more general equational theory [9]. This permits the treatment of more general source programs at the expense of automation. HiFi. HiFi is being developed at Delft University of Technology. It is an environment for real-world applicat...

39 |
Systolic Algorithms and Architectures
- Quinton, Robert
- 1991
Citation Context: ...or in the middle is active). All elements in the figure, i.e., the location of the processors and data and the propagation direction and speed of the data are derivable from the space-time mapping [37]. Note that we have to be judicious in the choice of the dependences that we introduce in a localization; it influences the amount of potential parallelism that is retained. E.g., when A is pipelined ...

37 |
The ALPHA language and its use for the design of systolic arrays
- Le Verge, Mauras, et al.
- 1991
Citation Context: ... elements, the target polytope is only the convex hull of the target space: $(AT^{-1}, b) \supseteq T(IS)$. Some parallelization methods based on the polytope model exclude non-unimodular transformations [26, 44], but this is overly restrictive. There are ways of treating non-unimodularity (see Section 3.2). Consider our space-time mapping and its inverse: $T = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}$, $T^{-1} = \begin{pmatrix} 1 & 0 \\ -1 & 1 \end{pmatrix}$. In the coordina...

32 |
Synthesizing Linear Array Algorithms from Nested for Loop Algorithms
- Lee, Kedem
- 1988
Citation Context: ...$\neq a(x')$). This is the full-dimensional case, in which space takes up $r-1$ dimensions and time the remaining one dimension of the target polytope. One can also trade dimensions from space to time [27] and reduce multi-dimensional time to one dimension [42], ending up with fewer dimensions on the target than on the source side. A full-dimensional solution offers a maximum speed-up. Most paralleliza...

30 |
Compiling parallel programs by optimizing performance
- Chen, Choo, et al.
- 1988
Citation Context: ...the design of Alpha [25]. Crystal. The Crystal project at Yale University started out in the mid-Eighties with a strong orientation towards the polytope model [10], but has since evolved away from it [11]. Now, Crystal is based on a more general equational theory [9]. This permits the treatment of more general source programs at the expense of automation. HiFi. HiFi is being developed at Delft Univers...

28 | Regular partitioning for synthesizing fixed-size systolic arrays
- Darte
Citation Context: ... The first resource constraints studied were limitations of the dimensionality of the processor layout (projection, e.g., [27, 43, 57]) or the number of processors in a dimension (partitioning, e.g., [6, 13, 33, 49]). A different reason for reducing the number of processors is to reduce the number of communications (data alignment, e.g., [20, 29, 41, 45, 60]). Other constraints recently considered are on the cac...

23 |
Partitioning of processor arrays: a piecewise regular approach
- Teich, Thiele
- 1993
Citation Context: ... The first resource constraints studied were limitations of the dimensionality of the processor layout (projection, e.g., [27, 43, 57]) or the number of processors in a dimension (partitioning, e.g., [6, 13, 33, 49]). A different reason for reducing the number of processors is to reduce the number of communications (data alignment, e.g., [20, 29, 41, 45, 60]). Other constraints recently considered are on the cac...

19 |
Spacetime representations of computational structures
- Miranker, Winkler
- 1984
Citation Context: ...ghties, after the birth of the systolic array, was the idea picked up and developed further. Significant contributions were made by Kuhn [22], Moldovan [32], Cappello/Steiglitz [8], Miranker/Winkler [31] and Quinton [36]. The dissertation of Rao in 1985 unified these results in a theory for the automatic synthesis of all systolic arrays [42, 43]. The polytope model is increasingly being recognized as...

18 |
A design methodology for synthesizing parallel algorithms and architectures
- Chen
- 1986
Citation Context: ...s have received special consideration in the design of Alpha [25]. Crystal. The Crystal project at Yale University started out in the mid-Eighties with a strong orientation towards the polytope model [10], but has since evolved away from it [11]. Now, Crystal is based on a more general equational theory [9]. This permits the treatment of more general source programs at the expense of automation. HiFi....

17 |
Automatic generation of systolic programs from nested loops
- Ribas
- 1990
Citation Context: ... elements, the target polytope is only the convex hull of the target space: $(AT^{-1}, b) \supseteq T(IS)$. Some parallelization methods based on the polytope model exclude non-unimodular transformations [26, 44], but this is overly restrictive. There are ways of treating non-unimodularity (see Section 3.2). Consider our space-time mapping and its inverse: $T = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}$, $T^{-1} = \begin{pmatrix} 1 & 0 \\ -1 & 1 \end{pmatrix}$. In the coordina...

16 |
Unifying VLSI Array Design with Linear Transformations of Space-Time
- Cappello, Steiglitz
- 1984
Citation Context: ..., only in the early Eighties, after the birth of the systolic array, was the idea picked up and developed further. Significant contributions were made by Kuhn [22], Moldovan [32], Cappello/Steiglitz [8], Miranker/Winkler [31] and Quinton [36]. The dissertation of Rao in 1985 unified these results in a theory for the automatic synthesis of all systolic arrays [42, 43]. The polytope model is increasin...

16 |
Calculus of space-optimal mappings of systolic algorithms on processor arrays
- Clauss, Mongenet, et al.
- 1992
Citation Context: ...ta sharing (if $A$ is singular, i.e., defines a non-injective mapping). 3. Piecewise linear/regular algorithms. Here, one permits the index domain to be not convex but partitioned into convex polytopes [12, 46]. This permits a sequence of perfect loop nests instead of just one perfect loop nest. 4. Linearly bounded lattices. Here, one does not embed into $\mathbb{Z}^r$ but instead into an integral affine transformatio...

13 |
Optimization and Interconnection Complexity for: Parallel Processors, Single-Stage Networks and Decision Trees
- Kuhn
- 1980
Citation Context: ...f parallelizing compilation [24]. However, only in the early Eighties, after the birth of the systolic array, was the idea picked up and developed further. Significant contributions were made by Kuhn [22], Moldovan [32], Cappello/Steiglitz [8], Miranker/Winkler [31] and Quinton [36]. The dissertation of Rao in 1985 unified these results in a theory for the automatic synthesis of all systolic arrays [...

12 |
HIFI: From Parallel Algorithm to Fixed-Size VLSI
- Held, Dewilde, et al.
- 1993
Citation Context: ...1]. The application domain for which the system lends most support at present covers models and methods for the design of regular architectures. In this domain, the polytope model is used extensively [19]. PAF. This is the automatic FORTRAN parallelizer of Paul (A.) Feautrier [16]. The system converts nested DO-loops to single-assignment form [17] and uses parametric integer programming [15] to find a...

12 |
Reduction operators in Alpha
- Le Verge
- 1992
Citation Context: ... the variable is updated, prevent inconsistencies. To achieve this, one can pipeline shared variable accesses, which results in additional, artificial dependences. This process is called localization [25] or, if all shared variables of the program are being pipelined, also uniformization, since then the resulting set of recurrence equations is uniform [38]. We uniformize the recurrence equations for...

12 |
On the localization of algorithms for VLSI processor arrays
- Roychowdhury, Rao, et al.
- 1988
Citation Context: ...ta sharing (if $A$ is singular, i.e., defines a non-injective mapping). 3. Piecewise linear/regular algorithms. Here, one permits the index domain to be not convex but partitioned into convex polytopes [12, 46]. This permits a sequence of perfect loop nests instead of just one perfect loop nest. 4. Linearly bounded lattices. Here, one does not embed into $\mathbb{Z}^r$ but instead into an integral affine transformatio...

11 |
Partitioning and Mapping Algorithms into Fixed-Size Systolic Arrays
- Moldovan, Fortes
- 1986
Citation Context: ... The first resource constraints studied were limitations of the dimensionality of the processor layout (projection, e.g., [27, 43, 57]) or the number of processors in a dimension (partitioning, e.g., [6, 13, 33, 49]). A different reason for reducing the number of processors is to reduce the number of communications (data alignment, e.g., [20, 29, 41, 45, 60]). Other constraints recently considered are on the cac...

9 | Revisiting cycle shrinking
- Robert, Song
- 1994

8 |
Systematic design of regular VLSI processor arrays, Ph.D. Dissertation, Delft University of Technology
- Bu
- 1990
Citation Context: ...ed-up. The concepts of the polytope model are more explicit in single-assignment programs than in imperative programs; imperative nested loop programs can be transformed to single-assignment programs [5, 17]. In a single-assignment format, algorithms that are now well understood in the polytope model can be described by a set of recurrence equations, each of the form: ($\forall x \in IS$ : $v[f(x)] = F_v(w[g(x$ ...

8 |
Multiprocessor Synchronization for Concurrent Loops
- Wolfe
- 1988
Citation Context: ... and par for the temporal and spatial loop, respectively. Note that the processor layout must be determined at every time step, and so must the termination of all processors (barrier synchronization [55]): $\mathbf{seq}\ t = 0 \leftarrow 1 \rightarrow n$; $\mathbf{par}\ p = t \leftarrow 1 \rightarrow t+n$; $(t,\ p-t)$. The loop statement specifies the points of the source polytope but in the coordinates of the target polytope. This correspondence is define...

8 | The synthesis of control signals for one-dimensional systolic arrays
- Xue, Lengauer
- 1992
Citation Context: ...s of control signals that specify the operation. This way, the run-time tests reduce to tests for equality with a constant value. There are automatic methods for the synthesis of such control signals [48, 58, 59]. 3.3 Model Extensions. It seems natural to include parallelization methods based on the polytope model into compilers for massively parallel computers like the Connection Machine, the Maspar or tra...

7 | A processor-time-minimal systolic array for cubical mesh algorithms
- Cappello
- 1992
Citation Context: ...affinity. The free schedule is often piecewise affine and shorter than the minimal affine schedule. Piecewise affinity is also used in allocations to fold processor layouts for better processor usage [7, 12]. Localization. In a distributed-memory implementation, one might prefer to avoid multiple copies of a variable, e.g., to reduce storage requirements or, if the variable is updated, prevent inconsiste...

7 |
Optimal systolic implementations of n-dimensional recurrences
- Wong, Delosme
- 1985
Citation Context: ...aints. There is also work on the imposition of realistic resource constraints. The first resource constraints studied were limitations of the dimensionality of the processor layout (projection, e.g., [27, 43, 57]) or the number of processors in a dimension (partitioning, e.g., [6, 13, 33, 49]). A different reason for reducing the number of processors is to reduce the number of communications (data alignment, ...

6 |
Unimodularity and the parallelization of loops
- Barnett, Lengauer
- 1992
Citation Context: ...in the target polytope. Holes that are "inside" the polytope (see Figure 6) can be bridged by non-unit loop steps but holes at the boundaries require further case analyses in the boundary computation [2, 3, 30]. The aim of parallelizing compilation is to avoid the run-time case analyses by making the respective decisions before run time. One idea that researchers are working on at present is to segment the ...

5 |
Control generation in the design of processor arrays
- Teich, Thiele
- 1991
Citation Context: ...s of control signals that specify the operation. This way, the run-time tests reduce to tests for equality with a constant value. There are automatic methods for the synthesis of such control signals [48, 58, 59]. 3.3 Model Extensions. It seems natural to include parallelization methods based on the polytope model into compilers for massively parallel computers like the Connection Machine, the Maspar or tra...

5 |
Specifying control signals for systolic arrays by uniform recurrence equations
- Xue
- 1992
Citation Context: ...s of control signals that specify the operation. This way, the run-time tests reduce to tests for equality with a constant value. There are automatic methods for the synthesis of such control signals [48, 58, 59]. 3.3 Model Extensions. It seems natural to include parallelization methods based on the polytope model into compilers for massively parallel computers like the Connection Machine, the Maspar or tra...

4 |
Processor clustering for the design of optimal fixed-size systolic arrays
- Bu, Deprettere
- 1992

4 |
Semantical analysis and mathematical programming
- Feautrier
- 1989
Citation Context: ...polytope $(AT^{-1}, b)$ to an equivalent one, $(A', b')$, whose defining inequations refer only to the indices of enclosing loops. Several algorithms for calculating $A'$ and $b'$ have been proposed [1, 16, 44, 54]; most of them are based on Fourier-Motzkin elimination [47], a technique by which one eliminates variables from the inequalities. Geometrically, this projects the polytope on the different axes of th...

4 |
Parallel assignment, reduction and communication
- Rajopadhye, Muddarangegowda
- 1993
Citation Context: ...tion of the source program in order to achieve an efficient mapping onto the architecture. New target language features are needed that can express a variety of communication patterns directly (e.g., [40]). The polytope model needs to be extended to accommodate methods for a compilation of these patterns. 4 Conclusions. There are many different models for parallel computation: Petri nets, process algeb...

4 |
CAD for signal processing architectures
- Thiele
- 1992
Citation Context: ...fect loop nests instead of just one perfect loop nest. 4. Linearly bounded lattices. Here, one does not embed into $\mathbb{Z}^r$ but instead into an integral affine transformation of it that need not be convex [50]. The intersection of a polytope with an integer lattice is one special case. This enables the imposition of resource limitations, i.e., a partitioning of the parallel solution. 2 An Illustration. We p...

4 |
PRESAGE: A tool for the parallelization of nested loop programs
- Dongen, Petit
- 1990
Citation Context: ...ses parametric integer programming [15] to find a time-minimal, not necessarily affine, shared-memory parallel schedule. The source loops need not be perfectly nested. Presage. The novelty of Presage [53] was that it dealt with violations of affinity in the polytope model, in particular with quasi-affinity [52]. A quasi-affine mapping is obtained by taking an affine mapping and coercing rational to i...