Tiling Multidimensional Iteration Spaces for Multicomputers
1992
Abstract

Cited by 109 (21 self)
This paper addresses the problem of compiling perfectly nested loops for multicomputers (distributed-memory machines). The relatively high communication startup costs in these machines render frequent communication very expensive. Motivated by this, we present a method of aggregating a number of loop iterations into tiles, where the tiles execute atomically: a processor executing the iterations belonging to a tile receives all the data it needs before executing any one of the iterations in the tile, executes all the iterations in the tile, and then sends the data needed by other processors. Since synchronization is not allowed during the execution of a tile, partitioning the iteration space into tiles must not result in deadlock. We first show the equivalence between the problem of finding partitions and the problem of determining the cone for a given set of dependence vectors. We then present an approach to partitioning the iteration space into deadlock-free tiles so that communicati...
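To illustrate the deadlock-freedom requirement, the sketch below uses the commonly cited sufficient condition for atomic tiles bounded by hyperplanes: every tile-boundary normal h must satisfy h · d ≥ 0 for every dependence vector d, so no dependence crosses a boundary backward and tiles cannot wait on each other in a cycle. This is not presented as the paper's algorithm; the function name `tiling_is_legal` and the example dependences are assumptions for illustration.

```python
# Hypothetical sketch: test the standard sufficient condition for legal
# (deadlock-free) atomic tiling. A tiling is described by the normals of its
# tile-boundary hyperplanes; every dependence must have a nonnegative
# component along every normal, so tiles can be executed in some order.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def tiling_is_legal(normals, deps):
    """True if h . d >= 0 for every boundary normal h and dependence d."""
    return all(dot(h, d) >= 0 for h in normals for d in deps)

# Uniform dependences of an assumed two-deep loop nest.
deps = [(1, 0), (0, 1), (1, -1)]

# Rectangular tiles (normals along the axes): (1, -1) points backward in j.
rectangular = [(1, 0), (0, 1)]
# Skewed tiles with normals (1, 0) and (1, 1) keep every dependence forward.
skewed = [(1, 0), (1, 1)]

print(tiling_is_legal(rectangular, deps))  # False
print(tiling_is_legal(skewed, deps))       # True
```

Geometrically, the condition asks every normal to lie in the dual of the cone generated by the dependence vectors, which is one way to read the abstract's link between partitions and cones.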
Generation of Efficient Nested Loops from Polyhedra
International Journal of Parallel Programming, 2000
Abstract

Cited by 89 (5 self)
Automatic parallelization in the polyhedral model is based on affine transformations from an original computation domain (iteration space) to a target space-time domain, often with a different transformation for each variable. Code generation is an often-ignored step in this process that has a significant impact on the quality of the final code. It involves a tradeoff between code size and control-code simplification/optimization. Previous code-generation methods are based on loop splitting; however, they behave non-optimally on parameterized programs. We present a general parameterized method for code generation based on the dual representation of polyhedra. Our algorithm uses a simple recursion on the dimensions of the domains and enables fine control over the tradeoff between code size and control overhead.
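The idea of scanning a polyhedron with one loop per dimension, where each loop's bounds depend only on the outer indices, can be illustrated with a much simpler classical scheme: project the polyhedron with Fourier-Motzkin elimination, then recurse on the dimensions. This is a swapped-in technique, not the paper's algorithm: it does not use the dual representation, handles no symbolic parameters, and makes no code-size/control-overhead tradeoff. The constraint encoding and the names `eliminate_last` and `scan` are assumptions for this sketch.

```python
from fractions import Fraction
from math import ceil, floor

# Encoding (an assumption for this sketch): a constraint (a, b) means
# a[0]*x[0] + ... + a[n-1]*x[n-1] <= b, with integer coefficients.

def eliminate_last(constraints, n):
    """Fourier-Motzkin step: project x[n-1] out of a system over x[0..n-1]."""
    upper, lower, rest = [], [], []
    for a, b in constraints:
        c = a[n - 1]
        if c > 0:
            upper.append((a, b))          # gives an upper bound on x[n-1]
        elif c < 0:
            lower.append((a, b))          # gives a lower bound on x[n-1]
        else:
            rest.append((a[:n - 1], b))   # does not mention x[n-1]
    for au, bu in upper:
        for al, bl in lower:
            cu, cl = au[n - 1], -al[n - 1]  # both positive
            a = [cl * au[k] + cu * al[k] for k in range(n - 1)]
            rest.append((a, cl * bu + cu * bl))
    return rest

def scan(constraints, n):
    """Yield the integer points of the polyhedron in lexicographic order,
    using one loop per dimension whose bounds depend only on outer indices."""
    levels = [None] * n                   # levels[d]: system over x[0..d]
    levels[n - 1] = constraints
    for d in range(n - 1, 0, -1):
        levels[d - 1] = eliminate_last(levels[d], d + 1)

    def loop(prefix):
        d = len(prefix)
        if d == n:
            yield tuple(prefix)
            return
        lo = hi = None
        for a, b in levels[d]:
            c = a[d]
            rhs = Fraction(b - sum(a[k] * prefix[k] for k in range(d)))
            if c > 0:
                ub = floor(rhs / c)
                hi = ub if hi is None else min(hi, ub)
            elif c < 0:
                lb = ceil(rhs / c)
                lo = lb if lo is None else max(lo, lb)
            elif rhs < 0:
                return  # a constraint on the outer indices alone is violated
        if lo is None or hi is None:
            raise ValueError(f"dimension {d} is unbounded")
        for v in range(lo, hi + 1):
            yield from loop(prefix + [v])

    yield from loop([])

# 0 <= i <= 3 and 0 <= j <= i: the inner bounds depend on the outer index i.
triangle = [([-1, 0], 0), ([1, 0], 3), ([0, -1], 0), ([-1, 1], 0)]
print(list(scan(triangle, 2)))
```

For this triangle the inner loop runs from 0 to i, yielding 10 points from (0, 0) to (3, 3). Rational projection can also produce redundant bounds, which a real code generator would simplify away.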
A Framework for Unifying Reordering Transformations
1993
Abstract

Cited by 76 (10 self)
We present a framework for unifying iteration reordering transformations such as loop interchange, loop distribution, skewing, tiling, index set splitting and statement reordering. The framework is based on the idea that a transformation can be represented as a schedule that maps the original iteration space to a new iteration space. The framework is designed to provide a uniform way to represent and reason about transformations. As part of the framework, we provide algorithms to assist in the building and use of schedules. In particular, we provide algorithms to test the legality of schedules, to align schedules and to generate optimized code for schedules. This work is supported by an NSF PYI grant CCR-9157384 and by a Packard Fellowship.

1 Introduction
Optimizing compilers reorder iterations of statements to improve instruction scheduling, register use, and cache utilization, and to expose parallelism. Many different reordering transformations have been developed and studied, su...
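To make the legality test concrete, the sketch below represents a schedule as a Python function from an iteration to a tuple and checks that every dependence sink is ordered lexicographically after its source. The framework's own schedule notation and algorithms differ; `schedule_is_legal`, the 4 × 4 domain, and the dependence distance (1, -1) are assumptions for illustration.

```python
from itertools import product

def schedule_is_legal(schedule, deps, domain):
    """Legal if, for every dependence src -> src + d inside the domain,
    the sink is scheduled lexicographically after the source."""
    points = set(domain)
    for src in points:
        for d in deps:
            dst = tuple(s + x for s, x in zip(src, d))
            # Python compares tuples lexicographically.
            if dst in points and not schedule(src) < schedule(dst):
                return False
    return True

N = 4
domain = list(product(range(N), range(N)))  # iterations (i, j) of a 2-deep nest

def identity(p):
    return p

def interchange(p):
    return (p[1], p[0])             # loop interchange

def skew_interchange(p):
    return (p[0] + p[1], p[0])      # skew j by i, then interchange

deps = [(1, -1)]  # assumed dependence: (i, j) -> (i + 1, j - 1)
print(schedule_is_legal(identity, deps, domain))          # True
print(schedule_is_legal(interchange, deps, domain))       # False
print(schedule_is_legal(skew_interchange, deps, domain))  # True
```

Plain interchange reverses the dependence, but composing it with a skew makes it legal, which is the kind of composition a uniform schedule representation makes easy to reason about.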
(Pen)ultimate tiling?
Integration, the VLSI Journal, 1996
Abstract

Cited by 53 (12 self)
In the framework of perfect loop nests with uniform dependences, tiling is a technique used to group elemental computation points so as to increase computation granularity and to reduce the overhead due to communication time. We review existing approaches from the literature, together with the optimization criteria that are used for determining a "good" or "optimal" tiling. Then we explain the need to introduce yet another criterion for defining "optimal tiling" in a scalable environment. Although our criterion is more complex than previously used ones, we are able to prove a theorem on optimality and to provide a constructive method for defining the "optimal tiling".

1 Introduction
1.1 Tiling: motivation
Tiling is a technique used to group elemental computation points so as to increase computation granularity and thereby to reduce communication time. This technique is restricted to perfect loop nests with uniform dependences, which we define as in Banerjee [1] (see also the example...
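The granularity argument can be made concrete for the simplest case: square t × t tiles and the uniform dependences (1, 0) and (0, 1). The brute-force sketch below counts, for one interior tile, the points it computes and the dependence edges that leave it, so work grows as t² while boundary traffic grows as 2t. The dependence set and helper names are assumptions, and this does not implement the paper's optimality criterion.

```python
from itertools import product

def tile_of(point, sizes):
    """Index of the rectangular tile containing a point."""
    return tuple(x // s for x, s in zip(point, sizes))

def tile_stats(tile, sizes, deps):
    """Return (points computed in the tile, dependence edges leaving it)."""
    origin = [t * s for t, s in zip(tile, sizes)]
    points = list(product(*(range(o, o + s) for o, s in zip(origin, sizes))))
    leaving = 0
    for p in points:
        for d in deps:
            q = tuple(x + y for x, y in zip(p, d))
            if tile_of(q, sizes) != tile:
                leaving += 1  # edge crosses a tile boundary: value must be sent
    return len(points), leaving

deps = [(1, 0), (0, 1)]  # assumed uniform dependences
for t in (2, 4, 8):
    work, comm = tile_stats((1, 1), (t, t), deps)
    print(t, work, comm, work / comm)
```

The ratio of computation to communicated values is t / 2 here, which is why larger tiles amortize communication startup costs; the boundaries of the whole iteration space are ignored.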
Finding Legal Reordering Transformations using Mappings
In Seventh International Workshop on Languages and Compilers for Parallel Computing
Abstract

Cited by 33 (3 self)
Traditionally, optimizing compilers attempt to improve the performance of programs by applying source-to-source transformations, such as loop interchange, loop skewing and loop distribution. Each of these transformations has its own special legality checks and transformation rules, which make it hard to analyze or predict the effects of compositions of these transformations. To overcome these problems we have developed a framework for unifying iteration reordering transformations. The framework is based on the idea that all reordering transformations can be represented as a mapping from the original iteration space to a new iteration space. The framework is designed to provide a uniform way to represent and reason about transformations. An optimizing compiler would use our framework by finding a mapping that both corresponds to a legal transformation and produces efficient code. We present the mapping selection problem as a search problem by decomposing it into a sequence of smal...
A Hyperplane Based Approach for Optimizing Spatial Locality in Loop Nests
ICS 98, 1998
Abstract

Cited by 25 (9 self)
This paper presents a data layout optimization technique based on the theory of hyperplanes from linear algebra. Given a program, our framework automatically determines the optimal layouts that can be expressed by hyperplanes for each array that is referenced. We discuss the cases where data transformations are preferable to loop transformations and show that under specific conditions a loop nest can be optimized for perfect spatial locality by using data transformations. We divide the problem of optimizing data layout into two independent subproblems: (1) determining optimal layouts, and (2) determining data transformation matrices to implement optimal layouts. By postponing the determination of the transformation matrix to the last stage, our method can be adapted to compilers with different default layouts. Our results on eight programs on an SGI Origin 2000 distributed shared-memory multiprocessor show that the layout optimizations are effective in optimizing spatial locality.
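For a two-dimensional array the locality condition has a simple form: if one step of the innermost loop changes the array index by a vector v, a hyperplane vector g with g · v = 0 keeps successive accesses on one hyperplane, so they can be stored contiguously. The sketch below assumes a single reference with a 2 × 2 access matrix (rows are array dimensions, columns are loop indices) and handles only this simplest case; the function name `layout_hyperplane` is hypothetical.

```python
from math import gcd

def layout_hyperplane(access, inner):
    """Hyperplane vector g for a 2-D array such that g . v == 0, where v is
    the change in the array index per step of loop `inner`."""
    v = [row[inner] for row in access]
    g = [v[1], -v[0]]                  # orthogonal to v in two dimensions
    if g == [0, 0]:
        return None                    # inner loop reuses one element (temporal reuse)
    k = gcd(g[0], g[1])
    g = [x // k for x in g]            # smallest integer vector
    if g[0] < 0 or (g[0] == 0 and g[1] < 0):
        g = [-x for x in g]            # canonical sign
    return tuple(g)

# Loop nest over (i, j) with j innermost (column index 1).
print(layout_hyperplane([[1, 0], [0, 1]], 1))  # A[i][j]   -> (1, 0): row-major
print(layout_hyperplane([[0, 1], [1, 0]], 1))  # A[j][i]   -> (0, 1): column-major
print(layout_hyperplane([[1, 1], [0, 1]], 1))  # A[i+j][j] -> (1, -1): diagonal
```

Elements with equal g · (a1, a2) share a hyperplane; for A[i+j][j], consecutive j values keep a1 - a2 = i constant, so a diagonal layout gives unit-stride access.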
A Linear Algebra Framework for Automatic Determination of Optimal Data Layouts
1999
Abstract

Cited by 25 (5 self)
This paper presents a data layout optimization technique for sequential and parallel programs based on the theory of hyperplanes from linear algebra. Given a program, our framework automatically determines suitable memory layouts that can be expressed by hyperplanes for each array that is referenced. We discuss the cases where data transformations are preferable to loop transformations and show that under certain conditions a loop nest can be optimized for perfect spatial locality by using data transformations. We argue that data transformations can also optimize spatial locality for some arrays without distorting temporal/spatial locality exhibited by others. We divide the problem of optimizing data layout into two independent subproblems: 1) determining optimal static data layouts, and 2) determining data transformation matrices to implement the optimal layouts. By postponing the determination of the transformation matrix to the last stage, our method can be adapted to compilers with different default layouts. We then present an algorithm that considers optimizing parallelism and spatial locality simultaneously. Our results on eight programs on two distributed shared-memory multiprocessors, the Convex Exemplar SPP2000 and the SGI Origin 2000, show that the layout optimizations are effective in optimizing spatial locality and parallelism.
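The second subproblem, turning a hyperplane into a data transformation matrix, can be sketched for a 2-D array under a row-major default: complete the hyperplane vector g to a unimodular matrix whose first row is g, so elements on one hyperplane share the first transformed index and land in one stored row. This construction via the extended Euclidean algorithm is an assumption for illustration, not necessarily the paper's; under a column-major default, g would go in the last row instead.

```python
def ext_gcd(a, b):
    """Return (g, s, t) with a*s + b*t == g == gcd(a, b) >= 0."""
    if b == 0:
        return (a, 1, 0) if a >= 0 else (-a, -1, 0)
    g, s, t = ext_gcd(b, a % b)
    return g, t, s - (a // b) * t

def transformation_matrix(g):
    """Complete the hyperplane vector g = (a, b) to a unimodular 2x2
    matrix whose first row is g (row-major default assumed)."""
    a, b = g
    d, s, t = ext_gcd(a, b)
    if d != 1:
        raise ValueError("g must have coprime entries")
    # det [[a, b], [-t, s]] = a*s + b*t = 1, so the matrix is unimodular.
    return ((a, b), (-t, s))

def transform_index(m, idx):
    """Apply the 2x2 data transformation m to an array index."""
    return tuple(m[r][0] * idx[0] + m[r][1] * idx[1] for r in range(2))

# Diagonal hyperplane g = (1, -1), e.g. for references like A[i+j][j].
M = transformation_matrix((1, -1))
print(M)  # ((1, -1), (1, 0))
# As j advances, the first transformed index stays at i = 2, so these
# accesses fall in one row of the row-major transformed array.
print([transform_index(M, (2 + j, j)) for j in range(3)])  # [(2, 2), (2, 3), (2, 4)]
```

Because the matrix is unimodular, the mapping between old and new indices is one-to-one on integer points, so no array element is lost or duplicated by the relayout.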
Fast Address Sequence Generation for Data-Parallel Programs Using Integer Lattices
In Languages and Compilers for Parallel Computing, C.H. Huang et al. (Editors), Lecture Notes in Computer Science, 1996
Abstract

Cited by 20 (7 self)
In data-parallel languages such as High Performance Fortran and Fortran D, arrays are mapped to processors through a two-step process involving alignment followed by distribution. A compiler that generates code for each processor has to compute the sequence of local memory addresses accessed by each processor and the sequence of sends and receives for a given processor to access non-local data. In this paper, we present a novel approach based on integer lattices. The set of elements referenced can be generated by integer linear combinations of basis vectors. Our linear algorithm determines the basis vectors as a function of the mapping. Using the basis vectors, we derive a loop nest that enumerates the addresses, which are points in the lattice generated by the basis vectors. Experimental results show that our approach is better than that of a recent linear-time solution to this problem.
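For reference, the address problem can be stated as a brute-force computation: for a regular section A(l : u : s) of an array distributed CYCLIC(k) over p processors, list the local addresses one processor touches. This is a naive baseline, not the lattice-based algorithm; the 0-based indexing, identity alignment, the local-address formula, and the function names are assumptions for this sketch.

```python
from math import gcd

def local_addresses(l, s, count, p, k, m):
    """Local addresses on processor m touched by A(l : l+s*(count-1) : s)
    when A is distributed CYCLIC(k) over p processors (identity alignment,
    0-based indices)."""
    out = []
    for t in range(count):
        i = l + s * t                         # global index of the t-th access
        block, offset = divmod(i, k)          # k-sized block and position in it
        if block % p == m:                    # blocks are dealt round-robin
            out.append((block // p) * k + offset)
    return out

def gaps(addrs):
    """Differences between consecutive local addresses."""
    return [b - a for a, b in zip(addrs, addrs[1:])]

def period(s, p, k):
    """Accesses per repetition of the pattern and the local-address advance
    per repetition: shifting the global index by lcm(s, p*k) preserves both
    the owning processor and the offset within the block."""
    g = gcd(s, p * k)
    return (p * k) // g, (s // g) * k

# A(0:115:5), CYCLIC(4) over 3 processors, viewed from processor 1.
addrs = local_addresses(0, 5, 24, 3, 4, 1)
print(addrs)             # [1, 10, 12, 19, 21, 30, 32, 39]
print(gaps(addrs))       # [9, 2, 7, 2, 9, 2, 7]
print(period(5, 3, 4))   # (12, 20)
```

The gap sequence repeats every 12 accesses with a total advance of 20, and this regularity is what allows a compact loop or table to replace the per-element ownership test used above.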