Results 1–10 of 130
Unifying Data and Control Transformations for Distributed Shared-Memory Machines
, 1994
Abstract

Cited by 176 (10 self)
We present a unified approach to locality optimization that employs both data and control transformations. Data transformations include changing the array layout in memory. Control transformations involve changing the execution order of programs. We have developed new techniques for compiler optimizations for distributed shared-memory machines, although the same techniques can be used for sequential machines with a memory hierarchy. Our compiler optimizations are based on an algebraic representation of data mappings and a new data locality model. We present a pure data transformation algorithm and an algorithm unifying data and control transformations. While there has been much work on control transformations, the opportunities for data transformations have been largely neglected. In fact, data transformations have the advantage of being applicable to programs that cannot be optimized with control transformations. The unified algorithm, which performs data and control transformations s...
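The abstract's distinction between data transformations (changing array layout) and control transformations (changing execution order) can be made concrete with a small sketch. The helper names below are illustrative, not from the paper: they show how a column-walking loop that strides through memory under a row-major layout becomes stride-1 after a data transformation to column-major layout, with no change to the loop itself.

```python
# Hypothetical sketch of a data transformation: remap an array's layout
# so the program's existing access order becomes contiguous in memory.

def row_major_index(i, j, n_cols):
    """Linearized address of A[i][j] under a row-major layout."""
    return i * n_cols + j

def col_major_index(i, j, n_rows):
    """Linearized address of A[i][j] under a column-major layout."""
    return j * n_rows + i

# A column-walking loop (for j: for i: touch A[i][j]) strides by n_cols
# under row-major layout, but is stride-1 after the layout change.
n_rows, n_cols = 3, 4
addresses_row_major = [row_major_index(i, j, n_cols)
                       for j in range(n_cols) for i in range(n_rows)]
addresses_col_major = [col_major_index(i, j, n_rows)
                       for j in range(n_cols) for i in range(n_rows)]
print(addresses_row_major[:4])  # strided: [0, 4, 8, 1]
print(addresses_col_major[:4])  # contiguous: [0, 1, 2, 3]
```

This is the sense in which data transformations apply even where control transformations cannot: the loop order is untouched, only the mapping from array indices to addresses changes.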
Code generation in the polyhedral model is easier than you think
 In IEEE Intl. Conf. on Parallel Architectures and Compilation Techniques (PACT’04)
, 2004
Abstract

Cited by 167 (16 self)
Many advances in automatic parallelization and optimization have been achieved through the polyhedral model. It has been extensively shown that this computational model provides convenient abstractions to reason about and apply program transformations. Nevertheless, the complexity of code generation has long been a deterrent for using polyhedral representation in optimizing compilers. First, code generators have a hard time coping with generated code size and control overhead that may spoil theoretical benefits achieved by the transformations. Second, this step is usually time consuming, hampering the integration of the polyhedral framework in production compilers or feedback-directed, iterative optimization schemes. Moreover, current code generation algorithms only cover a restrictive set of possible transformation functions. This paper discusses a general transformation framework able to deal with non-unimodular, non-invertible, non-integral, or even non-uniform functions. It presents several improvements to a state-of-the-art code generation algorithm. Two directions are explored: generated code size and code generator efficiency. Experimental evidence proves the ability of the improved method to handle real-life problems.
A practical automatic polyhedral parallelizer and locality optimizer
 In PLDI ’08: Proceedings of the ACM SIGPLAN 2008 conference on Programming language design and implementation
, 2008
Abstract

Cited by 122 (8 self)
We present the design and implementation of an automatic polyhedral source-to-source transformation framework that can optimize regular programs (sequences of possibly imperfectly nested loops) for parallelism and locality simultaneously. Through this work, we show the practicality of analytical model-driven automatic transformation in the polyhedral model. Unlike previous polyhedral frameworks, our approach is an end-to-end fully automatic one driven by an integer linear optimization framework that takes an explicit view of finding good ways of tiling for parallelism and locality using affine transformations. The framework has been implemented into a tool to automatically generate OpenMP parallel code from C program sections. Experimental results from the tool show very high performance for local and parallel execution on multicores, when compared with state-of-the-art compiler frameworks from the research community as well as the best native production compilers. The system also enables the easy use of powerful empirical/iterative optimization for general arbitrarily nested loop sequences.
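The tiling at the heart of such a framework can be sketched in a few lines. The function names are illustrative, not the tool's API, and a real polyhedral tool emits a transformed loop nest rather than enumerating points; the sketch only shows the invariant tiling must preserve: the same iteration set, executed in a tile-by-tile order that improves locality.

```python
# Minimal sketch of rectangular loop tiling over an n x n iteration space.

def untiled(n):
    """Original iteration order: row by row."""
    return [(i, j) for i in range(n) for j in range(n)]

def tiled(n, t):
    """Tiled iteration order: visit t x t blocks, then points within each."""
    pts = []
    for ii in range(0, n, t):                      # tile loops
        for jj in range(0, n, t):
            for i in range(ii, min(ii + t, n)):    # point loops
                for j in range(jj, min(jj + t, n)):
                    pts.append((i, j))
    return pts

n, t = 6, 2
# Same iteration set, different (locality-friendly) order:
assert sorted(tiled(n, t)) == sorted(untiled(n))
print(tiled(n, t)[:4])  # first tile: [(0, 0), (0, 1), (1, 0), (1, 1)]
```

An affine-transformation-based framework additionally has to choose which dimensions to tile and with which skewing, which is exactly the integer linear optimization problem the abstract describes.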
Loop Parallelization in the Polytope Model
 CONCUR '93, Lecture Notes in Computer Science 715
, 1993
Abstract

Cited by 106 (25 self)
During the course of the last decade, a mathematical model for the parallelization of FOR-loops has become increasingly popular. In this model, a (perfect) nest of r FOR-loops is represented by a convex polytope in Z^r. The boundaries of each loop specify the extent of the polytope in a distinct dimension. Various ways of slicing and segmenting the polytope yield a multitude of guaranteed correct mappings of the loops' operations in space-time. These transformations have a very intuitive interpretation and can be easily quantified and automated due to their mathematical foundation in linear programming and linear algebra. With the recent availability of massively parallel computers, the idea of loop parallelization is gaining significance, since it promises execution speedups of orders of magnitude. The polytope model for loop parallelization has its origin in systolic design, but it applies in more general settings and methods based on it will become a part of futur...
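The "slicing" of the polytope into a space-time mapping can be illustrated with the classic wavefront schedule. The sketch below is not from the paper; it assumes a statement S(i, j) that depends only on S(i-1, j) and S(i, j-1), so the skewing schedule t = i + j groups mutually independent iterations into wavefronts that could run in parallel.

```python
# Illustrative sketch of the polytope model: the nest
#   for i in 0..n-1: for j in 0..n-1: S(i, j)
# is the set of integer points 0 <= i, j <= n-1 in Z^2.  The affine
# schedule t = i + j slices this polytope into diagonal wavefronts.

n = 4
domain = [(i, j) for i in range(n) for j in range(n)]

wavefronts = {}
for i, j in domain:
    wavefronts.setdefault(i + j, []).append((i, j))

for t in sorted(wavefronts):
    print(t, wavefronts[t])
# t = 0: [(0, 0)]; t = 1: [(0, 1), (1, 0)]; ...; t = 6: [(3, 3)]
```

Each printed line is one "time step"; all points within it are independent under the assumed dependences, which is what makes the mapping a guaranteed-correct parallelization.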
Generation of Efficient Nested Loops from Polyhedra
 International Journal of Parallel Programming
, 2000
Abstract

Cited by 89 (5 self)
Automatic parallelization in the polyhedral model is based on affine transformations from an original computation domain (iteration space) to a target space-time domain, often with a different transformation for each variable. Code generation is an often-ignored step in this process that has a significant impact on the quality of the final code. It involves making a tradeoff between code size and control code simplification/optimization. Previous methods of doing code generation are based on loop splitting; however, they have non-optimal behavior when working on parameterized programs. We present a general parameterized method for code generation based on dual representation of polyhedra. Our algorithm uses a simple recursion on the dimensions of the domains, and enables fine control over the tradeoff between code size and control overhead.
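What "generating loops from a polyhedron" means can be shown with a toy example. This is only a sketch of the scanning problem, not the paper's dual-representation algorithm: given a domain defined by inequalities, the generator must emit loops whose bounds are the projections of the polyhedron onto successive dimensions.

```python
# Toy sketch of polyhedral code generation for the triangular domain
#   { (i, j) : 0 <= i <= j <= n-1 }.
# "Scanning" it means producing the nest
#   for i in 0..n-1: for j in i..n-1: S(i, j)
# whose inner bound (j >= i) is derived from the constraints.

n = 5

# The domain as a set of points satisfying the inequalities:
by_enumeration = [(i, j) for i in range(n) for j in range(n) if i <= j]

# The same domain as the loop nest a code generator would emit:
by_scanning = [(i, j) for i in range(n) for j in range(i, n)]

assert by_enumeration == by_scanning  # same points, same lexicographic order
print(len(by_scanning))  # 15 points for n = 5
```

A real generator derives the bound `range(i, n)` symbolically, in the parameter n, which is exactly where the parameterized dual-representation method earns its keep.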
Semi-automatic composition of loop transformations for deep parallelism and memory hierarchies
 Int. J. Parallel Program.,
, 2006
Abstract

Cited by 82 (24 self)
Modern compilers are responsible for translating the idealistic operational semantics of the source program into a form that makes efficient use of a highly complex heterogeneous machine. Since optimization problems are associated with huge and unstructured search spaces, this combinatorial task is poorly achieved in general, resulting in weak scalability and disappointing sustained performance. We address this challenge by working on the program representation itself, using a semi-automatic optimization approach to demonstrate that current compilers often suffer from unnecessary constraints and intricacies that can be avoided in a semantically richer transformation framework. Technically, the purpose of this paper is threefold: (1) to show that syntactic code representations close to the operational semantics lead to rigid phase ordering and cumbersome expression of architecture-aware loop transformations, (2) to illustrate how complex transformation sequences may be needed to achieve significant performance benefits, (3) to facilitate the automatic search for program transformation sequences, improving on classical polyhedral representations to better support operations research strategies in a simpler, structured search space. The proposed framework relies on a unified polyhedral representation of loops and statements, using normalization rules to allow flexible and expressive transformation sequencing. This representation allows us to extend the scalability of polyhedral dependence analysis, and to delay the (automatic) legality checks until the end of a transformation sequence. Our work leverages algorithmic advances in polyhedral code generation and has been implemented in a modern research compiler.
A Framework for Unifying Reordering Transformations
, 1993
Abstract

Cited by 76 (10 self)
We present a framework for unifying iteration reordering transformations such as loop interchange, loop distribution, skewing, tiling, index set splitting and statement reordering. The framework is based on the idea that a transformation can be represented as a schedule that maps the original iteration space to a new iteration space. The framework is designed to provide a uniform way to represent and reason about transformations. As part of the framework, we provide algorithms to assist in the building and use of schedules. In particular, we provide algorithms to test the legality of schedules, to align schedules and to generate optimized code for schedules. This work is supported by an NSF PYI grant CCR-9157384 and by a Packard Fellowship. Optimizing compilers reorder iterations of statements to improve instruction scheduling, register use, and cache utilization, and to expose parallelism. Many different reordering transformations have been developed and studied, su...
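The schedule-as-mapping view admits a very compact legality test, sketched below. This is a hedged illustration in the spirit of the framework, not its actual algorithm: a schedule is legal if every dependence source still precedes its target lexicographically in the new time coordinates.

```python
# A schedule maps each iteration point to a (possibly multi-dimensional)
# time; tuples compare lexicographically in Python, which matches the
# lexicographic order of multi-dimensional time.

def is_legal(schedule, dependences):
    """Legal iff every dependence source runs before its target."""
    return all(schedule(src) < schedule(dst) for src, dst in dependences)

def interchange(p):            # loop interchange: (i, j) -> (j, i)
    return (p[1], p[0])

def reversal(p):               # outer-loop reversal: (i, j) -> (-i, j)
    return (-p[0], p[1])

# A flow dependence S(i-1, j) -> S(i, j) inside a 2-D nest:
deps = [((i - 1, j), (i, j)) for i in range(1, 4) for j in range(4)]

print(is_legal(interchange, deps))  # True: (j, i-1) < (j, i) always holds
print(is_legal(reversal, deps))     # False: reversal flips the dependence
```

Tiling and index set splitting fit the same mold, with schedules of higher dimensionality, which is precisely what makes the representation unifying.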
Code Generation for Multiple Mappings
 In Frontiers '95: The 5th Symposium on the Frontiers of Massively Parallel Computation
, 1994
Abstract

Cited by 75 (2 self)
There has been a great deal of recent work toward unifying iteration reordering transformations. Many of these approaches represent transformations as affine mappings from the original iteration space to a new iteration space. These approaches show a great deal of promise, but they all rely on the ability to generate code that iterates over the points in these new iteration spaces in the appropriate order. This problem has been fairly well-studied in the case where all statements use the same mapping. We have developed an algorithm for the less well-studied case where each statement uses a potentially different mapping. Unlike many other approaches, our algorithm can also generate code from mappings corresponding to loop blocking. We address the important tradeoff between reducing control overhead and duplicating code.
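The multiple-mappings problem can be illustrated with a deliberately naive sketch, not the paper's algorithm: two statements with different affine schedules must be executed in a single global time order. Here we simply sort the union of (time, statement) events; a real code generator emits loop nests that realize this order without materializing and sorting the events.

```python
# Two statements in the same program, each with its own schedule:
#   S1(i) scheduled at time (i, 0), S2(i) scheduled at time (i, 1),
# i.e. S1(i) runs just before S2(i) at each i.

n = 3
s1_events = [((i, 0), "S1", i) for i in range(n)]
s2_events = [((i, 1), "S2", i) for i in range(n)]

# Global execution order = lexicographic order on the time tuples:
events = sorted(s1_events + s2_events)
order = [(name, i) for _, name, i in events]
print(order)
# [('S1', 0), ('S2', 0), ('S1', 1), ('S2', 1), ('S1', 2), ('S2', 2)]
```

The interleaving shown is exactly what per-statement mappings buy over a single shared mapping; generating loops that produce it directly, without code duplication blowing up, is the tradeoff the abstract refers to.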
Access Normalization: Loop Restructuring for NUMA Computers
 ACM Transactions on Computer Systems
, 1993
Abstract

Cited by 72 (21 self)
In scalable parallel machines, processors can make local memory accesses much faster than they can make remote memory accesses. In addition, when a number of remote accesses must be made, it is usually more efficient to use block transfers of data rather than to use many small messages. To run well on such machines, software must exploit these features. We believe it is too onerous for a programmer to do this by hand, so we have been exploring the use of restructuring compiler technology for this purpose. In this paper, we start with a language like HPF-FORTRAN with user-specified data distribution, and develop a systematic loop transformation strategy called access normalization that restructures loop nests to exploit locality and block transfers. We demonstrate the power of our techniques using routines from the BLAS (Basic Linear Algebra Subprograms) library. An important feature of our approach is that we model loop transformations using invertible matrices and integer lattice theo...
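The modeling idea mentioned at the end, loop transformations as invertible integer matrices, can be sketched briefly. The 2x2 helpers below are illustrative, not the paper's machinery: a unimodular matrix (determinant ±1) maps Z^r one-to-one onto Z^r, so the transformed nest executes exactly the original iterations, just reordered.

```python
# Loop transformation as an invertible integer matrix acting on
# iteration vectors (i, j).

def det2(T):
    """Determinant of a 2x2 integer matrix."""
    return T[0][0] * T[1][1] - T[0][1] * T[1][0]

def apply(T, p):
    """Matrix-vector product T @ p for a 2-vector p."""
    return (T[0][0] * p[0] + T[0][1] * p[1],
            T[1][0] * p[0] + T[1][1] * p[1])

skew = [[1, 0], [1, 1]]          # skewing: (i, j) -> (i, i + j)
assert abs(det2(skew)) == 1      # unimodular: a bijection on the lattice

domain = [(i, j) for i in range(3) for j in range(3)]
image = [apply(skew, p) for p in domain]
assert len(set(image)) == len(domain)  # one-to-one on the iteration set
print(apply(skew, (2, 3)))             # (2, 5)
```

Composing several such matrices composes the transformations, which is what lets a strategy like access normalization search for one combined matrix that makes the array accesses simple.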