Results 1 - 10 of 27
Some efficient solutions to the affine scheduling problem, Part I: One-dimensional Time
, 1996
Abstract

Cited by 216 (18 self)
Programs and systems of recurrence equations may be represented as sets of actions which are to be executed subject to precedence constraints. In many cases, actions may be labelled by integral vectors in some iteration domain, and precedence constraints may be described by affine relations. A schedule for such a program is a function which assigns an execution date to each action. Knowledge of such a schedule allows one to estimate the intrinsic degree of parallelism of the program and to compile a parallel version for multiprocessor architectures or systolic arrays. This paper deals with the problem of finding closed-form schedules as affine or piecewise affine functions of the iteration vector. An efficient algorithm is presented which reduces the scheduling problem to a parametric linear program of small size, which can be readily solved.
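The scheduling idea can be made concrete with a small sketch (a hypothetical example, not taken from the paper): for a 2-D iteration domain with uniform dependences (1, 0) and (0, 1), the affine schedule theta(i, j) = i + j gives every source iteration a strictly earlier date than its sinks, and iterations sharing a date form wavefronts that may run in parallel. The domain size N is arbitrary.

```python
# Illustrative sketch: check an affine schedule theta(i, j) = i + j
# against uniform dependences, then read off the exposed parallelism.
from collections import defaultdict

def theta(i, j):
    return i + j

N = 8                                   # hypothetical domain size
deps = [(1, 0), (0, 1)]                 # iteration (i, j) reads (i-1, j) and (i, j-1)
domain = [(i, j) for i in range(N) for j in range(N)]

# Validity: every source iteration executes strictly before its sink.
for (i, j) in domain:
    for (di, dj) in deps:
        src = (i - di, j - dj)
        if 0 <= src[0] < N and 0 <= src[1] < N:
            assert theta(*src) < theta(i, j)

# Iterations with equal dates form a wavefront that may run in parallel.
fronts = defaultdict(list)
for (i, j) in domain:
    fronts[theta(i, j)].append((i, j))
print(max(len(f) for f in fronts.values()))   # widest wavefront: 8
```

The number of distinct dates (2N - 1 here) estimates the sequential depth of the program under this schedule.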
Code generation in the polyhedral model is easier than you think
 In IEEE Intl. Conf. on Parallel Architectures and Compilation Techniques (PACT’04)
, 2004
Abstract

Cited by 109 (16 self)
Many advances in automatic parallelization and optimization have been achieved through the polyhedral model. It has been extensively shown that this computational model provides convenient abstractions to reason about and apply program transformations. Nevertheless, the complexity of code generation has long been a deterrent for using polyhedral representation in optimizing compilers. First, code generators have a hard time coping with generated code size and control overhead that may spoil theoretical benefits achieved by the transformations. Second, this step is usually time consuming, hampering the integration of the polyhedral framework in production compilers or feedback-directed, iterative optimization schemes. Moreover, current code generation algorithms only cover a restrictive set of possible transformation functions. This paper discusses a general transformation framework able to deal with non-unimodular, non-invertible, non-integral or even non-uniform functions. It presents several improvements to a state-of-the-art code generation algorithm. Two directions are explored: generated code size and code generator efficiency. Experimental evidence proves the ability of the improved method to handle real-life problems.
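As a toy illustration of the non-unimodular case (hypothetical, not the paper's algorithm): the mapping t = 2i sends a dense loop onto a sparse image. A naive generator enumerates the image's bounding box and guards every point, while a better generator emits a strided loop with no per-point guard.

```python
# Hypothetical example: scanning the image of the non-unimodular
# mapping t = 2*i over i in [0, N).
N = 10

# Naive generated code: enumerate the bounding box, guard each point.
naive = [t for t in range(2 * N) if t % 2 == 0]

# Better generated code: a strided loop, no guard.
strided = list(range(0, 2 * N, 2))

assert naive == strided          # same points, less control overhead
print(len(strided))              # 10 iterations, one per image point
```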
Generation of Efficient Nested Loops from Polyhedra
 International Journal of Parallel Programming
, 2000
Abstract

Cited by 72 (3 self)
Automatic parallelization in the polyhedral model is based on affine transformations from an original computation domain (iteration space) to a target space-time domain, often with a different transformation for each variable. Code generation is an often ignored step in this process that has a significant impact on the quality of the final code. It involves making a trade-off between code size and control code simplification/optimization. Previous code generation methods are based on loop splitting; however, they exhibit non-optimal behavior when working on parameterized programs. We present a general parameterized method for code generation based on the dual representation of polyhedra. Our algorithm uses a simple recursion on the dimensions of the domains, and enables fine control over the trade-off between code size and control overhead.
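The recursion-on-dimensions idea can be sketched with a hand-worked example (illustrative only, not the paper's algorithm): for the polyhedron { (i, j) : 0 <= i <= N, 0 <= j <= i }, projecting out j gives bounds for the outer loop, and the inner bounds are expressed as functions of i.

```python
# Illustrative sketch: nested loops generated for the triangular
# polyhedron { (i, j) : 0 <= i <= N, 0 <= j <= i }.
# Bounds below are worked out by hand for this example.
N = 4

points_from_loops = []
for i in range(0, N + 1):          # bounds of i after projecting out j
    for j in range(0, i + 1):      # bounds of j as a function of i
        points_from_loops.append((i, j))

# The loops must scan exactly the integer points of the polyhedron.
reference = [(i, j) for i in range(N + 1) for j in range(N + 1)
             if 0 <= j <= i]
assert points_from_loops == reference
print(len(points_from_loops))      # 15 points for N = 4
```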
Semi-automatic composition of loop transformations for deep parallelism and memory hierarchies
 Intl J. of Parallel Programming
, 2006
Abstract

Cited by 50 (18 self)
Modern compilers are responsible for translating the idealistic operational semantics of the source program into a form that makes efficient use of a highly complex heterogeneous machine. Since optimization problems are associated with huge and unstructured search spaces, this combinatorial task is poorly achieved in general, resulting in weak scalability and disappointing sustained performance. We address this challenge by working on the program representation itself, using a semi-automatic optimization approach to demonstrate that current compilers often suffer from unnecessary constraints and intricacies that can be avoided in a semantically richer transformation framework. Technically, the purpose of this paper is threefold: (1) to show that syntactic code representations close to the operational semantics lead to rigid phase ordering and cumbersome expression of architecture-aware loop transformations, (2) to illustrate how complex transformation sequences may be needed to achieve significant performance benefits, (3) to facilitate the automatic search for program transformation sequences, improving on classical polyhedral representations to better support operations research strategies in a simpler, structured search space. The proposed framework relies on a unified polyhedral representation of loops and statements, using normalization rules to allow flexible and expressive transformation sequencing. This representation makes it possible to extend the scalability of polyhedral dependence analysis, and to delay the (automatic) legality checks until the end of a transformation sequence. Our work builds on algorithmic advances in polyhedral code generation and has been implemented in a modern research compiler.
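The deferred-legality idea can be sketched in a few lines (an illustrative toy, not the paper's framework): loop transformations are composed as integer matrices, and the dependence vectors are checked for lexicographic positivity only once, against the final composed mapping.

```python
# Illustrative sketch: compose two loop transformations as integer
# matrices; check legality once, on the composed mapping.

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def apply(M, v):
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]

def lex_positive(v):
    for x in v:
        if x != 0:
            return x > 0
    return False

skew = [[1, 0], [1, 1]]          # (i, j) -> (i, i + j)
interchange = [[0, 1], [1, 0]]   # swap the two loops

composed = matmul(interchange, skew)   # skew first, then interchange

# Dependence vectors of the original nest (hypothetical).
deps = [[1, 0], [0, 1]]

# Legality check deferred to the end of the transformation sequence.
assert all(lex_positive(apply(composed, d)) for d in deps)
```

Checking each step in isolation would reject some sequences whose composition is nevertheless legal, which is one motivation for delaying the check.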
Polyhedral code generation in the real world
 In Proceedings of the International Conference on Compiler Construction (ETAPS CC’06), LNCS
, 2006
Abstract

Cited by 27 (11 self)
The polyhedral model is known to be a powerful framework to reason about high-level loop transformations. Recent developments in optimizing compilers broke some generally accepted ideas about the limitations of this model. First, thanks to advances in dependence analysis for irregular access patterns, its applicability, which was supposed to be limited to very simple loop nests, has been extended to wide code regions. Then, new algorithms made it possible to compute the target code for hundreds of statements, while this code generation step was expected not to be scalable. Such theoretical advances and new software tools allowed actors from both academia and industry to study more complex and realistic cases. Unfortunately, despite the strong optimization potential of a given transformation for, e.g., parallelism or data locality, code generation may still be challenging or result in high control overhead. This paper presents scalable code generation methods that make possible the application of increasingly complex program transformations. By studying the transformations themselves, we show how it is possible to benefit from their properties to dramatically improve both code generation quality and space/time complexity, with respect to the best state-of-the-art code generation tool. In addition, we build on these improvements to present a new algorithm improving generated code performance for strided domains and reindexed schedules.
Code Generation in the Polytope Model
 In IEEE PACT
, 1998
Abstract

Cited by 17 (1 self)
Automatic parallelization of nested loops, based on a mathematical model, the polytope model, has been improved significantly over the last decade: state-of-the-art methods allow flexible distributions of computations in space and time, which lead to high-quality parallelism. However, these methods have not found their way into practical parallelizing compilers due to the lack of code generation schemes which are able to deal with the newfound flexibility. To close this gap is the purpose of this paper.

1. Introduction. In recent years, methods for automatic parallelization of nested loops based on a mathematical model, the polytope model [9, 12], have been improved significantly. The focus has been on identifying good schedules, i.e., distributions of computations in time, e.g., [6, 8], and allocations, i.e., distributions of computations in space, e.g., [5, 14]. Thus, the space-time mapping, i.e., the combination of schedule and allocation, derived by state-of-the-art techniques oft...
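A minimal sketch of a space-time mapping (hypothetical example): combining the schedule t = i + j with the allocation p = j, the mapping must be one-to-one on the iteration domain so that no processor is asked to run two iterations at the same time.

```python
# Illustrative sketch: a space-time mapping for an N x N loop nest,
# schedule t = i + j (time), allocation p = j (processor).
N = 6                              # hypothetical domain size

seen = {}
for i in range(N):
    for j in range(N):
        t, p = i + j, j            # (time, processor) of iteration (i, j)
        assert (t, p) not in seen  # no processor runs two iterations at once
        seen[(t, p)] = (i, j)

print(len(seen))                   # every iteration gets a distinct slot: 36
```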
Automatic Data and Computation Decomposition for Distributed Memory Machines
 In Proceedings of the 28th Annual Hawaii International Conference on System Sciences, Maui
, 1995
Abstract

Cited by 14 (2 self)
In this paper, we develop an automatic compile-time computation and data decomposition technique for distributed memory machines. Our method can handle complex programs containing perfect and non-perfect loop nests with or without loop-carried dependences. Applying our decomposition algorithms, a program will be divided into collections (called clusters) of loop nests, such that data redistributions are allowed only between the clusters. Within each cluster of loop nests, decomposition and data locality constraints are formulated as a system of homogeneous linear equations which is solved by polynomial-time algorithms. Our algorithm can selectively relax data locality constraints within a cluster to achieve a balance between parallelism and data locality. Such relaxations are guided by exploiting the hierarchical program nesting structures from outer to inner nesting levels to keep the communications at the outermost level possible. This work is central to the ongoing compiler developmen...
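A toy 1-D instance of the constraint formulation (hypothetical notation; the paper's actual systems are richer): for the loop `for i: A[i] = B[i+1]` with linear mappings comp(i) = x*i, mapA(k) = y*k, mapB(k) = z*k, locality constraints reduce to a homogeneous system. When the system only admits the trivial solution, one constraint is relaxed, mirroring the paper's selective relaxation.

```python
# Illustrative sketch: locality of A gives  y*i == x*i   -> row (-1, 1, 0)
#                      locality of B gives  z*(i+1) == x*i
#                        for all i         -> rows (-1, 0, 1) and (0, 0, 1)
M = [[-1, 1, 0],
     [-1, 0, 1],
     [0, 0, 1]]

def det3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# Nonzero determinant: the full system forces x = y = z = 0 (everything
# on one processor), so the B-locality constraint is relaxed.
assert det3(M) != 0

relaxed = M[:1]                  # keep only the A-locality row
x, y, z = 1, 1, 0                # a nontrivial solution of the relaxed system
assert all(r[0] * x + r[1] * y + r[2] * z == 0 for r in relaxed)
# A stays aligned with the computation; B is redistributed (communicated).
```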
Compiling For Massively Parallel Architectures: A Perspective
, 1994
Abstract

Cited by 10 (1 self)
The problem of automatically generating programs for massively parallel computers is a very complicated one, mainly because there are many architectures, each of them seeming to pose its own particular compilation problem. The purpose of this paper is to propose a framework in which to discuss the compilation process, and to show that the features which affect it are few and generate a small number of combinations. The paper is oriented toward fine-grained parallelization of static control programs, with emphasis on dataflow analysis, scheduling and placement. When going from there to more general programs and to coarser parallelism, one encounters new problems, some of which are discussed in the conclusion. KEYWORDS: Massively Parallel Compilers, Automatic Parallelization. @ARTICLE{Feau:95, AUTHOR = {Paul Feautrier}, TITLE = {Compiling for Massively Parallel Architectures: a Perspective}, JOURNAL = {Microprogramming and Microprocessors}, YEAR = {1995}, NOTE = {to appear}} 1 A FRAMEWO...
A characterization of one-to-one modular mappings
 PARALLEL PROCESSING LETTERS
, 1996
Abstract

Cited by 8 (7 self)
In this paper, we deal with modular mappings as introduced by Lee and Fortes [14, 13, 12], and we build upon their results. Our main contribution is a characterization of one-to-one modular mappings that is valid even when the source domain and the target domain of the transformation have the same size but not the same shape. This characterization is constructive, and a procedure to test the injectivity of a given transformation is presented.
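The injectivity question can be illustrated by brute force on a small domain (a hypothetical mapping, not taken from the paper): T(i, j) = ((3i + j) mod 6, j mod 2) maps a 2 x 6 source domain one-to-one onto a 6 x 2 target domain, i.e. same size, different shape.

```python
# Illustrative sketch: brute-force injectivity check for a modular
# mapping between same-size domains of different shapes.
from itertools import product

def T(i, j):
    return ((3 * i + j) % 6, j % 2)

source = list(product(range(2), range(6)))   # 2 x 6 source domain
images = {T(i, j) for (i, j) in source}      # subset of the 6 x 2 target

assert len(images) == len(source)   # T is one-to-one on this domain
print(len(images))                  # 12 distinct images
```

Brute force is exponential in the domain size; the paper's contribution is precisely a characterization that avoids this enumeration.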
Evaluation of Loop Grouping Methods Based on Orthogonal Projection Spaces
 In Proceedings of the International Conference on Parallel Processing
, 2000
Abstract

Cited by 8 (6 self)
This paper compares three similar loop-grouping methods. All methods are based on projecting the n-dimensional iteration space J onto a k-dimensional one, called the projected space, using (n-k) linearly independent vectors. The dimension k is selected differently in each method, giving various results. The projected space is divided into discrete groups of related iterations, which are assigned to different processors. Two of the methods preserve optimal completion time by scheduling loop iterations according to the hyperplane method. Theoretical analysis of the experimental results indicates the appropriate method for specific iteration spaces and target architectures.
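A minimal sketch of the projection idea (hypothetical example): projecting a 2-D iteration space along the direction (1, 1) collapses iterations with equal i - j into one group of the 1-D projected space; each group is then assigned to a processor.

```python
# Illustrative sketch: group a 2-D iteration space by projecting
# along (1, 1); the invariant i - j indexes the 1-D projected space.
N = 4                              # hypothetical iteration space: N x N

groups = {}
for i in range(N):
    for j in range(N):
        groups.setdefault(i - j, []).append((i, j))

print(len(groups))                 # 2*N - 1 groups, one per processor: 7
```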