Results 1–10 of 104
Some efficient solutions to the affine scheduling problem. Part I: One-dimensional Time
, 1996
Abstract

Cited by 228 (19 self)
Programs and systems of recurrence equations may be represented as sets of actions which are to be executed subject to precedence constraints. In many cases, actions may be labelled by integral vectors in some iteration domain, and precedence constraints may be described by affine relations. A schedule for such a program is a function which assigns an execution date to each action. Knowledge of such a schedule allows one to estimate the intrinsic degree of parallelism of the program and to compile a parallel version for multiprocessor architectures or systolic arrays. This paper deals with the problem of finding closed-form schedules as affine or piecewise affine functions of the iteration vector. An efficient algorithm is presented which reduces the scheduling problem to a parametric linear program of small size, which can then be solved readily.
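A minimal sketch (not from the paper) of the causality condition an affine schedule must satisfy: for a uniform dependence with distance vector d, a schedule theta(i) = tau . i is valid iff tau . d >= 1. The stencil example below is hypothetical.

```python
# Sketch: checking a candidate affine schedule theta(i) = tau . i against
# uniform dependence distance vectors. A dependence with distance d is
# respected iff tau . d >= 1 (the target runs at least one step after the
# source); the paper's LP searches for such a tau systematically.

def is_valid_schedule(tau, distances):
    """tau: affine schedule vector; distances: dependence distance vectors."""
    return all(sum(t * di for t, di in zip(tau, d)) >= 1 for d in distances)

# Dependences of a hypothetical 2-D stencil: reads from (i-1, j) and (i, j-1).
deps = [(1, 0), (0, 1)]

print(is_valid_schedule((1, 1), deps))  # theta(i, j) = i + j satisfies both
print(is_valid_schedule((1, 0), deps))  # theta(i, j) = i misses distance (0, 1)
```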
Code generation in the polyhedral model is easier than you think
 In IEEE Intl. Conf. on Parallel Architectures and Compilation Techniques (PACT'04)
, 2004
Abstract

Cited by 124 (18 self)
Many advances in automatic parallelization and optimization have been achieved through the polyhedral model. It has been extensively shown that this computational model provides convenient abstractions to reason about and apply program transformations. Nevertheless, the complexity of code generation has long been a deterrent to using polyhedral representations in optimizing compilers. First, code generators have a hard time coping with the generated code size and control overhead that may spoil the theoretical benefits achieved by the transformations. Second, this step is usually time-consuming, hampering the integration of the polyhedral framework into production compilers or feedback-directed, iterative optimization schemes. Moreover, current code generation algorithms only cover a restrictive set of possible transformation functions. This paper discusses a general transformation framework able to deal with non-unimodular, non-invertible, non-integral or even non-uniform functions. It presents several improvements to a state-of-the-art code generation algorithm. Two directions are explored: generated code size and code generator efficiency. Experimental evidence proves the ability of the improved method to handle real-life problems.
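To make the control-overhead problem concrete, here is a deliberately naive code generation strategy (a sketch, not the paper's algorithm): scan a bounding box of the transformed space and guard every point. Efficient generators exist precisely to eliminate such guards.

```python
# Sketch: the simplest polyhedral "code generation" scans a bounding box of
# the transformed iteration space and guards each point, illustrating the
# control overhead that smarter code generation algorithms remove.

def scan_transformed(n):
    """Run an n x n loop nest under the skewing schedule (t, p) = (i + j, j)."""
    visited = []
    for t in range(0, 2 * n - 1):          # bounding range of i + j
        for p in range(0, n):              # bounding range of j
            i, j = t - p, p                # invert the (unimodular) schedule
            if 0 <= i < n:                 # guard: is the point in the domain?
                visited.append((i, j))
    return visited

print(len(scan_transformed(3)))  # 9: every iteration of the 3x3 nest, once
```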
Nonlinear Array Dependence Analysis
, 1991
Abstract

Cited by 84 (6 self)
Standard array data dependence techniques can only reason about linear constraints. There has also been work on analyzing some dependences involving polynomial constraints. Analyzing array data dependences in real-world programs requires handling many "unanalyzable" terms: subscript arrays, run-time tests, function calls. The standard approach to analyzing such programs has been to omit and ignore any constraints that cannot be reasoned about. This is unsound when reasoning about value-based dependences and whether privatization is legal. It also prevents us from determining the conditions that must be true to disprove the dependence. These conditions could be checked by a run-time test or verified by a programmer or by aggressive, demand-driven interprocedural analysis. We describe a solution to these problems. Our solution makes our system sound and more accurate for analyzing value-based dependences, and derives conditions that can be used to disprove dependences. We also give some p...
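A hypothetical example of the kind of run-time test the paper advocates (the subscript arrays below are illustrative): when `a[p[i]]` is written in every iteration, the derived condition "p is injective over the loop range" disproves the output dependence, and can be checked at run time.

```python
# Sketch: with an unanalyzable subscript array p, the output dependence of
# "for i: a[p[i]] = ..." cannot be disproved statically. Injectivity of p
# over the loop range is a condition that disproves it, checkable at run time.

def subscripts_conflict(p):
    """Return True if two iterations write the same element of a."""
    return len(set(p)) != len(p)

print(subscripts_conflict([3, 1, 4, 1, 5]))  # True: iterations 1 and 3 collide
print(subscripts_conflict([3, 1, 4, 0, 5]))  # False: the writes are independent
```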
Fuzzy Array Dataflow Analysis
, 1997
Abstract

Cited by 75 (18 self)
This paper deals with more general control structures, such as ifs and while loops, together with unrestricted array subscripting. Note that we assume that unstructured programs are preprocessed, so that, for instance, gotos are first converted into whiles. With such unpredictable, dynamic control structures, however, no exact information can be hoped for in the general case. The aim of this paper is threefold. First, we aim to show that even partial information can be gathered automatically thanks to Fuzzy Array Dataflow Analysis (FADA). This paper extends our previous work [4] on FADA to general, non-affine array subscripts. The second purpose of this paper is to formalize and generalize these previous formalisms and to prove general results. Third, we show that the precise, classical ADA is a special case of FADA.
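A toy illustration (not from the paper) of why control flow makes dataflow "fuzzy": the source of a read is the last write reaching it, which is exact when the guarding condition is analyzable and only a set of candidates otherwise.

```python
# Sketch: each iteration i writes a[0] when cond(i) holds; a read after the
# loop has the last such i as its source. With an analyzable condition the
# source is exact; with a data-dependent one, FADA can only bound the set of
# possible sources.

def last_source(n, cond):
    """Index of the last iteration writing a[0], or None if none did."""
    writers = [i for i in range(n) if cond(i)]
    return writers[-1] if writers else None

print(last_source(10, lambda i: True))        # 9: exact, condition is trivial
print(last_source(10, lambda i: i % 4 == 0))  # 8: exact only if i % 4 is known
print(last_source(10, lambda i: False))       # None: value flows from outside
```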
Minimizing Register Requirements under Resource-Constrained Rate-Optimal Software Pipelining
, 1995
Abstract

Cited by 75 (12 self)
The rapid advances in high-performance computer architecture and compilation techniques provide both challenges and opportunities to exploit the rich solution space of software-pipelined loop schedules. In this paper, we develop a framework to construct a software-pipelined loop schedule which runs on the given architecture (with a fixed number of processor resources) at the maximum possible iteration rate (à la rate-optimal) while minimizing the number of buffers, a close approximation to minimizing the number of registers. The main contributions of this paper are: first, we demonstrate that such a problem can be described by a simple mathematical formulation with precise optimization objectives under a periodic linear scheduling framework. The mathematical formulation provides a clear picture which permits one to visualize the overall solution space (for rate-optimal schedules) under different sets of constraints. Second, we show that a precise mathematical formulation...
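For context, a standard sketch (textbook material, not the paper's formulation) of the rate bound behind "rate-optimal": the minimum initiation interval II is limited by each dependence cycle and by resource usage.

```python
# Sketch: the minimum initiation interval of a software-pipelined loop is
# bounded below by every dependence cycle c, II >= latency(c) / distance(c),
# and by resources, II >= operations / functional units. "Rate-optimal"
# schedules achieve the maximum of these bounds.

from math import ceil

def min_ii(cycles, op_count, units):
    """cycles: (total_latency, total_loop_carried_distance) per cycle."""
    rec_mii = max(ceil(lat / dist) for lat, dist in cycles)
    res_mii = ceil(op_count / units)
    return max(rec_mii, res_mii)

# Hypothetical loop: one cycle of latency 4 spanning 2 iterations,
# 6 operations sharing 2 functional units.
print(min_ii([(4, 2)], op_count=6, units=2))  # II = max(2, 3) = 3
```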
A practical automatic polyhedral parallelizer and locality optimizer
 In PLDI ’08: Proceedings of the ACM SIGPLAN 2008 conference on Programming language design and implementation
, 2008
Abstract

Cited by 72 (5 self)
We present the design and implementation of an automatic polyhedral source-to-source transformation framework that can optimize regular programs (sequences of possibly imperfectly nested loops) for parallelism and locality simultaneously. Through this work, we show the practicality of analytical model-driven automatic transformation in the polyhedral model. Unlike previous polyhedral frameworks, our approach is an end-to-end, fully automatic one driven by an integer linear optimization framework that takes an explicit view of finding good ways of tiling for parallelism and locality using affine transformations. The framework has been implemented into a tool to automatically generate OpenMP parallel code from C program sections. Experimental results from the tool show very high performance for local and parallel execution on multicores, when compared with state-of-the-art compiler frameworks from the research community as well as the best native production compilers. The system also enables the easy use of powerful empirical/iterative optimization for general, arbitrarily nested loop sequences.
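A minimal sketch of the central transformation such frameworks search for, tiling: the tiled nest below enumerates exactly the same iterations as the original 2-D loop, just in a cache-friendly block order (illustrative only, not the tool's output).

```python
# Sketch: rectangular tiling of an n x n iteration space with tile size b.
# The tiled nest visits the same (i, j) points as the original loop, grouped
# into b x b blocks for locality; parallelism can then run over tiles.

def tiled_points(n, b):
    pts = []
    for ti in range(0, n, b):                      # tile loops
        for tj in range(0, n, b):
            for i in range(ti, min(ti + b, n)):    # intra-tile loops
                for j in range(tj, min(tj + b, n)):
                    pts.append((i, j))
    return pts

orig = [(i, j) for i in range(7) for j in range(7)]
print(sorted(tiled_points(7, 3)) == orig)  # True: same iterations, new order
```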
SUIF Explorer: an interactive and interprocedural parallelizer
, 1999
Abstract

Cited by 66 (5 self)
The SUIF Explorer is an interactive parallelization tool that is more effective than previous systems in minimizing the number of lines of code that require programmer assistance. First, the interprocedural analyses in the SUIF system are successful in parallelizing many coarse-grain loops, thus minimizing the number of spurious dependences requiring attention. Second, the system uses dynamic execution analyzers to identify those important loops that are likely to be parallelizable. Third, the SUIF Explorer is the first to apply program slicing to aid programmers in interactive parallelization. The system guides the programmer in the parallelization process using a set of sophisticated visualization techniques. This paper demonstrates the effectiveness of the SUIF Explorer with three case studies. The programmer was able to speed up all three programs by examining only a small fraction of each program and privatizing a few variables.
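A hypothetical example of the privatization question such a tool surfaces: a scalar temporary written and read in every iteration looks like a loop-carried dependence, but carries no value across iterations and is therefore privatizable.

```python
# Sketch: the scalar t induces a spurious dependence on the loop below.
# Since t never carries a value from one iteration to the next, giving each
# iteration its own private copy makes the loop parallel, which a value-based
# analysis or the programmer can confirm.

def loop_with_temp(a):
    out = [0] * len(a)
    for i in range(len(a)):
        t = a[i] * 2        # t is redefined before every use...
        out[i] = t + 1      # ...so it is privatizable; iterations are independent
    return out

print(loop_with_temp([1, 2, 3]))  # [3, 5, 7]
```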
Semi-automatic composition of loop transformations for deep parallelism and memory hierarchies
 Intl J. of Parallel Programming
, 2006
Abstract

Cited by 56 (20 self)
Modern compilers are responsible for translating the idealistic operational semantics of the source program into a form that makes efficient use of a highly complex heterogeneous machine. Since optimization problems are associated with huge and unstructured search spaces, this combinatorial task is poorly achieved in general, resulting in weak scalability and disappointing sustained performance. We address this challenge by working on the program representation itself, using a semi-automatic optimization approach to demonstrate that current compilers often suffer from unnecessary constraints and intricacies that can be avoided in a semantically richer transformation framework. Technically, the purpose of this paper is threefold: (1) to show that syntactic code representations close to the operational semantics lead to rigid phase ordering and cumbersome expression of architecture-aware loop transformations, (2) to illustrate how complex transformation sequences may be needed to achieve significant performance benefits, (3) to facilitate the automatic search for program transformation sequences, improving on classical polyhedral representations to better support operations research strategies in a simpler, structured search space. The proposed framework relies on a unified polyhedral representation of loops and statements, using normalization rules to allow flexible and expressive transformation sequencing. This representation allows us to extend the scalability of polyhedral dependence analysis and to delay the (automatic) legality checks until the end of a transformation sequence. Our work builds on algorithmic advances in polyhedral code generation and has been implemented in a modern research compiler.
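A minimal sketch of why a unified representation simplifies sequencing: affine loop transformations compose by matrix multiplication, so a whole sequence collapses into one mapping whose legality can be checked once at the end (the 2-D example is hypothetical).

```python
# Sketch: two classical unimodular loop transformations on a 2-D nest,
# interchange (i, j) -> (j, i) and skewing (i, j) -> (i, i + j), composed
# into a single affine mapping by multiplying their matrices.

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

interchange = [[0, 1], [1, 0]]   # (i, j) -> (j, i)
skew        = [[1, 0], [1, 1]]   # (i, j) -> (i, i + j)

combined = matmul(skew, interchange)
print(combined)  # [[0, 1], [1, 1]]: (i, j) -> (j, i + j) in one step
```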
Toward Automatic Distribution
 Parallel Processing Letters
, 1994
Abstract

Cited by 53 (3 self)
This paper considers the problem of distributing data and code among the processors of a distributed-memory supercomputer. Provided that the source program is amenable to detailed dataflow analysis, one may determine a placement function by an algorithm analogous to Gaussian elimination. Such a function completely characterizes the distribution by giving the identity of the virtual processor on which each elementary calculation is executed. One then has to "realize" the virtual processors on the PEs. The resulting structure satisfies the "owner-computes" rule and is reminiscent of two-level distribution schemes, like HPF's ALIGN and DISTRIBUTE directives, or the CM-2 virtual processor system.
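A toy sketch (illustrative, not the paper's algorithm) of owner-computes execution under a block "DISTRIBUTE": a placement function assigns each array cell to a processor, and each processor performs exactly the computations writing the cells it owns.

```python
# Sketch: block distribution of an n-element array over p processors; the
# owner-computes rule then partitions the loop iterations among processors.

def owner(i, n, p):
    """Processor owning a[i] under a block distribution (HPF-style)."""
    block = (n + p - 1) // p      # ceiling division: cells per processor
    return i // block

n, p = 10, 4
work = {}
for i in range(n):                # each processor executes its own iterations
    work.setdefault(owner(i, n, p), []).append(i)
print(work)  # {0: [0, 1, 2], 1: [3, 4, 5], 2: [6, 7, 8], 3: [9]}
```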
Automatic Parallelization in the Polytope Model
 Laboratoire PRiSM, Université de Versailles St-Quentin-en-Yvelines, 45, avenue des États-Unis, F-78035 Versailles Cedex
, 1996
Abstract

Cited by 52 (3 self)
The aim of this paper is to explain the importance of polytopes and polyhedra in automatic parallelization. We show that the semantics of parallel programs is best described geometrically, as properties of sets of integral points in n-dimensional spaces, where n is related to the maximum nesting depth of DO loops. The needed properties translate nicely to properties of polyhedra, for which many algorithms have been designed for the needs of optimization and operations research. We show how these ideas apply to scheduling, placement and parallel code generation.
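A minimal sketch of the geometric view itself: a triangular DO-loop nest is exactly the set of integral points of a polytope, and operations on that polytope express scheduling and placement (the example polytope is illustrative).

```python
# Sketch: the polytope {(i, j) : 0 <= j <= i < n} is the iteration domain of
# a triangular 2-D loop nest; enumerating its integral points recovers the
# nest, and geometric transformations of the polytope transform the loops.

def triangle_points(n):
    return [(i, j) for i in range(n) for j in range(i + 1)]

pts = triangle_points(4)
print(len(pts))  # 10 = n(n+1)/2 integral points for n = 4
```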