Results 1  10
of
133
Some efficient solutions to the affine scheduling problem  Part I Onedimensional Time
, 1996
"... Programs and systems of recurrence equations may be represented as sets of actions which are to be executed subject to precedence constraints. In many cases, actions may be labelled by integral vectors in some iteration domain, and precedence constraints may be described by affine relations. A s ..."
Abstract

Cited by 216 (18 self)
 Add to MetaCart
Programs and systems of recurrence equations may be represented as sets of actions which are to be executed subject to precedence constraints. In many cases, actions may be labelled by integral vectors in some iteration domain, and precedence constraints may be described by affine relations. A schedule for such a program is a function which assigns an execution date to each action. Knowledge of such a schedule allows one to estimate the intrinsic degree of parallelism of the program and to compile a parallel version for multiprocessor architectures or systolic arrays. This paper deals with the problem of finding closed form schedules as affine or piecewise affine functions of the iteration vector. An efficient algorithm is presented which reduces the scheduling problem to a parametric linear program of small size, which can be readily solved by an efficient algorithm.
Loop Parallelization in the Polytope Model
 CONCUR '93, Lecture Notes in Computer Science 715
, 1993
"... . During the course of the last decade, a mathematical model for the parallelization of FORloops has become increasingly popular. In this model, a (perfect) nest of r FORloops is represented by a convex polytope in Z r . The boundaries of each loop specify the extent of the polytope in a dis ..."
Abstract

Cited by 96 (24 self)
 Add to MetaCart
. During the course of the last decade, a mathematical model for the parallelization of FORloops has become increasingly popular. In this model, a (perfect) nest of r FORloops is represented by a convex polytope in Z r . The boundaries of each loop specify the extent of the polytope in a distinct dimension. Various ways of slicing and segmenting the polytope yield a multitude of guaranteed correct mappings of the loops' operations in spacetime. These transformations have a very intuitive interpretation and can be easily quantified and automated due to their mathematical foundation in linear programming and linear algebra. With the recent availability of massively parallel computers, the idea of loop parallelization is gaining significance, since it promises execution speedups of orders of magnitude. The polytope model for loop parallelization has its origin in systolic design, but it applies in more general settings and methods based on it will become a part of futur...
Prefix Sums and Their Applications
"... Experienced algorithm designers rely heavily on a set of building blocks and on the tools needed to put the blocks together into an algorithm. The understanding of these basic blocks and tools is therefore critical to the understanding of algorithms. Many of the blocks and tools needed for parallel ..."
Abstract

Cited by 96 (2 self)
 Add to MetaCart
Experienced algorithm designers rely heavily on a set of building blocks and on the tools needed to put the blocks together into an algorithm. The understanding of these basic blocks and tools is therefore critical to the understanding of algorithms. Many of the blocks and tools needed for parallel
Beyond Induction Variables
, 1992
"... Induction variable detection is usually closely tied to the strength reduction optimization. This paper studies induction variable analysis from a different perspective, that of finding induction variables for data dependence analysis. While classical induction variable analysis techniques have been ..."
Abstract

Cited by 90 (6 self)
 Add to MetaCart
Induction variable detection is usually closely tied to the strength reduction optimization. This paper studies induction variable analysis from a different perspective, that of finding induction variables for data dependence analysis. While classical induction variable analysis techniques have been used successfully up to now, we have found a simple algorithm based on the the Static Single Assignment form of a program that finds all linear induction variables in a loop. Moreover, this algorithm is easily extended to find induction variables in multiple nested loops, to find nonlinear induction variables, and to classify other integer scalar assignments in loops, such as monotonic, periodic and wraparound variables. Some of these other variables are now classified using ad hoc pattern recognition, while others are not analyzed by current compilers. Giving a unified approach improves the speed of compilers and allows a more general classification scheme. We also show how to use these va...
Special Purpose Parallel Computing
 Lectures on Parallel Computation
, 1993
"... A vast amount of work has been done in recent years on the design, analysis, implementation and verification of special purpose parallel computing systems. This paper presents a survey of various aspects of this work. A long, but by no means complete, bibliography is given. 1. Introduction Turing ..."
Abstract

Cited by 77 (5 self)
 Add to MetaCart
A vast amount of work has been done in recent years on the design, analysis, implementation and verification of special purpose parallel computing systems. This paper presents a survey of various aspects of this work. A long, but by no means complete, bibliography is given. 1. Introduction Turing [365] demonstrated that, in principle, a single general purpose sequential machine could be designed which would be capable of efficiently performing any computation which could be performed by a special purpose sequential machine. The importance of this universality result for subsequent practical developments in computing cannot be overstated. It showed that, for a given computational problem, the additional efficiency advantages which could be gained by designing a special purpose sequential machine for that problem would not be great. Around 1944, von Neumann produced a proposal [66, 389] for a general purpose storedprogram sequential computer which captured the fundamental principles of...
Generation of Efficient Nested Loops from Polyhedra
 International Journal of Parallel Programming
, 2000
"... Automatic parallelization in the polyhedral model is based on affine transformations from an original computation domain (iteration space) to a target spacetime domain, often with a different transformation for each variable. Code generation is an often ignored step in this process that has a signi ..."
Abstract

Cited by 72 (3 self)
 Add to MetaCart
Automatic parallelization in the polyhedral model is based on affine transformations from an original computation domain (iteration space) to a target spacetime domain, often with a different transformation for each variable. Code generation is an often ignored step in this process that has a significant impact on the quality of the final code. It involves making a tradeoff between code size and control code simplification/optimization. Previous methods of doing code generation are based on loop splitting, however they have nonoptimal behavior when working on parameterized programs. We present a general parameterized method for code generation based on dual representation of polyhedra. Our algorithm uses a simple recursion on the dimensions of the domains, and enables fine control over the tradeoff between code size and control overhead.
The Mapping of Linear Recurrence Equations on Regular Arrays
 Journal of VLSI Signal Processing
, 1989
"... The parallelization of many algorithms can be obtained using spacetime transformations which are applied on nested doloops or on recurrence equations. In this paper, we analyze systems of linear recurrence equations, a generalization of uniform recurrence equations. The first part of the paper des ..."
Abstract

Cited by 66 (7 self)
 Add to MetaCart
The parallelization of many algorithms can be obtained using spacetime transformations which are applied on nested doloops or on recurrence equations. In this paper, we analyze systems of linear recurrence equations, a generalization of uniform recurrence equations. The first part of the paper describes a method for finding automatically whether such a system can be scheduled by an affine timing function, independent of the size parameter of the algorithm. In the second part, we describe a powerful method that makes it possible to transform linear recurrences into uniform recurrence equations. Both parts rely on results on integral convex polyhedra. Our results are illustrated on the Gauss elimination algorithm and on the GaussJordan diagonalization algorithm. 1 Introduction Designing efficient algorithms for parallel architectures is one of the main difficulties of the current research in computer science. As the architecture of supercomputers evolves towards massive parallelism...
Automatic Parallelization in the Polytope Model
 Laboratoire PRiSM, Université des Versailles StQuentin en Yvelines, 45, avenue des ÉtatsUnis, F78035 Versailles Cedex
, 1996
"... . The aim of this paper is to explain the importance of polytope and polyhedra in automatic parallelization. We show that the semantics of parallel programs is best described geometrically, as properties of sets of integral points in ndimensional spaces, where n is related to the maximum nesting de ..."
Abstract

Cited by 48 (3 self)
 Add to MetaCart
. The aim of this paper is to explain the importance of polytope and polyhedra in automatic parallelization. We show that the semantics of parallel programs is best described geometrically, as properties of sets of integral points in ndimensional spaces, where n is related to the maximum nesting depth of DO loops. The needed properties translate nicely to properties of polyhedra, for which many algorithms have been designed for the needs of optimization and operation research. We show how these ideas apply to scheduling, placement and parallel code generation. R'esum'e Le but de cet article est d'expliquer le role jou'e par les poly`edres et les polytopes en parall'elisation automatique. On montre que la s'emantique d'un programme parall`ele se d'ecrit naturellement sous forme g'eom'etrique, les propri'et'es du programme 'etant formalis'ees comme des propri'et'es d'ensemble de points dans un espace `a n dimensions. n est li'e `a la profondeur maximale d'imbrication des boucles DO. Les...
Combining Retiming and Scheduling Techniques for Loop Parallelization and Loop Tiling
, 1996
"... Tiling is a technique used for exploiting mediumgrain parallelism in nested loops. It relies on a first step that detects sets of permutable nested loops. All algorithms developed so far consider the statements of the loop body as a single block, in other words, they are not able to take advantag ..."
Abstract

Cited by 37 (10 self)
 Add to MetaCart
Tiling is a technique used for exploiting mediumgrain parallelism in nested loops. It relies on a first step that detects sets of permutable nested loops. All algorithms developed so far consider the statements of the loop body as a single block, in other words, they are not able to take advantage of the structure of dependences between different statements. In this report, we overcome this limitation by showing how the structure of the reduced dependence graph can be taken into account for detecting more permutable loops. Our method combines graph retiming techniques and graph scheduling techniques. It can be viewed as an extension of Wolf and Lam's algorithm to the case of loops with multiple statements. Loop independent dependences play a particular role in our study, and we show how the way we handle them can be useful for finegrain loop parallelization as well.
Program optimization and parallelization using idioms
 In Proceedings of the Eighteenth Annual ACM Symposium on the Principles of Programming Languages
, 1991
"... Programs in languages such as Fortran, Pascal, and C were designed and written for a sequential machine model. During the last decade, several methods to vectorize such programs and recover other forms of parallelism that apply to more advanced machine architectures have been developed (particularly ..."
Abstract

Cited by 36 (0 self)
 Add to MetaCart
Programs in languages such as Fortran, Pascal, and C were designed and written for a sequential machine model. During the last decade, several methods to vectorize such programs and recover other forms of parallelism that apply to more advanced machine architectures have been developed (particularly for Fortran, due to its pointerfree semantics). We propose and demonstrate a more powerful translation technique for making such programs run efficiently on parallel machines which support facilities such as parallel prefix operations as well as parallel and vector capabilities. This technique, which is global in nature and involves a modification of the traditional definition of the program dependence graph (PDG), is based on the extraction of parallelizable program structures (“idioms”) from the given (sequential) program. The benefits of our technique extend beyond the abovementioned architectures and can be viewed as a general program optimization method, applicable in many other situations. We show a few examples in which our method indeed outperforms existing analysis techniques.