Results 1-10 of 16
Beyond Induction Variables
, 1992
Abstract

Cited by 90 (6 self)
Induction variable detection is usually closely tied to the strength reduction optimization. This paper studies induction variable analysis from a different perspective, that of finding induction variables for data dependence analysis. While classical induction variable analysis techniques have been used successfully up to now, we have found a simple algorithm based on the Static Single Assignment form of a program that finds all linear induction variables in a loop. Moreover, this algorithm is easily extended to find induction variables in multiple nested loops, to find nonlinear induction variables, and to classify other integer scalar assignments in loops, such as monotonic, periodic and wraparound variables. Some of these other variables are now classified using ad hoc pattern recognition, while others are not analyzed by current compilers. Giving a unified approach improves the speed of compilers and allows a more general classification scheme. We also show how to use these va...
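A minimal sketch of the kind of linear induction-variable classification the abstract describes, over a toy tuple-based IR. The statement format, function name, and rejection rules are assumptions for illustration, not the paper's actual SSA-based algorithm:

```python
# Toy IR: each loop-body statement is ('assign', target, expr), where expr
# is either ('add', operand, const) or some other tuple. A variable updated
# exactly once as x = x + c is a linear induction variable with stride c.

def find_linear_induction_vars(loop_body):
    """Return {var: stride} for variables updated as var = var + const."""
    candidates = {}
    for op, target, expr in loop_body:
        if op != 'assign':
            continue
        if expr[0] == 'add' and expr[1] == target and isinstance(expr[2], int):
            if target in candidates:
                candidates[target] = None  # multiple updates: reject
            else:
                candidates[target] = expr[2]  # stride c in x = x + c
        else:
            candidates[target] = None  # non-linear update: reject
    return {v: s for v, s in candidates.items() if s is not None}

body = [('assign', 'i', ('add', 'i', 1)),   # i = i + 1: stride 1
        ('assign', 'j', ('add', 'j', 4)),   # j = j + 4: stride 4
        ('assign', 'k', ('mul', 'k', 2))]   # k = k * 2: not linear
print(find_linear_induction_vars(body))  # {'i': 1, 'j': 4}
```

On SSA form, the same classification reduces to inspecting each phi-node's update cycle, which is what lets the paper's approach generalize to nested loops and nonlinear variables.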
Data Parallel Haskell: a status report
, 2007
Abstract

Cited by 78 (18 self)
We describe the design and current status of our effort to implement the programming model of nested data parallelism into the Glasgow Haskell Compiler. We extended the original programming model and its implementation, both of which were first popularised by the NESL language, in terms of expressiveness as well as efficiency. Our current aim is to provide a convenient programming environment for SMP parallelism, and especially multicore architectures. Preliminary benchmarks show that we are, at least for some programs, able to achieve good absolute performance and excellent speedups.
Non-Singular Data Transformations: Definition, Validity and Applications
 In Proc. 6th Workshop on Compilers for Parallel Computers
, 1997
Abstract

Cited by 48 (5 self)
This paper describes a unifying framework for non-singular data transformations. It shows that a wide class of existing transformations may be expressed in this framework, allowing compound transformations to be performed in one step. Validity conditions for such transformations are developed, as is the form of the transformed program and data. Constructive algorithms to generate data transformations for different applications are described and applied to example programs. It is shown that they can have a significant impact on program performance and may be used in situations where traditional loop transformations are inappropriate.

1 Introduction

Recent years have seen a great improvement in loop transformation theory. By using an affine representation of loops, several loop transformations have been incorporated into one single framework [18]. In [2], Banerjee shows that loop interchange, reversal and skewing can be described as unimodular transformations of the iteration space. In ...
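The unimodular view of loop transformations that the abstract's introduction cites can be sketched concretely: interchange, reversal, and skewing are 2x2 integer matrices with determinant +-1, and a compound transformation is just their product applied to each iteration point in one step (the matrix names and helper functions below are illustrative):

```python
# 2x2 integer matrix utilities for transforming iteration vectors (i, j).

def matmul(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def apply(A, v):
    """Apply 2x2 matrix A to iteration vector v = (i, j)."""
    return [A[0][0] * v[0] + A[0][1] * v[1],
            A[1][0] * v[0] + A[1][1] * v[1]]

interchange = [[0, 1], [1, 0]]    # (i, j) -> (j, i)
reversal    = [[-1, 0], [0, 1]]   # (i, j) -> (-i, j)
skew        = [[1, 0], [1, 1]]    # (i, j) -> (i, i + j)

# Interchange followed by skewing, performed as a single compound matrix.
compound = matmul(skew, interchange)
print(apply(compound, [2, 5]))  # [5, 7]
```

Non-singular transformations generalize this picture by requiring only a nonzero determinant (admitting, e.g., scaling), which is what widens the class beyond the unimodular case.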
Data Alignment: Transformations to Reduce Communication on Distributed Memory Architectures
 In Proceedings of the Scalable High Performance Computing Conference, IEEE
, 1992
Abstract

Cited by 19 (6 self)
The relative storage, or alignment, of array data in distributed memory critically determines the amount of communication overhead. This paper expresses data alignment in a linear algebraic framework. Aligned data can be viewed as forming a hyperplane in the iteration space. This allows the quantification of data alignment and the determination of the existence of transformations to reduce nonlocal access. This has led to a new alignment transformation which is applicable to a wider class of problems than existing techniques. The global impact of such transformations is discussed, as is the effect of alignment on partitioning.

1 Introduction

Compilation should minimise parallel time by utilising machine parallelism and reducing overhead. The first stage of compilation is therefore to identify and match program parallelism to machine parallelism. We define machine parallelism simply as the number of processors p. It is necessary to identify and divide a nested loop array computation ...
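A toy model of why alignment determines communication, under assumed block row partitions (the counting function and sizes are illustrative, not the paper's framework): a statement a[i] = b[i + offset] is fully local when a and b are aligned identically (offset 0), while a misalignment forces a nonlocal fetch at each internal block boundary.

```python
# Count nonlocal accesses for a[i] = b[i + offset] when both arrays use
# the same block partition of n elements over p processors.

def nonlocal_accesses(n, p, offset):
    """Accesses where b's element lives on a different processor than a's."""
    block = n // p                 # elements per processor (assume p | n)
    owner = lambda i: i // block   # block-partition owner of element i
    return sum(1 for i in range(n - offset) if owner(i) != owner(i + offset))

n, p = 12, 4
print(nonlocal_accesses(n, p, 0))  # 0: perfectly aligned, no communication
print(nonlocal_accesses(n, p, 1))  # 3: one fetch per internal block boundary
```

An alignment transformation that shifts b's storage by the offset restores the zero-communication case, which is the effect the abstract quantifies in its linear algebraic framework.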
Linear Loop Transformations in Optimizing Compilers for Parallel Machines
, 1995
Abstract

Cited by 10 (0 self)
We present the linear loop transformation framework which is the formal basis for state-of-the-art optimization techniques in restructuring compilers for parallel machines. The framework unifies most existing transformations and provides a systematic set of code generation techniques for arbitrary compound loop transformations. The algebraic representation of the loop structure and its transformation give way to quantitative techniques for optimizing performance on parallel machines. We discuss in detail the techniques for generating the transformed loop and deriving the desired linear transformation.

Key Words: Dependence Analysis, Iteration Spaces, Parallelism, Locality, Load Balance, Conventional Loop Transformations, Linear Loop Transformations

Corresponding author: Parallel Systems Group, Department of Computer Science, University of Toronto, 10 King's College Road, Toronto, ON M5S 1A4, Canada. Email: kulki@cs.toronto.edu
Transformations for Imperfectly Nested Loops
 In Supercomputing
, 1996
Abstract

Cited by 10 (1 self)
Loop transformations are critical for compiling high-performance code for modern computers. Existing work has focused on transformations for perfectly nested loops (that is, loops in which all assignment statements are contained within the innermost loop of a loop nest). In practice, most loop nests, such as those in matrix factorization codes, are imperfectly nested. In some programs, imperfectly nested loops can be converted into perfectly nested loops by loop distribution, but this is not always legal. In this paper, we present an approach to transforming imperfectly nested loops directly. Our approach is an extension of the linear loop transformation framework for perfectly nested loops, and it models permutation, reversal, skewing, scaling, alignment, distribution and jamming.

1 Introduction

Modern compilers perform a variety of loop transformations, like permutation, skewing, reversal, scaling, distribution and jamming, to generate high quality code for high-performance computer...
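The loop distribution mentioned in the abstract can be illustrated with a toy imperfect nest: a statement sitting between the two loop headers is split out into its own loop, leaving a perfect nest behind. All array names and sizes below are illustrative; distribution is legal here because the split-out statement does not depend on the inner loop's results.

```python
n = 4

# Imperfect nest: row_sum[i] = i sits at depth 1, between the loop headers.
a = [[0] * n for _ in range(n)]
row_sum = [0] * n
for i in range(n):
    row_sum[i] = i             # statement at nesting depth 1
    for j in range(n):
        a[i][j] = i * n + j    # statement at nesting depth 2

# After distribution: two loops, the second of which is perfectly nested
# and therefore amenable to the linear transformation framework.
b = [[0] * n for _ in range(n)]
row_sum2 = [0] * n
for i in range(n):
    row_sum2[i] = i
for i in range(n):
    for j in range(n):
        b[i][j] = i * n + j

print(a == b and row_sum == row_sum2)  # True: same results either way
```

When the depth-1 statement instead depends on values the inner loop computes (as in matrix factorization), this split changes the results, which is exactly the case where the paper's direct treatment of imperfect nests is needed.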
Automatic Generation Of Data-Flow Analyzers: A Tool For Building Optimizers
, 1993
Abstract

Cited by 8 (0 self)
Modern compilers generate good code by performing global optimizations. Unlike other functions of the compiler such as parsing and code generation, which examine only one statement or one basic block at a time, optimizers examine large parts of a program and coordinate changes in widely separated parts of a program. Thus optimizers use more complex data structures and consume more time. To generate the best code, optimizers perform not one global transformation, but many in concert. These transformations can interact in unforeseen ways. This dissertation concerns the building of optimizers that are modular and extensible. It espouses an optimizer architecture, first proposed by Kildall, in which each phase is based on a dataflow analysis (DFA) of the program and on an optimization function that transforms the program. To support the architecture, a set of abstractions (flow values, flow functions, path simplification rules, action routines) is provided. A tool called Sharlit turns a DFA specification consisting of these abstractions into a solver for a DFA problem. At the heart of Sharlit is an algorithm called path simplification, an extension of Tarjan's fast path algorithm. Path simplification unifies several powerful DFA solution techniques. By using path simplification rules, compiler writers can construct a wide range of dataflow analyzers, from simple iterative ones, to solvers that use local analysis, interval analysis, or sparse dataflow evaluation. Sharlit frees compiler writers from the details of how these various solution techniques are implemented. The compiler writer can view the program representation as a simple flow graph in which each instruction is a node. Data structures to represent basic blocks and other regions are automatically generated. Sharlit promotes ...
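The simplest of the solution techniques the abstract mentions, the iterative worklist solver, can be sketched generically: a forward may-analysis (here, reaching definitions) parameterized by gen and kill sets over a small flow graph. The function name, graph encoding, and example are illustrative, not Sharlit's actual specification language:

```python
# Iterative forward may-analysis: OUT[n] = gen[n] | (IN[n] - kill[n]),
# with IN[n] the union of OUT over n's predecessors, iterated to a fixpoint.

def solve(succ, nodes, gen, kill):
    """Return OUT sets for a forward dataflow problem on a flow graph."""
    out_sets = {n: set() for n in nodes}
    work = list(nodes)
    while work:
        n = work.pop()
        preds = [p for p in nodes if n in succ.get(p, ())]
        in_set = set().union(*(out_sets[p] for p in preds)) if preds else set()
        new_out = gen[n] | (in_set - kill[n])
        if new_out != out_sets[n]:
            out_sets[n] = new_out
            work.extend(succ.get(n, ()))  # re-examine affected successors
    return out_sets

# d1: x = 1 in n1; d2: x = 2 in n2; straight-line graph n1 -> n2 -> n3.
succ = {'n1': ['n2'], 'n2': ['n3']}
gen  = {'n1': {'d1'}, 'n2': {'d2'}, 'n3': set()}
kill = {'n1': {'d2'}, 'n2': {'d1'}, 'n3': set()}
out = solve(succ, ['n1', 'n2', 'n3'], gen, kill)
print(out['n3'])  # {'d2'}: d1 is killed by the redefinition in n2
```

Sharlit's contribution is replacing this naive fixpoint loop with path simplification, which derives faster solvers (interval analysis, sparse evaluation) from the same gen/kill-style specification.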
Computational Alignment: A New, Unified Program Transformation for Local and Global Optimization
, 1994
Abstract

Cited by 7 (6 self)
Computational Alignment is a new class of program transformations suitable for both local and global optimization. Computational Alignment transforms all of the computations of a portion of the loop body in order to align them to other computations either in the same loop or in another loop. It extends along a new dimension and is significantly more powerful than linear transformations because i) it can transform subsets of dependences and references; ii) it is sensitive to the location of data in that it can move the computation relative to data; iii) it applies to imperfect loop nests; and iv) it is the first loop transformation that can change access vectors. Linear transformations are just a special case of Computational Alignment. Computational Alignment is highly suitable for global optimization because it can transform given loops to access data in similar ways. Two important subclasses of Computational Alignment are presented as well, namely, Freeing and Isomerizing Computatio...
Efficient Polynomial-Time Nested Loop Fusion with Full Parallelism
, 1999
Abstract

Cited by 6 (4 self)
Data locality and synchronization overhead are two important factors that affect the performance of applications on multiprocessors. Loop fusion is an effective way to reduce synchronization and improve data locality. Traditional fusion techniques, however, either cannot address the case when fusion-preventing dependencies exist in nested loops, or cannot achieve good parallelism after fusion. This paper makes a significant addition to current loop fusion techniques by presenting several efficient polynomial-time algorithms to solve these problems. These algorithms, based on multidimensional retiming, allow nested loop fusion even in the presence of outermost loop-carried dependencies or fusion-preventing dependencies. The multiple loops are modeled by a multidimensional loop dependence graph. The algorithms are applied to such a graph in order to perform the fusion and to obtain full parallelism in the innermost loop.
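Basic loop fusion, the starting point the abstract's retiming-based algorithms generalize, can be shown on two adjacent loops over the same range (array names and sizes are illustrative): fusing them touches each element of a once while it is still hot in cache, instead of traversing a twice.

```python
n = 8

# Before fusion: two separate traversals of the index range.
a = [0.0] * n
b = [0.0] * n
for i in range(n):
    a[i] = i * 2.0
for i in range(n):
    b[i] = a[i] + 1.0

# After fusion: one traversal computes both. Legal here because iteration i
# of the second loop reads only what iteration i of the first loop wrote,
# so no fusion-preventing dependence arises.
a2 = [0.0] * n
b2 = [0.0] * n
for i in range(n):
    a2[i] = i * 2.0
    b2[i] = a2[i] + 1.0

print(a == a2 and b == b2)  # True: fusion preserves the results
```

When the second loop instead reads a[i + 1], naive fusion is illegal; shifting one loop's iterations (the effect of the paper's multidimensional retiming) realigns the dependence so fusion becomes valid.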
A Data Partitioning Algorithm for Distributed Memory Compilation
, 1993
Abstract

Cited by 4 (2 self)
This paper proposes a compiler strategy for mapping FORTRAN programs onto distributed memory computers. Once the available parallelism has been identified, it has to be scheduled so as to minimise overhead. The minimisation of different costs will suggest different data and computation partitions. This is further complicated, as the effectiveness of the partition will depend on later compiler optimisations. For this reason, partitioning is at the crux of compilation and has led several researchers to propose that it be left to the user. It is our contention that it is possible to automatically determine a good data partition and furthermore to make use of the analysis in later optimisation stages. This paper describes an automatic data partition algorithm which is based on four different analysis techniques. By determining the relative merit of each form of analysis, a data partitioning decision is made. By integrating this algorithm into an overall compilation strategy, effectiv...
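The simplest data partition such an algorithm might select, a block distribution of array rows over processors, can be sketched as an ownership function (the function name and sizes are illustrative, not the paper's algorithm):

```python
# Block partition: n rows are split into p contiguous blocks, one per
# processor; the owner-computes rule then assigns each iteration to the
# processor owning the data it writes.

def block_owner(i, n, p):
    """Processor owning row i under a block partition of n rows over p."""
    block = (n + p - 1) // p  # ceiling division: rows per processor
    return i // block

n, p = 12, 4
owners = [block_owner(i, n, p) for i in range(n)]
print(owners)  # [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]
```

The paper's contribution is choosing among such candidate partitions (block, cyclic, higher-dimensional) by weighing the four analysis techniques against the communication each choice induces.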