Results 1 - 10
of
19
Maximizing Loop Parallelism and Improving Data Locality via Loop Fusion and Distribution
- IN LANGUAGES AND COMPILERS FOR PARALLEL COMPUTING
, 1994
"... Loop fusion is a program transformation that merges multiple loops into one. It is effective for reducing the synchronization overhead of parallel loops and for improving data locality. This paper presents three results for fusion: (1) a new algorithm for fusing a collection of parallel and seq ..."
Abstract
-
Cited by 110 (10 self)
- Add to MetaCart
Loop fusion is a program transformation that merges multiple loops into one. It is effective for reducing the synchronization overhead of parallel loops and for improving data locality. This paper presents three results for fusion: (1) a new algorithm for fusing a collection of parallel and sequential loops, minimizing parallel loop synchronization while maximizing parallelism; (2) a proof that performing fusion to maximize data locality is NP-hard; and (3) two polynomial-time algorithms for improving data locality. These techniques also apply to loop distribution, which is shown to be essentially equivalent to loop fusion. Our approach is general enough to support other fusion heuristics. Preliminary experimental results validate our approach for improving performance by exploiting data locality and increasing the granularity of parallelism.
Automatic Program Parallelization
, 1993
"... This paper presents an overview of automatic program parallelization techniques. It covers dependence analysis techniques, followed by a discussion of program transformations, including straight-line code parallelization, do loop transformations, and parallelization of recursive routines. The last s ..."
Abstract
-
Cited by 97 (8 self)
- Add to MetaCart
This paper presents an overview of automatic program parallelization techniques. It covers dependence analysis techniques, followed by a discussion of program transformations, including straight-line code parallelization, do loop transformations, and parallelization of recursive routines. The last section of the paper surveys several experimental studies on the effectiveness of parallelizing compilers.
Symbolic Analysis for Parallelizing Compilers
, 1994
"... Symbolic Domain The objects in our abstract symbolic domain are canonical symbolic expressions. A canonical symbolic expression is a lexicographically ordered sequence of symbolic terms. Each symbolic term is in turn a pair of an integer coefficient and a sequence of pairs of pointers to program va ..."
Abstract
-
Cited by 95 (4 self)
- Add to MetaCart
Symbolic Domain The objects in our abstract symbolic domain are canonical symbolic expressions. A canonical symbolic expression is a lexicographically ordered sequence of symbolic terms. Each symbolic term is in turn a pair of an integer coefficient and a sequence of pairs of pointers to program variables in the program symbol table and their exponents. The latter sequence is also lexicographically ordered. For example, the abstract value of the symbolic expression 2ij+3jk in an environment that i is bound to (1; (( " i ; 1))), j is bound to (1; (( " j ; 1))), and k is bound to (1; (( " k ; 1))) is ((2; (( " i ; 1); ( " j ; 1))); (3; (( " j ; 1); ( " k ; 1)))). In our framework, environment is the abstract analogous of state concept; an environment is a function from program variables to abstract symbolic values. Each environment e associates a canonical symbolic value e x for each variable x 2 V ; it is said that x is bound to e x. An environment might be represented by...
Reverse If-Conversion
- in Proceedings of the ACM SIGPLAN 1993 Conference on Programming Language Design and Implementation
, 1993
"... In this paper we present a set of isomorphic control transformations that allow the compiler to apply local scheduling techniques to acyclic subgraphs of the control flow graph. Thus, the code motion complexities of global scheduling are eliminated. This approach relies on a new technique, Reverse I ..."
Abstract
-
Cited by 61 (8 self)
- Add to MetaCart
In this paper we present a set of isomorphic control transformations that allow the compiler to apply local scheduling techniques to acyclic subgraphs of the control flow graph. Thus, the code motion complexities of global scheduling are eliminated. This approach relies on a new technique, Reverse If-Conversion (RIC), that transforms scheduled If-Converted code back to the control flow graph representation. This paper presents the predicate internal representation, the algorithms for RIC, and the correctness of RIC. In addition, the scheduling issues are addressed and an application to software pipelining is presented. 1 Introduction Compilers for processors with instruction level parallelism hardware need a large pool of operations to schedule from. In processors without support for conditional execution, branches present a scheduling barrier that limits the pool of operations to the basic block. Since basic blocks tend to have only a few operations, global scheduling techniques are ...
Interactive Parallel Programming Using the ParaScope Editor
- IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS
, 1991
"... The ParaScope project is developing an integrated collection of tools to help scientific programmers implement correct and efficient parallel programs. The centerpiece of this collection is the ParaScope Editor, an intelligent interactive editor for parallel Fortran programs. The ParaScope Editor re ..."
Abstract
-
Cited by 61 (12 self)
- Add to MetaCart
The ParaScope project is developing an integrated collection of tools to help scientific programmers implement correct and efficient parallel programs. The centerpiece of this collection is the ParaScope Editor, an intelligent interactive editor for parallel Fortran programs. The ParaScope Editor reveals to users potential hazards of a proposed parallelization in a program. It also provides a variety of powerful interactive program transformations that have been shown useful in converting programs to parallel form. In addition, the ParaScope Editor supports general user editing through a hybrid text and structure editing facility that incrementally analyzes the modified program for potential hazards. The ParaScope Editor is a new kind of program construction tool -- one that not only manages text, but also presents the user with information about the correctness of the parallel program under development. As such, it can support an exploratory programming style in which users get immediate feedback on their various strategies for parallelization.
Interprocedural Symbolic Analysis
, 1994
"... Compiling for efficient execution on advanced computer architectures requires extensive program analysis and transformation. Most compilers limit their analysis to simple phenomena within single procedures, limiting effective optimization of modular codes and making the programmer's job harder. We p ..."
Abstract
-
Cited by 48 (1 self)
- Add to MetaCart
Compiling for efficient execution on advanced computer architectures requires extensive program analysis and transformation. Most compilers limit their analysis to simple phenomena within single procedures, limiting effective optimization of modular codes and making the programmer's job harder. We present methods for analyzing array side effects and for comparing nonconstant values computed in the same and different procedures. Regular sections, described by rectangular bounds and stride, prove as effective in describing array side effects in Linpack as more complicated summary techniques. On a set of six programs, regular section analysis of array side effects gives 0 to 39 percent reductions in array dependences at call sites, with 10 to 25 percent increases in analysis time. Symbolic analysis is essential to data dependence testing, array section analysis, and other high-level program manipulations. We give methods for building symb...
Automatic and Interactive Parallelization
, 1994
"... The goal of this dissertation is to give programmers the ability to achieve high performance by focusing on developing parallel algorithms, rather than on architecturespecific details. The advantages of this approach also include program portability and legibility. To achieve high performance, we pr ..."
Abstract
-
Cited by 38 (8 self)
- Add to MetaCart
The goal of this dissertation is to give programmers the ability to achieve high performance by focusing on developing parallel algorithms, rather than on architecturespecific details. The advantages of this approach also include program portability and legibility. To achieve high performance, we provide automatic compilation techniques that tailor parallel algorithms to shared-memory multiprocessors with local caches and a common bus. In particular, the compiler maps complete applications onto the specifics of a machine, exploiting both parallelism and memory. To optimize complete applications, we develop novel, general algorithms to transform loops that contain arbitrary conditional control flow. In addition, we provide new interprocedural transformations which enable optimization across procedure boundaries. These techniques provide the basis for a robust automatic parallelizing algorithm that is applicable to complete programs. The algorithm for automatic parallel code generation t...
Loop Distribution with Arbitrary Control Flow
- In Proceedings of Supercomputing '90
, 1990
"... Loop distribution is an integral part of transforming a sequential program into a parallel one. It is used extensively in parallelization, vectorization, and memory management. For loops with control flow, previous methods for loop distribution have significant drawbacks. We present a new algorithm ..."
Abstract
-
Cited by 35 (5 self)
- Add to MetaCart
Loop distribution is an integral part of transforming a sequential program into a parallel one. It is used extensively in parallelization, vectorization, and memory management. For loops with control flow, previous methods for loop distribution have significant drawbacks. We present a new algorithm for loop distribution in the presence of control flow modeled by a control dependence graph. This algorithm is shown optimal in that it generates the minimum number of new arrays and tests possible. We also present a code generation algorithm that produces code for the resulting program without replicating statements or conditions. Although these algorithms are being developed for use in an interactive parallel programming environment for Fortran, they are very general and can be used in automatic parallelization and vectorization systems. Keywords: parallelization, vectorization, transformation, control dependence, data dependence, loop distribution 1 Introduction Loop distribution is a f...
Construction of Thinned Gated Single-Assignment Form
- In Proc. 6rd Workshop on Programming Languages and Compilers for Parallel Computing
, 1993
"... . Analysis of symbolic expressions benefits from a suitable program representation. We show how to build thinned gated singleassignment (TGSA) form, a value-oriented program representation which is more complete than standard SSA form, defined on all reducible programs, and better for representing s ..."
Abstract
-
Cited by 27 (1 self)
- Add to MetaCart
. Analysis of symbolic expressions benefits from a suitable program representation. We show how to build thinned gated singleassignment (TGSA) form, a value-oriented program representation which is more complete than standard SSA form, defined on all reducible programs, and better for representing symbolic expressions than program dependence graphs or original GSA form. We present practical algorithms for constructing thinned GSA form from the control dependence graph and SSA form. Extensive experiments on large Fortran programs show these methods to take linear time and space in practice. Our implementation of value numbering on TGSA form drives scalar symbolic analysis in the ParaScope programming environment. 1 Introduction Analysis of non-constant values yields significant benefits in a parallelizing compiler. The design of a symbolic analyzer requires careful choice of a program representation, that these benefits may be gained without using excessive time and space. In symbolic ...
Modulo Scheduling With Isomorphic Control Transformations
, 1994
"... ... over other software pipelining techniques based on global scheduling. The ICTs are applied to Modulo Scheduling to schedule loops with conditional branches. Experimental results show that this approach allows more flexible scheduling and thus better performance than Modulo Scheduling with Hierar ..."
Abstract
-
Cited by 24 (0 self)
- Add to MetaCart
... over other software pipelining techniques based on global scheduling. The ICTs are applied to Modulo Scheduling to schedule loops with conditional branches. Experimental results show that this approach allows more flexible scheduling and thus better performance than Modulo Scheduling with Hierarchical Reduction. Modulo Scheduling with ICTs targets processors with no or limited support for conditional execution such as superscalar processors. However, in processors that do not require instruction set compatibility, support for Predicated Execution can be used. This dissertation shows that Modulo Scheduling with Predicated Execution has better performance and lower code expansion than Modulo Scheduling with ICTs on processors without special hardware support.

