Symbolic Analysis for Parallelizing Compilers
, 1994
Cited by 95 (4 self)
Symbolic Domain. The objects in our abstract symbolic domain are canonical symbolic expressions. A canonical symbolic expression is a lexicographically ordered sequence of symbolic terms. Each symbolic term is in turn a pair of an integer coefficient and a sequence of pairs of pointers to program variables in the program symbol table and their exponents. The latter sequence is also lexicographically ordered. For example, the abstract value of the symbolic expression $2ij + 3jk$ in an environment in which $i$ is bound to $(1, ((\uparrow i, 1)))$, $j$ is bound to $(1, ((\uparrow j, 1)))$, and $k$ is bound to $(1, ((\uparrow k, 1)))$ is $((2, ((\uparrow i, 1), (\uparrow j, 1))), (3, ((\uparrow j, 1), (\uparrow k, 1))))$. In our framework, an environment is the abstract analogue of the concept of state: an environment is a function from program variables to abstract symbolic values. Each environment $e$ associates a canonical symbolic value $e_x$ with each variable $x \in V$; $x$ is said to be bound to $e_x$. An environment might be represented by...
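The canonical form described in this abstract can be illustrated with a small sketch. It is only an approximation under stated assumptions: plain variable-name strings stand in for symbol-table pointers, Python's tuple ordering stands in for the paper's lexicographic order, and the helper names (`term`, `canonical`, `add`, `scale`, `multiply`) are hypothetical, not taken from the paper.

```python
from collections import defaultdict

def term(coeff, *factors):
    """One symbolic term: (coefficient, sorted tuple of (variable, exponent))."""
    return (coeff, tuple(sorted(factors)))

def canonical(terms):
    """Merge like terms, drop zero terms, and order terms by their factor lists."""
    merged = defaultdict(int)
    for coeff, factors in terms:
        merged[factors] += coeff
    return tuple((c, f) for f, c in sorted(merged.items()) if c != 0)

def add(a, b):
    """Sum of two canonical expressions."""
    return canonical(a + b)

def scale(k, a):
    """Multiply a canonical expression by the integer k."""
    return canonical(tuple((k * c, f) for c, f in a))

def multiply(a, b):
    """Product of two canonical expressions (exponents of equal variables add)."""
    out = []
    for ca, fa in a:
        for cb, fb in b:
            exps = defaultdict(int)
            for var, exp in fa + fb:
                exps[var] += exp
            out.append((ca * cb, tuple(sorted(exps.items()))))
    return canonical(out)

# Environment: each variable is bound to the expression that denotes itself.
# Strings replace symbol-table pointers here (an assumption of this sketch).
env = {name: canonical([term(1, (name, 1))]) for name in "ijk"}

# Abstract value of 2ij + 3jk in that environment.
expr = add(scale(2, multiply(env["i"], env["j"])),
           scale(3, multiply(env["j"], env["k"])))
```

Because both the terms and the factors inside each term are kept sorted, two expressions are equal exactly when their tuples are equal, which is the practical benefit of a canonical representation.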
Symbolic Analysis: A Basis for Parallelization, Optimization, and Scheduling of Programs
- In Proceedings of the Sixth Workshop on Languages and Compilers for Parallel Computing
, 1993
Cited by 35 (0 self)
This paper presents an abstract interpretation framework for parallelizing compilers. Within this framework, symbolic analysis is used to solve various flow analysis problems in a unified way. Symbolic analysis also serves as a basis for code generation optimizations and a tool for derivation of computation cost estimates. A loop scheduling strategy that utilizes symbolic timing information is also presented.
1 Introduction
Empirical results indicate that existing parallelizing compilers yield only insignificant improvements in the performance of many real application programs [9, 5]. The speedups obtained by manual transformation of these applications [9] show the potential for significantly advancing parallelizing compiler technology. The poor performance of current restructuring compilers can be attributed to two causes: imprecise analysis and inappropriate performance-wise transformations. The causes are not completely independent; namely, imprecise information results in inappropriate...
Symbolic Program Analysis and Optimization for Parallelizing Compilers
- Presented at the 5th Annual Workshop on Languages and Compilers for Parallel Computing
, 1992
Cited by 34 (3 self)
A program flow analysis framework is proposed for parallelizing compilers. Within this framework, symbolic analysis is used as an abstract interpretation technique to solve many of the flow analysis problems in a unified way. Some of these problems are constant propagation, global forward substitution, detection of loop invariant computations, and induction variable substitution. The solution space of the above problems is much larger than that handled by existing compiler technology. It covers many of the cases in benchmark codes that other parallelizing compilers cannot handle. Employing finite difference methods, the symbolic analyzer derives a functional representation of programs, which is used in dependence analysis. A systematic method for generalized strength reduction based on this representation is also presented. This results in an effective scheme for exploitation of parallelism and optimization of the code. Symbolic analysis also serves as a basis for other code generatio...
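To make the idea of induction variable substitution by finite differences concrete, here is a minimal sketch. It is not the analyzer described in the abstract. It uses the standard Newton forward-difference formula as a stand-in, and it assumes the variable is a polynomial in the iteration count whose degree is smaller than the number of sampled values. The names `forward_differences`, `closed_form`, and `run_loop` are hypothetical.

```python
from math import comb

def forward_differences(values):
    """Leading entry of each difference row: f(0), Δf(0), Δ²f(0), ..."""
    leading = []
    row = list(values)
    while row:
        leading.append(row[0])
        row = [b - a for a, b in zip(row, row[1:])]
    return leading

def closed_form(values):
    """Newton forward-difference form: f(n) = sum over k of Δ^k f(0) * C(n, k)."""
    diffs = forward_differences(values)
    while diffs and diffs[-1] == 0:  # drop vanishing higher-order differences
        diffs.pop()
    return lambda n: sum(d * comb(n, k) for k, d in enumerate(diffs))

def run_loop(n):
    """Reference loop whose variable k we want to express in closed form."""
    k = 0
    for i in range(1, n + 1):
        k += i  # k is a generalized induction variable
    return k

# Sample the first few trip counts, then recover k as a function of n.
samples = [run_loop(n) for n in range(5)]
k_of = closed_form(samples)
```

For this loop the nonzero leading differences are 1 and 1, so the recovered function is n + C(n, 2) = n(n + 1)/2. Once k is expressed this way, the loop-carried update on k disappears and each iteration can compute its own value independently.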
A Framework for Exploiting Task- and Data-Parallelism on Distributed Memory Multicomputers
- IEEE Transactions on Parallel and Distributed Systems
, 1997
Cited by 30 (0 self)
... offer significant advantages over shared memory multiprocessors in terms of cost and scalability. Unfortunately, the utilization of all the available computational power in these machines involves a tremendous programming effort on the part of users, which creates a need for sophisticated compiler and run-time support for distributed memory machines. In this paper, we explore a new compiler optimization for regular scientific applications: the simultaneous exploitation of task and data parallelism. Our optimization is implemented as part of the PARADIGM HPF compiler framework we have developed. The intuitive idea behind the optimization is the use of task parallelism to control the degree of data parallelism of individual tasks. The reason this provides increased performance is that data parallelism provides diminishing returns as the number of processors used is increased. By controlling the number of processors used for each data parallel task in an application and by concurrently executing these tasks, we make program execution more efficient and, therefore, faster. A practical implementation of a task and data parallel scheme of execution for an application on a distributed memory multicomputer also involves data redistribution. This data redistribution causes an overhead. However, as our experimental results show, this overhead is not a problem; execution of a program using task and data parallelism together can be significantly faster than its execution using data parallelism alone. This makes our proposed optimization practical and extremely useful.
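The diminishing-returns argument in this abstract can be checked with a toy cost model. Everything below is an assumption made for illustration: each task's time is modeled as a fixed serial part plus a perfectly divisible parallel part, data redistribution cost is ignored, and the function names and numbers are hypothetical rather than taken from PARADIGM.

```python
def task_time(serial, parallel, procs):
    """Time of one data-parallel task on `procs` processors (toy model)."""
    return serial + parallel / procs

def data_parallel_only(tasks, procs):
    """Run tasks one after another, each on all processors."""
    return sum(task_time(s, w, procs) for s, w in tasks)

def task_and_data_parallel(tasks, allocation):
    """Run independent tasks concurrently, each on its own processor subset."""
    return max(task_time(s, w, p) for (s, w), p in zip(tasks, allocation))

# Two independent tasks, each with 1 unit of serial work and 8 units of
# parallel work, on an 8-processor machine (illustrative numbers only).
tasks = [(1, 8), (1, 8)]
sequential = data_parallel_only(tasks, 8)               # 2.0 + 2.0
concurrent = task_and_data_parallel(tasks, [4, 4])      # max(3.0, 3.0)
```

Under this model the concurrent schedule finishes in 3.0 time units against 4.0 for data parallelism alone, because the serial parts overlap. A real comparison would also have to charge for the data redistribution that the abstract mentions.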
Hardware And Software For Functional And Fine Grain Parallelism
, 1993
Cited by 17 (1 self)
This thesis examines nonloop parallelism at both fine and coarse levels of granularity in numerical FORTRAN programs. Measurements of the extent of this functional parallelism in a number of FORTRAN codes are presented, as well as compiler and run-time algorithms designed to exploit it. Hardware and software embodiments of the dynamic scheduling algorithms are developed, along with the compiler optimizations necessary to make these practical. The impact of fine grain functional parallelism on instruction-level architecture is explored, and it is shown that dynamic instruction scheduling hardware based on the functional parallelism scheduling algorithms can yield a significant improvement over static scheduling on conventional RISC processors when the latency of memory accesses is highly variable. Measurements of the characteristics of a set of FORTRAN benchmark programs indicate that such a hardware realization is feasible in practice.
Microarchitecture Support for Dynamic Scheduling of Acyclic Task Graphs
- In 25th Annual International Symposium on Microarchitecture
, 1992
Cited by 16 (5 self)
It can be shown that any program can be broken into its loop structure, plus acyclic dependence graphs representing the body of each loop or subroutine. The parallelism inherent in these acyclic graphs augments the loop-level parallelism available in the program. This paper presents two algorithms for dynamic scheduling of such acyclic task graphs containing both data and control dependences, and describes a microarchitecture which implements these algorithms efficiently.
Keywords: functional parallelism, fine-grain parallelism, microarchitecture, dynamic scheduling, parallelizing compiler.
Footnote 1: This work was funded in part by NSF grant CCR 89-57310 PYI, DOE grant DE-FG0285ER25001, and a Shell Doctoral Fellowship (Carl Beckmann).
1. Introduction
Traditional approaches to parallel processing have focused largely on loop-level parallelism. Another source of parallelism in programs is non-loop, or functional, parallelism [Girk91]. While the amount o...
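As a rough illustration of what dynamic scheduling of an acyclic task graph involves, the sketch below uses a generic counter-based ready queue. It is not either of the two algorithms the paper presents, which this excerpt does not describe. It models data dependences only, and the function name `dynamic_schedule` and the sample graph are hypothetical.

```python
from collections import deque

def dynamic_schedule(deps):
    """Return an execution order for an acyclic task graph.

    `deps` maps each task to the list of tasks it depends on; every task
    mentioned as a dependence must also appear as a key.
    """
    # Number of unfinished predecessors per task.
    pending = {task: len(preds) for task, preds in deps.items()}
    successors = {task: [] for task in deps}
    for task, preds in deps.items():
        for pred in preds:
            successors[pred].append(task)

    # Tasks with no unfinished predecessors can start immediately.
    ready = deque(sorted(t for t, n in pending.items() if n == 0))
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)  # "execute" the task
        for succ in successors[task]:
            pending[succ] -= 1  # one dependence satisfied
            if pending[succ] == 0:
                ready.append(succ)  # becomes ready at run time

    if len(order) != len(deps):
        raise ValueError("dependence graph contains a cycle")
    return order

# Diamond-shaped graph: b and c both need a, and d needs both b and c.
graph = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
order = dynamic_schedule(graph)
```

In this graph, tasks `b` and `c` sit in the ready queue at the same time, so separate functional units could run them concurrently; that is the non-loop parallelism the abstract refers to.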
Theory, Techniques, And Experiments In Solving Recurrences In Computer Programs
, 1997
Cited by 14 (2 self)
... work. In the sixth chapter, we consider the application of these same techniques focused on obtaining parallelism in outer time-stepping loops. In the final chapter, we draw this work to a conclusion and discuss future directions in parallelizing compiler technology.
Achieving Multi-level Parallelization
, 1997
Cited by 7 (1 self)
Many modern machine architectures feature parallel processing at both the fine-grain and coarse-grain levels. In order to efficiently utilize these multiple levels, a parallelizing compiler must orchestrate the interactions of fine-grain and coarse-grain transformations. The goal of the PROMIS compiler project is to develop a multi-source, multi-target parallelizing compiler in which the front-end and back-end are integrated via a single unified intermediate representation. In this paper, we examine the appropriateness of the Hierarchical Task Graph as that representation.
1 Introduction
The design of the internal representation (IR) of a parallelizing compiler is driven, in large part, by the compiler's target granularity. For example, a compiler which uses source language transformations, such as converting sequential loops to DOALL loops, will need to store information about source level statements and expressions, as well as information about control flow structures. If transfor...
The PROMIS Compiler Prototype
, 1997
Cited by 5 (1 self)
Source code parallelizers and instruction level parallelizers each have specific advantages. Usually, a compiler is designed to be one or the other based on the target architecture and/or algorithms. A compiler that is designed to generate near-optimal code for modern, multi-level machines must have the capabilities of both. This paper describes the prototype of the PROMIS compiler. The prototype was designed to show that loop level and instruction level parallelization can be combined to produce results better than either one alone. In addition, it shows how communication between the levels can produce additional speedup.
1 Introduction
Parallelizing compilers automatically restructure sequential code to exploit any inherent parallelism. The granularity, or task size, of the parallel code is largely determined by the compiler designer and is chosen according to the target architecture and/or application. For example, a compiler for a VLIW (Very Long Instruction Word) machine would...

