Determining Average Program Execution Times and their Variance
1989
Cited by 84 (0 self)
This paper presents a general framework for determining average program execution times and their variance, based on the program's interval structure and control dependence graph. Average execution times and variance values are computed using frequency information from an optimized counter-based execution profile of the program.
1 Introduction
It is important for a compiler to obtain estimates of execution times for subcomputations of an input program if it is to attempt optimizations related to overhead values in the target architecture. In earlier work [SH86a, SH86b, Sar87, Sar89], we used estimates of execution times to facilitate the automatic partitioning and scheduling of programs written in the single-assignment language Sisal for parallel execution on multiprocessors. In this paper, we present a general framework for estimating average execution times in a program. This approach is based on the interval structure [ASU86] and the control dependence relation [FOW87], both of w...
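The core arithmetic behind combining profile frequencies into a mean and a variance can be illustrated with a minimal sketch. It assumes that the execution times of the combined parts are statistically independent and that a two-way (or n-way) branch takes each arm with the probability observed in the profile counts. The `Moments`, `sequence`, and `branch` names are hypothetical. This is not the paper's interval-based framework, only the probability rules such a framework relies on.

```python
from dataclasses import dataclass


@dataclass
class Moments:
    """Mean and variance of an execution time (illustrative units)."""
    mean: float
    var: float


def sequence(*parts: Moments) -> Moments:
    # Assumption: the parts' execution times are independent,
    # so both the means and the variances simply add.
    return Moments(sum(p.mean for p in parts), sum(p.var for p in parts))


def branch(counts: list, arms: list) -> Moments:
    # Branch probabilities come from profile counts: p_i = count_i / total.
    total = sum(counts)
    probs = [c / total for c in counts]
    mean = sum(p * a.mean for p, a in zip(probs, arms))
    # Law of total variance: Var = sum_i p_i * (v_i + m_i^2) - mean^2.
    second_moment = sum(p * (a.var + a.mean ** 2) for p, a in zip(probs, arms))
    return Moments(mean, second_moment - mean ** 2)
```

For example, a branch whose profile shows the then-arm (cost 10) taken 3 times and the else-arm (cost 2) taken once has mean 8 and variance 12; a preceding statement of cost 1 shifts the mean to 9 without changing the variance.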
Analysis and Optimization of Explicitly Parallel Programs Using the Parallel Program Graph Representation
Proc. of the 10th International Workshop on Languages and Compilers for Parallel Computing, LNCS, 1997
Cited by 17 (1 self)
As more and more programs are written in explicitly parallel programming languages, it becomes essential to extend the scope of sequential analysis and optimization techniques to explicitly parallel programs. Since the definition of a program dependence graph (PDG) is strongly tied to its underlying sequential program, the PDG is an inadequate intermediate representation for analysis and optimization of explicitly parallel programs. In this paper, we propose the use of Parallel Program Graphs (PPGs) as a general parallel program representation for analysis and optimization of explicitly parallel programs. PPGs consist of parallel control flow edges and synchronization edges, and can represent a broad class of deterministic parallel programs. We highlight the main differences between PPGs and PDGs and show that PPGs are strictly more general than PDGs. We also present a solution to the reaching definitions analysis problem for PPGs to illustrate how PPGs can be used to p...
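For reference, the classical sequential reaching-definitions analysis that the PPG work extends can be sketched as an iterative forward dataflow over an ordinary control flow graph. This is a minimal sketch under the assumption that the graph is given as a successor map and that GEN/KILL sets of definition labels are precomputed; the function name and the `d1`/`d3` labels are hypothetical. It does not model parallel control flow or synchronization edges.

```python
def reaching_definitions(succ, gen, kill):
    """Iterate IN[n] = U OUT[p] over preds p; OUT[n] = GEN[n] | (IN[n] - KILL[n])."""
    nodes = list(succ)
    # Invert the successor map to get predecessors.
    pred = {n: [] for n in nodes}
    for n in nodes:
        for s in succ[n]:
            pred[s].append(n)
    in_ = {n: set() for n in nodes}
    out = {n: set(gen[n]) for n in nodes}
    changed = True
    while changed:  # repeat until a fixed point is reached
        changed = False
        for n in nodes:
            new_in = set().union(*(out[p] for p in pred[n]))
            new_out = set(gen[n]) | (new_in - set(kill[n]))
            if new_in != in_[n] or new_out != out[n]:
                in_[n], out[n] = new_in, new_out
                changed = True
    return in_, out
```

In a loop 1 → 2 → 3 → 2 → 4 where node 1 defines `x` as `d1` and node 3 redefines it as `d3`, both definitions reach the loop header 2, while only `d3` reaches the exit node 4.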
An Empirical Study of Precise Interprocedural Array Analysis
Scientific Programming, 1994
Cited by 14 (1 self)
In this paper we examine the role played by the interprocedural analysis of array accesses in the automatic parallelization of Fortran programs. We use the PTRAN system to provide measurements of several benchmarks to compare different methods of representing interprocedurally accessed arrays. We examine issues concerning the effectiveness of automatic parallelization using these methods and the efficiency of a precise summarization method.
1 Introduction
Effective program parallelization, like any compiler optimization, can benefit from increased precision during its analysis phase. However, increased precision often implies an increase in compilation time and/or storage, forcing a tradeoff between precision and efficiency. If the benefits of increased precision outweigh the degradation in efficiency, a precise analysis should be used. In this work we assess the effectiveness and efficiency of a precise form of interprocedural array analysis for automatic parallelization. Specifi...
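The precision tradeoff can be made concrete with a toy one-dimensional model. This sketch assumes that each accessed array region is an inclusive index range and contrasts a precise summary (keeping every range) with an imprecise one (a single covering range). The `hull`, `overlaps`, and `independent` names are hypothetical, and real summaries handle multiple dimensions, strides, and symbolic bounds, which are omitted here.

```python
from typing import List, Tuple

Section = Tuple[int, int]  # inclusive [lo, hi] range of array indices


def hull(sections: List[Section]) -> Section:
    # Imprecise summary: one range that covers every accessed section.
    return (min(lo for lo, _ in sections), max(hi for _, hi in sections))


def overlaps(a: Section, b: Section) -> bool:
    # Two inclusive ranges intersect iff each starts before the other ends.
    return a[0] <= b[1] and b[0] <= a[1]


def independent(writes: List[Section], reads: List[Section]) -> bool:
    # Precise test: no written section intersects any read section.
    return not any(overlaps(w, r) for w in writes for r in reads)
```

If one call writes `A(1:10)` and `A(21:30)` while another reads `A(11:20)`, the precise summaries show no conflict, but the covering range `A(1:30)` overlaps the read and would block parallelization.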
Static Single Assignment Form for Explicitly Parallel Programs: Theory and Practice
In Conference Record of the 20th ACM SIGPLAN Symposium on Principles of Programming Languages (POPL'93), 1994
Cited by 8 (1 self)
To reason sensibly about parallel programs, a coherent intermediate form is needed. We describe algorithms to convert programs that use the Parallel Computing Forum Parallel Sections construct into a parallel Static Single Assignment (SSA) form, and we prove their correctness and safety. We define what the concepts of dominator and dominance frontier mean in parallel programs. We describe how to extend the SSA form to handle parallel updates while preserving the SSA properties by introducing a new parallel merge operator, the ψ-function. The minimal placement points for ψ-functions are identified and proved correct by introducing the meet of two nodes in a Parallel Precedence Graph (PPG), the dual of the join concept in a sequential control flow graph (CFG). The resulting intermediate form allows compilers to apply classical scalar optimization algorithms to explicitly parallel programs. We also discuss issues encountered while implementing these constructs in Na...
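The sequential notion being generalized here, the dominance frontier, can be computed from immediate dominators with the well-known approach of walking up the dominator tree from each predecessor of a join node (as described by Cooper, Harvey, and Kennedy). This minimal sketch assumes a CFG given as a predecessor map, precomputed immediate dominators with the entry node as its own immediate dominator, and every node reachable; the function name is hypothetical. It covers only sequential CFGs and does not place the parallel merge functions discussed in the abstract.

```python
def dominance_frontiers(preds, idom):
    """Return DF[n] for every node, given predecessor lists and immediate dominators."""
    df = {n: set() for n in preds}
    for b, ps in preds.items():
        if len(ps) < 2:
            continue  # only join points can appear in a dominance frontier
        for p in ps:
            runner = p
            # Walk up the dominator tree until reaching b's immediate dominator;
            # every node passed on the way has b in its frontier.
            while runner != idom[b]:
                df[runner].add(b)
                runner = idom[runner]
    return df
```

For a diamond 0 → {1, 2} → 3, nodes 1 and 2 each have frontier {3}; for a loop whose header is 1, the header appears in its own frontier.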
Communicators: Object-Based Multiparty Interactions for Parallel Programming
1991
Cited by 5 (1 self)
Contemporary parallel programming languages often provide only a few low-level primitives for pairwise communication and synchronization. These primitives are not always suitable for the interactions being programmed. Programming would be easier if it were possible to tailor communication and synchronization mechanisms to the needs of the application, much as abstract data types are used to create application-specific data structures and operations. This should also include the possibility of expressing interactions among multiple processes at once. Communicators support this paradigm by creating abstract communication objects that provide a framework for interprocess multiparty interactions. The behavior of these objects is defined in terms of interactions, in which multiple processes can enroll. An interaction is performed when all of its roles are filled by ready processes. Nondeterminism is used when the order in which interactions are performed is immaterial. Interactions can also ...
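The rule that an interaction fires only once every role is filled can be approximated with a barrier whose action runs exactly once when all parties have arrived. This is a minimal, single-use sketch in Python threads, not the Communicators language itself; the `Interaction` class, its `enroll` method, and the role names are hypothetical, and nondeterministic choice among several pending interactions is not modeled.

```python
import threading


class Interaction:
    """Hypothetical single-use multiparty interaction: `body` runs once all roles are filled."""

    def __init__(self, roles, body):
        self._roles = list(roles)
        self._body = body
        self._values = {}
        self._lock = threading.Lock()
        self.result = None
        # The barrier action is executed by one thread after every party has
        # called wait(), and before any of them is released.
        self._barrier = threading.Barrier(len(self._roles), action=self._perform)

    def _perform(self):
        self.result = self._body(dict(self._values))

    def enroll(self, role, value):
        if role not in self._roles:
            raise ValueError(f"unknown role {role!r}")
        with self._lock:
            if role in self._values:
                raise RuntimeError(f"role {role!r} is already filled")
            self._values[role] = value
        self._barrier.wait()  # block until every role has a ready process
        return self.result
```

Each process calls `enroll` with its role and contribution; all of them block until the last role is filled and then observe the same result of the interaction body. A reusable version would need to reset the collected values between rounds.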
Loop Distribution with Multiple Exits
In Proceedings of Supercomputing ’92, 1992
Cited by 2 (0 self)
We present a loop distribution algorithm that accommodates loops with multiple exits. Our algorithm uses and appropriately transforms abstract representations of the program, rendering these structures suitable for further program transformations. We present results from implementing this algorithm in the PTRAN system at IBM Research.
1 Introduction
Loop distribution has long been recognized for its role in restructuring programs to increase their concurrency [4, 5, 7, 16] and locality [17, 18]. Unfortunately, loop distribution is not among the unimodular transformations [6, 20], whose aggregate effect can be summarized by a single transformation. Thus, loop distribution must be performed separately from the unimodular transformations. To accomplish its transformations, a program restructuring system must develop certain analytical information about a program concerning its control and data flow properties, typically represented by...
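To see why a second loop exit complicates distribution, consider a loop that breaks early. One simple way to distribute it, sketched below, is to first compute the iteration at which the exit fires and then run each distributed loop only over the iterations the original loop actually executed. This is an illustrative technique, not the paper's algorithm, and it is valid only under the stated assumption that the exit test does not depend on values computed inside the loop. The `fused` and `distributed` functions are hypothetical.

```python
def fused(x, limit):
    n = len(x)
    a, b = [0] * n, [0] * n
    for i in range(n):
        if x[i] > limit:  # second exit, besides the normal loop end
            break
        a[i] = x[i] * 2
        b[i] = a[i] + 1
    return a, b


def distributed(x, limit):
    n = len(x)
    # Loop 1: find the first iteration at which the early exit is taken.
    # Assumes the exit test reads only x, not a or b.
    stop = n
    for i in range(n):
        if x[i] > limit:
            stop = i
            break
    a, b = [0] * n, [0] * n
    # Loops 2 and 3: each statement runs over the same executed iterations.
    for i in range(stop):
        a[i] = x[i] * 2
    for i in range(stop):
        b[i] = a[i] + 1
    return a, b
```

With `x = [1, 2, 3, 10, 4]` and `limit = 5`, both versions stop before the fourth element and produce identical arrays; the two statement loops are now candidates for separate restructuring.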
Model Checking as a Tool Used by Parallelizing Compilers
2nd International Workshop on Formal Methods for Parallel Programming: Theory and Applications, 1997
Cited by 1 (0 self)
In this paper we describe the use of temporal logic and model checking in a parallelizing compiler to analyze the structure of a source program and locate opportunities for optimization and parallelization. The source program is represented as a process graph in which the nodes are sequential processes and the edges are control and data dependence relationships between the computations at the nodes. By labeling the nodes and edges with descriptive atomic propositions and by specifying the conditions necessary for optimizations and parallelizations as temporal logic formulas, we can use a model checker to locate nodes of the process graph where particular optimizations can be made. To discover opportunities for new optimizations or to modify existing ones in this parallelizing compiler, we need only specify their conditions as temporal logic formulas; we do not need to add or modify the code of the compiler. This greatly simplifies the process of locating optimization and parallelization...
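The idea of expressing an analysis condition as a temporal formula over labeled graph nodes can be illustrated with the CTL operator E[p U q] ("along some path, p holds until q holds"). As an example condition, a variable `x` is live at a node exactly when E[¬def_x U use_x] holds there. The sketch below evaluates E[p U q] with a naive fixed-point iteration rather than a real model checker; the `def_x`/`use_x` propositions and the `sat_eu` function are hypothetical and are not taken from the paper.

```python
def sat_eu(succ, p, q):
    """Set of nodes satisfying the CTL formula E[p U q] on a finite graph."""
    # Start with nodes where q already holds.
    result = {n for n in succ if q(n)}
    changed = True
    while changed:
        changed = False
        for n in succ:
            # Add nodes where p holds and some successor already satisfies E[p U q].
            if n not in result and p(n) and any(s in result for s in succ[n]):
                result.add(n)
                changed = True
    return result
```

For a straight-line graph 0 → 1 → 2 → 3 in which nodes 0 and 3 define `x` and node 2 uses it, the formula holds at nodes 1 and 2, which is exactly where `x` is live.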
Tree-Based Code Optimization
1992
Cited by 1 (0 self)
Nearly all algorithms for code optimization use a control flow graph. In this thesis, I will show that, with very minor restrictions on program structure, an abstract syntax tree can be used instead, leading to algorithms that are often much simpler than their graph-based counterparts. The conclusion is that abstract syntax trees, not control flow graphs, should be the fundamental data structure in code optimization.
1 Introduction
Most optimizing compilers consist of a front end that does syntactic and semantic analysis and a back end that does optimization and machine code generation [ASU86]. The main data structure in the front end is an abstract syntax tree (AST), while in the back end it is a control flow graph (CFG), which consists of nodes representing computations and edges representing control flow. Thus, code optimization operates on a data structure, the CFG, in which the program has essentially been "flattened" into a tangle of GOTOs (edges). Unlike an AST, a CFG c...
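As an illustration of dataflow analysis on a tree rather than a graph, the sketch below computes live variables by structural recursion over a tiny statement AST with assignments, sequences, conditionals, and while loops, assuming goto-free structured code. Only the loop needs a local fixed point; no CFG is built. The node classes and `live_in` are hypothetical and are not the algorithms from the thesis.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union


@dataclass
class Assign:
    target: str
    uses: FrozenSet[str]


@dataclass
class Seq:
    stmts: list


@dataclass
class If:
    cond_uses: FrozenSet[str]
    then: "Stmt"
    orelse: "Stmt"


@dataclass
class While:
    cond_uses: FrozenSet[str]
    body: "Stmt"


Stmt = Union[Assign, Seq, If, While]


def live_in(s: Stmt, live_out: frozenset) -> frozenset:
    """Variables live before `s`, given those live after it (backward analysis)."""
    if isinstance(s, Assign):
        return (live_out - {s.target}) | s.uses
    if isinstance(s, Seq):
        live = live_out
        for stmt in reversed(s.stmts):  # propagate backward through the sequence
            live = live_in(stmt, live)
        return live
    if isinstance(s, If):
        return s.cond_uses | live_in(s.then, live_out) | live_in(s.orelse, live_out)
    if isinstance(s, While):
        # Live at the loop head: needed after exit, by the test, or by the body.
        live = live_out | s.cond_uses
        while True:
            new = live_out | s.cond_uses | live_in(s.body, live)
            if new == live:
                return live
            live = new
    raise TypeError(f"unknown statement {s!r}")
```

For `x = a; while x < n: x = x + 1; y = x` with nothing live at the end, the variables live on entry are `a` and `n`.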
A High Performance Application Representation for Reconfigurable Systems
2004
Cited by 1 (0 self)
Modern reconfigurable computing systems feature powerful hybrid architectures with multiple microprocessor cores, large reconfigurable logic arrays, and distributed memory hierarchies. Mapping applications to these complex systems requires a representation that allows both hardware and software synthesis. Additionally, this representation must enable optimizations that exploit fine- and coarse-grained parallelism in order to make effective use of the underlying reconfigurable architecture. Our work explores a representation based on the program dependence graph (PDG) combined with static single-assignment (SSA) form for synthesis to high-performance reconfigurable devices. The PDG effectively describes control dependencies, while SSA yields precise data dependencies. Used together, these two representations provide a powerful, synthesizable form that exploits both fine- and coarse-grained parallelism. Compared to other commonly used representations for reconfigurable systems, the PDG+SSA form yields faster execution times while using a similar amount of area.
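Why SSA yields precise data dependencies can be seen on straight-line code: once every assignment creates a fresh version of its variable, each use names exactly one definition, so def-use edges can be read off directly. This minimal sketch assumes a hypothetical statement format of `(target, [operand names])`, treats operands without a prior definition as block inputs, and does not handle the φ-functions needed at control-flow merges, which a full PDG+SSA form would have to place.

```python
def to_ssa(stmts):
    """Rename straight-line (target, operands) statements into SSA form.

    Returns the renamed statements and def-use edges as
    (defining statement index, using statement index) pairs.
    """
    version = {}   # variable -> latest version number
    current = {}   # variable -> (SSA name, index of defining statement)
    ssa, edges = [], []
    for idx, (target, operands) in enumerate(stmts):
        renamed = []
        for op in operands:  # operands are read before the target is redefined
            if op in current:
                name, def_idx = current[op]
                renamed.append(name)
                edges.append((def_idx, idx))
            else:
                renamed.append(op)  # value defined outside this block
        version[target] = version.get(target, 0) + 1
        new_name = f"{target}{version[target]}"
        current[target] = (new_name, idx)
        ssa.append((new_name, renamed))
    return ssa, edges
```

For `x = a; y = x; x = x + y; z = x`, the second definition of `x` becomes `x2`, and `z` depends only on statement 2, not on the earlier definition of `x`.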

