Results 1 - 10 of 65
Dataflow Analysis of Array and Scalar References
- International Journal of Parallel Programming
, 1991
Cited by 188 (2 self)
Given a program written in a simple imperative language (assignment statements, for loops, affine indices and loop limits), this paper presents an algorithm for analyzing the patterns along which values flow as the execution proceeds. For each array or scalar reference, the result is the name and iteration vector of the source statement as a function of the iteration vector of the referencing statement. The paper discusses several applications of the method: conversion of a program to a set of recurrence equations, array and scalar expansion, program verification and parallel program construction.
Keywords: dataflow analysis, semantics analysis, array expansion.
1 Introduction
It is a well-known fact that scientific programs spend most of their running time in executing loops operating on arrays. Hence if a restructuring or optimizing compiler is to do a good job, it must be able to do a thorough analysis of the addressing patterns in such loops. If taken in full generality, ...
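To make the idea of a "source" concrete, here is a minimal brute-force sketch. It is not the paper's algorithm, which derives the answer symbolically; it simply executes a tiny hypothetical loop and records, for every read of a[i-1], which statement instance last wrote that cell. The loop, the statement names S0 and S1, and the helper last_writers are assumptions made for illustration.

```python
def last_writers(n):
    """Simulate the hypothetical program
           S0: a[0] = 0
           for i in 1..n:
               S1: a[i] = a[i-1] + 1
       and return {i: (statement, iteration_vector)}: the source of the
       value read through a[i-1] in iteration i of S1."""
    writer = {0: ("S0", ())}   # S0 sits outside any loop: empty iteration vector
    sources = {}
    for i in range(1, n + 1):
        sources[i] = writer[i - 1]   # the read of a[i-1] precedes the write of a[i]
        writer[i] = ("S1", (i,))     # S1 at iteration i now owns cell a[i]
    return sources

src = last_writers(4)
for i in sorted(src):
    print(i, src[i])
```

In this toy case the source is S0 when i = 1 and S1 at iteration i-1 otherwise, that is, a piecewise function of the reading statement's iteration vector, which is the form of result the abstract describes.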
Automatic Data Partitioning on Distributed Memory Multiprocessors
, 1991
Cited by 102 (6 self)
An important problem facing numerous research projects on parallelizing compilers for distributed memory machines is that of automatically determining a suitable data partitioning scheme for a program. Most of the current projects leave this tedious problem almost entirely to the user. In this paper, we present a novel approach to the problem of automatic data partitioning. We introduce the notion of constraints on data distribution, and show how, based on performance considerations, a compiler identifies constraints to be imposed on the distribution of various data structures. These constraints are then combined by the compiler to obtain a complete and consistent picture of the data distribution scheme, one that offers good performance in terms of the overall execution time.
Tiling Multidimensional Iteration Spaces for Multicomputers
, 1992
Cited by 99 (20 self)
This paper addresses the problem of compiling perfectly nested loops for multicomputers (distributed memory machines). The relatively high communication startup costs in these machines render frequent communication very expensive. Motivated by this, we present a method of aggregating a number of loop iterations into tiles where the tiles execute atomically -- a processor executing the iterations belonging to a tile receives all the data it needs before executing any one of the iterations in the tile, executes all the iterations in the tile and then sends the data needed by other processors. Since synchronization is not allowed during the execution of a tile, partitioning the iteration space into tiles must not result in deadlock. We first show the equivalence between the problem of finding partitions and the problem of determining the cone for a given set of dependence vectors. We then present an approach to partitioning the iteration space into deadlock-free tiles so that communicati...
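As a rough illustration of atomic tiles, the sketch below handles only the simplest case: axis-aligned rectangular tiles over a 2D iteration space, which are safe when every dependence vector has non-negative components. That condition is sufficient, not necessary, and it stands in for the paper's more general cone-based formulation. All function names are hypothetical.

```python
from itertools import product

def tile_of(point, tile_size):
    """Map an iteration point (i, j) to the coordinates of the tile holding it."""
    return tuple(p // s for p, s in zip(point, tile_size))

def rectangular_tiling_is_legal(deps):
    """Sufficient check: with no negative component in any dependence
       vector, data never flows from a later tile back to an earlier one."""
    return all(c >= 0 for d in deps for c in d)

def tile_schedule(n, m, tile_size):
    """Group an n x m iteration space into tiles, in lexicographic tile order."""
    tiles = {}
    for point in product(range(n), range(m)):
        tiles.setdefault(tile_of(point, tile_size), []).append(point)
    return sorted(tiles.items())

deps = [(1, 0), (0, 1)]            # assumed dependence vectors
legal = rectangular_tiling_is_legal(deps)
schedule = tile_schedule(4, 4, (2, 2))
for tile, points in schedule:
    print(tile, points)
```

Each tile can then receive its inputs, run all of its points, and send its outputs, because every dependence leads to the same tile or to one later in the schedule.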
Beyond Induction Variables: Detecting and Classifying Sequences Using a Demand-driven SSA Form
- ACM Transactions on Programming Languages and Systems
, 1995
Cited by 99 (5 self)
In this paper we present a practical technique for detecting a broader class of linear induction variables than is usually recognized, as well as several other sequence forms, including periodic, polynomial, geometric, monotonic, and wrap-around variables. Our method is based on Factored Use-Def (FUD) chains, a demand-driven representation of the popular Static Single Assignment form. In this form, strongly connected components of the associated SSA graph correspond to sequences in the source program: we describe a simple yet efficient algorithm for detecting and classifying these sequences. We have implemented this algorithm in Nascent, our restructuring Fortran 90+ compiler, and we present some results showing the effectiveness of our approach.
Automatic Program Parallelization
, 1993
Cited by 97 (8 self)
This paper presents an overview of automatic program parallelization techniques. It covers dependence analysis techniques, followed by a discussion of program transformations, including straight-line code parallelization, do loop transformations, and parallelization of recursive routines. The last section of the paper surveys several experimental studies on the effectiveness of parallelizing compilers.
Optimizing for Parallelism and Data Locality
- In Proceedings of the 1992 ACM International Conference on Supercomputing
, 1992
Cited by 92 (13 self)
Previous research has used program transformation to introduce parallelism and to exploit data locality. Unfortunately, these two objectives have usually been considered independently. This work explores the tradeoffs between effectively utilizing parallelism and memory hierarchy on shared-memory multiprocessors. We present a simple, but surprisingly accurate, memory model to determine cache line reuse from both multiple accesses to the same memory location and from consecutive memory access. The model is used in memory optimizing and loop parallelization algorithms that effectively exploit data locality and parallelism in concert. We demonstrate the efficacy of this approach with very encouraging experimental results.
1 Introduction
Transformations to exploit parallelism and to improve data locality are two of the most valuable compiler techniques in use today. Independently, each of these optimizations has been shown to result in dramatic improvements. This paper seeks to combine t...
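The effect of consecutive memory access on cache line reuse can be seen with a toy count. The sketch below is not the paper's memory model. It assumes a row-major array aligned to a line boundary, 8 elements per cache line, and it ignores cache capacity; it only counts how often two consecutive accesses fall in different lines. The function name and parameters are hypothetical.

```python
def line_switches(n, m, order, elems_per_line=8):
    """Count how often consecutive accesses to a row-major n x m array
       land in a different cache line than the previous access."""
    if order == "row":
        accesses = [(i, j) for i in range(n) for j in range(m)]
    else:  # "col": the inner loop walks down a column
        accesses = [(i, j) for j in range(m) for i in range(n)]
    switches, prev = 0, None
    for i, j in accesses:
        line = (i * m + j) // elems_per_line   # cache line of element (i, j)
        if line != prev:
            switches += 1
        prev = line
    return switches

row = line_switches(64, 64, "row")
col = line_switches(64, 64, "col")
print(row, col)
```

Under these assumptions the row-wise traversal moves to a new line once every 8 accesses, while the column-wise traversal does so on every access, which is the kind of difference a loop-permutation decision would weigh.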
Beyond Induction Variables
, 1992
Cited by 85 (6 self)
Induction variable detection is usually closely tied to the strength reduction optimization. This paper studies induction variable analysis from a different perspective, that of finding induction variables for data dependence analysis. While classical induction variable analysis techniques have been used successfully up to now, we have found a simple algorithm based on the Static Single Assignment form of a program that finds all linear induction variables in a loop. Moreover, this algorithm is easily extended to find induction variables in multiple nested loops, to find nonlinear induction variables, and to classify other integer scalar assignments in loops, such as monotonic, periodic and wraparound variables. Some of these other variables are now classified using ad hoc pattern recognition, while others are not analyzed by current compilers. Giving a unified approach improves the speed of compilers and allows a more general classification scheme. We also show how to use these va...
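The variable classes named above are easier to picture with a concrete loop. The sketch below is only a trace-based illustration: it runs a hypothetical loop, records each scalar's value at the top of every iteration, and labels the resulting sequence with simple heuristics. The paper's approach is static and works on the SSA form; nothing here reproduces it, and the variable names and classifier are assumptions.

```python
def trace(n):
    """Run a hypothetical loop and record each scalar at the top of each iteration."""
    i, k, j, t, g, m = 0, 99, 0, 0, 1, 0
    seen = {name: [] for name in "ikjtgm"}
    for it in range(n):
        for name, value in zip("ikjtgm", (i, k, j, t, g, m)):
            seen[name].append(value)
        k = i            # wrap-around: follows i, one iteration behind
        i = i + 2        # linear induction variable
        j = 1 - j        # periodic, period 2
        t = t + i        # polynomial (second differences are constant)
        g = g * 3        # geometric
        if it % 3 == 0:
            m = m + 1    # monotonic: incremented only on some iterations
    return seen

def classify(values):
    """Heuristic label for a finite trace; a real compiler proves this statically."""
    d1 = [b - a for a, b in zip(values, values[1:])]
    if all(d == 0 for d in d1):
        return "invariant"
    if len(set(d1)) == 1:
        return "linear"
    if len(set(d1[1:])) == 1:
        return "wrap-around"
    for p in range(2, len(values) // 2 + 1):
        if all(values[x] == values[x + p] for x in range(len(values) - p)):
            return "periodic"
    d2 = [b - a for a, b in zip(d1, d1[1:])]
    if len(set(d2)) == 1:
        return "polynomial"
    if all(values[x + 1] * values[0] == values[x] * values[1]
           for x in range(len(values) - 1)):
        return "geometric"
    if all(d >= 0 for d in d1) or all(d <= 0 for d in d1):
        return "monotonic"
    return "unknown"

kinds = {name: classify(vals) for name, vals in trace(8).items()}
print(kinds)
```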
PASSION: Parallel And Scalable Software for Input-Output
, 1994
Cited by 72 (35 self)
We are developing a software system called PASSION: Parallel And Scalable Software for Input-Output which provides software support for high performance parallel I/O. PASSION provides support at the language, compiler, runtime as well as file system level. PASSION provides runtime procedures for parallel access to files (read/write), as well as for out-of-core computations. These routines can either be used together with a compiler to translate out-of-core data parallel programs written in a language like HPF, or used directly by application programmers. A number of optimizations such as Two-Phase Access, Data Sieving, Data Prefetching and Data Reuse have been incorporated in the PASSION Runtime Library for improved performance. PASSION also provides an initial framework for runtime support for out-of-core irregular problems. The goal of the PASSION compiler is to automatically translate out-of-core data parallel programs to node programs for distributed memory machines, with calls to the PASSION Runtime Library. At the language level, PASSION suggests extensions to HPF for out-of-core programs. At the file system level, PASSION provides support for buffering and prefetching data from disks. A portable parallel file system is also being developed as part of this project, which can be used across homogeneous or heterogeneous networks of workstations. PASSION also provides support for integrating data and task parallelism using parallel I/O techniques. We have used PASSION to implement a number of out-of-core applications such as a Laplace's equation solver, 2D FFT, Matrix Multiplication, LU Decomposition, image processing applications as well as unstructured mesh kernels in molecular dynamics and computational fluid dynamics. We are currently in the process of using PASSION in applications in CFD (3D turbulent flows), molecular structure calculations, seismic computations, and earth and space science applications such as Four-Dimensional Data Assimilation.
PASSION is currently available on the Intel Paragon, Touchstone Delta and iPSC/860. Efforts are underway to port it to the IBM SP-1 and SP-2 using the Vesta Parallel File System.
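Of the optimizations listed, data sieving is the easiest to sketch: instead of issuing one small read per strided piece, a single contiguous read covers the whole range and the wanted pieces are extracted in memory. The code below illustrates that general idea only. It uses an in-memory io.BytesIO object in place of a file, and the function names and parameters are hypothetical, not PASSION's interface.

```python
import io

def read_strided_naive(f, start, count, size, stride):
    """Issue one seek and one read per requested piece."""
    pieces = []
    for k in range(count):
        f.seek(start + k * stride)
        pieces.append(f.read(size))
    return pieces, count          # second value: number of read calls issued

def read_strided_sieved(f, start, count, size, stride):
    """Issue one read spanning all pieces, then pick them out in memory."""
    span = (count - 1) * stride + size
    f.seek(start)
    buf = f.read(span)
    pieces = [buf[k * stride:k * stride + size] for k in range(count)]
    return pieces, 1

f = io.BytesIO(bytes(range(256)))  # stand-in for a file on disk
naive, naive_reads = read_strided_naive(f, 4, 5, 2, 16)
sieved, sieved_reads = read_strided_sieved(f, 4, 5, 2, 16)
print(naive == sieved, naive_reads, sieved_reads)
```

The trade-off is that the sieved version transfers the unwanted bytes between pieces as well, so it pays off when per-request overhead dominates the cost of the extra data.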
A Practical Data Flow Framework for Array Reference Analysis and its Use in Optimizations
- In ACM SIGPLAN'93 Conf. on Prog. Lang. Design and Implementation
, 1993
Cited by 55 (2 self)
Data flow analysis techniques have traditionally been restricted to the analysis of scalar variables. This restriction, however, imposes a limitation on the kinds of optimizations that can be performed in loops containing array references. We present a data flow framework for array reference analysis that provides the information needed in various optimizations targeted at sequential or fine-grained parallel architectures. The framework extends the traditional scalar framework by incorporating iteration distance values into the analysis to qualify the computed data flow solution during the fixed point iteration. Analyses phrased in this framework are capable of discovering recurrent access patterns among array references that evolve during the execution of a loop. The framework is practical in that the fixed point solution requires at most three passes over the body of structured loops. Applications of our framework are discussed for register allocation, load/store optimizations, and controlled loop unrolling.
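A load/store optimization of the kind mentioned above can be sketched for one simple recurrence. In the hypothetical loop a[i] = a[i-1] + b[i-1], the value read in iteration i is the one written in iteration i-1 (iteration distance 1), so it can be carried in a scalar instead of being reloaded from the array. The functions below are illustrative assumptions and count only loads from a.

```python
def prefix_naive(a0, b):
    """Reload a[i-1] from the array in every iteration."""
    a = [a0] + [0] * len(b)
    loads = 0
    for i in range(1, len(a)):
        a[i] = a[i - 1] + b[i - 1]
        loads += 1                 # one load of a[i-1] per iteration
    return a, loads

def prefix_scalar_replaced(a0, b):
    """Carry the previous iteration's value in a scalar (a register, in effect)."""
    a = [a0] + [0] * len(b)
    carried = a0                   # always holds a[i-1]
    loads = 0                      # no loads from a are needed
    for i in range(1, len(a)):
        carried = carried + b[i - 1]
        a[i] = carried
    return a, loads

result_naive, loads_naive = prefix_naive(10, [1, 2, 3, 4])
result_opt, loads_opt = prefix_scalar_replaced(10, [1, 2, 3, 4])
print(result_naive, loads_naive, loads_opt)
```

Knowing that the reuse distance is exactly one iteration is what makes the rewrite safe; a larger constant distance would call for several carried scalars or for unrolling.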
Global Communication Analysis and Optimization
- In Proc. ACM SIGPLAN Conference on Programming Language Design and Implementation
, 1996
Cited by 46 (2 self)
Reducing communication cost is crucial to achieving good performance on scalable parallel machines. This paper presents a new compiler algorithm for global analysis and optimization of communication in data-parallel programs. Our algorithm is distinct from existing approaches in that rather than handling loop-nests and array references one by one, it considers all communication in a procedure and their interactions under different placements before making a final decision on the placement of any communication. It exploits the flexibility resulting from this advanced analysis to eliminate redundancy, reduce the number of messages, and reduce contention for cache and communication buffers, all in a unified framework. In contrast, single loop-nest analysis often retains redundant communication, and more aggressive dataflow analysis on array sections can generate too many messages or cache and buffer contention. The algorithm has been implemented in the IBM pHPF compiler for High Performan...

