Compiler Transformations for High-Performance Computing
ACM Computing Surveys, 1994
Cited by 332 (4 self)
In the last three decades, a large number of compiler transformations for optimizing programs have been implemented. Most optimizations for uniprocessors reduce the number of instructions executed by the program using transformations based on the analysis of scalar quantities and data-flow techniques. In contrast, optimization for ...
Compile-Time Techniques for Data Distribution in Distributed Memory Machines
IEEE Transactions on Parallel and Distributed Systems, 1991
Cited by 81 (13 self)
This paper addresses the problem of partitioning data for distributed memory machines (multicomputers). In current-day multicomputers, interprocessor communication is more time-consuming than instruction execution. If insufficient attention is paid to the data allocation problem, the time spent in interprocessor communication may be high enough to seriously undermine the benefits of parallelism. It is therefore worthwhile for a compiler to analyze patterns of data usage to determine allocation, in order to minimize interprocessor communication. We present a machine-independent analysis of communication-free partitions. We present a matrix notation to describe array accesses in fully parallel loops, which lets us derive sufficient conditions for communication-free partitioning (decomposition) of arrays. For a commonly occurring class of accesses, we present a problem formulation to minimize communication costs when communication-free partitioning of arrays is not possible.
Precise Compile-Time Performance Prediction for Superscalar-Based Computers
In Proc. ACM SIGPLAN PLDI'94, 1994
Cited by 45 (0 self)
Optimizing compilers (particularly parallel compilers) are constrained by their ability to predict performance consequences of the transformations they apply. Many factors, such as unknowns in control structures, dynamic behavior of programs, and complexity of the underlying hardware, make it very difficult for compilers to estimate the performance of the transformations accurately and efficiently. In this paper, we present a performance prediction framework that combines several innovative approaches to solve this problem. First, the framework employs a detailed, architecture-specific, but portable, cost model that can be used to estimate the cost of straight line code efficiently. Second, aggregated costs of loops and conditional statements are computed and represented symbolically. This avoids unnecessary, premature guesses and preserves the precision of the prediction. Third, symbolic comparison allows compilers to choose the best transformation dynamically and systematically. Some...
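The idea of keeping loop costs symbolic and comparing them symbolically can be illustrated with a minimal Python sketch. It is not the framework from the paper: the `LinearCost` class, the `cheaper_when` helper, and the latency table are hypothetical names and made-up numbers, and the cost is assumed to be linear in a single unknown trip count `n`.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class LinearCost:
    """Symbolic cost per_iteration * n + fixed, with the trip count n left unknown."""
    per_iteration: int
    fixed: int

    def at(self, n):
        """Evaluate the cost once the trip count is known."""
        return self.per_iteration * n + self.fixed


def straight_line_cost(ops, latency):
    """Cost of straight-line code as the sum of per-operation latencies."""
    return sum(latency[op] for op in ops)


def cheaper_when(a, b):
    """Describe for which n cost a is strictly lower than cost b.

    Returns ("always", None), ("never", None), ("below", t) or ("above", t),
    where t is the trip count at which the two costs are equal.
    """
    slope = a.per_iteration - b.per_iteration
    offset = a.fixed - b.fixed
    if slope == 0:
        return ("always", None) if offset < 0 else ("never", None)
    threshold = Fraction(-offset, slope)
    return ("below", threshold) if slope > 0 else ("above", threshold)


# Illustrative latencies in cycles (assumed values, not measured on any machine).
LATENCY = {"load": 2, "mul": 3, "add": 1, "store": 2, "branch": 2}

body = ["load", "mul", "add", "store", "branch"]
original = LinearCost(per_iteration=straight_line_cost(body, LATENCY), fixed=0)
# Hypothetical transformed loop: cheaper per iteration, but with setup cost.
transformed = LinearCost(per_iteration=6, fixed=40)

verdict = cheaper_when(original, transformed)
```

Because the comparison is deferred until the result is expressed in terms of `n`, the choice between the two variants does not depend on a premature guess of the trip count.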
Data Modelling Of Parallel Computer Architectures
1992
Cited by 1 (1 self)
This report contains a compilation of research results that I have attained in the Processor Modelling (ProcMod) subproject of the parTool [15] project during the period May-December 1991. Some of the ideas have been inspired by the ...
Automatic parallelization for distributed-memory systems: Experiences and current research
1993
Cited by 1 (0 self)
Distributed-memory systems (DMMPs) are powerful tools for solving large-scale scientific and engineering problems. However, these machines are difficult to program since the data must be distributed across the processors and message-passing operations must be inserted for communicating non-local data. In this paper, we discuss the automatic parallelization of Fortran programs for DMMPs, based on the programming paradigms associated with Vienna Fortran and High Performance Fortran. After introducing the state of the art, as represented by currently implemented systems, we will identify a number of limitations of this technology. In addition to insufficient functionality for handling many real applications, a major deficiency of current systems is the lack of intelligence in selecting good transformation strategies. We argue that a knowledge-based approach to compiling will contribute to more powerful and intelligent automatic parallelization systems in the future.
A Method for Parallel Program Generation with an Application to the
Cited by 1 (0 self)
This paper describes a translation method for the automatic parallelization of programs based on a separately specified representation of the data. The method unifies the concept of data representation at the algorithm level and at the machine level, based on the so-called view concept. It is shown that, given a decomposition of the data, applying the translation method to the view-based Booster programming language results in efficient SPMD code for distributed- as well as shared-memory architectures. It will be argued that the method is not restricted to Booster, but can also be applied to other languages.
Vectorization and Parallelization of Irregular Problems via Graph Coloring
Efficient implementations of irregular problems on vector and parallel architectures are generally hard to realize. An important class of irregular problems consists of Gauss-Seidel iteration schemes applied to irregular data sets. The unstructured data dependences arising there prevent restructuring compilers from generating efficient code for vector or parallel machines. It is shown how to structure the data dependences by decomposing the data set using graph coloring techniques and by specifying a particular execution order already at the algorithm level. Methods to master the irregularities originating from different types of tasks are proposed. An example application is given and possible future developments are mentioned.
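As a rough illustration of decomposing a data set by graph coloring, the Python sketch below colors a small irregular dependence graph greedily and then performs Gauss-Seidel sweeps one color class at a time. It is a generic multicolor scheme, not necessarily the method of the paper; the example graph, the function names, and the test system (diagonal 4, off-diagonal -1, right-hand side 1) are assumptions chosen for illustration.

```python
from collections import defaultdict


def greedy_coloring(adjacency):
    """Give each node the smallest color not used by an already colored neighbor."""
    colors = {}
    for node in sorted(adjacency):
        used = {colors[nb] for nb in adjacency[node] if nb in colors}
        color = 0
        while color in used:
            color += 1
        colors[node] = color
    return colors


def color_classes(colors):
    """Group nodes by color; nodes in one class share no dependence edge."""
    classes = defaultdict(list)
    for node, color in colors.items():
        classes[color].append(node)
    return [classes[c] for c in sorted(classes)]


def gauss_seidel_sweep(adjacency, diag, offdiag, rhs, x, classes):
    """One sweep over all color classes, updating x in place.

    All updates of a class are computed before any is stored, which is valid
    because nodes of the same color never read each other's values. That
    independence is what makes each class vectorizable or parallelizable.
    """
    for nodes in classes:
        updates = {
            i: (rhs[i] - sum(offdiag * x[j] for j in adjacency[i])) / diag
            for i in nodes
        }
        x.update(updates)


# Small irregular dependence graph (assumed example), as adjacency lists.
adjacency = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3]}
colors = greedy_coloring(adjacency)
classes = color_classes(colors)

# Solve 4*x[i] - sum of neighboring x[j] = 1 for every node i.
rhs = {i: 1.0 for i in adjacency}
x = {i: 0.0 for i in adjacency}
for _ in range(100):
    gauss_seidel_sweep(adjacency, 4.0, -1.0, rhs, x, classes)
```

Greedy coloring does not guarantee the minimum number of colors, but fewer colors mean larger independent classes and therefore longer vectors or more parallel work per step.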
Knowledge-Based Automatic Parallelization by Pattern Recognition
We present the top-down design of a new system which performs automatic parallelization of numerical Fortran77, Fortran90 or C source programs for execution on distributed-memory message-passing multiprocessors such as the INTEL iPSC/860 or the TMC CM-5. The key idea is a high-level pattern matching approach which permits partial reverse engineering of a wide class of numerical programs. With only a few hundred patterns, we will be able to completely match many important numerical algorithms. This is also applicable to so-called dusty deck sources that may be 'encrypted' by various former machine-specific optimizations. We show ...

