The PARADIGM Compiler for Distributed-Memory Message Passing Multicomputers
- IEEE Computer
, 1994
Cited by 98 (9 self)
The PARADIGM compiler project provides an automated means to parallelize programs, written in a serial programming model, for efficient execution on distributed-memory multicomputers. In addition to performing traditional compiler optimizations, PARADIGM is unique in that it addresses many other issues within a unified platform: automatic data distribution, synthesis of high-level communication, communication optimizations, irregular computations, functional and data parallelism, and multithreaded execution. This paper describes the techniques used and provides experimental evidence of their effectiveness. 1 Introduction Distributed-memory massively parallel multicomputers can provide the high levels of performance required to solve the Grand Challenge computational science problems [16]. Distributed-memory multicomputers such as the Intel iPSC/860, the Intel Paragon, the IBM SP-1 and the Thinking Machines CM-5 offer significant advantages over shared-memory multiprocessors in terms...
Automatic Generation of Efficient Array Redistribution Routines for Distributed Memory Multicomputers
, 1995
Cited by 53 (4 self)
Appropriate data distribution has been found to be critical for obtaining good performance on Distributed Memory Multicomputers like the CM-5, Intel Paragon and IBM SP-1. It has also been found that some programs need to change their distributions during execution for better performance (redistribution). This work focuses on automatically generating efficient routines for redistribution. We present a new mathematical representation for regular distributions called PITFALLS and then discuss algorithms for redistribution based on this representation. One of the significant contributions of this work is being able to handle arbitrary source and target processor sets while performing redistribution. Another important contribution is the ability to handle an arbitrary number of dimensions for the array involved in the redistribution in a scalable manner. Our implementation of these techniques is based on an MPI-like communication library. The results presented show the low overheads for our redistribution algorithm as compared to naive runtime methods.
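To make the redistribution problem concrete, here is a minimal Python sketch of the kind of naive element-by-element approach that the abstract uses as a baseline. It is not the PITFALLS representation, which this listing does not describe in detail. The function names `owner` and `redistribution_schedule` are hypothetical, and the sketch assumes a one-dimensional array with block-cyclic distributions on both sides.

```python
from collections import defaultdict


def owner(i, block, nprocs):
    """Processor owning global index i under a cyclic(block) distribution."""
    return (i // block) % nprocs


def redistribution_schedule(n, src_block, src_procs, dst_block, dst_procs):
    """Map (source processor, target processor) -> global indices to transfer.

    Naive O(n) scan over every element; the source and target processor
    sets may have different sizes.
    """
    schedule = defaultdict(list)
    for i in range(n):
        s = owner(i, src_block, src_procs)
        d = owner(i, dst_block, dst_procs)
        schedule[(s, d)].append(i)
    return dict(schedule)


if __name__ == "__main__":
    # 12 elements, block-distributed over 3 processors (block size 4),
    # redistributed cyclically over 2 processors (block size 1).
    sched = redistribution_schedule(12, 4, 3, 1, 2)
    for (s, d), indices in sorted(sched.items()):
        print(f"source {s} -> target {d}: {indices}")
```

The scan touches every element, so its cost grows with the array size rather than with the number of processors, which is presumably the kind of overhead a closed-form representation is meant to avoid.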
Advanced Compilation Techniques in the PARADIGM Compiler for Distributed-Memory Multicomputers
, 1995
Cited by 34 (2 self)
The PARADIGM compiler project provides an automated means to parallelize programs, written in a serial programming model, for efficient execution on distributed-memory multicomputers. A previous implementation of the compiler based on the PTD representation allowed symbolic array sizes, affine loop bounds and array subscripts, and a variable number of processors, provided that arrays were single- or multi-dimensionally block distributed. The techniques presented here extend the compiler to also accept multidimensional cyclic and block-cyclic distributions within a uniform symbolic framework. These extensions demand more sophisticated symbolic manipulation capabilities. A novel aspect of our approach is to meet this demand by interfacing PARADIGM with a powerful off-the-shelf symbolic package, Mathematica(TM). This paper describes some of the Mathematica(TM) routines that perform various transformations, shows how they are invoked and used by the compiler to overcome the new challenges, and presents experimental results for code involving cyclic and block-cyclic arrays as evidence of the feasibility of the approach.
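For readers unfamiliar with block-cyclic distributions, the following Python sketch shows the standard index arithmetic for one dimension with concrete integers. The compiler described above manipulates such expressions symbolically; this sketch only illustrates the mapping itself. The function names are hypothetical.

```python
def global_to_local(i, block, nprocs):
    """Return (processor, local index) of global index i under cyclic(block)."""
    blk = i // block            # index of the block containing element i
    proc = blk % nprocs         # blocks are dealt out round-robin
    local_blk = blk // nprocs   # number of earlier blocks held by that processor
    return proc, local_blk * block + i % block


def local_to_global(proc, loc, block, nprocs):
    """Inverse mapping: global index of local element loc on processor proc."""
    local_blk, offset = divmod(loc, block)
    return (local_blk * nprocs + proc) * block + offset


if __name__ == "__main__":
    # cyclic(2) over 3 processors: blocks [0,1] [2,3] [4,5] [6,7] ...
    for i in range(8):
        print(i, global_to_local(i, 2, 3))
```

Setting `block` to 1 gives a purely cyclic distribution, and setting it to the array length divided by the processor count (rounded up) gives a block distribution, so the same two functions cover all three cases.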
A Convex Programming Approach for Exploiting Data and Functional Parallelism on Distributed Memory Multicomputers
, 1994
Cited by 29 (8 self)
Compilers have focussed on the exploitation of one of functional or data parallelism in the past. The PARADIGM compiler project at the University of Illinois is among the first to incorporate techniques for simultaneous exploitation of both. The work in this paper describes the techniques used in the PARADIGM compiler and analyzes the optimality of these techniques. It is the first of its kind to use realistic cost models and includes data transfer costs which all previous researchers have neglected. Preliminary results on the CM-5 show the efficacy of our methods and the significant advantages of using functional and data parallelism together for execution of real applications. 1. INTRODUCTION Distributed memory multicomputers such as the Intel Paragon, the IBM SP-1 and the Thinking Machines CM-5 offer significant advantages over shared memory multiprocessors in terms of cost and scalability. Unfortunately, to extract all that computational power from these machines, users have to write ef...
A Global Communication Optimization Technique Based on Data-Flow Analysis and Linear Algebra
, 1998
Cited by 14 (1 self)
Reducing communication overhead is extremely important in distributed-memory message-passing architectures. In this paper, we present a technique to improve communication that considers data access patterns of the entire program. Our approach is based on a combination of traditional data-flow analysis and a linear algebra framework, and works on structured programs with conditional statements and nested loops but without arbitrary goto statements. The distinctive features of the solution are the accuracy in keeping communication set information, support for general alignments and distributions including block-cyclic distributions and the ability to simulate some of the previous approaches with suitable modifications. We also show how optimizations such as message vectorization, message coalescing and redundancy elimination are supported by our framework. Experimental results on several benchmarks show that our technique is effective in reducing the number of messages (an average of 32%...
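The optimizations named in this abstract can be illustrated with a toy bookkeeping example in Python. This is not the data-flow and linear algebra framework of the paper. It assumes a block-distributed one-dimensional array `A` and a loop in which each processor reads `A[i + offset]` for every `i` it owns; all function names are hypothetical.

```python
from collections import defaultdict


def remote_accesses(me, block, nprocs, offsets):
    """Yield (offset, source processor, global index) for each remote read
    made by processor `me` when it evaluates A[i + offset] for every i it owns."""
    n = block * nprocs
    for i in range(me * block, (me + 1) * block):
        for off in offsets:
            j = i + off
            if 0 <= j < n and j // block != me:
                yield off, j // block, j


def vectorize(accesses):
    """Message vectorization: hoist element messages out of the loop,
    giving one message per (reference, source processor)."""
    msgs = defaultdict(list)
    for off, src, j in accesses:
        msgs[(off, src)].append(j)
    return dict(msgs)


def coalesce(vector_msgs):
    """Message coalescing: merge the messages of different references to the
    same array, sending each overlapping element only once."""
    msgs = defaultdict(set)
    for (_, src), indices in vector_msgs.items():
        msgs[src].update(indices)
    return {src: sorted(ix) for src, ix in msgs.items()}


if __name__ == "__main__":
    accesses = list(remote_accesses(me=0, block=4, nprocs=2, offsets=(1, 2)))
    vec = vectorize(accesses)
    print(len(accesses), "element messages ->", len(vec), "vector messages ->",
          len(coalesce(vec)), "coalesced message")
```

In this small case, processor 0 would issue three element messages without optimization, two after vectorization, and one after coalescing, and the shared element `A[4]` is transferred once instead of twice.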
A Framework for Exploiting Data and Functional Parallelism on Distributed Memory Multicomputers
, 1994
Cited by 9 (2 self)
Recent research efforts have shown the benefits of integrating functional and data parallelism over using either pure data parallelism or pure functional parallelism. The work in this paper presents a theoretical framework for deciding on a good execution strategy for a given program based on the available functional and data parallelism in the program. The framework is based on assumptions about the form of computation and communication cost functions for multicomputer systems. We present mathematical functions for these costs and show that these functions are realistic. The framework also requires specification of the available functional and data parallelism for a given problem. For this purpose, we have developed a graphical programming tool. Currently, we have tested our approach using three benchmark programs on the Thinking Machines CM-5 and Intel Paragon. Results presented show that the approach is very effective and can provide a two- to three-fold increase in speedups over ap...
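The trade-off between pure data parallelism and a mix of functional and data parallelism can be sketched with a small Python example. The abstract does not give the form of its cost functions, so the Amdahl-style model below is purely an assumption for illustration, as are the function names. Data transfer costs between tasks are ignored here for brevity.

```python
def run_time(serial_time, serial_fraction, p):
    """Assumed Amdahl-style cost of one task on p processors (illustrative only)."""
    return serial_time * (serial_fraction + (1.0 - serial_fraction) / p)


def pure_data_parallel(tasks, nprocs):
    """Run the tasks one after another, each using all processors."""
    return sum(run_time(t, f, nprocs) for t, f in tasks)


def best_mixed_split(task_a, task_b, nprocs):
    """Run two independent tasks side by side on disjoint processor groups.

    Returns (finish time, processors given to task_a) for the best split.
    """
    return min(
        (max(run_time(*task_a, p), run_time(*task_b, nprocs - p)), p)
        for p in range(1, nprocs)
    )


if __name__ == "__main__":
    tasks = [(100.0, 0.2), (100.0, 0.2)]   # (serial time, serial fraction)
    print("pure data parallel:", pure_data_parallel(tasks, 16))
    print("best mixed split:  ", best_mixed_split(tasks[0], tasks[1], 16))
```

Under these assumed costs, two identical tasks on 16 processors finish in about 50 time units when run in sequence on all processors and in about 30 when each runs on 8 processors, because each task scales poorly beyond a few processors.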
Compiler and Run-Time Support for Irregular Computations
, 1995
Cited by 7 (1 self)
There are many important applications in computational fluid dynamics, circuit simulation and structural analysis that can be more accurately modeled using iterations on unstructured grids. In these problems, regular compiler analysis for Massively Parallel Processors (MPP) with distributed address space fails because communication can only be determined at run-time. However, in many of these applications the communication pattern repeats for every iteration. Therefore, equivalent optimizations to the regular case can be achieved with a combination of run-time support (RTS) and compiler analysis.
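One well-known way to exploit a communication pattern that repeats every iteration is an inspector/executor scheme. The abstract does not say that this is the exact mechanism used, so the Python sketch below is an assumption, with hypothetical names throughout. The `fetch` callback stands in for whatever message-passing call a real run-time system would provide.

```python
def inspector(my_indices, owner, me):
    """Scan the indirection array once and group remote indices by owner."""
    schedule = {}
    for j in my_indices:
        p = owner(j)
        if p != me:
            schedule.setdefault(p, set()).add(j)
    return {p: sorted(ix) for p, ix in schedule.items()}


def executor(schedule, fetch, local, my_indices, steps):
    """Reuse the precomputed schedule in every iteration.

    fetch(p, indices) is a stand-in for one gather message from processor p.
    """
    results = []
    for _ in range(steps):
        values = dict(local)
        for p, indices in schedule.items():
            values.update(zip(indices, fetch(p, indices)))  # one message per owner
        results.append(sum(values[j] for j in my_indices))
    return results


if __name__ == "__main__":
    x = [10 * j for j in range(12)]          # global data, 4 elements per processor

    def fetch(p, indices):                   # simulated remote gather
        return [x[j] for j in indices]

    my_indices = [0, 5, 2, 5, 9, 6]          # irregular accesses made by processor 0
    sched = inspector(my_indices, lambda j: j // 4, me=0)
    local = {j: x[j] for j in range(4)}
    print(sched, executor(sched, fetch, local, my_indices, steps=3))
```

The cost of the inspector is paid once, while each iteration of the executor sends a single gather request per owning processor, regardless of how many times an index such as 5 is referenced.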
PARADIGM (version 2.0): A New HPF Compilation System
- In Proc. 1999 International Parallel Processing Symposium (IPPS'99)
, 1999
Cited by 3 (1 self)
In this paper, we present sample performance figures for a new linear algebra-based compilation framework implemented in a research HPF compiler called PARADIGM. The metrics considered include compilation times, execution times, and communication costs. We compare all of these metrics against commercial, industrial-strength compilers such as pghpf (v 2.2) and xlhpf (v 1.01) and show the superior benefits of PARADIGM (v 2.0) in all of the metrics used. We also demonstrate how robustly our framework performs in the presence of arbitrary alignments and distributions. The framework’s symbolic manipulation capability is derived from an off-the-shelf commercial symbolic analysis package called Mathematica. Measured metrics for a few popular benchmarks such as Automatic Differentiation and Integration (ADI), Euler Fluxes, TOMCATV and 2-D Explicit Hydrodynamics (EXPL) are presented.
Optimizing Communication Using Global Dataflow Analysis
, 1997
Cited by 2 (1 self)
In distributed-memory message-passing architectures, reducing communication cost is extremely important. In this paper, we present a technique to improve communication globally. Our approach is based on a combination of a linear algebra framework and dataflow analysis, and can take arbitrary control flow into account. The distinctive features of the algorithm are its accuracy in keeping communication set information, its support for general alignments and distributions including block-cyclic distributions, and its capability of simulating some of the previous approaches by appropriate modifications. The method is currently being implemented in the PARADIGM compiler. We show how optimizations such as message vectorization, message coalescing, and redundancy elimination can be supported by our new framework. Experimental results on an IBM SP-2 show that our technique is effective in reducing both the number and the volume of communications.
Communication Generation for Data-Parallel Languages
, 1996
Cited by 1 (0 self)
Data-parallel languages allow programmers to use the familiar machine-independent programming style to develop programs for multiprocessor systems. These languages relieve users of the tedious task of inserting interprocessor communication and delegate this crucial and error-prone task to the compilers for the languages. Since remote access in hierarchical multiprocessor systems is orders of magnitude slower than access to a processor's local memory, interprocessor communication introduces significant overheads to the total execution time. The success of data-parallel languages depends heavily on the compiler's ability to reduce the communication overhead. This dissertation describes novel techniques for communication generation. It covers issues related to communication analysis, placement, and optimization. The techniques have been implemented in the Rice Fortran D95 research compiler, a High Performance Fortran (HPF) compiler being developed at Rice University. A major cont...

