Communication Optimizations for Irregular Scientific Computations on Distributed Memory Architectures
- Journal of Parallel and Distributed Computing
, 1993
Cited by 134 (17 self)
This paper describes a number of optimizations that can be used to support the efficient execution of irregular problems on distributed memory parallel machines. These primitives (1) coordinate interprocessor data movement, (2) manage the storage of, and access to, copies of off-processor data, (3) minimize interprocessor communication requirements and (4) support a shared name space. We present a detailed performance and scalability analysis of the communication primitives. This performance and scalability analysis is carried out using a workload generator, kernels from real applications and a large unstructured adaptive application (the molecular dynamics code CHARMM). 1 Introduction Over the past few years we have developed a methodology to produce efficient distributed memory code for sparse and unstructured problems in which array accesses are made through a level of indirection. In such problems the dependency structure is determined by variable values known only at runtime. In...
Efficient Support for Irregular Applications on Distributed-Memory Machines
, 1995
Cited by 81 (12 self)
Irregular computation problems underlie many important scientific applications. Although these problems are computationally expensive, and so would seem appropriate for parallel machines, their irregular and unpredictable run-time behavior makes this type of parallel program difficult to write and adversely affects run-time performance. This paper explores three issues -- partitioning, mutual exclusion, and data transfer -- crucial to the efficient execution of irregular problems on distributed-memory machines. Unlike previous work, we studied the same programs running in three alternative systems on the same hardware base (a Thinking Machines CM-5): the CHAOS irregular application library, Transparent Shared Memory (TSM), and eXtensible Shared Memory (XSM). CHAOS and XSM performed equivalently for all three applications. Both systems were faster than TSM, by margins ranging from modest (13%) to large (991%).
Distributed memory compiler design for sparse problems
- IEEE Transactions on Computers
, 1995
Cited by 66 (10 self)
This paper addresses the issue of compiling concurrent loop nests in the presence of complicated array references and irregularly distributed arrays. Arrays accessed within loops may contain accesses that make it impossible to precisely determine the reference pattern at compile time. This paper proposes a run time support mechanism that is used effectively by a compiler to generate efficient code in these situations. The compiler accepts as input a Fortran 77 program enhanced with specifications for distributing data, and outputs a message passing program that runs on the nodes of a distributed memory machine. The runtime support for the compiler consists of a library of primitives designed to support irregular patterns of distributed array accesses and irregularly distributed array partitions. A variety of performance results on the Intel iPSC/860 are presented.
A Framework for Optimizing Parallel I/O
, 1994
Cited by 58 (8 self)
There has been a great deal of recent interest in parallel I/O. This paper discusses issues in the design and implementation of a portable I/O library designed to optimize the performance of multiprocessor architectures that include multiple disks or disk arrays. The major emphasis of the paper is on optimizations that are made possible by the use of collective I/O, so that I/O requests for multiple processors can be combined to improve performance. Performance measurements from benchmarking our implementation of an I/O library that currently performs collective local optimizations, called Jovian, on three application templates are also presented.
Runtime support and compilation methods for user-specified irregular data distributions
- IEEE Transactions on Parallel and Distributed Systems
, 1995
Cited by 55 (11 self)
This paper describes two new ideas by which a High Performance Fortran compiler can deal with irregular computations effectively. The first mechanism invokes a user specified mapping procedure via a set of proposed compiler directives. The directives allow use of program arrays to describe graph connectivity, spatial location of array elements, and computational load. The second mechanism is a conservative method for compiling irregular loops in which dependence arises only due to reduction operations. This mechanism in many cases enables a compiler to recognize that it is possible to reuse previously computed information from inspectors (e.g., communication schedules, loop iteration partitions, and information that associates off-processor data copies with on-processor buffer locations). This paper also presents performance results for these mechanisms from a Fortran 90D compiler implementation.
An Integrated Runtime and Compile-time Approach for Parallelizing Structured and Block Structured Applications
- IEEE Transactions on Parallel and Distributed Systems
, 1995
Cited by 54 (12 self)
Scientific and engineering applications often involve structured meshes. These meshes may be nested (for multigrid codes) and/or irregularly coupled (called multiblock or irregularly coupled regular mesh problems). In this paper, we present a combined runtime and compile-time approach for parallelizing these applications on distributed memory parallel machines in an efficient and machine-independent fashion. We have designed and implemented a runtime library which can be used to port these applications on distributed memory machines. The library is currently implemented on several different systems. To further ease the task of application programmers, we have developed methods for integrating this runtime library with compilers for HPF-like parallel programming languages. We discuss how we have integrated this runtime library with the Fortran 90D compiler being developed at Syracuse University. We present experimental results to demonstrate the efficacy of our approach. We have exper...
Interprocedural Symbolic Analysis
, 1994
Cited by 48 (1 self)
Compiling for efficient execution on advanced computer architectures requires extensive program analysis and transformation. Most compilers limit their analysis to simple phenomena within single procedures, limiting effective optimization of modular codes and making the programmer's job harder. We present methods for analyzing array side effects and for comparing nonconstant values computed in the same and different procedures. Regular sections, described by rectangular bounds and stride, prove as effective in describing array side effects in Linpack as more complicated summary techniques. On a set of six programs, regular section analysis of array side effects gives 0 to 39 percent reductions in array dependences at call sites, with 10 to 25 percent increases in analysis time. Symbolic analysis is essential to data dependence testing, array section analysis, and other high-level program manipulations. We give methods for building symb...
A Manual for the CHAOS Runtime Library
, 1995
Cited by 31 (9 self)
Procedures are presented that are designed to help users efficiently program irregular problems (e.g. unstructured mesh sweeps, sparse matrix codes, adaptive mesh partial differential equations solvers) on distributed memory machines. These procedures are also designed for use in compilers for distributed memory multiprocessors. The portable CHAOS procedures are designed to support dynamic data distributions and to automatically generate send and receive messages by capturing communications patterns at runtime.
Interprocedural Partial Redundancy Elimination and Its Application To Distributed Memory Compilation
- University of Maryland
, 1995
Cited by 28 (7 self)
Partial Redundancy Elimination (PRE) is a general scheme for suppressing partial redundancies which encompasses traditional optimizations like loop-invariant code motion and redundant code elimination. In this paper we address the problem of performing this optimization interprocedurally. We use interprocedural partial redundancy elimination for placement of communication and communication preprocessing statements while compiling for distributed memory parallel machines. 1 Introduction Partial Redundancy Elimination (PRE) is a well known technique for optimizing code by suppressing partially redundant computations. It encompasses traditional optimizations like invariant code motion and redundant computation elimination. It is widely used in optimizing compilers for performing common subexpression elimination and strength reduction. More recently, it has been used for more complex code placement tasks like placement of communication statements while compiling for parallel machines (...
Construction of Thinned Gated Single-Assignment Form
- In Proc. 6th Workshop on Programming Languages and Compilers for Parallel Computing
, 1993
Cited by 27 (1 self)
Analysis of symbolic expressions benefits from a suitable program representation. We show how to build thinned gated single-assignment (TGSA) form, a value-oriented program representation which is more complete than standard SSA form, defined on all reducible programs, and better for representing symbolic expressions than program dependence graphs or original GSA form. We present practical algorithms for constructing thinned GSA form from the control dependence graph and SSA form. Extensive experiments on large Fortran programs show these methods to take linear time and space in practice. Our implementation of value numbering on TGSA form drives scalar symbolic analysis in the ParaScope programming environment. 1 Introduction Analysis of non-constant values yields significant benefits in a parallelizing compiler. The design of a symbolic analyzer requires careful choice of a program representation, that these benefits may be gained without using excessive time and space. In symbolic ...

