Communication Optimizations for Irregular Scientific Computations on Distributed Memory Architectures
- Journal of Parallel and Distributed Computing, 1993
- Cited by 134 (17 self)
This paper describes a number of optimizations that can be used to support the efficient execution of irregular problems on distributed memory parallel machines. These primitives (1) coordinate interprocessor data movement, (2) manage the storage of, and access to, copies of off-processor data, (3) minimize interprocessor communication requirements and (4) support a shared name space. We present a detailed performance and scalability analysis of the communication primitives. This performance and scalability analysis is carried out using a workload generator, kernels from real applications and a large unstructured adaptive application (the molecular dynamics code CHARMM). 1 Introduction Over the past few years we have developed a methodology to produce efficient distributed memory code for sparse and unstructured problems in which array accesses are made through a level of indirection. In such problems the dependency structure is determined by variable values known only at runtime. In...
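The indirection-based access pattern described in this abstract can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions: the block distribution and the names `block_owner`, `inspector`, `gather` and `executor` are hypothetical and are not taken from the paper or from any library. It shows one way to coordinate data movement, keep copies of off-processor data in a ghost buffer, and fetch each remote element only once.

```python
from collections import defaultdict


def block_owner(global_index, block_size):
    """Owner rank of a global index under an assumed block distribution."""
    return global_index // block_size


def inspector(indirection, my_rank, block_size):
    """Scan the indirection array once and build a gather schedule.

    Returns (schedule, slot_of). schedule maps an owner rank to the sorted
    list of distinct off-processor global indices to fetch from it; slot_of
    maps each such global index to a position in the local ghost buffer.
    """
    needed = defaultdict(set)
    for g in indirection:
        owner = block_owner(g, block_size)
        if owner != my_rank:
            needed[owner].add(g)  # duplicates collapse: fetch each element once
    schedule = {owner: sorted(indices) for owner, indices in needed.items()}
    slot_of = {}
    for owner in sorted(schedule):
        for g in schedule[owner]:
            slot_of[g] = len(slot_of)
    return schedule, slot_of


def gather(schedule, slot_of, distributed_x, block_size):
    """Simulated communication: one 'message' per owner fills the ghost buffer."""
    ghost = [None] * len(slot_of)
    for owner, indices in schedule.items():
        for g in indices:
            ghost[slot_of[g]] = distributed_x[owner][g - owner * block_size]
    return ghost


def executor(indirection, local_x, ghost, slot_of, my_rank, block_size):
    """Evaluate x[ia[i]] for every i using only local and ghost data."""
    result = []
    for g in indirection:
        if block_owner(g, block_size) == my_rank:
            result.append(local_x[g - my_rank * block_size])
        else:
            result.append(ghost[slot_of[g]])
    return result


# Two simulated processors holding four elements each (global indices 0..7).
distributed_x = {0: [10, 11, 12, 13], 1: [14, 15, 16, 17]}
ia = [0, 5, 5, 7, 2]  # indirection array as seen on rank 0
schedule, slot_of = inspector(ia, my_rank=0, block_size=4)
ghost = gather(schedule, slot_of, distributed_x, block_size=4)
values = executor(ia, distributed_x[0], ghost, slot_of, my_rank=0, block_size=4)
```

In this sketch the inspector runs once, and the executor can then be repeated for as long as the indirection array stays the same.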
A Practical Data Flow Framework for Array Reference Analysis and its Use in Optimizations
- In ACM SIGPLAN'93 Conf. on Prog. Lang. Design and Implementation, 1993
- Cited by 55 (2 self)
Data flow analysis techniques have traditionally been restricted to the analysis of scalar variables. This restriction, however, imposes a limitation on the kinds of optimizations that can be performed in loops containing array references. We present a data flow framework for array reference analysis that provides the information needed in various optimizations targeted at sequential or fine-grained parallel architectures. The framework extends the traditional scalar framework by incorporating iteration distance values into the analysis to qualify the computed data flow solution during the fixed point iteration. Analyses phrased in this framework are capable of discovering recurrent access patterns among array references that evolve during the execution of a loop. The framework is practical in that the fixed point solution requires at most three passes over the body of structured loops. Applications of our framework are discussed for register allocation, load/store optimizations, and controlled loop unrolling.
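As a rough illustration of the load/store optimizations mentioned in this abstract, consider a loop in which the value written to `a[i]` is read again as `a[i-1]` one iteration later (an iteration distance of 1). The sketch below is not the paper's framework; it only shows, by hand, the kind of transformation that such distance information would justify. The function names and the load counter are hypothetical and exist only to make the saving visible.

```python
def recurrence_naive(a, b):
    """a[i] = a[i-1] + b[i], reloading a[i-1] from the array every iteration."""
    a = list(a)
    loads_of_a = 0
    for i in range(1, len(a)):
        previous = a[i - 1]  # array load in every iteration
        loads_of_a += 1
        a[i] = previous + b[i]
    return a, loads_of_a


def recurrence_scalar_replaced(a, b):
    """Same loop, with the distance-1 reuse carried in a scalar."""
    a = list(a)
    loads_of_a = 0
    if not a:
        return a, loads_of_a
    carried = a[0]  # the only array load, done before the loop
    loads_of_a += 1
    for i in range(1, len(a)):
        carried = carried + b[i]  # value written now is reused next iteration
        a[i] = carried
    return a, loads_of_a


naive_result, naive_loads = recurrence_naive([1, 0, 0, 0], [0, 2, 3, 4])
fast_result, fast_loads = recurrence_scalar_replaced([1, 0, 0, 0], [0, 2, 3, 4])
```

Both versions compute the same array; the second performs a single load of `a` regardless of the trip count.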
Runtime support and compilation methods for user-specified irregular data distributions
- IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 1995
- Cited by 55 (11 self)
This paper describes two new ideas by which a High Performance Fortran compiler can deal with irregular computations effectively. The first mechanism invokes a user-specified mapping procedure via a set of proposed compiler directives. The directives allow use of program arrays to describe graph connectivity, spatial location of array elements, and computational load. The second mechanism is a conservative method for compiling irregular loops in which dependence arises only due to reduction operations. This mechanism in many cases enables a compiler to recognize that it is possible to reuse previously computed information from inspectors (e.g., communication schedules, loop iteration partitions, and information that associates off-processor data copies with on-processor buffer locations). This paper also presents performance results for these mechanisms from a Fortran 90D compiler implementation.
An Integrated Runtime and Compile-time Approach for Parallelizing Structured and Block Structured Applications
- IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 1995
- Cited by 54 (12 self)
Scientific and engineering applications often involve structured meshes. These meshes may be nested (for multigrid codes) and/or irregularly coupled (called multiblock or irregularly coupled regular mesh problems). In this paper, we present a combined runtime and compile-time approach for parallelizing these applications on distributed memory parallel machines in an efficient and machine-independent fashion. We have designed and implemented a runtime library which can be used to port these applications on distributed memory machines. The library is currently implemented on several different systems. To further ease the task of application programmers, we have developed methods for integrating this runtime library with compilers for HPF-like parallel programming languages. We discuss how we have integrated this runtime library with the Fortran 90D compiler being developed at Syracuse University. We present experimental results to demonstrate the efficacy of our approach. We have exper...
Runtime Compilation Techniques for Data Partitioning and Communication Schedule Reuse
- PROCEEDINGS OF THE 1993 ACM/IEEE CONFERENCE ON SUPERCOMPUTING, 1993
- Cited by 38 (2 self)
In this paper, we describe two new ideas by which an HPF compiler can deal with irregular computations effectively. The first mechanism invokes a user-specified mapping procedure via a set of compiler directives. The directives allow the user to use program arrays to describe graph connectivity, spatial location of array elements and computational load. The second is a simple conservative method that in many cases enables a compiler to recognize that it is possible to reuse previously computed results from inspectors (e.g. communication schedules, loop iteration partitions, information that associates off-processor data copies with on-processor buffer locations). We present performance results for these mechanisms from a Fortran 90D compiler implementation.
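One conservative way to decide whether an inspector result can be reused is to record every write to the indirection array and rebuild the schedule whenever any write has happened since the schedule was computed. The sketch below assumes a modification counter for this purpose; the abstract does not say how the paper's method tracks changes, so `TrackedArray`, `ScheduleCache` and `off_processor_indices` are hypothetical names used only for illustration.

```python
class TrackedArray:
    """List wrapper that bumps a version counter on every element write."""

    def __init__(self, values):
        self._values = list(values)
        self.version = 0

    def __getitem__(self, index):
        return self._values[index]

    def __setitem__(self, index, value):
        self._values[index] = value
        self.version += 1  # conservative: counts even writes of the same value

    def __iter__(self):
        return iter(self._values)

    def __len__(self):
        return len(self._values)


class ScheduleCache:
    """Reruns the inspector only when the indirection array may have changed."""

    def __init__(self, build_schedule):
        self._build = build_schedule
        self._source = None
        self._seen_version = None
        self._cached = None
        self.inspector_runs = 0

    def get(self, indirection):
        stale = (
            self._source is not indirection
            or self._seen_version != indirection.version
        )
        if stale:
            self._cached = self._build(indirection)
            self._source = indirection
            self._seen_version = indirection.version
            self.inspector_runs += 1
        return self._cached


def off_processor_indices(indirection, first_remote_index=4):
    """Toy inspector: distinct global indices at or above an assumed boundary."""
    return sorted({g for g in indirection if g >= first_remote_index})


cache = ScheduleCache(off_processor_indices)
ia = TrackedArray([0, 5, 5, 7])
first = cache.get(ia)
second = cache.get(ia)  # no writes since the last call: schedule is reused
runs_before_write = cache.inspector_runs
ia[0] = 6  # any write invalidates the cached schedule
third = cache.get(ia)
runs_after_write = cache.inspector_runs
```

The check can trigger unnecessary rebuilds (for example when a write stores the value already present), but in this sketch it never reuses a schedule after the array has been written.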
Compiler and Runtime Support for Structured and Block Structured Applications
- IN PROCEEDINGS SUPERCOMPUTING '93, 1993
- Cited by 35 (15 self)
Scientific and engineering applications often involve structured meshes. These meshes may be nested (for multigrid or adaptive codes) and/or irregularly coupled (called Irregularly Coupled Regular Meshes). We have designed and implemented a runtime library for parallelizing this general class of applications on distributed memory parallel machines in an efficient and machine-independent manner. In this paper we present how this runtime library can be integrated with compilers for High Performance Fortran (HPF) style parallel programming languages. We discuss how we have integrated this runtime library with the Fortran 90D compiler being developed at Syracuse University and provide experimental data on a block structured Navier-Stokes solver template and a small multigrid example parallelized using this compiler and run on an Intel iPSC/860. We show that the compiler-parallelized code performs within 20% of the code parallelized by inserting calls to the runtime library manually.
A Unified Framework for Optimizing Communication in Data-Parallel Programs
- IEEE Transactions on Parallel and Distributed Systems, 1996
- Cited by 34 (1 self)
This paper presents a framework, based on global array data-flow analysis, to reduce communication costs in a program being compiled for a distributed memory machine. We introduce the available section descriptor, a novel representation of communication involving array sections. This representation allows us to apply techniques for partial redundancy elimination to obtain powerful communication optimizations. With a single framework, we are able to capture optimizations like (i) vectorizing communication, (ii) eliminating communication that is redundant on any control flow path, (iii) reducing the amount of data being communicated, (iv) reducing the number of processors to which data must be communicated, and (v) moving communication earlier to hide latency, and to subsume previous communication. We show that the bidirectional problem of eliminating partial redundancies can be decomposed into simpler unidirectional problems even in the context of an array section representation, w...
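Two of the optimizations listed in this abstract, vectorizing communication and reducing the amount of data communicated, can be illustrated by counting messages in a toy model. This sketch does not implement the paper's data-flow framework or its available section descriptors; the block distribution and the function names are assumptions made only for the example.

```python
def owner(global_index, block_size):
    """Owner rank of a global index under an assumed block distribution."""
    return global_index // block_size


def messages_per_reference(accesses, my_rank, block_size):
    """Unoptimized model: one message for every off-processor reference."""
    return sum(1 for g in accesses if owner(g, block_size) != my_rank)


def vectorized_messages(accesses, my_rank, block_size):
    """Hoisted model: one message per owner, carrying each element once."""
    sections = {}
    for g in accesses:
        rank = owner(g, block_size)
        if rank != my_rank:
            sections.setdefault(rank, set()).add(g)
    return {rank: sorted(indices) for rank, indices in sections.items()}


# Global indices referenced by rank 0, with blocks of four elements per rank.
accesses = [4, 5, 6, 7, 4, 5, 1, 9]
unoptimized_count = messages_per_reference(accesses, my_rank=0, block_size=4)
messages = vectorized_messages(accesses, my_rank=0, block_size=4)
```

In this model the seven per-reference messages shrink to two, and the repeated references to elements 4 and 5 are sent only once.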
A Unified Data-Flow Framework for Optimizing Communication
- In Proceedings of the Seventh Workshop on Languages and Compilers for Parallel Computing, 1996
- Cited by 33 (2 self)
This paper presents a framework, based on global array dataflow analysis, to reduce communication costs in a program being compiled for a distributed memory machine. This framework applies techniques for partial redundancy elimination to available section descriptors, a novel representation of communication involving array sections. With a single framework, we are able to capture numerous optimizations like (i) vectorizing communication, (ii) eliminating communication that is redundant on any control flow path, (iii) reducing the amount of data being communicated, (iv) reducing the number of processors to which data must be communicated, and (v) moving communication earlier to hide latency, and to subsume previous communication. Further, the explicit representation of availability of data in our framework allows processors other than the owners also to send values needed by other processors, leading to additional opportunities for optimizing communication. Another contr...
A Manual for the CHAOS Runtime Library
- 1995
- Cited by 31 (9 self)
Procedures are presented that are designed to help users efficiently program irregular problems (e.g. unstructured mesh sweeps, sparse matrix codes, adaptive mesh partial differential equation solvers) on distributed memory machines. These procedures are also designed for use in compilers for distributed memory multiprocessors. The portable CHAOS procedures are designed to support dynamic data distributions and to automatically generate send and receive messages by capturing communication patterns at runtime.
Improving the Performance of DSM Systems via Compiler Involvement
- In Proceedings of Supercomputing '94, 1994
- Cited by 26 (1 self)
Distributed shared memory (DSM) systems provide an illusion of shared memory on distributed memory systems such as workstation networks and some parallel computers such as the Cray T3D and Convex SPP-1. This illusion is provided by enhancements to hardware, software, or a combination thereof. On these systems, users can write programs using a shared memory style of programming instead of message passing, which is tedious and error prone. Our experience with one such system, TreadMarks, has shown that a large class of applications does not perform well on these systems. TreadMarks is a software distributed shared memory system designed by Rice University researchers to run on networks of workstations and massively parallel computers. Due to the distributed nature of the memory system, shared memory synchronization primitives such as locks and barriers often cause significant amounts of communication. We have provided a set of powerful primitives that will alleviate the problems with...

