The Implementation and Evaluation of Fusion and Contraction in Array Languages
, 1998
Cited by 38 (9 self)
Array languages such as Fortran 90, HPF and ZPL have many benefits in simplifying array-based computations and expressing data parallelism. However, they can suffer large performance penalties because they introduce intermediate arrays---both at the source level and during the compilation process---which increase memory usage and pollute the cache. Most compilers address this problem by simply scalarizing the array language and relying on a scalar language compiler to perform loop fusion and array contraction. We instead show that there are advantages to performing a form of loop fusion and array contraction at the array level. This paper describes this approach and explains its advantages. Experimental results show that our scheme typically yields runtime improvements of greater than 20% and sometimes up to 400%. In addition, it yields superior memory use when compared against commercial compilers and exhibits comparable memory use when compared with scalar languages. We also explore ...
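As a rough illustration of the problem this abstract describes, the sketch below contrasts a computation that materializes an intermediate array with a fused version in which that intermediate is contracted to a scalar. The function names and the example computation are assumptions made here for illustration; the paper performs its transformation at the array level inside the compiler, which this hand-written scalar version does not reproduce.

```python
def unfused(a, b, c):
    """Two separate element-wise passes with a full intermediate array t."""
    n = len(a)
    # First array statement: t = a + b (t occupies n elements of memory).
    t = [a[i] + b[i] for i in range(n)]
    # Second array statement: out = t * c (re-reads all of t).
    return [t[i] * c[i] for i in range(n)]


def fused_contracted(a, b, c):
    """Both statements fused into one loop; t is contracted to a scalar."""
    n = len(a)
    out = [0] * n
    for i in range(n):
        t = a[i] + b[i]    # the intermediate now lives in a single scalar
        out[i] = t * c[i]  # consumed immediately, so no array t is allocated
    return out


if __name__ == "__main__":
    a, b, c = [1, 2, 3], [4, 5, 6], [2, 2, 2]
    print(unfused(a, b, c))
    print(fused_contracted(a, b, c))
```

Both functions compute the same result; the fused version simply avoids allocating and re-traversing the intermediate array.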
ZPL: A Machine Independent Programming Language for Parallel Computers
- IEEE Transactions on Software Engineering
, 2000
Cited by 19 (2 self)
The goal of producing architecture-independent parallel programs is complicated by the competing need for high performance. The ZPL programming language achieves both goals by building upon an abstract parallel machine and by providing programming constructs that allow the programmer to "see" this underlying machine. This paper describes ZPL and provides a comprehensive evaluation of the language with respect to its goals of performance, portability, and programming convenience. In particular, we describe ZPL's machine-independent performance model, describe the programming benefits of ZPL's region-based constructs, summarize the compilation benefits of the language's high-level semantics, and summarize empirical evidence that ZPL has achieved both high performance and portability on diverse machines such as the IBM SP-2, Cray T3E, and SGI Power Challenge. Index Terms: portable, efficient, parallel programming language. This research was supported by DARPA Grant F30602-97-1-0152, a grant of HPC time from the Arctic Region Supercomputing Center, NSF Grant CCR-9707056, and ONR grant N00014-99-1-0402.
The Design and Implementation of a Region-Based Parallel Language
, 2001
Cited by 16 (5 self)
This is to certify that I have examined this copy of a doctoral dissertation by
High-Level Programming Language Abstractions for Advanced and Dynamic Parallel Computations
- DISSERTATION
, 2005
Cited by 8 (2 self)
This thesis presents a combination of p-independent and p-dependent extensions to ZPL.
A Compiler Abstraction for Machine Independent Parallel Communication Generation
- In Tenth International Workshop on Languages and Compilers for Parallel Computing
, 1998
Cited by 6 (2 self)
In this paper, we consider the problem of generating efficient, portable communication in compilers for parallel languages. We introduce the Ironman abstraction, which separates data transfer from its implementing communication paradigm. This is done by annotating the compiler-generated code with legal ranges for data transfer in the form of calls to the Ironman library. On each target platform, these library calls are instantiated to perform the transfer using the machine's optimal communication paradigm. We confirm arguments against generating message passing calls in the compiler based on our experiences using PVM and MPI --- specifically, the observation that these interfaces do not perform well on machines that are not built with a message passing communication paradigm. The overhead for using Ironman, as opposed to a machine-specific back end, is demonstrated to be negligible. We give performance results for a number of benchmarks running with PVM, MPI, and machin...
Relative Debugging for Data-Parallel Programs: A ZPL Case Study
, 2000
Cited by 3 (1 self)
In this article, we describe Guard99, a new implementation of relative debugging for parallel platforms that lets a programmer locate errors by observing the divergence in key data structures as two programs are simultaneously executed. The implementation uses a client-server, machine-independent architecture, so it can debug
Achieving Robust Performance in Parallel Programming Languages
, 2001
Cited by 1 (0 self)
Despite more than two decades of research effort, the question remains: how can we realize the potential of large-scale parallel machines? It can be done now, but only at great expense (i.e., development time and effort) and with limited portability, rendering the exploitation of parallelism impractical for most users. Advanced-ZPL (A-ZPL) is a parallel programming language intended to address this problem. Its design was guided by a predictive performance model that clearly defines the role of the programmer and the compiler, called the programmer-compiler separation. The former is responsible for abstract parallel and sequential algorithmic issues, while the latter manages the tractable elements of mapping abstract representations to a particular machine. This dissertation evaluates the design and implementation of A-ZPL in the light of this design criterion. Specifically, we examine two aspects of the language and the compiler implications: efficient loop generation and pipelining wavefront computations. We find the language is highly effective both relatively and absolutely as a direct consequence of considering the programmer-compiler separation.
Quantifying the Effects of Communication Optimizations
- In Proceedings of the International Conference on Parallel Processing
, 1997
Using a specially constructed machine-independent communication optimizer that allows control over optimization selection, we quantify the performance benefit of three well-known communication optimizations: redundant communication removal, communication combination, and communication pipelining. The numbers are shown relative to the base performance of benchmark programs using the standard communication optimization of message vectorization. The effects on the number of calls to communication routines, both static and dynamic, are tabulated. We consider a variety of communication primitives including those found in Intel's NX library, PVM and the T3D's SHMEM library. The results show substantial improvement, with two combinations of optimizations being most effective.
1 Introduction
In this paper, we quantify the effectiveness of three well-known communication optimizations: redundant communication removal, communication combination, and communication pipelining. In particular, each...
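To make two of the optimizations named in this abstract more concrete, here is a toy sketch of redundant communication removal and communication combination over a simple event trace. The event format (`("send", dest, array)` and `("write", array)`) and the function names are assumptions invented for this illustration; they are not taken from the paper's optimizer, and communication pipelining is not modeled.

```python
def remove_redundant(events):
    """Drop a send if the destination already holds a current copy.

    A copy is 'current' until the array is written again locally.
    """
    valid = set()  # (dest, array) pairs whose remote copy is up to date
    out = []
    for ev in events:
        if ev[0] == "write":
            _, array = ev
            # A local write invalidates every remote copy of that array.
            valid = {(d, a) for (d, a) in valid if a != array}
            out.append(ev)
        else:
            _, dest, array = ev
            if (dest, array) not in valid:
                valid.add((dest, array))
                out.append(ev)
    return out


def combine(events):
    """Merge consecutive sends to the same destination into one message."""
    out = []
    for ev in events:
        if ev[0] == "send":
            _, dest, array = ev
            if out and out[-1][0] == "msg" and out[-1][1] == dest:
                _, _, arrays = out.pop()
                out.append(("msg", dest, arrays + (array,)))
            else:
                out.append(("msg", dest, (array,)))
        else:
            out.append(ev)
    return out


if __name__ == "__main__":
    trace = [
        ("send", 1, "A"),
        ("send", 1, "B"),
        ("send", 1, "A"),  # redundant: A has not changed since the first send
        ("write", "A"),
        ("send", 1, "A"),  # needed again: A was modified
    ]
    optimized = combine(remove_redundant(trace))
    print(optimized)
    print(sum(1 for ev in optimized if ev[0] == "msg"), "messages instead of 4")
```

In this toy trace the four original sends shrink to two messages, which mirrors the kind of reduction in communication-routine calls that the abstract says is tabulated.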

