Single Assignment C -- efficient support for high-level array operations in a functional setting
, 2003
ZPL's WYSIWYG performance model
- In Third International Workshop on High-Level Parallel Programming Models and Supportive Environments
, 1998
Cited by 23 (21 self)
ZPL is an array language designed for high performance scientific and engineering computations. Unlike other parallel languages, ZPL is founded on a machine model (CTA) distinct from any implementing hardware. The machine model, which abstracts contemporary parallel computers, makes it possible to correlate ZPL programs with machine behavior. Using this association, programmers can know approximately how code will perform on a typical parallel machine, allowing them to make informed decisions between alternative programming solutions. This paper describes ZPL's syntactic cues to the programmer which convey performance information. The what-you-see-is-what-you-get (WYSIWYG) characteristics of ZPL operations are illustrated on four machines: the Cray T3E, IBM SP-2, SGI Power Challenge and Intel Paragon. Additionally, the WYSIWYG performance model is used to evaluate two algorithms for matrix multiplication, one of which is considered to be the most scalable of portable parallel solutions. Experiments show that the performance model correctly predicts the faster solution on all four platforms for a range of problem sizes.
Portable Performance of Data Parallel Languages
- In SC97: High Performance Networking and Computing
, 1997
Cited by 19 (11 self)
A portable program executes on different platforms and yields consistent performance. With the focus on portability, this paper presents an in-depth study of the performance of three NAS benchmarks (EP, MG, FT) compiled with three commercial HPF compilers (APR, PGI, IBM) on the IBM SP2. Each benchmark is evaluated in two versions: using DO loops and using F90 constructs and/or HPF's Forall statement. Base-line comparison is provided by versions of the benchmarks written in Fortran/MPI and ZPL, a data parallel language developed at the University of Washington. While some F90/Forall programs achieve scalable performance with some compilers, the results indicate a considerable portability problem in HPF programs. Two sources for the problem are identified. First, Fortran's semantics require extensive analysis and optimization to arrive at a parallel program; therefore relying on the compiler's capability alone leads to unpredictable performance. Second, the wide differences in the par...
Factor-Join: A Unique Approach to Compiling Array Languages for Parallel Machines
- In Workshop on Languages and Compilers for Parallel Computing
, 1996
Cited by 13 (8 self)
This paper describes a new approach to compiling and optimizing array languages for parallel machines. This approach first decomposes array language operations into factors, where each factor corresponds to a different communication or computation structure. Optimizations are then achieved by combining, or joining, these factors. Because
A case study: Effects of WITH-loop-folding on the NAS Benchmark MG in SAC
- Proceedings of IFL '98, LNCS 1595
, 1999
Cited by 10 (6 self)
Sac is a functional C variant with efficient support for high-level array operations. This paper investigates the applicability of a Sac-specific optimization technique called with-loop-folding to real-world applications. The example chosen is the NAS benchmark MG, a kernel from the Numerical Aerodynamic Simulation (NAS) Program developed at NASA Ames Research Center that implements 3-dimensional multigrid relaxation. Run-time measurements show two distinct benefits of with-loop-folding. First, an overall speed-up of about 20% can be observed. Second, a hand-optimized specification and Apl-like specifications yield identical run-times, although a naive compilation that does not apply with-loop-folding leads to slowdowns of more than an order of magnitude. Furthermore, with-loop-folding makes feasible a slight variation of the algorithm that substantially simplifies the program specification and requires less memory during execution. Finally, the optimized run-times are compared against those of the original Fortran program: for different problem sizes, the code generated from the Sac program not only matches the execution times of the code generated from the Fortran program but outperforms them by about 10%.
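The general idea behind folding consecutive array operations can be illustrated as ordinary loop fusion. The sketch below is a Python analogy only: it is not Sac code and does not reproduce the Sac compiler's transformation, and the function names (`scale`, `shift`, `unfused`, `fused`) are hypothetical. It shows two element-wise operations written first as separate passes with an intermediate array, then as a single pass.

```python
def scale(xs, factor):
    """Return a new list with every element multiplied by factor."""
    return [x * factor for x in xs]


def shift(xs, offset):
    """Return a new list with offset added to every element."""
    return [x + offset for x in xs]


def unfused(xs):
    # Two traversals; scale() allocates an intermediate list that
    # shift() reads once and then discards.
    return shift(scale(xs, 2.0), 1.0)


def fused(xs):
    # One traversal and no intermediate list: the bodies of the two
    # element-wise operations are combined into a single expression.
    return [x * 2.0 + 1.0 for x in xs]
```

Both functions compute the same values; the fused form avoids one traversal and one temporary array, which is the kind of saving the abstract attributes to with-loop-folding.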
Abstractions for Portable, Scalable Parallel Programming
- IEEE Transactions on Parallel and Distributed Systems
, 1998
Cited by 8 (4 self)
In parallel programming, the need to manage communication, load imbalance, and irregularities in the computation puts substantial demands on the programmer. Key properties of the architecture, such as the number of processors and the cost of communication, must be exploited to achieve good performance, but coding these properties directly into a program compromises the portability and flexibility of the code because significant changes are then needed to port or enhance the program. We describe a parallel programming model that supports the concise, independent description of key aspects of a parallel program---including data distribution, communication, and boundary conditions---without reference to machine idiosyncrasies. The independence of such components improves portability by allowing the components of a program to be tuned i...
Design of Graph ZPL: Extensions to ZPL to Handle Irregular and Dynamic Data Structures
Cited by 1 (0 self)
On distributed memory MIMD machines, ZPL is a powerful language for expressing parallel algorithms that can be described with regular parallel arrays. But this is not enough for many data parallel applications that require irregular or dynamic data structures. This paper is a report on the author's Graph ZPL project, whose goals have been to select and design new language features to be added to ZPL that would enable handling such irregular and dynamic data structures. We first analyzed the successful design decisions made in ZPL and determined a set of design principles to follow in our design. Then we considered several data-parallel applications and determined a set of capabilities that needed to be added to ZPL. We design in detail language extensions to ZPL to provide these capabilities. They include a graph data type, irregular block partitioning for parallel arrays, and operations for dynamic repartitioning of graphs and parallel arrays. We give the reasoning behind our design choices and illustrate our extensions by writing several applications in Graph ZPL. Implementing Graph ZPL seems feasible with the modern level of technology. In comparison with other approaches to providing support for data-parallel computations, Graph ZPL results in cleaner and less error-prone programs which are easier to write and maintain, at the expense of limiting the class of algorithms that the programmer can express.
ZPL's WYSIWYG
- In Third International Workshop on High-Level Parallel Programming Models and Supportive Environments
, 1997
ZPL is an array language designed for high performance scientific and engineering computations. Unlike other parallel languages, ZPL is founded on a machine model (CTA) distinct from any implementing hardware. The machine model, which abstracts contemporary parallel computers, makes it possible to correlate ZPL programs with machine behavior. Using this association, programmers can know approximately how code will perform on a typical parallel machine, allowing them to make informed decisions between alternative programming solutions. This paper describes ZPL's syntactic cues to the programmer which convey performance information. The what-you-see-is-what-you-get (WYSIWYG) characteristics of ZPL operations are illustrated on four machines: the Cray T3E, IBM SP-2, SGI Power Challenge and Intel Paragon. Additionally, the WYSIWYG performance model is used to evaluate two algorithms for matrix multiplication, one of which is considered to be the most scalable of portable parallel solutions...
ZPL's WYSIWYG Performance Model
- In Third International Workshop on High-Level Parallel Programming Models and Supportive Environments
, 1998
ZPL is a parallel array language designed for high performance scientific and engineering computations. Unlike other parallel languages, ZPL is founded on a machine model (the CTA) that accurately abstracts contemporary MIMD parallel computers. This makes it possible to correlate ZPL programs with machine behavior. As a result, programmers can reason about how code will perform on a typical parallel machine and thereby make informed decisions between alternative programming solutions. This paper describes ZPL's performance model and its syntactic cues for conveying operation cost. The what-you-see-is-what-you-get (WYSIWYG) nature of ZPL operations is demonstrated on the IBM SP-2, Intel Paragon, SGI Power Challenge, and Cray T3E. Additionally, the model is used to evaluate two algorithms for matrix multiplication. Experiments show that the performance model correctly predicts the faster solution on all four platforms for a range of problem sizes.
ViC*: Running Out-Of-Core Instead Of Running Out Of Core
This thesis describes the design and implementation of virtual memory for ViC*, a version of the data-parallel language C*. For programs with parallel data sets that exceed the size of main memory, ViC* provides better performance than demand-paged virtual memory, with less programming effort than traditional out-of-core methods. ViC* extends the C* language with out-of-core shapes, which place parallel data on disk. A ViC* compiler translates access to out-of-core data into calls on the ViC* parallel I/O system, transforming programs to improve data-access patterns. The compiler and runtime system identify special cases for which efficient access algorithms are available. At runtime, the ViC* library manages data layout and I/O to make efficient use of the disk bandwidth. ViC* programs can operate on larger data sets than typical in-core programs. Tests show that ViC* performance for out-of-core data scales well as the size of data increases. The language-based approach gives the programmer control over data placement in-core or out-of-core, while supplying improved or optimal algorithms for out-of-core access. ViC* demonstrates the benefit of an integrated approach to virtual memory, and provides a basis for ongoing research on out-of-core programming and algorithms. Acknowledgements: I've been fortunate to be part of a strong community at Dartmouth, one that saw me through what may have seemed an unlikely undertaking. Many of them contributed to this thesis. Foremost among them is my advisor, Tom Cormen, whom I thank for his years of encouragement, advice, and support. Professor Kotz and other faculty provided feedback as well as background. Graduate students James Clippinger, Anna Poplawski, Melissa Hirschl, and others before them, added software and suggestions. T...
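To make the term "out-of-core" concrete, here is a minimal Python sketch of the hand-written, chunk-at-a-time style of processing that the abstract calls traditional out-of-core methods. It is not ViC* or C* code and says nothing about how the ViC* compiler or library works; the helper names (`write_values`, `out_of_core_sum`, `demo`), the file layout, and the chunk size are all assumptions made for illustration.

```python
import os
import struct
import tempfile

# Assumed on-disk layout: one little-endian 64-bit float per element.
ITEM = struct.Struct("<d")


def write_values(path, values):
    """Write the values to disk in the assumed binary layout."""
    with open(path, "wb") as f:
        for v in values:
            f.write(ITEM.pack(v))


def out_of_core_sum(path, chunk_items=1024):
    """Sum a data set on disk while holding only one chunk in memory."""
    total = 0.0
    chunk_bytes = chunk_items * ITEM.size
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk_bytes)
            if not block:
                break  # end of file
            for (v,) in ITEM.iter_unpack(block):
                total += v
    return total


def demo():
    """Write 10,000 values to a temporary file and sum them in chunks."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.bin")
        write_values(path, [float(i) for i in range(10_000)])
        return out_of_core_sum(path, chunk_items=256)
```

Here the programmer manages chunking and I/O explicitly; the abstract's claim is that ViC* moves that burden into the compiler and runtime system.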

