Results 1 -
5 of
5
Compiling Nested Data-Parallel Programs for Shared-Memory Multiprocessors
- ACM TRANSACTIONS ON PROGRAMMING LANGUAGES AND SYSTEMS
, 1993
"... While data parallelism is well-suited from algorithmic, architectural, and linguistic considerations to serve as a basis for portable parallel programming, its characteristic fine-grained parallelism makes the efficient implementation of data-parallel languages on MIMD machines a challenging task. ..."
Abstract
-
Cited by 17 (5 self)
- Add to MetaCart
While data parallelism is well-suited from algorithmic, architectural, and linguistic considerations to serve as a basis for portable parallel programming, its characteristic fine-grained parallelism makes the efficient implementation of data-parallel languages on MIMD machines a challenging task. The design, implementation, and evaluation of an optimizing compiler are presented for an applicative nested data-parallel language called VCODE targeted at the Encore Multimax, a shared-memory multiprocessor. The source language supports nested aggregate data types; aggregate operations including elementwise forms, scans, reductions, and permutations; and conditionals and recursion for control flow. A small set of graph-theoretic compile-time optimizations reduce the overheads on MIMDmachines in several ways: by increasing the grain size of the output program, by reducing synchronization and storage requirements, and by improving locality of reference. The two key ideas behind these...
Size and Access Inference for Data-Parallel Programs
- IN ACM SIGPLAN '91 CONFERENCE ON PROGRAMMING LANGUAGE DESIGN AND IMPLEMENTATION
, 1991
"... Data-parallel programming languages have many desirable features, such as single-thread semantics and the ability to express fine-grained parallelism. However, it is challenging to implement such languages efficiently on conventional MIMD multiprocessors, because these machines incur a high overhead ..."
Abstract
-
Cited by 16 (1 self)
- Add to MetaCart
Data-parallel programming languages have many desirable features, such as single-thread semantics and the ability to express fine-grained parallelism. However, it is challenging to implement such languages efficiently on conventional MIMD multiprocessors, because these machines incur a high overhead for small grain sizes. This paper presents compile-time analysis techniques for data-parallel program graphs that reduce these overheads in two ways: by stepping up the grain size, and by relaxing the synchronous nature of the computation without altering the program semantics. The algorithms partition the program graph into clusters of nodes such that all nodes in a cluster have the same loop structure, and further refine these clusters into epochs based on generation and consumption patterns of data vectors. This converts the fine-grain parallelism in the original program to medium-grain loop parallelism, which is better suited to MIMD machines. A compiler has been implemented based on th...
A case study: Effects of WITH-loop-folding on the NAS Benchmark MG in SAC
- Proceedings of IFL `98, LNCS 1595
, 1999
"... Sac is a functional C variant with efficient support for high-level array operations. This paper investigates the applicability of a Sac specific optimization technique called with-loop-folding to real world applications. As an example program which originates from the Numerical Aerodynamic Simula ..."
Abstract
-
Cited by 10 (6 self)
- Add to MetaCart
Sac is a functional C variant with efficient support for high-level array operations. This paper investigates the applicability of a Sac specific optimization technique called with-loop-folding to real world applications. As an example program which originates from the Numerical Aerodynamic Simulation (NAS) Program developed at NASA Ames Research Center, the so-called NAS benchmark MG is chosen. It comprises a kernel from the NAS Program which implements 3-dimensional multigrid relaxation. Several run-time measurements exploit two different benefits of with-loop-folding: First, an overall speed-up of about 20 % can be observed. Second, a comparison between the run-times of a hand-optimized specification and of Apl-like specifications yields identical run-times, although a naive compilation that does not apply with-loop-folding leads to slowdowns of more than an order of magnitude. Furthermore, With-loop-folding makes a slight variation of the algorithm feasible which substantially simplifies the program specification and requires less memory during execution. Finally, the optimized run-times are compared against run-times gained from the original Fortran program, which shows that for different problem sizes, the code generated from the Sac program does not only reach the execution times of the code generated from the Fortran program but even outperforms them by about 10%.
Data Parallel Programming: A Survey and a Proposal for a New Model
, 1993
"... We give a brief description of what we consider to be data parallel programming and processing, trying to pinpoint the typical problems and pitfalls that occur. We then proceed with a short annotated history of data parallel programming, and sketch a taxonomy in which data parallel languages can be ..."
Abstract
-
Cited by 5 (0 self)
- Add to MetaCart
We give a brief description of what we consider to be data parallel programming and processing, trying to pinpoint the typical problems and pitfalls that occur. We then proceed with a short annotated history of data parallel programming, and sketch a taxonomy in which data parallel languages can be classified. Finally we present our own model of data parallel programming, which is based on the view of parallel data collections as functions. We believe that this model has a number of distinct advantages, such as being abstract, independent of implicitly assumed machine models, and general.
A New Approach to Vector Code Generation for Applicative Languages
, 1994
"... In an earlier paper we developed an intermediate representation for languages based on composition, and showed how the representation could facilitate generating code for functional languages, such as FP. In this paper we follow the same philosophical approach, using instead the applicative language ..."
Abstract
-
Cited by 4 (1 self)
- Add to MetaCart
In an earlier paper we developed an intermediate representation for languages based on composition, and showed how the representation could facilitate generating code for functional languages, such as FP. In this paper we follow the same philosophical approach, using instead the applicative language APL. Further, we show how this intermediate representation simplifies the task of generating code for highly parallel machines, such as the Connection Machine. keywords: code generation, vector machines, applicative and functional languages 1 Introduction In an earlier paper [Bud88b] we introduced an intermediate representation specifically designed for use with languages based on composition. Examples of such languages include functional languages, such as FP [Bac78], and applicative languages, such as APL [Ive80]. We then argued that it was not only easy to convert from such languages into this representation, but also easy to generate actual machine code from the representation. The ear...

