Work-Efficient Nested Data-Parallelism
In Proceedings of the Fifth Symposium on the Frontiers of Massively Parallel Processing (Frontiers 95), IEEE, 1995
Cited by 19 (3 self)
An apply-to-all construct is the key mechanism for expressing data-parallelism, but data-parallel programming languages like HPF and C* significantly restrict which operations can appear in the construct. Allowing arbitrary operations substantially simplifies the expression of irregular and nested data-parallel computations. The technique of flattening nested parallelism, introduced by Blelloch, compiles data-parallel programs with unrestricted apply-to-all constructs into vector operations and has achieved notable success, particularly with irregular data-parallel programs. However, these programs must be carefully constructed so that flattening them does not lead to suboptimal work complexity due to unnecessary replication in index operations. We present new flattening transformations that generate programs with correct work complexity. Because these transformations may introduce concurrent reads in parallel indexing, we developed a randomized indexing that reduces concurrent reads w...
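A minimal Python sketch of the replication problem, under simplifying assumptions: it treats only the one-level expression `{ a[i] : i in idx }`, in which `a` is a free variable inside the apply-to-all. A naive flattening distributes (replicates) `a` to every element before indexing, so work grows with `len(a) * len(idx)`; evaluating the same expression as one gather over a single shared copy of `a` costs work proportional to `len(idx)`. The function names and the unit work counter are hypothetical, and this illustrates the problem only, not the paper's transformations.

```python
from typing import List, Tuple


def naive_flattened_index(a: List[int], idx: List[int]) -> Tuple[List[int], int]:
    """Hypothetical naive flattening of { a[i] : i in idx }: the free
    variable `a` is first distributed to every element of `idx`.
    Returns (result, units of work)."""
    work = 0
    # Distribute: one private copy of `a` per element -> len(a) * len(idx) work.
    replicated = []
    for _ in idx:
        replicated.append(list(a))
        work += len(a)
    # Elementwise indexing, each element into its own copy.
    result = []
    for copy, i in zip(replicated, idx):
        result.append(copy[i])
        work += 1
    return result, work


def gather_index(a: List[int], idx: List[int]) -> Tuple[List[int], int]:
    """Same expression as a single gather over one shared copy of `a`.
    Work is proportional to len(idx); repeated indices are concurrent reads."""
    result = [a[i] for i in idx]
    return result, len(idx)


a = list(range(100, 110))  # ten elements: 100..109
idx = [3, 3, 0, 9]
print(naive_flattened_index(a, idx))  # ([103, 103, 100, 109], 44)
print(gather_index(a, idx))           # ([103, 103, 100, 109], 4)
```

The repeated index 3 is an instance of the concurrent reads the abstract mentions; the randomized indexing the paper proposes for them is not modeled in this sketch.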
Piecewise Execution of Nested Data-Parallel Programs
Languages and Compilers for Parallel Computing, volume 1033 of Lecture Notes in Computer Science, 1995
Cited by 10 (2 self)
The technique of flattening nested data parallelism combines all the independent operations in nested apply-to-all constructs and generates large amounts of potential parallelism for both regular and irregular expressions. However, the resulting data-parallel programs can have enormous memory requirements, limiting their utility. In this paper, we present piecewise execution, an automatic method of partially serializing data-parallel programs so that they achieve maximum parallelism within storage limitations. By computing large intermediate sequences in pieces, our approach requires asymptotically less memory to perform the same amount of work. By using characteristics of the underlying parallel architecture to determine the size of each piece, we retain effective use of a parallel machine at each step. This dramatically expands the class of nested data-parallel programs that can be executed using the flattening technique. With the addition of piecewise I/O operations, these techniques can be applied to generate out-of-core execution on large datasets.
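The memory argument can be sketched in Python, assuming the program is `sum({i*i : i in [0:n]})`. Fully flattened evaluation materializes the whole intermediate sequence of length n before reducing it; piecewise evaluation produces at most `piece_size` elements at a time and folds each piece into a running result, so peak intermediate storage is bounded by `piece_size`. Here `piece_size` is a hypothetical stand-in for the architecture-driven piece size, and the sequential loops model only storage, not the parallel execution of each piece.

```python
from typing import Tuple


def flattened_sum_of_squares(n: int) -> Tuple[int, int]:
    """Fully flattened style: build the whole intermediate sequence,
    then reduce it. Returns (result, peak intermediate length)."""
    squares = [i * i for i in range(n)]  # intermediate of length n
    return sum(squares), len(squares)


def piecewise_sum_of_squares(n: int, piece_size: int) -> Tuple[int, int]:
    """Piecewise style: each piece of at most `piece_size` elements is
    produced, reduced, and discarded before the next one is built."""
    if piece_size <= 0:
        raise ValueError("piece_size must be positive")
    total = 0
    peak = 0
    for start in range(0, n, piece_size):
        stop = min(start + piece_size, n)
        piece = [i * i for i in range(start, stop)]  # bounded intermediate
        peak = max(peak, len(piece))
        total += sum(piece)  # fold this piece into the running result
    return total, peak


print(flattened_sum_of_squares(1000))      # (332833500, 1000)
print(piecewise_sum_of_squares(1000, 64))  # (332833500, 64)
```

Both versions do the same arithmetic; only the size of the largest live intermediate changes.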
Specification and development of parallel algorithms with the Proteus system
In Specification of Parallel Algorithms, 1994
Cited by 7 (3 self)
The Proteus language is a wide-spectrum parallel programming notation that supports the expression of both high-level architecture-independent specifications and lower-level architecture-specific implementations. A methodology based on successive refinement and interactive experimentation supports the development of parallel algorithms from specification to various efficient architecture-dependent implementations. The Proteus system combines the language and tools supporting this methodology. This paper presents a brief overview of the Proteus system and describes its use in the exploration and development of several non-trivial algorithms, including the fast multipole algorithm for N-body computations.
Provably Correct Vectorization of Nested-Parallel Programs
1996
Cited by 5 (1 self)
The work/step framework provides a high-level cost model for nested data-parallel programming languages, allowing programmers to understand the efficiency of their codes without concern for the eventual mapping of tasks to processors. Vectorization, or flattening, is the key technique for compiling nested-parallel languages. This paper presents a formal study of vectorization, considering three low-level targets: the EREW, bounded-contention CREW, and CREW variants of the VRAM. For each, we describe a variant of the cost model and prove the correctness of vectorization for that model. The models impose different constraints on the set of programs and implementations that can be considered; we discuss these in detail.

1 Introduction
Many complexity models (or cost models) have been proposed for parallel programs. High-level models such as Blelloch's step/work metrics for NESL [3, 2] and Skillicorn's calculus for BMF [13] are based on a rich, highly-parallel expression language with compo...
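To make the work/step vocabulary concrete, here is a toy Python calculator over a tiny expression tree. The cost rules are a simplification assumed for illustration: a primitive costs one unit of work and one step; sequential composition adds both; an apply-to-all adds one unit of work and one step to the sum of its bodies' work and the maximum of their steps. These are not the specific models the paper defines for its EREW, bounded-contention CREW, and CREW VRAM targets, and the class names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Prim:
    """A scalar primitive operation (assumed: 1 work, 1 step)."""
    name: str


@dataclass
class Seq:
    """Sequential composition: parts run one after another."""
    parts: List["Expr"]


@dataclass
class ApplyToAll:
    """Apply-to-all: the per-element bodies may run in parallel."""
    bodies: List["Expr"]


Expr = Union[Prim, Seq, ApplyToAll]


def cost(e: Expr) -> Tuple[int, int]:
    """Return (work, steps) under the simplified rules stated above."""
    if isinstance(e, Prim):
        return 1, 1
    if isinstance(e, Seq):
        costs = [cost(p) for p in e.parts]
        return sum(w for w, _ in costs), sum(s for _, s in costs)
    if isinstance(e, ApplyToAll):
        costs = [cost(b) for b in e.bodies]
        work = 1 + sum(w for w, _ in costs)              # total operations
        steps = 1 + max((s for _, s in costs), default=0)  # longest chain
        return work, steps
    raise TypeError(f"unknown expression: {e!r}")


# Irregular nesting: inner apply-to-alls of sizes 1, 2 and 3.
nested = ApplyToAll([ApplyToAll([Prim("add")] * k) for k in (1, 2, 3)])
print(cost(nested))                  # (10, 3)
print(cost(Seq([Prim("add")] * 6)))  # (6, 6)
```

The same six primitive operations take six steps sequentially but three under nested apply-to-all, which is the kind of distinction the work/step model lets a programmer reason about without a processor mapping.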
The Proteus System for the Development of Parallel Applications
1994
Cited by 4 (2 self)
Target Language
In our methodology we have identified a small set of specifications that comprise the abstract target language (ATL) of the refinement system. These are specifications of types such as arrays, lists, tuples, integers, and characters that commonly appear in programming languages. The refinement expresses a system as a definitional extension of the ATL specs. Thus, by associating a model (a concrete type in a specific programming language) with each ATL specification, the complete system specification is compiled.

5.2.3 Proteus to DPL Translation
The translation of Proteus to DPL consists of a series of major steps:
1. Expansion of iterator expressions into image and filter expressions.
2. Conversion to data-parallel form.
3. An interpretation of sequences into the nested sequence vocabulary of DPL.
4. Addition of storage management code.
5. Conversion into C.

[Figure: Source, Mediating, and Target specifications (CORE-SEQ, SEQ-AS-ARRAY, ARRAY, SEQ) for System, Component 1, and Component 2 ...]
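Step 1 of the translation can be pictured with a small Python sketch. It assumes a generic iterator notation `{ f(x) : x in xs | p(x) }` (not quoted from Proteus) and shows the equivalent image-of-filter form. `image` and `filter_seq` are hypothetical stand-ins for the image and filter expressions (the second name avoids shadowing Python's built-in `filter`); the later steps, such as conversion to data-parallel form, are not modeled.

```python
from typing import Callable, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def image(f: Callable[[A], B], xs: List[A]) -> List[B]:
    """Hypothetical image form: apply f to every element of xs."""
    return [f(x) for x in xs]


def filter_seq(p: Callable[[A], bool], xs: List[A]) -> List[A]:
    """Hypothetical filter form: keep elements satisfying p, in order."""
    return [x for x in xs if p(x)]


def expand_iterator(f: Callable[[A], B], p: Callable[[A], bool],
                    xs: List[A]) -> List[B]:
    """Expansion of { f(x) : x in xs | p(x) } into image(f, filter(p, xs))."""
    return image(f, filter_seq(p, xs))


# { x*x : x in [1..5] | x odd }
print(expand_iterator(lambda x: x * x, lambda x: x % 2 == 1, [1, 2, 3, 4, 5]))
# [1, 9, 25]

# Nested iterators expand to nested images, which later steps would flatten:
# { { y*y : y in row | y > 0 } : row in rows }
rows = [[1, -2, 3], [], [-4, 5]]
print(image(lambda row: expand_iterator(lambda y: y * y, lambda y: y > 0, row), rows))
# [[1, 9], [], [25]]
```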
Practical Issues in the Flattening of Nested Parallelism with
Rickard E. Faith, Daniel W. Palmer, Jan F. Prins, Lars S. Nyland
March 28, 1995

The "flattening" of nested data-parallelism reduces a very broad class of data-parallel expressions to parallel vector operations. The technique can be understood as the application of a small set of program transformations. In this paper, we explore several practical issues involved in the implementation of these transformations, with an emphasis on the production of optimized vector code.

1 Introduction
A notation supports the expression of data parallelism if it includes aggregate values such as arrays or sequences and an apply-to-all construct that applies operations to all elements of an aggregate. However, data-parallel languages such as HPF and C* restrict aggregates to rectangular arrays, thereby limiting the ability to specify and execute the irregular and dynamic data-parallelism that is key to the efficient solution of many problems. With the introduction of nested aggregates, a ful...
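The core idea of flattening, replacing a nested apply-to-all by operations on flat vectors, can be illustrated with a minimal Python sketch. It assumes the common representation of a nested sequence as a flat value vector plus a segment descriptor holding segment lengths; the function names are hypothetical, the segmented sum uses prefix sums (one of several possible ways), and the sequential list code only stands in for parallel vector primitives. It is not the transformation set described in the paper.

```python
from itertools import accumulate
from typing import List, Tuple


def flatten(nested: List[List[int]]) -> Tuple[List[int], List[int]]:
    """Represent a nested sequence as (segment lengths, flat values)."""
    lengths = [len(seg) for seg in nested]
    values = [x for seg in nested for x in seg]
    return lengths, values


def unflatten(lengths: List[int], values: List[int]) -> List[List[int]]:
    """Rebuild the nested sequence from its segment descriptor."""
    out, start = [], 0
    for n in lengths:
        out.append(values[start:start + n])
        start += n
    return out


def segmented_sum(lengths: List[int], values: List[int]) -> List[int]:
    """One sum per segment, from a running prefix sum of the flat vector."""
    prefix = [0] + list(accumulate(values))  # prefix[k] = sum(values[:k])
    ends = list(accumulate(lengths))         # end offset of each segment
    starts = [e - n for e, n in zip(ends, lengths)]
    return [prefix[e] - prefix[s] for s, e in zip(starts, ends)]


nested = [[1, 2, 3], [], [4, 5]]
lengths, values = flatten(nested)
# { { x * 10 : x in row } : row in nested } becomes ONE elementwise
# operation on the flat values; the segment descriptor is unchanged.
print(unflatten(lengths, [x * 10 for x in values]))  # [[10, 20, 30], [], [40, 50]]
# { sum(row) : row in nested } becomes a single segmented reduction.
print(segmented_sum(lengths, values))  # [6, 0, 9]
```

Because the empty segment and the unequal segment lengths cost nothing special in this representation, irregular nesting is handled the same way as regular nesting.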

