Implementation of a Portable Nested Data-Parallel Language
Journal of Parallel and Distributed Computing, 1994
Cited by 155 (26 self)
This paper gives an overview of the implementation of Nesl, a portable nested data-parallel language. This language and its implementation are the first to fully support nested data structures as well as nested data-parallel function calls. These features allow the concise description of parallel algorithms on irregular data, such as sparse matrices and graphs. In addition, they maintain the advantages of data-parallel languages: a simple programming model and portability. The current Nesl implementation is based on an intermediate language called Vcode and a library of vector routines called Cvl. It runs on the Connection Machine CM-2, the Cray Y-MP C90, and serial machines. We compare initial benchmark results of Nesl with those of machine-specific code on these machines for three algorithms: least-squares line-fitting, median finding, and a sparse-matrix vector product. These results show that Nesl's performance is competitive with that of machine-specific codes for regular dense da...
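The sparse-matrix vector product mentioned above is the standard illustration of nested data parallelism; in NESL notation it is roughly `{sum({v * x[i] : (i, v) in row}) : row in A}`. The minimal Python sketch below assumes each row is stored as a list of (column, value) pairs. It contrasts that nested form with a flattened one built from a segment descriptor (row lengths), one flat elementwise multiply, and one segmented sum, which is the general idea behind compiling nested parallelism onto flat vector operations. The function names are hypothetical, and the loops run sequentially where a vector machine would execute each flat step in parallel.

```python
from typing import List, Tuple

Row = List[Tuple[int, float]]  # one sparse row: (column index, value) pairs


def smvp_nested(rows: List[Row], x: List[float]) -> List[float]:
    # Nested form: a parallel map over rows, and inside each row a
    # parallel multiply followed by a sum.
    return [sum(v * x[i] for i, v in row) for row in rows]


def flatten(rows: List[Row]) -> Tuple[List[int], List[int], List[float]]:
    # Segment descriptor (row lengths) plus flat column and value vectors.
    lengths = [len(row) for row in rows]
    cols = [i for row in rows for i, _ in row]
    vals = [v for row in rows for _, v in row]
    return lengths, cols, vals


def segmented_sum(lengths: List[int], data: List[float]) -> List[float]:
    # One result per segment; an empty segment (a row with no nonzeros) sums to 0.
    out, pos = [], 0
    for n in lengths:
        out.append(sum(data[pos:pos + n]))
        pos += n
    return out


def smvp_flat(rows: List[Row], x: List[float]) -> List[float]:
    lengths, cols, vals = flatten(rows)
    # A single flat elementwise multiply covers every nonzero of every row...
    products = [v * x[i] for i, v in zip(cols, vals)]
    # ...and a single segmented sum produces all row results at once.
    return segmented_sum(lengths, products)
```

However irregular the row lengths are, the flattened version issues the same small, fixed number of vector operations, which is what lets irregular data run on flat vector hardware.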

A Provable Time and Space Efficient Implementation of NESL
In International Conference on Functional Programming, 1996
Cited by 60 (7 self)
In this paper we prove time and space bounds for the implementation of the programming language NESL on various parallel machine models. NESL is a sugared typed λ-calculus with a set of array primitives and an explicit parallel map over arrays. Our results extend previous work on provable implementation bounds for functional languages by considering space and by including arrays. For modeling the cost of NESL we augment a standard call-by-value operational semantics to return two cost measures: a DAG representing the sequential dependence in the computation, and a measure of the space taken by a sequential implementation. We show that a NESL program with w work (nodes in the DAG), d depth (levels in the DAG), and s sequential space can be implemented on a p-processor butterfly network, hypercube, or CRCW PRAM using O(w/p + d log p) time and O(s + dp log p) reachable space. For programs with sufficient parallelism these bounds are optimal in that they give linear speedup and use space within a constant factor of the sequential space.
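To make the two cost measures concrete, the sketch below computes work (number of DAG nodes) and depth (nodes on the longest path) for a small computation DAG and plugs them into the shape of the time bound, w/p + d log p, with constants dropped. It assumes a DAG encoded as a map from each node to its predecessors and uses a balanced pairwise sum as the example; the helper names are hypothetical, and this only evaluates the bound, it does not model the paper's scheduler.

```python
import math
from typing import Dict, List, Tuple

Dag = Dict[str, List[str]]  # node -> list of predecessor nodes


def work_and_depth(preds: Dag) -> Tuple[int, int]:
    """Work = number of nodes; depth = number of nodes on the longest path."""
    level: Dict[str, int] = {}

    def visit(node: str) -> int:
        if node not in level:
            level[node] = 1 + max((visit(p) for p in preds[node]), default=0)
        return level[node]

    return len(preds), max(visit(n) for n in preds)


def tree_sum_dag(n: int) -> Dag:
    """DAG of a balanced pairwise sum over n inputs (n a power of two)."""
    preds: Dag = {f"x{i}": [] for i in range(n)}
    layer = list(preds)  # the input nodes, in order
    k = 0
    while len(layer) > 1:
        nxt = []
        for a, b in zip(layer[0::2], layer[1::2]):
            node = f"add{k}"
            k += 1
            preds[node] = [a, b]
            nxt.append(node)
        layer = nxt
    return preds


def time_bound_shape(w: int, d: int, p: int) -> float:
    """w/p + d*log2(p): the shape of the O(w/p + d log p) bound, constants dropped."""
    return w / p + d * math.log2(p)
```

For a sum over 2^20 inputs, w = 2^21 - 1 and d = 21, so with p = 64 the w/p term is roughly 32,768 while d log p is only 126; the work term dominates, which is the "sufficient parallelism" regime in which the speedup is linear.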

VCODE: A Data-Parallel Intermediate Language
In Proceedings of the 3rd Symposium on the Frontiers of Massively Parallel Computation, 1990
Cited by 30 (3 self)
This paper describes VCODE, a data-parallel intermediate language. VCODE is designed to allow easy porting of data-parallel languages, such as C*, PARALATION LISP, and Fortran 8x, to a wide class of parallel machines, and for experimenting with compiling such languages. Our design goal was to define a simple language whose primitives can be implemented efficiently but that is still powerful enough to express the features of existing data-parallel languages. It contains about 50 instructions, most of which manipulate arbitrarily long vectors of atomic values, and includes a set of segmented instructions that are crucial for implementing data-parallel languages that permit nested parallelism, such as PARALATION LISP and CM-Lisp. An initial version of a VCODE interpreter has been implemented on the Thinking Machines Corporation Connection Machine, the CRAY Y-MP, the Encore Multimax and several uniprocessor machines. The paper outlines the VCODE language, discusses many of the design issues, i...
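A segmented instruction operates on a flat vector together with a segment descriptor and treats each segment independently. The sketch below shows a segmented exclusive plus-scan, assuming the descriptor is a list of segment lengths; VCODE's actual descriptor format and instruction names are not reproduced here, and the function name is hypothetical. The loop is sequential, whereas a parallel machine can compute a segmented scan in logarithmic depth.

```python
from typing import List


def segmented_plus_scan(data: List[int], lengths: List[int]) -> List[int]:
    """Exclusive +-scan that restarts at the start of every segment.

    `lengths` is the segment descriptor: the size of each segment, in order.
    Empty segments are allowed and contribute no output elements."""
    if sum(lengths) != len(data):
        raise ValueError("segment descriptor must cover the data exactly")
    out: List[int] = []
    pos = 0
    for n in lengths:
        running = 0  # reset at each segment boundary
        for v in data[pos:pos + n]:
            out.append(running)
            running += v
        pos += n
    return out
```

For example, a per-sequence prefix sum over the nested value [[1, 2], [], [3, 4, 5], [6]] becomes one call on the flat data [1, 2, 3, 4, 5, 6] with descriptor [2, 0, 3, 1]. A single instruction handles all segments regardless of how many there are, which is why segmented operations make nested parallelism implementable on flat vectors.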

Size and Access Inference for Data-Parallel Programs
In ACM SIGPLAN '91 Conference on Programming Language Design and Implementation, 1991
Cited by 16 (1 self)
Data-parallel programming languages have many desirable features, such as single-thread semantics and the ability to express fine-grained parallelism. However, it is challenging to implement such languages efficiently on conventional MIMD multiprocessors, because these machines incur a high overhead for small grain sizes. This paper presents compile-time analysis techniques for data-parallel program graphs that reduce these overheads in two ways: by stepping up the grain size, and by relaxing the synchronous nature of the computation without altering the program semantics. The algorithms partition the program graph into clusters of nodes such that all nodes in a cluster have the same loop structure, and further refine these clusters into epochs based on generation and consumption patterns of data vectors. This converts the fine-grain parallelism in the original program to medium-grain loop parallelism, which is better suited to MIMD machines. A compiler has been implemented based on th...
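The payoff of grouping nodes with the same loop structure into a cluster can be seen in a small example: instead of one full vector pass per operation, all operations of the cluster run inside a single loop, which raises the grain size. The sketch assumes clustering has already been done (the nodes are topologically ordered and all vectors share one length); the node encoding and function names are hypothetical, and this is not the paper's analysis algorithm.

```python
from typing import Callable, Dict, List, Tuple

Env = Dict[str, List[float]]
# Hypothetical program-graph node: (output vector, elementwise function, input vectors).
Node = Tuple[str, Callable[..., float], List[str]]


def run_unfused(nodes: List[Node], env: Env) -> Env:
    """Fine-grained execution: every node is its own full pass over the vectors."""
    env = dict(env)
    for out, f, ins in nodes:
        n = len(env[ins[0]])
        env[out] = [f(*(env[name][k] for name in ins)) for k in range(n)]
    return env


def run_fused(nodes: List[Node], env: Env) -> Env:
    """Cluster execution: one loop over k runs every node of the cluster.

    Assumes the nodes are topologically ordered and all vectors in the
    cluster have the same length (the same loop structure)."""
    env = dict(env)
    n = len(env[nodes[0][2][0]])
    for out, _, _ in nodes:
        env[out] = [0.0] * n
    for k in range(n):
        for out, f, ins in nodes:
            env[out][k] = f(*(env[name][k] for name in ins))
    return env
```

Both functions compute the same vectors; the fused one simply turns a sequence of fine-grained vector steps into one medium-grain loop that a MIMD processor can run over its share of the index range.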

Abstractions for Portable, Scalable Parallel Programming
IEEE Transactions on Parallel and Distributed Systems, 1998
Cited by 8 (4 self)
In parallel programming, the need to manage communication, load imbalance, and irregularities in the computation puts substantial demands on the programmer. Key properties of the architecture, such as the number of processors and the cost of communication, must be exploited to achieve good performance, but coding these properties directly into a program compromises the portability and flexibility of the code because significant changes are then needed to port or enhance the program. We describe a parallel programming model that supports the concise, independent description of key aspects of a parallel program---including data distribution, communication, and boundary conditions---without reference to machine idiosyncrasies. The independence of such components improves portability by allowing the components of a program to be tuned i...
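The benefit of describing data distribution, communication, and boundary conditions independently can be sketched with a one-dimensional three-point stencil. In the Python sketch below the boundary condition is a pluggable function, and the distributed version splits the array into p equal blocks, each padded with one halo cell per side. The block distribution, halo width, and all names are assumptions for illustration, not the programming model from the paper; the "communication" is simulated by reading the neighbouring block's edge value.

```python
from typing import Callable, List

# Boundary condition: (global array, out-of-range index) -> value to use there.
Boundary = Callable[[List[float], int], float]


def zero_boundary(a: List[float], i: int) -> float:
    return 0.0


def periodic_boundary(a: List[float], i: int) -> float:
    return a[i % len(a)]


def smooth_sequential(a: List[float], bc: Boundary) -> List[float]:
    """Reference: three-point average over the whole array."""
    def get(i: int) -> float:
        return a[i] if 0 <= i < len(a) else bc(a, i)
    return [(get(i - 1) + a[i] + get(i + 1)) / 3 for i in range(len(a))]


def smooth_distributed(a: List[float], bc: Boundary, p: int) -> List[float]:
    """Same stencil on p equal blocks, each padded with one halo cell per side.

    A halo comes from the neighbouring block (the simulated communication
    step) or, at the global edges, from the boundary condition."""
    n = len(a)
    assert n % p == 0, "sketch assumes equal block sizes"
    size = n // p
    out: List[float] = []
    for b in range(p):
        lo, hi = b * size, (b + 1) * size
        left = a[lo - 1] if lo > 0 else bc(a, lo - 1)
        right = a[hi] if hi < n else bc(a, hi)
        padded = [left] + a[lo:hi] + [right]
        out.extend((padded[k - 1] + padded[k] + padded[k + 1]) / 3
                   for k in range(1, size + 1))
    return out
```

The stencil body never changes when the number of processors or the boundary condition changes; only the distribution parameter p and the `bc` argument do, which is the kind of independence the abstract argues improves portability.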

Parallel Functional Programming by Partitioning
1997
Cited by 7 (0 self)
Caliban is a declarative language which addresses the area of static distributed-memory parallel computing. It is an annotation language that allows the programmer to partition a functional program and data amongst the computational resources available. It is integrated into the source language so that the full power of the host language can be used to express the partitioning of the program. Partial evaluation is used to determine a complete version of the annotation at compile time. Program transformation is then used to make the parallelism explicit. This thesis describes the Caliban language and its pilot implementation. It then presents extensions and improvements to the basic language. Implementation techniques for the improved language are discussed in relation to an implementation on the Fujitsu AP1000 distributed-memory multiprocessor. Two application case studies together with some performance results are presented. Finally, there is a critical appraisal of the language and its approach. Caliban has good support for general data and computation partitioning. It also aids software reuse with its ability to abstract common computational structures into higher-order forms which are concretised at compile time by partial evaluation. However, some open issues remain relating to evaluation order control. Caliban can also be implemented reasonably efficiently on standard parallel hardware.
Acknowledgements
First and foremost I would like to thank Paul Kelly, my supervisor, for his encouragement, support and direction. It was his work that formed the genesis of the work presented here and it is with his supervision that I have journeyed through the world of parallel programming as an undergraduate, research assistant and PhD student. I also thank my second supervisor Susan Eisenbach for her help. Many thanks go to the members of the Advanced Languages and Architectures Section for providing a stimulating and friendly environment to work in. Also...
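The idea of a higher-order partitioning form that is concretised at compile time can be caricatured in a few lines. The sketch below is not Caliban syntax: `specialise_farm` is a hypothetical stand-in for partial evaluation of a "farm"-style annotation, which, once the number of items and processors is known statically, unfolds into a concrete table saying which processor handles which block of data. `run_farm` then evaluates that static network, sequentially here.

```python
from typing import Callable, List, Tuple

# (processor id, first index, one-past-last index): a static placement.
Placement = Tuple[int, int, int]


def specialise_farm(n_items: int, n_procs: int) -> List[Placement]:
    """Stand-in for compile-time partial evaluation of a hypothetical
    higher-order 'farm' annotation: with both sizes known statically, the
    form unfolds into a concrete table of data blocks per processor."""
    base, extra = divmod(n_items, n_procs)
    placements, start = [], 0
    for proc in range(n_procs):
        size = base + (1 if proc < extra else 0)
        placements.append((proc, start, start + size))
        start += size
    return placements


def run_farm(f: Callable[[int], int], xs: List[int],
             placements: List[Placement]) -> List[int]:
    """Evaluate the static network; here processors run one after another."""
    out = [0] * len(xs)
    for _proc, lo, hi in placements:
        for i in range(lo, hi):
            out[i] = f(xs[i])
    return out
```

Because the placement table is fully computed before the program runs, no scheduling decisions remain at run time; a program transformation step could then turn each table entry into an explicit process on the target machine.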

A Calculus for Exploiting Data Parallelism on Recursively Defined Data
In Proc. International Workshop on Theory and Practice of Parallel Programming, LNCS, 1994
Cited by 5 (2 self)
Array-based data-parallel programming can be generalized in two ways to make it an appropriate paradigm for parallel processing of general recursively defined data. The first is the introduction of a parallel evaluation mechanism for dynamically allocated recursively defined data. It achieves the effect of applying the same function to all the subterms of a given datum in parallel. The second is a new notion of recursion, which we call parallel recursion, for parallel evaluation of recursively defined data. In contrast with ordinary recursion, which only uses the final results of the recursive calls of its immediate subterms, the new recursion repeatedly transforms a recursive datum represented by a system of equations to another recursive datum by applying the same function to each of the equations simultaneously, until the final result is obtained. This mechanism exploits more parallelism and achieves significant speedup compared to the conventional parallel evaluation of recursive ...
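Rewriting every equation of a system at once until the result emerges closely resembles the classic pointer-jumping (list-ranking) technique, which is used below as a stand-in illustration; it is not the calculus from the paper. A linked list is read as the system of equations r_i = 1 + r_next(i), with r_i = 0 at the tail. Ordinary recursion walks the list in n steps, whereas simultaneous rewriting needs only about log2 n rounds. The function name and the successor-array encoding are assumptions.

```python
from typing import List, Optional, Tuple


def list_rank(succ: List[Optional[int]]) -> Tuple[List[int], int]:
    """Distance from every node to the tail of a linked list, by pointer jumping.

    Node i stands for the equation r_i = 1 + r_succ[i] (r_i = 0 at the tail).
    Each round substitutes every equation's right-hand side into itself at
    once, so each pointer then spans twice as many links as before.
    Returns (ranks, number of rounds)."""
    rank = [0 if j is None else 1 for j in succ]
    nxt = list(succ)
    rounds = 0
    while any(j is not None for j in nxt):
        # Build new arrays from the old ones only: all equations update "simultaneously".
        new_rank = [r + (rank[j] if j is not None else 0) for r, j in zip(rank, nxt)]
        new_nxt = [nxt[j] if j is not None else None for j in nxt]
        rank, nxt = new_rank, new_nxt
        rounds += 1
    return rank, rounds
```

For the list 3 → 0 → 2 → 1, encoded as `succ = [2, None, 1, 0]`, two rounds suffice to give every node its distance to the tail.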

Reference Manual (Version 1.1)
1990
Cited by 4 (3 self)
This report introduces VCODE, an intermediate language for data-parallel computations. VCODE is designed to allow easy porting of data-parallel languages, such as C*, PARALATION LISP, and Fortran 8x, to a wide class of parallel machines. It is designed with the joint goals of being simple, expressive, and efficiently implementable. It contains about 50 instructions, most of which manipulate arbitrarily long vectors of atomic values, and includes a set of segmented instructions that are crucial for implementing data-parallel languages that permit nested parallelism, such as PARALATION LISP and CM-Lisp. The report outlines the VCODE language, discusses many of the design issues, illustrates how data-parallel languages can be mapped onto it, and describes how it can be implemented on massively parallel machines. A complete definition of VCODE is given in the appendix. This research was sponsored by the Avionics Lab, Wright Research and Development Center, Aeronautical Systems Division (AF...
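VCODE programs operate on a stack whose entries are whole vectors. The toy interpreter below illustrates how a data-parallel expression such as a dot product (the sum of elementwise products) maps onto a short sequence of vector instructions. The opcodes `PUSH`, `ADD`, `MUL`, and `SUM` and the tuple encoding are hypothetical names chosen for this sketch and are not VCODE's actual instruction set.

```python
from typing import List, Tuple

Vector = List[float]
# Hypothetical instruction encoding: ("PUSH", vector) or (opcode,).
Instr = Tuple


def run(program: List[Instr]) -> List[Vector]:
    """Execute a toy stack program whose stack entries are whole vectors."""
    stack: List[Vector] = []
    for instr in program:
        op = instr[0]
        if op == "PUSH":
            stack.append(list(instr[1]))
        elif op in ("ADD", "MUL"):
            b, a = stack.pop(), stack.pop()  # top of stack is the right operand
            if len(a) != len(b):
                raise ValueError("elementwise ops need equal-length vectors")
            if op == "ADD":
                stack.append([x + y for x, y in zip(a, b)])
            else:
                stack.append([x * y for x, y in zip(a, b)])
        elif op == "SUM":
            # Reduction: replace the top vector by a length-1 vector.
            stack.append([sum(stack.pop())])
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return stack
```

The dot product of [1, 2, 3] and [4, 5, 6] becomes `[("PUSH", [1.0, 2.0, 3.0]), ("PUSH", [4.0, 5.0, 6.0]), ("MUL",), ("SUM",)]`: each instruction touches an entire vector, so the instruction count is independent of the vector length.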

Relative Debugging for Data-Parallel Programs: A ZPL Case Study
2000
Cited by 3 (1 self)
In this article, we describe Guard99, a new implementation of relative debugging for parallel platforms that lets a programmer locate errors by observing the divergence in key data structures as two programs execute simultaneously. The implementation uses a client-server, machine-independent architecture, so it can debug ...
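The core check in relative debugging is a comparison of the same key data structure taken from two executions, for example an original program and its port. The sketch below shows only that comparison step for flat numeric arrays; the function name and the tolerance parameters are assumptions (a tolerance is included because two correct implementations may round differently), and it does not reflect Guard99's actual interface.

```python
import math
from typing import List, Tuple


def divergence(reference: List[float], suspect: List[float],
               rel_tol: float = 1e-9,
               abs_tol: float = 0.0) -> List[Tuple[int, float, float]]:
    """Return (index, reference value, suspect value) wherever the two disagree."""
    if len(reference) != len(suspect):
        raise ValueError("the two data structures have different shapes")
    return [(i, r, s)
            for i, (r, s) in enumerate(zip(reference, suspect))
            if not math.isclose(r, s, rel_tol=rel_tol, abs_tol=abs_tol)]
```

Reporting the indices of disagreement, rather than a single pass/fail flag, is what lets a programmer narrow down where in the data, and hence in the program, the two versions first diverge.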

Data Structures for Parallel Recursion
1997
Cited by 2 (0 self)
Chapter 1 Introduction
  1.1 Synchronous Parallel Programming
  1.2 Basic Definitions and Notations
    1.2.1 Types
    1.2.2 Operator Priority
    1.2.3 Notation and Proof Style
  1.3 Cost Calculus
    1.3.1 Parallel Algorithm Complexity
    1.3.2 Parallel Computation Models
Chapter 2 Powerlists
  2.1 Introduction
    2.1.1 Induction Principle for PowerLists
    2.1.2 Data Movement and Permutation Functions
  2.2 Hypercubes
  2.3 A Cost Calculus for P...

