Implementation of a Portable Nested Data-Parallel Language
Journal of Parallel and Distributed Computing, 1994
Cited by 178 (27 self)
This paper gives an overview of the implementation of Nesl, a portable nested data-parallel language. This language and its implementation are the first to fully support nested data structures as well as nested data-parallel function calls. These features allow the concise description of parallel algorithms on irregular data, such as sparse matrices and graphs. In addition, they maintain the advantages of data-parallel languages: a simple programming model and portability. The current Nesl implementation is based on an intermediate language called Vcode and a library of vector routines called Cvl. It runs on the Connection Machine CM-2, the Cray Y-MP C90, and serial machines. We compare initial benchmark results of Nesl with those of machine-specific code on these machines for three algorithms: least-squares line-fitting, median finding, and a sparse-matrix vector product. These results show that Nesl's performance is competitive with that of machine-specific codes for regular dense da...
Transforming High-Level Data-Parallel Programs into Vector Operations
Proceedings Principles and Practices of Parallel Programming 93, ACM, 1993
Cited by 53 (21 self)
Fully parallel execution of a high-level data-parallel language based on nested sequences, higher order functions and generalized iterators can be realized in the vector model using a suitable representation of nested sequences and a small set of transformational rules to distribute iterators through the constructs of the language.
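The abstract above does not spell out its representation of nested sequences or its transformation rules. As a minimal sketch only, the following assumes the commonly described encoding of a nested sequence as one flat value vector plus a vector of segment lengths; the names `flatten`, `unflatten`, and `nested_map` are hypothetical and are not taken from the paper. It shows how an elementwise iterator over a nested sequence can be distributed down to a single pass over the flat vector.

```python
from typing import Callable, List, Tuple


def flatten(nested: List[List[int]]) -> Tuple[List[int], List[int]]:
    """Encode a nested sequence as (flat values, segment lengths)."""
    values = [x for segment in nested for x in segment]
    lengths = [len(segment) for segment in nested]
    return values, lengths


def unflatten(values: List[int], lengths: List[int]) -> List[List[int]]:
    """Rebuild the nested sequence from the flat encoding."""
    nested, pos = [], 0
    for n in lengths:
        nested.append(values[pos:pos + n])
        pos += n
    return nested


def nested_map(f: Callable[[int], int], nested: List[List[int]]) -> List[List[int]]:
    """Apply f to every inner element with one pass over the flat vector.

    The nesting structure (the segment lengths) is untouched, so the
    two-level iteration collapses into a single flat "vector" operation.
    """
    values, lengths = flatten(nested)
    mapped = [f(x) for x in values]  # stands in for one vector operation
    return unflatten(mapped, lengths)


if __name__ == "__main__":
    print(nested_map(lambda x: x * x, [[1, 2, 3], [], [4, 5]]))
```

The empty inner sequence is preserved because its length (zero) is recorded in the segment-length vector, which is the reason the structure is kept separately from the values.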
Segmented Operations for Sparse Matrix Computation on Vector Multiprocessors
1993
Cited by 27 (4 self)
In this paper we present a new technique for sparse matrix multiplication on vector multiprocessors based on the efficient implementation of a segmented sum operation. We describe how the segmented sum can be implemented on vector multiprocessors such that it both fully vectorizes within each processor and parallelizes across processors. Because of our method's insensitivity to relative row size, it is better suited than the Ellpack/Itpack or the Jagged Diagonal algorithms for matrices which have a varying number of nonzero elements in each row. Furthermore, our approach requires less preprocessing (no more time than a single sparse matrix-vector multiplication), less auxiliary storage, and uses a more convenient data representation (an augmented form of the standard compressed sparse row format). We have implemented our algorithm (SEGMV) on the Cray Y-MP C90, and have compared its performance with other methods on a variety of sparse matrices from the Harwell-Boeing collection and in...
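To make the data flow in the abstract above concrete, here is a minimal serial sketch of a sparse matrix-vector product expressed as an elementwise multiply followed by a segmented sum over standard compressed sparse row (CSR) arrays. It is not the SEGMV algorithm: the vectorized and multiprocessor implementation, and the augmented CSR format the authors mention, are not described in the abstract. The function names and the plain `data`/`indices`/`indptr` layout are assumptions made for illustration.

```python
from typing import List


def segmented_sum(values: List[float], lengths: List[int]) -> List[float]:
    """Sum each consecutive segment of `values`; segment i has lengths[i] items."""
    sums, pos = [], 0
    for n in lengths:
        sums.append(sum(values[pos:pos + n]))  # an empty segment sums to 0
        pos += n
    return sums


def csr_matvec(data: List[float], indices: List[int],
               indptr: List[int], x: List[float]) -> List[float]:
    """Compute y = A @ x for a CSR matrix A.

    Step 1 multiplies every stored nonzero by the matching entry of x.
    This is one flat operation whose cost does not depend on row lengths.
    Step 2 reduces the products row by row with a segmented sum.
    """
    products = [a * x[j] for a, j in zip(data, indices)]
    row_lengths = [indptr[i + 1] - indptr[i] for i in range(len(indptr) - 1)]
    return segmented_sum(products, row_lengths)


if __name__ == "__main__":
    # A = [[1, 0, 2],
    #      [0, 0, 0],
    #      [0, 3, 4]]
    data, indices, indptr = [1, 2, 3, 4], [0, 2, 1, 2], [0, 2, 2, 4]
    print(csr_matvec(data, indices, indptr, [1, 2, 3]))
```

Because the multiply runs over all nonzeros at once and only the reduction looks at row boundaries, rows of very different lengths do not change the shape of the computation, which is the property the abstract contrasts with the Ellpack/Itpack and Jagged Diagonal layouts.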
Work-Efficient Nested Data-Parallelism
In Proceedings of the Fifth Symposium on the Frontiers of Massively Parallel Processing (Frontiers 95). IEEE, 1995
Cited by 22 (3 self)
An apply-to-all construct is the key mechanism for expressing data-parallelism, but data-parallel programming languages like HPF and C* significantly restrict which operations can appear in the construct. Allowing arbitrary operations substantially simplifies the expression of irregular and nested data-parallel computations. The technique of flattening nested parallelism introduced by Blelloch compiles data-parallel programs with unrestricted apply-to-all constructs into vector operations, and has achieved notable success, particularly with irregular data-parallel programs. However, these programs must be carefully constructed so that flattening them does not lead to suboptimal work complexity due to unnecessary replication in index operations. We present new flattening transformations that generate programs with correct work complexity. Because these transformations may introduce concurrent reads in parallel indexing, we developed a randomized indexing that reduces concurrent reads w...
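The replication problem named in the abstract above can be illustrated with a small sketch. It assumes the usual reading of the issue: when an apply-to-all body indexes into a sequence defined outside the construct, a naive flattening hands every iteration its own copy of that sequence. The two functions below are hypothetical illustrations, not the paper's transformations, and the randomized indexing scheme is not shown because the abstract does not describe it.

```python
from typing import List


def index_with_replication(table: List[int], idx: List[int]) -> List[int]:
    """Naive flattened form of the apply-to-all { table[i] : i in idx }.

    The table is copied once per index before any element is read, so the
    work is len(table) * len(idx) even though only len(idx) elements are used.
    """
    copies = [list(table) for _ in idx]
    return [copy[i] for copy, i in zip(copies, idx)]


def index_with_gather(table: List[int], idx: List[int]) -> List[int]:
    """Work-efficient form: a single gather that does len(idx) reads.

    Repeated indices now read the same shared element, which is where
    concurrent reads appear on a parallel machine.
    """
    return [table[i] for i in idx]


if __name__ == "__main__":
    table, idx = [10, 20, 30, 40], [3, 0, 0, 2]
    print(index_with_replication(table, idx))
    print(index_with_gather(table, idx))
```

Both functions return the same result; they differ only in how much data is touched. In the example, index 0 appears twice, so the gather version reads one shared element twice.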
Java as an Intermediate Language
1996
Cited by 19 (1 self)
We present our experiences in using Java as an intermediate language for the high-level programming language Nesl. First, we
On the Distributed Implementation of Aggregate Data Structures by Program Transformation
In Proceedings of the 4th IPPS/SDP International Workshop on High-Level Parallel Programming Models and Supportive Environments, IPPS/SDP99, 1999
Cited by 14 (7 self)
A critical component of many data-parallel programming languages are operations that manipulate aggregate data structures as a whole; this includes Fortran 90, Nesl, and languages based on BMF. These operations are commonly implemented by a library whose routines operate on a distributed representation of the aggregate structure; the compiler merely generates the control code invoking the library routines and all machine-dependent code is encapsulated in the library. While this approach is convenient, we argue that by breaking the abstraction enforced by the library and by presenting some of internals in the form of a new intermediate language to the compiler backend, we can optimize on all levels of the memory hierarchy and achieve more flexible data distribution. The new intermediate language allows us to present these optimisations elegantly as program transformations. We report on first results obtained by our approach in the implementation of nested data parallelis...
A portable MPI-based parallel vector template library
1995
Cited by 10 (0 self)
This paper discusses the design and implementation of a polymorphic collection library for distributed address-space parallel computers. The library provides a data-parallel programming model for C++ by providing three main components: a single generic collection class, generic algorithms over collections, and generic algebraic combining functions. Collection elements are the fourth component of a program written using the library and may be either of the built-in types of C or of user-defined types. Many ideas are borrowed from the Standard Template Library (STL) of C++, although a restricted programming model is proposed because of the distributed address-space memory model assumed. Whereas the STL provides standard collections and implementations of algorithms for uniprocessors, this paper advocates standardizing interfaces that may be customized for different parallel computers. Just as the STL attempts to increase programmer productivity through code reuse, a similar standard for ...
Specification and development of parallel algorithms with the Proteus system
In Specification of Parallel Algorithms, 1994
Cited by 9 (5 self)
The Proteus language is a wide-spectrum parallel programming notation that supports the expression of both high-level architecture-independent specifications and lower-level architecture-specific implementations. A methodology based on successive refinement and interactive experimentation supports the development of parallel algorithms from specification to various efficient architecture-dependent implementations. The Proteus system combines the language and tools supporting this methodology. This paper presents a brief overview of the Proteus system and describes its use in the exploration and development of several nontrivial algorithms, including the fast multipole algorithm for N-body computations.
An Efficient Implementation of Nested Data Parallelism for Irregular Divide-and-Conquer Algorithms
First International Workshop on High-Level Programming Models and Supportive Environments, April 1996
Cited by 9 (3 self)
This paper presents work in progress on a new method of implementing irregular divide-and-conquer algorithms in a nested data-parallel language model on distributed-memory multiprocessors. The main features discussed are the recursive subdivision of asynchronous processor groups to match the change from data-parallel to control-parallel behavior over the lifetime of an algorithm, switching from parallel code to serial code when the group size is one (with the opportunity to use a more efficient serial algorithm), and a simple manager-based run-time load-balancing system. Sample algorithms translated from the high-level nested data-parallel language NESL into C and MPI using this method are significantly faster than the current NESL system, and show the potential for further speedup.
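The control structure described in the abstract above can be sketched as a serial simulation. The example uses quicksort as the divide-and-conquer algorithm and models a processor group by its size alone. Splitting the group in proportion to subproblem size is an assumption made for illustration, since the abstract does not say how groups are divided, and the manager-based load balancer and all MPI communication are omitted. The name `group_quicksort` is hypothetical.

```python
from typing import List


def group_quicksort(data: List[int], group_size: int) -> List[int]:
    """Sort `data` while simulating recursive subdivision of a processor group."""
    # A group of one processor switches to serial code, where a more
    # efficient serial algorithm (here the built-in sort) can be used.
    if group_size <= 1 or len(data) <= 1:
        return sorted(data)

    pivot = data[len(data) // 2]
    less = [x for x in data if x < pivot]
    equal = [x for x in data if x == pivot]
    greater = [x for x in data if x > pivot]

    remaining = len(less) + len(greater)
    if remaining == 0:
        return equal

    # Assumed policy: split the group in proportion to subproblem size,
    # keeping at least one processor in each subgroup.
    left = round(group_size * len(less) / remaining)
    left = max(1, min(group_size - 1, left))
    right = group_size - left

    # On a real machine the two subgroups would proceed independently
    # (control-parallel); here they simply run one after the other.
    return group_quicksort(less, left) + equal + group_quicksort(greater, right)


if __name__ == "__main__":
    print(group_quicksort([5, 3, 8, 1, 9, 2, 7], 4))
```

Each level of recursion works on one partition with the whole current group (the data-parallel phase), then hands the two subproblems to disjoint subgroups, which matches the shift from data-parallel to control-parallel behavior the abstract describes.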