Results 1–10 of 17
Implementation of a Portable Nested Data-Parallel Language
 Journal of Parallel and Distributed Computing
, 1994
Abstract

Cited by 177 (26 self)
This paper gives an overview of the implementation of Nesl, a portable nested data-parallel language. This language and its implementation are the first to fully support nested data structures as well as nested data-parallel function calls. These features allow the concise description of parallel algorithms on irregular data, such as sparse matrices and graphs. In addition, they maintain the advantages of data-parallel languages: a simple programming model and portability. The current Nesl implementation is based on an intermediate language called Vcode and a library of vector routines called Cvl. It runs on the Connection Machine CM-2, the Cray Y-MP C90, and serial machines. We compare initial benchmark results of Nesl with those of machine-specific code on these machines for three algorithms: least-squares line fitting, median finding, and a sparse-matrix vector product. These results show that Nesl's performance is competitive with that of machine-specific codes for regular dense da...
Nesl: A Nested Data-Parallel Language
, 1990
Abstract

Cited by 134 (4 self)
This report describes Nesl, a strongly-typed, applicative, data-parallel language. Nesl is intended to be used as a portable interface for programming a variety of parallel and vector supercomputers, and as a basis for teaching parallel algorithms. Parallelism is supplied through a simple set of data-parallel constructs based on vectors, including a mechanism for applying any function over the elements of a vector in parallel, and a broad set of parallel functions that manipulate vectors. Nesl fully supports nested vectors and nested parallelism: the ability to take a parallel function and then apply it over multiple instances in parallel. Nested parallelism is important for implementing algorithms with complex and dynamically changing data structures, such as required in many graph or sparse matrix algorithms. Nesl also provides a mechanism for calculating the asymptotic running time for a program on various parallel machine models, including the parallel random access machine (PRAM...
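The abstract's central construct is applying a function over the elements of a vector in parallel, including over nested vectors. A minimal sequential sketch of these semantics, using Python lists in place of Nesl vectors (the name `apply_each` is illustrative, not Nesl syntax):

```python
def apply_each(f, xs):
    """Apply f to every element of xs; in Nesl each call may run in parallel."""
    return [f(x) for x in xs]

# Nested parallelism: the function applied to each element is itself parallel.
# Summing each row of a ragged (irregular) nested sequence:
rows = [[1, 2, 3], [4], [5, 6]]
row_sums = apply_each(lambda row: sum(apply_each(lambda x: x, row)), rows)
# row_sums == [6, 4, 11]
```

The point of the nested form is that both levels (over rows, and within each row) expose parallelism, even though the rows have different lengths.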
NESL: A Nested Data-Parallel Language (Version 2.6)
, 1993
Abstract

Cited by 95 (7 self)
The views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of Wright Laboratory or the U.S. Government. Keywords: data-parallel, parallel algorithms, supercomputers, nested parallelism. This report describes Nesl, a strongly-typed, applicative, data-parallel language. Nesl is intended to be used as a portable interface for programming a variety of parallel and vector computers, and as a basis for teaching parallel algorithms. Parallelism is supplied through a simple set of data-parallel constructs based on sequences, including a mechanism for applying any function over the elements of a sequence in parallel and a rich set of parallel functions that manipulate sequences. Nesl fully supports nested sequences and nested parallelism: the ability to take a parallel function and apply it over multiple instances in parallel. Nested parallelism is important for implementing algorithms with irregular nested loops (where the inner loop lengths depend on the outer iteration) and for divide-and-conquer algorithms. Nesl also provides a performance model for calculating the asymptotic performance of a program on
Segmented Operations for Sparse Matrix Computation on Vector Multiprocessors
, 1993
Abstract

Cited by 26 (4 self)
In this paper we present a new technique for sparse matrix multiplication on vector multiprocessors based on the efficient implementation of a segmented sum operation. We describe how the segmented sum can be implemented on vector multiprocessors such that it both fully vectorizes within each processor and parallelizes across processors. Because of our method's insensitivity to relative row size, it is better suited than the Ellpack/Itpack or the Jagged Diagonal algorithms for matrices which have a varying number of nonzero elements in each row. Furthermore, our approach requires less preprocessing (no more time than a single sparse matrix-vector multiplication), less auxiliary storage, and uses a more convenient data representation (an augmented form of the standard compressed sparse row format). We have implemented our algorithm (SEGMV) on the Cray Y-MP C90, and have compared its performance with other methods on a variety of sparse matrices from the Harwell-Boeing collection and in...
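The segmented-sum idea described above can be sketched sequentially: the element-wise products for all rows are computed as one flat vector (fully vectorizable regardless of row lengths), and a segmented sum driven by the CSR row pointers then reduces each row's segment independently. This is a model of the operation's semantics, not the paper's vectorized Cray implementation:

```python
def csr_spmv_segmented(values, col_idx, row_ptr, x):
    # Flat element-wise products: row lengths are irrelevant at this step.
    products = [v * x[c] for v, c in zip(values, col_idx)]
    # Segmented sum over the row segments delimited by row_ptr.
    return [sum(products[row_ptr[i]:row_ptr[i + 1]])
            for i in range(len(row_ptr) - 1)]

# The 3x3 matrix [[2,0,1],[0,3,0],[4,5,6]] in CSR form, times x = [1,1,1]:
values  = [2, 1, 3, 4, 5, 6]
col_idx = [0, 2, 1, 0, 1, 2]
row_ptr = [0, 2, 3, 6]
y = csr_spmv_segmented(values, col_idx, row_ptr, [1, 1, 1])
# y == [3, 3, 15]
```

The insensitivity to relative row size claimed in the abstract comes from the flat products vector: a matrix with one long row and many short ones still vectorizes over a single dense array.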
The Design and Implementation of a Region-Based Parallel Language
 UNIVERSITY OF WASHINGTON
, 2001
Abstract

Cited by 20 (9 self)
This dissertation describes the design and implementation of ZPL.
Compiling Nested Data-Parallel Programs for Shared-Memory Multiprocessors
 ACM TRANSACTIONS ON PROGRAMMING LANGUAGES AND SYSTEMS
, 1993
Abstract

Cited by 19 (5 self)
While data parallelism is well-suited from algorithmic, architectural, and linguistic considerations to serve as a basis for portable parallel programming, its characteristic fine-grained parallelism makes the efficient implementation of data-parallel languages on MIMD machines a challenging task. The design, implementation, and evaluation of an optimizing compiler are presented for an applicative nested data-parallel language called VCODE targeted at the Encore Multimax, a shared-memory multiprocessor. The source language supports nested aggregate data types; aggregate operations including element-wise forms, scans, reductions, and permutations; and conditionals and recursion for control flow. A small set of graph-theoretic compile-time optimizations reduce the overheads on MIMD machines in several ways: by increasing the grain size of the output program, by reducing synchronization and storage requirements, and by improving locality of reference. The two key ideas behind these...
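Among the aggregate operations the abstract lists, scans are the least obvious. A sequential sketch of an exclusive plus-scan fixes the semantics the compiler must preserve; a real shared-memory implementation would block the vector across processors to increase grain size and reduce synchronization, as the abstract describes:

```python
def plus_scan(xs):
    """Exclusive prefix sum: out[i] = xs[0] + ... + xs[i-1]."""
    out, acc = [], 0
    for x in xs:
        out.append(acc)
        acc += x
    return out

result = plus_scan([3, 1, 4, 1, 5])
# result == [0, 3, 4, 8, 9]
```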
Efficient parallel algorithms for closest point problems
, 1994
Abstract

Cited by 10 (1 self)
This dissertation develops and studies fast algorithms for solving closest point problems. Algorithms for such problems have applications in many areas including statistical classification, crystallography, data compression, and finite element analysis. In addition to a comprehensive empirical study of known sequential methods, I introduce new parallel algorithms for these problems that are both efficient and practical. I present a simple and flexible programming model for designing and analyzing parallel algorithms. Also, I describe fast parallel algorithms for nearest-neighbor searching and constructing Voronoi diagrams. Finally, I demonstrate that my algorithms actually obtain good performance on a wide variety of machine architectures. The key algorithmic ideas that I examine are exploiting spatial locality and random sampling. Spatial decomposition allows many concurrent threads to work independently of one another in local areas of a shared data structure. Random sampling provides a simple way to adaptively decompose irregular problems, and to balance workload among many threads. Used together, these techniques result in effective algorithms for a wide range of geometric problems. The key
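The spatial-decomposition idea in the abstract can be illustrated with a uniform grid: points are hashed into cells so a nearest-neighbor query only examines nearby cells, and independent queries can proceed concurrently over the read-only structure. The grid and cell size here are assumptions for the example, not the dissertation's actual data structure, and the sketch assumes the true nearest neighbor lies within the neighboring cells:

```python
from collections import defaultdict
from math import dist, floor

def build_grid(points, cell):
    """Hash 2-D points into uniform grid cells of side length `cell`."""
    grid = defaultdict(list)
    for p in points:
        grid[(floor(p[0] / cell), floor(p[1] / cell))].append(p)
    return grid

def nearest(grid, cell, q):
    """Return the candidate point closest to q among the 3x3 cells around it."""
    cx, cy = floor(q[0] / cell), floor(q[1] / cell)
    cand = [p for dx in (-1, 0, 1) for dy in (-1, 0, 1)
            for p in grid[(cx + dx, cy + dy)]]
    return min(cand, key=lambda p: dist(p, q))

pts = [(0.2, 0.3), (1.1, 1.0), (2.5, 2.5)]
g = build_grid(pts, cell=1.0)
answer = nearest(g, 1.0, (1.0, 1.2))
# answer == (1.1, 1.0)
```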
An Efficient Implementation of Nested Data Parallelism for Irregular Divide-and-Conquer Algorithms
 FIRST INTERNATIONAL WORKSHOP ON HIGH-LEVEL PROGRAMMING MODELS AND SUPPORTIVE ENVIRONMENTS, APRIL 1996.
, 1996
Abstract

Cited by 9 (3 self)
This paper presents work in progress on a new method of implementing irregular divide-and-conquer algorithms in a nested data-parallel language model on distributed-memory multiprocessors. The main features discussed are the recursive subdivision of asynchronous processor groups to match the change from data-parallel to control-parallel behavior over the lifetime of an algorithm, switching from parallel code to serial code when the group size is one (with the opportunity to use a more efficient serial algorithm), and a simple manager-based runtime load-balancing system. Sample algorithms translated from the high-level nested data-parallel language NESL into C and MPI using this method are significantly faster than the current NESL system, and show the potential for further speedup.
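The group-splitting scheme above can be modeled sequentially: each divide-and-conquer step splits its processor group roughly in proportion to subproblem size, and a group of size one switches to serial code. The group-size integers here stand in for the asynchronous processor groups (e.g. MPI communicators) of the real system:

```python
def dc_solve(data, nprocs, serial_solve, split, merge):
    """Divide-and-conquer with proportional processor-group subdivision."""
    if nprocs == 1 or len(data) <= 1:
        return serial_solve(data)  # serial code path: group size is one
    left, right = split(data)
    # Divide the group roughly in proportion to the subproblem sizes.
    nl = max(1, min(nprocs - 1, round(nprocs * len(left) / len(data))))
    return merge(dc_solve(left, nl, serial_solve, split, merge),
                 dc_solve(right, nprocs - nl, serial_solve, split, merge))

# Example: merge sort with a hypothetical group of 4 "processors".
out = dc_solve([5, 2, 8, 1], 4, sorted,
               lambda d: (d[:len(d) // 2], d[len(d) // 2:]),
               lambda a, b: sorted(a + b))
# out == [1, 2, 5, 8]
```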
An object-oriented approach to nested data parallelism
 IN PROCEEDINGS OF THE FIFTH SYMPOSIUM ON THE FRONTIERS OF MASSIVELY PARALLEL COMPUTATION
, 1995
Abstract

Cited by 7 (3 self)
This paper describes an implementation technique for integrating nested data parallelism into an object-oriented language. Data-parallel programming employs sets of data called "collections" and expresses parallelism as operations performed over the elements of a collection. When the elements of a collection are also collections, then there is the possibility for "nested data parallelism." Few current programming languages support nested data parallelism, however. In an object-oriented framework, a collection is a single object. Its type defines the parallel operations that may be applied to it. Our goal is to design and build an object-oriented data-parallel programming environment supporting nested data parallelism. Our initial approach is built upon three fundamental additions to C++. We add new parallel base types by implementing them as classes, and add a new parallel collection type called a "vector" that is implemented as a template. Only one new language feature is introduced: the foreach construct, which is the basis for exploiting element-wise parallelism over collections. The strength of the method lies in the compilation strategy, which translates nested data-parallel C++ into ordinary C++. Extracting the potential parallelism in nested foreach constructs is called "flattening" nested parallelism. We show how to flatten foreach constructs using a simple program transformation. Our prototype system produces
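The flattening transformation mentioned above can be modeled in a few lines: a nested collection becomes one flat data vector plus a vector of segment lengths, so a nested foreach over ragged segments becomes a single flat element-wise operation. The names (`flatten`, `unflatten`) are for this example, not the paper's C++ system:

```python
def flatten(nested):
    """Split a nested list into a flat data vector and segment lengths."""
    flat = [x for seg in nested for x in seg]
    lengths = [len(seg) for seg in nested]
    return flat, lengths

def unflatten(flat, lengths):
    """Rebuild the nested structure from flat data and segment lengths."""
    out, i = [], 0
    for n in lengths:
        out.append(flat[i:i + n])
        i += n
    return out

# A nested foreach "square every element of every segment" becomes a single
# flat element-wise operation after flattening:
nested = [[1, 2], [3], [4, 5, 6]]
flat, lengths = flatten(nested)
squared = unflatten([x * x for x in flat], lengths)
# squared == [[1, 4], [9], [16, 25, 36]]
```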
Practical Parallel Divide-and-Conquer Algorithms
, 1997
Abstract

Cited by 6 (2 self)
Nested data parallelism has been shown to be an important feature of parallel languages, allowing the concise expression of algorithms that operate on irregular data structures such as graphs and sparse matrices. However, previous nested data-parallel languages have relied on a vector PRAM implementation layer that cannot be efficiently mapped to MPPs with high interprocessor latency. This thesis shows that by restricting the problem set to that of data-parallel divide-and-conquer algorithms I can maintain the expressibility of full nested data-parallel languages while achieving good efficiency on current distributed-memory machines. Specifically, I define