Implementation of a Portable Nested Data-Parallel Language
- Journal of Parallel and Distributed Computing
, 1994
Cited by 154 (26 self)
This paper gives an overview of the implementation of Nesl, a portable nested data-parallel language. This language and its implementation are the first to fully support nested data structures as well as nested data-parallel function calls. These features allow the concise description of parallel algorithms on irregular data, such as sparse matrices and graphs. In addition, they maintain the advantages of data-parallel languages: a simple programming model and portability. The current Nesl implementation is based on an intermediate language called Vcode and a library of vector routines called Cvl. It runs on the Connection Machine CM-2, the Cray Y-MP C90, and serial machines. We compare initial benchmark results of Nesl with those of machine-specific code on these machines for three algorithms: least-squares line-fitting, median finding, and a sparse-matrix vector product. These results show that Nesl's performance is competitive with that of machine-specific codes for regular dense da...
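To make the idea of nested data on irregular structures concrete, here is a minimal Python sketch of a sparse-matrix vector product, one of the three benchmarks the abstract names. It is not Nesl, Vcode, or Cvl code. The function names (`flatten`, `segmented_sum`, `sparse_matvec`) and the row representation (a list of `(column, value)` pairs per row) are assumptions made for illustration. The sketch stores the nested rows as flat vectors plus a list of segment lengths, then does one flat multiply followed by a per-segment sum.

```python
from itertools import accumulate


def flatten(rows):
    """Turn nested rows of (column, value) pairs into flat vectors.

    Returns the segment lengths (one per row) plus flat column and value lists.
    """
    lengths = [len(row) for row in rows]
    cols = [c for row in rows for c, _ in row]
    vals = [v for row in rows for _, v in row]
    return lengths, cols, vals


def segmented_sum(lengths, data):
    """Sum each segment of `data`; segment i holds lengths[i] elements."""
    ends = list(accumulate(lengths))
    starts = [0] + ends[:-1]
    return [sum(data[s:e]) for s, e in zip(starts, ends)]


def sparse_matvec(rows, x):
    """Multiply a sparse matrix (nested rows) by the dense vector x."""
    lengths, cols, vals = flatten(rows)
    # One flat elementwise multiply over all nonzeros, regardless of row shape.
    products = [v * x[c] for c, v in zip(cols, vals)]
    return segmented_sum(lengths, products)


# Rows of different lengths, including an empty row.
matrix = [[(0, 2.0), (2, 1.0)], [], [(1, 3.0)]]
result = sparse_matvec(matrix, [1.0, 2.0, 3.0])
```

Rows of unequal length need no padding in this layout, which is the property that makes irregular data convenient to express.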
Implicit and Explicit Parallel Programming in Haskell
, 1993
Cited by 29 (1 self)
It has often been suggested that functional languages provide an excellent basis for programming parallel computer systems. This is largely a result of the lack of side effects, which makes it possible to evaluate the subexpressions of a given term without any risk of interference. On the other hand, the lack of side effects has also been seen as a weakness of functional languages since it rules out many features of traditional imperative languages such as state, I/O and exceptions. These ideas can be simulated in a functional language but the resulting programs are sometimes unnatural and inefficient. On the bright side, recent work has shown how many of these features can be naturally incorporated into a functional language without compromising efficiency by expressing computations in terms of monads or continuations. Unfortunately, the "single-threading" implied by these techniques often destroys many opportunities for parallelism. In this paper, we describe a simple extension to the Haskell I/O monad that allows a form of explicit high-level concurrency. It is a simple matter to incorporate these features in a sequential implementation, and genuine parallelism can be obtained on a parallel machine. In addition, the inclusion of constructs for explicit concurrency enhances the use of Haskell as an executable specification language, since some programs are most naturally described as a composition of parallel processes. This research was supported by ARPA via a subcontract to Intermetrics, Inc.
Work-Efficient Nested Data-Parallelism
- In Proceedings of the Fifth Symposium on the Frontiers of Massively Parallel Processing (Frontiers 95). IEEE
, 1995
Cited by 19 (3 self)
An apply-to-all construct is the key mechanism for expressing data-parallelism, but data-parallel programming languages like HPF and C* significantly restrict which operations can appear in the construct. Allowing arbitrary operations substantially simplifies the expression of irregular and nested data-parallel computations. The technique of flattening nested parallelism, introduced by Blelloch, compiles data-parallel programs with unrestricted apply-to-all constructs into vector operations, and has achieved notable success, particularly with irregular data-parallel programs. However, these programs must be carefully constructed so that flattening them does not lead to suboptimal work complexity due to unnecessary replication in index operations. We present new flattening transformations that generate programs with correct work complexity. Because these transformations may introduce concurrent reads in parallel indexing, we developed a randomized indexing that reduces concurrent reads w...
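The work-complexity problem mentioned above can be illustrated with a small Python sketch. It reflects one reading of "unnecessary replication in index operations" and does not reproduce the transformations the paper presents. The function names and the explicit work counter are hypothetical. The first function models a flattened apply-to-all in which the indexed vector is copied once per element, and the second fetches the same elements with a single gather.

```python
def index_by_replication(v, indices):
    """Index v at each position by first giving every element its own copy of v.

    Work grows with len(indices) * len(v) because of the copies.
    """
    replicated = [list(v) for _ in indices]
    work = sum(len(copy) for copy in replicated)
    picked = [copy[i] for copy, i in zip(replicated, indices)]
    return picked, work


def index_by_gather(v, indices):
    """Index v at each position directly; work grows with len(indices) only."""
    picked = [v[i] for i in indices]
    return picked, len(indices)


v = [10, 20, 30, 40]
indices = [3, 0, 3]
naive_result, naive_work = index_by_replication(v, indices)
gather_result, gather_work = index_by_gather(v, indices)
```

Both versions return the same values, but the replicated form touches every element of `v` once per index. Note that index 3 is read twice in the gather, which is the kind of concurrent read the abstract says the new transformations may introduce.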
Khepera: A System for Rapid Implementation of Domain Specific Languages
- In Proceedings of the USENIX Conference on Domain-Specific Languages
, 1997
Cited by 17 (1 self)
The Khepera system is a toolkit for the rapid implementation and long-term maintenance of domain specific languages (DSLs). Our viewpoint is that DSLs are most easily implemented via source-to-source translation from the DSL into another language and that this translation should be based on simple parsing, sophisticated tree-based analysis and manipulation, and source generation using pretty-printing techniques. Khepera emphasizes the use of familiar, pre-existing tools and provides support for transformation replay and debugging for the DSL processor and end-user programs. In this paper, we present an overview of our approach, including implementation details and a short example.
Nepal -- Nested Data-Parallelism in Haskell
- In Euro-Par ’01
, 2001
Cited by 17 (2 self)
This paper discusses an extension of Haskell with support for nested data-parallel programming in the style of the special-purpose language Nesl. More precisely, the extension consists of a parallel array type, array comprehensions, and a set of primitive parallel array operations. This extension brings a hitherto unsupported style of parallel programming to Haskell. Moreover, nested data parallelism should receive wider attention when available in a standardised language like Haskell. This paper outlines the language extension and demonstrates its usefulness with two case studies.
Harnessing the Multicores: Nested Data Parallelism in Haskell
, 2008
Cited by 17 (6 self)
If you want to program a parallel computer, a purely functional language like Haskell is a promising starting point. Since the language is pure, it is by-default safe for parallel evaluation, whereas imperative languages are by-default unsafe. But that doesn’t make it easy! Indeed it has proved quite difficult to get robust, scalable performance increases through parallel functional programming, especially as the number of processors increases. A particularly promising and well-studied approach to employing large numbers of processors is data parallelism. Blelloch’s pioneering work on NESL showed that it was possible to combine a rather flexible programming model (nested data parallelism) with a fast, scalable execution model (flat data parallelism). In this paper we describe Data Parallel Haskell, which embodies nested data parallelism in a modern, general-purpose language, implemented in a state-of-the-art compiler, GHC. We focus particularly on the vectorisation transformation, which transforms nested to flat data parallelism.
Flattening Trees
, 1998
Cited by 14 (5 self)
Nested data-parallelism can be efficiently implemented by mapping it to flat parallelism using Blelloch & Sabot's flattening transformation. So far, the only dynamic data structures supported by flattening are vectors. We extend it with support for user-defined recursive types, which allow parallel tree structures to be defined. Thus, important parallel algorithms can be implemented more clearly and efficiently.
Implementation and Evaluation of an Efficient Parallel Delaunay Triangulation Algorithm
- In Proceedings of the 9th Annual ACM Symposium on Parallel Algorithms and Architectures
, 1997
Cited by 12 (2 self)
This paper describes the derivation of an empirically efficient parallel two-dimensional Delaunay triangulation program from a theoretically efficient CREW PRAM algorithm. Compared to previous work, the resulting implementation is not limited to datasets with a uniform distribution of points, achieves significantly better speedups over good serial code, and is widely portable due to its use of MPI as a communication mechanism. Results are presented for a loosely-coupled cluster of workstations, a distributed-memory multicomputer, and a shared-memory multiprocessor. The Machiavelli toolkit used to transform the nested data parallelism inherent in the divide-and-conquer algorithm into achievable task and data parallelism is also described and compared to previous techniques.
A Data-Parallel Implementation of the Adaptive Fast Multipole Algorithm
, 1993
Cited by 11 (1 self)
Given an ensemble of n bodies in space whose interaction is governed by a potential function, the N-body problem is to calculate the force on each body in the ensemble that results from its interaction with all other bodies. An efficient algorithm for this problem is critical in the simulation of molecular dynamics, turbulent fluid flow, intergalactic matter and other problems. The fast multipole algorithm (FMA) developed by Greengard approximates the solution with bounded error in time O(n). For non-uniform distributions of bodies, an adaptive variation of the algorithm is required to maintain this time complexity. The parallel execution of the FMA poses complex implementation issues in the decomposition of the problem over processors to reduce communication. As a result the 3D Adaptive FMA has, to our knowledge, never been implemented on a scalable parallel computer. This paper describes several variations on the parallel adaptive 3D FMA algorithm that are expressed using the datapa...
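For reference, the N-body problem as stated above can be solved directly by summing every pairwise interaction, which takes work quadratic in n and is the baseline the FMA improves on. The Python sketch below is that direct sum, not the FMA. The abstract does not specify the potential, so the inverse-square attraction with a unit constant is an assumption, as is the function name `direct_forces`.

```python
import math


def direct_forces(positions, masses):
    """Compute the force on each body by summing over all other bodies.

    Assumes an inverse-square attraction with unit constant (an illustrative
    choice) and that no two bodies share a position.
    """
    n = len(positions)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Displacement from body i to body j and its length.
            d = [positions[j][k] - positions[i][k] for k in range(3)]
            r = math.sqrt(sum(c * c for c in d))
            scale = masses[i] * masses[j] / r ** 3
            for k in range(3):
                forces[i][k] += scale * d[k]
    return forces


forces = direct_forces([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)], [1.0, 4.0])
```

With two bodies the forces are equal and opposite, which is a quick sanity check on the sign convention.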
Enlarging the Scope of Vector-Based Computations: Extending Fortran 90 by Nested Data Parallelism
, 1997
Cited by 10 (6 self)
This paper describes the integration of nested data parallelism into Fortran 90. Unlike flat data parallelism, nested data parallelism directly provides means for handling irregular data structures and certain forms of control parallelism, such as divide-and-conquer algorithms, thus enabling the programmer to express such algorithms far more naturally. Existing work deals with nested data parallelism in a functional environment, which does help avoid a set of problems, but makes efficient implementations more complicated. Moreover, functional languages are not readily accepted by programmers used to languages such as Fortran and C, which are currently predominant in programming parallel machines. In this paper, we introduce the imperative data-parallel language Fortran 90V and give an overview of its implementation. 1 Introduction Vector computers are one of the most successful architectures for high-performance computing. They offer fine-grain data parallelism, enabling one operati...

