BSPlib: The BSP Programming Library
1998
Abstract

Cited by 82 (6 self)
BSPlib is a small communications library for bulk synchronous parallel (BSP) programming which consists of only 20 basic operations. This paper presents the full definition of BSPlib in C, motivates the design of its basic operations, and gives examples of their use. The library enables programming in two distinct styles: direct remote memory access using put or get operations, and bulk synchronous message passing. Currently, implementations of BSPlib exist for a variety of modern architectures, including massively parallel computers with distributed memory, shared memory multiprocessors, and networks of workstations. BSPlib has been used in several scientific and industrial applications; this paper briefly describes applications in benchmarking, Fast Fourier Transforms, sorting, and molecular dynamics.
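The put-based (direct remote memory access) style can be pictured with a small sequential simulation. This is a sketch under stated assumptions, not BSPlib itself: `BSPMachine`, `put`, `sync` and `gather_to_zero` are hypothetical Python stand-ins for the C calls `bsp_put` and `bsp_sync`, with one registered array per process. It shows the defining bulk synchronous property: data written by a put becomes visible at its destination only after the barrier that ends the superstep.

```python
from typing import List, Tuple


class BSPMachine:
    """Hypothetical sequential simulation of p BSP processes (not the BSPlib API).

    Each process owns one array of `size` ints, standing in for a variable
    registered with bsp_push_reg in real BSPlib."""

    def __init__(self, p: int, size: int) -> None:
        self.p = p
        self.mem: List[List[int]] = [[0] * size for _ in range(p)]
        # Puts issued in the current superstep: (destination pid, offset, data).
        self.pending: List[Tuple[int, int, List[int]]] = []

    def put(self, dst: int, values: List[int], offset: int) -> None:
        # Copy the source now, so later local changes cannot leak into the put.
        self.pending.append((dst, offset, list(values)))

    def sync(self) -> None:
        # Barrier: every put of the superstep takes effect together.
        for dst, offset, values in self.pending:
            self.mem[dst][offset:offset + len(values)] = values
        self.pending.clear()


def gather_to_zero(p: int) -> List[int]:
    """One superstep in which process s writes s*s + 1 into slot s on process 0."""
    machine = BSPMachine(p, p)
    for s in range(p):  # the loop body plays the role of each process's SPMD code
        machine.put(0, [s * s + 1], s)
    assert machine.mem[0] == [0] * p  # nothing is visible before the barrier
    machine.sync()
    return machine.mem[0]


if __name__ == "__main__":
    print(gather_to_zero(4))  # [1, 2, 5, 10]
```

Buffering every put until `sync` is what makes the result independent of the order in which the simulated processes run.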
Algorithm + Strategy = Parallelism
Journal of Functional Programming, 1998
Abstract

Cited by 55 (19 self)
The process of writing large parallel programs is complicated by the need to specify both the parallel behaviour of the program and the algorithm that is to be used to compute its result. This paper introduces evaluation strategies, lazy higher-order functions that control the parallel evaluation of non-strict functional languages. Using evaluation strategies, it is possible to achieve a clean separation between algorithmic and behavioural code. The result is enhanced clarity and shorter parallel programs. Evaluation strategies are a very general concept: this paper shows how they can be used to model a wide range of commonly used programming paradigms, including divide-and-conquer, pipeline parallelism, producer/consumer parallelism, and data-oriented parallelism. Because they are based on unrestricted higher-order functions, they can also capture irregular parallel structures. Evaluation strategies are not just of theoretical interest: they have evolved out of our experience in parallelising several large-scale applications, where they have proved invaluable in helping to manage the complexities of parallel behaviour. These applications are described in detail here. The largest application we have studied to date, Lolita, is a 60,000-line natural language parser. Initial results show that for these applications we can achieve acceptable parallel performance, while incurring minimal overhead for using evaluation strategies.
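Python is strict and has no sparks, so the following is only a loose analogy of the separation the abstract describes, not the paper's Haskell combinators. The algorithm (`squares`) returns unevaluated thunks, and interchangeable strategies decide how they are forced. The names `seq_list`, `par_list` and `using` are hypothetical, chosen to echo (as we recall) the paper's `using` and `parList`; here `using` simply forces the thunks, which differs from the Haskell original. Because of CPython's global interpreter lock, the thread pool will not speed up this pure-Python arithmetic; it only marks where parallel evaluation would be plugged in.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")
Thunk = Callable[[], T]


# Algorithm: says what to compute, nothing about how it is evaluated.
def squares(xs: List[int]) -> List[Thunk[int]]:
    return [lambda x=x: x * x for x in xs]


# Strategies: say how to evaluate, nothing about what is computed.
def seq_list(thunks: List[Thunk[T]]) -> List[T]:
    return [t() for t in thunks]


def par_list(thunks: List[Thunk[T]]) -> List[T]:
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(t) for t in thunks]
        return [f.result() for f in futures]


def using(thunks: List[Thunk[T]], strategy: Callable[[List[Thunk[T]]], List[T]]) -> List[T]:
    # Hypothetical analogue of the paper's `using`: apply the chosen strategy.
    return strategy(thunks)


if __name__ == "__main__":
    work = squares([0, 1, 2, 3, 4])
    print(using(work, seq_list))  # [0, 1, 4, 9, 16]
    print(using(work, par_list))  # [0, 1, 4, 9, 16]
```

Swapping `seq_list` for `par_list` changes evaluation behaviour without touching `squares`, which is the point of the separation.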
Functional Bulk Synchronous Parallel Programming in C++
In 14th IASTED International Conference on Parallel and Distributed Computing Systems, 2002
Abstract

Cited by 19 (14 self)
This paper presents the BSFC++ library for functional bulk synchronous parallel programming in C++. It is based on an extension of the λ-calculus by parallel operations on a parallel data structure named parallel vector, which is given by intention. This guarantees determinism and the absence of deadlock. Broadcast algorithms are implemented using the core library.
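A sequential Python simulation of the parallel-vector idea, with a direct broadcast built on top. The primitive names `mkpar`, `apply` and `put` are borrowed from BSMLlib as an assumption; the abstract does not give BSFC++'s actual C++ interface. `P` is fixed at 4 for illustration and `bcast_direct` is a hypothetical helper. A parallel vector is modelled as a Python list indexed by processor id.

```python
from typing import Callable, List, Optional, TypeVar

A = TypeVar("A")
B = TypeVar("B")

P = 4  # number of simulated processors (fixed here for illustration)


def mkpar(f: Callable[[int], A]) -> List[A]:
    """Parallel vector given by intention: processor i holds f(i)."""
    return [f(i) for i in range(P)]


def apply(fs: List[Callable[[A], B]], xs: List[A]) -> List[B]:
    """Pointwise, purely local application: no communication."""
    return [f(x) for f, x in zip(fs, xs)]


def put(msgs: List[Callable[[int], Optional[A]]]) -> List[Callable[[int], Optional[A]]]:
    """Communication phase ending in a barrier.

    msgs[src](dst) is what src sends to dst (None means no message).
    Afterwards processor dst holds a function mapping src to what it received."""
    inbox = [[msgs[src](dst) for src in range(P)] for dst in range(P)]
    return [lambda src, row=row: row[src] for row in inbox]


def bcast_direct(root: int, vec: List[A]) -> List[A]:
    """Direct broadcast: root sends its value to every processor in one superstep."""
    def sender(i: int) -> Callable[[A], Callable[[int], Optional[A]]]:
        # Only the root produces messages; every other processor sends nothing.
        return lambda v: (lambda dst: v) if i == root else (lambda dst: None)

    received = put(apply(mkpar(sender), vec))
    return apply(mkpar(lambda i: (lambda inbox: inbox(root))), received)


if __name__ == "__main__":
    values = mkpar(lambda i: i * 10)  # [0, 10, 20, 30]
    print(bcast_direct(2, values))    # [20, 20, 20, 20]
```

Because the only communication primitive is a collective `put` over the whole vector, there is no way to express an unmatched send or receive, which is the intuition behind the determinism and deadlock-freedom claim.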
Logic of Global Synchrony
2001
Abstract

Cited by 14 (10 self)
An intermediate-level specification notation is presented for use with BSP-style programming. It is achieved by extending pre-post semantics to reveal state at points of global synchronisation. That enables us to integrate the pre-post, finite and reactive-process styles of specification in BSP, as shown by our treatment of the dining philosophers. The language is provided with a complete set of laws and has been formulated to benefit from a simple predicative semantics.
P3PC: A Point-to-Point Communication Model for Automatic and Optimal Decomposition of Regular Domain Problems
IEEE Transactions on Parallel and Distributed Systems, 2002
Abstract

Cited by 9 (7 self)
One of the most fundamental problems confronting automatic parallelization tools is finding an optimal domain decomposition for a given application. For regular domain problems (such as simple matrix manipulations) this task may seem trivial. However, communication costs in message-passing programs often depend significantly on the memory layout of the data blocks to be transmitted. As a consequence, straightforward domain decompositions may be non-optimal. In this paper we introduce a new point-to-point communication model (called P3PC, or the 'Parameterized model based on the Three Paths of Communication') that is specifically designed to overcome this problem. In comparison with related models (e.g., LogGP) P3PC is similar in complexity, but more accurate in many situations. Although the model is aimed at MPI's standard point-to-point operations, it is applicable to similar message-passing definitions as well.
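A small illustration of the layout effect the abstract describes, not of the P3PC model itself (the abstract does not give its parameters). In a row-major array, a row-strip decomposition exchanges a boundary row that occupies one contiguous run of memory, while a column-strip decomposition exchanges a boundary column split into one-element runs. Both messages carry the same number of elements, yet the strided one typically has to be packed or described with a strided datatype (for example an MPI vector type) before sending. The helper names are hypothetical.

```python
from typing import Iterable, List, Tuple

Cell = Tuple[int, int]


def contiguous_runs(n_cols: int, cells: Iterable[Cell]) -> int:
    """Number of maximal runs of consecutive addresses covered by `cells`
    in a row-major array with n_cols columns."""
    addrs = sorted(r * n_cols + c for r, c in cells)
    runs = 0
    prev = None
    for a in addrs:
        if prev is None or a != prev + 1:
            runs += 1  # a gap in memory starts a new run
        prev = a
    return runs


def boundary_row(n_cols: int, r: int) -> List[Cell]:
    """Halo sent by a row-wise (horizontal strip) decomposition."""
    return [(r, c) for c in range(n_cols)]


def boundary_col(n_rows: int, c: int) -> List[Cell]:
    """Halo sent by a column-wise (vertical strip) decomposition."""
    return [(r, c) for r in range(n_rows)]


if __name__ == "__main__":
    n = 1000
    print(contiguous_runs(n, boundary_row(n, 499)))  # 1 run of 1000 elements
    print(contiguous_runs(n, boundary_col(n, 499)))  # 1000 runs of 1 element each
```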
Parallel Juxtaposition for Bulk Synchronous Parallel ML
Euro-Par 2003, number 2790 in LNCS, 2002
Abstract

Cited by 9 (6 self)
The BSMLlib is a library for Bulk Synchronous Parallel (BSP) programming with the functional language Objective Caml. It is based on an extension of the λ-calculus by parallel operations on a parallel data structure named parallel vector, which is given by intention.
Parallel Superposition for Bulk Synchronous Parallel ML
2003
Abstract

Cited by 9 (7 self)
The BSMLlib is a library for Bulk Synchronous Parallel programming with the functional language Objective Caml. It is based on an extension of the λ-calculus by parallel operations on a parallel data structure named parallel vector, which is given by intention.
A fixpoint theory for nonmonotonic parallelism
2002
Abstract

Cited by 7 (5 self)
This paper studies parallel recursion. The trace specification language used in this paper incorporates sequentiality, nondeterminism, reactiveness (including … traces), three forms of parallelism (including …, fair-interleaving and synchronous parallelism) and general recursion. In order to use Tarski's theorem to determine the fixpoints of recursions, we need to identify a well-… partial order. Several orders are considered, including a new order called the lexical order, which tends to simulate the execution of a recursion in a similar manner to the Egli-Milner order. A theorem of this paper shows that no appropriate order exists for the language; Tarski's theorem alone is not enough to determine the fixpoints of parallel recursions. Instead of using Tarski's theorem directly, we reason about the fixpoints of terminating and nonterminating behaviours separately. Such reasoning is supported by the laws of a new composition called partition. We propose a fixpoint technique called the partitioned fixpoint, which is the least fixpoint of the nonterminating behaviours after the terminating behaviours reach their greatest fixpoint. The surprising result is that although a recursion may not be lexical-order monotonic, it must have the partitioned fixpoint, which is equal to the least lexical-order fixpoint. Since the partitioned fixpoint is well defined in any complete lattice, the results are applicable to various semantic models. Existing fixpoint techniques simply become special cases of the partitioned fixpoint. For example, an Egli-Milner-monotonic recursion has its least Egli-Milner fixpoint, which can be shown to be the same as the partitioned fixpoint. The new technique is more general than the least Egli-Milner fixpoint in that the partitioned fixpoint can be determined even when a recursion is not Egli-Milner monotonic. Examples of nonmonotonic recur...
A simple and efficient parallel FFT algorithm using the BSP model
Parallel Computing, 2000
Abstract

Cited by 4 (0 self)
In this paper, we present a new parallel radix-4 FFT algorithm based on the BSP model. Our parallel algorithm uses the group-cyclic distribution family, which makes it simple to understand and easy to implement. We show how to reduce the communication cost of the algorithm by a factor of three in the case that the input/output vector is in the cyclic distribution. We also show how to reduce computation time on computers with a cache-based architecture. We present performance results on a Cray T3E with up to 64 processors, obtaining reasonable efficiency levels for local problem sizes as small as 256 and very good efficiency levels for sizes larger than 2048.
1. Introduction
The discrete Fourier transform (DFT) plays an important role in computational science. DFT applications range from the numerical solution of differential equations to signal processing. (For an introduction to DFT applications see e.g. [7].) The widespread use of DFTs in computational science is mainly due to the exist...
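The abstract does not define the group-cyclic distribution, so the sketch below assumes the definition common in the BSP FFT literature as we understand it: for a cycle c dividing p, element x_j goes to processor (j div (nc/p))·c + (j mod c). Treat that formula, and the helper names `group_cyclic_owner` and `owners`, as assumptions. Its two extremes are the block distribution (c = 1) and the cyclic distribution (c = p) mentioned in the abstract.

```python
from typing import List


def group_cyclic_owner(j: int, n: int, p: int, c: int) -> int:
    """Processor that owns x_j (0 <= j < n) under the group-cyclic
    distribution with cycle c, assuming c divides p and p divides n."""
    if c < 1 or p % c != 0 or n % p != 0 or not 0 <= j < n:
        raise ValueError("need c >= 1, c | p, p | n and 0 <= j < n")
    group_size = n * c // p  # consecutive elements handled by one group of c processors
    return (j // group_size) * c + j % c


def owners(n: int, p: int, c: int) -> List[int]:
    return [group_cyclic_owner(j, n, p, c) for j in range(n)]


if __name__ == "__main__":
    print(owners(8, 4, 1))  # block:  [0, 0, 1, 1, 2, 2, 3, 3]
    print(owners(8, 4, 2))  # groups: [0, 1, 0, 1, 2, 3, 2, 3]
    print(owners(8, 4, 4))  # cyclic: [0, 1, 2, 3, 0, 1, 2, 3]
```

Every processor receives n/p elements for any admissible c; intermediate cycles interpolate between the block and cyclic layouts.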
On the Efficient Parallel Computation of Legendre Transforms
1999
Abstract

Cited by 4 (0 self)
In this article, we discuss a parallel implementation of efficient algorithms for computation of Legendre polynomial transforms and other orthogonal polynomial transforms. We develop an approach to the Driscoll-Healy algorithm using polynomial arithmetic and present experimental results on the accuracy, efficiency, and scalability of our implementation. The algorithms were implemented in ANSI C using the BSPlib communications library. We also present a new algorithm for computing the cosine transform of two vectors at the same time.
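To make the computed quantity concrete, the sketch below is the naive direct Legendre transform. It is neither the Driscoll-Healy algorithm nor the authors' BSPlib code. It evaluates the Legendre polynomials by their standard three-term recurrence, the structure that fast algorithms such as Driscoll-Healy exploit (as we understand it), and it assumes the caller supplies quadrature nodes and weights. Its O(N·n_max) cost is what the fast algorithms reduce. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def legendre_values(n_max: int, x: float) -> List[float]:
    """P_0(x), ..., P_{n_max}(x) from the three-term recurrence
    (k + 1) P_{k+1}(x) = (2k + 1) x P_k(x) - k P_{k-1}(x)."""
    vals = [1.0, x]
    for k in range(1, n_max):
        vals.append(((2 * k + 1) * x * vals[k] - k * vals[k - 1]) / (k + 1))
    return vals[: n_max + 1]


def direct_legendre_transform(samples: Sequence[float], nodes: Sequence[float],
                              weights: Sequence[float], n_max: int) -> List[float]:
    """Naive transform c_l = sum_j w_j f(x_j) P_l(x_j), costing O(N * n_max)."""
    coeffs = [0.0] * (n_max + 1)
    for fx, x, w in zip(samples, nodes, weights):
        for l, p_l in enumerate(legendre_values(n_max, x)):
            coeffs[l] += w * fx * p_l
    return coeffs


if __name__ == "__main__":
    # Two-point Gauss-Legendre rule on [-1, 1]: nodes +-1/sqrt(3), weights 1.
    nodes = [-1 / math.sqrt(3), 1 / math.sqrt(3)]
    weights = [1.0, 1.0]
    f = list(nodes)  # samples of f(x) = x
    print(direct_legendre_transform(f, nodes, weights, 1))  # c_0 = 0, c_1 close to 2/3
```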