Results 11 - 20
of
51
A library of constructive skeletons for sequential style of parallel programming
- In InfoScale ’06: Proceedings of the 1st international conference on Scalable information systems, volume 152 of ACM International Conference Proceeding Series
, 2006
"... With the increasing popularity of parallel programming environments such as PC clusters, more and more sequential programmers, with little knowledge about parallel architectures and parallel programming, are hoping to write parallel programs. Numerous attempts have been made to develop high-level pa ..."
Abstract
-
Cited by 10 (7 self)
- Add to MetaCart
With the increasing popularity of parallel programming environments such as PC clusters, more and more sequential programmers, with little knowledge about parallel architectures and parallel programming, are hoping to write parallel programs. Numerous attempts have been made to develop high-level parallel programming libraries that use abstraction to hide low-level concerns and reduce difficulties in parallel programming. Among them, libraries of parallel skeletons have emerged as a promising way towards this direction. Unfortunately, these libraries are not well accepted by sequential programmers, because of incomplete elimination of lower-level details, ad-hoc selection of library functions, unsatisfactory performance, or lack of convincing application examples. This paper addresses principle of designing skeleton libraries of parallel programming and reports implementation details and practical applications of a skeleton library SkeTo. The SkeTo library is unique in its feature that it has a solid theoretical foundation based on the theory of Constructive Algorithmics, and is practical to be used to describe various parallel computations in a sequential manner. 1.
The Integration of Task and Data Parallel Skeletons
, 2002
"... We describe a skeletal parallel programming library which integrates task and data parallel constructs within an API for C++. Traditional skeletal requirements for higher orderness and polymorphism are achieved through exploitation of operator overloading and templates, while the underlying paral ..."
Abstract
-
Cited by 9 (4 self)
- Add to MetaCart
We describe a skeletal parallel programming library which integrates task and data parallel constructs within an API for C++. Traditional skeletal requirements for higher orderness and polymorphism are achieved through exploitation of operator overloading and templates, while the underlying parallelism is provided by MPI. We present a case study describing two algorithms for the travelling salesman problem.
Computing Downwards Accumulations on Trees Quickly
, 1995
"... Downwards passes on binary trees are essentially functions which pass information down a tree, from the root towards the leaves. Under certain conditions, a downwards pass is both `efficient' (computable in a functional style in parallel time proportional to the depth of the tree) and `manipulable' ..."
Abstract
-
Cited by 9 (3 self)
- Add to MetaCart
Downwards passes on binary trees are essentially functions which pass information down a tree, from the root towards the leaves. Under certain conditions, a downwards pass is both `efficient' (computable in a functional style in parallel time proportional to the depth of the tree) and `manipulable' (enjoying a number of distributivity properties useful in program construction); we call a downwards pass satisfying these conditions a downwards accumulation. In this paper, we show that these conditions do in fact yield a stronger conclusion: the accumulation can be computed in parallel time proportional to the logarithm of the depth of the tree, on a Crew Pram machine.
Optimizing Compositions of Scans and Reductions in Parallel Program Derivation
, 1997
"... Introduction We study two popular programming schemas: scan (also known as prefix sums, parallel prefix, etc.) and reduction (also known as fold). Originally from the functional world [3], they are becoming increasingly popular as primitives of parallel programming. The reasons are that, first, such ..."
Abstract
-
Cited by 9 (2 self)
- Add to MetaCart
Introduction We study two popular programming schemas: scan (also known as prefix sums, parallel prefix, etc.) and reduction (also known as fold). Originally from the functional world [3], they are becoming increasingly popular as primitives of parallel programming. The reasons are that, first, such higher-order combinators are adequate and useful for a broad class of applications [4], second, they encourage well-structured, coarse-grained parallel programming and, third, their implementation in the MPI standard [14] makes the target programs portable across different parallel architectures with predictable performance. Our contributions are as follows: -- We formally prove two optimization rules: the first rule transforms a sequential composition of scan and reduction into a single reduction, the second rule transforms a composition of two scans into a single scan. -- We apply the first rule in the formal derivation of a parallel algorithm for the
Transformation Based Communication and Clock Domain Refinement for System Design
, 2002
"... The ForSyDe methodology has been developed for system level design. In this paper we present formal transformation methods for the refinement of an abstract and formal system model into an implementation model. The methodology defines two classes of design transformations: (1) semantic-preserving tr ..."
Abstract
-
Cited by 9 (2 self)
- Add to MetaCart
The ForSyDe methodology has been developed for system level design. In this paper we present formal transformation methods for the refinement of an abstract and formal system model into an implementation model. The methodology defines two classes of design transformations: (1) semantic-preserving transformations and (2) design decisions. In particular we present and illustrate communication and clock domain refinement by way of a digital equalizer system.
Towards a Scalable Parallel Object Database - The Bulk Synchronous Parallel Approach
, 1996
"... Parallel computers have been successfully deployed in many scientific and numerical application areas, although their use in non-numerical and database applications has been scarce. In this report, we first survey the architectural advancements beginning to make general-purpose parallel computing co ..."
Abstract
-
Cited by 8 (2 self)
- Add to MetaCart
Parallel computers have been successfully deployed in many scientific and numerical application areas, although their use in non-numerical and database applications has been scarce. In this report, we first survey the architectural advancements beginning to make general-purpose parallel computing cost-effective, the requirements for non-numerical (or symbolic) applications, and the previous attempts to develop parallel databases. The central theme of the Bulk Synchronous Parallel model is to provide a high level abstraction of parallel computing hardware whilst providing a realisation of a parallel programming model that enables architecture independent programs to deliver scalable performance on diverse hardware platforms. Therefore, the primary objective of this report is to investigate the feasibility of developing a portable, scalable, parallel object database, based on the Bulk Synchronous Parallel model of computation. In particular, we devise a way of providing high-level abstra...
(De)Composition Rules for Parallel Scan and Reduction
- In Proc. 3rd Int. Working Conf. on Massively Parallel Programming Models (MPPM'97
, 1998
"... We study the use of well-defined building blocks for SPMD programming of machines with distributed memory. Our general framework is based on homomorphisms, functions that capture the idea of dataparallelism and have a close correspondence with collective operations of the MPI standard, e.g., scan an ..."
Abstract
-
Cited by 8 (1 self)
- Add to MetaCart
We study the use of well-defined building blocks for SPMD programming of machines with distributed memory. Our general framework is based on homomorphisms, functions that capture the idea of dataparallelism and have a close correspondence with collective operations of the MPI standard, e.g., scan and reduction. We prove two composition rules: under certain conditions, a composition of a scan and a reduction can be transformed into one reduction, and a composition of two scans into one scan. As an example of decomposition, we transform a segmented reduction into a composition of partial reduction and all-gather. The performance gain and overhead of the proposed composition and decomposition rules are assessed analytically for the hypercube and compared with the estimates for some other parallel models.
Diffusion: Calculating Efficient Parallel Programs
- IN 1999 ACM SIGPLAN WORKSHOP ON PARTIAL EVALUATION AND SEMANTICS-BASED PROGRAM MANIPULATION (PEPM ’99
, 1999
"... Parallel primitives (skeletons) intend to encourage programmers to build a parallel program from ready-made components for which efficient implementations are known to exist, making the parallelization process easier. However, programmers often suffer from the difficulty to choose a combination of p ..."
Abstract
-
Cited by 8 (7 self)
- Add to MetaCart
Parallel primitives (skeletons) intend to encourage programmers to build a parallel program from ready-made components for which efficient implementations are known to exist, making the parallelization process easier. However, programmers often suffer from the difficulty to choose a combination of proper parallel primitives so as to construct efficient parallel programs. To overcome this difficulty, we shall propose a new transformation, called diffusion, which can efficiently decompose a recursive definition into several functions such that each function can be described by some parallel primitive. This allows programmers to describe algorithms in a more natural recursive form. We demonstrate our idea with several interesting examples. Our diffusion transformation should be significant not only in development of new parallel algorithms, but also in construction of parallelizing compilers.
Using Algorithmic Skeletons with Dynamic Data Structures
- in Proceedings of Irregular '96, LNCS 1117
, 1996
"... . Algorithmic skeletons are polymorphic higher-order functions representing common parallelization patterns. A special category are data-parallel skeletons, which perform operations on a distributed data structure. In this paper, we consider the case of distributed data structures with dynamic eleme ..."
Abstract
-
Cited by 7 (4 self)
- Add to MetaCart
. Algorithmic skeletons are polymorphic higher-order functions representing common parallelization patterns. A special category are data-parallel skeletons, which perform operations on a distributed data structure. In this paper, we consider the case of distributed data structures with dynamic elements. We present the enhancements necessary in order to cope with these data structures, both on the language level and in the implementation of the skeletons. Further, we show that these enhancements practically do not affect the user, who merely has to supply two additional functional arguments to the communication skeletons. We then implement a parallel sorting algorithm using dynamic data with the enhanced skeletons on a MIMD distributed memory machine. Run-time measurements show that the speedups of the skeleton-based implementation are comparable to those obtained for a direct C implementation. 1 Introduction Algorithmic skeletons represent an approach to parallel programming which co...
Parallelizing Functional Programs by Generalization
- Journal of Functional Programming
, 1997
"... List homomorphisms are functions that are parallelizable using the divide-and-conquer paradigm. We study the problem of finding a homomorphic representation of a given function, based on the Bird-Meertens theory of lists. A previous work proved that to each pair of leftward and rightward sequential ..."
Abstract
-
Cited by 7 (1 self)
- Add to MetaCart
List homomorphisms are functions that are parallelizable using the divide-and-conquer paradigm. We study the problem of finding a homomorphic representation of a given function, based on the Bird-Meertens theory of lists. A previous work proved that to each pair of leftward and rightward sequential representations of a function, based on cons- and snoc-lists, respectively, there is also a representation as a homomorphism. Our contribution is a mechanizable method to extract the homomorphism representation from a pair of sequential representations. The method is decomposed to a generalization problem and an inductive claim, both solvable by term rewriting techniques. To solve the former we present a sound generalization procedure which yields the required representation, and terminates under reasonable assumptions. We illustrate the method and the procedure by the parallelization of the scan-function (parallel prefix). The inductive claim is provable automatically.

