Functional Bulk Synchronous Parallel Programming in C++
- In 14th IASTED International Conference on Parallel and Distributed Computing Systems, 2002
Cited by 18 (14 self)
This paper presents the BSFC++ library for functional bulk synchronous parallel programming in C++. It is based on an extension of the λ-calculus by parallel operations on a parallel data structure named parallel vector, which is given by intention. This guarantees determinism and the absence of deadlock. Broadcast algorithms are implemented using the core library.
Stream Parallel Skeleton Optimization
- In Proceedings of the 11th IASTED International Conference on Parallel and Distributed Computing and Systems, MIT, 1999
Cited by 14 (12 self)
We discuss the properties of the composition of stream parallel skeletons such as pipelines and farms. By looking at the ideal performance figures assumed to hold for these skeletons, we show that any stream parallel skeleton composition can always be rewritten into an equivalent "normal form" skeleton composition, delivering a service time which is equal to or even better than the service time of the original skeleton composition, and achieving a better utilization of the processors used. The normal form is defined as a single farm built around a sequential worker code. Experimental results are discussed that validate this normal form.
Keywords: skeletons, rewriting, stream parallelism.
1 Introduction
Skeleton based programming models represent an interesting subject in the field of parallel programming models. Cole introduced the skeleton concept in the late 80's [1]. Cole's skeletons represented parallelism exploitation patterns that can be used (instantiated) to model common parallel ap...
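The normal-form idea in the abstract above can be illustrated with a small sketch. It is not taken from the paper: the names `pipeline`, `normal_form`, and `stages` are hypothetical, and a thread pool stands in for the farm. It only shows that a pipeline of stages and a single farm around their sequential composition compute the same output stream; it makes no claim about service time.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def pipeline(stages, stream):
    """Reference semantics: every item passes through each stage in order."""
    for stage in stages:
        stream = map(stage, stream)
    return list(stream)


def normal_form(stages, stream, workers=4):
    """Single farm whose sequential worker is the composition of all stages.

    Assumption: a thread pool is used here as a stand-in for the farm.
    """
    def worker(item):
        # Apply the stages one after another to a single item.
        return reduce(lambda acc, stage: stage(acc), stages, item)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in input order.
        return list(pool.map(worker, stream))


# Hypothetical three-stage pipeline used only for illustration.
stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]

if __name__ == "__main__":
    print(pipeline(stages, range(8)) == normal_form(stages, range(8)))
```

Both functions return the items in stream order, so the rewritten form can be compared with the original composition by plain list equality.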
BSλ_p: Functional BSP Programs on Enumerated Vectors
- 2000
Cited by 9 (8 self)
The BSλ_p calculus is a calculus of functional BSP programs on enumerated parallel vectors. This confluent calculus is defined and a parallel cost model is associated with a weak call-by-value strategy.
Optimizing Sequences of Skeleton Calls
- In D. Batory, C. Consel, C. Lengauer, M. Odersky (Eds.): Domain-Specific Program Generation. LNCS, 2004
Cited by 5 (2 self)
Today, parallel programming is dominated by message passing libraries such as MPI. Algorithmic skeletons intend to simplify parallel programming by their expressive power. The idea is to offer typical parallel programming patterns as polymorphic higher-order functions which are efficiently implemented in parallel. Skeletons can be understood as a domain-specific language for parallel programming. In this chapter, we describe a set of data parallel skeletons in detail and investigate the potential of optimizing sequences of these skeletons by replacing them by more efficient sequences. Experimental results based on a draft implementation of our skeleton library are shown.
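As a rough illustration of replacing a sequence of skeleton calls by a more efficient one, the sketch below fuses two consecutive map skeletons into a single map. The rewrite rule chosen here (map fusion) and the names `skel_map`, `fuse`, `inc`, and `dbl` are assumptions made for illustration, not the chapter's library or its optimizations, and the map runs sequentially rather than in parallel.

```python
def skel_map(f, xs):
    """Sequential stand-in for a data parallel map skeleton (hypothetical)."""
    return [f(x) for x in xs]


def fuse(f, g):
    """Compose two element-wise functions so one map replaces two."""
    return lambda x: g(f(x))


def inc(x):
    return x + 1


def dbl(x):
    return x * 2


xs = list(range(5))

# Two skeleton calls: the data is traversed twice and an
# intermediate list is built between the calls.
unfused = skel_map(dbl, skel_map(inc, xs))

# One skeleton call after the rewrite: a single traversal.
fused = skel_map(fuse(inc, dbl), xs)

if __name__ == "__main__":
    print(unfused == fused)
```

In a distributed setting the fused form would also avoid one round of skeleton start-up, which is the kind of saving such rewrites aim for.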
The Network of Tasks Model
- Proc. of Parallel and Distributed Computing Systems 1999, IASTED, available as Report 1999-427, Queen's University School of Computing, http://www.cs.queensu.ca/TechReports/authorsS.html, 1999
Cited by 3 (1 self)
We present a new model, the Networks of Tasks (NOT) model, that allows modules from skeleton-like languages to be embedded in static task graphs. The model is designed to provide transparent cost information, so that program designers can accurately predict the execution time performance of their programs as they assemble them. This is done by using an implementation technique called work-based allocation which uses adaptivity of the component node programs to execute the task graph with the same work and communication cost that is visible when the task graph is assembled. The semantics of NOT programs is simple enough that formal methods for developing them are straightforward. A refinement-based calculus for the NOT model is also outlined, and a law for handling residuals is given.
Keywords: task graph, adaptive computation, heterogeneous computation, program transformation, cost modelling, formal methods, refinement calculus.
1 Introduction
Constructing parallel software is made ...
Interprocedural Optimisation of Regular Parallel Computations at Runtime
- 2001
Cited by 2 (1 self)
This thesis concerns techniques for efficient runtime optimisation of regular parallel programs that are built from separate software components. High-quality, high-performance parallel software is frequently built from separately-written reusable software components such as functions from a library of parallel routines. Apart from the strong case from the software engineering point-of-view for constructing software in such a way, there is often also a large performance benefit in hand-optimising individual, frequently used routines. Hitherto, a problem with such libraries of separate software components has been that there is a performance penalty, both because of invocation and indirection overheads, and because opportunities for cross-component optimisations are missed. The techniques we describe in this thesis aim to reverse this disadvantage by making use of high-level abstract information about the components for performing cross-component optimisation. The key is to specify, generate and make use of metadata which characterise both data and software components, and to take advantage of run-time information. We propose a delayed evaluation, self-optimising (DESO) library of data-parallel numerical routines. Delayed evaluation allows us to capture the control-flow of a user program from within the
A Review of Data Placement Optimisation for Data-Parallel Component Composition
- of the University of Passau
Cited by 1 (1 self)
Constructive methods for parallel programming are characterised by the composition of optimised, parallel software components. This paper concerns data placement, a key cross-component optimisation for regular data-parallel programs.

