A Monadic Calculus for Parallel Costing of a Functional Language of Arrays
Euro-Par'97 Parallel Processing, volume 1300 of Lecture Notes in Computer Science, 1997
Cited by 24 (9 self)
Vec is a higher-order functional language of nested arrays, which includes a general folding operation. Static computation of the shape of its programs is used to support a compositional cost calculus based on a cost monad. This, in turn, is based on a cost algebra, whose operations may be customized to handle different cost regimes, especially for parallel programming. We present examples based on sequential costing and on the PRAM model of parallel computation. The latter has been implemented in Haskell, and applied to some linear algebra examples.
1 Introduction
Second-order combinators such as map, fold and zip provide programmers with a concise, abstract language for writing skeletons for implicitly parallel programs, as in [Ski94], but there is a hitch. These combinators are defined for list programs (see [BW88]), but efficient implementations (which is the point of parallelism, after all) are based on arrays. This disparity becomes acute when working with nested arrays, which...
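The idea of a cost monad parameterised by a cost algebra can be illustrated with a small sketch. This is not the Vec calculus or its Haskell implementation; it is written in Python for illustration, and every name in it (CostAlgebra, SEQUENTIAL, PRAM_LIKE, unit, bind, costed_map) is hypothetical. It assumes a computation is a pair of a value and an integer cost, and that a cost regime only has to say how costs combine in sequence and in parallel.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class CostAlgebra:
    """Hypothetical cost algebra: how costs combine under a given regime."""
    zero: int
    seq: Callable[[int, int], int]  # cost of doing one thing after another
    par: Callable[[int, int], int]  # cost of doing two things side by side


# Sequential regime: parallel composition is no cheaper than sequential.
SEQUENTIAL = CostAlgebra(zero=0, seq=lambda a, b: a + b, par=lambda a, b: a + b)
# Idealised PRAM-like regime (assumes enough processors): take the slower branch.
PRAM_LIKE = CostAlgebra(zero=0, seq=lambda a, b: a + b, par=max)


def unit(alg: CostAlgebra, value):
    """Wrap a value as a costed computation with zero cost."""
    return (value, alg.zero)


def bind(alg: CostAlgebra, m: Tuple, f: Callable):
    """Run m, feed its value to f, and combine the two costs sequentially."""
    value, c1 = m
    result, c2 = f(value)
    return (result, alg.seq(c1, c2))


def costed_map(alg: CostAlgebra, f: Callable, xs: List):
    """Apply a costed function to each element, combining costs with alg.par."""
    results = []
    total = alg.zero
    for x in xs:
        y, c = f(x)
        results.append(y)
        total = alg.par(total, c)
    return (results, total)


def square(x: int):
    """A costed primitive: one unit of work per call (an assumed cost)."""
    return (x * x, 1)
```

Under these assumptions, costed_map(SEQUENTIAL, square, [1, 2, 3]) charges three units while costed_map(PRAM_LIKE, square, [1, 2, 3]) charges one, so the same program is costed differently by swapping the algebra.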
Optimising Data-Parallel Programs Using the BSP Cost Model
In Europar'98, 1998
Cited by 10 (5 self)
We describe the use of the BSP cost model to optimise programs using skeletons or data-parallel operations, in which program components may have multiple implementations. The use of BSP transforms the problem of finding the best implementation choice for each component that minimises overall execution time into a one-dimensional minimisation problem. An algorithm which finds optimal implementations in time linear in the length of the program is given.
1 Problem Setting
Many parallel programming models gain expressiveness by raising the level of abstraction. Important examples are skeletons and data-parallel languages such as HPF. Programs in these models are compositions of moderately-large building blocks, each of which hides significant parallel computation internally. There are typically multiple implementations for each of these building blocks, and it is straightforward to order these implementations by execution cost. What makes the problem difficult is that different implemen...
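As background, the standard BSP model charges a superstep w + h*g + l, where w is the largest local computation, h the largest number of words any processor sends or receives, g the cost per word of communication, and l the barrier synchronisation cost. The sketch below only evaluates that formula for two alternative implementations of one component; it is not the paper's linear-time algorithm, and the names Superstep, superstep_cost, program_cost and cheapest, along with all the numbers, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Superstep:
    w: float  # maximum local computation on any processor
    h: float  # maximum words sent or received by any processor


def superstep_cost(step: Superstep, g: float, l: float) -> float:
    """Standard BSP superstep cost: w + h*g + l."""
    return step.w + step.h * g + l


def program_cost(steps: List[Superstep], g: float, l: float) -> float:
    """Cost of a sequence of supersteps is the sum of their costs."""
    return sum(superstep_cost(s, g, l) for s in steps)


def cheapest(impls: Dict[str, List[Superstep]], g: float, l: float) -> str:
    """Name of the implementation with the lowest BSP cost on this machine."""
    return min(impls, key=lambda name: program_cost(impls[name], g, l))


# Two made-up implementations of the same component.
IMPLS = {
    "one_big_step": [Superstep(w=200, h=10)],
    "two_small_steps": [Superstep(w=60, h=5), Superstep(w=60, h=5)],
}
```

With g = 4, the two-step version is cheaper when l = 50 (260 against 290) but the single-step version wins when l = 100 (340 against 360), which shows why the choice has to be made against the target machine's parameters.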
Tuning Task Granularity and Data Locality of Data Parallel GpH Programs
2001
Cited by 7 (3 self)
The performance of data parallel programs often hinges on two key coordination aspects: the computational costs of the parallel tasks relative to their management overhead (task granularity), and the communication costs induced by the distance between tasks and their data (data locality). In data parallel programs both granularity and locality can be improved by clustering, i.e. arranging for parallel tasks to operate on related sub-collections of data.
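Clustering in this sense can be sketched as chunking a collection so that each task handles a contiguous sub-collection rather than a single element. The example is in Python rather than GpH and does not reproduce the paper's strategies; the names cluster and clustered_map and the default worker count are hypothetical. Threads are used only to show the task structure, since CPython threads do not speed up CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def cluster(xs: Sequence[A], size: int) -> List[List[A]]:
    """Split xs into contiguous chunks of at most `size` elements."""
    if size < 1:
        raise ValueError("cluster size must be at least 1")
    return [list(xs[i:i + size]) for i in range(0, len(xs), size)]


def clustered_map(f: Callable[[A], B], xs: Sequence[A], size: int,
                  workers: int = 4) -> List[B]:
    """Map f over xs with one task per chunk instead of one task per element."""
    chunks = cluster(xs, size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves chunk order, so results line up with the input.
        mapped = pool.map(lambda chunk: [f(x) for x in chunk], chunks)
        return [y for chunk in mapped for y in chunk]
```

A larger chunk size means fewer, coarser tasks and less management overhead per element, at the price of less available parallelism; contiguous chunks also keep each task's data together.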
Parallel Functional Programming by Partitioning
1997
Cited by 7 (0 self)
Caliban is a declarative language which addresses the area of static distributed memory parallel computing. It is an annotation language that allows the programmer to partition a functional program and data amongst the computational resources available. It is integrated into the source language so that the full power of the host language can be used to express the partitioning of the program. Partial evaluation is used to determine a complete version of the annotation at compile time. Program transformation is then used to make the parallelism explicit. This thesis describes the Caliban language and its pilot implementation. It then continues by presenting extensions and improvements to the basic language. Implementation techniques for the improved language are discussed in relation to an implementation on the Fujitsu AP1000 distributed memory multiprocessor computer. Two application case studies together with some performance results are presented. Finally, there is a critical appraisal of the language and its approach. Caliban has good support for general data and computation partitioning. It also aids software reuse with its ability to abstract common computational structures into higher order forms which are concretised at compile time by partial evaluation. However, there do remain some open issues relating to evaluation order control. Finally, Caliban can be implemented reasonably efficiently on standard parallel hardware.
Acknowledgements
First and foremost I would like to thank Paul Kelly, my supervisor, for his encouragement, support and direction. It was his work that formed the genesis of the work presented here and it is with his supervision that I have journeyed through the world of parallel programming as an undergraduate, research assistant and PhD student. I also thank my second supervisor Susan Eisenbach for her help.
Many thanks go to the members of the Advanced Languages and Architectures Section for providing a stimulating and friendly environment to work in. Also
Performance Models for Co-ordinating Parallel Data Classification
In Proceedings of the Seventh International Parallel Computing Workshop (PCW-97), 1997
Cited by 7 (0 self)
In this paper we investigate the use of performance models for structuring parallel programs through a case study in data mining. Performance models have been shown to be an integral part of providing a more structured approach to the problems of performance portability and resource allocation in parallel programming. This is particularly true in the context of skeletons, where parallel programs are expressed as combinations of predefined, often higher-order, functions. The use of performance models has, to some extent, been limited by the difficulty in applying the approach to irregular and dynamic parallel algorithms. We explore this problem in the context of a well-known data mining algorithm, C4.5, which exhibits both irregular and dynamic characteristics. C4.5 is rich in inherent parallelism, making the choice of a suitable parallel implementation for a given architecture non-trivial. We demonstrate how a structured approach to developing the performance models enables a c...
Costing Parallel Programs as a Function of Shapes
Science of Computer Programming, 1999
Cited by 6 (1 self)
Portable, efficient, parallel programming requires cost models to compare different possible implementations. In turn, these require knowledge of the shapes of the data structures being used, as well as knowledge of the hardware parameters. This paper shows how shape analysis techniques developed in the FISh programming language could be exploited to produce a data parallel language with an accurate, portable cost model.
1 Introduction
The problem of constructing portable efficient parallel programs is still unsolved. It originates in the observation that an algorithm that executes efficiently in one setting may be extremely inefficient in another. Hence, the challenge is to automatically adapt the algorithm to match the circumstances. To do this during compilation requires a cost model that is able to identify which of two alternative algorithms is faster. To date, most work has focussed on measuring the impact of changes to hardware as observed through a small suite of hardwar...
Compiling and supporting skeletons on MPP
In Proceedings of Massively Parallel Programming Models (MPPM) '97, 1997
Cited by 5 (2 self)
Parallel programming needs a high level programming model in which compilers and run time supports take care of traditionally intractable problems related to efficient usage of the target machine (mapping, scheduling, data decomposition, etc.). The matter of designing a real system providing such a model is highly simplified by constructing the parallel programs using scalable skeletons which capture common structural components of parallel computations. The key problem is the efficient implementation of programs composed of several nested skeleton instances. This requires optimizing the resulting process graph structure and mapping it onto the available resources in order to balance load and minimize communications. The paper describes how this can be done, despite the intractability of the problems involved, by exploiting the `structure' imposed by the skeleton approach.
1. Introduction
Skeleton-based parallel models are becoming increasingly popular as the basis of high level systems ...
A Scheme For Nesting Algorithmic Skeletons
Department of Computer Science, University College London, 1998
Cited by 3 (1 self)
A scheme for arbitrary nesting of algorithmic skeletons is explained which is based on the idea of groups in MPI. Two skeletons were developed which run in a nested mode: a binary divide and conquer and a process farm for a parallel implementation of fold and map HOFs respectively. An example showing various cases for nesting the two skeletons is presented. The experiment was conducted on the Fujitsu AP1000 parallel machine.
1 Introduction
It is well known that parallelism adds an additional level of difficulty to software development. Following Cole's characterisation [1], algorithmic skeletons have been recognised widely as a valuable basis for parallel software construction. A skeleton abstracts a control structure which may be instantiated subsequently with specific functions to carry out specific tasks. Therefore, the encapsulation of parallel algorithms into skeletons is a promising approach to high-level specification of parallel algorithms. Normally, functional programming la...
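The control structure of the two skeletons, and what it means to nest them, can be shown with a sequential sketch. It contains no MPI and says nothing about how the paper assigns process groups; the names dc_fold and farm_map are hypothetical, and the fold assumes an associative operator so that the binary split does not change the result.

```python
from operator import add
from typing import Callable, List, Sequence, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def dc_fold(op: Callable[[A, A], A], xs: Sequence[A]) -> A:
    """Binary divide and conquer fold; op is assumed to be associative."""
    if not xs:
        raise ValueError("dc_fold needs at least one element")
    if len(xs) == 1:
        return xs[0]
    mid = len(xs) // 2
    # The two halves are independent, which is where a parallel version
    # would hand each half to a different group of processes.
    return op(dc_fold(op, xs[:mid]), dc_fold(op, xs[mid:]))


def farm_map(worker: Callable[[A], B], tasks: Sequence[A]) -> List[B]:
    """Process farm shape: every task is handled independently by a worker."""
    return [worker(task) for task in tasks]


def row_sums(rows: Sequence[Sequence[int]]) -> List[int]:
    """Fold nested inside map: each farm worker runs its own fold."""
    return farm_map(lambda row: dc_fold(add, row), rows)


def grand_total(rows: Sequence[Sequence[int]]) -> int:
    """Map nested inside fold: fold over the results of a farm."""
    return dc_fold(add, row_sums(rows))
```

Because both skeletons are ordinary higher-order functions here, either can appear as the argument of the other, which is the nesting the abstract refers to.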
Interprocedural Optimisation of Regular Parallel Computations at Runtime
2001
Cited by 2 (1 self)
This thesis concerns techniques for efficient runtime optimisation of regular parallel programs that are built from separate software components. High-quality, high-performance parallel software is frequently built from separately-written reusable software components such as functions from a library of parallel routines. Apart from the strong case from the software engineering point-of-view for constructing software in such a way, there is often also a large performance benefit in hand-optimising individual, frequently used routines. Hitherto, a problem with such libraries of separate software components has been that there is a performance penalty, both because of invocation and indirection overheads, and because opportunities for cross-component optimisations are missed. The techniques we describe in this thesis aim to reverse this disadvantage by making use of high-level abstract information about the components for performing cross-component optimisation. The key is to specify, generate and make use of metadata which characterise both data and software components, and to take advantage of run-time information. We propose a delayed evaluation, self-optimising (DESO) library of data-parallel numerical routines. Delayed evaluation allows us to capture the control-flow of a user program from within the

