Results 11 - 20 of 112
Performance Models for the Processor Farm Paradigm
- IEEE Transactions on Parallel and Distributed Systems, 1997
Cited by 16 (0 self)
In this paper, we describe the design, implementation, and modeling of a runtime kernel to support the processor farm paradigm on multicomputers. We present a general topology-independent framework for obtaining performance models to predict the performance of the start-up, steady-state, and wind-down phases of a processor farm. An algorithm is described, which for any interconnection network determines a tree-structured subnetwork that optimizes farm performance. The analysis technique is applied to the important case of k-ary tree topologies. The models are compared with the measured performance on a variety of topologies using both constant and varied task sizes. Index Terms: Parallel programming paradigms, performance evaluation, processor farm, tree networks, message passing architecture, network flow, master-slave. 1 Introduction The major problems in parallel computation revolve around questions of ease of...
Stream Parallel Skeleton Optimization
- In Proceedings of the 11th IASTED International Conference on Parallel and Distributed Computing and Systems, MIT, 1999
Cited by 14 (12 self)
We discuss the properties of the composition of stream parallel skeletons such as pipelines and farms. By looking at the ideal performance figures assumed to hold for these skeletons, we show that any stream parallel skeleton composition can always be rewritten into an equivalent "normal form" skeleton composition, delivering a service time which is equal to or even better than the service time of the original skeleton composition, and achieving a better utilization of the processors used. The normal form is defined as a single farm built around a sequential worker code. Experimental results are discussed that validate this normal form. Keywords: skeletons, rewriting, stream parallelism. 1 Introduction Skeleton based programming models represent an interesting subject in the field of parallel programming models. Cole introduced the skeleton concept in the late 80's [1]. Cole's skeletons represented parallelism exploitation patterns that can be used (instantiated) to model common parallel ap...
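The rewriting idea in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the names `sequential_worker`, `run_pipeline`, and `run_farm_normal_form` are hypothetical, and a thread pool stands in for the farm. The sketch only shows that a pipeline of stages and a single farm built around the sequential composition of those stages compute the same output stream; it says nothing about the service-time results reported in the paper.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def sequential_worker(stages):
    """Fuse a list of stage functions into one sequential worker (hypothetical helper)."""
    def worker(item):
        # Apply the stages one after another to a single stream item.
        return reduce(lambda acc, stage: stage(acc), stages, item)
    return worker


def run_pipeline(stages, stream):
    """Reference semantics: every item passes through every stage in order."""
    worker = sequential_worker(stages)
    return [worker(item) for item in stream]


def run_farm_normal_form(stages, stream, n_workers=4):
    """Assumed 'normal form': one farm whose workers run the fused sequential code."""
    worker = sequential_worker(stages)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Executor.map yields results in input order, so stream order is preserved.
        return list(pool.map(worker, stream))
```

For example, with `stages = [lambda x: x + 1, lambda x: x * 2]`, both functions map the stream `range(5)` to the same list. Threads are used here only for brevity; in CPython they would not speed up CPU-bound stages.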
Efficient Distributed Memory Implementation of a Data Parallel Functional Language
- In Proceedings of PARLE '94, LNCS 817, 1994
Cited by 13 (11 self)
We discuss why existing implementations of functional languages on MIMD-machines with distributed memory are slow. This is done by comparing the behavior of a functional program with a corresponding Occam program. The main reason is that functional languages give insufficient means to control parallelism and communication. Our approach is to support data parallelism by providing a set of primitives on arrays which allow the user to control the parallelism and communication on a high level, disabling problems like deadlocks. Only one unique version of an array may be referenced at a time. This restriction allows arrays to be updated in place and enables the user to control the space requirements of the program. The uniqueness of arrays is checked by the compiler. Experimental results demonstrate the efficiency of our data parallel functional language. 1 Introduction MIMD machines with distributed memory offer high computation power at relatively low costs. On the other hand, the corr...
Realtime Signal Processing -- Dataflow, Visual, and Functional Programming
- 1995
Cited by 13 (1 self)
This thesis presents and justifies a framework for programming real-time signal processing systems. The framework extends the existing "block-diagram" programming model; it has three components: a very high-level textual language, a visual language, and the dataflow process network model of computation. The dataflow process network model, although widely used, lacks a formal description, and I provide a semantics for it. The formal work leads into a new form of actor. Having established the semantics of dataflow processes, the functional language Haskell is layered above this model, providing powerful features (notably polymorphism, higher-order functions, and algebraic program transformation) absent in block-diagram systems. A visual equivalent notation for Haskell, Visual Haskell, ensures that this power does not exclude the "intuitive" appeal of visual interfaces; with some intelligent layout and suggestive icons, a Visual Haskell program can be made to look very like a block dia...
Parallelization of Divide-and-Conquer by Translation to Nested Loops
- J. Functional Programming, 1997
Cited by 12 (6 self)
We propose a sequence of equational transformations and specializations which turns a divide-and-conquer skeleton in Haskell into a parallel loop nest in C. Our initial skeleton is often viewed as general divide-and-conquer. The specializations impose a balanced call tree, a fixed degree of the problem division, and elementwise operations. Our goal is to select parallel implementations of divide-and-conquer via a space-time mapping, which can be determined at compile time. The correctness of our transformations is proved by equational reasoning in Haskell; recursion and iteration are handled by induction. Finally, we demonstrate the practicality of the skeleton by expressing Strassen's matrix multiplication in it.
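As a rough illustration of what a general divide-and-conquer skeleton looks like, here is a minimal sketch written in Python rather than the Haskell used in the paper. The function `divide_and_conquer` and its parameter names are assumptions made for this example, not the authors' definitions, and the sketch is purely sequential: it shows only the shape of the skeleton, not the transformation to a parallel loop nest.

```python
from typing import Callable, List, TypeVar

P = TypeVar("P")  # problem type
S = TypeVar("S")  # solution type


def divide_and_conquer(
    is_basic: Callable[[P], bool],
    solve_basic: Callable[[P], S],
    divide: Callable[[P], List[P]],
    combine: Callable[[List[S]], S],
    problem: P,
) -> S:
    """Generic skeleton: solve basic cases directly, otherwise divide, recurse, combine."""
    if is_basic(problem):
        return solve_basic(problem)
    solutions = [
        divide_and_conquer(is_basic, solve_basic, divide, combine, sub)
        for sub in divide(problem)
    ]
    return combine(solutions)


def merge(left: List[int], right: List[int]) -> List[int]:
    """Merge two sorted lists into one sorted list."""
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def merge_sort(xs: List[int]) -> List[int]:
    """Instantiate the skeleton with division degree 2 (a fixed degree, as in the abstract)."""
    return divide_and_conquer(
        lambda p: len(p) <= 1,
        lambda p: list(p),
        lambda p: [p[: len(p) // 2], p[len(p) // 2:]],
        lambda sols: merge(sols[0], sols[1]),
        xs,
    )
```

Halving the input at every level gives a call tree that is balanced up to rounding, which loosely mirrors the balanced-tree and fixed-degree specializations the abstract mentions.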
Efficient High-Level Parallel Programming
- Theoretical Computer Science, 1997
Cited by 11 (0 self)
Algorithmic skeletons are polymorphic higher-order functions representing common parallelization patterns and implemented in parallel. They can be used as the building blocks of parallel and distributed applications by integrating them into a sequential language. In this paper, we present a new approach to programming with skeletons. We integrate the skeletons into an imperative host language enhanced with higher-order functions and currying, as well as with a polymorphic type system. We thus obtain a high-level programming language which can be implemented very efficiently. We then present a compile-time technique for the implementation of the functional features which has an important positive impact on the efficiency of the language. After describing a series of skeletons which work with distributed arrays, we give two examples of parallel algorithms implemented in our language, namely matrix multiplication and Gaussian elimination. Run-time measurements for these and other applicat...
On the Distributed Implementation of Aggregate Data Structures by Program Transformation
- In Proceedings of the 4th IPPS/SDP International Workshop on High-Level Parallel Programming Models and Supportive Environments, IPPS/SDP99, 1999
Cited by 11 (6 self)
A critical component of many data-parallel programming languages is the set of operations that manipulate aggregate data structures as a whole; this includes Fortran 90, Nesl, and languages based on BMF. These operations are commonly implemented by a library whose routines operate on a distributed representation of the aggregate structure; the compiler merely generates the control code invoking the library routines and all machine-dependent code is encapsulated in the library. While this approach is convenient, we argue that by breaking the abstraction enforced by the library and by presenting some of its internals in the form of a new intermediate language to the compiler back-end, we can optimize on all levels of the memory hierarchy and achieve more flexible data distribution. The new intermediate language allows us to present these optimizations elegantly as program transformations. We report on first results obtained by our approach in the implementation of nested data parallelis...

