Performance Models for the Processor Farm Paradigm
- IEEE Transactions on Parallel and Distributed Systems
, 1997
Cited by 16 (0 self)
In this paper, we describe the design, implementation, and modeling of a runtime kernel to support the processor farm paradigm on multicomputers. We present a general topology-independent framework for obtaining performance models to predict the performance of the start-up, steady-state, and wind-down phases of a processor farm. An algorithm is described, which for any interconnection network determines a tree-structured subnetwork that optimizes farm performance. The analysis technique is applied to the important case of k-ary tree topologies. The models are compared with the measured performance on a variety of topologies using both constant and varied task sizes. Index Terms: Parallel programming paradigms, performance evaluation, processor farm, tree networks, message passing architecture, network flow, master-slave. 1 INTRODUCTION The major problems in parallel computation revolve around questions of ease of...
Clumps: A Candidate Model Of Efficient, General Purpose Parallel Computation
, 1994
Cited by 10 (6 self)
A new model of parallel computation is proposed, CLUMPS (Campbell's Lenient, Unified Model of Parallel Systems). This is composed of an abstract machine with an associated cost model, and aims to be more portable, reflective of costs, expressible and encouraging of more efficient implementations of algorithms than other existing models. It is shown that each basic parallel architecture class can congruently perform each other's computations, but the congruent simulation of each other's communication is not generally possible (where for a simulation to be congruent the simulation costs on the target architecture are asymptotically equivalent to the implementation costs on the native architectures). This is reflected in the CLUMPS abstract machine through its flexibility in terms of program control and memory access. The congruence requirement is relaxed so that though strict congruence may not be achieved according to the above definition, communication costs are reflectively accounted ...
Further Towards a Unification of Parallel Architecture Classes
- Department of Computer Science, University of York
, 1996
Cited by 2 (2 self)
Traditionally, parallel programming paradigms are associated with only a single class of parallel architecture. No single model of parallel computation has been able to span the range of parallel architecture classes. This has resulted in a lack of portability of software between parallel architectures, also a lack of portability of programmers' skills. This report describes the latest developments and revisions of the author's work in unifying the basic parallel architecture classes. The original aim was to produce a single, unified parallel architecture class. Each individual parallel architecture class has a parallel programming paradigm traditionally associated with itself. The unified parallel architecture class would be capable of supporting all those parallel programming paradigms. However, the results of this unification process demonstrate that all parallel architecture classes are capable of efficiently simulating the computation (control methods and memory access) of all oth...
Unbalanced Computations onto a Transputer Grid
, 1994
Cited by 2 (2 self)
Many applications exist that are characterised by having some "core" function code repeatedly applied over all the elements of a compound data structure or over all the elements of an input data stream. Here, a technique is discussed that allows parallel implementations of these applications to be derived that achieve high performance and efficiency in machine resource usage. The technique is especially effective when each application sub-task requires a variable amount of time to be computed. Some results concerning the usage of these techniques on a transputer-based machine are discussed here along with the technical details of the implementation schemas that have been used. 1 Introduction Many scientific and non-scientific applications can be parallelised in an easy way. We refer to the computations that are characterised by a large amount of time spent in repeatedly computing the same functions/procedures/statements over different input data sets a...
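The pattern this abstract describes (one "core" function applied to every element, with sub-tasks of varying cost) can be illustrated with a small task farm. The sketch below is not the paper's transputer implementation; it only assumes that workers pull the next task on demand from a shared queue, and the names `farm`, `core`, and `n_workers` are hypothetical.

```python
import queue
import threading


def farm(core, inputs, n_workers=4):
    """Apply `core` to every input, handing out tasks on demand.

    Illustrative only: threads stand in for processors, and a shared
    queue stands in for the master that distributes work.
    """
    items = list(inputs)
    tasks = queue.Queue()
    for index, item in enumerate(items):
        tasks.put((index, item))
    results = [None] * len(items)

    def worker():
        # A worker takes the next task as soon as it is free, so a slow
        # task delays only the worker that is running it.
        while True:
            try:
                index, item = tasks.get_nowait()
            except queue.Empty:
                return
            results[index] = core(item)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Because tasks are claimed one at a time rather than split into fixed blocks up front, uneven task durations tend to even out across workers, which is the situation the abstract singles out.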
Machine independent Analytical models for cost evaluation of template-based programs
, 1996
Cited by 1 (0 self)
Structured parallel programming is one of the possible solutions to exploit Programmability, Portability and Performance in the parallel programming world. The power of this approach lies in the possibility of building an optimizing template-based compiler using low time complexity algorithms. In order to optimize the code, this compiler needs formulas that describe the performance of language constructs over the target architecture. We propose a set of parameters able to describe current parallel systems and build deterministic analytical models for basic forms of parallelism. The analytical model describes construct performance in a parametric way. This can be done by knowing that the compiler exploits a template-based support and giving template implementors guidelines to follow to make actual implementation perform as predicted. ACM-CR Subject Classification: Keywords and phrases: Skeletons, performance modeling, parallel languages, template-based compilers Machine...
A Domain-Specific Parallel Programming System. I: Design and Application to Ecological Modelling
, 1994
The goal of the Em project is to make parallel programming easily accessible to a broad community of scientists. Previous approaches such as the use of general parallel programming languages and parallelizing compilers for sequential languages have fallen short in this respect. The approach is to design a special purpose programming language which is oriented towards a specific area of application. The result is a specialized and effective scientific tool. Em is a high-level programming system which puts parallelism into the hands of scientists who are not sophisticated programmers. By restricting and simplifying the programming interface, Em eases both the conceptual task of the programmer and the analytical task of the compiler. The model of success is the financial spreadsheet, a specialized tool which makes programmers out of relatively naive end-users and makes computer technology broadly accessible to business. Here the initial prototype is described, motivated by practical ecological modelling problems.
Towards a Unified Parallel Architecture Class
- Department of Computer Science, University of Exeter
, 1994
Traditionally, parallel programming paradigms are associated with only a single class of parallel architecture. This has resulted in a lack of portability of software between parallel architectures, and also a lack of portability of programmers' skills. This report describes the author's work in unifying the basic parallel architecture classes with the aim of producing a single, unified parallel architecture class. Such a unified class would be capable of supporting all the parallel programming paradigms traditionally associated with the individual parallel architecture classes unified to form the single parallel architecture class. The results of this unification process demonstrate that all parallel architecture classes are capable of efficiently simulating the computation (control methods and memory access) of all other parallel architecture classes; communication, however, cannot be simulated so efficiently, because certain architecture classes are non-scalable in terms of bandwidth. ...
Parallel 1D-FFT Computation on Constant-valence Multicomputers
, 1995
This paper addresses the problem of monodimensional (1D) FFT parallel computation on constant-valence multicomputers, i.e. on parallel systems made up of processing elements (PEs) which do not share memory and are connected to a bounded number of neighbours. After a qualitative analysis of several possible partitionings of the DIT FFT algorithm, a decomposition is introduced that has good scalability properties and makes it possible to use sections of sequential code based on the most common 1D-FFT algorithms. If a computing architecture with indirect binary n-cube interconnection network is used, the proposed decomposition guarantees strictly local communications and therefore requires no through-routing support. These characteristics have a positive impact on software development and also on overall performance. Furthermore, thanks to a pipelined organization of the PEs, the resulting architecture has high potential for real-time signal processing. As these useful features are obtained at the 'expense' of an uneven workload distribution, computing efficiency is relatively low but does not significantly change over a wide range of the number of processors. An implementation on a Transputer-based system is presented along with the performance results obtained. Finally, a simple analytical model of the architecture is shown that allows the values of the main performance parameters to be obtained as a function of the number of processors used and of the elementary response times of the first stage of PEs. Key words: FFT; parallel processing; constant-valence multicomputers; transputer
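For readers unfamiliar with the DIT (decimation-in-time) FFT that this abstract partitions, the following is a minimal sequential radix-2 sketch. It does not reproduce the paper's decomposition or its mapping onto processing elements; it only shows the even/odd split and butterfly step that any such partitioning starts from. The function name `fft_dit` is hypothetical, and the input length is assumed to be a power of two.

```python
import cmath


def fft_dit(x):
    """Radix-2 decimation-in-time FFT of a sequence (illustrative sketch)."""
    n = len(x)
    if n == 0:
        return []
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    # Split into even- and odd-indexed samples and transform each half.
    even = fft_dit(x[0::2])
    odd = fft_dit(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # Butterfly: combine the two half-size transforms with a twiddle factor.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out
```

The two recursive calls are independent of each other, which is what makes the algorithm a natural candidate for distribution across processing elements; the butterfly loop is where data from the two halves has to be exchanged.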

