Master/Slave Computing on the Grid
, 2000
Abstract

Cited by 41 (3 self)
Resource selection is fundamental to the performance of master/slave applications. In this paper, we address the problem of promoting performance for distributed master/slave applications targeted to distributed, heterogeneous "Grid" resources. We present a workrate-based model of master/slave application performance which utilizes both system and application characteristics to select potentially performance-efficient hosts for both the master and slave processes. Using a Grid allocation strategy based on this performance model, we demonstrate a performance improvement over other selection options for a representative set of master/slave applications in both simulated and actual Grid environments.
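The host-selection idea this abstract describes can be sketched in a few lines. This is a hypothetical illustration, not the authors' model: the `workrate` function, the host fields, and the cost parameters are invented for the example, whereas the paper's actual workrate model is built from measured system and application characteristics.

```python
# Hypothetical sketch: rank candidate Grid hosts by an illustrative workrate
# (tasks per second), then pick the best host as master and the next best
# as slaves. All field names and numbers below are assumptions.

def workrate(host, task_flop, task_bytes):
    """Approximate tasks per second a slave on `host` could sustain."""
    compute_time = task_flop / host["flops"]      # seconds computing one task
    comm_time = task_bytes / host["bandwidth"]    # seconds moving its data
    return 1.0 / (compute_time + comm_time)

def select_hosts(hosts, task_flop, task_bytes, n_slaves):
    """Best-ranked host becomes the master; the next n_slaves become slaves."""
    ranked = sorted(hosts, key=lambda h: workrate(h, task_flop, task_bytes),
                    reverse=True)
    return ranked[0], ranked[1:1 + n_slaves]

hosts = [
    {"name": "a", "flops": 2e9, "bandwidth": 1e8},
    {"name": "b", "flops": 1e9, "bandwidth": 1e9},
    {"name": "c", "flops": 5e8, "bandwidth": 1e7},
]
master, slaves = select_hosts(hosts, task_flop=1e9, task_bytes=1e6, n_slaves=2)
```

Here host "a" wins despite its slower network because the example tasks are compute-bound; a communication-bound workload would reorder the ranking, which is the point of folding both costs into one rate.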
COLT_HPF, a Run-Time Support for the High-Level Coordination of HPF Tasks
 Concurrency: Practice and Experience, Vol.
, 1999
Abstract

Cited by 7 (3 self)
Shared Data Abstractions (SDAs), using a syntax similar to that of HPF. Each instance of an SDA encapsulates distributed data and methods, where methods have exclusive access to the encapsulated data. Data-parallel tasks are thus started by creating instances of specific SDAs, while inter-task cooperation takes place by means of remote synchronous (or asynchronous) method invocations. Note that SDA instances are started dynamically by a so-called coordination task, so the runtime that implements inter-task communication has to manage passing distributed data structures from one task to another, including any remapping that might be needed. The runtime accomplishes this through a handshaking protocol, which exchanges the distribution information about the actual argument (on the caller SDA) and the formal one (on the callee SDA) of a given method. Note that this handshaking protocol is very similar to the COLT_HPF protocol for creating a channel between two tasks. Finally, even though in th...
An idiom-finding tool for increasing productivity of accelerators
 In ICS
, 2011
Abstract

Cited by 6 (1 self)
Suppose one is considering the purchase of a computer equipped with accelerators. Or suppose one has access to such a computer and is considering porting code to take advantage of the accelerators. Is there reason to suppose the purchase cost or programmer effort will be worth it? It would be nice to be able to estimate the expected improvements in advance of paying money or time. We exhibit an analytical framework and toolset for providing such estimates: the tools first look for user-defined idioms, patterns of computation and data access identified in advance as possibly being able to benefit from accelerator hardware. A performance model is then applied to estimate how much faster these idioms would be if they were ported and run on the accelerators, and a recommendation is made as to whether or not each idiom is worth the porting effort to put them on the
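The kind of go/no-go estimate such a tool produces can be illustrated with a simple Amdahl's-law calculation. This is a hedged sketch under assumed numbers, not the paper's performance model: `overall_speedup`, the idiom fractions, and the recommendation threshold are all invented for illustration.

```python
# Illustrative only: bound the whole-program gain from porting idioms to an
# accelerator, given (runtime_fraction, modeled_accelerator_speedup) pairs.

def overall_speedup(idioms):
    """Amdahl-style estimate over a list of (fraction, speedup) pairs."""
    remaining = 1.0 - sum(f for f, _ in idioms)   # fraction left unaccelerated
    accelerated = sum(f / s for f, s in idioms)   # idiom time after porting
    return 1.0 / (remaining + accelerated)

def worth_porting(idioms, threshold=1.5):
    """Recommend porting only if the modeled gain clears an assumed bar."""
    return overall_speedup(idioms) >= threshold

# Assumed profile: 60% of time in a stream idiom modeled at 10x on the
# accelerator, 10% in a gather idiom modeled at 2x.
est = overall_speedup([(0.6, 10.0), (0.1, 2.0)])
```

Even a 10x idiom speedup yields only about a 2.4x program speedup here, because the 30% of untouched code dominates; that asymmetry is why an advance estimate is valuable before paying for hardware or porting effort.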
Parallelizing a Set of 2D Frequency Transforms in a Flexible Manner
 IEE Proceedings Part I (Vision, Image and Signal Processing)
, 1997
Abstract

Cited by 3 (2 self)
The implementation of parallel 2D frequency transforms intended for the acceleration of image-processing algorithms is described. The way these routines fit into a wider generic format for such parallel routines is also indicated. The paper touches on the design decisions needed to marry the choice of efficient routine with wide utility. Due consideration is given to auxiliary functions. Details of the bookkeeping required to enable real-valued data to be efficiently transformed in a parallel setting are included. 1 Introduction This paper describes the design and implementation of a generic suite of 2D frequency transform programs intended for message-passing multicomputers. The objective was an image-processing library implemented in a common format which allows the user to select a serial or a parallel mode of operation [1]. The Fourier transform is central to the class of frequency transforms. 2D Discrete Fourier Transform (DFT) algorithms include: the Vector Radix (VR)...
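The property that makes 2D transforms amenable to farming out across a multicomputer is the row-column decomposition: a 2D DFT factors into independent 1D DFTs over the rows and then over the columns. A minimal, unoptimized sketch (O(n^2) per 1D transform, for illustration only; a real library would use an FFT):

```python
# Illustrative row-column 2D DFT: each row pass and each column pass is a set
# of independent 1D transforms, so rows (then columns) can be distributed
# across workers.
import cmath

def dft(xs):
    """Naive 1D DFT of a sequence of numbers."""
    n = len(xs)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                for i, x in enumerate(xs)) for k in range(n)]

def dft2(rows):
    """2D DFT via 1D DFTs over rows, then over columns."""
    row_pass = [dft(r) for r in rows]              # independent per row
    cols = list(zip(*row_pass))                    # transpose
    col_pass = [dft(list(c)) for c in cols]        # independent per column
    return [list(r) for r in zip(*col_pass)]       # transpose back

spectrum = dft2([[1.0, 1.0], [1.0, 1.0]])          # constant image: DC term only
```

The transpose between passes is the communication-heavy step on a message-passing machine, which is where the bookkeeping the abstract mentions comes in.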
Parallel Real-Time Complexity Theory
, 2002
Abstract

Cited by 2 (0 self)
We present a new complexity-theoretic approach to real-time computations. We define timed ω-languages as a new formal model for such computations, which we believe allows a unified treatment of all variants of real-time computation that are meaningful in practice. To our knowledge, such a practically meaningful formal definition does not exist at this time. In order to support our claim that timed ω-languages capture all the real-time characteristics that are important in practice, we use this formalism to model the two most important features of real-time algorithms, namely the presence of deadlines and the real-time arrival of input data. We emphasize the expressive power of our model by using it to formalize aspects from the areas of real-time database systems and ad hoc networks. We also offer a complexity-theoretic characterization of parallel real-time computations. First, we define complexity classes that capture the intuitive notion of resource requirements for real-time computations in a parallel environment. Then, we show that real-time algorithms form an infinite hierarchy with respect to the number of processors used, and
Parallel pipeline to ATM: Graphical simulation techniques
 In 15th UK Performance Engineering Workshop, UKPEW'99
, 1999
Abstract

Cited by 1 (1 self)
Experience gained from constructing a graphical simulator of pipelines of processor farms has been transferred to prototype simulations of an ATM buffered-Banyan switch and a slotted-ring network. Ways to make a meaningful and appealing display are indicated. The qualitative information that can be usefully shown is considered. The design and evaluation of the simulators are included.
Scheduling Schemes for Data Farming
 IEE Proceedings Part E (Computers and Digital Techniques)
, 1999
Abstract

Cited by 1 (1 self)
The use of order statistics to arrive at a scheduling regime is shown to be applicable to data farms running on second-generation parallel processors. Uniform and decreasing task-size scheduling regimes are examined. Experimental timings and a further simulation for large-scale effects were used to exercise the scheduling regimes. The paper also considers a number of other scheduling schemes for data farms. It is shown that a method previously used for loop-scheduling is preferable, particularly as a form of automatic and generalised scheduling for data farming where there is a data-dependent workload. 1 Introduction A processor or data farm [1] is a programming paradigm involving message-passing in which a single task is repeatedly executed in parallel on a collection of initial data. Data farming is a commonly-used paradigm in parallel processing [2] and appears in numerous guises: some network-of-workstations (NOW) based [3]; some based on dedicated multicomputers [4...
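A decreasing task-size regime of the loop-scheduling family this abstract alludes to can be sketched as follows. Guided self-scheduling is used here purely as an illustrative example and may differ from the exact scheme the paper adopts: each chunk handed to a farm worker is 1/p of the remaining work, so chunks shrink toward the end and workers finish at nearly the same time even with a data-dependent workload.

```python
# Illustrative guided self-scheduling: successive chunk sizes shrink as
# remaining/n_workers, so early chunks amortize dispatch overhead and late
# small chunks balance the finish times.

def guided_chunks(n_items, n_workers):
    """Yield successive chunk sizes covering n_items of work."""
    remaining = n_items
    while remaining > 0:
        chunk = max(1, remaining // n_workers)  # 1/p of what is left
        yield chunk
        remaining -= chunk

chunks = list(guided_chunks(100, 4))  # 25, 18, 14, ... down to single items
```

A uniform regime would issue 25 fixed-size chunks of 4 instead; the decreasing regime trades a few large cheap dispatches early for fine-grained balancing at the end.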
Performance Metrics for Embedded Parallel Pipelines
 Proc. Caltech Conference on VLSI
, 1979
Abstract

Cited by 1 (0 self)
A statistical approach to performance prediction is applied to a system development methodology for pipelines comprised of independent parallel stages. The methodology is aimed at distributed-memory machines employing medium-grained parallelization. The target applications are continuous-flow embedded systems. The use of order statistics on this type of system is compared to previous practical usage, which appears largely confined to traditional Non-Uniform Memory Access (NUMA) machines for loop parallelization. A range of suitable performance metrics which give upper bounds or estimates for task durations are discussed. The metrics have a practical role when included in prediction equations for checking fidelity to an application performance specification. An empirical study applies the mathematical findings to the performance of a multicomputer for a synchronous pipeline stage. The results of a simulation are given for larger numbers of processors. In a further simulation, the results are extended to take account of waiting-time distributions while data are buffered between stages of an asynchronous pipeline. Order statistics are also employed to estimate the degradation due to an output ordering constraint. Practical illustrations in the image communication and vision application domains are included. Index terms: performance prediction, parallel pipelines, real-time systems, order statistics
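Why order statistics govern a synchronous pipeline stage can be shown with a small simulation. This is a generic sketch, not the paper's model: the stage advances only when the slowest of its p parallel tasks finishes, so the stage time is the maximum, i.e. the p-th order statistic, of the task-duration distribution, and it grows with p.

```python
# Generic sketch: estimate the mean time of a synchronous stage that must
# barrier-wait for the slowest of p parallel tasks. For exponential task
# durations with mean 1, the expected maximum of p draws is the harmonic
# number H_p = 1 + 1/2 + ... + 1/p, so the simulation should approach that.
import random

def stage_time(p, draw):
    """One synchronous stage: finishes with the slowest of p tasks."""
    return max(draw() for _ in range(p))

def mean_stage_time(p, draw, trials=20000, seed=1):
    """Monte Carlo estimate of the expected stage time."""
    random.seed(seed)
    return sum(stage_time(p, draw) for _ in range(trials)) / trials

draw = lambda: random.expovariate(1.0)  # assumed task-duration distribution
```

The growth of the maximum with p is exactly the "degradation" a per-task mean would miss, which is why a prediction equation needs the order statistic rather than the average.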
Parallel Efficient Algorithms and Their Programming.
Abstract
measures used to analyze algorithms are depth and work; arithmetic and communication costs are distinguished. The former corresponds to operations performed (macro-instruction nodes), the latter to accesses to shared memory (data-dependency nodes). Arithmetic work and depth have been used for many years to analyze the performance of parallel algorithms [9, 55, 35, 28, 6]. In such a DFG, any output may be equivalently seen as a polynomial whose indeterminates are the inputs. The arithmetic degree is then the maximal degree of the polynomials corresponding to the outputs. Due to experimental constraints, the relevance of communication costs (i.e. total communication traffic, or work, and total communication delay) has been pointed out as necessary to obtain practical, performant programs [5, 19]. Since minimizing communication overhead and minimizing parallel time are antagonistic, good tradeoffs have been studied for several common algorithms [47, 1...
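The work and depth measures described here can be made concrete with a binary-tree reduction, a standard textbook example not taken from this chapter: summing n values performs n-1 additions (work) along a dependency chain of height ceil(log2 n) (depth).

```python
# Illustrative only: work counts all operations in the DFG, depth counts the
# longest chain of data dependencies. For a balanced binary-tree reduction:
import math

def reduction_work(n):
    """Additions needed to sum n values: n - 1, regardless of schedule."""
    return n - 1

def reduction_depth(n):
    """Height of the balanced combining tree: ceil(log2 n)."""
    return math.ceil(math.log2(n))
```

With enough processors the reduction runs in time proportional to the depth, not the work, which is why the two measures are analyzed separately.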