Master/Slave Computing on the Grid
, 2000
Cited by 41 (3 self)
Resource selection is fundamental to the performance of master/slave applications. In this paper, we address the problem of promoting performance for distributed master/slave applications targeted to distributed, heterogeneous "Grid" resources. We present a work-rate-based model of master/slave application performance which utilizes both system and application characteristics to select potentially performance-efficient hosts for both the master and slave processes. Using a Grid allocation strategy based on this performance model, we demonstrate a performance improvement over other selection options for a representative set of Master/Slave applications in both simulated and actual Grid environments.
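The abstract above does not give the model's equations, so the following is only a minimal sketch of what a work-rate-based host selection could look like. Everything in it is an assumption made for illustration: the `Host` type, the per-host `rate`, the `bandwidth` table, and the simplification that a slave's work rate is the reciprocal of its per-unit transfer time plus per-unit compute time. Contention at the master is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str
    rate: float  # hypothetical: work units computed per second


def slave_work_rate(slave, bandwidth, unit_bytes):
    """Work units per second for one slave: transfer time plus compute time per unit."""
    seconds_per_unit = unit_bytes / bandwidth + 1.0 / slave.rate
    return 1.0 / seconds_per_unit


def select_master(hosts, bandwidth, unit_bytes):
    """Return (master_name, aggregate_rate) maximising the summed slave work rates.

    bandwidth[(a, b)] is an assumed bytes-per-second figure from host a to host b.
    """
    best_name, best_rate = None, float("-inf")
    for master in hosts:
        # Every other host acts as a slave fed by this candidate master.
        total = sum(
            slave_work_rate(s, bandwidth[(master.name, s.name)], unit_bytes)
            for s in hosts
            if s.name != master.name
        )
        if total > best_rate:
            best_name, best_rate = master.name, total
    return best_name, best_rate
```

In this sketch a well-connected host wins the master role even when all hosts compute at the same speed, which is the kind of system-plus-application trade-off the abstract describes.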
COLT_HPF, a Run-Time Support for the High-Level Coordination of HPF Tasks
- Concurrency: Practice and Experience, Vol
, 1999
Cited by 7 (3 self)
ions (SDAs), using a syntax similar to that of HPF. Each instance of an SDA encapsulates distributed data and methods, where methods have exclusive access to encapsulated data. Data parallel tasks are thus started by creating instances of specific SDAs, while the inter-task co-operation takes place by means of remote synchronous (or asynchronous) method invocations. Note that SDA instances are started dynamically by a so-called coordination task, so that the run-time that implements inter-task communication has to control the passing of distributed data structures from one task to another, including any remapping that might be needed. The run-time accomplishes this through a handshaking protocol, which exchanges the distribution information about the actual argument (on the caller SDA) and the formal one (on the callee SDA) of a given method. Note that this handshaking protocol is very similar to the COLT HPF protocol to create a channel between two tasks. Finally, even though in th...
Parallelizing a Set of 2-D Frequency Transforms in a Flexible Manner
- IEE Part I (Vision, Image and Signal Processing)
, 1997
Cited by 3 (2 self)
The implementation of parallel 2-D frequency transforms intended for the acceleration of image-processing algorithms is described. The way these routines fit into a wider generic format for such parallel routines is also indicated. The paper touches on the design decisions needed to marry choice of efficient routine with wide utility. Due consideration is given to auxiliary functions. Details of the book-keeping required to enable real-valued data to be efficiently transformed in a parallel setting are included.
1 Introduction
This paper describes the design and implementation of a generic suite of 2-D frequency transform programs intended for message-passing multicomputers. The objective was an image-processing library implemented in a common format which allows the user to select a serial or a parallel mode of operation [1]. The Fourier transform is central to the class of frequency transforms. 2-D Discrete Fourier transform (DFT) algorithms include: the Vector Radix (VR)...
Parallel pipeline to ATM: Graphical simulation techniques
- In 15th UK Performance Engineering Workshop, UKPEW'99
, 1999
Cited by 1 (1 self)
Experience gained from constructing a graphical simulator of pipelines of processor farms has been transferred to prototype simulations of an ATM buffered-Banyan switch, and a slotted-ring network. Ways to make a meaningful and appealing display are indicated. The qualitative information that can be usefully shown is considered. The design and evaluation of the simulators are included.
Scheduling Schemes for Data Farming
- IEE Proceedings Part E (Computers and Digital Techniques)
, 1999
Cited by 1 (1 self)
The use of order statistics to arrive at a scheduling regime is shown to be applicable to data farms running on second-generation parallel processors. Uniform and decreasing task-size scheduling regimes are examined. Experimental timings and a further simulation for large-scale effects were used to exercise the scheduling regimes. The paper also considers a number of other scheduling schemes for data farms. It is shown that a method previously used for loop-scheduling is preferable, particularly as a form of automatic and generalised scheduling for data farming where there is a data-dependent workload.
1 Introduction
A processor or data farm [1] is a programming paradigm involving message-passing in which a single task is repeatedly executed in parallel on a collection of initial data. Data-farming is a commonly-used paradigm in parallel processing [2] and appears in numerous guises: some network-of-workstations (NOW) based [3]; some based on dedicated multicomputers [ 4...
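To make the contrast between uniform and decreasing task-size regimes concrete, here is a small sketch. The abstract does not name the loop-scheduling method it prefers, so guided self-scheduling (each request receives the remaining work divided by the number of workers, rounded up) is used here purely as an assumed stand-in. The function names are hypothetical.

```python
import math


def uniform_chunks(n_tasks, chunk):
    """Uniform regime: every request gets the same number of tasks (last may be short)."""
    sizes = []
    remaining = n_tasks
    while remaining > 0:
        size = min(chunk, remaining)
        sizes.append(size)
        remaining -= size
    return sizes


def decreasing_chunks(n_tasks, n_workers):
    """Decreasing regime (guided self-scheduling, an assumption, not the paper's stated method).

    Each request receives ceil(remaining / n_workers) tasks, so chunks shrink
    as the farm drains and late stragglers hold only small amounts of work.
    """
    sizes = []
    remaining = n_tasks
    while remaining > 0:
        size = math.ceil(remaining / n_workers)
        sizes.append(size)
        remaining -= size
    return sizes
```

Large early chunks keep the message count low, while the small final chunks limit the idle time caused by a slow task at the end, which matters most when task durations are data dependent.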
Performance Metrics for Embedded Parallel Pipelines
- Proc. Caltech Conference on VLSI
, 1979
Cited by 1 (0 self)
A statistical approach to performance prediction is applied to a system development methodology for pipelines comprised of independent parallel stages. The methodology is aimed at distributed memory machines employing medium-grained parallelization. The target applications are continuous-flow embedded systems. The use of order statistics on this type of system is compared to previous practical usage which appears largely confined to traditional Non-Uniform Memory Access (NUMA) machines for loop parallelization. A range of suitable performance metrics which give upper bounds or estimates for task durations are discussed. The metrics have a practical role when included in prediction equations in checking fidelity to an application performance specification. An empirical study applies the mathematical findings to the performance of a multicomputer for a synchronous pipeline stage. The results of a simulation are given for larger numbers of processors. In a further simulation, the results are extended to take account of waiting-time distributions while data are buffered between stages of an asynchronous pipeline. Order statistics are also employed to estimate the degradation due to an output ordering constraint. Practical illustrations in the image communication and vision application domains are included.
Index terms: performance prediction, parallel pipelines, real-time systems, order statistics
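As an illustration of how order statistics bound the duration of a synchronous stage (which finishes only when its slowest task does), the sketch below computes two standard quantities. The abstract does not say which metrics the paper uses, so these are assumptions chosen for illustration: the exact expected maximum of n independent exponential task times, and the distribution-free bound mean + std * (n - 1) / sqrt(2n - 1) on the expected maximum of n independent, identically distributed task times.

```python
import math


def expected_max_exponential(mean, n):
    """Expected maximum of n i.i.d. exponential durations: mean times the n-th harmonic number."""
    return mean * sum(1.0 / k for k in range(1, n + 1))


def expected_max_upper_bound(mean, std, n):
    """Distribution-free upper bound on the expected maximum of n i.i.d. durations."""
    return mean + std * (n - 1) / math.sqrt(2 * n - 1)
```

For an exponential distribution the standard deviation equals the mean, so the two functions can be compared directly: the bound always sits at or above the exact value and the gap widens with n.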
Parallel Efficient Algorithms and Their Programming.
measures used to analyze algorithms are depth and work; arithmetic and communication costs are distinguished. The former corresponds to operations performed (macro-instructions nodes) while the latter corresponds to accesses to the shared memory (data dependencies nodes). Arithmetic work and depth have been used for many years to analyze performances of parallel algorithms [9, 55, 35, 28, 6]. Due to experimental constraints, the relevance of communications costs (i.e. total communication traffic, or work, and total communications delay) has been pointed out to obtain practical performant programs [5, 19]. Since minimizing communications overhead and minimizing parallel time are antagonistic, good trade-offs have been studied for several common algorithms [47, 1...
Footnote 8: In such a DFG, any output may be equivalently seen as a polynomial whose indeterminates are the inputs. The arithmetic degree is then the maximal degree of polynomials corresponding to the outputs.
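A small worked example may help fix the two measures. The sketch below counts arithmetic work and depth for a balanced-tree summation of n values, an example chosen here for illustration and not taken from the text. It then applies the well-known Brent-style bound T_p <= work / p + depth for p processors. Communication costs are not modelled.

```python
def tree_sum_cost(n):
    """Arithmetic work and depth of summing n >= 1 values with a balanced binary tree.

    Work is the number of additions; depth is the number of tree levels,
    i.e. ceil(log2(n)), computed exactly with integer arithmetic.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    work = n - 1
    depth = (n - 1).bit_length()
    return work, depth


def brent_time_bound(work, depth, p):
    """Upper bound on parallel time with p processors: work / p + depth."""
    return work / p + depth
```

With 1024 inputs the sum needs 1023 additions but only 10 levels, so on 8 processors the bound is dominated by the work term, while on very many processors it is dominated by the depth.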
Evaluating Optical-Flow Algorithms on a Parallel Machine
- Image and Vision Comp
Algorithmic development of optical-flow routines is hampered by slow turnaround times (to iterate over testing, evaluation, and adjustment of the algorithm). To ease the problem, parallel implementation on a convenient general-purpose parallel machine is possible. A generic parallel pipeline structure, suitable for distributed-memory machines, has enabled parallelisation to be quickly achieved. Gradient, correlation, and phase-based methods of optical-flow detection have been constructed to demonstrate the approach.
Neighborhood Composition: A Parallelization of Local Search Algorithms
Abstract. To practically solve NP-hard combinatorial optimization problems, local search algorithms and their parallel implementations on PVM or MPI have been frequently discussed. Since a huge number of neighbors may be examined to discover a locally optimal neighbor in each local search call, many parallelization schemes, excluding the so-called multi-start parallel scheme, try to extract parallelism from a local search by distributing the examinations of neighbors to processors. However, in straightforward implementations, when the next local search starts, all the processors are assigned to the neighbors of the latest solution, and the results of all but one of the examinations in the previous local search are thus discarded, even though they may contain useful information for further search. This paper explores the possibility of extracting information even from unsuccessful neighbor examinations in a systematic way to boost parallel local search algorithms. Our key concept is neighborhood composition. We demonstrate how this idea improves parallel implementations on PVM, by taking as examples well-known local search algorithms for the Traveling Salesman Problem.
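The abstract does not spell out the composition rule, so the following is only one plausible reading, sketched sequentially for the Traveling Salesman Problem with 2-opt moves. All 2-opt neighbors of the current tour are evaluated (the step that would be distributed over processors). Then, instead of keeping only the single best move, every improving move whose tour positions do not overlap with an already chosen move is applied in the same round. The function names and the greedy selection by gain are assumptions.

```python
import math


def tour_length(tour, pts):
    """Length of the closed tour visiting pts in the order given by tour."""
    n = len(tour)
    return sum(math.dist(pts[tour[k]], pts[tour[(k + 1) % n]]) for k in range(n))


def two_opt_gain(tour, pts, i, j):
    """Length saved by reversing tour[i+1..j] (positive means the tour gets shorter)."""
    n = len(tour)
    a, b = pts[tour[i]], pts[tour[i + 1]]
    c, d = pts[tour[j]], pts[tour[(j + 1) % n]]
    return math.dist(a, b) + math.dist(c, d) - math.dist(a, c) - math.dist(b, d)


def compose_step(tour, pts):
    """Apply all mutually non-overlapping improving 2-opt moves found in one round."""
    n = len(tour)
    moves = [
        (two_opt_gain(tour, pts, i, j), i, j)
        for i in range(n - 2)
        for j in range(i + 2, n)
        if not (i == 0 and j == n - 1)  # this move leaves the cycle unchanged
    ]
    improving = sorted((m for m in moves if m[0] > 1e-12), reverse=True)
    chosen = []
    for gain, i, j in improving:
        # Position ranges [i, j+1] may share an endpoint but must not overlap otherwise,
        # so the gains of the chosen moves are independent and simply add up.
        if all(j + 1 <= ci or cj + 1 <= i for _, ci, cj in chosen):
            chosen.append((gain, i, j))
    new_tour = list(tour)
    for _, i, j in chosen:
        new_tour[i + 1:j + 1] = reversed(new_tour[i + 1:j + 1])
    return new_tour, sum(g for g, _, _ in chosen)
```

On a tour with two independent detours, a single round repairs both, whereas a best-move-only scheme would need two rounds and would re-evaluate the whole neighborhood in between.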

