Results 1 - 6 of 6
Using fine-grain threads and run-time decision making in parallel computing
- Journal of Parallel and Distributed Computing, 1996
Cited by 33 (14 self)
Abstract
Programming distributed-memory multiprocessors and networks of workstations requires deciding what can execute concurrently, how processes communicate, and where data is placed. These decisions can be made statically by a programmer or compiler, or they can be made dynamically at run time. Using run-time decisions leads to a simpler interface, because decisions are implicit, and it can lead to better decisions, because more information is available. This paper examines the costs, benefits, and details of making decisions at run time. The starting point is explicit fine-grain parallelism with any number (even thousands) of threads. Five specific techniques are considered: (1) implicitly coarsening the granularity of parallelism, (2) using implicit communication implemented by a distributed shared memory, (3) overlapping computation and communication, (4) adaptively moving threads and data between nodes to minimize communication and balance load, and (5) dynamically remapping data to pages to avoid false sharing. Details are given on the performance of each of these techniques as well as their overall performance on several scientific applications.
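The first technique, implicitly coarsening the granularity of parallelism, can be illustrated with a minimal sketch. This is not the paper's run-time system: the names `coarsen` and `run_coarsened` are hypothetical, and a Python thread pool stands in for the nodes. The program still expresses one fine-grain thread per index, but the run time groups contiguous indices into one chunk per worker and executes each chunk as a plain loop, so no per-thread scheduling cost is paid. Because of CPython's global interpreter lock, the sketch shows the structure of the transformation rather than a speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def coarsen(num_threads: int, num_workers: int) -> List[range]:
    """Group fine-grain thread ids 0..num_threads-1 into contiguous chunks, one per worker.

    Chunk sizes differ by at most one; empty chunks are dropped. Requires num_workers >= 1.
    """
    base, extra = divmod(num_threads, num_workers)
    chunks: List[range] = []
    start = 0
    for w in range(num_workers):
        size = base + (1 if w < extra else 0)
        if size:
            chunks.append(range(start, start + size))
        start += size
    return chunks


def run_coarsened(body: Callable[[int], None], num_threads: int, num_workers: int) -> None:
    """Execute body(i) for every fine-grain thread i, running each chunk as one plain loop."""
    def run_chunk(chunk: range) -> None:
        for i in chunk:  # each fine-grain "thread" becomes one loop iteration
            body(i)

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # list() waits for every chunk and re-raises any exception from a worker
        list(pool.map(run_chunk, coarsen(num_threads, num_workers)))


if __name__ == "__main__":
    squares = [0] * 10
    run_coarsened(lambda i: squares.__setitem__(i, i * i), num_threads=10, num_workers=3)
    print(squares)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

Here ten fine-grain threads become three chunks (indices 0-3, 4-6, and 7-9), each executed by a single worker.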
A Sisal Compiler for Both Distributed- and Shared-Memory Machines
- In High Performance Functional Computing, 1995
Cited by 6 (1 self)
Abstract
This paper describes a prototype Sisal compiler that supports distributed- as well as shared-memory machines. The compiler, fsc, modifies the code-generation phase of the optimizing Sisal compiler, osc, to use the Filaments library as a run-time system. Filaments efficiently supports fine-grain parallelism and a shared-memory programming model. Using fine-grain threads makes it possible to implement recursive as well as loop parallelism; it also facilitates dynamic load balancing. Using a distributed implementation of shared memory (a DSM) simplifies the compiler by obviating the need for explicit message passing.

February 21, 1995. Department of Computer Science, The University of Arizona, Tucson, AZ 85721.
First published in the "High-Performance Functional Computing Conference", April 1995.
This work was supported by NSF grants CCR-9108412 and CDA-8822652.

1 Introduction
It is difficult to create a correct and efficient parallel program; this difficulty is compounded because e...
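Fine-grain threads let a compiler express recursive (fork/join) parallelism in addition to loop parallelism. A minimal sketch, assuming Python's `threading` module as a stand-in for a fine-grain thread package (it is not the Filaments API and not code that fsc generates): each call forks a thread for its left half, computes the right half itself, and then joins. The hypothetical `cutoff` parameter stops forking below a given subproblem size, which keeps the number of threads bounded.

```python
import threading
from typing import Dict, Sequence


def par_sum(xs: Sequence[int], lo: int, hi: int, cutoff: int = 1000) -> int:
    """Sum xs[lo:hi] by recursive fork/join; subproblems of size <= cutoff run sequentially.

    Requires cutoff >= 1 so that the recursion terminates.
    """
    if hi - lo <= cutoff:
        return sum(xs[lo:hi])
    mid = (lo + hi) // 2
    result: Dict[str, int] = {}

    def left() -> None:
        result["left"] = par_sum(xs, lo, mid, cutoff)

    child = threading.Thread(target=left)  # fork: the left half runs in a new thread
    child.start()
    right = par_sum(xs, mid, hi, cutoff)   # the parent thread handles the right half
    child.join()                           # join: wait for the left half to finish
    return result["left"] + right


if __name__ == "__main__":
    data = list(range(10_000))
    print(par_sum(data, 0, len(data), cutoff=500))  # 49995000
```

The cutoff plays the same role as coarsening: below it, the recursion continues as ordinary sequential code instead of creating more threads.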
Reducing File-related Network Traffic in TreadMarks via Parallel File Input/Output
Abstract
In this paper, we describe the implementation of a parallel file I/O system on TreadMarks, a page-based software Distributed Shared Memory (DSM) system built on a network of workstations. The main goal of our parallel file I/O system is to reduce file-related network traffic in TreadMarks. The prototype employs two mechanisms we proposed previously: a variable data distribution scheme, which distributes file blocks among the nodes according to the application's access pattern, and a delayed file access mechanism, which postpones transferring a requested file block across the network until the block is actually used during computation. Our parallel file I/O system is currently integrated into the user-level library of TreadMarks, with only minor modifications to the TreadMarks code. Because the system offers a UNIX-like interface, existing TreadMarks programs require very few modifications. The prototype yields a satisfactory performance improvement on Successive Over-Relaxation, while the improvement on Matrix Multiplication is less significant.
Keywords: distributed shared memory, parallel file I/O, network of workstations, variable data distribution scheme, delayed file access mechanism
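One way to derive a block distribution from an access pattern is to assign each file block to the node that touches it most often. The abstract does not spell out the paper's exact policy, so the rule below is an assumption. The function name `assign_blocks` and the (node, block) trace format are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def assign_blocks(accesses: Iterable[Tuple[int, int]]) -> Dict[int, int]:
    """Map each file block to the node that accessed it most often.

    accesses: (node_id, block_id) pairs, e.g. recorded during a profiling run.
    Ties are broken in favor of the lowest node id.
    """
    counts: Dict[int, Counter] = {}
    for node, block in accesses:
        counts.setdefault(block, Counter())[node] += 1
    # Highest access count wins; on equal counts the smaller node id wins.
    return {block: min(c, key=lambda n: (-c[n], n)) for block, c in counts.items()}


if __name__ == "__main__":
    trace = [(0, 0), (0, 1), (1, 1), (1, 1), (1, 2), (2, 2)]
    print(assign_blocks(trace))  # {0: 0, 1: 1, 2: 1}
```

For a row-partitioned grid such as the one in Successive Over-Relaxation, each node mostly reads the blocks that hold its own rows, so this rule would place those blocks locally.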
Locality-Preserving Dynamic Load Balancing for Data-Parallel Applications on Distributed-Memory Multiprocessors
Abstract
Load balancing and data locality are the two most important factors affecting the performance of parallel programs running on distributed-memory multiprocessors. A good balancing scheme should distribute the workload evenly among the available processors and place tasks close to their data to reduce communication and idle time. In this paper, we study the load balancing problem for data-parallel loops with predictable neighborhood data references. The execution time of these loops is variable and unpredictable because of dynamic external workload. Nevertheless, the data referenced by each loop iteration exhibit the spatial locality of stencil references. We combine an initial static BLOCK scheduling with dynamic scheduling based on work stealing. Data locality is preserved by carefully restricting which tasks can be migrated. Experimental results on a network of workstations are reported.
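The combination of an initial BLOCK schedule with locality-restricted work stealing can be illustrated with a small deterministic simulation. The restriction used here is an assumption chosen for illustration, not necessarily the paper's rule: an idle processor may steal only from its immediate neighbors, taking the neighbor's remaining iteration closest to its own block. The names `block_partition` and `simulate`, and the per-iteration `costs` list, are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List, Sequence


def block_partition(n: int, p: int) -> List[Deque[int]]:
    """Static BLOCK schedule: processor k gets one contiguous run of iteration indices."""
    base, extra = divmod(n, p)
    parts: List[Deque[int]] = []
    start = 0
    for k in range(p):
        size = base + (1 if k < extra else 0)
        parts.append(deque(range(start, start + size)))
        start += size
    return parts


def simulate(costs: Sequence[float], p: int) -> Dict[int, int]:
    """Return {iteration: processor that executed it}.

    Each processor first runs its own BLOCK chunk in index order. Once its chunk is empty,
    it steals one iteration at a time, but only from an immediate neighbor, and only the
    neighbor's remaining iteration closest to its own block (an illustrative restriction).
    """
    queues = block_partition(len(costs), p)
    clock = [0.0] * p   # simulated time at which each processor becomes free
    idle = [False] * p  # True once a processor can find no more permitted work
    owner: Dict[int, int] = {}
    while len(owner) < len(costs):
        # Advance the active processor with the earliest clock (ties go to the lower id).
        _, k = min((clock[i], i) for i in range(p) if not idle[i])
        if queues[k]:
            it = queues[k].popleft()
        elif k + 1 < p and queues[k + 1]:
            it = queues[k + 1].popleft()  # right neighbor's lowest remaining index
        elif k > 0 and queues[k - 1]:
            it = queues[k - 1].pop()      # left neighbor's highest remaining index
        else:
            idle[k] = True                # queues never grow, so this processor is done
            continue
        owner[it] = k
        clock[k] += costs[it]
    return owner


if __name__ == "__main__":
    result = simulate([1, 1, 1, 1, 10, 10, 10, 10], 2)
    print(dict(sorted(result.items())))  # {0: 0, 1: 0, 2: 0, 3: 0, 4: 1, 5: 0, 6: 1, 7: 0}
```

With equal costs every processor keeps exactly its BLOCK chunk; with the skewed costs above, processor 0 ends up running iterations 5 and 7 from its neighbor. A real implementation would also need a synchronized deque and would have to transfer the stencil rows that a stolen iteration references; the simulation ignores both.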
Improving the Performance of Distributed Shared Memory Systems via Parallel File Input/Output
Abstract
File accesses in page-based software Distributed Shared Memory (DSM) systems are usually performed by a single node, which can lead to poor overall performance because a large amount of network traffic is generated to transfer data between this file-handling node and the other nodes. To reduce file-related network traffic in DSM systems, we have designed a parallel file I/O system for page-based software DSM systems built on a network of workstations; the design is independent of the memory consistency model. Its two main features are an adaptive data distribution scheme and a delayed file access mechanism. The former distributes file blocks among the nodes according to the access pattern of the application, while the latter exploits the memory-mapping features of the DSM system's virtual shared address space to ensure that data are transferred to the consumer node instead of the requesting node. Our first prototype is built on Cohesion, a page-base ...
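The delayed file access mechanism can be modeled as a handle that records a read request but performs the network transfer only when some node first uses the data, delivering the block to that consumer rather than to the node that issued the request. A minimal sketch, assuming a caller-supplied `fetch(block_id, node_id)` function that represents the transfer; `DelayedBlock`, `touch`, and `demo` are hypothetical names, and the DSM's page-fault handling and coherence protocol are not modeled.

```python
from typing import Callable, List, Optional, Tuple


class DelayedBlock:
    """Hypothetical handle for a file block whose transfer is deferred until first use."""

    def __init__(self, block_id: int, fetch: Callable[[int, int], bytes]) -> None:
        self.block_id = block_id
        self._fetch = fetch                      # performs the (simulated) network transfer
        self._data: Optional[bytes] = None
        self.delivered_to: Optional[int] = None  # node that received the block, if any

    def touch(self, node_id: int) -> bytes:
        # Creating the handle moved no data; the transfer happens here, on first use,
        # and goes to the node that actually consumes the block.
        if self._data is None:
            self._data = self._fetch(self.block_id, node_id)
            self.delivered_to = node_id
        return self._data


def demo() -> Tuple[List[Tuple[int, int]], List[DelayedBlock]]:
    transfers: List[Tuple[int, int]] = []

    def fetch(block_id: int, node_id: int) -> bytes:
        transfers.append((block_id, node_id))  # log (block, destination node)
        return bytes([block_id]) * 4           # dummy block contents

    # One node requests four blocks, but only some of them are ever used.
    handles = [DelayedBlock(b, fetch) for b in range(4)]
    handles[0].touch(0)
    handles[2].touch(1)
    handles[3].touch(1)
    handles[2].touch(1)  # already delivered: no second transfer
    return transfers, handles


if __name__ == "__main__":
    log, _ = demo()
    print(log)  # [(0, 0), (2, 1), (3, 1)]
```

Block 1 is requested but never used, so it never crosses the network, and blocks 2 and 3 go directly to node 1, their consumer.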
FINE-GRAIN PARALLELISM AND RUN-TIME DECISION MAKING
Abstract
Rights: Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction, or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Downloaded 10-May-2016 21:06:32