Minimizing Redundant Dependencies and Interprocessor Synchronizations
International Journal of Parallel Programming, 1994
Cited by 3 (1 self)
Run-time synchronization overhead is a crucial factor in achieving speedup for parallel computers. A new algorithm for removing redundant dependencies and minimizing interprocessor synchronizations in a multiprocessor system is presented in this paper. In our simulation, on the average, only 0.59% of the initial dependencies are required synchronizations. The algorithm has O(n^3) time complexity and O(n^2) space complexity.
Key words: multiprocessor scheduling, transitive closure, transitive reduction, interprocessor synchronization.
1 Introduction
Run-time synchronization overhead is a crucial factor in achieving effective speedup when a system of tasks with dependencies among the tasks is executed on a MIMD machine with identical processing elements. In this paper, we describe a new two-phase algorithm for minimizing synchronizations in a multiprocessor system. In the first phase, redundant dependencies are removed before scheduling; in the second phase, interprocessor synch...
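As an illustration of what "removing redundant dependencies" can mean, the sketch below applies the textbook transitive-reduction approach to a task graph: compute the transitive closure with Warshall's algorithm, then drop every dependency that is already implied by a longer path. This is not the paper's algorithm, only a standard technique that has the same O(n^3) time and O(n^2) space bounds. The function name and the edge-list representation are assumptions made for the example, and the task graph is assumed to be acyclic.

```python
def transitive_reduction(n, edges):
    """Return the dependencies of an acyclic task graph that are not implied by others.

    n     -- number of tasks, numbered 0..n-1 (hypothetical representation)
    edges -- iterable of (u, v) pairs meaning "task v depends on task u"
    """
    edges = sorted(set(edges))

    # reach[i][j] is True when some path of length >= 1 leads from i to j.
    reach = [[False] * n for _ in range(n)]
    for u, v in edges:
        reach[u][v] = True

    # Warshall's transitive closure: O(n^3) time, O(n^2) space.
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True

    # In an acyclic graph, edge (u, v) is redundant exactly when some
    # intermediate task k lies on a path from u to v.
    required = []
    for u, v in edges:
        if not any(reach[u][k] and reach[k][v] for k in range(n)):
            required.append((u, v))
    return required


if __name__ == "__main__":
    deps = [(0, 1), (1, 2), (0, 2), (2, 3), (0, 3)]
    print(transitive_reduction(4, deps))
```

In the example graph, the dependencies (0, 2) and (0, 3) are implied by the chain 0, 1, 2, 3, so only the three chain edges remain.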
Exploiting Parallelism on a Fine-Grained MIMD Architecture Based Upon Channel Queues
1991
Cited by 3 (0 self)
We present techniques for exploiting fine-grained parallelism extracted from sequential programs on a fine-grained MIMD system. The system exploits fine-grained parallelism through parallel execution of instructions on multiple processors as well as the pipelined nature of individual processors. The processors can communicate data values via globally shared registers as well as dedicated channel queues. Compilation techniques are presented to utilize these mechanisms. A scheduling algorithm has been developed to distribute operations among the processors in a manner that reduces communication among the processors. The compiler identifies data dependencies which require synchronization and enforces them using channel queues. Delays that may result from attempting write operations to a full channel queue are avoided by spilling values from channels to local registers. If an interprocessor data dependency does not require synchronization, then the data value is passed through a shared register or shared memory.
KEY WORDS: Multiprocessor systems; parallelizing compilers; fine-grained parallelism; top-down scheduling; redundant synchronization; channel queues.
A Strategy for Exploiting Implicit Loop Parallelism in Java Programs
1996
Cited by 1 (0 self)
In this paper, we explore a strategy that can be used by a source-to-source restructuring compiler to exploit implicit loop parallelism in Java programs. First, the compiler must identify the parallel loops in a program. Thereafter, the compiler explicitly expresses this parallelism in the transformed program using the multithreading mechanism of Java. Finally, after a single compilation of the transformed program into Java byte-code, speedup can be obtained on any platform on which the Java byte-code interpreter supports actual concurrent execution of threads, whereas threads only induce a slight overhead for serial execution. In addition, this approach can enable a compiler to explicitly express the scheduling policy of each parallel loop in the program.
1 Introduction
One of the important design goals of the Java programming language was to provide a truly architecture-neutral language, so that Java applications could run on any platform. To achieve this objective, a Java program ...
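The paper targets Java threads; the sketch below uses Python's `threading` module instead, purely to illustrate how a transformed program might express a parallel loop together with an explicit scheduling policy (here, block scheduling). The names `block_bounds` and `parallel_for` are hypothetical and do not come from the paper. Note that in standard CPython the global interpreter lock limits real speedup for CPU-bound loop bodies, so this shows the structure of the transformation rather than its performance.

```python
import threading


def block_bounds(lo, hi, num_threads, t):
    """Return the half-open iteration range [start, end) assigned to thread t."""
    base, extra = divmod(hi - lo, num_threads)
    # The first `extra` threads each take one additional iteration.
    start = lo + t * base + min(t, extra)
    end = start + base + (1 if t < extra else 0)
    return start, end


def parallel_for(lo, hi, body, num_threads=4):
    """Run body(i) for every i in [lo, hi), assuming iterations are independent."""

    def run_block(start, end):
        for i in range(start, end):
            body(i)

    threads = []
    for t in range(num_threads):
        start, end = block_bounds(lo, hi, num_threads, t)
        thread = threading.Thread(target=run_block, args=(start, end))
        threads.append(thread)
        thread.start()
    # Joining all threads plays the role of the implicit barrier at loop exit.
    for thread in threads:
        thread.join()


if __name__ == "__main__":
    squares = [0] * 10

    def body(i):
        squares[i] = i * i  # each iteration writes a distinct element

    parallel_for(0, 10, body)
    print(squares)
```

A different scheduling policy, such as cyclic scheduling, would only require replacing `block_bounds` and the loop in `run_block`.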
Compiler Algorithms for Event Variable Synchronization
Event variable synchronization is a well-known mechanism for enforcing data dependences in a program that runs in parallel on a shared memory multiprocessor. This paper presents compiler algorithms to automatically generate event variable synchronization code. Previously published algorithms dealt with single parallel loops in which dependence distances are constant and known by the compiler. However, loops in real application programs are often arbitrarily nested. Moreover, compilers are often unable to determine dependence distances. In contrast, our algorithms generate synchronization code based directly on array subscripts and do not require constant distances in data dependences. The algorithms are designed for arbitrarily nested loops, including triangular or trapezoidal loops.
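To make the post/wait idea concrete, here is a minimal hand-written sketch, not the paper's generated code. It parallelizes the loop `a[i] = a[src[i]] + 1`, where the dependence distance `i - src[i]` varies from iteration to iteration, by attaching one event variable to each array element: a reader waits on the event of the element named by its subscript, and a writer posts the event after storing. Python's `threading.Event` stands in for a hardware or runtime event variable; the function names, the cyclic distribution of iterations, and the requirement that `src[i] < i` are all assumptions of the example.

```python
import threading


def run_sequential(src):
    """Reference loop: a[i] = a[src[i]] + 1 for i = 1..n-1, with a[0] = 0."""
    a = [0] * len(src)
    for i in range(1, len(src)):
        a[i] = a[src[i]] + 1
    return a


def run_with_events(src, num_threads=3):
    """Same loop with iterations spread over threads and synchronized by events.

    Assumes src[i] < i for every i >= 1, so each read refers to an element
    produced by an earlier iteration (or to the initial a[0]).
    """
    n = len(src)
    a = [0] * n
    written = [threading.Event() for _ in range(n)]
    written[0].set()  # a[0] holds its initial value before the loop starts

    def worker(t):
        # Cyclic distribution: thread t runs iterations 1+t, 1+t+num_threads, ...
        for i in range(1 + t, n, num_threads):
            written[src[i]].wait()  # wait until the element we read is ready
            a[i] = a[src[i]] + 1
            written[i].set()        # post: a[i] is now available

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return a


if __name__ == "__main__":
    src = [0, 0, 1, 0, 3, 2]  # src[0] is unused
    print(run_with_events(src) == run_sequential(src))
```

Because synchronization is keyed on the subscript value rather than on a fixed distance, the same pattern works whether or not the distances are known in advance.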
Performance Analysis of Parallelizing Compilers on the Perfect Benchmarks
IEEE Transactions on Parallel and Distributed Systems, 1992
We have studied the effectiveness of parallelizing compilers and the underlying transformation techniques. This paper reports the speedups of the Perfect Benchmarks™ codes that result from automatic parallelization. We have further measured the performance gains caused by individual restructuring techniques. Specific reasons for the successes and failures of the transformations are discussed, and potential improvements that result in measurably better program performance are analyzed. Our most important findings are that available restructurers often cause insignificant performance gains in real programs and that only a few restructuring techniques contribute to this gain. However, we can also show that there is potential for advancing compiler technology so that many of the most important loops in these programs can be parallelized.
Keywords: Automatic parallelization, restructuring techniques, effectiveness analysis, compiler evaluation, Perfect Benchmarks
1 Introduction 1.1 Moti...
Barrier Synchronisation Optimisation
This paper describes a new compiler algorithm to reduce the number of barrier synchronisations in parallelised programs. A preliminary technique to rapidly determine critical data dependences is developed. This forms the basis of the Fast First Sink (FFS) algorithm which places, provably, the minimal number of barriers in polynomial time for codes with a regular structure. This algorithm is implemented in a prototype compiler and applied to three well-known benchmarks. Preliminary results show that it outperforms an existing state-of-the-art commercial compiler.

