Results 1 - 10
of
55
The LRPD Test: Speculative Run-Time Parallelization of Loops with Privatization and Reduction Parallelization
, 1995
"... Current parallelizing compilers cannot identify a significant fraction of parallelizable loops because they have complex or statically insufficiently defined access patterns. As parallelizable loops arise frequently in practice, we advocate a novel framework for their identification: speculatively e ..."
Abstract
-
Cited by 185 (36 self)
- Add to MetaCart
Current parallelizing compilers cannot identify a significant fraction of parallelizable loops because they have complex or statically insufficiently defined access patterns. As parallelizable loops arise frequently in practice, we advocate a novel framework for their identification: speculatively execute the loop as a doall, and apply a fully parallel data dependence test to determine if it had any cross--iteration dependences; if the test fails, then the loop is re--executed serially. Since, from our experience, a significant amount of the available parallelism in Fortran programs can be exploited by loops transformed through privatization and reduction parallelization, our methods can speculatively apply these transformations and then check their validity at run--time. Another important contribution of this paper is a novel method for reduction recognition which goes beyond syntactic pattern matching: it detects at run--time if the values stored in an array participate in a reduction operation, even if they are transferred through private variables and/or are affected by statically unpredictable control flow. We present experimental results on loops from the PERFECT Benchmarks which substantiate our claim that these techniques can yield significant speedups which are often superior to those obtainable by inspector/executor methods. The methods presented in this paper differ from and extend our previous work on several important points. First, instead of distributing the loop into inspector and executor loops (the approach taken in all previous work on run-- time parallelization) we advocate the use of run--time tests to validate the execution of a loop that is speculatively executed in parallel. Second, in addition to array privatization, the new techniques are capa...
ParaScope: a parallel programming environment
- PROCEEDINGS OF THE IEEE
, 1993
"... The ParaScope parallel programming environment developed to support scientific programming of shared-memory multiprocessors, includes a collection of tools that use global program analysis to help users develop and debug parallel programs. This paper focuses on ParaScope’s compilation system, its pa ..."
Abstract
-
Cited by 120 (33 self)
- Add to MetaCart
The ParaScope parallel programming environment developed to support scientific programming of shared-memory multiprocessors, includes a collection of tools that use global program analysis to help users develop and debug parallel programs. This paper focuses on ParaScope’s compilation system, its parallel program editor, and its parallel debugging system. The compilation system extends the traditional single-procedure compiler by providing a mechanism for managing the compilation of complete programs. Thus, ParaScope can support both traditional single-procedure optimization and optimization across procedure boundaries. The ParaScope editor brings both compiler analysis and user expertise to bear on program parallelization. It assists the knowledgeable user by displaying and managing analysis and by proiiding a variety of interactive program tran.formation.s that are effective in exposing parallelism. The debugging svstem detects and reports timing-dependent errors, called data races, in execution of parallel programs. The system combines static analysis. program instrumentation. and run-time reporting to provide a mechanical system for isolating errors in parallel program executions. Finally, we describe a new project to extend ParaScope to support programming in Fortran D, a machine-independent parallel pro-gramming language intended for use with both distributed-memory and shared-memory parallel computers..
Automatic Array Privatization
- IN UTPAL BANERJEEDAVID GELERNTERALEX NICOLAUDAVID PADUA, EDITOR, PROC. SIXTH WORKSHOP ON LANGUAGES AND COMPILERS FOR PARALLEL COMPUTING
, 1993
"... Array privatization is one of the most effective transformations for the exploitation of parallelism. In this paper, we present a technique for automatic array privatization. Our algorithm uses data flow analysis of array references to identify privatizable arrays intraprocedurally as well as in ..."
Abstract
-
Cited by 119 (25 self)
- Add to MetaCart
Array privatization is one of the most effective transformations for the exploitation of parallelism. In this paper, we present a technique for automatic array privatization. Our algorithm uses data flow analysis of array references to identify privatizable arrays intraprocedurally as well as interprocedurally. It employs static and dynamic resolution to determine the last value of a lived private array.We compare the result of automatic array privatization with that of manual array privatization and identify directions for future improvement. To enhance the effectiveness of our algorithm, wedevelop a goal directly technique to analysis symbolic variables in the present of conditional statements, loops and index arrays.
Automatic Program Parallelization
, 1993
"... This paper presents an overview of automatic program parallelization techniques. It covers dependence analysis techniques, followed by a discussion of program transformations, including straight-line code parallelization, do loop transformations, and parallelization of recursive routines. The last s ..."
Abstract
-
Cited by 97 (8 self)
- Add to MetaCart
This paper presents an overview of automatic program parallelization techniques. It covers dependence analysis techniques, followed by a discussion of program transformations, including straight-line code parallelization, do loop transformations, and parallelization of recursive routines. The last section of the paper surveys several experimental studies on the effectiveness of parallelizing compilers.
An Exact Method for Analysis of Value-based Array Data Dependences
- In Sixth Annual Workshop on Programming Languages and Compilers for Parallel Computing
, 1993
"... Standard array data dependence testing algorithms give information about the aliasing of array references. If statement 1 writes a[5], and statement 2 later reads a[5], standard techniques described this as a flow dependence, even if there was an intervening write. We call a dependence between two ..."
Abstract
-
Cited by 79 (14 self)
- Add to MetaCart
Standard array data dependence testing algorithms give information about the aliasing of array references. If statement 1 writes a[5], and statement 2 later reads a[5], standard techniques described this as a flow dependence, even if there was an intervening write. We call a dependence between two references to the same memory location a memory-based dependence. In contrast, if there are no intervening writes, the references touch the same value and we call the dependence a value-based dependence. There has been a surge of recent work on value-based array data dependence analysis (also referred to as computation of array data-flow dependence information). In this paper, we describe a technique that is exact over programs without control flow (other than loops) and non-linear references. We compare our proposal with the technique proposed by Paul Feautrier, which is the other technique that is complete over the same domain as ours. We also compare our work with that of Tu and Padua, a ...
Interprocedural array regions analyses
, 1995
"... In order to perform powerful program optimizations, an exact interprocedural analysis of array data ow is needed. For that purpose, two new types of array region are introduced. IN and OUT regions represent the sets of array elements, the values of which are imported to or exported from the current ..."
Abstract
-
Cited by 64 (7 self)
- Add to MetaCart
In order to perform powerful program optimizations, an exact interprocedural analysis of array data ow is needed. For that purpose, two new types of array region are introduced. IN and OUT regions represent the sets of array elements, the values of which are imported to or exported from the current statement or procedure. Among the various applications are: compilation of communications for message-passing machines, array privatization, compile-time optimization of local memory or cache behavior in hierarchical memory machines.
Nonlinear Array Dependence Analysis
, 1991
"... Standard array data dependence techniques can only reason about linear constraints. There has also been work on analyzing some dependences involving polynomial constraints. Analyzing array data dependences in real-world programs requires handling many "unanalyzable" terms: subscript arrays, run-time ..."
Abstract
-
Cited by 63 (5 self)
- Add to MetaCart
Standard array data dependence techniques can only reason about linear constraints. There has also been work on analyzing some dependences involving polynomial constraints. Analyzing array data dependences in real-world programs requires handling many "unanalyzable" terms: subscript arrays, run-time tests, function calls. The standard approach to analyzing such programs has been to omit and ignore any constraints that cannot be reasoned about. This is unsound when reasoning about value-based dependences and whether privatization is legal. Also, this prevents us from determining the conditions that must be true to disprove the dependence. These conditions could be checked by a run-time test or verified by a programmer or aggressive, demand-driven interprocedural analysis. We describe a solution to these problems. Our solution makes our system sound and more accurate for analyzing value-based dependences and derives conditions that can be used to disprove dependences. We also give some p...
The Privatizing DOALL Test: A Run-Time Technique for DOALL Loop Identification and Array Privatization
- In Proceedings of the 1994 International Conference on Supercomputing
, 1994
"... Current parallelizing compilers cannot extract a significant fraction of the available parallelism in a loop if it has a complex and/or statically insufficiently defined access pattern. This is an important issue because a large class of complex simulations used in industry today have irregular doma ..."
Abstract
-
Cited by 43 (16 self)
- Add to MetaCart
Current parallelizing compilers cannot extract a significant fraction of the available parallelism in a loop if it has a complex and/or statically insufficiently defined access pattern. This is an important issue because a large class of complex simulations used in industry today have irregular domains and/or dynamically changing interactions. To handle these types of problems methods capable of automatically extracting parallelism at run--time are needed. For this reason, we have developed the Privatizing DOALL test -- a technique for identifying fully parallel loops at run--time, and dynamically privatizing scalars and arrays. The test is fully parallel, requires no synchronization, is easily automatable, and can be applied to any loop, regardless of its access pattern. We show that the expected speedup for fully parallel loops is significant, and the cost of a failed test (a not fully parallel loop) is minimal. We present experimental results on loops from the PERFECT Benchmarks whi...
Polaris: A New-Generation Parallelizing Compiler for MPPs
- In CSRD Rept. No. 1306. Univ. of Illinois at Urbana-Champaign
, 1993
"... ion for Inner Loops When the algorithm finds a loop nested inside a loop body, it will recursively call itself on the inner loop. To hide the control flow of an inner loop, we introduce some abstraction and extend the previous definition from a basic block to a complete loop. We start by defining t ..."
Abstract
-
Cited by 43 (7 self)
- Add to MetaCart
ion for Inner Loops When the algorithm finds a loop nested inside a loop body, it will recursively call itself on the inner loop. To hide the control flow of an inner loop, we introduce some abstraction and extend the previous definition from a basic block to a complete loop. We start by defining the information for one iteration of the loop. Definition 5 Let L be a loop and V AR be the variables in the program. We define the following set as summary set for body(L). 1. DEF b (L) := fv 2 V AR : v has a MRD reaching all exits node of body(L) g 2. USE b (L) := fv 2 V AR : v has an outward exposed use in body(L) g 3. KILL b (L) := DEF b (L) 4. PRI b (L) := fv 2 V AR : every use of v has a reaching MRD in body(L) g 2 The summary set is an abstraction of the effect of a loop iteration on the data flow values. Using the summary set, we can ignore the structure of the inner loops in the analysis of the outer loop. The trade-off is that we have to make a conservative approximation and ma...

