Results 11-20 of 355
Efficient and exact data dependence analysis
 Proceedings of the ACM SIGPLAN '91 Conference on Programming Language Design and Implementation
, 1991
"... Data dependence testing is the basic step in detecting loop level parallelism in numerieal programs. The problem is equivalent to integer linear programming and thus in general cannot be solved efficiently. Current methods in use employ inexact methods that sacrifice potential parallelism in order ..."
Abstract

Cited by 121 (8 self)
 Add to MetaCart
Data dependence testing is the basic step in detecting loop-level parallelism in numerical programs. The problem is equivalent to integer linear programming and thus in general cannot be solved efficiently. Current methods in use employ inexact methods that sacrifice potential parallelism in order to improve compiler efficiency. This paper shows that in practice, data dependence can be computed exactly and efficiently. There are three major ideas that lead to this result. First, we have developed and assembled a small set of efficient algorithms, each one exact for special-case inputs. Combined with a moderately expensive backup test, they are exact for all the cases we have seen in practice. Second, we introduce a memoization technique to save the results of previous tests, thus avoiding calling the data dependence routines multiple times on the same input. Third, we show that this approach can be extended both to compute distance and direction vectors and to handle unknown symbolic terms without any loss of accuracy or efficiency. We have implemented our algorithm in the SUIF system, a general-purpose compiler system developed at Stanford. We ran the algorithm on the PERFECT Club Benchmarks, and our data dependence analyzer gave an exact solution in all cases efficiently.
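The abstract does not name the individual tests, but the classic GCD test is one example of an algorithm that is cheap and exact for a special class of inputs (affine subscripts in a single index); the sketch below is illustrative, not the paper's actual algorithm set.

```python
from math import gcd

def gcd_test(a, b, c):
    """GCD dependence test for the linear equation a*i - b*j = c.

    References A[a*i + k1] and A[b*j + k2] (with c = k2 - k1) can touch
    the same element only if gcd(a, b) divides c.  Returns False when a
    dependence is provably impossible; True means "maybe dependent".
    """
    return c % gcd(a, b) == 0

# A[2*i] vs A[2*j + 1]: 2i - 2j = 1, and gcd(2, 2) = 2 does not divide 1,
# so the two references are provably independent.
print(gcd_test(2, 2, 1))   # False
print(gcd_test(3, 6, 9))   # True (dependence not ruled out)
```

When the GCD test answers True, a more precise (and more expensive) test is still needed; this "cheap exact filter first, expensive backup later" ordering is exactly the strategy the abstract describes.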
The Multiscalar Architecture
, 1993
"... The centerpiece of this thesis is a new processing paradigm for exploiting instruction level parallelism. This paradigm, called the multiscalar paradigm, splits the program into many smaller tasks, and exploits finegrain parallelism by executing multiple, possibly (control and/or data) dependent t ..."
Abstract

Cited by 117 (8 self)
 Add to MetaCart
The centerpiece of this thesis is a new processing paradigm for exploiting instruction-level parallelism. This paradigm, called the multiscalar paradigm, splits the program into many smaller tasks, and exploits fine-grain parallelism by executing multiple, possibly (control and/or data) dependent tasks in parallel using multiple processing elements. Splitting the instruction stream at statically determined boundaries allows the compiler to pass substantial information about the tasks to the hardware. The processing paradigm can be viewed as an extension of the superscalar and multiprocessing paradigms, and shares a number of properties of the sequential processing model and the dataflow processing model. The multiscalar paradigm is easily realizable, and we describe an implementation of the multiscalar paradigm, called the multiscalar processor. The central idea here is to connect multiple sequential processors, in a decoupled and decentralized manner, to achieve overall multiple issue. The multiscalar processor supports speculative execution, allows arbitrary dynamic code motion (facilitated by an efficient hardware memory disambiguation mechanism), exploits communication localities, and does all of these with hardware that is fairly straightforward to build. Other desirable aspects of the ...
Symbolic Bounds Analysis of Pointers, Array Indices, and Accessed Memory Regions
 PLDI 2000
, 2000
"... This paper presents a novel framework for the symbolic bounds analysis of pointers, array indices, and accessed memory regions. Our framework formulates each analysis problem as a system of inequality constraints between symbolic bound polynomials. It then reduces the constraint system to a linear p ..."
Abstract

Cited by 114 (14 self)
 Add to MetaCart
This paper presents a novel framework for the symbolic bounds analysis of pointers, array indices, and accessed memory regions. Our framework formulates each analysis problem as a system of inequality constraints between symbolic bound polynomials. It then reduces the constraint system to a linear program. The solution to the linear program provides symbolic lower and upper bounds for the values of pointer and array index variables and for the regions of memory that each statement and procedure accesses. This approach eliminates fundamental problems associated with applying standard fixed-point approaches to symbolic analysis problems. Experimental results from our implemented compiler show that the analysis can solve several important problems, including static race detection, automatic parallelization, static detection of array bounds violations, elimination of array bounds checks, and reduction of the number of bits used to store computed values.
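To give a minimal flavor of the idea (this is not the paper's algorithm), bounds can be kept as linear forms in a symbolic size n and compared coefficient-wise; the real framework instead solves a linear program over such constraints. The linear-form encoding and the `leq` rule below are illustrative assumptions.

```python
# A bound is a linear form (c0, c1) meaning c0 + c1*n, for symbolic n >= 0.
def leq(lhs, rhs):
    """Sufficient (not complete) check that lhs <= rhs for every n >= 0:
    both the constant and the n-coefficient must not decrease."""
    return rhs[0] - lhs[0] >= 0 and rhs[1] - lhs[1] >= 0

# Loop "for i in range(n): a[i]": subscript range is [0, n-1],
# and the legal index range of a length-n array is also [0, n-1].
print(leq((0, 0), (0, 0)))    # True: lower bound 0 >= 0
print(leq((-1, 1), (-1, 1)))  # True: upper bound n-1 <= n-1
# Loop "for i in range(n): a[i+1]": upper bound n may exceed n-1.
print(leq((0, 1), (-1, 1)))   # False: potential bounds violation detected
```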
Symbolic Analysis for Parallelizing Compilers
, 1994
"... Symbolic Domain The objects in our abstract symbolic domain are canonical symbolic expressions. A canonical symbolic expression is a lexicographically ordered sequence of symbolic terms. Each symbolic term is in turn a pair of an integer coefficient and a sequence of pairs of pointers to program va ..."
Abstract

Cited by 106 (4 self)
 Add to MetaCart
Symbolic Domain. The objects in our abstract symbolic domain are canonical symbolic expressions. A canonical symbolic expression is a lexicographically ordered sequence of symbolic terms. Each symbolic term is in turn a pair of an integer coefficient and a sequence of pairs of pointers to program variables in the program symbol table and their exponents. The latter sequence is also lexicographically ordered. For example, in an environment in which i is bound to (1, ((i, 1))), j is bound to (1, ((j, 1))), and k is bound to (1, ((k, 1))), the abstract value of the symbolic expression 2ij + 3jk is ((2, ((i, 1), (j, 1))), (3, ((j, 1), (k, 1)))), where each variable name stands for a pointer to its symbol-table entry. In our framework, an environment is the abstract analogue of the state concept; an environment is a function from program variables to abstract symbolic values. Each environment e associates a canonical symbolic value e(x) with each variable x ∈ V; we say that x is bound to e(x). An environment might be represented by...
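The representation described above can be sketched directly; variable names stand in for the paper's symbol-table pointers, and the constructor name `canon` is our own.

```python
def canon(terms):
    """Build a canonical symbolic expression from (coefficient, {var: exp})
    terms: a lexicographically ordered tuple of (coefficient, factors) pairs,
    where factors is itself an ordered tuple of (variable, exponent) pairs."""
    merged = {}
    for coeff, factors in terms:
        key = tuple(sorted(factors.items()))   # ordered (variable, exponent) pairs
        merged[key] = merged.get(key, 0) + coeff
    # Lexicographic order over the factor sequences; drop cancelled terms.
    return tuple((merged[k], k) for k in sorted(merged) if merged[k] != 0)

# 2ij + 3jk, as in the example above
expr = canon([(2, {"i": 1, "j": 1}), (3, {"j": 1, "k": 1})])
print(expr)  # ((2, (('i', 1), ('j', 1))), (3, (('j', 1), ('k', 1))))
```

Because like terms are merged and ordered, two symbolic expressions are equal exactly when their canonical forms are equal, which is what makes this a usable abstract domain.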
Automatic Program Parallelization
, 1993
"... This paper presents an overview of automatic program parallelization techniques. It covers dependence analysis techniques, followed by a discussion of program transformations, including straightline code parallelization, do loop transformations, and parallelization of recursive routines. The last s ..."
Abstract

Cited by 105 (8 self)
 Add to MetaCart
This paper presents an overview of automatic program parallelization techniques. It covers dependence analysis techniques, followed by a discussion of program transformations, including straight-line code parallelization, do-loop transformations, and parallelization of recursive routines. The last section of the paper surveys several experimental studies on the effectiveness of parallelizing compilers.
Tiling Multidimensional Iteration Spaces for Multicomputers
, 1992
"... This paper addresses the problem of compiling perfectly nested loops for multicomputers (distributed memory machines). The relatively high communication startup costs in these machines renders frequent communication very expensive. Motivated by this, we present a method of aggregating a number of lo ..."
Abstract

Cited by 104 (21 self)
 Add to MetaCart
This paper addresses the problem of compiling perfectly nested loops for multicomputers (distributed-memory machines). The relatively high communication startup costs in these machines render frequent communication very expensive. Motivated by this, we present a method of aggregating a number of loop iterations into tiles, where the tiles execute atomically: a processor executing the iterations belonging to a tile receives all the data it needs before executing any one of the iterations in the tile, executes all the iterations in the tile, and then sends the data needed by other processors. Since synchronization is not allowed during the execution of a tile, partitioning the iteration space into tiles must not result in deadlock. We first show the equivalence between the problem of finding partitions and the problem of determining the cone for a given set of dependence vectors. We then present an approach to partitioning the iteration space into deadlock-free tiles so that communicati...
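A simple instance of the aggregation idea, sketched for a rectangular 2D iteration space; rectangular tiles only, whereas the paper derives general tile shapes from the cone of the dependence vectors.

```python
def tiles(n, m, tn, tm):
    """Partition the n-by-m iteration space into tn-by-tm tiles.
    Each tile is the atomic unit: all its iterations run between one
    receive phase and one send phase, with no communication inside."""
    for ti in range(0, n, tn):
        for tj in range(0, m, tm):
            yield [(i, j)
                   for i in range(ti, min(ti + tn, n))
                   for j in range(tj, min(tj + tm, m))]

tiling = list(tiles(6, 6, 3, 3))
print(len(tiling))  # 4 tiles of 9 iterations each
covered = sorted(p for t in tiling for p in t)
print(covered == [(i, j) for i in range(6) for j in range(6)])  # True
```

Larger tiles amortize the communication startup cost over more iterations, at the price of coarser scheduling, which is the trade-off motivating the tile-size analysis in the paper.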
Putting Pointer Analysis to Work
, 1998
"... This paper addresses the problem of how to apply pointer analysis to a wide variety of compiler applications. We are not presenting a new pointer analysis. Rather, we focus on putting two existing pointer analyses, pointsto analysis and connection analysis, to work. We demonstrate that the fundamen ..."
Abstract

Cited by 92 (8 self)
 Add to MetaCart
This paper addresses the problem of how to apply pointer analysis to a wide variety of compiler applications. We are not presenting a new pointer analysis. Rather, we focus on putting two existing pointer analyses, points-to analysis and connection analysis, to work. We demonstrate that the fundamental problem is that one must be able to compare the memory locations read/written via pointer indirections, at different program points, and one must also be able to summarize the effect of pointer references over regions in the program. It is straightforward to compute read/write sets for indirections involving stack-directed pointers using points-to information. However, for heap-directed pointers we show that one needs to introduce the notion of anchor handles into the connection analysis and then express read/write sets to the heap with respect to these anchor handles. Based on the read/write sets we show how to extend traditional optimizations like common subexpression elimination, loop...
Experience in the Automatic Parallelization of Four Perfect-Benchmark Programs
 Lecture Notes in Computer Science 589. Proceedings of the Fourth Workshop on Languages and Compilers for Parallel Computing
, 1991
"... . This paper discusses the techniques used to handparallelize, for the Alliant FX/80, four Fortran programs from the PerfectBenchmark suite. The paper also includes the execution times of the programs before and after the transformations. The four programs considered here were not effectively par ..."
Abstract

Cited by 92 (25 self)
 Add to MetaCart
This paper discusses the techniques used to hand-parallelize, for the Alliant FX/80, four Fortran programs from the Perfect-Benchmark suite. The paper also includes the execution times of the programs before and after the transformations. The four programs considered here were not effectively parallelized by the automatic translators available to the authors. However, most of the techniques used for hand parallelization, and perhaps all of them, have wide applicability and can be incorporated into existing translators. 1. Introduction It is by now widely accepted that in many real-life applications, supercomputers have been unable to deliver a reasonable fraction of their peak performance. An illustration of this is provided by the Perfect Benchmark programs [3], many of which effectively use less than 1% of the computational resources available in the most powerful supercomputers. While it is apparent that the reason for this dismal behavior is sometimes the result of the machine ...
Dynamic Memory Disambiguation Using the Memory Conflict Buffer
 In Proceedings of the 6th Symposium on Architectural Support for Programming Languages and Operating Systems
, 1994
"... To exploit instruction level parallelism, compilers for VLIW and superscalar processors often employ static code scheduling. However, the available code reordering may be severely restricted due to ambiguous dependences between memory instructions. This paper introduces a simple hardware mechanism, ..."
Abstract

Cited by 89 (5 self)
 Add to MetaCart
To exploit instruction-level parallelism, compilers for VLIW and superscalar processors often employ static code scheduling. However, the available code reordering may be severely restricted due to ambiguous dependences between memory instructions. This paper introduces a simple hardware mechanism, referred to as the memory conflict buffer, which facilitates static code scheduling in the presence of memory store/load dependences. Correct program execution is ensured by the memory conflict buffer and repair code provided by the compiler. With this addition, significant speedup over an aggressive code-scheduling model can be achieved for both non-numerical and numerical programs.
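A toy software model of the mechanism, assuming the simplest possible design: one table of speculative-load addresses, with a conflict flagged on a matching store address. The real hardware structure and its interface to the compiler's repair code are more involved; the class and method names here are our own.

```python
class ConflictBuffer:
    """Minimal model of a memory conflict buffer: speculative loads record
    their addresses; a later store to a recorded address signals a conflict
    so that compiler-generated repair code can re-execute the load."""

    def __init__(self):
        self.spec_loads = {}            # address -> destination register

    def speculative_load(self, mem, addr, reg):
        self.spec_loads[addr] = reg     # remember the hoisted load
        return mem[addr]

    def store(self, mem, addr, value):
        mem[addr] = value
        return addr in self.spec_loads  # True => conflict, run repair code

mem = {0: 10, 4: 20}
cb = ConflictBuffer()
v = cb.speculative_load(mem, 4, "r1")   # load hoisted above an ambiguous store
conflict = cb.store(mem, 4, 99)
print(conflict)  # True: the store hit the loaded address; r1 must be reloaded
```

When no conflict is reported, the scheduler's aggressive reordering stands at zero cost; the repair path only runs in the (presumably rare) aliasing case.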
The Range Test: A Dependence Test for Symbolic, Nonlinear Expressions
 Proceedings of Supercomputing '94, Washington D.C
, 1994
"... Most current data dependence tests cannot handle loop bounds or array subscripts that are symbolic, nonlinear expressions (e.g. A(n i+j), where 0 j n). In this paper, we describe a dependencetest,called the range test, that can handle such expressions. Briefly, the range test proves independence ..."
Abstract

Cited by 82 (16 self)
 Add to MetaCart
Most current data dependence tests cannot handle loop bounds or array subscripts that are symbolic, nonlinear expressions (e.g. A(n*i + j), where 0 ≤ j ≤ n). In this paper, we describe a dependence test, called the range test, that can handle such expressions. Briefly, the range test proves independence by determining whether certain symbolic inequalities hold for a permutation of the loop nest. Powerful symbolic analyses and constraint propagation techniques were developed to prove such inequalities. The range test has been implemented in Polaris, a parallelizing compiler being developed at the University of Illinois.
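A numeric illustration of the inequality underlying the range test, for a subscript of the form A(n*i + j) assuming 0 <= j <= n-1: the i-loop carries no dependence if the largest subscript touched in iteration i stays below the smallest subscript touched in iteration i+1. The real test proves this symbolically via constraint propagation; here we merely spot-check sample values of n, and the function name is our own.

```python
def ranges_disjoint(n_values, trip=3):
    """Check, for sample values of n, that iteration i's subscript range
    [n*i + 0, n*i + (n-1)] never reaches iteration i+1's range."""
    for n in n_values:
        for i in range(trip):
            hi_i = n * i + (n - 1)     # max subscript in iteration i
            lo_next = n * (i + 1)      # min subscript in iteration i+1
            if hi_i >= lo_next:
                return False
    return True

print(ranges_disjoint([1, 2, 5, 10]))  # True: outer loop is parallelizable
```

Because n*i + (n-1) < n*(i+1) holds for every n >= 1, the symbolic proof succeeds even though the subscript n*i + j is nonlinear in the loop variables and n, which is precisely the case linear tests give up on.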