Results 1 - 10
of
265
The Omega Test: a fast and practical integer programming algorithm for dependence analysis
- Communications of the ACM
, 1992
"... The Omega testi s ani nteger programmi ng algori thm that can determi ne whether a dependence exi sts between two array references, and i so, under what condi7: ns. Conventi nalwi[A m holds thati nteger programmiB techni:36 are far too expensi e to be used for dependence analysi6 except as a method ..."
Abstract
-
Cited by 407 (15 self)
- Add to MetaCart
The Omega testi s ani nteger programmi ng algori thm that can determi ne whether a dependence exi sts between two array references, and i so, under what condi7: ns. Conventi nalwi[A m holds thati nteger programmiB techni:36 are far too expensi e to be used for dependence analysi6 except as a method of last resort for si:8 ti ns that cannot be deci:A by si[976 methods. We present evi[77B that suggests thiwi sdomi s wrong, and that the Omega testi s competi ti ve wi th approxi mate algori thms usedi n practi ce and sui table for usei n producti on compi lers. Experi ments suggest that, for almost all programs, the average ti me requi red by the Omega test to determi ne the di recti on vectors for an array pai ri s less than 500 secs on a 12 MIPS workstati on. The Omega testi based on an extensi n of Four i0-Motzki var i ble eli937 ti n (aliB: r programmiA method) toi nteger programmi ng, and has worst-case exponenti al ti me complexi ty. However, we show that for manysiB7 ti ns i whi h ...
A First Step towards Automated Detection of Buffer Overrun Vulnerabilities
- In Network and Distributed System Security Symposium
, 2000
"... We describe a new technique for finding potential buffer overrun vulnerabilities in security-critical C code. The key to success is to use static analysis: we formulate detection of buffer overruns as an integer range analysis problem. One major advantage of static analysis is that security bugs can ..."
Abstract
-
Cited by 314 (9 self)
- Add to MetaCart
We describe a new technique for finding potential buffer overrun vulnerabilities in security-critical C code. The key to success is to use static analysis: we formulate detection of buffer overruns as an integer range analysis problem. One major advantage of static analysis is that security bugs can be eliminated before code is deployed. We have implemented our design and used our prototype to find new remotely-exploitable vulnerabilities in a large, widely deployed software package. An earlier hand audit missed these bugs. 1.
The LRPD Test: Speculative Run-Time Parallelization of Loops with Privatization and Reduction Parallelization
, 1995
"... Current parallelizing compilers cannot identify a significant fraction of parallelizable loops because they have complex or statically insufficiently defined access patterns. As parallelizable loops arise frequently in practice, we advocate a novel framework for their identification: speculatively e ..."
Abstract
-
Cited by 185 (36 self)
- Add to MetaCart
Current parallelizing compilers cannot identify a significant fraction of parallelizable loops because they have complex or statically insufficiently defined access patterns. As parallelizable loops arise frequently in practice, we advocate a novel framework for their identification: speculatively execute the loop as a doall, and apply a fully parallel data dependence test to determine if it had any cross--iteration dependences; if the test fails, then the loop is re--executed serially. Since, from our experience, a significant amount of the available parallelism in Fortran programs can be exploited by loops transformed through privatization and reduction parallelization, our methods can speculatively apply these transformations and then check their validity at run--time. Another important contribution of this paper is a novel method for reduction recognition which goes beyond syntactic pattern matching: it detects at run--time if the values stored in an array participate in a reduction operation, even if they are transferred through private variables and/or are affected by statically unpredictable control flow. We present experimental results on loops from the PERFECT Benchmarks which substantiate our claim that these techniques can yield significant speedups which are often superior to those obtainable by inspector/executor methods. The methods presented in this paper differ from and extend our previous work on several important points. First, instead of distributing the loop into inspector and executor loops (the approach taken in all previous work on run-- time parallelization) we advocate the use of run--time tests to validate the execution of a loop that is speculatively executed in parallel. Second, in addition to array privatization, the new techniques are capa...
Dynamic Speculation and Synchronization of Data Dependencies
, 1997
"... Data dependence speculation is used in instruction-level parallel (ILP) processors to allow early execution of an instruction before a logically preceding instruction on which it may be data dependent. If the instruction is independent, data dependence speculation succeeds; if not, it fails, and the ..."
Abstract
-
Cited by 165 (21 self)
- Add to MetaCart
Data dependence speculation is used in instruction-level parallel (ILP) processors to allow early execution of an instruction before a logically preceding instruction on which it may be data dependent. If the instruction is independent, data dependence speculation succeeds; if not, it fails, and the two instructions must be synchronized. The modern dynamically scheduled processors that use data dependence speculation do so blindly (i.e., every load instruction with unresolved dependences is speculated). In this paper, we demonstrate that as dynamic instruction windows get larger, significant performance benefits can result when intelligent decisions about data dependence speculation are made. We propose dynamic data dependence speculation techniques: (i) to predict if the execution of an instruction is likely to result in a data dependence mis-speculation, and (ii) to provide the synchronization needed to avoid a mis-speculation. Experimental results evaluating the effectiveness of the...
Practical Dependence Testing
, 1991
"... Precise and efficient dependence tests are essential to the effectiveness of a parallelizing compiler. This paper proposes a dependence testing scheme based on classifying pairs of subscripted variable references. Exact yet fast dependence tests are presented for certain classes of array references, ..."
Abstract
-
Cited by 131 (16 self)
- Add to MetaCart
Precise and efficient dependence tests are essential to the effectiveness of a parallelizing compiler. This paper proposes a dependence testing scheme based on classifying pairs of subscripted variable references. Exact yet fast dependence tests are presented for certain classes of array references, as well as empirical results showing that these references dominate scientific Fortran codes. These dependence tests are being implemented at Rice University in both PFC, a parallelizing compiler, and ParaScope, a parallel programming environment.
A Schema for Interprocedural Modification Side-Effect Analysis With Pointer Aliasing
- In Proceedings of the SIGPLAN '93 Conference on Programming Language Design and Implementation
, 2001
"... The first interprocedural modification side-effects analysis for C (MOD_C) that obtains better than worst-case precision on programs with general-purpose pointer usage is presented with empirical results. The analysis consists of an algorithm schema corresponding to a family of MODC algorithms with ..."
Abstract
-
Cited by 126 (13 self)
- Add to MetaCart
The first interprocedural modification side-effects analysis for C (MOD_C) that obtains better than worst-case precision on programs with general-purpose pointer usage is presented with empirical results. The analysis consists of an algorithm schema corresponding to a family of MODC algorithms with two independent phases: one for determining pointer-induced aliases and a subsequent one for propagating interprocedural side effects. These MOD_C algorithms are parameterized by the aliasing method used. The empirical results compare the performance of two dissimilar MOD_C algorithms: MOD_C(FSAlias) uses a flow-sensitive, calling-context-sensitive interprocedural alias analysis [LR92]; MOD_C(FIAlias) uses a flow-insensitive, calling-context-insensitive alias analysis which is much faster, but less accurate. These two algorithms were profiled on 45 programs ranging in size from 250 to 30,000 lines of C code, and the results demonstrate dramatically the possible cost-precision tradeoffs. This first comparative implementation of MODC analyses offers insight into the differences between flow-/context-sensitive and flow-/context-insensitive analyses. The analysis cost versus precision tradeoffs in side-effect information obtained is reported. The results show surprisingly that the precision of flow-sensitive side-effect analysis is not always prohibitive in cost, and that the precision of flow-insensitive analysis is substantially better than worst-case estimates and seems sufficient for certain applications. On average MODC (FSAlias) for procedures and calls is in the range of 20% more precise than MODC (F IAlias); however, the performance was found to be at least an order of magnitude slower than MODC (F IAlias).
Pointer Analysis for Multithreaded Programs
- ACM SIGPLAN 99
, 1999
"... This paper presents a novel interprocedural, flow-sensitive, and context-sensitive pointer analysis algorithm for multithreaded programs that may concurrently update shared pointers. For each pointer and each program point, the algorithm computes a conservative approximation of the memory locations ..."
Abstract
-
Cited by 125 (13 self)
- Add to MetaCart
This paper presents a novel interprocedural, flow-sensitive, and context-sensitive pointer analysis algorithm for multithreaded programs that may concurrently update shared pointers. For each pointer and each program point, the algorithm computes a conservative approximation of the memory locations to which that pointer may point. The algorithm correctly handles a full range of constructs in multithreaded programs, including recursive functions, function pointers, structures, arrays, nested structures and arrays, pointer arithmetic, casts between pointer variables of different types, heap and stack allocated memory, shared global variables, and thread-private global variables. We have implemented the algorithm in the SUIF compiler system and used the implementation to analyze a sizable set of multithreaded programs written in the Cilk multithreaded programming language. Our experimental results show that the analysis has good precision and converges quickly for our set of Cilk programs.
ParaScope: a parallel programming environment
- PROCEEDINGS OF THE IEEE
, 1993
"... The ParaScope parallel programming environment developed to support scientific programming of shared-memory multiprocessors, includes a collection of tools that use global program analysis to help users develop and debug parallel programs. This paper focuses on ParaScope’s compilation system, its pa ..."
Abstract
-
Cited by 120 (33 self)
- Add to MetaCart
The ParaScope parallel programming environment developed to support scientific programming of shared-memory multiprocessors, includes a collection of tools that use global program analysis to help users develop and debug parallel programs. This paper focuses on ParaScope’s compilation system, its parallel program editor, and its parallel debugging system. The compilation system extends the traditional single-procedure compiler by providing a mechanism for managing the compilation of complete programs. Thus, ParaScope can support both traditional single-procedure optimization and optimization across procedure boundaries. The ParaScope editor brings both compiler analysis and user expertise to bear on program parallelization. It assists the knowledgeable user by displaying and managing analysis and by proiiding a variety of interactive program tran.formation.s that are effective in exposing parallelism. The debugging svstem detects and reports timing-dependent errors, called data races, in execution of parallel programs. The system combines static analysis. program instrumentation. and run-time reporting to provide a mechanical system for isolating errors in parallel program executions. Finally, we describe a new project to extend ParaScope to support programming in Fortran D, a machine-independent parallel pro-gramming language intended for use with both distributed-memory and shared-memory parallel computers..
Automatic Array Privatization
- IN UTPAL BANERJEEDAVID GELERNTERALEX NICOLAUDAVID PADUA, EDITOR, PROC. SIXTH WORKSHOP ON LANGUAGES AND COMPILERS FOR PARALLEL COMPUTING
, 1993
"... Array privatization is one of the most effective transformations for the exploitation of parallelism. In this paper, we present a technique for automatic array privatization. Our algorithm uses data flow analysis of array references to identify privatizable arrays intraprocedurally as well as in ..."
Abstract
-
Cited by 119 (25 self)
- Add to MetaCart
Array privatization is one of the most effective transformations for the exploitation of parallelism. In this paper, we present a technique for automatic array privatization. Our algorithm uses data flow analysis of array references to identify privatizable arrays intraprocedurally as well as interprocedurally. It employs static and dynamic resolution to determine the last value of a lived private array.We compare the result of automatic array privatization with that of manual array privatization and identify directions for future improvement. To enhance the effectiveness of our algorithm, wedevelop a goal directly technique to analysis symbolic variables in the present of conditional statements, loops and index arrays.
The Multiscalar Architecture
, 1993
"... The centerpiece of this thesis is a new processing paradigm for exploiting instruction level parallelism. This paradigm, called the multiscalar paradigm, splits the program into many smaller tasks, and exploits fine-grain parallelism by executing multiple, possibly (control and/or data) depen-dent t ..."
Abstract
-
Cited by 113 (8 self)
- Add to MetaCart
The centerpiece of this thesis is a new processing paradigm for exploiting instruction level parallelism. This paradigm, called the multiscalar paradigm, splits the program into many smaller tasks, and exploits fine-grain parallelism by executing multiple, possibly (control and/or data) depen-dent tasks in parallel using multiple processing elements. Splitting the instruction stream at statically determined boundaries allows the compiler to pass substantial information about the tasks to the hardware. The processing paradigm can be viewed as extensions of the superscalar and multiprocess-ing paradigms, and shares a number of properties of the sequential processing model and the dataflow processing model. The multiscalar paradigm is easily realizable, and we describe an implementation of the multis-calar paradigm, called the multiscalar processor. The central idea here is to connect multiple sequen-tial processors, in a decoupled and decentralized manner, to achieve overall multiple issue. The mul-tiscalar processor supports speculative execution, allows arbitrary dynamic code motion (facilitated by an efficient hardware memory disambiguation mechanism), exploits communication localities, and does all of these with hardware that is fairly straightforward to build. Other desirable aspects of the

