Results 1 - 10
of
29
Improving Data Locality with Loop Transformations
- ACM TRANSACTIONS ON PROGRAMMING LANGUAGES AND SYSTEMS
, 1996
"... ..."
Optimizing for Parallelism and Data Locality
- In Proceedings of the 1992 ACM International Conference on Supercomputing
, 1992
"... Previous research has used program transformation to introduce parallelism and to exploit data locality. Unfortunately, these two objectives have usually been considered independently. This work explores the tradeoffs between effectively utilizing parallelism and memory hierarchy on shared-memory mu ..."
Abstract
-
Cited by 92 (13 self)
- Add to MetaCart
Previous research has used program transformation to introduce parallelism and to exploit data locality. Unfortunately, these two objectives have usually been considered independently. This work explores the tradeoffs between effectively utilizing parallelism and memory hierarchy on shared-memory multiprocessors. We present a simple, but surprisingly accurate, memory model to determine cache line reuse from both multiple accesses to the same memory location and from consecutive memory access. The model is used in memory optimizing and loop parallelization algorithms that effectively exploit data locality and parallelism in concert. We demonstrate the efficacy of this approach with very encouraging experimental results. 1 Introduction Transformations to exploit parallelism and to improve data locality are two of the most valuable compiler techniques in use today. Independently, each of these optimizations has been shown to result in dramatic improvements. This paper seeks to combine t...
The Range Test: A Dependence Test for Symbolic, Non-linear
- Proceedings of Supercomputing '94, Washington D.C
, 1994
"... Most current data dependence tests cannot handle loop bounds or array subscripts that are symbolic, nonlinear expressions (e.g. A(n i+j), where 0 j n). In this paper, we describe a dependencetest,called the range test, that can handle such expressions. Briefly, the range test proves independence ..."
Abstract
-
Cited by 73 (16 self)
- Add to MetaCart
Most current data dependence tests cannot handle loop bounds or array subscripts that are symbolic, nonlinear expressions (e.g. A(n i+j), where 0 j n). In this paper, we describe a dependencetest,called the range test, that can handle such expressions. Briefly, the range test proves independence by determining whether certain symbolic inequalities hold for a permutation of the loop nest. Powerful symbolic analyses and constraint propagation techniques were developedtoprove such inequalities.Therange test has been implemented in Polaris, a parallelizing compiler being developed at the University of Illinois.
Managing Interprocedural Optimization
, 1991
"... This dissertation addresses a number of important issues related to interprocedural optimization. Interprocedural optimization is an integral component in a compilation system for high-performance computing. The importance of interprocedural optimization stems from two sources: it increases the cont ..."
Abstract
-
Cited by 60 (9 self)
- Add to MetaCart
This dissertation addresses a number of important issues related to interprocedural optimization. Interprocedural optimization is an integral component in a compilation system for high-performance computing. The importance of interprocedural optimization stems from two sources: it increases the context available to the optimizing compiler, and it enables programmers to use procedure calls without the concern of hurting execution time. While important, interprocedural optimization can introduce some significant compile-time costs. When interprocedural information is used to optimize a procedure, the procedure is then dependent on those interprocedural facts. Thus, even if the procedure is not edited, it may require recompilation due to changes in the interprocedural facts. In addition to these effects on recompilation, interprocedural information can also be expensive to compute. Furthermore, interprocedural optimizations can increase program size which can in turn increase compile tim...
Automatic and Interactive Parallelization
, 1994
"... The goal of this dissertation is to give programmers the ability to achieve high performance by focusing on developing parallel algorithms, rather than on architecturespecific details. The advantages of this approach also include program portability and legibility. To achieve high performance, we pr ..."
Abstract
-
Cited by 38 (8 self)
- Add to MetaCart
The goal of this dissertation is to give programmers the ability to achieve high performance by focusing on developing parallel algorithms, rather than on architecturespecific details. The advantages of this approach also include program portability and legibility. To achieve high performance, we provide automatic compilation techniques that tailor parallel algorithms to shared-memory multiprocessors with local caches and a common bus. In particular, the compiler maps complete applications onto the specifics of a machine, exploiting both parallelism and memory. To optimize complete applications, we develop novel, general algorithms to transform loops that contain arbitrary conditional control flow. In addition, we provide new interprocedural transformations which enable optimization across procedure boundaries. These techniques provide the basis for a robust automatic parallelizing algorithm that is applicable to complete programs. The algorithm for automatic parallel code generation t...
Optimal Fine and Medium Grain Parallelism Detection in Polyhedral Reduced Dependence Graphs
, 1996
"... This papcr presents an optimal algorithm lor detecting line or medium grain parallelism in nested loops whose dependences are described by an approximation of distance vectors by polyhedra. In particular, this algorithm is optimal for the classical approximation by direction sectors. This result gcn ..."
Abstract
-
Cited by 24 (4 self)
- Add to MetaCart
This papcr presents an optimal algorithm lor detecting line or medium grain parallelism in nested loops whose dependences are described by an approximation of distance vectors by polyhedra. In particular, this algorithm is optimal for the classical approximation by direction sectors. This result gcncruli/es. to the case of several statements. Wolf and Lam's algorithm which is optimal for a single statement. Our algorithm relies on a dependence uniformi/ation process and on paralleli/ation techniques related to system of uniform recurrence equations. It can also be viewed as a combination of both Allen and Kennedy's algorithm and Wolf and Lam's algorithm.
Compiling Nested Data-Parallel Programs for Shared-Memory Multiprocessors
- ACM TRANSACTIONS ON PROGRAMMING LANGUAGES AND SYSTEMS
, 1993
"... While data parallelism is well-suited from algorithmic, architectural, and linguistic considerations to serve as a basis for portable parallel programming, its characteristic fine-grained parallelism makes the efficient implementation of data-parallel languages on MIMD machines a challenging task. ..."
Abstract
-
Cited by 17 (5 self)
- Add to MetaCart
While data parallelism is well-suited from algorithmic, architectural, and linguistic considerations to serve as a basis for portable parallel programming, its characteristic fine-grained parallelism makes the efficient implementation of data-parallel languages on MIMD machines a challenging task. The design, implementation, and evaluation of an optimizing compiler are presented for an applicative nested data-parallel language called VCODE targeted at the Encore Multimax, a shared-memory multiprocessor. The source language supports nested aggregate data types; aggregate operations including elementwise forms, scans, reductions, and permutations; and conditionals and recursion for control flow. A small set of graph-theoretic compile-time optimizations reduce the overheads on MIMDmachines in several ways: by increasing the grain size of the output program, by reducing synchronization and storage requirements, and by improving locality of reference. The two key ideas behind these...
Non-Linear and Symbolic Data Dependence Testing
- IEEE Transactions on Parallel and Distributed Systems
, 1998
"... One of the most crucial qualities of an optimizing compiler is its ability to detect when different data references access the same storage location. Such references are said to be data-dependent and they impose constraints on the amount of program modifications the compiler can apply for improving ..."
Abstract
-
Cited by 16 (3 self)
- Add to MetaCart
One of the most crucial qualities of an optimizing compiler is its ability to detect when different data references access the same storage location. Such references are said to be data-dependent and they impose constraints on the amount of program modifications the compiler can apply for improving the program's performance. For parallelizing compilers the most important program constructs to investigate are loops and the array references they contain. In previous work we have found a serious limitation of current data dependence tests to be that they cannot handle loop bounds or array subscripts that are symbolic, nonlinear expressions. In this paper, we describe a dependence test, called the Range Test, that can handle such expressions. Briefly, the Range Test proves independence by determining whether certain symbolic inequalities hold for a permutation of the loop nest.

