Code Generation for Multiple Mappings
 IN FRONTIERS '95: THE 5TH SYMPOSIUM ON THE FRONTIERS OF MASSIVELY PARALLEL COMPUTATION
, 1994
Cited by 75 (2 self)
There has been a great amount of recent work toward unifying iteration reordering transformations. Many of these approaches represent transformations as affine mappings from the original iteration space to a new iteration space. These approaches show a great deal of promise, but they all rely on the ability to generate code that iterates over the points in these new iteration spaces in the appropriate order. This problem has been fairly well-studied in the case where all statements use the same mapping. We have developed an algorithm for the less well-studied case where each statement uses a potentially different mapping. Unlike many other approaches, our algorithm can also generate code from mappings corresponding to loop blocking. We address the important tradeoff between reducing control overhead and duplicating code.
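To make the multiple-mapping setting concrete, the following Python sketch gives each of two statements its own affine mapping and then orders the statement instances lexicographically by their mapped coordinates. It is a brute-force illustration only, not the paper's code generation algorithm; the statement names, the mappings, and the bound `n` are hypothetical.

```python
# Illustrative sketch only: brute-force enumeration, not the paper's algorithm.
# Statement names, mappings and the bound n are hypothetical.

def mapped_schedule(n):
    """Return statement instances ordered by their mapped coordinates."""
    # Each statement has its own affine mapping from index i to a new 2-D space.
    mappings = {
        "S1": lambda i: (i, 0),       # S1 keeps its position
        "S2": lambda i: (i + 1, 1),   # S2 is shifted by one iteration
    }
    instances = []
    for name, mapping in mappings.items():
        for i in range(1, n + 1):
            instances.append((mapping(i), name, i))
    # Lexicographic order on the mapped coordinates is the new execution order.
    instances.sort(key=lambda entry: entry[0])
    return [(name, i) for _, name, i in instances]

schedule = mapped_schedule(3)
```

A real code generator would emit loops and guards that visit these points without materializing and sorting them, which is where the tradeoff between control overhead and code duplication mentioned in the abstract arises.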
Synthesizing transformations for locality enhancement of imperfectlynested loop nests
 In Proceedings of the 2000 ACM International Conference on Supercomputing
, 2000
Cited by 56 (3 self)
We present an approach for synthesizing transformations to enhance locality in imperfectly-nested loops. The key idea is to embed the iteration space of every statement in a loop nest into a special iteration space called the product space. The product space can be viewed as a perfectly-nested loop nest, so embedding generalizes techniques like code sinking and loop fusion that are used in ad hoc ways in current compilers to produce perfectly-nested loops from imperfectly-nested ones. In contrast to these ad hoc techniques, however, our embeddings are chosen carefully to enhance locality. The product space is then transformed further to enhance locality, after which fully permutable loops are tiled, and code is generated. We evaluate the effectiveness of this approach for dense numerical linear algebra benchmarks, relaxation codes, and the tomcatv code from the SPEC benchmarks.
1. BACKGROUND AND PREVIOUS WORK
Sophisticated algorithms based on polyhedral algebra have been developed for determining good sequences of linear loop transformations (permutation, skewing, reversal and scaling) for enhancing locality in perfectly-nested loops¹. Highlights of this technology are the following. The iterations of the loop nest are modeled as points in an integer lattice, and linear loop transformations are modeled as nonsingular matrices mapping one lattice to another. A sequence of loop transformations is modeled by the product of matrices representing the individual transformations; since the set of nonsingular matrices is closed under matrix product, this means that a sequence of linear loop transformations can be represented by a nonsingular matrix. The problem of finding an optimal sequence of linear loop transformations is thus reduced to the problem of finding an integer matrix that satisfies some desired property, permitting the full machinery of matrix methods and lattice theory to ...
* This work was supported by NSF grants CCR9720211, EIA9726388, ACI9870687, EIA9972853.
¹ A perfectly-nested loop is a set of loops in which all assignment statements are contained in the innermost loop.
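The matrix view of linear loop transformations described in this entry can be illustrated with a small Python sketch. It composes a skew and an interchange on a two-deep loop nest and checks that applying the product matrix agrees with applying the two transformations in turn. The helper names and the chosen matrices are hypothetical and serve only as an example.

```python
# Illustrative sketch: 2-D loop transformations as integer matrices.
# Helper names and the example matrices are hypothetical.

def matmul(a, b):
    """Multiply two 2x2 integer matrices given as nested lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]

def apply(t, point):
    """Map an iteration point (i, j) through the transformation matrix t."""
    i, j = point
    return (t[0][0] * i + t[0][1] * j, t[1][0] * i + t[1][1] * j)

def det(t):
    """Determinant of a 2x2 matrix; nonzero means the matrix is nonsingular."""
    return t[0][0] * t[1][1] - t[0][1] * t[1][0]

SKEW = [[1, 0], [1, 1]]          # (i, j) -> (i, i + j)
INTERCHANGE = [[0, 1], [1, 0]]   # (i, j) -> (j, i)

# Skew first, then interchange: the sequence is the matrix product.
COMPOSITE = matmul(INTERCHANGE, SKEW)

point = (2, 3)
step_by_step = apply(INTERCHANGE, apply(SKEW, point))
in_one_step = apply(COMPOSITE, point)
```

Because both factors are nonsingular, so is their product, which is the closure property the text relies on.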
A Unifying Framework for Iteration Reordering Transformations
 In Proceedings of IEEE First International Conference on Algorithms and Architectures for Parallel Processing
, 1995
Cited by 23 (0 self)
We present a framework for unifying iteration reordering transformations such as loop interchange, loop distribution, skewing, tiling, index set splitting and statement reordering. The framework is based on the idea that a transformation can be represented as a mapping from the original iteration space to a new iteration space. The framework is designed to provide a uniform way to represent and reason about transformations. We also provide algorithms to test the legality of mappings, and to generate optimized code for mappings.
1 Introduction
Optimizing compilers reorder iterations of statements to improve instruction scheduling, register use, and cache utilization, and to expose parallelism. Many different reordering transformations have been developed and studied, such as loop interchange, loop distribution, skewing, tiling, index set splitting and statement reordering [AK87, Wol89b, Wol90, CK92]. Each of these transformations has its own special legality checks and transformation ...
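As a rough illustration of what testing the legality of a mapping means, the Python sketch below uses a common formulation: a mapping is treated as legal if every dependence source still executes before its sink after the mapping is applied. The check is done by brute force over a small iteration space, whereas the framework itself reasons symbolically. The loop nest, its bounds, and the three mappings are hypothetical.

```python
# Illustrative brute-force check, not the framework's symbolic legality test.
# The loop nest, bounds and mappings below are hypothetical.

def dependences(n):
    """Pairs (source, sink) for a[i][j] = a[i-1][j+1] over 0 <= i, j < n."""
    pairs = []
    for i in range(1, n):
        for j in range(0, n - 1):
            # The value written at iteration (i-1, j+1) is read at iteration (i, j).
            pairs.append(((i - 1, j + 1), (i, j)))
    return pairs

def is_legal(mapping, n):
    """True if every dependence source is mapped before its sink."""
    # Python compares tuples lexicographically, matching loop execution order.
    return all(mapping(src) < mapping(dst) for src, dst in dependences(n))

MAPPINGS = {
    "identity": lambda p: (p[0], p[1]),
    "interchange": lambda p: (p[1], p[0]),
    "skew_then_interchange": lambda p: (p[0] + p[1], p[0]),
}

results = {name: is_legal(mapping, 4) for name, mapping in MAPPINGS.items()}
```

Here plain interchange reverses the dependence with distance (1, -1) and is rejected, while skewing before interchanging keeps every source ahead of its sink.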
Transitive Closure of Infinite Graphs and its Applications
, 1995
Cited by 21 (4 self)
Integer tuple relations can concisely summarize many types of information gathered from analysis of scientific codes. For example, they can be used to precisely describe which iterations of a statement are data dependent on which other iterations. It is generally not possible to represent these tuple relations by enumerating the related pairs of tuples. For example, it is impossible to enumerate the related pairs of tuples in the relation $\{[i] \to [i+2] \mid 1 \le i \le n-2\}$. Even when it is possible to enumerate the related pairs of tuples, such as for the relation $\{[i, j] \to [i', j'] \mid 1 \le i, j, i', j' \le 100\}$, it is often not practical to do so. We instead use a closed form description by specifying a predicate consisting of affine constraints on the related pairs of tuples. As we just saw, these affine constraints can be parameterized, so what we are really describing are infinite families of relations (or graphs). Many of our applications of tuple relations rely heavily ...
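The contrast between a closed-form predicate and explicit enumeration can be sketched in Python for the first relation in this abstract. The predicate works for any value of the parameter n, while enumeration and the naive fixed-point closure need n to be fixed. The function names are hypothetical, and the closed form of the closure was worked out by hand for this one relation; it does not represent the paper's general algorithm.

```python
# Illustrative sketch; function names are hypothetical, and n is fixed only
# so that pairs can be enumerated and compared against the closed forms.

def related(i, i2, n):
    """Predicate form of the relation {[i] -> [i+2] | 1 <= i <= n-2}."""
    return i2 == i + 2 and 1 <= i <= n - 2

def enumerate_relation(n):
    """Explicit pairs, available only once n is a concrete number."""
    return {(i, i2) for i in range(1, n + 1) for i2 in range(1, n + 1)
            if related(i, i2, n)}

def transitive_closure(pairs):
    """Naive fixed-point computation of the closure of a finite relation."""
    closure = set(pairs)
    while True:
        new_pairs = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new_pairs <= closure:
            return closure
        closure |= new_pairs

def closure_closed_form(i, i2, n):
    """Hand-derived closure: any positive even step that stays within 1..n."""
    return 1 <= i and i2 <= n and i2 > i and (i2 - i) % 2 == 0

n = 7
pairs = enumerate_relation(n)
closure = transitive_closure(pairs)
```

For n = 7 the enumerated closure agrees with the closed-form predicate, but only the predicate remains usable when n is left symbolic.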
Minimal data dependence abstractions for loop transformations
 International Journal of Parallel Programming
, 1994
Cited by 19 (0 self)
Many abstractions of program dependences have already been proposed, such as the Dependence Distance, the Dependence Direction Vector, the Dependence Level or the Dependence Cone. These different abstractions have different precision. The minimal abstraction associated with a transformation is the abstraction that contains the minimal amount of information necessary to decide when such a transformation is legal. The minimal abstractions for loop reordering and unimodular transformations are presented. As an example, the dependence cone, which approximates dependences by a convex cone of the dependence distance vectors, is the minimal abstraction for unimodular transformations. It also contains enough information for legally applying all loop reordering transformations and for finding the same set of valid mono- and multi-dimensional linear schedulings as the dependence distance set.
Loop optimization for aggregate array computations
Cited by 15 (7 self)
An aggregate array computation is a loop that computes accumulated quantities over array elements. Such computations are common in programs that use arrays, and the array elements involved in such computations often overlap, especially across iterations of loops, resulting in significant redundancy in the overall computation. This paper presents a method and algorithms that eliminate such overlapping aggregate array redundancies and shows both analytical and experimental performance improvements. The method is based on incrementalization, i.e., updating the values of aggregate array computations from iteration to iteration rather than computing them from scratch in each iteration. This involves maintaining additional information not maintained in the original program. We reduce various analysis problems to solving inequality constraints on loop variables and array subscripts, and we apply results from work on array data dependence analysis. Incrementalizing aggregate array computations produces drastic program speedup compared to previous optimizations. Previous methods for loop optimizations of arrays do not perform incrementalization, and previous techniques for loop incrementalization do not handle arrays.
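A sliding-window sum is perhaps the simplest instance of the incrementalization idea described here: adjacent windows overlap, so each sum can be updated from the previous one instead of being recomputed. The Python sketch below is a hand-written illustration under that assumption, not output of the paper's method; the function names and the sample data are hypothetical.

```python
# Illustrative sketch of incrementalization on a sliding-window sum.
# Function names and sample data are hypothetical, not taken from the paper.

def window_sums_naive(a, w):
    """Recompute each window sum from scratch: about len(a) * w additions."""
    if w <= 0 or len(a) < w:
        return []
    return [sum(a[i:i + w]) for i in range(len(a) - w + 1)]

def window_sums_incremental(a, w):
    """Update each window sum from the previous one: about len(a) additions."""
    if w <= 0 or len(a) < w:
        return []
    current = sum(a[:w])  # only the first window is computed from scratch
    sums = [current]
    for i in range(1, len(a) - w + 1):
        # Adjacent windows share w - 1 elements: add the new one, drop the old one.
        current += a[i + w - 1] - a[i - 1]
        sums.append(current)
    return sums

data = [3, 1, 4, 1, 5, 9, 2, 6]
naive = window_sums_naive(data, 3)
incremental = window_sums_incremental(data, 3)
```

The running value `current` plays the role of the additional information that the abstract says must be maintained across iterations.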
Sparse Tiling for Stationary Iterative Methods
 INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS
, 2004
Cited by 14 (5 self)
In modern computers, a program’s data locality can affect performance significantly. This paper details full sparse tiling, a runtime reordering transformation that improves the data locality for stationary iterative methods such as Gauss–Seidel operating on sparse matrices. In scientific applications such as finite element analysis, these iterative methods dominate the execution time. Full sparse tiling chooses a permutation of the rows and columns of the sparse matrix, and then an order of execution that achieves better data locality. We prove that full sparse-tiled Gauss–Seidel generates a solution that is bitwise identical to traditional Gauss–Seidel on the permuted matrix. We also present measurements of the performance improvements and the overheads of full sparse tiling and of cache blocking for irregular grids, a related technique developed by Douglas et al.
Proving optimizations correct using parameterized program equivalence
 In Proceedings of the 2009 Conference on Programming Language Design and Implementation (PLDI 2009)
, 2009
Cited by 12 (3 self)
Translation validation is a technique for checking that, after an optimization has run, the input and output of the optimization are equivalent. Traditionally, translation validation has been used to prove concrete, fully specified programs equivalent. In this paper we present Parameterized Equivalence Checking (PEC), a generalization of translation validation that can prove the equivalence of parameterized programs. A parameterized program is a partially specified program that can represent multiple concrete programs. For example, a parameterized program may contain a section of code whose only known property is that it does not modify certain variables. By proving parameterized programs equivalent, PEC can prove the correctness of transformation rules that represent complex optimizations once and for all, before they are ever run. We implemented our PEC technique in a tool that can establish the equivalence of two parameterized programs. To highlight the power of PEC, we designed a language for implementing complex optimizations using many-to-many rewrite rules, and used this language to implement a variety of optimizations including software pipelining, loop unrolling, loop unswitching, loop interchange, and loop fusion. Finally, to demonstrate the effectiveness of PEC, we used our PEC implementation to verify that all the optimizations we implemented in our language preserve program behavior.
Array Restructuring for Cache Locality
, 1996
Cited by 8 (0 self)
Caches are used in almost every modern processor design to reduce the long memory access latency, which is increasingly a bottleneck to program performance. For caches to be effective, programs must exhibit good data locality. Thus, an optimizing compiler may have to restructure programs to enhance their locality. We focus on the class of restructuring techniques that target array accesses in loops. There are two approaches to enhancing the locality of such accesses: loop restructuring and array restructuring. Under loop restructuring, a compiler adopts a canonical array layout but transforms the order in which loop iterations are performed and thereby reorders the execution of array accesses. Under array restructuring, in contrast, a compiler lays out array elements in an order that matches the access pattern, whi...