Results 1 - 7 of 7
Scanning Polyhedra with DO Loops
, 1991
Cited by 180 (4 self)
Abstract
Supercompilers perform complex program transformations which often result in new loop bounds. This paper shows that, under the usual assumptions in automatic parallelization, most transformations on loop nests can be expressed as affine transformations on integer sets defined by polyhedra, and that the new loop bounds can be computed with algorithms based on Fourier's pairwise elimination method, even though that method is not exact for integer sets. Sufficient conditions for using pairwise elimination on integer sets and for extending it to pseudo-linear constraints are also given. A tradeoff has to be made between dynamic overhead due to some bound slackness and compilation complexity, but the resulting code is always correct. These algorithms can be used to interchange or block loops regardless of the loop bounds or the blocking strategy, and to safely exchange array parts between two levels of a memory hierarchy or between neighboring processors in a distributed-memory machine.
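The loop-bound computation described above can be illustrated with a small sketch: Fourier's pairwise (Fourier-Motzkin) elimination applied to the triangular nest `for i in 1..N, for j in 1..i` to derive bounds for the interchanged order (j outer, i inner). This is not the paper's algorithm; the constraint encoding, the helper names (`make`, `eliminate`, `loop_bounds`, `rest`, `scan`) and the example nest are hypothetical. With the unit coefficients used here the elimination is exact, whereas for general integer sets it can leave some bound slackness, as the abstract notes.

```python
from fractions import Fraction
from itertools import product
from math import ceil, floor


def make(coeffs, const=0):
    """Constraint sum(coeffs[v] * v) + const >= 0, with zero terms dropped."""
    return ({v: Fraction(c) for v, c in coeffs.items() if c != 0}, Fraction(const))


def eliminate(system, x):
    """Fourier's pairwise elimination of x (exact over the rationals)."""
    lower = [c for c in system if c[0].get(x, 0) > 0]  # each gives x >= ...
    upper = [c for c in system if c[0].get(x, 0) < 0]  # each gives x <= ...
    result = [c for c in system if c[0].get(x, 0) == 0]
    for (lc, lk), (uc, uk) in product(lower, upper):
        a, b = lc[x], -uc[x]  # both positive
        # b * lower + a * upper cancels x; make() drops the zero x term.
        coeffs = {v: b * lc.get(v, 0) + a * uc.get(v, 0) for v in set(lc) | set(uc)}
        result.append(make(coeffs, b * lk + a * uk))
    return result


def loop_bounds(system, order):
    """Lower/upper constraints for each loop in `order` (outermost first).

    Works from the innermost loop outwards, eliminating each loop variable
    once its bounds are read off, so every bound mentions only outer loop
    variables and parameters.
    """
    bounds = {}
    for x in reversed(order):
        bounds[x] = ([c for c in system if c[0].get(x, 0) > 0],
                     [c for c in system if c[0].get(x, 0) < 0])
        system = eliminate(system, x)
    return [bounds[x] for x in order]


def rest(coeffs, const, env, x):
    """Evaluate a constraint's terms other than x in the environment env."""
    return const + sum(c * env[v] for v, c in coeffs.items() if v != x)


def scan(bounds, order, params):
    """Run the generated loop nest and return the visited points in order."""
    points = []

    def run(level, env):
        if level == len(order):
            points.append(tuple(env[v] for v in order))
            return
        x = order[level]
        lower, upper = bounds[level]
        # a*x + r >= 0 with a > 0  =>  x >= ceil(-r / a)
        start = max(ceil(-rest(c, k, env, x) / c[x]) for c, k in lower)
        # b*x + r >= 0 with b < 0  =>  x <= floor(r / -b)
        stop = min(floor(rest(c, k, env, x) / -c[x]) for c, k in upper)
        for value in range(start, stop + 1):
            run(level + 1, {**env, x: value})

    run(0, dict(params))
    return points


# Hypothetical example: the triangular nest  for i in 1..N: for j in 1..i
system = [
    make({"i": 1}, -1),        # i - 1 >= 0
    make({"N": 1, "i": -1}),   # N - i >= 0
    make({"j": 1}, -1),        # j - 1 >= 0
    make({"i": 1, "j": -1}),   # i - j >= 0
]
order = ["j", "i"]  # interchanged: j outer, i inner
points = scan(loop_bounds(system, order), order, {"N": 4})
```

For N = 4 the generated nest runs j from 1 to 4 and, for each j, i from max(1, j) = j to 4, visiting the same 10 points as the original nest in a different order.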
Let’s Study Whole-Program Cache Behaviour Analytically
- In Proceedings of the International Symposium on High-Performance Computer Architecture (HPCA 8)
, 2002
Run-time Cache Bypassing
, 2000
Cited by 26 (0 self)
Abstract
The growing disparity between processor and memory performance has made cache misses increasingly expensive. Additionally, data and instruction caches are not always used efficiently, resulting in large numbers of cache misses. Therefore, the importance of cache performance improvements at each level of the memory hierarchy will continue to grow. For numeric programs, several compiler techniques for optimizing data cache performance are known. However, integer (non-numeric) programs often have irregular access patterns that are more difficult for the compiler to optimize. In the past, cache management techniques such as cache bypassing were implemented manually at the machine-language-programming level. As the available chip area grows, it makes sense to spend more resources on intelligent control over cache management. In this paper we present an approach to improving cache effectiveness, taking advantage of the growing chip area, utilizing run-time adaptive cache ma...
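As a toy illustration of why cache bypassing can help (not the paper's run-time adaptive mechanism, whose description is truncated above), the sketch below simulates a hypothetical direct-mapped cache with one-word lines. A small reused array `A` conflicts with a streaming array `B`; marking the `B` accesses as bypassed, through a static and purely hypothetical `bypass` predicate, stops them from evicting `A`.

```python
def simulate(trace, num_lines, bypass=lambda addr: False):
    """Count misses in a direct-mapped cache with one word per line.

    Addresses for which bypass(addr) is True are fetched from memory on a
    miss but never allocated, so they cannot evict resident data.
    """
    lines = [None] * num_lines
    misses = 0
    for addr in trace:
        idx = addr % num_lines
        if lines[idx] == addr:
            continue  # hit
        misses += 1
        if not bypass(addr):
            lines[idx] = addr  # allocate, evicting the previous occupant


    return misses


# Hypothetical trace: A occupies addresses 0..3 and is reused every 4
# iterations; B starts at address 4, so B[i] maps to the same line as
# A[i % 4] in a 4-line cache and is touched only once.
trace = []
for i in range(16):
    trace += [i % 4, 4 + i]

misses_plain = simulate(trace, num_lines=4)
misses_bypass = simulate(trace, num_lines=4, bypass=lambda addr: addr >= 4)
```

Without bypassing, every access in this trace misses (32 misses). Bypassing `B` leaves only the 4 cold misses on `A` plus the 16 unavoidable misses on `B` (20 in total). A run-time scheme would have to discover which accesses to bypass instead of being told, which is the harder problem the paper addresses.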
A Compiler-Blockable Algorithm for QR Decomposition
, 1995
Cited by 6 (0 self)
Abstract
Because of an imbalance between computation and memory speed in modern processors, programmers are explicitly restructuring codes to perform well on particular memory systems, leading to machine-specific programs. This paper describes a block algorithm for QR decomposition that is derivable by the compiler and has good performance on small matrices, of the sizes typically run on the nodes of a massively parallel system or on a workstation. The advantage of our algorithm over the one found in LAPACK is that it can be derived by the compiler and needs no hand optimization.
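For reference, a plain (unblocked) Householder QR is sketched below in pure Python: the column-at-a-time loop nest from which a blocked algorithm is derived. It is not the paper's compiler-derivable block algorithm, and the function name `householder_qr` is hypothetical.

```python
import math


def householder_qr(a):
    """Unblocked Householder QR of an m x n matrix (m >= n), as lists of lists.

    Returns (Q, R) with Q an m x m orthogonal matrix and R an m x n upper
    triangular matrix such that A = Q R.
    """
    m, n = len(a), len(a[0])
    r = [[float(t) for t in row] for row in a]
    q = [[float(i == j) for j in range(m)] for i in range(m)]
    for k in range(n):
        # Householder vector v that maps r[k:, k] onto a multiple of e_1.
        x = [r[i][k] for i in range(k, m)]
        norm_x = math.sqrt(sum(t * t for t in x))
        if norm_x == 0.0:
            continue  # column already zero below the diagonal
        alpha = -math.copysign(norm_x, x[0])  # sign chosen to avoid cancellation
        v = x[:]
        v[0] -= alpha
        vv = sum(t * t for t in v)
        # Apply H = I - 2 v v^T / (v^T v) to the trailing columns of R.
        for j in range(k, n):
            s = 2.0 * sum(v[i - k] * r[i][j] for i in range(k, m)) / vv
            for i in range(k, m):
                r[i][j] -= s * v[i - k]
        # Accumulate Q = Q H; only columns k..m-1 of Q change.
        for i in range(m):
            s = 2.0 * sum(q[i][p] * v[p - k] for p in range(k, m)) / vv
            for p in range(k, m):
                q[i][p] -= s * v[p - k]
    return q, r
```

A blocked variant would group several of these reflections and apply them to the trailing columns together; LAPACK's blocked QR routine is written by hand in that style.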
Portable High Performance Programming via Architecture-Cognizant Divide-and-Conquer Algorithms
, 2000
Cited by 5 (0 self)
Abstract
Contents (excerpt):
1 Introduction
  1. Divide-and-Conquer and the Memory Hierarchy
  2. Overview of Architecture-Cognizant Divide-and-Conquer
  3. Overview of Napoleon
  4. What You Can Expect
  5. Contributions
     1. Divide-and-Conquer Algorithms for Performance Programming
     2. The Importance of Architecture-Cognizance
     3. Complexity of Determining VariantPolicy
     4. A Framework and System for Divide-and-Conquer Implementations
     5. The Fastest Portable FFT Algorithm
  6. Outline of Thesis
  ...
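The divide-and-conquer style this thesis builds on can be sketched with a recursive matrix multiply that halves the largest of the three loop ranges until a block is small enough, then runs a plain triple loop. The cutoff `leaf` stands in, hypothetically, for the kind of machine-dependent choice the thesis calls architecture-cognizant; the function `matmul_dc` is illustrative and is not code from the thesis or its Napoleon system.

```python
def matmul_dc(a, b, leaf=16):
    """C = A B for an m x k matrix A and k x n matrix B (lists of lists).

    Recursively halves the largest of the i, j, k ranges; blocks whose ranges
    are all <= leaf are multiplied with a plain triple loop.
    """
    assert leaf >= 1, "leaf must be positive or the recursion never stops"
    m, kdim, n = len(a), len(b), len(b[0])
    c = [[0] * n for _ in range(m)]

    def rec(i0, i1, j0, j1, k0, k1):
        di, dj, dk = i1 - i0, j1 - j0, k1 - k0
        if max(di, dj, dk) <= leaf:
            # Leaf: small block, assumed to fit in the target cache.
            for i in range(i0, i1):
                ci, ai = c[i], a[i]
                for k in range(k0, k1):
                    aik, bk = ai[k], b[k]
                    for j in range(j0, j1):
                        ci[j] += aik * bk[j]
        elif di >= dj and di >= dk:
            mid = (i0 + i1) // 2  # split rows of A and C
            rec(i0, mid, j0, j1, k0, k1)
            rec(mid, i1, j0, j1, k0, k1)
        elif dj >= dk:
            mid = (j0 + j1) // 2  # split columns of B and C
            rec(i0, i1, j0, mid, k0, k1)
            rec(i0, i1, mid, j1, k0, k1)
        else:
            mid = (k0 + k1) // 2  # split the reduction; both halves add into C
            rec(i0, i1, j0, j1, k0, mid)
            rec(i0, i1, j0, j1, mid, k1)

    rec(0, m, 0, n, 0, kdim)
    return c
```

Changing `leaf` alters only the recursion depth and the block shape at which the triple loop runs, never the result, which is what makes such a cutoff a safe per-machine tuning parameter.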
Efficient Compile-Time Analysis of Cache Behaviour for Programs with IF Statements
, 2002
Cited by 1 (0 self)
Abstract
This paper presents an analytical method for efficiently analysing the cache behaviour of perfect loop nests containing IF statements with compile-time-analysable conditionals. We discuss the derivation of reuse vectors in the presence of IF statements, present miss equations for characterising the cache behaviour of a program, and give algorithms for solving these equations for cache misses. We show that our method, together with loop sinking, can be used to analyse a large number of imperfect loop nests that could not previously be analysed: 17% more loop nests than before in SPECfp95, Perfect Suite, Livermore kernels, Linpack and Lapack. Validation against cache simulation demonstrates the efficiency and accuracy of our method. Our method can be used to guide compiler cache optimisations and to improve the performance of cache simulators and profilers.
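The kind of cache simulation used for validation can be sketched as a brute-force simulator for a loop nest with a compile-time-analysable guard. The direct-mapped cache geometry, the row-major array layout and the function `count_misses` are hypothetical choices; the paper obtains such counts analytically from miss equations rather than by enumerating every access.

```python
def count_misses(n, cond, elem_bytes=8, line_bytes=32, num_lines=64, base=0):
    """Simulate a direct-mapped cache for the (hypothetical) nest

        for i in range(n):
            for j in range(n):
                if cond(i, j):
                    read a[i][j]

    where a is a row-major n x n array starting at byte address `base`.
    """
    lines = [None] * num_lines  # memory block currently held by each line
    misses = 0
    for i in range(n):
        for j in range(n):
            if not cond(i, j):
                continue  # guarded-out iteration issues no memory access
            addr = base + (i * n + j) * elem_bytes
            block = addr // line_bytes
            idx = block % num_lines
            if lines[idx] != block:
                misses += 1
                lines[idx] = block
    return misses
```

For an 8 x 8 array of 8-byte elements and 32-byte lines, the guard `i >= j` (lower triangle) touches 12 of the array's 16 lines, so `count_misses(8, lambda i, j: i >= j)` reports 12 cold misses against 16 for the unguarded nest; the default cache is large enough that no conflict misses occur.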
Analysing Cache Memory Behaviour for Programs with IF Statements
, 2001
Abstract
Cache memories are widely used to bridge the increasing performance gap between processors and main memories. However, cache memories are effective only when the program exhibits good cache locality. Analytical methods such as the Cache Miss Equations (CMEs) use mathematical formulas to provide a precise characterisation of the number and causes of cache misses in loop-oriented programs. The information gathered can be used to guide locality-enhancing compiler optimisations. Unfortunately, all existing analytical methods are limited to special forms of perfectly nested loops, which, for example, must be free of IF statements. This paper presents an analytical method for analysing the cache behaviour of perfectly nested loops containing IF statements with compile-time-analysable conditionals. We demonstrate that our method, together with the compiler technique of loop sinking, can be used to analyse a large number of imperfect loop nests. By analysing the loop nests in SPECfp95, Perfect Suite, Livermore kernels, Linpack and Lapack, we find that our method enables 17% more loop nests to be analysed than previously. This represents an important step towards analysing complex program constructs in real programs.
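Loop sinking moves a statement that sits between loop headers into the innermost loop under a guard, turning an imperfect nest into a perfect one that contains an IF statement, which is exactly the form the method above can analyse. A minimal sketch on a hypothetical row-sum kernel (both function names are illustrative):

```python
def row_sums_imperfect(a):
    """Imperfect nest: s[i] = 0 sits between the i and j loop headers."""
    n, m = len(a), len(a[0])
    s = [None] * n
    for i in range(n):
        s[i] = 0
        for j in range(m):
            s[i] += a[i][j]
    return s


def row_sums_sunk(a):
    """Same computation after loop sinking: a perfect nest with an IF guard."""
    n, m = len(a), len(a[0])
    s = [None] * n
    for i in range(n):
        for j in range(m):
            # Sunk statement runs once per i, on the first inner iteration.
            if j == 0:
                s[i] = 0
            s[i] += a[i][j]
    return s
```

The guard `j == 0` preserves the original semantics only when the inner loop executes at least once (here, when the rows are non-empty); a compiler applying the transformation has to establish that condition.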

