Cache Miss Equations: A Compiler Framework for Analyzing and Tuning Memory Behavior
- ACM Transactions on Programming Languages and Systems, 1999
- Cited by 127 (1 self)
This article describes methods for generating and solving Cache Miss Equations (CMEs) that give a detailed representation of cache behavior, including conflict misses, in loop-oriented scientific code. Implemented within the SUIF compiler framework, our approach extends traditional compiler reuse analysis to generate linear Diophantine equations that summarize each loop's memory behavior. While solving these equations is in general difficult, we show that it is also unnecessary, as mathematical techniques for manipulating Diophantine equations allow us to relatively easily compute and/or reduce the number of possible solutions, where each solution corresponds to a potential cache miss. The mathematical precision of CMEs allows us to find true optimal solutions for transformations such as blocking or padding. The generality of CMEs also allows us to reason about interactions between transformations applied in concert. The article also gives examples of their use to determine array padding and offset amounts that minimize cache misses, and to determine optimal blocking factors for tiled code. Overall, these equations represent an analysis framework that offers the generality and precision needed for detailed compiler optimizations.
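As a rough illustration of why conflict misses lead to Diophantine equations, consider the textbook set-mapping condition for a direct-mapped cache. This is a simplified sketch, not the article's own formulation: the symbols C_s (cache size in bytes), L_s (line size in bytes), A and B (byte addresses of two references), and n are assumptions introduced here.

```latex
% Two references to distinct memory lines contend for the same cache line when
\left\lfloor \frac{A}{L_s} \right\rfloor - \left\lfloor \frac{B}{L_s} \right\rfloor
  = n \cdot \frac{C_s}{L_s},
\qquad n \in \mathbb{Z},\; n \neq 0 .
```

When A and B are affine functions of the loop indices, as array references in regular loop nests typically are, this condition becomes a linear constraint over integer unknowns (the loop indices and n), which is the general shape of equation the abstract refers to.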
To Copy or Not to Copy: A Compile-Time Technique for Assessing When Data Copying Should be Used to Eliminate Cache Conflicts
- In Proceedings of Supercomputing '93, 1993
- Cited by 107 (5 self)
In this paper, we present a compile-time technique for making this determination, and present a selective copying strategy based on this methodology. Preliminary experimental results demonstrate that, because of the sensitivity of cache conflicts to small changes in problem size and base addresses, selective copying can lead to better overall performance than either no copying, complete copying or manually applied heuristics.
Cache Miss Equations: An Analytical Representation of Cache Misses
- In Proceedings of the 1997 ACM International Conference on Supercomputing, 1997
- Cited by 99 (4 self)
With the widening performance gap between processors and main memory, efficient memory accessing behavior is necessary for good program performance. Both hand-tuning and compiler optimization techniques are often used to transform codes to improve memory performance. Effective transformations require detailed knowledge about the frequency and causes of cache misses in the code.
Cache Interference Phenomena
- In Proceedings of the Sigmetrics Conference on Measurement and Modeling of Computer Systems, 1994
- Cited by 78 (5 self)
The impact of cache interferences on program performance (particularly numerical codes, which heavily use the memory hierarchy) remains unknown. The general knowledge is that cache interferences are highly irregular, in terms of occurrence and intensity. In this paper, the different types of cache interferences that can occur in numerical loop nests are identified. An analytical method is developed for detecting the occurrence of interferences and, more important, for computing the number of cache misses due to interferences. Simulations and experiments on real machines show that the model is generally accurate and that most interference phenomena are captured. Experiments also show that cache interferences can be intense and frequent. Certain parameters such as array base addresses or dimensions can have a strong impact on the occurrence of interferences. Modifying only these parameters can induce global execution time variations of 30% and more. Applications of these modeling techniq...
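The sensitivity to array base addresses described in this abstract can be illustrated with a small simulation. The following is a minimal sketch, not the paper's analytical method: it assumes a hypothetical direct-mapped cache (1 KB, 32-byte lines) and a simple loop that reads a[i] and b[i] in turn; the function names and all parameters are illustrative choices.

```python
def count_misses(addresses, cache_bytes=1024, line_bytes=32):
    """Count misses for a byte-address trace in a direct-mapped cache."""
    num_lines = cache_bytes // line_bytes
    resident = [None] * num_lines      # memory block currently held by each cache line
    misses = 0
    for addr in addresses:
        block = addr // line_bytes     # memory block containing this byte
        index = block % num_lines      # the only cache line this block can occupy
        if resident[index] != block:
            misses += 1
            resident[index] = block    # evict whatever was there
    return misses


def vector_add_trace(base_a, base_b, n=256, elem_bytes=8):
    """Addresses read by the loop: for i in range(n): s += a[i] + b[i]."""
    trace = []
    for i in range(n):
        trace.append(base_a + i * elem_bytes)
        trace.append(base_b + i * elem_bytes)
    return trace


# Base addresses a multiple of the cache size apart: a[i] and b[i] share a cache line.
aligned = count_misses(vector_add_trace(0, 4096))
# Same arrays with b shifted by one cache line (32 bytes of padding).
padded = count_misses(vector_add_trace(0, 4096 + 32))
print(aligned, padded)
```

Under these assumptions the aligned layout misses on all 512 accesses, because each reference evicts the line the other needs next, while the padded layout incurs only the 128 first-touch misses.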
Precise Miss Analysis for Program Transformations with Caches of Arbitrary Associativity
- In Proceedings of the Eighth International Conference on Architectural Support for Programming Languages and Operating Systems, 1998
- Cited by 74 (1 self)
Analyzing and optimizing program memory performance is a pressing problem in high-performance computer architectures. Currently, software solutions addressing the processor-memory performance gap include compiler- or programmer-applied optimizations like data structure padding, matrix blocking, and other program transformations. Compiler optimization can be effective, but the lack of precise analysis and optimization frameworks makes it impossible to confidently make optimal, rather than heuristic-based, program transformations. Imprecision is most problematic in situations where hard-to-predict cache conflicts foil heuristic approaches. Furthermore, the lack of a general framework for compiler memory performance analysis makes it impossible to understand the combined effects of several program transformations. The Cache Miss Equation (CME) framework discussed in this paper addresses these issues. We express memory reference and cache conflict behavior in terms of sets of equations. The ...
Software Assistance for Data Caches
- Proceedings of the High Performance Computer Architecture Symposium, 1995
- Cited by 14 (1 self)
Hardware and software cache optimizations are active fields of research that have yielded powerful but occasionally complex designs and algorithms. The purpose of this paper is to investigate the performance of combined though simple software and hardware optimizations. Because current caches provide little flexibility for exploiting temporal and spatial locality, two hardware modifications are proposed to support these two kinds of locality. Spatial locality is exploited by using large virtual cache lines which do not exhibit the performance flaws of large physical cache lines. Temporal locality is exploited by minimizing cache pollution with a bypass mechanism that still allows spatial locality to be exploited. Subsequently, it is shown that simple software information on the spatial/temporal locality of array references, as provided by current data locality optimizing algorithms, can be used to significantly increase cache performance. The performance and design tradeoffs of the propos...
An Integrated Hardware/Software Solution for Effective Management of Local Storage in High-Performance Systems
- In Proceedings of the International Conference on Parallel Processing, volume II, 1991
- Cited by 11 (2 self)
The potential of high-performance systems, especially vector and parallel machines, is generally limited by the bandwidth between processors and memory. To achieve the performance of which these machines should be capable, greater emphasis must be placed on optimizing array accesses. We propose a practical, integrated hardware/software strategy for increasing the effectiveness of local storage management. Our scheme provides many of the advantages of both compile-time and run-time memory management techniques. In this paper, we describe our local storage facility: the priority data cache. We also describe compile-time techniques for easily and effectively utilizing this level of local storage.
1 Introduction
The potential of high-performance systems is generally limited by the bandwidth between processors and memory. To achieve the performance of which these machines should be capable, greater emphasis must be placed on optimizing array accesses. At compile time, many safe and profita...
Impact of Cache Interferences on Usual Numerical Dense Loop Nests
- Proceedings of the IEEE, 1993
- Cited by 10 (4 self)
In numerical codes, the regular interleaved accesses that occur within do-loop nests induce cache interference phenomena that can severely degrade program performance. Cache interferences can significantly increase the volume of memory traffic and the amount of communication in uniprocessors and multiprocessors. In this paper, we identify cache interference phenomena, determine their causes and the conditions under which they occur. Based on these results, we derive a methodology for computing an analytical expression of cache misses for most classic loop nests, which can be used for precise performance analysis and prediction. We show that cache performance is unstable, because some unexpected parameters, such as array base addresses, can play a significant role in interference phenomena. We also show that the impact of cache interferences can be so high that the benefits of current data locality optimization techniques can be partially, if not totally, eradicated. Keywords: memory ref...
Cache Awareness in Blocking Techniques
- In Journal of Programming Languages, 1998
- Cited by 1 (0 self)
To date, data locality optimizing algorithms mostly aim at providing strategies for blocking and reordering loops. But little research has been devoted to the final step: finding the optimal block size, i.e., a block size that provides the best possible performance. Optimal block sizes are currently computed as if a cache is a local memory, i.e., cache interferences are ignored. Case-studies have already shown that cache interferences can greatly affect the optimal block size value. The purpose of this article is to show that analytical modeling of cache interferences can be used to compute near-optimal block sizes for blocked loop nests. First, the method for evaluating cache interferences is presented. Second, the model is validated by correlating the estimated miss ratio with the simulated miss ratio and the execution time of various loop nests. Then, current techniques for computing the optimal block size are analytically and experimentally shown to yield below-optimal performance....
Cache Awareness in Blocking Techniques, Part II
To date, data locality optimizing algorithms mostly aim at providing efficient strategies for blocking and reordering loops. But little research has been devoted to the final step, i.e., computing the optimal block size. Optimal block sizes are currently computed as if a cache behaves as a local memory, i.e., cache interference phenomena are ignored. Case-studies have already shown that cache interferences can greatly affect the optimal block size. The purpose of this paper is to propose a methodology for estimating interference misses in a regular do-loop nest, and use that knowledge to derive the optimal block size. First, the different types of interference phenomena are identified, and a method for predicting their occurrence and evaluating their impact is proposed. Second, current techniques for computing the optimal block size are analytically and experimentally shown to yield far below optimal performance. Third, cache interference phenomena and even TLB behavior are taken into ...

