Results 1 - 2 of 2
Cache Profiling and the SPEC Benchmarks: A Case Study
- IEEE Computer, 1994
Cited by 137 (7 self)
As VLSI technology improvements continue to widen the gap between processor and main memory cycle times, cache performance becomes increasingly important to overall system performance. Cache memories help alleviate the cycle time disparity, but only for programs that exhibit sufficient spatial and temporal locality. Programs with unruly access patterns spend much of their time transferring data to and from the cache. To fully exploit the performance potential of fast processors, programmers must explicitly consider cache behavior, restructuring their codes to increase locality. As these fast processors proliferate, techniques for improving cache performance must move beyond the supercomputer and multiprocessor communities and into the mainstream of computing. In this paper, we examine some of the techniques that programmers can use to improve cache performance. We show how to use CPROF, a cache profiler, to identify cache performance bottlenecks and gain insight into their o...
The Performance Impact of Incomplete Bypassing in Processor Pipelines
- In Proceedings of the 28th Annual International Symposium on Microarchitecture, 1995
Cited by 29 (0 self)
Pipelined processors employ hardware bypassing to eliminate certain pipeline hazards. Bypassing is logically simple but can be costly, especially in wide-issue and deeply pipelined machines. In this paper bypassing is studied in detail, with an emphasis on designs in which the bypassing network is not complete. Cycle-level simulations of a model of integer and floating-point pipelines running some of the SPEC92 benchmarks show that at least half of the instructions executed used a bypassed register result from a previous instruction. Missing bypasses induce interlock stalls. The paper reports measurements of the performance impact of a number of pipeline configurations with incomplete bypassing networks. This impact ranges from a slowdown of just a few percent for a configuration with one late bypass missing to a slowdown of almost a factor of two for the integer pipe with no bypassing at all. Two types of code alterations reduce the new interlock stalls. A simple code transformation, th...

