Software trace cache
- Proceedings of the 13th Intl. Conference on Supercomputing, 1999
Cited by 36 (9 self)
Abstract: This paper explores the use of compiler optimizations which optimize the layout of instructions in memory. The target is to enable the code to make better use of the underlying hardware resources, regardless of the specific details of the processor/architecture, in order to increase fetch performance. The Software Trace Cache (STC) is a code layout algorithm with a broader target than previous layout optimizations. We target not only an improvement in the instruction cache hit rate, but also an increase in the effective fetch width of the fetch engine. The STC algorithm organizes basic blocks into chains, trying to make sequentially executed basic blocks reside in consecutive memory positions, then maps the basic block chains in memory to minimize conflict misses in the important sections of the program. We evaluate and analyze in detail the impact of the STC, and code layout optimizations in general, on the three main aspects of fetch performance: the instruction cache hit rate, the effective fetch width, and the branch prediction accuracy. Our results show that layout-optimized codes have some special characteristics that make them more amenable to high-performance instruction fetch: they have a very high rate of not-taken branches and execute long chains of sequential instructions; also, they make very effective use of instruction cache lines, mapping only useful instructions which will execute close in time, increasing both spatial and temporal locality. Index Terms: Pipeline processors, instruction fetch, compiler optimizations, branch prediction, trace cache.
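The chain-building step described in the abstract can be illustrated with a minimal sketch. This is not the published STC algorithm: the abstract does not say how seeds are chosen, what thresholds apply, or how chains are mapped to avoid conflict misses. The function name `build_chains`, the edge-count profile format, and the greedy "follow the hottest unplaced successor" rule are all assumptions made for illustration.

```python
from collections import defaultdict


def build_chains(edges, seeds):
    """Greedily group basic blocks into chains (illustrative only).

    edges: dict mapping (src, dst) -> profiled execution count
           (hypothetical profile format, not from the paper)
    seeds: block names to start chains from, in priority order
           (seed selection is an assumption; the abstract does not define it)
    """
    # Index the outgoing edges of every block.
    succs = defaultdict(list)
    for (src, dst), count in edges.items():
        succs[src].append((count, dst))

    placed = set()
    chains = []
    for seed in seeds:
        if seed in placed:
            continue
        chain = []
        block = seed
        while block is not None and block not in placed:
            chain.append(block)
            placed.add(block)
            # Follow the most frequently executed edge to a block that has
            # not been placed yet, so hot successors become fall-throughs.
            candidates = [(c, d) for c, d in succs[block] if d not in placed]
            block = max(candidates, key=lambda cd: cd[0])[1] if candidates else None
        chains.append(chain)
    return chains


def flatten(chains):
    """Concatenate chains into one memory order (no conflict-aware mapping)."""
    return [block for chain in chains for block in chain]
```

With a made-up profile such as `{("A", "B"): 90, ("A", "C"): 10, ("B", "D"): 90, ("C", "D"): 10, ("D", "A"): 99}` and seeds `["A", "C"]`, the hot path A, B, D ends up in one chain and the cold block C in another, so the frequent transitions become sequential in memory.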
Optimization of instruction fetch for decision support workloads
- Proceedings of the Intl. Conference on Parallel Processing, 1999
Cited by 4 (4 self)
Instruction fetch bandwidth is feared to be a major limiting factor to the performance of future wide-issue aggressive superscalars. In this paper, we focus on Database applications running Decision Support workloads. We characterize the locality patterns of a database kernel and find frequently executed paths. Using this information, we propose an algorithm to lay out the basic blocks for improved I-fetch. Our results show a miss reduction of 60-98% for realistic I-cache sizes and a doubling of the number of instructions executed between taken branches. As a consequence, we increase the fetch bandwidth provided by an aggressive sequential fetch unit from 5.8 for the original code to 10.6 using our proposed layout. Our software scheme combines well with hardware schemes like a Trace Cache, providing up to 12.1 instructions per cycle, suggesting that commercial workloads may be amenable to the aggressive I-fetch of future superscalars.
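The metric "instructions executed between taken branches" can be illustrated with a small sketch. This is a simplified model and not the paper's simulator: it assumes that a transition between two basic blocks is a fall-through only when the second block sits immediately after the first in the layout, and that every other transition costs one taken branch. The function name, the trace, and the block sizes are hypothetical.

```python
def instructions_per_taken_branch(trace, layout, sizes):
    """Average instructions executed per taken branch (simplified model).

    trace:  executed basic blocks, in order (hypothetical input)
    layout: basic blocks in memory order
    sizes:  dict mapping block -> number of instructions
    """
    position = {block: i for i, block in enumerate(layout)}
    total_instructions = sum(sizes[block] for block in trace)
    # A transition is a fall-through only if the next block is laid out
    # directly after the current one; anything else is a taken branch.
    taken = sum(
        1
        for cur, nxt in zip(trace, trace[1:])
        if position[nxt] != position[cur] + 1
    )
    # Avoid division by zero for traces with no taken branches.
    return total_instructions / max(taken, 1)


trace = ["A", "B", "D", "A", "B", "D"]
sizes = {"A": 4, "B": 4, "C": 4, "D": 4}
original = instructions_per_taken_branch(trace, ["A", "C", "B", "D"], sizes)
reordered = instructions_per_taken_branch(trace, ["A", "B", "D", "C"], sizes)
```

In this toy case, moving the cold block C out of the hot path raises the figure from 8.0 to 24.0 instructions per taken branch. The numbers only show the direction of the effect and say nothing about the magnitudes reported in the abstract.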
Code Reordering of Decision Support Systems for optimized Instruction Fetch
- Proc. of the Intl. Conference on Parallel Processing, 1998
Cited by 3 (0 self)
Instruction fetch bandwidth is feared to be a major limiting factor to the performance of future wide-issue aggressive superscalars. Consequently, it is crucial to develop techniques to increase the number of useful instructions per cycle provided to the processor. Unfortunately, most of the past work in this area has focused on engineering workloads, rather than on the more challenging, badly-behaved popular commercial workloads. In this paper, we focus on Database applications running Decision Support workloads. We characterize the locality patterns of database kernel code and find frequently executed paths. Using this information, we propose an algorithm to lay out the basic blocks of the database kernel for improved I-fetch. Finally, we evaluate the scheme via simulations. Our results show a miss reduction of 60-98% for realistic I-cache sizes and a doubling of the number of instructions executed between taken branches. As a consequence, we increase the fetch bandwidth provid...
Red blue traces: Trace cache redundancy
- 1999
The objective of this paper is to improve the use of the hardware resources of the trace cache mechanism, reducing the implementation cost with no performance degradation by eliminating trace and basic block replication. Two previous works address cost reduction for the trace cache. The software trace cache is a code reordering scheme which builds as many traces at compile-time as possible. This increases the performance of the core fetch unit, providing traces from the instruction cache, and obtaining similar or better performance with less trace cache storage space. The block-based trace cache is a hardware alternative to the trace cache that targets the elimination of basic block redundancy among the stored traces. The trace cache used together with the software trace cache generates a high degree of redundancy between the traces stored in the trace cache and those present in the instruction cache. Also, the block-based trace cache introduces a high degree of fragmentation wit...

