Code Layout Optimizations for Transaction Processing Workloads
- In Proc. 28th Annual Int. Symp. Computer Architecture, 2001
Abstract - Cited by 18 (4 self)
Commercial applications such as databases and Web servers constitute the most important market segment for high-performance servers. Among these applications, on-line transaction processing (OLTP) workloads provide a challenging set of requirements for system designs since they often exhibit inefficient executions dominated by a large memory stall component. This behavior arises from large instruction and data footprints and high communication miss rates. A number of recent studies have characterized the behavior of commercial workloads and proposed architectural features to improve their performance. However, there has been little research on the impact of software and compiler-level optimizations for improving the behavior of such workloads. This paper provides a detailed study of profile-driven compiler optimizations to improve the code layout in commercial workloads with
A Stream Processor Front-end
- 2000
Abstract - Cited by 1 (1 self)
This work proposes a new fetch unit model, inspired by the trace processor [8]. Instead of fetching instruction traces, our fetch unit fetches instruction streams. An instruction stream is a sequential run of instructions, defined by the starting address and the stream length. All branches included in the stream are assumed to be not taken, except for the terminating one, which should always be taken (otherwise, the stream is being terminated prematurely). We will show how stream fetching addresses the four factors determining instruction fetch performance: the width of instructions fetched per cycle, instruction cache misses, branch prediction throughput and branch prediction accuracy.
1 Fetch performance
1.1 Width of instruction fetch
All instructions in a stream are consecutive in memory, and a stream contains no taken branches. This makes it very simple to obtain several consecutive instruction cache lines from a multi-banked cache, and simply select the desired instruction ...
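The stream definition in this abstract (a starting address plus a length, ending at a taken branch) can be illustrated with a minimal sketch. It is not the paper's fetch unit: the trace format, the `Stream` class and the `split_into_streams` helper are hypothetical names introduced here, and the trace is assumed to be a list of executed instructions, each tagged with whether it was a taken branch.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Stream:
    """Hypothetical record for one instruction stream."""
    start: int   # address of the first instruction in the run
    length: int  # number of instructions in the run


def split_into_streams(trace: Iterable[Tuple[int, bool]]) -> List[Stream]:
    """Split a dynamic trace into streams.

    `trace` yields (address, is_taken_branch) pairs in execution order
    (an assumed format). A stream runs from the target of one taken
    branch up to and including the next taken branch, so not-taken
    branches stay inside the stream.
    """
    streams: List[Stream] = []
    start = None
    length = 0
    for addr, taken in trace:
        if start is None:
            start = addr          # first instruction of a new stream
        length += 1
        if taken:                 # a taken branch terminates the stream
            streams.append(Stream(start, length))
            start, length = None, 0
    if start is not None:         # trailing run with no terminating branch
        streams.append(Stream(start, length))
    return streams


# Example trace, assuming 4-byte instructions (an assumption of this sketch).
example_trace = [
    (0x100, False), (0x104, False), (0x108, True),   # branch taken to 0x200
    (0x200, False), (0x204, True),                   # branch taken to 0x100
    (0x100, False),
]
example_streams = split_into_streams(example_trace)
```

Each resulting `Stream` covers consecutive addresses, which is the property the abstract relies on when it describes reading several consecutive cache lines for one stream.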

