Trace Cache: a Low Latency Approach to High Bandwidth Instruction Fetching (1996) [257 citations — 11 self]
http://www.cs.utah.edu/classes/cs7810-rajeev/paper
http://www.tinker.ncsu.edu/ericro/ece792/papers/TC
ftp://ftp.cs.wisc.edu/tech-reports/reports/96/tr13
ftp://ftp.cs.wisc.edu/sohi/papers/1996/micro.trace
http://www.tinker.ncsu.edu/ericro/publications/con
Abstract:
Superscalar processors require sufficient instruction fetch bandwidth to feed their highly parallel execution cores. Fetch bandwidth is determined by a number of factors, namely instruction cache hit rate, branch prediction accuracy, and taken branches in the instruction stream. Taken branches introduce the problem of noncontiguous instruction fetching: the dynamic instruction sequence exists in the cache, but the instructions are not in contiguous cache locations. This report considers the problem of fetching noncontiguous blocks of instructions in a single cycle. We propose the trace cache, a special instruction cache that captures dynamic instruction sequences. Each line in the trace cache stores a dynamic code sequence, which may contain one or more taken branches. Dynamic sequences are built up as the program executes. If a predicted dynamic sequence exists in the trace cache, it can be fed directly to the decoders. We investigate other methods for fetching noncontiguous instructi...
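The mechanism the abstract describes can be illustrated with a small software model. This is only a sketch under stated assumptions, not the authors' hardware design: the class name `TraceCache`, the methods `retire` and `fetch`, the default limits of 16 instructions and 3 branches per line, and the one-line-per-start-address organization are all hypothetical choices made for illustration. It shows a line being built from the executed instruction stream and a later fetch hitting only when both the start address and the predicted branch outcomes match the stored sequence.

```python
class TraceCache:
    """Toy model of a trace cache (illustrative assumptions throughout)."""

    def __init__(self, max_instrs=16, max_branches=3):
        # Assumed limits on the size of one line; chosen for illustration.
        self.max_instrs = max_instrs
        self.max_branches = max_branches
        # Assumed organization: one line per start address.
        # start pc -> (branch outcomes, instruction addresses)
        self.lines = {}
        self._pending = []   # addresses of the trace being built
        self._outcomes = []  # taken/not-taken flags of its branches

    def retire(self, pc, is_branch=False, taken=False):
        """Fill side: append one executed instruction to the current trace."""
        self._pending.append(pc)
        if is_branch:
            self._outcomes.append(taken)
        full = len(self._pending) == self.max_instrs
        if full or len(self._outcomes) == self.max_branches:
            self._commit()

    def _commit(self):
        """Store the finished trace under its first address and start anew."""
        start = self._pending[0]
        self.lines[start] = (tuple(self._outcomes), tuple(self._pending))
        self._pending, self._outcomes = [], []

    def fetch(self, pc, predicted):
        """Return the stored sequence if the address and predictions match."""
        line = self.lines.get(pc)
        if line is None:
            return None
        outcomes, pcs = line
        if tuple(predicted[:len(outcomes)]) != outcomes:
            return None
        return pcs


# Build one trace that crosses two taken branches, then fetch it.
tc = TraceCache()
tc.retire(0x100)
tc.retire(0x104, is_branch=True, taken=True)    # jumps to 0x200
tc.retire(0x200)
tc.retire(0x204, is_branch=True, taken=False)   # falls through
tc.retire(0x208)
tc.retire(0x20C, is_branch=True, taken=True)    # third branch ends the line
hit = tc.fetch(0x100, [True, False, True])
miss = tc.fetch(0x100, [True, True, True])
```

In this model `hit` holds all six instruction addresses, although they span three noncontiguous regions of the program, while `miss` is `None` because the second predicted outcome disagrees with the stored trace. A conventional instruction cache would take the fetch in the miss case.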

