Trace-Driven Memory Simulation: A Survey
- ACM Computing Surveys, 2004
Cited by 134 (0 self)
This article surveys and analyzes these developments by establishing criteria for evaluating trace-driven methods, and then applies these criteria to describe, categorize, and compare over 50 trace-driven simulation tools. We discuss the strengths and weaknesses of different approaches and show that no single method is best when all criteria, including accuracy, speed, memory, flexibility, portability, expense, and ease of use, are considered. In a concluding section, we examine fundamental limitations to trace-driven simulation, and survey some recent developments in memory simulation that may overcome these bottlenecks.
A Comparison of Trace-Sampling Techniques for Multi-Megabyte Caches
- IEEE Transactions on Computers, 1994
Cited by 74 (2 self)
This paper compares the trace-sampling techniques of set sampling and time sampling. Using the multi-billion-reference traces of Borg et al., we apply both techniques to multi-megabyte caches, where sampling is most valuable. We evaluate whether either technique meets a 10% sampling goal: a method meets this goal if, at least 90% of the time, it estimates the trace's true misses per instruction with at most 10% relative error using at most 10% of the trace. Results for these traces and caches show that set sampling meets the 10% sampling goal, while time sampling does not. We also find that cold-start bias in time samples is most effectively reduced by the technique of Wood et al. Nevertheless, overcoming cold-start bias requires tens of millions of consecutive references. Index Terms - Cache memory, cache performance, cold start, computer architecture, memory systems, performance evaluation, sampling techniques, trace-driven simulation.
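The difference between the two sampling styles can be illustrated with a minimal sketch. It is not the paper's simulator: the direct-mapped cache model and the function names (simulate_direct_mapped, estimate_misses_by_set_sampling, simulate_time_sample) are assumptions made for illustration, and it counts raw misses rather than misses per instruction. Set sampling simulates only the references that map to a chosen subset of sets and scales the result up; time sampling simulates a contiguous window of the trace starting from an empty cache, which is where cold-start bias comes from.

```python
def simulate_direct_mapped(trace, num_sets, block_bytes, sampled_sets=None):
    """Return (references, misses) for a direct-mapped cache.

    If sampled_sets is given, only references that map to those set
    indices are simulated; all other references are skipped.
    """
    tags = {}  # set index -> tag of the block currently resident
    refs = misses = 0
    for addr in trace:
        block = addr // block_bytes
        index = block % num_sets
        if sampled_sets is not None and index not in sampled_sets:
            continue
        refs += 1
        tag = block // num_sets
        if tags.get(index) != tag:
            misses += 1
            tags[index] = tag
    return refs, misses


def estimate_misses_by_set_sampling(trace, num_sets, block_bytes, sampled_sets):
    """Scale the misses seen in the sampled sets up to the whole cache."""
    _, sampled_misses = simulate_direct_mapped(
        trace, num_sets, block_bytes, sampled_sets)
    return sampled_misses * num_sets / len(sampled_sets)


def simulate_time_sample(trace, num_sets, block_bytes, start, length):
    """Simulate one contiguous window of the trace from an empty cache.

    Misses caused only by the empty initial state are the cold-start
    bias that time sampling has to correct for.
    """
    window = trace[start:start + length]
    return simulate_direct_mapped(window, num_sets, block_bytes)
```

In this sketch, set sampling is exact only when every set behaves alike; on real traces the estimate depends on which sets are chosen, which is why the paper evaluates it statistically.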
Timekeeping in the Memory System: Predicting and Optimizing Memory Behavior
- 2002
Cited by 33 (3 self)
Techniques for analyzing and improving memory referencing behavior continue to be important for achieving good overall program performance due to the ever-increasing performance gap between processors and main memory. This paper offers a fresh perspective on the problem of predicting and optimizing memory behavior. Namely, we show quantitatively the extent to which detailed timing characteristics of past memory reference events are strongly predictive of future program reference behavior. We propose a family of timekeeping techniques that optimize behavior based on observations about particular cache time durations, such as the cache access interval or the cache dead time. Timekeeping techniques can be used to build small, simple, and high-accuracy (often 90% or more) predictors for identifying conflict misses, for predicting dead blocks, and even for estimating the time at which the next reference to a cache frame will occur and the address that will be accessed. Based on these predictors, we demonstrate two new and complementary time-based hardware structures: (1) a time-based victim cache that improves performance by only storing conflict miss lines with likely reuse, and (2) a time-based prefetching technique that homes in on the right address to prefetch, and the right time to schedule the prefetch. Our victim cache technique improves performance over previous proposals through better selection of what to place in the victim cache. Our prefetching technique outperforms similar prior hardware prefetching proposals, despite being orders of magnitude smaller. Overall, these techniques improve performance by more than 11% across the SPEC2000 benchmark suite.
PDATS - Lossless Address Trace Compression for . . .
- 1994
Cited by 29 (3 self)
The tremendous storage space required for a useful data base of traces has driven a search for trace compaction techniques. In this paper we present an information-lossless trace compression scheme that can reduce both storage space and access time by an order of magnitude or more, compared to ASCII-format traces, without discarding either references or inter-reference timing from the original trace. This technique has been selected as the standard trace format for an extensive new trace data base that will be made accessible to the international research and teaching community.
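As a rough illustration of why lossless address-trace compression works, the sketch below stores each reference as a small signed offset from the previous address of the same reference type, using a variable-length integer. This is not the PDATS file format: the record layout, the three reference kinds, and the names compress and decompress are assumptions for illustration only, and inter-reference timing is not modeled.

```python
def _zigzag(n):
    """Map a signed integer to an unsigned one so small offsets stay small."""
    return (n << 1) if n >= 0 else ((-n << 1) - 1)


def _unzigzag(z):
    return (z >> 1) if z & 1 == 0 else -((z + 1) >> 1)


def _put_varint(out, value):
    """Append value as a base-128 variable-length integer."""
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return


def _get_varint(data, pos):
    value = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, pos


def compress(trace):
    """trace: list of (kind, address); kind is 0, 1, or 2 (hypothetical
    codes for instruction fetch, read, and write)."""
    out = bytearray()
    last = [0, 0, 0]  # previous address seen for each kind
    for kind, addr in trace:
        out.append(kind)
        _put_varint(out, _zigzag(addr - last[kind]))
        last[kind] = addr
    return bytes(out)


def decompress(data):
    trace, last, pos = [], [0, 0, 0], 0
    while pos < len(data):
        kind = data[pos]
        delta, pos = _get_varint(data, pos + 1)
        last[kind] += _unzigzag(delta)
        trace.append((kind, last[kind]))
    return trace
```

Sequential instruction fetches and strided data accesses produce small offsets that fit in one byte, so most records take far less space than a full-width address, and decompress(compress(trace)) returns the original list.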
The V-Way Cache: Demand Based Associativity via Global Replacement
- In Proceedings of the 32nd Annual International Symposium on Computer Architecture, 2004
Cited by 18 (2 self)
As processor speeds increase and memory latency becomes more critical, intelligent design and management of secondary caches become increasingly important. The efficiency of current set-associative caches is reduced because programs exhibit a non-uniform distribution of memory accesses across different cache sets. We propose a technique to vary the associativity of a cache on a per-set basis in response to the demands of the program. By increasing the number of tag entries relative to the number of data lines, we achieve the performance benefit of global replacement while maintaining the constant hit latency of a set-associative cache. The proposed variable-way, or V-Way, set-associative cache, when combined with Reuse Replacement, reduces the second-level cache miss rate by an average of 13%. This translates into an average IPC improvement of 8%.
Efficient Simulation of Multiple Cache Configurations using Binomial Trees
- 1991
Cited by 16 (1 self)
Simulation time is often the bottleneck in the cache design process. In this paper, algorithms for the efficient simulation of direct mapped and set associative caches are presented. Two classes of direct mapped caches are considered: fixed line size caches and fixed size caches. A binomial tree representation of the caches in each class is introduced. The fixed line size class is considered for set associative caches. A generalization of the binomial tree data structure is introduced and the fixed line size class of set associative caches is represented using the generalized binomial tree. Algorithms are developed that use the data structures to determine miss ratios for the caches in each class. Analytical and empirical comparisons of the algorithms to previously published algorithms such as all-associativity and forest simulation are presented. Analytically it is shown that the new algorithms always perform better than earlier algorithms. Empirically, the new algorithms are shown to...
The V-way cache: demand-based associativity via global replacement
- In Proceedings of the 32nd International Symposium on Computer Architecture, 2005
Cited by 14 (0 self)
As processor speeds increase and memory latency becomes more critical, intelligent design and management of secondary caches become increasingly important. The efficiency of current set-associative caches is reduced because programs exhibit a non-uniform distribution of memory accesses across different cache sets. We propose a technique to vary the associativity of a cache on a per-set basis in response to the demands of the program. By increasing the number of tag-store entries relative to the number of data lines, we achieve the performance benefit of global replacement while maintaining the constant hit latency of a set-associative cache. The proposed variable-way, or V-Way, set-associative cache achieves an average miss rate reduction of 13% on sixteen benchmarks from the SPEC CPU2000 suite. This translates into an average IPC improvement of 8%.
Active Memory: A New Abstraction for Memory-System Simulation
- In Proceedings of the 1995 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, 1995
Cited by 13 (1 self)
This paper describes the active memory abstraction for memory-system simulation. In this abstraction, designed specifically for on-the-fly simulation, memory references logically invoke a user-specified function depending upon the reference's type and accessed memory block state. Active memory allows simulator writers to specify the appropriate action on each reference, including "no action" for the common case of cache hits. Because the abstraction hides implementation details, implementations can be carefully tuned for particular platforms, permitting much more efficient on-the-fly simulation than the traditional trace-driven abstraction. Our SPARC implementation, Fast-Cache, executes simple data cache simulations two or three times faster than a highly-tuned trace-driven...
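The dispatch idea in this abstract can be sketched in a few lines. This is not Fast-Cache or its interface: the ActiveMemory class, the two block states, and the handler signature are all assumptions chosen for illustration, and a real implementation would do the lookup with carefully tuned inline code rather than a dictionary. Each reference looks up a handler keyed by its type and the state of the block it touches; when no handler is registered, nothing runs, which is the "no action" case for hits.

```python
from collections import defaultdict

INVALID, VALID = "invalid", "valid"  # hypothetical block states


class ActiveMemory:
    """Dispatch a user-specified handler on (reference type, block state)."""

    def __init__(self, block_bytes):
        self.block_bytes = block_bytes
        self.state = defaultdict(lambda: INVALID)  # block number -> state
        self.handlers = {}  # (ref_type, block_state) -> handler

    def on(self, ref_type, block_state, handler):
        self.handlers[(ref_type, block_state)] = handler

    def reference(self, ref_type, addr):
        block = addr // self.block_bytes
        handler = self.handlers.get((ref_type, self.state[block]))
        if handler is not None:  # no handler registered means "no action"
            handler(self, block)


def count_cold_misses(refs, block_bytes=32):
    """Example simulator: count first-touch misses in an unbounded cache."""
    misses = 0

    def on_miss(memory, block):
        nonlocal misses
        misses += 1
        memory.state[block] = VALID  # later references take no action

    memory = ActiveMemory(block_bytes)
    for ref_type in ("load", "store"):
        memory.on(ref_type, INVALID, on_miss)
    for ref_type, addr in refs:
        memory.reference(ref_type, addr)
    return misses
```

Only the miss path runs simulator code here; references to blocks already marked valid fall through without calling anything.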

