Results 1 - 10
of
64
Shade: A Fast Instruction-Set Simulator for Execution Profiling
, 1994
"... Tracing tools are used widely to help analyze, design, and tune both hardware and software systems. This paper describes a tool called Shade which combines efficient instruction-set simulation with a flexible, extensible trace generation capability. Efficiency is achieved by dynamically compiling an ..."
Abstract
-
Cited by 315 (2 self)
- Add to MetaCart
Tracing tools are used widely to help analyze, design, and tune both hardware and software systems. This paper describes a tool called Shade which combines efficient instruction-set simulation with a flexible, extensible trace generation capability. Efficiency is achieved by dynamically compiling and caching code to simulate and trace the application program. The user may control the extent of tracing in a variety of ways; arbitrarily detailed application state information may be collected during the simulation, but tracing less translates directly into greater efficiency. Current Shade implementations run on SPARC systems and simulate the SPARC (Versions 8 and 9) and MIPS I instruction sets. This paper describes the capabilities, design, implementation, and performance of Shade, and discusses instruction set emulation in general.
The Effect of Context Switches on Cache Performance
- Jeffrey C. Mogul and Anita
, 1990
"... research relevant to the design and application of high performance scientific computers. We test our ideas by designing, building, and using real systems. The systems we build are research prototypes; they are not intended to become products. There is a second research laboratory located in Palo Al ..."
Abstract
-
Cited by 156 (1 self)
- Add to MetaCart
research relevant to the design and application of high performance scientific computers. We test our ideas by designing, building, and using real systems. The systems we build are research prototypes; they are not intended to become products. There is a second research laboratory located in Palo Alto, the Systems Research Center (SRC). Other Digital research groups are located in Paris (PRL) and in Cambridge,
Trace-Driven Memory Simulation: A Survey
- ACM Computing Surveys
, 2004
"... This article surveys and analyzes these developments by establishing criteria for evaluating trace-driven methods, and then applies these criteria to describe, categorize, and compare over 50 trace-driven simulation tools. We discuss the strengths and weaknesses of different approaches and show t ..."
Abstract
-
Cited by 134 (0 self)
- Add to MetaCart
This article surveys and analyzes these developments by establishing criteria for evaluating trace-driven methods, and then applies these criteria to describe, categorize, and compare over 50 trace-driven simulation tools. We discuss the strengths and weaknesses of different approaches and show that no single method is best when all criteria, including accuracy, speed, memory, flexibility, portability, expense, and ease of use are considered. In a concluding section, we examine fundamental limitations to trace-driven simulation, and survey some recent developments in memory simulation that may overcome these bottlenecks
Studies of Windows NT Performance using Dynamic Execution Traces
- IN PROCEEDINGS OF THE SECOND SYMPOSIUM ON OPERATING SYSTEM DESIGN AND IMPLEMENTATION
, 1996
"... We studied two aspects of the performance of Windows NT: processor bandwidth requirements for memory accesses in a uniprocessor system running commercial and benchmark applications, and locking behavior of a commercial database on a small-scale multiprocessor. Our studies are based on full dynamic e ..."
Abstract
-
Cited by 88 (0 self)
- Add to MetaCart
We studied two aspects of the performance of Windows NT: processor bandwidth requirements for memory accesses in a uniprocessor system running commercial and benchmark applications, and locking behavior of a commercial database on a small-scale multiprocessor. Our studies are based on full dynamic execution traces of the systems, which include all instructions executed by the operating system and applications over periods of a few seconds (enough time to allow for significant computation). The traces were obtained on Alpha PCs, using a new software tool called PatchWrx that takes advantage of the Alpha architecture's PAL-code layer to implement efficient, comprehensive system tracing. Because the Alpha version of Windows NT uses substantially the same code base as other versions, and therefore executes nearly the same sequence of calls, basic blocks, and data structure accesses, we believe our conclusions are relevant for non-Alpha systems as well. This paper describes our performance studies and interesting aspects of PatchWrx. We conclude
A Comparison of Trace-Sampling Techniques for Multi-Megabyte Caches
- IEEE Transactions on Computers
, 1994
"... This paper compares the trace-sampling techniques of set sampling and time sampling. Using the multi-billion-reference traces of Borg et al., we apply both techniques to multi-megabyte caches, where sampling is most valuable. We evaluate whether either technique meets a 10% sampling goal: a method m ..."
Abstract
-
Cited by 74 (2 self)
- Add to MetaCart
This paper compares the trace-sampling techniques of set sampling and time sampling. Using the multi-billion-reference traces of Borg et al., we apply both techniques to multi-megabyte caches, where sampling is most valuable. We evaluate whether either technique meets a 10% sampling goal: a method meets this goal if, at least 90% of the time, it estimates the trace's true misses per instruction with 10% relative error using 10% of the trace. Results for these traces and caches show that set sampling meets the 10% sampling goal, while time sampling does not. We also find that cold-start bias in time samples is most effectively reduced by the technique of Wood et al. Nevertheless, overcoming cold-start bias requires tens of millions of consecutive references. Index Terms - Cache memory, cache performance, cold start, computer architecture, memory systems, performance evaluation, sampling techniques, trace-driven simulation.
A Model for Estimating Trace-Sample Miss Ratios
- In Proceedings of the 1991 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems
, 1991
"... Unknown references, also known as cold-start misses, arise during trace-driven simulation of uniprocessor caches because of the unknown initial conditions. Accurately estimating the miss ratio of unknown references, denoted by ¯, is particularly important when simulating large caches with short trac ..."
Abstract
-
Cited by 74 (3 self)
- Add to MetaCart
Unknown references, also known as cold-start misses, arise during trace-driven simulation of uniprocessor caches because of the unknown initial conditions. Accurately estimating the miss ratio of unknown references, denoted by ¯, is particularly important when simulating large caches with short trace samples, since many references may be unknown. In this paper we make three contributions regarding ¯. First, we provide empirical evidence that ¯ is much larger than the overall miss ratio (e.g., 0.40 vs. 0.02). Prior work suggests that they should be the same. Second, we develop a model that explains our empirical results for long trace samples. In our model, each block frame is either live, if its next reference will hit, or dead, if its next reference will miss. We model each block frame as an alternating renewal process, and use the renewal-reward theorem to show that ¯ is simply the fraction of time block frames are dead. Finally, we extend the model to handle short trace samples an...
Instruction Fetching: Coping with Code Bloat
- In Proceedings of the 22nd Annual International Symposium on Computer Architecture
, 1995
"... Previous research has shown that the SPEC benchmarks achieve low miss ratios in relatively small instruction caches. This paper presents evidence that current software-development practices produce applications that exhibit substantially higher instruction-cache miss ratios than do the SPEC benchmar ..."
Abstract
-
Cited by 62 (9 self)
- Add to MetaCart
Previous research has shown that the SPEC benchmarks achieve low miss ratios in relatively small instruction caches. This paper presents evidence that current software-development practices produce applications that exhibit substantially higher instruction-cache miss ratios than do the SPEC benchmarks. To represent these trends, we have assembled a collection of applications, called the Instruction Benchmark Suite (IBS), that provides a better test of instruction-cache performance. We discuss the rationale behind the design of IBS and characterize its behavior relative to the SPEC benchmark suite. Our analysis is based on trace-driven and trap-driven simulations and takes into full account both the application and operating-system components of the workloads. This paper then reexamines a collection of previously-proposed hardware mechanisms for improving instruction-fetch performance
Static Cache Simulation and its Applications
, 1994
"... This work takes a fresh look at the simulation of cache memories. It introduces the technique of static cache simulation that statically predicts a large portion of cache references. To efficiently utilize this technique, a method to perform efficient on-the-fly analysis of programs in general is de ..."
Abstract
-
Cited by 41 (13 self)
- Add to MetaCart
This work takes a fresh look at the simulation of cache memories. It introduces the technique of static cache simulation that statically predicts a large portion of cache references. To efficiently utilize this technique, a method to perform efficient on-the-fly analysis of programs in general is developed and proved correct. This method is combined with static cache simulation for a number of applications. The application of fast instruction cache analysis provides a new framework to evaluate instruction cache memories that outperforms even the fastest techniques published. Static cache simulation is shown to address the issue of predicting cache behavior, contrary to the belief that cache memories introduce unpredictability to real-time systems that cannot be efficiently analyzed. Static cache simulation for instruction caches provides a large degree of predictability for real-time systems. In addition, an architectural modification through bit-encoding is introduced that provides fu...
A Design Environment for Addressing Architecture and Compiler Interactions
, 1991
"... This paper presents an environment that integrates the tasks of translating a source program to machine instructions for a proposed architecture, imitating the execution of these instructions, and collecting measurements. The environment, which is easily retargeted and quickly collects detailed meas ..."
Abstract
-
Cited by 38 (24 self)
- Add to MetaCart
This paper presents an environment that integrates the tasks of translating a source program to machine instructions for a proposed architecture, imitating the execution of these instructions, and collecting measurements. The environment, which is easily retargeted and quickly collects detailed measurements, facilitates experimentation with a proposed architecture and a compiler.
BACH: BYU Address Collection Hardware, The Collection of Complete Traces
, 1992
"... Trace driven simulation is an important tool for computer systems performance analysis and prediction, but its accuracy decreases when incomplete or inaccurate traces are used for input. Nevertheless, many memory hierarchy simulation studies have been published which rely on such traces. In this pap ..."
Abstract
-
Cited by 31 (2 self)
- Add to MetaCart
Trace driven simulation is an important tool for computer systems performance analysis and prediction, but its accuracy decreases when incomplete or inaccurate traces are used for input. Nevertheless, many memory hierarchy simulation studies have been published which rely on such traces. In this paper we describe BACH, a hardware monitor developed to capture long, accurate, and complete traces on a variety of hardware and software platforms. BACH traces are long --- traces containing over 200 million contiguous references have been collected to date. BACH traces are accurate --- in contrast to other techniques such as inlining they contain almost no time and space dilation effects. BACH traces are complete --- they contain all references generated by the CPU during tracing, including prefetches and demand fetches from user code, system calls, exceptions, interrupts, and other system code. Finally, the traces produced using BACH are available to members of the general research community...

