Trace-Driven Memory Simulation: A Survey
- ACM Computing Surveys
, 2004
Cited by 134 (0 self)
This article surveys and analyzes these developments by establishing criteria for evaluating trace-driven methods, and then applies these criteria to describe, categorize, and compare over 50 trace-driven simulation tools. We discuss the strengths and weaknesses of different approaches and show that no single method is best when all criteria, including accuracy, speed, memory, flexibility, portability, expense, and ease of use, are considered. In a concluding section, we examine fundamental limitations to trace-driven simulation, and survey some recent developments in memory simulation that may overcome these bottlenecks.
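For readers unfamiliar with the technique, the following is a minimal sketch of what a trace-driven memory simulation does: it replays a recorded list of addresses through a cache model and counts hits and misses. It is not taken from the survey or from any of the tools it compares. The function name `simulate_direct_mapped`, the direct-mapped organization, and the default sizes are all assumptions chosen for illustration.

```python
def simulate_direct_mapped(trace, cache_size=1024, line_size=32):
    """Replay byte addresses through a direct-mapped cache model.

    Hypothetical illustration: sizes are in bytes and the trace is a
    plain list of integer addresses. Returns (hits, misses).
    """
    num_lines = cache_size // line_size
    tags = [None] * num_lines          # one stored tag per cache line
    hits = misses = 0
    for addr in trace:
        block = addr // line_size      # memory block holding this address
        index = block % num_lines      # cache line the block maps to
        tag = block // num_lines       # identifies which block is resident
        if tags[index] == tag:
            hits += 1
        else:
            misses += 1
            tags[index] = tag          # fill the line on a miss
    return hits, misses


if __name__ == "__main__":
    # Addresses 0 and 1024 map to the same line and evict each other.
    print(simulate_direct_mapped([0, 4, 1024, 0, 32]))
```

Real tools differ mainly in how the trace is collected and how detailed the memory model is, which is where the trade-offs among accuracy, speed, and flexibility named in the abstract arise.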
Efficient Procedure Mapping using Cache Line Coloring
- In Proceedings of the SIGPLAN '97 Conference on Programming Language Design and Implementation
, 1997
Cited by 67 (12 self)
As the gap between memory and processor performance continues to widen, it becomes increasingly important to exploit cache memory effectively. Both hardware and software approaches can be explored to optimize cache performance. Hardware designers focus on cache organization issues, including replacement policy, associativity, line size and the resulting cache access time. Software writers use various optimization techniques, including software prefetching, data scheduling and code reordering. Our focus is on improving memory usage through code reordering compiler techniques. In this
Operating System Impact on Trace-Driven Simulation
- Proc. of the 31st Annual Simulation Symposium
, 1998
Cited by 7 (2 self)
Trace-driven simulation is commonly used by the computer architecture research community to pursue answers to a wide variety of architectural design issues. Traces taken from benchmark execution have been extensively studied to optimize the design of pipelines, branch predictors, and especially cache memories. Today's computer designs have been optimized based on the characteristics of these benchmarks. One important aspect that has been ignored in a majority of these trace-driven studies is the effect of the operating system interacting with the benchmark. It has been acknowledged that operating system overhead can introduce a level of interference that can limit the benefits of new designs. The reason why the operating system has been, for the most part, ignored in these studies is the lack of readily available tools that can generate kernel-laden traces. In this paper we describe two tracing systems that allow the capture of operating system and application traces. We have captured...
Tracing and Characterization of NT-based System Workloads
- Digital Technical Journal
, 1998
Cited by 7 (0 self)
Trace-driven simulation is commonly used by the computer architecture research community to pursue answers to a wide variety of architectural design issues. Traces taken from benchmark execution (e.g., SPEC, Bytemark, SPLASH) have been studied extensively to optimize the design of pipelines, branch predictors, and especially cache memories. Today's computer designs have been optimized based on the characteristics of these benchmarks. As applications become more dependent on services and APIs provided by the hosting operating system, the overall application performance becomes more dependent on efficient operating system interaction. It has been acknowledged that operating system overhead can greatly affect the benefits provided by a new design feature. The reason why the operating system interaction has, for the most part, been ignored in past architectural studies is the lack of available tools that can generate kernel-laden traces. In this contribution we describe the ongoing efforts...
Performance analysis on a CC-NUMA prototype
, 1997
Cited by 6 (3 self)
Cache-coherent nonuniform memory access (CC-NUMA) machines have been shown to be a promising paradigm for exploiting distributed execution. CC-NUMA systems can provide performance typically associated with parallel machines, without the high cost associated with parallel programming. This is because a single image of memory is provided on a CC-NUMA machine. Past research on CC-NUMA machines has focused on modifications to the memory hierarchy, interconnect topology and memory consistency protocols, which are all areas critical to achieving scalable performance. The research described here expands this focus to issues associated with operating system structures which can increase system scalability. We describe a hardware/software prototyping study which investigates how changes to the operating system of a commercial IBM AS/400 system can provide scalable performance when running transaction processing workloads. The project described was a joint effort between researchers at the...
Flexible Simulation of Distributed Protocols for Mobile Computing
- In Proc. 3rd Workshop on Modeling, Analysis and Simulation of Wireless and Mobile Systems (MSWIM)
, 2000
Cited by 1 (1 self)
This article describes MobiCS (Mobile Computing Simulator), a distributed protocol simulator for mobile computing that facilitates the prototyping and testing of protocols based on high-level programming abstractions and simulation transparency. The main contribution of MobiCS is the implementation of a software architecture for simulators that supports interchangeable simulation modes without affecting the distributed protocols being prototyped. Through MobiCS we aim at providing a unified tool both for testing the correctness of a protocol and for evaluating its performance in a simulated environment.
Performance analysis on a CC-NUMA prototype
, 1997
Cache-coherent nonuniform memory access (CC-NUMA) machines have been shown to be a promising paradigm for exploiting distributed execution. CC-NUMA systems can provide performance typically associated with parallel machines, without the high cost associated with parallel programming.
Cache Line Coloring Procedure Placement Using Real and Estimated Profiles
Efficient exploitation of the available cache memory space can significantly improve program performance. By carefully restructuring a program such that temporally local sequences of instructions are mapped to different portions of the cache, fewer cache conflicts will result. In this paper we present a link-time procedure mapping algorithm which views the cache as a colored address space, each color representing a different cache line. The idea of coloring is borrowed from the register allocation problem. In this work we consider both the size of the cache and the size of a cache line when coloring procedures to indicate their current mapping in the cache space. The algorithm takes as input a weighted program call graph, with nodes representing procedures and edge weights representing call frequencies, and uses it to provide an improved program layout. The results in this paper are presented for reducing the ...
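The coloring idea in the abstract above can be made concrete with a small sketch. This is not the authors' mapping algorithm; it only shows how a procedure's start address and size translate into a set of cache-line "colors", and how a weighted call graph can score a given layout. The names `procedure_colors` and `conflict_weight`, the direct-mapped cache, and the default sizes are assumptions made for illustration.

```python
def procedure_colors(start_addr, size, cache_size=8192, line_size=32):
    """Return the set of cache-line colors a procedure occupies.

    Assumes a direct-mapped cache; addresses and sizes are in bytes.
    """
    num_colors = cache_size // line_size
    first_block = start_addr // line_size
    last_block = (start_addr + size - 1) // line_size
    if last_block - first_block + 1 >= num_colors:
        return set(range(num_colors))      # procedure covers the whole cache
    return {b % num_colors for b in range(first_block, last_block + 1)}


def conflict_weight(layout, call_graph, cache_size=8192, line_size=32):
    """Sum call frequencies over edges whose endpoints share a color.

    layout: {name: (start_addr, size)}
    call_graph: {(caller, callee): call_frequency}
    """
    total = 0
    for (caller, callee), freq in call_graph.items():
        a = procedure_colors(*layout[caller], cache_size, line_size)
        b = procedure_colors(*layout[callee], cache_size, line_size)
        if a & b:                          # overlapping colors may conflict
            total += freq
    return total


if __name__ == "__main__":
    graph = {("f", "g"): 10, ("f", "h"): 5}
    before = {"f": (0, 64), "g": (256, 32), "h": (64, 32)}
    after = {"f": (0, 64), "g": (96, 32), "h": (64, 32)}
    print(conflict_weight(before, graph, 256, 32))
    print(conflict_weight(after, graph, 256, 32))
```

A placement algorithm in this spirit would try to choose start addresses so that heavily weighted caller/callee pairs receive disjoint color sets, which is what the second layout in the usage example does for the hypothetical procedures `f` and `g`.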

