Embra: Fast and Flexible Machine Simulation
- In Measurement and Modeling of Computer Systems, 1996
- Cited by 146 (3 self)
This paper describes Embra, a simulator for the processors, caches, and memory systems of uniprocessors and cache-coherent multiprocessors. When running as part of the SimOS simulation environment, Embra models the processors of a MIPS R3000/R4000 machine faithfully enough to run a commercial operating system and arbitrary user applications. To achieve high simulation speed, Embra uses dynamic binary translation to generate code sequences that simulate the workload. It is the first machine simulator to use this technique. Embra can simulate real workloads such as multiprocess compiles and the SPEC92 benchmarks running on Silicon Graphics' IRIX 5.3 at speeds only 3 to 9 times slower than native execution of the workload, making Embra the fastest reported complete machine simulator. Dynamic binary translation also gives Embra the flexibility to dynamically control both the simulation statistics reported and the simulation model accuracy with low performance overheads. For example, Embra...
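The general idea of dynamic binary translation mentioned in this abstract can be illustrated with a minimal sketch. It is not Embra's design: the three-field instruction format, the opcode names, and the `translate_block` and `run` helpers are all hypothetical, and Python source stands in for generated host machine code. Each straight-line block is translated once, cached by its starting program counter, and reused on later visits.

```python
# Toy dynamic binary translation with a translation cache.
# Assumption: instructions are hypothetical (op, a, b) tuples, not MIPS.

def translate_block(program, pc):
    """Translate the straight-line block starting at pc into a Python function.

    The generated function mutates the register list and returns the next
    program counter, or None when the program halts.
    """
    lines = ["def block(regs):"]
    while True:
        op, a, b = program[pc]
        pc += 1
        if op == "addi":            # regs[a] += constant b
            lines.append(f"    regs[{a}] += {b}")
        elif op == "add":           # regs[a] += regs[b]
            lines.append(f"    regs[{a}] += regs[{b}]")
        elif op == "bnez":          # branch to b if regs[a] != 0, else fall through
            lines.append(f"    return {b} if regs[{a}] != 0 else {pc}")
            break
        elif op == "halt":
            lines.append("    return None")
            break
        else:
            raise ValueError(f"unknown opcode: {op}")
    namespace = {}
    exec("\n".join(lines), namespace)   # compile the generated source once
    return namespace["block"]


def run(program, regs):
    """Execute program via cached translations; return how many blocks were translated."""
    cache = {}
    translations = 0
    pc = 0
    while pc is not None:
        block = cache.get(pc)
        if block is None:           # translate only on first visit
            block = cache[pc] = translate_block(program, pc)
            translations += 1
        pc = block(regs)
    return translations


PROGRAM = [
    ("addi", 0, 5),     # 0: r0 = 5 (loop counter)
    ("add", 1, 0),      # 1: r1 += r0
    ("addi", 0, -1),    # 2: r0 -= 1
    ("bnez", 0, 1),     # 3: loop back to 1 while r0 != 0
    ("halt", 0, 0),     # 4: stop
]
```

Running `run(PROGRAM, [0, 0])` executes the loop body five times but translates it only once, which is where a translator of this kind recovers its cost on workloads that revisit the same code.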
Using the SimOS Machine Simulator to Study Complex Computer Systems
- ACM Transactions on Modeling and Computer Simulation, 1997
- Cited by 144 (5 self)
... This paper identifies two challenges that machine simulators such as SimOS must overcome in order to effectively analyze large complex workloads: handling long workload execution times and collecting data effectively. To study long-running workloads, SimOS includes multiple interchangeable simulation models for each hardware component. By selecting the appropriate combination of simulation models, the user can explicitly control the tradeoff between simulation speed and simulation detail. To handle the large amount of low-level data generated by the hardware simulation models, SimOS contains flexible annotation and event classification mechanisms that map the data back to concepts meaningful to the user. SimOS has been extensively used to study new computer hardware designs, to analyze application performance, and to study operating systems. We include two case studies that demonstrate how a low-level machine simulator such as SimOS can be used to study large and complex workloads.
mlcache: A Flexible Multi-Lateral Cache Simulator
- Cited by 8 (5 self)
As the gap between processor and memory speeds increases, cache performance becomes more critical to overall system performance. To address this, processor designers typically design for the largest possible caches that can still remain on the ever-growing processor die. However, alternate, multi-lateral cache designs such as the Assist Cache, Victim Cache, and NTS Cache have been shown to perform as well as or better than larger, single structure caches while requiring less die area. For a given die size, reducing the requirements to attain a given rate of data supply can allow more space to be dedicated to branch prediction, data forwarding, a larger reorder buffer, etc. Current cache simulators are not able to study a wide variety of multi-lateral cache configurations. Thus, the mlcache multi-lateral cache simulator was developed to help designers in the middle of the design cycle decide which cache configuration would best aid in attaining the desi...
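To make the notion of a multi-lateral design concrete, here is a minimal behavioral sketch of one of the configurations named above, a direct-mapped cache backed by a small victim buffer. It is not mlcache code: the class name, parameters, and swap-on-victim-hit policy are assumptions chosen for illustration, and only hit/miss outcomes are modeled, not timing.

```python
from collections import OrderedDict


class DirectMappedWithVictim:
    """Direct-mapped cache plus a small fully associative victim buffer (FIFO)."""

    def __init__(self, num_sets, victim_entries, line_bytes=32):
        self.num_sets = num_sets
        self.line_bytes = line_bytes
        self.victim_entries = victim_entries
        self.lines = [None] * num_sets      # block number held by each set
        self.victims = OrderedDict()        # evicted block numbers, oldest first
        self.stats = {"hit": 0, "victim_hit": 0, "miss": 0}

    def access(self, addr):
        """Record one access and return 'hit', 'victim_hit', or 'miss'."""
        block = addr // self.line_bytes
        index = block % self.num_sets
        if self.lines[index] == block:
            self.stats["hit"] += 1
            return "hit"

        # Main-cache miss: the requested block replaces the resident one.
        evicted = self.lines[index]
        self.lines[index] = block
        if block in self.victims:
            del self.victims[block]         # block moves back into the main cache
            outcome = "victim_hit"
        else:
            outcome = "miss"

        # The displaced block goes to the victim buffer, dropping the oldest entry.
        if evicted is not None:
            self.victims[evicted] = True
            if len(self.victims) > self.victim_entries:
                self.victims.popitem(last=False)

        self.stats[outcome] += 1
        return outcome
```

With 4 sets and 32-byte lines, addresses 0 and 128 map to the same set. Alternating between them misses every time when `victim_entries` is 0, while with two victim entries every access after the first two is served from the victim buffer.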
Hardware/Software Codesign of the Stanford FLASH Multiprocessor
- In Proceedings of the IEEE Special Issue on Hardware/Software Co-design, 1997
- Cited by 7 (2 self)
Hardware/software codesign is a methodology for solving design problems in systems with processors or embedded controllers where the design requirements mandate a functionality and performance level for the system, independent of the hardware and software boundary. In addition to the challenges of functional correctness and total system performance, design time is often a critical factor. To design MAGIC, the programmable memory and communication controller for the Stanford FLASH multiprocessor, we employed a hardware/software codesign methodology. This methodology allowed us to concurrently design the hardware and software, thereby reducing design time while simultaneously ensuring that the design would meet ambitious performance goals. Serializing the hardware and software design would have lengthened the design time and significantly increased the amount of redesign when the tradeoffs between the hardware and software implementations became clear late in the design process. The codesign approach led us to build a series of hierarchical simulators that allowed us to begin design verification early and to reduce the level of effort required to ensure a functional design.
Early Design Cycle Timing Simulation of Caches
- In University of Michigan, Ann Arbor, 1996
Flexible Timing Simulation of Multiple-Cache Configurations
- 1997
- Cited by 1 (0 self)
As the gap between processor and memory speeds increases, cache performance becomes more critical to overall system performance. Behavioral cache simulation is typically used early in the design cycle of new processor/cache configurations to determine the performance of proposed cache configurations on target workloads. However, behavioral cache simulation does not account for the latency seen by each memory access. The Latency-Effects (LE) cache model presented in this paper accounts for this nominal latency as well as the additional latencies due to trailing-edge effects, bus width considerations, port conflicts, and the number of outstanding accesses that a cache allows before it blocks. We also extend the LE cache model to handle the latency effects of moving data among multiple caches. mlcache, a new, easily configurable and extensible tool, has been built based on the extended LE model. We show the use of mlcache in estimating the performance of traditional and novel cache configurat...
TRANSACTIONS of The Society for Modeling and Simulation International
- In Transactions of the Society for Modeling and Simulation International, 2001
this article. We also thank Prof. Trevor Pearce at SCE, Carleton University, for his help with the final version. Sergio Zlotnik collaborated in the early stages of this project; his work is presented earlier in [51]. The research was partially funded by the Usenix Foundation and by NSERC. It was developed while Gabriel Wainer was an assistant professor at the Computer Sciences Department of the Universidad de Buenos Aires in Argentina
ARMSim: An Instruction-Set Simulator for the ARM processor
A hardware simulator is a piece of software that emulates specific hardware devices, enabling software written and compiled for those devices to execute on alternate systems. This paper describes a simulator for the ARM processor, which is widely used in embedded devices such as PDAs and cellular phones. ARMSim is a lightweight ISA (Instruction Set Architecture) level simulator that also serves as a trace generator. It has some optimizations at the decoder level to improve performance.
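As a rough illustration of what an ISA-level simulator with trace generation looks like, the sketch below runs a fetch-decode-execute loop and records one trace entry per executed instruction. The abstract does not say which decoder optimizations ARMSim uses; memoizing decoded instruction words is an assumption made here for illustration. The 16-bit encoding and the four opcodes are hypothetical and are not ARM instructions.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def decode(word):
    """Split a hypothetical 16-bit word into (opcode, rd, imm); results are memoized."""
    return (word >> 12) & 0xF, (word >> 8) & 0xF, word & 0xFF


def simulate(memory, max_steps=1000):
    """Interpret the program in memory; return (registers, trace)."""
    regs = [0] * 16
    pc = 0
    trace = []                               # one (pc, opcode, rd, imm) entry per step
    for _ in range(max_steps):
        opcode, rd, imm = decode(memory[pc])
        trace.append((pc, opcode, rd, imm))
        if opcode == 0x0:                    # HALT
            break
        elif opcode == 0x1:                  # MOVI rd, imm
            regs[rd] = imm
        elif opcode == 0x2:                  # SUBI rd, imm (32-bit wraparound)
            regs[rd] = (regs[rd] - imm) & 0xFFFFFFFF
        elif opcode == 0x3:                  # BNEZ rd, imm (absolute target)
            if regs[rd] != 0:
                pc = imm
                continue
        else:
            raise ValueError(f"unknown opcode: {opcode:#x}")
        pc += 1
    return regs, trace


PROGRAM = [
    0x1103,   # 0: MOVI r1, 3
    0x2101,   # 1: SUBI r1, 1
    0x3101,   # 2: BNEZ r1, 1
    0x0000,   # 3: HALT
]
```

Calling `simulate(PROGRAM)` executes eight instructions, but `decode` does real work only for the four distinct words; the repeated loop iterations are served from the memoization cache, which `decode.cache_info()` reports.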

