Embra: Fast and Flexible Machine Simulation
- In Measurement and Modeling of Computer Systems
, 1996
Cited by 146 (3 self)
This paper describes Embra, a simulator for the processors, caches, and memory systems of uniprocessors and cache-coherent multiprocessors. When running as part of the SimOS simulation environment, Embra models the processors of a MIPS R3000/R4000 machine faithfully enough to run a commercial operating system and arbitrary user applications. To achieve high simulation speed, Embra uses dynamic binary translation to generate code sequences that simulate the workload. It is the first machine simulator to use this technique. Embra can simulate real workloads such as multiprocess compiles and the SPEC92 benchmarks running on Silicon Graphics' IRIX 5.3 at speeds only 3 to 9 times slower than native execution of the workload, making Embra the fastest reported complete machine simulator. Dynamic binary translation also gives Embra the flexibility to dynamically control both the simulation statistics reported and the simulation model accuracy with low performance overheads. For example, Embra...
Platune: A Tuning Framework for System-on-a-Chip Platforms
, 2002
Cited by 34 (9 self)
System-on-a-chip (SOC) platform manufacturers are increasingly adding configurable features that provide power and performance flexibility in order to increase a platform's applicability. This paper presents a framework, called Platune, for performance and power tuning of one such SOC platform. Platune is used to simulate an embedded application that is mapped onto the SOC platform and output performance and power metrics for any configuration of the SOC platform. Furthermore, Platune is used to automatically explore the large configuration space of such an SOC platform. The versatility, in terms of accuracy and speed of exploration, of Platune is demonstrated experimentally using three large benchmark examples. The power estimation techniques for processors, caches, memories, buses, and peripherals combined with the design space exploration algorithm deployed by Platune form a methodology for design of tuning frameworks for parameterized SOC platforms in general.
DiSenS: Scalable Distributed Sensor Network Simulation
- In Proceedings of ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 07)
, 2005
Cited by 8 (4 self)
Simulation is widely used for developing, evaluating, and analyzing sensor network applications, especially when deploying a large-scale sensor network remains expensive and labor-intensive. However, because simulation is computation-intensive, existing simulation tools have to make trade-offs between fidelity and scalability and thus offer limited capabilities as design and analysis tools. In this paper, we introduce DiSenS (DIstributed SENsor network Simulation), a highly scalable distributed simulation system for sensor networks. DiSenS not only faithfully emulates an extensive set of sensor hardware and supports extensible radio/power models, so that sensor network applications can be simulated transparently with high fidelity, but also employs a distributed-memory parallel cluster system to attack the complex simulation problem. Combining an efficient distributed synchronization protocol and a sophisticated node partitioning algorithm (based on existing research), DiSenS achieves greater scalability than even many discrete event simulators. On a small to medium size cluster (16-64 nodes), DiSenS is able to simulate hundreds of motes at real-time speed and scale to thousands at sub-real-time speed. To our knowledge, DiSenS is the first full-system sensor network simulator with such scalability.
A workload generation environment for trace-driven simulation of shared-bus multiprocessor
- Proc. 30th Ann. Hawaii Int. Conf. on System Sciences
, 1997
Cited by 2 (2 self)
We describe an environment to produce traces representing significant workloads for a shared-bus shared-memory multiprocessor used as a general-purpose multitasking machine, where each processor can include multithread facilities. By means of an exclusively software approach, the environment produces traces that include both user and kernel references, starting from source traces containing only user references. The process scheduling and the virtual-to-physical address translation are simulated, whereas a stochastic model is provided for the generation of the kernel reference stream. The paper includes a section describing the generation of three different workloads used to evaluate the performance of a shared-bus shared-memory multiprocessor.
Performance Characterization of Shared Attraction Memories in Cluster-Based COMA Multiprocessors
- Master's Thesis, SICS Research Report
, 1997
Cited by 2 (1 self)
The performance of a COMA multiprocessor greatly depends on the efficiency of the large node caches also known as attraction memories. This thesis investigates the behaviour of the attraction memories when they are shared by more than one processor. From experiments with program-driven simulation it has been found that clustering significantly improves the performance of the attraction memory. Read and migratory sharing reduce the number of misses in a shared attraction memory machine and as a consequence there is also a reduction in network traffic as well as the number of replacements. Conversely, there is a demand for higher attraction memory bandwidth in a clustered machine since the attraction memory is shared by several processors. Unallocated memory in a COMA allows data to be replicated between several attraction memories. Our simulations have shown that performance can be maintained with less unallocated memory in a clustered machine since shared data within a cluster can shar...
A Study of the Efficiency of Shared Attraction Memories in Cluster-Based COMA Multiprocessors
, 1997
Cited by 1 (1 self)
The performance of a COMA multiprocessor greatly depends on the efficiency of the large node caches, the attraction memories. When more than one processor shares an attraction memory, its behavior changes. From experiments with program-driven simulation we have found that clustering may improve the performance of the attraction memory significantly. Traffic is reduced, and the miss rates are lower for shared attraction memories. However, clustering may introduce contention for the attraction memory that may ruin any potential performance gain from increased attraction memory hit rate. Provided enough local bandwidth, application execution can remain efficient at higher memory pressure in clustered systems than in systems with single processor nodes. At very high memory pressure some applications change behavior and start suffering from clustering. This is caused by conflict misses due to the relatively lower associativity of the shared attraction memory.
SimGate: Full-System, Cycle-Close Simulation of the Stargate Sensor Network Intermediate Node
- In Proceedings of International Conference on Embedded Computer Systems: Architectures, MOdeling, and Simulation (IC-SAMOS), 2006. Samos
Cited by 1 (1 self)
We present SimGate, a full-system simulator for the Stargate intermediate-level, resource-constrained sensor network device. We empirically evaluate the accuracy and performance of the system in isolation as well as coupled with simulated Mica2 motes. Our system is functionally correct and achieves accurate cycle estimation (i.e., cycle-close). Moreover, the overhead of simulated execution is modest with respect to previously published work.

