© Copyright Operating Systems Review 2009. COTSon: Infrastructure for Full System Simulation
Citations
1420 | The SPLASH-2 programs: characterization and methodological considerations. In ISCA
- Woo, Ohara, et al.
- 1995
Citation Context ...roblem, COTSon feeds back scheduling information to raise the priority of software threads with a low number of instructions. Figure 4 shows the accumulated IPC (throughput) over time of two SPLASH-2 [28] benchmarks, FFT and Barnes, executing in a manycore machine with a varying number of cores (64–1024 cores). Each line in the graph corresponds to a different number of cores; the y axis shows accumul...
991 | Pin: building customized program analysis tools with dynamic instrumentation
- Luk
- 2005
Citation Context ...d timing simulators tightly coupled, which makes it easier for the timing to control the functional. Trace-driven simulators are usually built using instrumentation libraries such as Atom [23] or Pin [13], which simplify the functional simulation by running natively in the machine. As a middle ground, Mauer et al. [14] propose a timing-first approach where the timing simulator runs ahead and uses the f...
848 | Scalable Molecular Dynamics with NAMD
- Phillips, Braun, et al.
- 2005
Citation Context ...uanta. The results shown in this figure correspond to a cluster with 8 nodes. We present two sets of results, a first one using the NAS benchmark suite [17], and a second one using the NAMD benchmark [19]. [Figure 3 plot: Simulation Speedup versus Accuracy Error; series NAS and NAMD at 10, 100, 1000, dyn 1, dyn 2] Figure 3: Adaptive quantum resu...
818 | Parallel discrete event simulation
- Fujimoto
- 1990
Citation Context ... COTSon distributes the simulation of the different cores over multiple hosts. Synchronizing these COTSon node instances can be accomplished using Parallel Discrete Event Simulation (PDES) techniques [9, 15], all of which basically require simulation to synchronize at given intervals, called quanta. Unfortunately, doing so in a straightforward way implies forcing very small synchronization quanta, smalle...
783 | Atom: A system for building customized program analysis tools.
- Srivastava, Eustace
- 1994
Citation Context ...unctional and timing simulators tightly coupled, which makes it easier for the timing to control the functional. Trace-driven simulators are usually built using instrumentation libraries such as Atom [23] or Pin [13], which simplify the functional simulation by running natively in the machine. As a middle ground, Mauer et al. [14] propose a timing-first approach where the timing simulator runs ahead an...
778 | Automatically characterizing large scale program behavior
- Sherwood
- 2002
Citation Context ...tistics of the complete execution. Previous work has shown that an adequate sampler can yield excellent simulation accuracy. The two most cited samplers for microarchitectural simulation are SimPoint [22] and SMARTS [29]. 4.1.1 SMARTS SMARTS employs systematic sampling. It makes use of statistical analysis in order to determine the number of instructions that need to be simulated in the desired benchm...
426 | Standard Performance Evaluation Corporation
Citation Context ...ccuracy trade-offs of the proposed Dynamic Sampling approach and how it compares with SMARTS and SimPoint sampling techniques. In these experiments, we simulate the whole SPEC CPU2000 benchmark suite [24] using the reference input until completion or until they reach 240 billion instructions, whichever occurs first. On the x axis we plot the accuracy error versus what we obtain in a full-timing run (s...
288 | Distributed discrete-event simulation
- Misra
- 1986
Citation Context ... COTSon distributes the simulation of the different cores over multiple hosts. Synchronizing these COTSon node instances can be accomplished using Parallel Discrete Event Simulation (PDES) techniques [9, 15], all of which basically require simulation to synchronize at given intervals, called quanta. Unfortunately, doing so in a straightforward way implies forcing very small synchronization quanta, smalle...
285 | QEMU, a fast and portable dynamic translator.
- Bellard
- 2005
Citation Context ... be precise and have been historically used to verify correctness of systems and to do early software development before the hardware is available. Recently some emulators (such as SimOS [21] or QEMU [4]) became fast enough to approximate native execution. These have evolved into virtual machines and hypervisors which have been used to isolate, consolidate, encapsulate and also provide hardware indep...
258 | SMARTS: Accelerating microarchitecture simulation via rigorous statistical sampling. In ISCA
- Wunderlich, Wenisch, et al.
- 2003
Citation Context ...omplete execution. Previous work has shown that an adequate sampler can yield excellent simulation accuracy. The two most cited samplers for microarchitectural simulation are SimPoint [22] and SMARTS [29]. 4.1.1 SMARTS SMARTS employs systematic sampling. It makes use of statistical analysis in order to determine the number of instructions that need to be simulated in the desired benchmark (number of s...
191 | Complete Computer System Simulation: the SimOS Approach
- Rosenblum, Herrod, et al.
- 1995
Citation Context ...mulators must be precise and have been historically used to verify correctness of systems and to do early software development before the hardware is available. Recently some emulators (such as SimOS [21] or QEMU [4]) became fast enough to approximate native execution. These have evolved into virtual machines and hypervisors which have been used to isolate, consolidate, encapsulate and also provide ha...
91 | Synergistic Processing in Cell’s Multicore Architecture.
- Gschwind, Hofstee, et al.
- 2006
Citation Context ...te with relative accuracy between different timing simulations is enough for users to discover trends for the proposed techniques. If we look at the trajectory of general purpose multicore processors [6, 10, 18], we can see that the number of cores per die is expected to grow quadratically with each new generation. Some specialized parts are already appearing with up to hundreds of cores [1, 3]. This trend b...
82 | Full-System Timing-First Simulation.
- Mauer, Hill, et al.
- 2002
Citation Context ... lightweight cores. It also adds enormous pressure to the simulation infrastructure. A defining characteristic of simulators is the control relationship between their functional and timing components [14]. In timing-directed simulation (also called execution-driven), the timing model is responsible for driving the functional simulation. The execution-driven approach allows for higher simulation accurac...
74 | Choosing representative slices of program execution for microarchitecture simulations: A preliminary application to the data stream. In Workload characterization of emerging computer applications
- Lafage, Seznec
- 2001
Citation Context ...CPU events is several orders of magnitude bigger than that of any other kind of device. It was clear that optimizing simulation performance came through optimizing CPU simulation. Sampling techniques [11] selectively turn on and off timing simulation, and are among the most promising for improving timing simulation. Other techniques, such as using a reduced input set or simulating just an initial port...
69 | Tile64-Processor: A 64-Core SoC with Mesh Interconnect.
- Bell, Edwards, et al.
- 2008
Citation Context ...ocessors [6, 10, 18], we can see that the number of cores per die is expected to grow quadratically with each new generation. Some specialized parts are already appearing with up to hundreds of cores [1, 3]. This trend broadens the variability of architectural design choices, such as cache hierarchy, heterogeneity, and use of lightweight cores. It also adds enormous pressure to the simulation infrastruc...
64 | Corona: System implications of emerging nanophotonic technology.
- Vantrease, Schreiber, et al.
- 2008
Citation Context ...re threads that were previously mapped to hardware threads into threads mapped to each of the cores. We believe that this approach, which has already been used successfully in several research papers [25, 27], is an important first step towards the simulation of chip multiprocessors and future manycore architectures. COTSon augments the functional simulator to identify the instruction streams. Instruction...
55 | The strong correlation between code signatures and performance
- Lau, Sampson, et al.
- 2005
Citation Context ...nd is a clear indicator of the behavior of the running applications. The correlation of changes in code locality with overall performance is a property that other researchers have already established [12]. The SimNow simulator also keeps track of statistics of its internal structures, such as the translation cache and the software translation lookaside buffer (TLB, necessary for efficient implementati...
55 | A Comprehensive Memory Modeling Tool and its Application to the Design and Analysis of Future Memory Hierarchies. In ISCA
- Thoziyoor, Ahn, et al.
- 2008
Citation Context ...re threads that were previously mapped to hardware threads into threads mapped to each of the cores. We believe that this approach, which has already been used successfully in several research papers [25, 27], is an important first step towards the simulation of chip multiprocessors and future manycore architectures. COTSon augments the functional simulator to identify the instruction streams. Instruction...
42 | Characterizing and comparing prevailing simulation techniques
- Yi, Kodakara, et al.
- 2005
Citation Context ...rarely try to optimize their functional simulation. After all, it represents just a minuscule part of their total execution. The best way of speeding up timing simulation is considered to be sampling [31]. Sampling consists of determining what are the interesting or representative phases of the simulation and just simulating those. The results from these samples are then combined to produce global res...
33 | An Integrated Quad-Core Opteron Processor
- Dorsey
Citation Context ...te with relative accuracy between different timing simulations is enough for users to discover trends for the proposed techniques. If we look at the trajectory of general purpose multicore processors [6, 10, 18], we can see that the number of cores per die is expected to grow quadratically with each new generation. Some specialized parts are already appearing with up to hundreds of cores [1, 3]. This trend b...
21 | The future of simulation: A field of dreams.
- Yi, Eeckhout, et al.
- 2006
Citation Context ...researchers, developers and system designers understand the impact of their design decisions. The panel discussion of the 2004 Intl. Symp. on Performance Analysis of Systems and Software (captured in [30]) presents five important suggestions: 1) allow for multiprocessor and multithreaded simulation of operating systems (OS) and applications, 2) improve sampling techniques, 3) use higher-speed alternat...
18 | Combining Simulation and Virtualization through Dynamic Sampling. In ISPASS
- Falcon, Faraboschi, et al.
- 2007
Citation Context ..., requiring dynamic information for each instruction, preventing COTSon from ever running in the functional phase which produces the biggest speed improvement. 4.1.4 Dynamic Sampling Dynamic Sampling [7] dynamically adapts the timing simulation to the application characteristics (where application includes the full system simulation). This approach has two fundamental advantages: 1) it frees COTSon f...
18 | VMware's Virtual Platform: A Virtual Machine Monitor for Commodity PCs
- Rosenblum
- 1999
Citation Context ...me compiling and code caching. The functional simulation is handled by the SimNow simulator which has a typical slowdown of 10× with respect to native execution. Other virtual machines such as VMware [20] or QEMU [4] have smaller slowdowns of around 25%, but their lower functional fidelity and limited range of supported devices make them unsuitable for a full-system simulator. To speed up timing simula...
12 | How to simulate 1000 cores.
- Monchiero, Ahn, et al.
- 2009
Citation Context ...e BIOS and the OS need substantial changes that enable them to manage a large number of cores in an efficient way. In parallel with the more general approach, COTSon has implemented a novel technique [16], which, although more limited in its applicability, shows great potential for a quick study of several high-performance computing benchmark suites. COTSon converts time-multiplexed threads into spa...
11 | An Adaptive Synchronization Technique for Parallel Simulation of Networked Clusters
- Falcon, Faraboschi, et al.
- 2008
Citation Context ... few microseconds, the overhead of perfect synchronization would cause about two orders of magnitude slowdown in cluster simulations. We have implemented an adaptive quantum synchronization technique [8] that follows COTSon’s accuracy-vs.-speed trade-off philosophy. Since applications are not always sending packets, they do not need to work at the smallest synchronization quantum during intervals whe...
4 | SimNow: Fast platform simulation purely in software
- Bedichek
- 2004
Citation Context ...ators. These are very complicated to build and maintain, but on the other hand, there are plenty of them being developed and maintained. COTSon’s functional simulator uses AMD’s SimNow™ simulator [2]. COTSon’s decoupled architecture is highly modular. Many interfaces have been designed and built with sufficient generality so that exchanging many of COTSon’s functionalities is easy. This modularit...
4 | The NAS parallel benchmarks. http://www.nas.nasa.gov/Resources/Software/npb.html
- Center
Citation Context ...y compare with the experiments run with bigger quanta. The results shown in this figure correspond to a cluster with 8 nodes. We present two sets of results, a first one using the NAS benchmark suite [17], and a second one using the NAMD benchmark [19]. [Figure plot: Simulation Speedup; series NAS and NAMD at 10, 100, 1000, dyn 1, dyn 2]...
3 | Massively Parallel Processor Array technology. http://www.ambric.com
- Ambric
Citation Context ...ocessors [6, 10, 18], we can see that the number of cores per die is expected to grow quadratically with each new generation. Some specialized parts are already appearing with up to hundreds of cores [1, 3]. This trend broadens the variability of architectural design choices, such as cache hierarchy, heterogeneity, and use of lightweight cores. It also adds enormous pressure to the simulation infrastruc...
3 | An 8-core, 64-thread, 64-bit power efficient
- Johnson, Nawathe
- 2007
Citation Context ...te with relative accuracy between different timing simulations is enough for users to discover trends for the proposed techniques. If we look at the trajectory of general purpose multicore processors [6, 10, 18], we can see that the number of cores per die is expected to grow quadratically with each new generation. Some specialized parts are already appearing with up to hundreds of cores [1, 3]. This trend b...