The Wisconsin Wind Tunnel: Virtual Prototyping of Parallel Computers
In Proceedings of the 1993 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, 1993
Cited by 187 (26 self)
We have developed a new technique for evaluating cache-coherent, shared-memory computers. The Wisconsin Wind Tunnel (WWT) runs a parallel shared-memory program on a parallel computer (CM-5) and uses execution-driven, distributed, discrete-event simulation to accurately calculate program execution time. WWT is a virtual prototype that exploits similarities between the system under design (the target) and an existing evaluation platform (the host). The host directly executes all target program instructions and all memory references that hit in the target cache. WWT's shared memory uses the CM-5 memory's error-correcting code (ECC) as valid bits for a fine-grained extension of shared virtual memory. Only memory references that miss in the target cache trap to WWT, which simulates a cache-coherence protocol. WWT correctly interleaves target machine events and calculates target program execution time. WWT runs on parallel computers, which offer greater speed and memory capacity than uniprocessors. WWT'...
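To make the hit/miss split concrete, here is a minimal Python sketch, assuming a toy target cache in which a per-block valid tag stands in for the ECC-based valid bits. The names `TargetCache`, `BLOCK_SIZE`, `HIT_CYCLES`, and `MISS_CYCLES` are hypothetical, and the miss handler only charges a fixed latency; it does not model WWT's actual coherence protocol or its use of the CM-5 hardware.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 32    # hypothetical cache-block size in bytes
HIT_CYCLES = 1     # hypothetical cost of a directly executed hit
MISS_CYCLES = 100  # hypothetical latency charged by the simulated protocol

@dataclass
class TargetCache:
    """Toy target cache: one valid tag per block stands in for WWT's ECC-based valid bits."""
    valid: set = field(default_factory=set)
    cycles: int = 0
    misses: int = 0

    def access(self, address: int) -> None:
        block = address // BLOCK_SIZE
        if block in self.valid:
            # Hit: on the real host this reference executes directly, with no simulator involvement.
            self.cycles += HIT_CYCLES
        else:
            # Miss: the reference "traps" to the simulator, which runs the protocol.
            self.handle_miss(block)

    def handle_miss(self, block: int) -> None:
        # Placeholder protocol action: charge a fixed latency and mark the block valid.
        self.misses += 1
        self.cycles += MISS_CYCLES
        self.valid.add(block)

cache = TargetCache()
for addr in [0, 4, 8, 64, 0, 72]:
    cache.access(addr)
print(cache.misses, cache.cycles)  # prints: 2 204
```

On the real host, hits cost no simulator work at all; the sketch counts them only so the totals are visible.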
Parallel simulation today
Annals of Operations Research, 1994
Cited by 74 (16 self)
Reducing Synchronization Overhead in Parallel Simulation
1995
Cited by 26 (0 self)
Synchronization is often the dominant cost in conservative parallel simulation, particularly in simulations of parallel computers, in which low-latency simulated communication requires frequent synchronization. This thesis presents local barriers and predictive barrier scheduling, two techniques for reducing synchronization overhead in the simulation of message-passing multicomputers. Local barriers use nearest-neighbor synchronization to reduce waiting time at synchronization points. Predictive barrier scheduling, a novel technique which schedules synchronizations using both compile-time and runtime analysis, reduces the frequency of synchronization operations. These techniques were evaluated by comparing their performance to that of periodic global synchronization. Experiments show that local barriers improve performance by up to 24% for communication-bound applications, while predictive barrier scheduling improves performance by up to 65% for applications with long local computation phases. Because the two techniques are complementary, I advocate a combined approach. This work was done in the context of Parallel Proteus, a new parallel simulator of message-passing multicomputers.
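The difference between periodic global barriers and local barriers can be illustrated with a small timing model. This Python sketch assumes processors arranged in a ring that exchange messages only with their two neighbors (the condition that makes a neighbor-only barrier safe), ignores the cost of the barrier itself, and uses hypothetical per-phase work times; it is not Parallel Proteus code.

```python
from typing import List

def global_barrier_time(work: List[List[float]]) -> float:
    """Completion time when all processors meet at a global barrier after every phase.

    work[t][i] is the (hypothetical) computation time of processor i in phase t.
    """
    clock = 0.0
    for phase in work:
        clock += max(phase)  # everyone waits for the slowest processor
    return clock

def local_barrier_time(work: List[List[float]]) -> float:
    """Completion time when each processor waits only for its two ring neighbors."""
    n = len(work[0])
    start = [0.0] * n
    for phase in work:
        finish = [start[i] + phase[i] for i in range(n)]
        # Processor i starts its next phase once it and both neighbors have finished.
        start = [max(finish[(i - 1) % n], finish[i], finish[(i + 1) % n])
                 for i in range(n)]
    return max(start)

work = [
    [4, 1, 1, 1, 1, 1],  # phase 0: processor 0 is slow
    [1, 1, 1, 4, 1, 1],  # phase 1: processor 3 is slow
]
print(global_barrier_time(work), local_barrier_time(work))  # prints: 8.0 5.0
```

Because a slow processor delays only its neighbors in each phase, imbalance in different parts of the ring does not accumulate the way it does under a global barrier.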
Composite synchronization in parallel discrete-event simulation
IEEE Transactions on Parallel and Distributed Systems, 2002
Cost/Performance of a Parallel Computer Simulator
ACM Transactions on Modeling and Computer Simulation, 1994
Lookahead revisited in wireless network simulations
In Proceedings of the 16th Workshop on Parallel and Distributed Simulation (PADS’02), 2002
Cited by 14 (5 self)
Rapid growth in wireless communication systems motivates the development of technology supporting the simulation of large-scale wireless systems. However, it is widely recognized that wireless communications do not have the substantial "lookahead" needed by conservative synchronization protocols. This paper focuses on identifying and exploiting lookahead for such models. We find lookahead in three ways, exploiting characteristics of low-power networks, the transceiver logic, and the way in which protocol stacks are typically constructed. We show how these observations allow a variety of conservative synchronization protocols to take advantage of lookahead, describe the synchronization method we use, and empirically examine the performance this method offers on a large-scale simulation of a sensor network intended for homeland defense scenarios.
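Why lookahead matters can be seen in a generic window-based conservative scheme, assumed here for illustration (it is not the paper's method): each window may safely cover every event earlier than the smallest pending timestamp plus the lookahead. The function name and the integer timestamps (hypothetical microseconds) are inventions of this sketch, which also uses a fixed event list rather than events generated during the run.

```python
from typing import List

def count_windows(event_times: List[int], lookahead: int) -> int:
    """Count the synchronization windows a simple conservative window protocol needs.

    Each window starts at the earliest pending timestamp T. Every event earlier than
    T + lookahead is safe: no process is behind T, so none can send a message
    stamped earlier than T + lookahead.
    """
    if lookahead <= 0:
        raise ValueError("this scheme needs positive lookahead")
    pending = sorted(event_times)
    windows = 0
    i = 0
    while i < len(pending):
        horizon = pending[i] + lookahead
        while i < len(pending) and pending[i] < horizon:
            i += 1  # process the event inside the current safe window
        windows += 1  # one global synchronization closes the window
    return windows

events = [0, 10, 25, 40, 100, 103, 130, 200]  # hypothetical timestamps
print(count_windows(events, 5), count_windows(events, 50))  # prints: 7 3
```

Raising the lookahead from 5 to 50 cuts the number of synchronization windows from 7 to 3 on this event list, which is the kind of saving extra lookahead buys a conservative protocol.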
Design and Performance Analysis of Hardware Support for Parallel Simulations
1993
Cited by 12 (7 self)
It has been established elsewhere [Reyn92] that hardware to support parallel discrete event simulations (PDES) is desirable. We describe the steps leading to the implementation of a hardware-based framework to support PDES. We begin by exploring the criteria needed to make such a framework both practical and useful, concluding that maintaining sequential consistency is sufficient, while "observable" sequential consistency is more desirable but difficult to attain. We derive a functional design from these criteria, and from that a prototype design. We also establish the utility of our design, showing that critical global values, such as global virtual time, can be computed in times at least two orders of magnitude shorter than typical event times in discrete event simulations.
1. Introduction
The need for special-purpose hardware to support efficient parallel d...
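For readers unfamiliar with the term, global virtual time (GVT) in optimistic (Time Warp-style) simulation is the minimum over every logical process's local virtual time and the timestamps of messages still in transit; no rollback can go below it. The Python sketch below computes it in software as a plain sequential reduction, which is the kind of value the paper's hardware aims to deliver quickly; the function and variable names are hypothetical.

```python
from typing import Iterable

def global_virtual_time(local_times: Iterable[float],
                        in_transit: Iterable[float]) -> float:
    """Minimum over each LP's local virtual time (taken here as its earliest
    unprocessed event) and every in-transit message timestamp.

    No rollback can reach below this value, so work stamped earlier can be committed.
    """
    return min([*local_times, *in_transit], default=float("inf"))

lvt = [12.0, 9.5, 30.0]      # hypothetical per-LP local virtual times
transit = [8.0]              # a message sent but not yet received
gvt = global_virtual_time(lvt, transit)
processed = [3.0, 7.5, 9.0]  # timestamps of events already executed optimistically
committed = [t for t in processed if t < gvt]
print(gvt, committed)  # prints: 8.0 [3.0, 7.5]
```

The in-transit message is what keeps GVT at 8.0 rather than 9.5: ignoring it would let the event at 9.0 be committed even though that message could still force a rollback.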
Modeling Cost/Performance of a Parallel Computer Simulator
1996
Cited by 7 (3 self)
Babak Falsafi and David A. Wood
This paper examines the cost/performance of simulating a hypothetical target parallel computer using a commercial host parallel computer. We address the question of whether parallel simulation is simply faster than sequential simulation, or whether it is also more cost-effective. To answer this, we develop a performance model of the Wisconsin Wind Tunnel (WWT), a system that simulates cache-coherent shared-memory machines on a message-passing Thinking Machines CM-5. The performance model uses Kruskal and Weiss's fork-join model to account for the...
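Kruskal and Weiss's fork-join result is often quoted in its large-k form: if each of p processors runs k independent tasks with mean μ and standard deviation σ, the expected time until the slowest one finishes is about kμ + σ√(2k ln p). The Python sketch below evaluates that approximation and checks it against a Monte Carlo simulation with normally distributed task times. The function names, parameters, and distribution are assumptions, and the excerpt does not show how Falsafi and Wood apply the model to WWT.

```python
import math
import random

def kruskal_weiss_estimate(k: int, mu: float, sigma: float, p: int) -> float:
    """Large-k fork-join approximation: p processors, k tasks each, join waits for the slowest."""
    if p < 2:
        return k * mu  # a single processor has no straggler penalty
    return k * mu + sigma * math.sqrt(2 * k * math.log(p))

def simulated_fork_join(k: int, mu: float, sigma: float, p: int,
                        trials: int = 500, seed: int = 1) -> float:
    """Monte Carlo check: mean over trials of the slowest processor's total time,
    with normally distributed task times (an assumption of this sketch)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += max(sum(rng.gauss(mu, sigma) for _ in range(k)) for _ in range(p))
    return total / trials

print(kruskal_weiss_estimate(10, 1.0, 0.3, 32))
print(simulated_fork_join(10, 1.0, 0.3, 32))
```

With 32 processors the simulated mean lands between kμ and the approximation, a reminder that the asymptotic form is only an estimate for modest p.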
Synchronous Parallel Discrete Event Simulation on Shared-Memory Multiprocessors
In 6th Workshop on Parallel and Distributed Simulation, 1992
Cited by 6 (4 self)
This paper describes the implementation and studies the performance of a synchronous parallel discrete event simulation (SPDES) method on a shared-memory multiprocessor. The method aims at efficiently simulating architectural designs for which asynchronous PDES methods seem to be less effective. A multiprocessor machine is simulated, and the performance achieved is compared to that of a parallel version of the synchronous event-driven simulation method (Parsim). The results show that the SPDES method alleviates bottlenecks usually attributed to synchronous methods, and thus we are able to efficiently exploit most of the parallelism available in the simulation of synchronous architectural designs.
1 Introduction
In recent years, advances in the area of computer architecture have resulted in increasingly complex designs. As the complexity of the designs has increased, it has become more and more difficult to study their behavior using analytically trac...
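The phase structure of a synchronous PDES method can be sketched sequentially: all events that share the smallest pending timestamp form one phase, could be executed in parallel, and are followed by a barrier before the clock advances. This Python sketch assumes a hypothetical `fanout` table of (successor, delay) pairs with positive delays; it is not the paper's SPDES kernel or Parsim.

```python
import heapq
from typing import Dict, List, Tuple

def synchronous_phases(initial: List[Tuple[int, str]],
                       fanout: Dict[str, List[Tuple[str, int]]],
                       end_time: int) -> List[List[str]]:
    """Run events in lock-step phases, one phase per distinct timestamp.

    fanout maps a component to (successor, delay) pairs; delays are assumed positive,
    so every new event falls into a later phase.
    """
    queue = list(initial)
    heapq.heapify(queue)
    phases: List[List[str]] = []
    while queue and queue[0][0] <= end_time:
        now = queue[0][0]  # global clock jumps to the earliest pending event
        batch = []
        while queue and queue[0][0] == now:
            batch.append(heapq.heappop(queue)[1])
        # A parallel kernel would divide this batch among processors here.
        for component in batch:
            for successor, delay in fanout.get(component, []):
                heapq.heappush(queue, (now + delay, successor))
        phases.append(batch)  # implicit barrier before the next phase
    return phases

fanout = {"A": [("B", 1), ("C", 1)], "B": [("D", 1)], "C": [("D", 2)]}
phases = synchronous_phases([(0, "A")], fanout, end_time=10)
print(phases)  # prints: [['A'], ['B', 'C'], ['D'], ['D']]
```

The second phase holds two independent events, which is exactly the parallelism a synchronous method exposes; phases with a single event are where the per-phase barrier cost dominates.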
Improved Parallel Architectural Simulations on Shared-Memory Multiprocessors
In Proceedings of the 8th Workshop on Parallel and Distributed Simulation, 1994
Cited by 5 (2 self)
In this paper we present a synchronous, parallel, event-driven approach (SPaDES). It differs from previous approaches in several ways: (1) it performs a single global synchronization operation per simulation phase; (2) it introduces a minimal number of kernel operations into the simulation; (3) it allows for efficient processor self-scheduling; and (4) it aggressively exposes parallelism only...
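Two of the listed properties, a single global synchronization per phase and processor self-scheduling, can be combined in a short threaded Python sketch. Workers claim events from the current phase through a shared counter, and the `action` callback of `threading.Barrier`, which runs in one thread once all workers arrive, builds the next phase, so each phase ends with exactly one barrier. The event format, `fanout` table, and `run_self_scheduled` name are hypothetical, and Python threads illustrate the structure rather than real speedup.

```python
import heapq
import threading
from typing import Dict, List, Tuple

Event = Tuple[int, str]  # (timestamp, component); hypothetical event format

def run_self_scheduled(initial: List[Event],
                       fanout: Dict[str, List[Tuple[str, int]]],
                       workers: int = 4) -> List[Event]:
    """Synchronous phases with processor self-scheduling and one barrier per phase."""
    queue = list(initial)
    heapq.heapify(queue)
    lock = threading.Lock()
    state = {"now": 0, "batch": [], "next": 0, "done": False}
    log: List[Event] = []

    def start_next_phase() -> None:
        # Barrier action: runs in exactly one thread while the others are blocked.
        if not queue:
            state["done"] = True
            return
        now = queue[0][0]
        batch = []
        while queue and queue[0][0] == now:
            batch.append(heapq.heappop(queue)[1])
        state.update(now=now, batch=batch, next=0)

    barrier = threading.Barrier(workers, action=start_next_phase)

    def worker() -> None:
        barrier.wait()  # initial rendezvous; the action sets up the first phase
        while not state["done"]:
            while True:
                with lock:  # self-scheduling: claim the next unprocessed event
                    if state["next"] >= len(state["batch"]):
                        break
                    component = state["batch"][state["next"]]
                    state["next"] += 1
                # "Execute" the event: schedule its successors after positive delays.
                new_events = [(state["now"] + d, s) for s, d in fanout.get(component, [])]
                with lock:
                    log.append((state["now"], component))
                    for event in new_events:
                        heapq.heappush(queue, event)
            barrier.wait()  # the single global synchronization of this phase

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(log)

fanout = {"A": [("B", 1), ("C", 1)], "B": [("D", 1)], "C": [("D", 2)]}
result = run_self_scheduled([(0, "A")], fanout)
print(result)  # prints: [(0, 'A'), (1, 'B'), (1, 'C'), (2, 'D'), (3, 'D')]
```

Because the barrier action runs while every worker is blocked, it can touch the shared queue without taking the lock.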

