Results 1 -
2 of
2
pdes scale in environments with heterogeneous delays?” in Proc. of the 2013 ACM SIGSIM conference on Principles of advanced discrete simulation, 2013. TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS 11 Jingjing Wang is a PhD student in the Department of
- Professor in the Department of Computer Science at SUNY Binghamton. His
, 1997
"... The performance and scalability of Parallel Discrete Event Simulation (PDES) is often limited by communication latencies and overheads. The emergence of multi-core processors and their expected evolution into many-cores offers the promise of low latency communication and tight memory integration bet ..."
Abstract
-
Cited by 3 (3 self)
- Add to MetaCart
(Show Context)
The performance and scalability of Parallel Discrete Event Simulation (PDES) is often limited by communication latencies and overheads. The emergence of multi-core processors and their expected evolution into many-cores offers the promise of low latency communication and tight memory integration between cores; these properties should significantly improve the performance of PDES in such environments. However, on clusters of multi-cores (CMs), the latency and processing overheads incurred when communicating between different machines (nodes) far outweigh those between cores on the same chip, especially when commodity networking fabrics and communication software are used. It is unclear if there is any benefit to the low latency among cores on the same node given that communication links across nodes are significantly worse. In this study, we examine the performance of a multi-threaded implementation of PDES on CMs. We demonstrate that the internode communication costs impose a substantial bottleneck on PDES and demonstrate that without optimizations addressing these long latencies, multi-threaded PDES does not significantly outperform the multiprocess version despite direct communication through shared memory on the individual nodes. We then propose three optimizations: message consolidation and routing, infrequent polling and latencysensitive model partitioning. We show that with these optimizations in place, threaded implementation of PDES significantly outperforms process-based implementation even on
Parallel Discrete Event Simulation for Multi-core Systems: Analysis and Optimization
"... Abstract—Parallel Discrete Event Simulation (PDES) can substantially improve the performance and capacity of simulation, allowing the study of larger, more detailed models, in less time. PDES is a fine-grained parallel application whose performance and scalability is limited by communication latenci ..."
Abstract
-
Cited by 2 (2 self)
- Add to MetaCart
(Show Context)
Abstract—Parallel Discrete Event Simulation (PDES) can substantially improve the performance and capacity of simulation, allowing the study of larger, more detailed models, in less time. PDES is a fine-grained parallel application whose performance and scalability is limited by communication latencies. Traditionally, PDES simulation kernels use message passing; often these simulators are written for distributed environments, and shared memory is used to optimize message passing among processes on the same machine. In this paper, we develop, characterize and optimize a thread-based version of a PDES simulator on three representative multi-core platforms. The multi-threaded implementation eliminates multiple message copying and significantly minimizes synchronization delays. We study the performance of the simulator on three hardware platforms: an Intel Core i7 machine, and a 48-core AMD Opteron Magny-Cours system, and a 64-core Tilera TilePro64. We discover that the three platforms encounter substantially different bottlenecks because of their different architectures. We identify these bottlenecks and propose mechanisms to overcome them. Our results show that multi-threaded implementation improves the performance over an MPI-based version by up to a factor of 3 on the Core i7, 1.4 on the AMD Magny-Cours, and 2.8 on the Tilera Tile64.