POEMS: End-to-End Performance Design of Large Parallel Adaptive Computational Systems
IEEE Transactions on Software Engineering, 2001
Cited by 44 (10 self)
The POEMS project is creating an environment for end-to-end performance modeling of complex parallel and distributed systems, spanning the domains of application software, runtime and operating system software, and hardware architecture. Towards this end, the POEMS framework supports composition of component models from these different domains into an end-to-end system model. This composition can be specified using a generalized graph model of a parallel system, together with interface specifications that carry information about component behaviors and evaluation methods. The POEMS Specification Language compiler, under development, will generate an end-to-end system model automatically from such a specification. The components of the target system may be modeled using different modeling paradigms (analysis, simulation, or direct measurement) and may be modeled at various levels of detail. As a result, evaluation of a POEMS end-to-end system model may require a variety of eval...
Performance Prediction of Large Parallel Applications Using Parallel Simulations
1999
Cited by 26 (11 self)
Accurate simulation of large parallel applications can be facilitated with the use of direct execution and parallel discrete event simulation. This paper describes the use of COMPASS, a direct execution-driven, parallel simulator for performance prediction of programs that include both communication and I/O intensive applications. The simulator has been used to predict the performance of such applications on both distributed memory machines like the IBM SP and shared-memory machines like the SGI Origin 2000. The paper illustrates the usefulness of COMPASS as a versatile performance prediction tool. We use both real-world applications and synthetic benchmarks to study application scalability, sensitivity to communication latency, and the interplay between factors like communication pattern and parallel file system caching on application performance. We also show that the simulator is accurate in its predictions and that it is also efficient in its ability to use parallel si...
Optimizing Threaded MPI Execution on SMP Clusters
In Proc. of 15th ACM International Conference on Supercomputing, 2001
Cited by 23 (1 self)
Our previous work has shown that using threads to execute MPI programs can yield great performance gain on multiprogrammed shared-memory machines. This paper investigates the design and implementation of a thread-based MPI system on SMP clusters. Our study indicates that with a proper design for threaded MPI execution, both point-to-point and collective communication performance can be improved substantially, compared to a process-based MPI implementation in a cluster environment. Our contribution includes a hierarchy-aware and adaptive communication scheme for threaded MPI execution and a thread-safe network device abstraction that uses event-driven synchronization and provides separated collective and point-to-point communication channels. This paper describes the implementation of our design and illustrates its performance advantage on a Linux SMP cluster.
POEMS: End-to-End Performance Design of Large Parallel Adaptive Computational Systems
In Proceedings of First International Workshop on Software and Performance (WOSP), 1998
Cited by 20 (9 self)
The POEMS project is creating an environment for end-to-end performance modeling of complex parallel and distributed systems, spanning the domains of application software, runtime and operating system software, and hardware architecture. To enable end-to-end modeling of large-scale applications and systems, the POEMS framework is designed to compose models of system components from these different domains, to integrate multiple modeling paradigms (analytical modeling, simulation, and actual system execution), and to allow different components to be modeled at multiple levels of detail. The key components of the POEMS framework include a generalized task graph model for describing parallel computations, automatic generation of the task graph by a parallelizing compiler, a specification language for mapping the computation on models for operating system and hardware components, a library of analytical and simulation models for components from the different domains, and a knowledge base d...
Program transformation and runtime support for threaded MPI execution on shared-memory machines
ACM Transactions on Programming Languages and Systems, 2000
Cited by 11 (1 self)
Parallel programs written in MPI have been widely used for developing high-performance applications on various platforms. Because of a restriction of the MPI computation model, conventional MPI implementations on shared memory machines map each MPI node to an OS process, which can suffer serious performance degradation in the presence of multiprogramming. This paper studies compile-time and runtime techniques for enhancing performance portability of MPI code running on multiprogrammed shared memory machines. The proposed techniques allow MPI nodes to be executed safely and efficiently as threads. Compile-time transformation eliminates global and static variables in C code using node-specific data. The runtime support includes an efficient and provably correct communication protocol that uses a lock-free data structure and takes advantage of address space sharing among threads. Experiments on an SGI Origin 2000 show that our MPI prototype, called TMPI, which uses the proposed techniques, is competitive with SGI’s native MPI implementation in a dedicated environment, and that it has significant performance advantages in a multiprogrammed environment.
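As a rough illustration of the idea of replacing global variables with node-specific data, the sketch below gives each thread its own copy of what would otherwise be a shared global. It is a Python analogy only, not TMPI's C transformation: `node_state`, `node_main`, and `run_nodes` are hypothetical names, and `threading.local` merely stands in for node-specific storage.

```python
import threading

# Hypothetical stand-in for a former global variable: every thread
# ("MPI node") sees its own independent attributes on this object.
node_state = threading.local()


def node_main(rank, steps, results):
    """Body of one simulated MPI node running as a thread."""
    node_state.counter = 0            # private to this thread
    for _ in range(steps):
        node_state.counter += rank    # no interference between nodes
    results[rank] = node_state.counter


def run_nodes(num_nodes, steps):
    """Run num_nodes node bodies as threads and collect their counters."""
    results = [None] * num_nodes
    threads = [
        threading.Thread(target=node_main, args=(rank, steps, results))
        for rank in range(num_nodes)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With a plain module-level counter the threads would overwrite each other's value; with thread-local storage each node keeps its own, so `run_nodes(4, 3)` yields one independent total per rank.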
Improving Lookahead in Parallel Discrete Event Simulations of Large-Scale Applications Using Compiler Analysis
In Proc. 15th Workshop on Parallel and Distributed Simulation (PADS 01), Lake Arrowhead, 2001
Cited by 10 (3 self)
This paper addresses the issue of efficient and accurate performance prediction of large-scale message-passing applications on high performance architectures using simulation. Such simulators are often based on parallel discrete event simulation, typically using the conservative protocol to synchronize the simulation threads. The paper considers how a compiler can be used to automatically extract information about the lookahead present in the application, and how this can be used to improve the performance of the null protocol used for synchronization. These techniques are implemented in the MPI-Sim simulator and dHPF compiler, which had previously been extended to work together for optimizing the simulation of local computational components of an application. The results show that the availability of lookahead information improves the runtime of the simulator by factors ranging from 9% up to two orders of magnitude, with 30-60% improvements being typical for the real-world codes. The experiments also show that these improvements are directly correlated with reductions in the number of null messages required by the simulations.
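The link between lookahead and null-message count can be seen in a toy model. The sketch below is not MPI-Sim's protocol; it assumes two logical processes in a cycle with no real events, each of which may only advance its clock to the timestamp promised by the other's latest null message. The function name `null_messages_needed` and the integer time units are assumptions made for illustration.

```python
def null_messages_needed(target_time, lookahead):
    """Count null messages exchanged before either process reaches target_time.

    Each null message carries the promise "sender clock + lookahead",
    which is the earliest timestamp the sender could still emit.
    """
    if lookahead <= 0:
        raise ValueError("a conservative protocol needs positive lookahead")
    clocks = [0, 0]      # simulation clocks of the two logical processes
    sender = 0
    count = 0
    while max(clocks) < target_time:
        promise = clocks[sender] + lookahead
        receiver = 1 - sender
        # The receiver can safely advance up to the promised timestamp.
        clocks[receiver] = max(clocks[receiver], promise)
        count += 1
        sender = receiver
    return count
```

In this model, advancing to time 100 takes 100 null messages with a lookahead of 1 but only 10 with a lookahead of 10, which is the kind of reduction that better lookahead information makes possible.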
Compiler-Supported Simulation of Highly Scalable Parallel Applications
1999
Cited by 10 (8 self)
In this paper, we propose and evaluate practical, automatic techniques that exploit compiler analysis to facilitate simulation of very large message-passing systems. We use a compiler-synthesized static task graph model to identify the control flow and the subset of the computations that determine the parallelism, communication and synchronization of the code, and to generate symbolic estimates of sequential task execution times. This information allows us to avoid executing or simulating large portions of the computational code during the simulation. We have used these techniques to integrate the MPI-Sim parallel simulator at UCLA with the Rice dHPF compiler infrastructure. The integrated system can simulate unmodified High Performance Fortran (HPF) programs compiled to the Message-Passing Interface standard (MPI) by the dHPF compiler, and we expect to simulate MPI programs as well. We evaluate the accuracy and benefits of these techniques for three standard benchmarks on a w...
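A minimal sketch of how a task graph with symbolic cost estimates can replace execution of the computation itself is given below. It is an assumed toy model, not the dHPF/MPI-Sim representation: the task names, the cost formulas, and the function `predicted_finish_time` are all hypothetical. Each task carries a cost expressed as a function of problem parameters, and the predicted run time is the longest dependency chain.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def predicted_finish_time(tasks, deps, params):
    """Evaluate symbolic task costs along the graph without running any task.

    tasks:  task name -> cost function taking the parameter dict
    deps:   task name -> set of tasks that must finish first
    """
    graph = {name: deps.get(name, set()) for name in tasks}
    finish = {}
    for name in TopologicalSorter(graph).static_order():
        start = max((finish[d] for d in graph[name]), default=0.0)
        finish[name] = start + tasks[name](params)
    return max(finish.values())


# Hypothetical three-task program with made-up cost formulas.
tasks = {
    "init": lambda p: 2.0,
    "compute": lambda p: 0.5 * p["n"] / p["procs"],
    "reduce": lambda p: 1.0,
}
deps = {"compute": {"init"}, "reduce": {"compute"}}
```

Calling `predicted_finish_time(tasks, deps, {"n": 64, "procs": 4})` evaluates the formulas for one configuration; changing `n` or `procs` gives a new estimate without rerunning any computational code.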
Asynchronous Parallel Simulation of Parallel Programs
2000
Cited by 9 (5 self)
Parallel simulation of parallel programs for large datasets has been shown to offer significant reduction in the execution time of many discrete event models. This paper describes the design and implementation of MPI-SIM, a library for the execution-driven parallel simulation of task and data parallel programs. MPI-SIM can be used to predict the performance of existing programs written using MPI for message-passing, or written in UC, a data parallel language, compiled to use message-passing. The simulation models can be executed sequentially or in parallel. Parallel execution of the models is synchronized using a set of asynchronous conservative protocols. This paper demonstrates how protocol performance is improved by the use of application-level, runtime analysis. The analysis targets the communication patterns of the application. We show the application-level analysis for message passing and data parallel languages. We present the validation and performance results for the ...
Compile/Run-time Support for Threaded MPI Execution on Multiprogrammed Shared Memory Machines
In Proceedings of the 7th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 1999
Cited by 8 (1 self)
MPI is a message-passing standard widely used for developing high-performance parallel applications. Because of the restriction in the MPI computation model, conventional implementations on shared memory machines map each MPI node to an OS process, which suffers serious performance degradation in the presence of multiprogramming, especially when a space/time sharing policy is employed in OS job scheduling. In this paper, we study compile-time and run-time support for MPI by using threads and demonstrate our optimization techniques for executing a large class of MPI programs written in C. The compile-time transformation adopts thread-specific data structures to eliminate the use of global and static variables in C code. The run-time support includes an efficient point-to-point communication protocol based on a novel lock-free queue management scheme. Our experiments on an SGI Origin 2000 show that our MPI prototype called TMPI using the proposed techniques is competitive with SGI's nati...

