Results 1 -
5 of
5
PHANTOM: predicting performance of parallel applications on large-scale parallel machines using a single node
- In Proceedings of the 15th ACM SIGPLAN symposium on Principles and practice of parallel programming, PPoPP ’10
, 2010
"... For designers of large-scale parallel computers, it is greatly desired that performance of parallel applications can be predicted at the design phase. However, this is difficult because the execution time of parallel applications is determined by several factors, including sequential computation tim ..."
Abstract
-
Cited by 32 (2 self)
- Add to MetaCart
(Show Context)
For designers of large-scale parallel computers, it is greatly desired that performance of parallel applications can be predicted at the design phase. However, this is difficult because the execution time of parallel applications is determined by several factors, including sequential computation time in each process, communication time and their convolution. Despite previous efforts, it remains an open problem to estimate sequential computation time in each process accurately and efficiently for large-scale parallel applications on non-existing target machines. This paper proposes a novel approach to predict the sequential computation time accurately and efficiently. We assume that there is at least one node of the target platform but the whole target system need not be available. We make two main technical contributions. First, we employ deterministic replay techniques to execute
MPIWiz: subgroup reproducible replay of MPI applications
- in Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming, ser. PPoPP '09
"... ABSTRACT Message Passing Interface (MPI) is a widely used standard for managing coarse-grained concurrency on distributed computers. Debugging parallel MPI applications, however, has always been a particularly challenging task due to their high degree of concurrent execution and non-deterministic b ..."
Abstract
-
Cited by 15 (3 self)
- Add to MetaCart
(Show Context)
ABSTRACT Message Passing Interface (MPI) is a widely used standard for managing coarse-grained concurrency on distributed computers. Debugging parallel MPI applications, however, has always been a particularly challenging task due to their high degree of concurrent execution and non-deterministic behavior. Deterministic replay is a potentially powerful technique for addressing these challenges, with existing MPI replay tools adopting either data-replay or orderreplay approaches. Unfortunately, each approach has its tradeoffs. Data-replay generates substantial log sizes by recording every communication message. Order-replay generates small logs, but requires all processes to be replayed together. We believe that these drawbacks are the primary reasons that inhibit the wide adoption of deterministic replay as the critical enabler of cyclic debugging of MPI applications. This paper describes subgroup reproducible replay (SRR), a hybrid deterministic replay method that provides the benefits of both data-replay and order-replay while balancing their trade-offs. SRR divides all processes into disjoint groups. It records the contents of messages crossing group boundaries as in data-replay, but records just message orderings for communication within a group as in order-replay. In this way, SRR can exploit the communication locality of traffic patterns in MPI applications. During replay, developers can then replay each group individually. SRR reduces recording overhead by not recording intra-group communication, and at the same time reduces replay overhead by limiting the size of each replay group. Exposing these tradeoffs gives the user the necessary control for making deterministic replay practical for MPI applications. We have implemented a prototype, MPIWiz, to demonstrate and evaluate SRR. MPIWiz employs a replay framework that allows transparent binary instrumentation of both library and system calls. As a result, MPIWiz replays MPI applications with no source code modification and relinking, and handles non-determinism in both MPI and OS system calls. Our preliminary results show that MPIWiz can reduce recording overhead by over a factor of four relative to data-replay, yet without requiring the entire application to be replayed as in order-replay. Recording increases execution time by 27% while the application can be replayed in just 53% of its base execution time.
The Design of MDB: a Macrodebugger for Wireless Embedded Networks
, 2009
"... Creating and debugging programs for WENs is notoriously difficult. Macroprogramming is an emerging technology that aims to address this problem by providing high-level programming ab-stractions. We present MDB, the first system to support the debugging of macroprograms. MDB allows the user to set br ..."
Abstract
- Add to MetaCart
(Show Context)
Creating and debugging programs for WENs is notoriously difficult. Macroprogramming is an emerging technology that aims to address this problem by providing high-level programming ab-stractions. We present MDB, the first system to support the debugging of macroprograms. MDB allows the user to set breakpoints and step through a macroprogram using a source-level debugging interface similar to GDB, a process we call macrodebugging. A key challenge of MDB is to step through a macroprogram in sequential order even though it executes on the network in a distributed, asynchronous manner. Besides allowing the user to view distributed state, MDB also provides the ability to search for bugs over the entire history of distributed states. Finally, MDB allows the user to make hypothetical changes to a macroprogram and to see the effect on distributed state without the need to redeploy, execute, and test the new code. We show that macrodebugging is both easy and efficient: MDB consumes few system resources and requires few user commands to find the cause of bugs. We also provide a lightweight version of MDB called MDB Lite that can be used during the deployment phase to reduce resource consumption while still eliminating the possibility of heisenbugs: changes in the manifestation of bugs caused by enabling or disabling the debugger.
Application Development for Cyber-Physical Systems: Programming Language Concepts and Case Studies
, 2012
"... by ..."
(Show Context)
CONCURRENT SOFTWARE
, 2011
"... This Dissertation-Open Access is brought to you for free and open access ..."
(Show Context)