Results 1-10 of 84
KOJAK - a tool set for automatic performance analysis of parallel applications
In Proc. of the European Conference on Parallel Computing (EuroPar), 2003
Cited by 61 (5 self)
Abstract. Today’s parallel computers with SMP nodes provide both multithreading and message passing as their modes of parallel execution. As a consequence, performance analysis and optimization become more difficult and create a need for advanced performance tools that are custom-made for this class of computing environments. Current state-of-the-art tools provide valuable assistance in analyzing the performance of MPI and OpenMP programs by visualizing the runtime behavior and calculating statistics over the performance data. However, the developer of parallel programs is still required to filter out the relevant parts from a huge amount of low-level information shown in numerous displays and to map that information onto program abstractions without tool support. The KOJAK project (Kit for Objective Judgement and Knowledge-based Detection of Performance Bottlenecks) aims at the development of a generic automatic performance analysis environment for parallel programs. Performance problems are specified in terms of execution patterns that represent situations of inefficient behavior. These patterns are input for an analysis process that rec...
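As a rough illustration of what an execution pattern can look like, the sketch below detects a well-known example, the Late Sender situation (a receiver enters a blocking receive before the matching send has started and therefore waits), in a single global trace. The `Event` record, its fields, and the in-order matching per (sender, receiver, tag) channel are simplifying assumptions for illustration, not KOJAK's trace format or pattern API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    rank: int     # process that recorded the event
    kind: str     # "send" or "recv" (hypothetical event types)
    enter: float  # time at which the MPI call was entered
    peer: int     # partner rank of the point-to-point operation
    tag: int = 0


def late_sender_waits(trace):
    """Return {receiver rank: total Late Sender waiting time} for a global trace."""
    sends = defaultdict(list)  # (src, dst, tag) -> sends in program order
    recvs = defaultdict(list)  # (src, dst, tag) -> receives in program order
    for ev in sorted(trace, key=lambda e: (e.rank, e.enter)):
        if ev.kind == "send":
            sends[(ev.rank, ev.peer, ev.tag)].append(ev)
        elif ev.kind == "recv":
            recvs[(ev.peer, ev.rank, ev.tag)].append(ev)

    waits = defaultdict(float)
    for channel, channel_recvs in recvs.items():
        # Messages on one channel are assumed non-overtaking, so pair them in order.
        for send, recv in zip(sends.get(channel, []), channel_recvs):
            if send.enter > recv.enter:  # receiver entered first and had to wait
                waits[recv.rank] += send.enter - recv.enter
    return dict(waits)


trace = [
    Event(rank=0, kind="send", enter=5.0, peer=1),
    Event(rank=1, kind="recv", enter=2.0, peer=0),
    Event(rank=1, kind="send", enter=6.0, peer=0),
    Event(rank=0, kind="recv", enter=7.0, peer=1),
]
print(late_sender_waits(trace))  # {1: 3.0}
```

Rank 1 enters its receive at time 2.0 but the matching send only starts at 5.0, so 3.0 time units are attributed to the pattern; rank 0's receive starts after the send, so it records no wait.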
Scalable parallel trace-based performance analysis
In Proc. 13th European PVM/MPI Conference, 2006
Cited by 55 (23 self)
Abstract. Automatic trace analysis is an effective method for identifying complex performance phenomena in parallel applications. However, as the size of parallel systems and the number of processors used by individual applications continue to grow, the traditional approach of analyzing a single global trace file, as done by KOJAK’s EXPERT trace analyzer, becomes increasingly constrained by the large number of events. In this article, we present a scalable version of the EXPERT analysis based on analyzing separate local trace files with a parallel tool which ‘replays’ the target application’s communication behavior. We describe the new parallel analyzer architecture and discuss first empirical results.
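The replay idea can be sketched in miniature: each process reads only its own local trace and re-enacts its communication, forwarding the timestamp of every send to the matching receiver, which then computes its waiting time locally. In this sketch, Python threads and one `queue.Queue` per ordered (sender, receiver) pair stand in for the parallel processes and messages of a real replay, and the `(kind, enter, peer)` event tuples are an assumed format.

```python
import queue
import threading


def replay(local_traces):
    """local_traces: {rank: [(kind, enter, peer), ...]} in program order.

    Returns {rank: total Late Sender waiting time observed by that rank}.
    """
    ranks = list(local_traces)
    # One FIFO channel per ordered (src, dst) pair mimics in-order message delivery.
    channels = {(s, d): queue.Queue() for s in ranks for d in ranks}
    results = {}

    def worker(rank):
        wait = 0.0
        for kind, enter, peer in local_traces[rank]:
            if kind == "send":
                # Forward this send's enter timestamp to the receiver.
                channels[(rank, peer)].put(enter)
            elif kind == "recv":
                # Block until the matching send's timestamp arrives, like the original receive.
                send_enter = channels[(peer, rank)].get()
                wait += max(0.0, send_enter - enter)
        results[rank] = wait  # each thread writes only its own key

    threads = [threading.Thread(target=worker, args=(r,)) for r in ranks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return dict(sorted(results.items()))


local_traces = {
    0: [("send", 5.0, 1), ("recv", 7.0, 1)],
    1: [("recv", 2.0, 0), ("send", 6.0, 0)],
}
print(replay(local_traces))  # {0: 0.0, 1: 3.0}
```

No worker ever sees another process's trace; only individual timestamps travel between them, which is the property that lets an analysis of separate local trace files avoid the single global trace.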
The SCALASCA performance toolset architecture
In International Workshop on Scalable Tools for High-End Computing (STHEC), 2008
Cited by 48 (8 self)
SCALASCA (www.scalasca.org) is a performance toolset that has been specifically designed to analyze parallel application execution behavior on large-scale systems. It offers an incremental performance-analysis procedure that integrates runtime summaries with in-depth studies of concurrent behavior via event tracing, adopting a strategy of successively refined measurement configurations. Distinctive features are its ability to identify wait states in applications with very large numbers of processes and to combine these with efficiently summarized local measurements. In this article, we review the current toolset architecture, emphasizing its scalable design and the role of the different components in transforming raw measurement data into knowledge of application execution behavior. The scalability and effectiveness of SCALASCA are then surveyed from experience measuring and analyzing real-world applications on a range of computer systems.
An Algebra for Cross-Experiment Performance Analysis
In Proc. of the International Conference on Parallel Processing (ICPP), 2004
Cited by 42 (22 self)
Performance tuning of parallel applications usually involves multiple experiments to compare the effects of different optimization strategies. This article describes an algebra that can be used to compare, integrate, and summarize performance data from multiple sources. The algebra consists of a data model to represent the data in a platform-independent fashion, plus arithmetic operations to merge, subtract, and average the data from different experiments. A distinctive feature of this approach is its closure property, which allows all instances of the data model to be processed and viewed in the same way (regardless of whether they represent original or derived data) and permits arbitrary, easy composition of operations.
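A minimal sketch of the closure idea, assuming a deliberately simplified data model: an experiment is a mapping from hypothetical (metric, call path, location) keys to values, and every operation returns another `Experiment`, so derived results can be fed back into further operations. The conflict policy in `merge` (first operand wins) and treating missing entries as zero are assumptions for illustration, not the published algebra's exact semantics.

```python
from dataclasses import dataclass

# (metric, call path, location): a simplified stand-in for the real data model
Key = tuple[str, str, int]


@dataclass
class Experiment:
    data: dict[Key, float]

    def __sub__(self, other: "Experiment") -> "Experiment":
        # Difference over the union of keys; a missing entry counts as 0.
        keys = self.data.keys() | other.data.keys()
        return Experiment({k: self.data.get(k, 0.0) - other.data.get(k, 0.0) for k in keys})


def merge(a: Experiment, b: Experiment) -> Experiment:
    # Union of both experiments; on conflicting keys, a's value is kept (assumed policy).
    return Experiment({**b.data, **a.data})


def mean(*exps: Experiment) -> Experiment:
    # Element-wise average; a key missing from an experiment contributes 0.
    keys = set().union(*(e.data.keys() for e in exps))
    return Experiment({k: sum(e.data.get(k, 0.0) for e in exps) / len(exps) for k in keys})


before = Experiment({("time", "main/solve", 0): 10.0, ("time", "main/io", 0): 2.0})
after = Experiment({("time", "main/solve", 0): 7.0, ("time", "main/io", 0): 2.5})

diff = before - after                        # derived data, still an Experiment
avg_vs_after = mean(before, after) - after   # operations compose freely
print(diff.data[("time", "main/solve", 0)])  # 3.0
```

Because `-`, `merge`, and `mean` all accept and return the same type, a difference of averages needs no special handling: the derived experiment is indistinguishable from an original one.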
Efficient Pattern Search in Large Traces through Successive Refinement
In Proc. of the European Conference on Parallel Computing (EuroPar), 2004
Cited by 21 (11 self)
Abstract. Event tracing is a well-accepted technique for post-mortem performance analysis of parallel applications. The EXPERT tool supports the analysis of large traces by automatically searching them for execution patterns that indicate inefficient behavior. However, the current search algorithm works with independent pattern specifications and ignores the specialization hierarchy existing between them, resulting in long analysis times caused by repeated matching attempts as well as in replicated code. This article describes an optimized design that takes advantage of specialization relationships, leading to a significant runtime improvement as well as to more compact pattern specifications.
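The benefit of exploiting the specialization hierarchy can be illustrated with a toy search: patterns form a tree, and a specialization is tested only when its more general parent has matched, so whole subtrees are pruned. The pattern names and tree shape, the dictionary-based events, and the precomputed `wait` field are hypothetical; this is a sketch of the pruning idea, not EXPERT's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Pattern:
    name: str
    matches: Callable[[dict], bool]
    children: list["Pattern"] = field(default_factory=list)  # specializations


def search(pattern: Pattern, event: dict, found: list[str]) -> int:
    """Append matching pattern names to `found`; return the number of tests performed."""
    tests = 1
    if not pattern.matches(event):
        return tests  # no specialization can match if the general pattern does not
    found.append(pattern.name)
    for child in pattern.children:
        tests += search(child, event, found)
    return tests


late_sender = Pattern("Late Sender", lambda e: e["region"] == "MPI_Recv" and e["wait"] > 0)
wait_barrier = Pattern("Wait at Barrier", lambda e: e["region"] == "MPI_Barrier" and e["wait"] > 0)
p2p = Pattern("P2P", lambda e: e["region"] in ("MPI_Send", "MPI_Recv"), [late_sender])
collective = Pattern("Collective", lambda e: e["region"] == "MPI_Barrier", [wait_barrier])
mpi = Pattern("MPI", lambda e: e["region"].startswith("MPI_"), [p2p, collective])

found = []
print(search(mpi, {"region": "MPI_Recv", "wait": 1.5}, found), found)
# 4 ['MPI', 'P2P', 'Late Sender']
```

A flat search over independent specifications would test all five patterns for every event; here the receive event needs four tests, and an event outside MPI is rejected after a single test.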
Automatic trace-based performance analysis of metacomputing applications
In International Parallel and Distributed Processing Symposium, 2007
Cited by 13 (2 self)
The processing power and memory capacity of independent and heterogeneous parallel machines can be combined to form a single parallel system that is more powerful than any of its constituents. However, achieving satisfactory application performance on such a metacomputer is hard because the high latency of inter-machine communication, as well as differences in the hardware of the constituent machines, may introduce various types of wait states. In earlier work, we demonstrated that automatic pattern search in event traces can identify the sources of wait states in parallel applications running on a single computer. In this article, we describe how this approach can be extended to metacomputing environments, with special emphasis on performance problems related to inter-machine communication. In addition, we demonstrate the benefits of our solution using a real-world multi-physics application.
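One simple way to give inter-machine communication special emphasis is to attribute each detected wait to the machines involved. The sketch below splits waiting time into intra- and inter-machine parts, assuming the waits have already been detected and are given as (receiver, sender, time) triples, and that a hypothetical `machine_of` map assigns each rank to its machine; it only illustrates the idea and is not the authors' implementation.

```python
def split_by_machine(waits, machine_of):
    """waits: iterable of (receiver rank, sender rank, waiting time).
    machine_of: {rank: machine name}. Returns total waiting time per category."""
    totals = {"intra-machine": 0.0, "inter-machine": 0.0}
    for receiver, sender, seconds in waits:
        same = machine_of[receiver] == machine_of[sender]
        totals["intra-machine" if same else "inter-machine"] += seconds
    return totals


machine_of = {0: "A", 1: "A", 2: "B"}  # ranks 0 and 1 on machine A, rank 2 on machine B
waits = [(1, 0, 0.5), (2, 0, 2.0), (0, 2, 1.25)]
print(split_by_machine(waits, machine_of))
# {'intra-machine': 0.5, 'inter-machine': 3.25}
```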
ompP: A profiling tool for OpenMP
In: Proceedings of the First International Workshop on OpenMP (IWOMP 2005), 2005
Cited by 13 (4 self)
In this paper we present a simple but useful profiling tool for OpenMP applications similar in spirit to the MPI profiler mpiP [15]. We describe the implementation of our tool and demonstrate its functionality on a number of test applications.
Performance analysis of large-scale OpenMP and hybrid MPI/OpenMP applications with VampirNG
In Proceedings of the First International Workshop on OpenMP (IWOMP 2005), 2005
Cited by 11 (0 self)
Abstract. This paper presents a tool setup for comprehensive event-based performance analysis of large-scale OpenMP and hybrid OpenMP/MPI applications. The KOJAK framework is used for portable code instrumentation and automatic analysis, while the new VampirNG infrastructure serves as a generic visualization engine for both OpenMP and MPI performance properties. The tools share the same database, which enables a smooth transition from automatic bottleneck detection to manual in-depth visualization and analysis. Because VampirNG is a distributed data-parallel architecture, large problems on very large-scale systems can be addressed.
Instrumentation and compiler optimizations for MPI/OpenMP applications
In International Workshop on OpenMP (IWOMP), 2006
Cited by 10 (6 self)
Abstract. This article describes how the integration of the OpenUH OpenMP compiler with the KOJAK performance analysis tool can assist developers of OpenMP and hybrid codes in optimizing their applications with as little user intervention as possible. In particular, we (i) describe how the compiler’s ability to automatically instrument user code down to the flow-graph level can improve the location of performance problems and (ii) outline how the performance feedback provided by KOJAK will direct the compiler’s optimization decisions in the future. To demonstrate our methodology, we present experimental results showing how the reasons for the performance slowdown of the ASPCG benchmark could be identified.
Holistic Hardware Counter Performance Analysis of Parallel Programs
In Proceedings of Parallel Computing 2005 (ParCo 2005), Malaga, Spain, September 2005
Cited by 9 (5 self)