
Automatic Performance Analysis of Hybrid MPI/OpenMP Applications. (2003)

by Felix Wolf, Bernd Mohr
Venue: Proceedings

Results 1 - 10 of 84

KOJAK - a tool set for automatic performance analysis of parallel applications

by Bernd Mohr, Felix Wolf - In Proc. of the European Conference on Parallel Computing (EuroPar), 2003
Cited by 61 (5 self)
Abstract. Today’s parallel computers with SMP nodes provide both multithreading and message passing as their modes of parallel execution. As a consequence, performance analysis and optimization becomes more difficult and creates a need for advanced performance tools that are custom made for this class of computing environments. Current state-of-the-art tools provide valuable assistance in analyzing the performance of MPI and OpenMP programs by visualizing the run-time behavior and calculating statistics over the performance data. However, the developer of parallel programs is still required to filter out relevant parts from a huge amount of low-level information shown in numerous displays and map that information onto program abstractions without tool support. The KOJAK project (Kit for Objective Judgement and Knowledge-based Detection of Performance Bottlenecks) is aiming at the development of a generic automatic performance analysis environment for parallel programs. Performance problems are specified in terms of execution patterns that represent situations of inefficient behavior. These patterns are input for an analysis process that rec-

Citation Context

...ysis and presentation features, see Felix Wolf’s Ph.D. thesis [2]. The theoretical aspects can also be found in [3]. A more detailed overview (than this short description) about KOJAK can be found in [1]. – Details on instrumentation of OpenMP applications based on the POMP interface are described in [6] and [7]. – More information on the source-code instrumentation of user functions can be found on ...

Scalable parallel trace-based performance analysis

by Markus Geimer, Felix Wolf, Brian J. N. Wylie, Bernd Mohr - In Proc. 13th European PVM/MPI Conference , 2006
Cited by 55 (23 self)
Abstract. Automatic trace analysis is an effective method for identifying complex performance phenomena in parallel applications. However, as the size of parallel systems and the number of processors used by individual applications is continuously raised, the traditional approach of analyzing a single global trace file, as done by KOJAK’s EXPERT trace analyzer, becomes increasingly constrained by the large number of events. In this article, we present a scalable version of the EXPERT analysis based on analyzing separate local trace files with a parallel tool which ‘replays’ the target application’s communication behavior. We describe the new parallel analyzer architecture and discuss first empirical results.

Citation Context

...cution behavior using a zoomable time-line display. However, in view of the large amounts of data usually generated, automatic off-line trace analyzers, such as the EXPERT tool from the KOJAK toolset [3,4], can provide relevant information more quickly by automatically searching traces for complex patterns of inefficient behavior and quantifying their significance. In addition to usually being faster t...

The SCALASCA performance toolset architecture

by Markus Geimer, Felix Wolf, Brian J. N. Wylie, Erika Ábrahám, Daniel Becker, Bernd Mohr (Forschungszentrum Jülich) - In International Workshop on Scalable Tools for High-End Computing (STHEC), 2008
Cited by 48 (8 self)
SCALASCA (www.scalasca.org) is a performance toolset that has been specifically designed to analyze parallel application execution behavior on large-scale systems. It offers an incremental performance-analysis procedure that integrates runtime summaries with in-depth studies of concurrent behavior via event tracing, adopting a strategy of successively refined measurement configurations. Distinctive features are its ability to identify wait states in applications with very large numbers of processes and combine these with efficiently summarized local measurements. In this article, we review the current toolset architecture, emphasizing its scalable design and the role of the different components in transforming raw measurement data into knowledge of application execution behavior. The scalability and effectiveness of SCALASCA are then surveyed from experience measuring and analyzing real-world applications on a range of computer systems.

Citation Context

...to escalating memory requirements, limited I/O bandwidth, or renditions that fail). Developed at Jülich Supercomputing Centre in cooperation with the University of Tennessee as the successor of KOJAK [16], SCALASCA is an open-source performance-analysis toolset that has been specifically designed for use on large-scale systems including IBM Blue Gene and Cray XT, but is also well-suited for small- and...

An Algebra for Cross-Experiment Performance Analysis

by Fengguang Song, Felix Wolf, Nikhil Bhatia, Jack Dongarra, Shirley Moore - In Proc. of the International Conference on Parallel Processing (ICPP), 2004
Cited by 42 (22 self)
Performance tuning of parallel applications usually involves multiple experiments to compare the effects of different optimization strategies. This article describes an algebra that can be used to compare, integrate, and summarize performance data from multiple sources. The algebra consists of a data model to represent the data in a platform-independent fashion plus arithmetic operations to merge, subtract, and average the data from different experiments. A distinctive feature of this approach is its closure property, which allows processing and viewing all instances of the data model in the same way, regardless of whether they represent original or derived data, in addition to an arbitrary and easy composition of operations.

Citation Context

...erent performance metrics recorded by different tools and using different modes of the same tool into one highly-integrated view. CUBE is currently used by two performance tools: CONE [21] and EXPERT [22, 23]. As CUBE provides a generic API, every tool producing performance data matching the very general CUBE data model can take advantage of the CUBE algebra and display. CONE. CONE is a call-graph profile...
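
The closure property described in this abstract can be illustrated with a small sketch. This is not the CUBE API or the authors' data model: the `Experiment` class, the (metric, call path, location) key layout, and the function names are all assumptions made for illustration. The point it shows is that every operation returns another instance of the same data model, so operations compose freely.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

# Hypothetical key layout: (metric name, call path, process/thread id).
Key = Tuple[str, str, int]


@dataclass(frozen=True)
class Experiment:
    """Hypothetical data model: one severity value per (metric, call path, location)."""
    values: Dict[Key, float] = field(default_factory=dict)

    def _combine(self, other: "Experiment",
                 op: Callable[[float, float], float]) -> "Experiment":
        # Entries missing from one operand are treated as 0.0.
        keys = set(self.values) | set(other.values)
        return Experiment({k: op(self.values.get(k, 0.0),
                                 other.values.get(k, 0.0)) for k in keys})

    def difference(self, other: "Experiment") -> "Experiment":
        """Subtract `other` from this experiment, entry by entry."""
        return self._combine(other, lambda a, b: a - b)

    def merge(self, other: "Experiment") -> "Experiment":
        """Integrate both experiments; on a key clash this experiment wins."""
        return Experiment({**other.values, **self.values})


def mean(*experiments: Experiment) -> Experiment:
    """Average any number of experiments, entry by entry."""
    keys = set().union(*(e.values for e in experiments))
    n = len(experiments)
    return Experiment({k: sum(e.values.get(k, 0.0) for e in experiments) / n
                       for k in keys})
```

Because `difference`, `merge`, and `mean` all return an `Experiment`, a derived result such as `mean(run_a, run_b).difference(baseline)` can be processed and displayed exactly like measured data.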

Efficient Pattern Search in Large Traces through Successive Refinement

by Felix Wolf, Bernd Mohr, Jack Dongarra, Shirley Moore - In Proc. of the European Conference on Parallel Computing (EuroPar), 2004
Cited by 21 (11 self)
Abstract. Event tracing is a well-accepted technique for post-mortem performance analysis of parallel applications. The EXPERT tool supports the analysis of large traces by automatically searching them for execution patterns that indicate inefficient behavior. However, the current search algorithm works with independent pattern specifications and ignores the specialization hierarchy existing between them, resulting in a long analysis time caused by repeated matching attempts as well as in replicated code. This article describes an optimized design taking advantage of specialization relationships and leading to a significant runtime improvement as well as to more compact pattern specifications.

Citation Context

... can provide the user with the desired information more quickly by automatically transforming the data into a more compact representation on a higher level of abstraction. The EXPERT performance tool [9] supports the performance analysis of MPI and/or OpenMP applications by automatically searching traces for execution patterns that indicate inefficient behavior. The performance problems addressed inc...
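
The idea of exploiting a specialization hierarchy between patterns can be sketched roughly as follows. This is not EXPERT's implementation: the `Pattern` class, the event fields `region` and `wait_time`, and the three example patterns are hypothetical. The sketch only shows the general principle that a specialized pattern is tested only after its more general parent has matched, so a failed general match prunes every specialization beneath it.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Hypothetical event record, e.g. {"region": "MPI_Recv", "wait_time": 0.5}.
Event = Dict[str, Any]


@dataclass
class Pattern:
    """Hypothetical pattern node: a predicate plus its specializations."""
    name: str
    matches: Callable[[Event], bool]
    children: List["Pattern"] = field(default_factory=list)


def find_matches(root: Pattern, event: Event) -> List[str]:
    """Return the names of all patterns matching `event`.

    Children are visited only when the parent matched, so each
    general condition is evaluated once instead of once per
    specialized pattern.
    """
    if not root.matches(event):
        return []  # prunes the whole subtree of specializations
    names = [root.name]
    for child in root.children:
        names.extend(find_matches(child, event))
    return names


# Illustrative hierarchy: any MPI call > receive > receive that had to wait.
hierarchy = Pattern(
    "mpi_call",
    lambda e: str(e["region"]).startswith("MPI_"),
    [Pattern(
        "receive",
        lambda e: e["region"] == "MPI_Recv",
        [Pattern("waiting_receive", lambda e: e["wait_time"] > 0)],
    )],
)
```

For example, `find_matches(hierarchy, {"region": "compute", "wait_time": 0.0})` evaluates only the top-level predicate and never touches the two specializations.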

Automatic trace-based performance analysis of metacomputing applications

by Daniel Becker, Felix Wolf, Wolfgang Frings, Markus Geimer, Brian J. N. Wylie, Bernd Mohr (Forschungszentrum Jülich) - In International Parallel and Distributed Processing Symposium, 2007
Cited by 13 (2 self)
The processing power and memory capacity of independent and heterogeneous parallel machines can be combined to form a single parallel system that is more powerful than any of its constituents. However, achieving satisfactory application performance on such a metacomputer is hard because the high latency of inter-machine communication as well as differences in hardware of constituent machines may introduce various types of wait states. In our earlier work, we have demonstrated that automatic pattern search in event traces can identify the sources of wait states in parallel applications running on a single computer. In this article, we describe how this approach can be extended to metacomputing environments with special emphasis on performance problems related to inter-machine communication. In addition, we demonstrate the benefits of our solution using a real-world multi-physics application.

Citation Context

...e optimization for a single machine is already a non-trivial task that requires substantial tool support, we argue that this is even more important for metacomputing environments. In our earlier work [18], we have shown that automatic analysis of event traces is an effective method for identifying complex performance phenomena in parallel applications. Time-stamped events, such as entering a function ...

ompP: A profiling tool for OpenMP

by Karl Fürlinger, Michael Gerndt - In: Proceedings of the First International Workshop on OpenMP (IWOMP 2005), 2005
Cited by 13 (4 self)
In this paper we present a simple but useful profiling tool for OpenMP applications similar in spirit to the MPI profiler mpiP [15]. We describe the implementation of our tool and demonstrate its functionality on a number of test applications.

Citation Context

...nce in parallel region”. Previous work already tested existing OpenMP performance analysis tools with respect to their ability to detect the performance problems in the ATS framework [2]. With Expert [16], also a POMP-based tool was tested and generally with ompP a developer is able to detect the same set of OpenMP related problems as Expert (although with Expert the process is somewhat more automated...

Performance analysis of large-scale OpenMP and hybrid MPI/OpenMP applications with VampirNG

by Holger Brunst, Bernd Mohr - In Proceedings of the First International Workshop on OpenMP (IWOMP 2005), 2005
Cited by 11 (0 self)
Abstract. This paper presents a tool setup for comprehensive event-based performance analysis of large-scale OpenMP and hybrid OpenMP/MPI applications. The KOJAK framework is used for portable code instrumentation and automatic analysis while the new Vampir NG infrastructure serves as generic visualization engine for both OpenMP and MPI performance properties. The tools share the same data base which enables a smooth transition from bottleneck auto-detection to manual in-depth visualization and analysis. With Vampir NG being a distributed data-parallel architecture, large problems on very large scale systems can be addressed.

Citation Context

...ess, the user has a fully instrumented executable. Running this executable generates a trace file in the EPILOG format. After program termination, the trace file is fed into the EXPERT analyzer. (See [11] for details of the automatic analysis, which is outside of the scope of this paper.) In addition, the automatic analysis can be combined with a manual analysis using Vampir [12] or Vampir NG [13], wh...

Instrumentation and compiler optimizations for MPI/OpenMP applications

by Oscar Hern, Fengguang Song, Barbara Chapman, Jack Dongarra, Bernd Mohr, Shirley Moore - In: International Workshop on OpenMP (IWOMP), 2006
Cited by 10 (6 self)
Abstract. This article describes how the integration of the OpenUH OpenMP compiler with the KOJAK performance analysis tool can assist developers of OpenMP and hybrid codes in optimizing their applications with as little user intervention as possible. In particular, we (i) describe how the compiler’s ability to automatically instrument user code down to the flow-graph level can improve the location of performance problems and (ii) outline how the performance feedback provided by KOJAK will direct the compiler’s optimization decisions in the future. To demonstrate our methodology, we present experimental results showing how reasons for the performance slowdown of the ASPCG benchmark could be identified.

Citation Context

...ty of sources of feedback. To demonstrate our ideas, in this paper we describe the integration of existing, open source software - the OpenUH compiler [10] and the automatic trace analysis tool KOJAK [19] – into a single, coherent environment, called COPPER, for collaborative application ⋆ This material is based upon work supported by the National Science Foundation under grant No. 0444363 and 0444468...

Holistic Hardware Counter Performance Analysis of Parallel Programs

by Brian J. N. Wylie, Bernd Mohr, Felix Wolf (eds. G. R. Joubert, W. E. Nagel, F. J. Peters, O. Plata, P. Tirado, E. Zapata) - In Proceedings of Parallel Computing 2005 (ParCo 2005), Malaga, Spain, Sep 2005
Cited by 9 (5 self)

Citation Context

...omatic analysis of performance problems arising from inefficient usage of parallel programming interfaces (such as MPI and OpenMP) [1,2]. Performance problems are classified by type and quantified by severity, for investigation via an interactive browser (CUBE) which presents an integrated, hierarchical view of performance behaviour, ...

Footnote 1: Download available from http://www.fz-juelich.de/zam/kojak/
