Performance Measurement, Visualization and Modeling of Parallel and Distributed Programs using the AIMS Toolkit
, 1995
Cited by 60 (2 self)
In this paper, we first address fundamental issues in building useful performance-tuning tools and then describe our experience with the AIMS toolkit for tuning parallel and distributed programs on a variety of platforms. AIMS supports source-code instrumentation, run-time monitoring, graphical execution profiles, performance indices and automated modeling techniques as ways to expose performance problems of programs. Using several examples representing a broad range of scientific applications, we illustrate AIMS' effectiveness in exposing performance problems in parallel and distributed programs.
Performance Analysis of Distributed Applications using Automatic Classification of Communication Inefficiencies
, 2000
Cited by 24 (3 self)
We present a technique for performance analysis that helps users understand the communication behavior of their message passing applications. Our method automatically classifies individual communication operations and it reveals the cause of communication inefficiencies in the application. This classification allows the developer to focus quickly on the culprits of truly inefficient behavior, rather than manually foraging through massive amounts of performance data. Specifically, we trace the message operations of MPI applications and then classify each individual communication event using decision tree classification, a supervised learning technique. We train our decision tree using microbenchmarks that demonstrate both efficient and inefficient communication. Since our technique adapts to the target system's configuration through these microbenchmarks, we can simultaneously automate the performance analysis process and improve classification accuracy. Our experiments on four applications demonstrate that our technique can improve the accuracy of performance analysis, and dramatically reduce the amount of data that users must encounter.
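The classification step described in this abstract can be illustrated with a minimal sketch of decision tree learning. It is not the authors' implementation: the feature names (`wait_fraction`, `message_kib`), the class labels (`normal`, `late_sender`), and the training values are hypothetical stand-ins for microbenchmark traces, and the tree is a small Gini-impurity learner written from scratch for illustration.

```python
from collections import Counter

# Hypothetical microbenchmark samples (assumed, not from the paper).
# Features: (wait_fraction, message_kib); label: assumed behavior class.
TRAINING = [
    ((0.05, 1), "normal"),
    ((0.10, 64), "normal"),
    ((0.08, 512), "normal"),
    ((0.70, 1), "late_sender"),
    ((0.85, 64), "late_sender"),
    ((0.90, 512), "late_sender"),
]


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(samples):
    """Return (score, feature, threshold) with the lowest weighted Gini, or None."""
    best = None
    for f in range(len(samples[0][0])):
        values = sorted({x[f] for x, _ in samples})
        # Candidate thresholds are midpoints between adjacent distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for x, y in samples if x[f] <= t]
            right = [y for x, y in samples if x[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(samples)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(samples, max_depth=3):
    """Build a tree: a leaf is a label string, a node is (feature, threshold, left, right)."""
    labels = [y for _, y in samples]
    majority = Counter(labels).most_common(1)[0][0]
    if max_depth == 0 or len(set(labels)) == 1:
        return majority
    split = best_split(samples)
    if split is None:  # no feature separates the samples
        return majority
    _, f, t = split
    left = [s for s in samples if s[0][f] <= t]
    right = [s for s in samples if s[0][f] > t]
    return (f, t, build_tree(left, max_depth - 1), build_tree(right, max_depth - 1))


def classify(tree, x):
    """Walk from the root to a leaf and return its label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree


tree = build_tree(TRAINING)
print(classify(tree, (0.80, 128)))
print(classify(tree, (0.02, 8)))
```

Because the tree is trained on measurements taken on the target machine, the learned thresholds would reflect that system's configuration, which is the adaptation property the abstract emphasizes.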
Monitoring the Performance of Multidisciplinary Applications on the iPSC/860
- In Proceedings of the 1994 Scalable High Performance Computing Conference
, 1994
Cited by 2 (0 self)
Communication between multiple partitions of a multiprocessor facilitates the implementation of multidisciplinary applications representative of many grand challenge problems of the High Performance Computing and Communications Program (HPCCP). An extremely fast and efficient Intercube Communication Library has been developed to support communication between multiple partitions (allocated cubes) on the Intel iPSC/860. A tool which provides performance data in this context is invaluable to the application developer attempting to identify performance bottlenecks and allocate computational resources to achieve load-balancing. This paper describes how the Automated Instrumentation and Monitoring System (AIMS), a software toolkit that facilitates performance evaluation of parallel applications, has been extended to enable intercube programs to be monitored and visualized.
1: Introduction Although many MIMD multiprocessors (such as Intel's iPSC/860 and TMC's CM-5) have been available on th...
A Refinement Strategy for a User-Oriented Performance Analysis.
Cited by 1 (0 self)
We introduce a refinement strategy to bring the parallel performance analysis closer to the user. The analysis starts with a simple high-level performance model. It is based on first-order approximations, in terms of the logical constituents of the parallel program and characteristics of the system. This model is then progressively refined with more detailed low-level performance aspects, to explain divergences from a ‘normal’, linear regime. We use a causal model to structure the relations between all variables involved. The approach intends to serve as a link between detailed performance data and the developer. It is demonstrated with a parallel matrix multiplication algorithm.
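The idea of starting from a first-order model and refining it can be sketched as follows. This is an illustration under stated assumptions, not the model from the paper: the function names, the single additive communication term, and the sample timings are all hypothetical.

```python
def ideal_runtime(seq_time, workers):
    """First-order model: work divides perfectly (the linear regime)."""
    return seq_time / workers


def refined_runtime(seq_time, workers, comm_time):
    """One refinement step (assumed form): add a communication overhead term."""
    return seq_time / workers + comm_time


def efficiency(seq_time, workers, parallel_time):
    """Fraction of ideal linear speedup actually achieved."""
    return seq_time / (workers * parallel_time)


# Hypothetical numbers: 100 s sequential run, 4 workers, 5 s of communication.
seq, p, comm = 100.0, 4, 5.0
print(efficiency(seq, p, ideal_runtime(seq, p)))
print(efficiency(seq, p, refined_runtime(seq, p, comm)))
```

A measured efficiency below the first value signals a divergence from the linear regime, and each added term is one candidate explanation for it.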
A Practical Development Process for Parallel Large-Scale Applications and its underlying Formal Framework
, 1995
The development of parallel large-scale application codes is a challenging problem, because it requires a combination of application knowledge, understanding of the various aspects of parallelism involved, and software engineering. Moreover, the size of large-scale applications usually is input-dependent, and the parallel algorithm needs to be scalable to various numbers of processors. This paper combines the theoretical as well as the practical aspects required for the understanding, realisation, and manageability of the development process of parallel large-scale applications. It provides a formal framework in which their (partly machine-model specific) potential parallelism can be expressed and requirements on scheduling and implementation are given. The paper further describes a practical software-engineering development approach built on this framework, and discusses and illustrates its usage in two large case studies.
1 INTRODUCTION Current research on parallel software developme...
Performance Analysis of Parallel Processing
Parallel processing is the only answer to the ever-increasing demand for more computational power. Nowadays, the big giants in hardware and software, like Intel and Microsoft, are increasingly aware of it and have pounced onto the market. But unlike sequential programs running on the von Neumann computer, the parallelization of programs is not trivial. It depends quite heavily on the underlying parallel system architecture. Automatic parallelization of programs is a 50-year-old dream in which a program is efficiently matched with the available computing resources. This has become possible, but only for a very limited number of applications, the class of trivially parallelizable programs. For those, the computational work can be divided into parts which can be processed completely independently. Other programs, on the other hand, need manual adaptation to the available resources. This cannot be achieved without a detailed understanding of the algorithm. Intelligent reasoning is necessary to engineer the matching of the patterns of the concurrently operating entities to the pattern of the processors and the network resources, in order to obtain an efficient interplay of computation and communication. The aim of a performance analysis is to provide support for the developer of parallel programs.

