An Overview of the Pablo Performance Analysis Environment
, 1992
Cited by 80 (6 self)
As massively parallel, distributed memory systems replace traditional vector supercomputers, effective application program optimization and system resource management become more than research curiosities; they are crucial to achieving substantial fractions of peak performance for scientific application codes. By recording dynamic activity, either at the application or system software level, one can identify and remove performance bottlenecks. Pablo is a performance analysis environment designed to provide performance data capture, analysis, and presentation across a wide variety of scalable parallel systems. The Pablo environment includes software performance instrumentation, graphical performance data reduction and analysis, and support for mapping performance data to both graphics and sound. Current research directions include complete performance data immersion via head-mounted displays and the integration of Pablo with data parallel Fortran compilers based on the emerging High ...
Dynamic Control of Performance Monitoring on Large Scale Parallel Systems
, 1993
Cited by 53 (10 self)
Performance monitoring of large-scale parallel computers creates a dilemma: we need to collect detailed information to find performance bottlenecks, yet collecting all this data can introduce serious data collection bottlenecks. At the same time, users are being inundated with volumes of complex graphs and tables that require a performance expert to interpret. We present a new approach, called the W³ Search Model, that addresses both of these problems by combining dynamic on-the-fly selection of what performance data to collect with decision support to assist users with the selection and presentation of performance data. We present a case study describing how a prototype implementation of our technique was able to identify the bottlenecks in three real programs. In addition, we were able to reduce the amount of performance data collected by a factor ranging from 13 to 700 compared to traditional sampling and trace-based instrumentation techniques.
Real-Time Statistical Clustering For Event Trace Reduction
- International Journal of Supercomputer Applications and High Performance Computing
, 1997
Cited by 25 (5 self)
Event tracing provides the detailed data needed to understand the dynamics of interactions among application resource demands and system responses. However, capturing the large volume of dynamic performance data inherent in detailed tracing can perturb program execution and stress secondary storage systems. Moreover, it can overwhelm a user or performance analyst with potentially irrelevant data. Using the Pablo performance environment's support for real-time data analysis, we show that dynamic statistical data clustering can dramatically reduce the volume of captured performance data by identifying and recording event traces only from representative processors. In turn, this makes possible low overhead, interactive visualization and performance tuning.
Performance Instrumentation Techniques for Parallel Systems
- SPRINGER-VERLAG LECTURE NOTES IN COMPUTER SCIENCE
, 1993
Cited by 17 (8 self)
Although the nascent state of parallel systems makes empirical performance measurement, analysis and tuning critical, rapid technological evolution, coupled with short product life cycles, has often made it difficult to isolate fundamental experimental principles from implementation artifacts. By definition, the apparatus for experimental performance analysis (i.e., instrumentation specification, data buffering, timestamp generation, and data extraction) is shaped by the intended experiment and the object of study. In some environments, certain experiments are not feasible. Balancing the volume of captured performance data against its accuracy and timeliness requires both appropriate tools and an understanding of instrumentation costs, implementation alternatives, and support infrastructure.
Monitoring of Distributed Memory Multicomputer Programs
, 1993
Cited by 7 (0 self)
ion (EBBA). Some of these methods will be briefly introduced here. The simplest way of analyzing runtime information consists of collecting statistics from the individual event records. ParaGraph [HE91b] and the Crystal [RR89b] are typical examples of tools that use this approach. Among the gathered statistics are cumulative busy/idle times, cumulative communication times, number of bytes sent, etc. More complex analysis is provided by integrating statistical analysis packages into the monitoring environment, which allows interactive analysis of the trace data. The SIMPLE environment adopted this approach by integrating the data analysis and graphics package S from AT&T [Moh90]. The Event Based Behavioral Abstraction (EBBA) approach is more than simply a way of analyzing event records, and constitutes a complete high-level approach to debugging [Bat89]. Broadly speaking, the approach consists of constructing high-level models that describe the expected behavior of the monit...
A Performance Monitor For The Msparc Multicomputer
- In Proceedings of the IEEE SOUTHEASTCON '92, IEEE
, 1992
Cited by 5 (4 self)
This paper describes a hybrid performance monitor developed for MSPARC, a mesh-connected, message-passing multicomputer. The development of the hybrid performance monitor is a cross-disciplinary enterprise requiring custom hardware and a range of software support including monitor code, driver interfaces, probe history acquisition and processing, graphical display and application probe injection. Programmable hardware is designed to unobtrusively collect events on each node and maintain their accurate chronological order. This distributed collection system is coupled by its independent network to a central monitor where data selection and presentation techniques play an important role in the visualization of the parallel system's execution. I. INTRODUCTION Computer systems designers and users are increasingly dependent on sophisticated tools for program development and hardware simulation, synthesis and testing. A fundamental reason for this trend is the need to efficiently mana...
Performance Measurement using Low Perturbation and High Precision Hardware Assists
- In Proc. 1998 IEEE Real-Time Systems Symposium
, 1998
Cited by 4 (1 self)
We present the design and implementation of MultiKron PCI, a hardware performance monitor that can be plugged into any computer with a free PCI bus slot. The monitor provides a series of high-resolution timers and the ability to monitor the utilization of the PCI bus. We also demonstrate how the monitor can be integrated with online performance monitoring tools such as the Paradyn parallel performance measurement tools to reduce the overhead of key timer operations by a factor of 25. In addition, we present a series of case studies using the MultiKron hardware performance monitor to measure and tune high-performance parallel computing applications. By using the monitor, we were able to find and correct a performance bug in a popular implementation of the MPI message passing library that caused some communication primitives to run at one half of their potential speed.
Finding Bottlenecks In Large Scale Parallel Programs
, 1994
Cited by 4 (0 self)
This thesis addresses the problem of locating the source of performance bottlenecks in large-scale parallel and distributed applications. Performance monitoring creates a dilemma: identifying a bottleneck necessitates collecting detailed information, yet collecting all this data can introduce serious data collection bottlenecks. At the same time, users are being inundated with volumes of complex graphs and tables that require a performance expert to interpret. I have developed a new approach that addresses both of these problems by combining dynamic on-the-fly selection of what performance data to collect with decision support to assist users with the selection and presentation of performance data. The approach is called the W³ Search Model. To make it possible to implement the W³ Search Model, I have developed a new monitoring technique for parallel programs called Dynamic Instrumentation. The premise of my work is that not only is it possible to do on-line performance debu...
I/O, Performance Analysis, and Performance Data Immersion
- In Proceedings of MASCOTS '96
, 1996
Cited by 3 (3 self)
A large and important class of national challenge applications are irregular, with complex, data-dependent execution behavior, and dynamic, with time-varying resource demands. We believe the solution to the performance optimization conundrum is integration of dynamic performance instrumentation and on-the-fly performance data reduction with configurable, malleable resource management algorithms, and a real-time adaptive control mechanism that automatically chooses and configures resource management algorithms based on application request patterns and observed system performance. Within the context of parallel input/output optimization, we describe the components of such a closed-loop control system based on the Pablo performance analysis environment, a portable parallel file system (PPFS), and virtual environments for study of dynamic performance data and interactive control of file system policies.

