Results 1 - 10 of 29
Falcon: On-line Monitoring for Steering Parallel Programs
- In Ninth International Conference on Parallel and Distributed Computing and Systems (PDCS’97)
, 1998
Cited by 51 (13 self)
Advances in high performance computing, communications, and user interfaces enable developers to construct increasingly interactive high performance applications. The Falcon system presented in this paper supports such interactivity by providing runtime libraries, tools, and user interfaces that permit the on-line monitoring and steering of large-scale parallel codes. The principal aspects of Falcon described in this paper are its abstractions and tools for capture and analysis of application-specific program information, performed on-line, with controlled latencies and scalable to parallel machines of substantial size. In addition, Falcon provides support for the on-line graphical display of monitoring information, and it allows programs to be steered during their execution, by human users or algorithmically. This paper presents our basic research motivation, outlines the Falcon system's functionality, and includes a detailed evaluation of its performance characteristics in light of i...
Knowledge Specification for Automatic Performance Analysis
, 2001
Cited by 37 (20 self)
The lack of a useful and accurate software infrastructure for measuring, modeling, and analyzing the performance of a wide variety of programming paradigms and architecture platforms is a critical issue for performance-oriented program development. Commonly, a cyclic process is employed to tune the performance of programs which includes the gathering of performance data through measurement and prediction and the analysis of the data collected on-the-fly or during a postmortem session to yield summary statistics and histories of program behavior. Usually, this process also involves comparison of the performance data with that of previous program versions. So far most approaches require the programmer to control this tedious, time-consuming, and error-prone process which is typically driven by some informal hypotheses about potential performance problems. Moreover, many tools are platform and language dependent and cannot correlate performance data gathered at lower levels (for example, from hardware counters) with higher-level programming paradigms. Further, they tend to focus only on specific program and machine behavior, and do not provide sufficient support to infer important performance properties. In this report we describe a novel approach to the formalization of performance bottlenecks and the data required to detect them with the aim of supporting automatic performance analysis for a large variety of programming paradigms and architectures. We present the APART Specification Language (ASL) developed as part of the APART Esprit IV Working Group on Automatic Performance Analysis: Resources and Tools. This language allows the description of performance-related data through the provision of an object-oriented specification model and supports definition of performance p...
Program Analysis Environments for Parallel Language Systems: The tau Environment
- In Proceedings of the 2nd Workshop on Environments and Tools for Parallel Scientific Computing
, 1994
Cited by 17 (6 self)
In this paper, we discuss τ (TAU, Tuning and Analysis Utilities), the first prototype of an integrated and portable program analysis environment for pC++, a parallel object-oriented language system. τ is unique in that it was developed specifically for pC++ and relies heavily on pC++'s compiler and transformation tools (specifically, the Sage++ toolkit) for its implementation. This tight integration allows τ to achieve a combination of portability, functionality, and usability not commonly found in high-level language environments. The paper describes the design and functionality of τ, using a new tool for breakpoint-based program analysis as an example of τ's capabilities. 1 Introduction The trend towards using high-level parallel language systems to program scalable parallel computers must be accompanied by advances in the tools and environments for program analysis and tuning. The language system concerns are achieving programmability through parallel programming abstractions...
Modeling and Detecting Performance Problems for Distributed and Parallel Programs with JavaPSL
- In Proc. of the Conference on Supercomputers (SC2001)
, 2002
Cited by 11 (2 self)
In this paper we present JavaPSL, a Performance Specification Language that can be used for a systematic and portable specification of large classes of experiment-related data and performance properties for distributed and parallel programs. Performance properties are described in a generic and normalized way, thus interpretation and comparison of performance properties is largely alleviated. Moreover, JavaPSL provides meta-properties in order to describe new properties based on existing ones and to relate properties to each other.
CrossWalk: A Tool for Performance Profiling Across the User-Kernel Boundary
- In International Conference on Parallel Computing (ParCo)
, 2003
Cited by 10 (0 self)
fy the ultimate cause of Squid's performance problems and remove them by modifying the application's source code. 1. Introduction Many applications make heavy use of functions provided by the operating system. Naturally, the performance of such applications depends on how they make use of these functions and how efficiently these functions are implemented by the operating system. For example, I/O is key to the efficiency of many high-performance applications. Network performance is often the constraining factor for such tools as Web and proxy servers, and efficient use of synchronization primitives is crucial for multithreaded applications. Finding performance problems in OS-bound applications has always been a challenging task. A user-level profiler might locate a region of application's code where most of the system time is spent, but it might be unable to explain why this is happening or how to fix the problem. For example, if an application spent 90% of its time in the open system cal
Speedy: An Integrated Performance Extrapolation Tool for pC++ Programs
- In Quantitative Evaluation of Computing and Communication Systems: Proceedings of the 8th International Conference on Modelling Techniques and Tools for Computer Performance Evaluation, volume 977 of Lecture Notes in Computer Science
, 1995
Cited by 8 (1 self)
Performance extrapolation is the process of evaluating the performance of a parallel program in a target execution environment using performance information obtained for the same program in a different environment. Performance extrapolation techniques are suited for rapid performance tuning of parallel programs, particularly when the target environment is unavailable. This paper describes one such technique that was developed for data-parallel C++ programs written in the pC++ language. In pC++, the programmer can distribute a collection of objects to various processors and can have methods invoked on those objects execute in parallel. Using performance extrapolation in the development of pC++ applications allows tuning decisions to be made in advance of detailed execution measurements. The pC++ language system includes τ, an integrated environment for analyzing and tuning the performance of pC++ programs. This paper presents speedy, a new addition to τ, that predicts the performa...
Performance Tool Support for MPI-2 on Linux
, 2004
Cited by 6 (0 self)
Programmers of message-passing codes for clusters of workstations face a daunting challenge in understanding the performance bottlenecks of their applications. This is largely due to the vast amount of performance data that is collected, and the time and expertise necessary to use traditional parallel performance tools to analyze that data. This paper reports on our recent efforts developing a performance tool for MPI applications on Linux clusters. Our target MPI implementations were LAM/MPI and MPICH2, both of which support portions of the MPI-2 Standard. We started with an existing performance tool and added support for non-shared file systems, MPI-2 one-sided communications, dynamic process creation, and MPI Object naming. We present results using the enhanced version of the tool to examine the performance of several applications. We describe a new performance tool benchmark suite we have developed, PPerfMark, and present results for the benchmark using the enhanced tool.
Program Analysis and Tuning Tools for a Parallel Object Oriented Language: An Experiment with the TAU System.
- In Proc. of the Workshop on Parallel Scientific Computing
, 1996
Cited by 5 (3 self)
this paper we examine and evaluate an experimental programming environment designed at the University of Oregon that addresses some of these issues. This system, called
Tau - Tuning and Analysis Utilities for Portable Parallel Programming
, 1995
Cited by 3 (1 self)
Introduction Most users find parallel programming difficult for at least four reasons. First, parallel computing abstractions (e.g., data parallelism, control or task parallelism, producer/consumer parallelism) are diverse, differing mainly by the type of parallel behavior supported (or allowed) in a program's execution. In addition to learning the parallel programming languages and tools in a particular environment, a user must decide which parallel computing model provides the "best" execution for their problem. Deciding how to choose between models often requires a sophisticated understanding of the application, its underlying algorithms, the expressiveness of the language used, and effects of the system software and hardware architecture of the parallel machine. Unfortunately, this choice is complicated by the fact that not all parallel computing abstractions are equally well supported in existing programming systems. Second, most parallel programming systems do not insula
Development and Performance Analysis of Real-World Applications for Distributed and Parallel Architectures
, 1999
Cited by 3 (3 self)
Several large real-world applications have been developed for distributed and parallel architectures. We examine two different program development approaches: First, the usage of a high-level programming paradigm which reduces the time to create a parallel program dramatically but sometimes at the cost of reduced performance. A source-to-source compiler has been employed to automatically compile programs -- written in a high-level programming paradigm -- into message passing codes. Second, manual program development by using a low-level programming paradigm -- such as message passing -- enables the programmer to fully exploit a given architecture at the cost of a time-consuming and error-prone effort. Performance tools play a central role in supporting the performance-oriented development of applications for distributed and parallel architectures. Scala -- a portable instrumentation, measurement, and post-execution performance analysis system for distributed and parallel programs -- h...

