Performance analysis of distributed applications using automatic classification of communication inefficiencies

by J Vetter
Venue: ACM International Conference on Supercomputing, 2002

Results 1 - 10 of 12

Modeling Application Performance by Convolving Machine Signatures with Application Profiles

by Allan Snavely, Nicole Wolter, Laura Carrington, 2001
Cited by 34 (5 self)
This paper presents a performance modeling methodology that is faster than traditional cycle-accurate simulation, more sophisticated than performance estimation based on system peak-performance metrics, and is shown to be effective on a class of High Performance Computing benchmarks. The method yields insight into the factors that affect performance on single-processor and parallel computers.

Statistical Scalability Analysis of Communication Operations in Distributed Applications

by Jeffrey S. Vetter, Michael O. McCracken
Cited by 34 (2 self)
Current trends in high performance computing suggest that users will soon have widespread access to clusters of multiprocessors with hundreds, if not thousands, of processors. This unprecedented degree of parallelism will undoubtedly expose scalability limitations in existing applications, where scalability is the ability of a parallel algorithm on a parallel architecture to effectively utilize an increasing number of processors. Users will need precise and automated techniques for detecting the cause of limited scalability. This paper addresses this dilemma. First, we argue that users face numerous challenges in understanding application scalability: managing substantial amounts of experiment data, extracting useful trends from this data, and reconciling performance information with their application's design. Second, we propose a solution to automate this data analysis problem by applying fundamental statistical techniques to scalability experiment data. Finally, we evaluate our operational prototype on several applications, and show that statistical techniques offer an effective strategy for assessing application scalability. In particular, we find that non-parametric correlation of the number of tasks to the ratio of the time for communication operations to overall communication time provides a reliable measure for identifying communication operations that scale poorly.
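
The measure named in the last sentence of this abstract can be illustrated with a small sketch. It is not the authors' prototype: it assumes Spearman rank correlation as the non-parametric statistic, and the function names, operation names, and timings are all hypothetical.

```python
def ranks(values):
    """Return 1-based ranks, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def scaling_scores(task_counts, op_times):
    """Correlate task count with each operation's share of communication time.

    op_times maps an operation name to its time at each task count
    (hypothetical layout). A score near +1 flags an operation whose share
    grows as tasks are added, i.e. a candidate for poor scaling.
    """
    totals = [sum(times[i] for times in op_times.values())
              for i in range(len(task_counts))]
    return {
        op: spearman(task_counts, [t / total for t, total in zip(times, totals)])
        for op, times in op_times.items()
    }


# Hypothetical measurements at 16, 32, 64, and 128 tasks.
task_counts = [16, 32, 64, 128]
op_times = {
    "MPI_Allreduce": [1.0, 2.5, 6.0, 14.0],
    "MPI_Send": [4.0, 4.1, 4.2, 4.3],
}
scores = scaling_scores(task_counts, op_times)
```

In this made-up data the share of time spent in `MPI_Allreduce` rises at every step, so it receives the highest score, while the share for `MPI_Send` falls.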

Scalable analysis techniques for microprocessor performance counter metrics

by Dong H. Ahn, Jeffrey S. Vetter - In Proc. of the Conference on Supercomputers (SC2002), 2002
Cited by 30 (1 self)
Contemporary microprocessors provide a rich set of integrated performance counters that allow application developers and system architects alike the opportunity to gather important information about workload behaviors. Current techniques for analyzing data produced from these counters use raw counts, ratios, and visualization techniques to help users make decisions about their application performance. While these techniques are appropriate for analyzing data from one process, they do not scale easily to new levels demanded by contemporary computing systems. Very simply, this paper addresses these concerns by evaluating several multivariate statistical techniques on these datasets. We find that several techniques, such as statistical clustering, can automatically extract important features from the data. These derived results can, in turn, be fed directly back to an application developer, or used as input to a more comprehensive performance analysis environment, such as a visualization or an expert system.

Dynamic Statistical Profiling of Communication Activity in Distributed Applications

by Jeffrey Vetter, 2002
Cited by 12 (0 self)
Performance analysis of communication activity for a terascale application with traditional message tracing can be overwhelming in terms of overhead, perturbation, and storage. We propose a novel alternative that enables dynamic statistical profiling of an application's communication activity using message sampling. We have implemented an operational prototype, named PHOTON, and our evidence shows that this new approach can provide an accurate, low-overhead, tractable alternative for performance analysis of communication activity. PHOTON consists of two components: a Message Passing Interface (MPI) profiling layer that implements sampling and analysis, and a modified MPI runtime that appends a small but necessary amount of information to individual messages. More importantly, this alternative enables an assortment of runtime analysis techniques so that, in contrast to post-mortem, trace-based techniques, the raw performance data can be jettisoned immediately after analysis. Our investigation shows that message sampling can reduce overhead to imperceptible levels for many applications. Experiments on several applications demonstrate the viability of this approach. For example, with one application, our technique reduced the analysis overhead from 154% for traditional tracing to 6% for statistical profiling. We also evaluate different sampling techniques in this framework. The coverage of the sample space provided by purely random sampling is superior to counter- and timer-based sampling. Also, PHOTON's design reveals that frugal modifications to the MPI runtime system could facilitate such techniques on production computing systems, and it suggests that this sampling technique could execute continuously for long-running applications.

HPCTOOLKIT: Tools for performance analysis of optimized parallel programs

by L. Adhianto, S. Banerjee, M. Fagan, M. Krentel, G. Marin, J. Mellor-Crummey, N. R. Tallent, 2008
Cited by 8 (2 self)
Abstract not found

Automatic Search for Performance Problems in Parallel and Distributed Programs by Using Multi-Experiment Analysis

by Thomas Fahringer, Clovis Seragiotto, Jr. - In International Conference On High Performance Computing (HiPC 2002), 2002
Cited by 6 (2 self)
We introduce Aksum, a novel system for performance analysis that helps programmers to locate and to understand performance problems in message passing, shared memory and mixed parallel programs.

Scalability Analysis of SPMD Codes Using Expectations

by Cristian Coarfa, John Mellor-Crummey, Nathan Froyd, Yuri Dotsenko - ICS'07, 2007
Cited by 5 (4 self)
We present a new technique for identifying scalability bottlenecks in executions of single-program, multiple-data (SPMD) parallel programs, quantifying their impact on performance, and associating this information with the program source code. Our performance analysis strategy involves three steps. First, we collect call path profiles for two or more executions on different numbers of processors. Second, we use our expectations about how the performance of executions should differ, e.g., linear speedup for strong scaling or constant execution time for weak scaling, to automatically compute the scalability of costs incurred at each point in a program’s execution. Third, with the aid of an interactive browser, an application developer can explore a program’s performance in a top-down fashion, see the contexts in which poor scaling behavior arises, and understand exactly how much each scalability bottleneck dilates execution time. Our analysis technique is independent of the parallel programming model. We describe our experiences applying our technique to analyze parallel programs written in Co-array Fortran and Unified Parallel C, as well as message-passing programs based on MPI.
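
The second step of this abstract, comparing measured costs against a scaling expectation, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' formulation: costs are taken to be average per-process times keyed by a calling-context label, and the function name, labels, and numbers are hypothetical.

```python
def excess_work(costs_p, costs_q, p, q, scaling="strong"):
    """Cost beyond expectation for each context, as a fraction of the q-process run.

    costs_p and costs_q map a calling-context label to the average
    per-process time measured on p and q processes (assumed layout).
    """
    if scaling == "strong":
        # Linear speedup: per-process time should shrink by p / q.
        factor = p / q
    elif scaling == "weak":
        # Constant execution time: per-process time should not change.
        factor = 1.0
    else:
        raise ValueError("scaling must be 'strong' or 'weak'")

    total_q = sum(costs_q.values())
    contexts = set(costs_p) | set(costs_q)
    return {
        ctx: (costs_q.get(ctx, 0.0) - factor * costs_p.get(ctx, 0.0)) / total_q
        for ctx in contexts
    }


# Hypothetical strong-scaling experiment: 4 processes, then 8.
costs_on_4 = {"solve": 80.0, "exchange": 20.0}
costs_on_8 = {"solve": 40.0, "exchange": 20.0}
excess = excess_work(costs_on_4, costs_on_8, p=4, q=8, scaling="strong")
```

Here `solve` halves as expected and contributes no excess, while `exchange` stays at 20.0 instead of dropping to 10.0, so one sixth of the 8-process run is attributed to it as a scaling loss.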

Portable High Performance and Scalability for Partitioned Global Address Space Languages

by Cristian Coarfa, 2007
Cited by 2 (1 self)
Large scale parallel simulations are fundamental tools for engineers and scientists. Consequently, it is critical to develop both programming models and tools that enhance development time productivity, enable harnessing of massively-parallel systems, and to guide the diagnosis of poorly scaling programs. This thesis addresses this challenge in two ways. First, we show that Co-array Fortran (CAF), a shared-memory parallel programming model, can be used to write scientific codes that exhibit high performance on modern parallel systems. Second, we describe a novel technique for analyzing parallel program performance and identifying scalability bottlenecks, and apply it across multiple programming models. Although the message passing parallel programming model provides both portability and high performance, it is cumbersome to program. CAF eases this burden by providing a partitioned global address space, but has before now only been implemented on shared-memory machines. To significantly broaden CAF’s appeal, we show that CAF programs can deliver high performance on commodity cluster platforms. We designed and imple-

Model-based performance diagnosis of master-worker parallel computations

by Li Li, Allen D. Malony - In: Euro-Par, 2006
Cited by 1 (0 self)
Parallel performance tuning naturally involves a diagnosis process to locate and explain sources of program inefficiency. Proposed is an approach that exploits parallel computation patterns (models) for diagnosis discovery. Knowledge of performance problems and inference rules for hypothesis search are engineered from model semantics and analysis expertise. In this manner, the performance diagnosis process can be automated as well as adapted for parallel model variations. We demonstrate the implementation of model-based performance diagnosis on the classic Master-Worker pattern. Our results suggest that pattern-based performance knowledge can provide effective guidance for locating and explaining performance bugs at a high level of program abstraction.

Workshop on The Roadmap for the Revitalization of High-End Computing

by unknown authors
The views expressed in this report are those of the individual participants and are not necessarily those of their respective organizations or the workshop sponsor. © 2003 by the Computing Research Association. Permission is granted to reproduce the contents provided that such reproduction is not for profit and credit is given to the source.