
The search for lost cycles: A new approach to parallel program performance evaluation (1993)

by M Crovella, T LeBlanc

Results 1 - 10 of 13

Task Parallelism in a High Performance Fortran Framework

by T. Gross, D. O'Hallaron, J. Subhlok - IEEE Parallel and Distributed Technology, 1994
Cited by 83 (18 self)
High Performance Fortran (HPF) has emerged as a standard dialect of Fortran for data parallel computing. However, for a wide variety of applications, both task and data parallelism must be exploited to achieve the best possible performance on a multicomputer. We present the design and implementation of a Fortran compiler that integrates task and data parallelism in an HPF framework. A small set of simple directives allows users to express task parallel programs in a variety of domains. The user identifies opportunities for task parallelism, and the compiler handles task creation and management, as well as communication between tasks. Since a unified compiler handles both task parallelism and data parallelism, existing data parallel programs and libraries can serve as the building blocks for constructing larger task parallel programs. This paper concludes with a description of several parallel application kernels that were developed with the compiler. The examples demonstrate that exploi...

Parallel Performance Prediction using Lost Cycles Analysis

by Mark E. Crovella, Thomas J. LeBlanc - In Proceedings of Supercomputing '94, 1994
Cited by 62 (1 self)
Most performance debugging and tuning of parallel programs is based on the "measure-modify" approach, which is heavily dependent on detailed measurements of programs during execution. This approach is extremely time-consuming and does not lend itself to predicting performance under varying conditions. Analytic modeling and scalability analysis provide predictive power, but are not widely used in practice, due primarily to their emphasis on asymptotic behavior and the difficulty of developing accurate models that work for real-world programs. In this paper we describe a set of tools for performance tuning of parallel programs that bridges this gap between measurement and modeling. Our approach is based on lost cycles analysis, which involves measurement and modeling of all sources of overhead in a parallel program. We first describe a tool for measuring overheads in parallel programs that we have incorporated into the runtime environment for Fortran programs on the Kendall Square KSR1. We then describe a tool that fits these overhead measurements to analytic forms. We illustrate the use of these tools by analyzing the performance tradeoffs among parallel implementations of 2D FFT. These examples show how our tools enable programmers to develop accurate performance models of parallel applications without requiring extensive performance modeling expertise.
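The fitting step described in this abstract can be illustrated with a short Python sketch. It is not the authors' KSR1 tool: it assumes total lost cycles are measured as p*T_p - T_1 (processor time spent beyond the serial run time), lumps all overhead sources together where the paper measures each source separately, and chooses among a small, hypothetical menu of analytic forms by least-squares residual. The run times in the usage example are synthetic.

```python
import math

def lost_cycles(serial_time, parallel_time, p):
    """Total overhead: processor time spent beyond the serial work, p*T_p - T_1."""
    return p * parallel_time - serial_time

def fit_form(ps, overheads, g):
    """Least-squares fit of overhead = a + b*g(p); returns (a, b, squared error)."""
    us = [g(p) for p in ps]
    n = len(us)
    mean_u, mean_y = sum(us) / n, sum(overheads) / n
    suu = sum((u - mean_u) ** 2 for u in us)
    suy = sum((u - mean_u) * (y - mean_y) for u, y in zip(us, overheads))
    b = suy / suu
    a = mean_y - b * mean_u
    sse = sum((y - (a + b * u)) ** 2 for u, y in zip(us, overheads))
    return a, b, sse

# Hypothetical candidate forms; a real tool would offer a larger, domain-specific set.
CANDIDATE_FORMS = {
    "a + b*p": lambda p: p,
    "a + b*log2(p)": lambda p: math.log2(p),
    "a + b*p*log2(p)": lambda p: p * math.log2(p),
}

def best_form(ps, overheads):
    """Return the name and (a, b, sse) of the candidate with the smallest residual."""
    fits = {name: fit_form(ps, overheads, g) for name, g in CANDIDATE_FORMS.items()}
    name = min(fits, key=lambda k: fits[k][2])
    return name, fits[name]

if __name__ == "__main__":
    # Synthetic measurements: T_1 = 100 s, overhead grows as 2 + 0.5*p seconds.
    ps = [2, 4, 8, 16]
    t1 = 100.0
    tps = [(t1 + 2.0 + 0.5 * p) / p for p in ps]
    overheads = [lost_cycles(t1, tp, p) for tp, p in zip(tps, ps)]
    name, (a, b, sse) = best_form(ps, overheads)
    print(name, a, b)  # prints: a + b*p 2.0 0.5
```

Because the synthetic overhead is exactly linear in p, the linear form wins with zero residual. With real, noisy measurements, a growing menu of candidate forms would also call for some guard against overfitting.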

The CMU Task Parallel Program Suite

by Peter Dinda, Edward Segall, James Stichnoth, Jaspal Subhlok, Jon Webb, Bwolen Yang, et al., 1994
Cited by 56 (7 self)
Abstract not found

Multivariate Statistical Techniques for Parallel Performance Prediction

by Mark J. Clement, Michael J. Quinn - In Proc. 28th Hawaii Int. Conf. on System Sciences, Vol. II, IEEE, 1995
Cited by 23 (4 self)
Performance prediction can play an important role in improving the efficiency of multicomputers in executing scalable parallel applications. An accurate model of program execution time must include detailed algorithmic and architectural characterizations. The exact values for critical model parameters such as message latency and cache miss penalty can often be difficult to determine. This research uses multivariate data analysis to estimate the values of these coefficients in an analytical model. Representing the coefficients as random variables with a specified mean and variance improves the utility of a performance model. Confidence intervals for predicted execution time can be generated using the standard error values for model parameters. Improvements in the model can also be made by investigating the cause of large variance values for a particular architecture.
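The coefficient-estimation idea can be sketched with ordinary least squares in plain Python. The model form (an intercept plus terms proportional to thousands of messages and millions of cache misses), the data, and all function names are hypothetical. The interval uses a normal quantile as a large-sample stand-in for the Student t quantile, which understates the width for small samples like this one. This illustrates the general technique, not the authors' procedure.

```python
from statistics import NormalDist

def invert(m):
    """Gauss-Jordan inverse (with partial pivoting) of a small square matrix."""
    n = len(m)
    a = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        pv = a[col][col]
        a[col] = [v / pv for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]

def ols(X, y):
    """Least squares: returns (coefficients, standard errors, residual variance, (X^T X)^-1)."""
    n, k = len(X), len(X[0])
    xtx = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)] for i in range(k)]
    xty = [sum(X[r][i] * y[r] for r in range(n)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [y[r] - sum(b * x for b, x in zip(beta, X[r])) for r in range(n)]
    s2 = sum(e * e for e in resid) / (n - k)
    se = [(s2 * inv[i][i]) ** 0.5 for i in range(k)]
    return beta, se, s2, inv

def predict_interval(beta, s2, inv, x0, level=0.95):
    """Predicted time for a new run plus an approximate prediction interval."""
    yhat = sum(b * x for b, x in zip(beta, x0))
    k = len(x0)
    leverage = sum(x0[i] * inv[i][j] * x0[j] for i in range(k) for j in range(k))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * (s2 * (1 + leverage)) ** 0.5
    return yhat, yhat - half, yhat + half

if __name__ == "__main__":
    # Hypothetical runs: [1, thousands of messages, millions of cache misses] -> seconds.
    X = [[1, 1, 1], [1, 2, 1], [1, 1, 3], [1, 4, 2], [1, 3, 4], [1, 5, 5]]
    y = [3.55, 3.95, 7.5, 7.05, 10.45, 13.5]
    beta, se, s2, inv = ols(X, y)
    for name, b, s in zip(["intercept", "per 1k msgs", "per 1M misses"], beta, se):
        print(f"{name}: {b:.3f} +/- {s:.3f}")
    print(predict_interval(beta, s2, inv, [1, 6, 6]))
```

The standard errors play the role of the per-coefficient variance described in the abstract. An unusually large one flags a parameter whose effect on a given architecture is poorly explained by the model.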

Generation of simple analytical models for message passing applications

by German Rodriguez, Rosa M. Badia, Jesús Labarta - In Proceedings of the Euro-Par Conference, 2004
Cited by 16 (0 self)
We present a methodology for deriving accurate and simple models that describe the performance of parallel applications without looking at the source code. A trace is obtained, and linear models are derived by fitting the outcome of a set of simulations that vary the influential parameters, such as processor speed, network latency, or bandwidth. The simplicity of the linear models allows a natural interpretation of the corresponding factors of the model, so that both prediction accuracy and interpretability are maintained. We explain how we plan to extend this approach to extrapolate from these models and apply them to predict performance for processor counts different from that of the given traces.

Automated Performance Prediction for Scalable Parallel Computing

by Mark J. Clement, Michael J. Quinn - Parallel Computing, 1997
Cited by 7 (0 self)
Performance prediction is necessary in order to deal with multi-dimensional performance effects on parallel systems. The compiler-generated analytical model developed in this paper accounts for the effects of cache behavior, CPU execution time and message passing overhead for real programs written in high level data-parallel languages. The performance prediction technique is shown to be effective in analyzing several nontrivial data-parallel applications as the problem size and number of processors vary. We leverage technology from the Maple symbolic manipulation system and the S-PLUS statistical package in order to present users with critical performance information necessary for performance debugging, architectural enhancement and procurement of parallel systems. The usability of these results is improved through specifying confidence intervals as well as predicted execution times for data-parallel applications.

Automatic Performance Evaluation of Parallel Programs

by Antonio Espinosa, Tomas Margalef, Emilio Luque - In IEEE Proc. of the 6th Euromicro Workshop on Parallel and Distributed Processing, IEEE Computer, 1998
Cited by 7 (0 self)
Traditional parallel programming forces the programmer not only to design the application but also to analyse the performance of the newly built application. This difficult task of testing the behaviour of the program can be avoided with an automatic performance analysis tool. Users are released from having to understand the enormous amount of performance information obtained from the execution of a program. The automatic analysis is based on a predefined list of logical rules that describe how performance problems arise. These rules form the "knowledge base" of the tool. When the tool analyses an application, it looks for occurrences of any of the performance problems recorded in the "knowledge base". When one of ...
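A knowledge base of this kind can be sketched as a list of named predicates over summary metrics taken from a trace. The metric names, thresholds, and the three rules below are hypothetical illustrations, not the rules of the authors' tool; a real knowledge base would also relate each detected problem to its cause in the source code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Metrics = Dict[str, float]

@dataclass(frozen=True)
class Rule:
    name: str
    condition: Callable[[Metrics], bool]  # true when the problem is present
    suggestion: str

# Hypothetical rules; names, metrics and thresholds are illustrative only.
KNOWLEDGE_BASE: List[Rule] = [
    Rule("blocked receives",
         lambda m: m["recv_wait_s"] > 0.2 * m["total_time_s"],
         "post receives earlier or rebalance producers"),
    Rule("load imbalance",
         lambda m: m["max_compute_s"] > 1.3 * m["mean_compute_s"],
         "redistribute work across processes"),
    Rule("many small messages",
         lambda m: m["messages"] > 0 and m["bytes_sent"] / m["messages"] < 1024,
         "aggregate messages to amortize latency"),
]

def diagnose(metrics: Metrics, kb: List[Rule] = KNOWLEDGE_BASE) -> List[Rule]:
    """Return every rule in the knowledge base whose condition holds."""
    return [rule for rule in kb if rule.condition(metrics)]

if __name__ == "__main__":
    run = {"total_time_s": 50.0, "recv_wait_s": 14.0, "max_compute_s": 30.0,
           "mean_compute_s": 28.0, "messages": 20000, "bytes_sent": 4e6}
    for rule in diagnose(run):
        print(f"{rule.name}: {rule.suggestion}")
```

Keeping the rules as data rather than hard-coded checks mirrors the separation the abstract describes: the analysis engine stays fixed while the knowledge base can be extended.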

Symbolic Performance Prediction of Scalable Parallel Programs

by Mark J. Clement, Michael J. Quinn - In Proc. of 9th International Parallel Processing Symposium, 1995
Cited by 6 (1 self)
Recent advances in the power of parallel computers have made them attractive for solving large computational problems. Scalable parallel programs are particularly well suited to Massively Parallel Processing (MPP) machines since the number of computations can be increased to match the available number of processors. Performance tuning can be particularly difficult for these applications since it must often be performed with a smaller problem size than that targeted for eventual execution. This research develops a performance prediction methodology that addresses this problem through symbolic analysis of program source code. Algebraic manipulations can then be performed on the resulting analytical model to determine performance for scaled up applications on different hardware architectures.

An Analytical Method for Predicting the Performance of Parallel Image Processing Operations

by Zoltan Juhasz, 1998
Cited by 3 (0 self)
This paper presents an analytical performance prediction model and methodology that can be used to predict the execution time, speedup, scalability, and similar performance metrics of a large set of image processing operations running on a p-processor parallel system. The model, which requires only a few parameters obtainable on a minimal system, can help in the systematic design, evaluation, and performance tuning of parallel image processing systems. Using the model, one can reason about the performance of a parallel image processing system prior to implementation. The method can also support programmers in detecting critical parts of an implementation, and system designers in predicting hardware performance and the effect of hardware parameter changes on performance. The execution of parallel image processing operations was studied, and operations were arranged in three main problem classes based on data locality and the communication patterns of the algorithms. The core of the method is the derivation of the overhead function, as it is the overhead that determines the achievable speedup. The overheads were examined and modelled for each class. The use of the method is illustrated by four class-representative image processing algorithms: image-scalar addition, convolution, histogram calculation, and the Fast Fourier Transform. The developed performance model has been validated on a 16-node parallel machine, and it has been shown that the model is able to predict the parallel run-time and other performance metrics of parallel image processing operations accurately. Keywords: performance prediction, image processing, analytical model, overheads, message passing, communication pattern
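Why the overhead function determines achievable speedup can be made explicit with the standard total-overhead formulation. The notation below (T_1 for serial time, T_p for time on p processors, T_o for total overhead) is assumed and is not necessarily the paper's own.

```latex
\begin{aligned}
T_o(p) &= p\,T_p - T_1, \qquad T_p = \frac{T_1 + T_o(p)}{p},\\
S(p)   &= \frac{T_1}{T_p} = \frac{p\,T_1}{T_1 + T_o(p)},\\
E(p)   &= \frac{S(p)}{p} = \frac{1}{1 + T_o(p)/T_1}.
\end{aligned}
```

For example, with hypothetical values T_1 = 100 s and T_o(16) = 60 s, S(16) = 1600/160 = 10 and E(16) = 1/1.6 = 0.625. Holding efficiency fixed as p grows therefore requires T_o to grow no faster than T_1, which is why a per-class overhead model is enough to predict speedup and scalability.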

Analysis of input-dependent program behavior using active profiling

by Xipeng Shen, Chengliang Zhang, Chen Ding, Michael L. Scott, Sandhya Dwarkadas, Mitsunori Ogihara - In the Workshop on Experimental Computer Science, 2007
Cited by 1 (1 self)
Utility programs, which perform similar and largely independent operations on a sequence of inputs, include such common applications as compilers, interpreters, and document parsers; databases; and compression and encoding tools. The repetitive behavior of these programs, while often clear to users, has been difficult to capture automatically. We present an active profiling technique in which controlled inputs to utility programs are used to expose execution phases, which are then marked, automatically, through binary instrumentation, enabling us to exploit phase transitions in production runs with arbitrary inputs. Experiments with five programs from the SPEC benchmark suites show that phase behavior is surprisingly predictable in many (though not all) cases. This predictability can in turn be used for optimized memory management leading to significant performance improvement.

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2010 The Pennsylvania State University