Techniques for Debugging Parallel Programs with Flowback Analysis
, 1991
Cited by 84 (8 self)
Flowback analysis is a powerful technique for debugging programs. It allows the programmer to examine dynamic dependences in a program's execution history without having to re-execute the program. The goal is to present to the programmer a graphical view of the dynamic program dependences. We are building a system, called PPD, that performs flowback analysis while keeping the execution time overhead low. We also extend the semantics of flowback analysis to parallel programs. This paper describes details of the graphs and algorithms needed to implement efficient flowback analysis for parallel programs. Execution time overhead is kept low by recording only a small amount of trace during a program's execution. We use semantic analysis and a technique called incremental tracing to keep the time and space overhead low. As part of the semantic analysis, PPD uses a static program dependence graph structure that reduces the amount of work done at compile time and takes advantage of the dynamic...
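The core idea of flowback analysis, following dynamic dependences backwards through an execution history, can be illustrated with a minimal sketch. The `Event` record, the function names, and the example trace are all hypothetical, and the sketch records a complete trace for simplicity; it does not model PPD's incremental tracing or its graph structures.

```python
from collections import namedtuple

# One executed statement instance: a label, the variable it writes, the variables it reads.
# (Hypothetical record type, chosen only for this illustration.)
Event = namedtuple("Event", ["label", "writes", "reads"])

def build_dynamic_dependences(trace):
    """Map each event index to the indices of the events that produced the values it read."""
    last_writer = {}  # variable name -> index of the most recent event that wrote it
    deps = {}
    for i, ev in enumerate(trace):
        deps[i] = sorted({last_writer[v] for v in ev.reads if v in last_writer})
        if ev.writes is not None:
            last_writer[ev.writes] = i
    return deps

def flowback(deps, start):
    """Return every event index that the event `start` transitively depends on."""
    seen, stack = set(), [start]
    while stack:
        i = stack.pop()
        for j in deps[i]:
            if j not in seen:
                seen.add(j)
                stack.append(j)
    return sorted(seen)

trace = [
    Event("a = 1", "a", ()),
    Event("b = 2", "b", ()),
    Event("c = a + 1", "c", ("a",)),
    Event("b = 5", "b", ()),
    Event("d = b * c", "d", ("b", "c")),
]
deps = build_dynamic_dependences(trace)
culprits = flowback(deps, 4)  # which earlier events influenced the final value of d?
```

In this trace the first assignment to `b` is overwritten before `d` reads it, so it does not appear among the events reached from `d`.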
Array Privatization for Parallel Execution of Loops
- In Proceedings of the 19th International Symposium on Computer Architecture
, 1992
Cited by 64 (9 self)
In recent experiments, array privatization played a critical role in successful parallelization of several real programs. This paper presents compiler algorithms for the program analysis for this transformation. The paper also addresses issues in the implementation.
1 Introduction
The diversity of parallel architectures makes it difficult to write efficient parallel programs in a machine independent language. For a long time, many researchers have pursued the goal of automatic transformation of sequential programs into parallel machine code. Unfortunately, the result has been unsatisfactory. Many transformation techniques used in existing compilers do not prove to be effective in practice [EB91], mainly because they handle relatively simple cases. On the other hand, recent experiments show significant results by solving more complex cases, using hand-performed new analyses and transformations [EHJ+91], [EHLP91]. A technique called array privatization, along with other techniques,...
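As an illustration of the transformation itself (not of the paper's compiler algorithms), the sketch below shows a loop whose scratch array is fully written before it is read in every iteration. Giving each iteration its own copy removes the storage-related dependences, so iterations can run independently. The function names and data are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def sequential(A):
    # `tmp` is shared by all iterations. Each iteration overwrites it before reading it,
    # so iterations conflict only on the storage, never on the values.
    tmp = [0] * len(A[0])
    out = [0] * len(A)
    for i, row in enumerate(A):
        for j, x in enumerate(row):
            tmp[j] = x * 2
        out[i] = sum(tmp)
    return out

def body_privatized(row):
    # Privatized version of one iteration: `tmp` is local to the iteration.
    tmp = [0] * len(row)
    for j, x in enumerate(row):
        tmp[j] = x * 2
    return sum(tmp)

def parallel(A):
    # With a private `tmp`, iterations no longer interfere and can be mapped onto workers.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(body_privatized, A))

A = [[i + j for j in range(3)] for i in range(4)]
```

A thread pool is used here only to show that the iterations are independent; it is not meant as a performance demonstration.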
Performance Prediction of Parallel Processing Systems: The PAMELA Methodology
- in Proc. 7th ACM Int. Conf. on Supercomputing
, 1993
Cited by 37 (17 self)
In this paper we present a new methodology for the performance prediction of parallel programs on parallel platforms ranging from shared-memory to distributed-memory (vector) machines. The methodology comprises a procedural program and machine specification paradigm based on Pamela (PerformAnce ModEling LAnguage), along with a performance calculus, called "serialization analysis". This calculus extends conventional parallel program analysis technology by explicitly accounting for resource contention, yet at the low evaluation cost typical for static techniques. It is shown that, where conventional techniques introduce fundamental errors, predictions from serialization analysis remain realistic. Apart from the merits of the methodology itself, this high reliability/cost ratio makes Pamela an attractive candidate for compile-time application within the performance prediction hierarchy often found in parallel programming environments.
1 Introduction
The performance of a concurrent syste...
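To make concrete why ignoring resource contention can mislead a static prediction, here is a rough sketch of a simple bound. It is an assumption made for illustration, not the Pamela calculus: the predicted time is taken as the larger of the longest individual task and the total demand placed on any single exclusive resource. The task format and the function name are hypothetical.

```python
def predict_parallel_time(tasks):
    """tasks: list of tasks; each task is a list of (resource_or_None, duration) steps.

    Returns a lower bound on completion time when all tasks start together and
    each named resource serves one task at a time.
    """
    # Contention-free estimate: the slowest task determines completion.
    longest_task = max(sum(d for _, d in steps) for steps in tasks)
    # An exclusive resource serializes its users, so its total demand is also a bound.
    demand = {}
    for steps in tasks:
        for res, d in steps:
            if res is not None:
                demand[res] = demand.get(res, 0) + d
    busiest = max(demand.values(), default=0)
    return max(longest_task, busiest)

# Four identical tasks: 2 time units of local work, then 3 units on a shared bus.
tasks = [[(None, 2.0), ("bus", 3.0)] for _ in range(4)]
estimate = predict_parallel_time(tasks)
```

A contention-free analysis would report 5 time units for this example, while the shared bus alone needs 12.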
Optimal Control Dependence Computation and the Roman Chariots Problem
- ACM Transactions on Programming Languages and Systems
, 1997
Cited by 17 (3 self)
In this article, we introduce the augmented postdominator tree (APT), a data structure which can be constructed in space and time proportional to the size of the program and which supports enumeration of a number of useful control dependence sets in time proportional to their size. Therefore, APT provides an optimal representation of control dependence. Specifically, the APT
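For readers unfamiliar with control dependence, the sketch below computes it directly from the textbook definition using postdominator sets. This is deliberately not the APT: it is a simple, non-optimal computation shown only to make the notion concrete. It assumes every node except the exit has at least one successor and can reach the exit; the function names and the example graph are hypothetical.

```python
def postdominators(succ, exit_node):
    """Iteratively compute, for each node, the set of nodes that postdominate it."""
    nodes = set(succ)
    pdom = {n: set(nodes) for n in nodes}
    pdom[exit_node] = {exit_node}
    changed = True
    while changed:
        changed = False
        for n in nodes - {exit_node}:
            # A node is postdominated by itself and by whatever postdominates all successors.
            new = set.intersection(*(pdom[s] for s in succ[n])) | {n}
            if new != pdom[n]:
                pdom[n] = new
                changed = True
    return pdom

def control_dependence(succ, exit_node):
    """Return a map y -> set of nodes x such that y is control dependent on x."""
    pdom = postdominators(succ, exit_node)
    cd = {n: set() for n in succ}
    for x in succ:
        for s in succ[x]:
            for y in pdom[s]:
                # y postdominates successor s but does not strictly postdominate x.
                if y == x or y not in pdom[x]:
                    cd[y].add(x)
    return cd

# A diamond-shaped flow graph: `a` branches to `b` or `c`, which rejoin at `d`.
succ = {"entry": ["a"], "a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["exit"], "exit": []}
cd = control_dependence(succ, "exit")
```

Here `b` and `c` depend on the branch at `a`, while `d` executes regardless of the branch outcome and so depends on nothing in this graph.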
The Design and Implementation of Genesis
, 1994
Cited by 6 (1 self)
This paper describes the design and implementation of Genesis and demonstrates how such a generator could be used by optimizer designers. Some experiences with the generator are also described.
Performance Prediction of Data-Dependent Task Parallel Programs
- in Proc. of the 7th Intl. Conference on Parallel Processing (EuroPar 2001), Manchester, United Kingdom
, 2001
Cited by 6 (2 self)
Current analytic solutions for predicting the execution time Y of binary parallel compositions of tasks with arbitrary execution time distributions X1 and X2 are either computationally complex or very inaccurate. In this paper we introduce an analytical approach that uses lambda distributions to approximate execution time distributions. This allows us to predict the first 4 statistical moments of Y in terms of the first 4 moments of Xi at negligible solution complexity. The prediction method applies to a wide range of workload distributions found in practice, and its accuracy is equal to or better than that of comparable low-cost approaches.
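The quantity being predicted can be made concrete with a brute-force baseline. The sketch below assumes that a binary parallel composition finishes when the slower of its two tasks finishes, so Y = max(X1, X2), and estimates the first four moments of Y by sampling. This is the kind of expensive reference computation an analytical method would aim to replace; it does not use lambda distributions, and the chosen input distributions are arbitrary placeholders.

```python
import random

def first_four_moments(samples):
    """Return (mean, variance, skewness, kurtosis) of a list of numbers."""
    n = len(samples)
    mean = sum(samples) / n

    def central(k):
        # k-th central moment of the sample
        return sum((x - mean) ** k for x in samples) / n

    var = central(2)
    skew = central(3) / var ** 1.5
    kurt = central(4) / var ** 2
    return mean, var, skew, kurt

def simulate_parallel_composition(draw_x1, draw_x2, n, rng):
    """Sample X1, X2 and Y = max(X1, X2) n times (assumed fork-join semantics)."""
    x1 = [draw_x1(rng) for _ in range(n)]
    x2 = [draw_x2(rng) for _ in range(n)]
    y = [max(a, b) for a, b in zip(x1, x2)]
    return x1, x2, y

rng = random.Random(0)  # fixed seed so the run is repeatable
x1, x2, y = simulate_parallel_composition(
    lambda r: r.uniform(1.0, 3.0),       # placeholder workload for task 1
    lambda r: r.expovariate(1.0) + 1.0,  # placeholder workload for task 2
    10_000,
    rng,
)
moments_y = first_four_moments(y)
```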
Kremlin: Rethinking and Rebooting gprof for the Multicore Age
Cited by 5 (3 self)
Many recent parallelization tools lower the barrier for parallelizing a program, but overlook one of the first questions that a programmer needs to answer: which parts of the program should I spend time parallelizing? This paper examines Kremlin, an automatic tool that, given a serial version of a program, will make recommendations to the user as to what regions (e.g. loops or functions) of the program to attack first. Kremlin introduces a novel hierarchical critical path analysis and develops a new metric for estimating the potential of parallelizing a region: self-parallelism. We further introduce the concept of a parallelism planner, which provides a ranked order of specific regions to the programmer that are likely to have the largest performance impact when parallelized. Kremlin supports multiple planner personalities, which allow the planner to more effectively target a particular programming environment or class of machine. We demonstrate the effectiveness of one such personality, an OpenMP planner, by comparing versions of programs that are parallelized according to Kremlin’s plan against third-party manually parallelized versions. The results show that Kremlin’s OpenMP planner is highly effective, producing plans whose performance is typically comparable to, and sometimes much better than, manual parallelization. At the same time, these plans would require that the user parallelize significantly fewer regions of the program.
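Critical path analysis, on which the abstract's metric builds, can be sketched in its plain, non-hierarchical form: the available parallelism of a region is its total work divided by the length of its longest dependence chain. The sketch below is not Kremlin's hierarchical analysis or its self-parallelism metric; the operation costs, the dependence map, and the function name are hypothetical.

```python
def work_and_critical_path(cost, deps):
    """cost: op -> execution time; deps: op -> ops it must wait for (must form a DAG).

    Returns (total work, critical path length).
    """
    finish = {}

    def earliest_finish(op):
        # An operation can finish once its slowest predecessor has finished.
        if op not in finish:
            ready = max((earliest_finish(d) for d in deps.get(op, ())), default=0)
            finish[op] = ready + cost[op]
        return finish[op]

    critical_path = max(earliest_finish(op) for op in cost)
    return sum(cost.values()), critical_path

# Three independent operations feeding one final operation.
cost = {"a": 1, "b": 1, "c": 1, "d": 1}
deps = {"d": ["a", "b", "c"]}
work, cp = work_and_critical_path(cost, deps)
parallelism = work / cp
```

A region with a ratio near 1 is essentially serial, which is the kind of signal a planner could use to rank regions.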
Issues On The Design Of Parallelizing Compilers
, 1990
Cited by 4 (0 self)
...ion
2.3.2 Symbol Table
2.3.3 Abstract Syntax Tree
2.3.4 Data Dependence Graph
2.3.5 Call Graph
2.3.6 Flow Graph
2.3.7 Task Graph
3 Multiversion Code Generation
3.1 Preliminaries
3.2 Single Loops
3.2.1 Serializing Parallel Loops
3.2.2 Vector vers...
Performance Estimation for Embedded Systems
, 2000
Cited by 2 (2 self)
In this document we propose a symbolic performance modeling technique to be used as the basis of the JOSES cost estimator. The approach is inspired by the need for highly parametric cost models in the initial stages of parallel program design, where absolute prediction accuracy has lower priority than solution cost, and where symbolic feedback on the effects of user mapping decisions and machine parameters is of primary concern. As illustrated by the case study, the symbolic approach provides good feedback on the effects of partitioning choices as well as the influence of computation and communication parameters on application performance.

