Results 1–10 of 23
Analyzing the Behavior and Performance of Parallel Programs
 Univ. of Wisconsin–Madison, UW CS Tech. Rep.
, 1993
Abstract

Cited by 45 (4 self)
An analytical performance model for parallel programs can provide qualitative insight as well as efficient quantitative evaluation and prediction of parallel program performance. While stochastic models for parallel programs can represent execution time variance due to communication and resource contention delays, a qualitative assessment of previous models shows that the stochastic assumption makes it extremely difficult to compute synchronization costs and overall execution times. This thesis first reevaluates the need for the stochastic assumption by examining the influence of nondeterministic communication and resource contention delays on execution times in parallel programs. An analytical model of program behavior, combined with detailed program measurements, provides compelling evidence that in shared-memory programs on current systems, as well as programs with similar granularity on foreseeable future systems, such delays introduce extremely low variance into the execution time...
The Influence of Random Delays on Parallel Execution Times
 In Proc. 1993 ACM SIGMETRICS Conf. on Measurement and Modelling of Computer Systems
, 1993
Abstract

Cited by 35 (2 self)
Stochastic models are widely used for the performance evaluation of parallel programs and systems. The stochastic assumptions in such models are intended to represent nondeterministic processing requirements as well as random delays due to interprocess communication and resource contention. In this paper, we provide compelling analytical and experimental evidence that in current and foreseeable shared-memory programs, communication delays introduce negligible variance into the execution time between synchronization points. Furthermore, we show using direct measurements of variance that other sources of randomness, particularly nondeterministic computational requirements, also do not introduce significant variance in many programs. We then use two examples to demonstrate the implications of these results for parallel program performance prediction models, as well as for general stochastic models of parallel systems.
Structural Prediction Models for High-Performance Distributed Applications
 Proceedings of the Cluster Computing Conference (CCC '97)
, 1997
Abstract

Cited by 33 (10 self)
We present a structural performance model that uses application profiles and component models to predict an application's performance on a set of distributed resources. We decompose application performance in accordance with the structure of the application: that is, into interacting component models that correspond to component tasks. Then, using the application profile and available information as guides, we select models for each component appropriately. As a proof of concept, we have implemented this approach for two distributed applications, a master-slave genetic algorithm code and a red-black stencil successive over-relaxation code. Our predictions are within 10% of actual time.

Context: Clusters of distributed machines have become a common platform for high-performance applications, but they remain a challenging environment in which to achieve good performance. One reason for this is the difficulty of predicting an application's execution time in this variable setting, where only mi...
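The idea of composing per-component models according to application structure can be illustrated with a small sketch. The function name, cost model, and parameters below are illustrative assumptions, not the paper's actual models: a master-slave phase is approximated as the slowest slave's compute time plus a fixed per-slave communication cost paid by the master.

```python
# Hypothetical sketch of structural composition: the application's predicted
# time is assembled from per-component models according to its structure.
# The cost model here (slowest slave + per-slave master communication) is
# illustrative only.

def predict_master_slave(work_per_slave, slave_speeds, comm_cost):
    # Each slave's component model: its share of work at its own speed.
    slave_times = [w / s for w, s in zip(work_per_slave, slave_speeds)]
    # The phase ends when the slowest slave returns; the master pays a
    # communication cost for each slave it coordinates.
    return max(slave_times) + comm_cost * len(slave_speeds)

# Two slaves with equal work but different speeds: the slow slave (speed 1.0)
# dominates, plus 2 * 0.5 communication overhead.
print(predict_master_slave([10.0, 10.0], [2.0, 1.0], comm_cost=0.5))
```

Swapping in a different component model (for example, a per-slave model measured from an application profile) only changes the leaf terms, not the structural composition.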
Parallel program performance prediction using deterministic task graph analysis
 ACM Trans. Comput. Syst.
, 2004
Abstract

Cited by 28 (3 self)
In this paper, we consider analytical techniques for predicting detailed performance characteristics of a single shared-memory parallel program for a particular input. Analytical models for parallel programs have been successful at providing simple qualitative insights and bounds on program scalability, but have been less successful in practice for providing detailed insights and metrics for program performance (leaving these to measurement or simulation). We develop a conceptually simple modeling technique called deterministic task graph analysis that provides detailed performance prediction for shared-memory programs with arbitrary task graphs, a wide variety of task scheduling policies, and significant communication and resource contention. Unlike many previous models that are stochastic models, our model assumes deterministic task execution times (while retaining the use of stochastic models for communication and resource contention). This assumption is supported by a previous study of the influence of nondeterministic delays in parallel programs. We evaluate our model in three ways. First, an experimental evaluation shows that our analysis technique is accurate and efficient for a variety of shared-memory programs, including programs with large and/or complex task graphs, sophisticated task scheduling, highly nonuniform task...
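The deterministic-time core of task-graph analysis can be sketched in a few lines. This is a simplification under strong assumptions (fixed task times, unlimited processors, no contention, no scheduler), whereas the paper additionally models scheduling policies and contention; with those assumptions, a task starts when all its predecessors finish, so the program's completion time is the graph's critical path.

```python
# Minimal sketch of deterministic task-graph analysis under simplifying
# assumptions: fixed task times, unlimited processors, no contention.
# A task starts when all predecessors finish; completion time is the
# critical path length.

def critical_path_time(times, preds):
    """times: task -> execution time; preds: task -> list of predecessor tasks."""
    finish = {}
    def finish_time(t):
        if t not in finish:
            # A task starts once every predecessor has finished.
            start = max((finish_time(p) for p in preds.get(t, [])), default=0.0)
            finish[t] = start + times[t]
        return finish[t]
    # The program completes when its last task finishes.
    return max(finish_time(t) for t in times)

# a and b run in parallel; c waits for both; d waits for c.
times = {"a": 2.0, "b": 3.0, "c": 1.0, "d": 2.0}
preds = {"c": ["a", "b"], "d": ["c"]}
print(critical_path_time(times, preds))
```

Replacing the `default=0.0` start rule with a scheduler model (finite processors, a queue discipline) is where the paper's actual analysis does the real work.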
Static and Dynamic Processor Scheduling Disciplines in Heterogeneous Parallel Architectures
, 1995
Abstract

Cited by 27 (2 self)
Most parallel jobs cannot be fully parallelized. In a homogeneous parallel machine, one in which all processors are identical, the serial fraction of the computation has to be executed at the speed of any of the identical processors, limiting the speedup that can be obtained due to parallelism. In a heterogeneous architecture, the sequential bottleneck can be greatly reduced by running the sequential part of the job, or even the critical tasks, on a faster processor. This paper uses Markov chain based models to analyze the performance of static and dynamic processor assignment policies for heterogeneous architectures. Parallel jobs are assumed to be described by acyclic directed task graphs. A new static processor assignment policy, called Largest Task First Minimum Finish Time (LTF-MFT), is introduced. The analysis shows that this policy is very sensitive to the degree of heterogeneity of the architecture, and that it outperforms all other policies analyzed. Three dynamic assignment...
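A policy of this name suggests the following greedy shape, sketched here as an assumption rather than the paper's exact algorithm: take tasks in decreasing size order, and give each to the processor on which it would finish earliest, with heterogeneity modeled as per-processor speed factors.

```python
# Hypothetical sketch of a "largest task first, minimum finish time" style
# static assignment for independent tasks on heterogeneous processors.
# This is an illustration of the naming, not the paper's Markov-chain model.

def ltf_mft(task_sizes, speeds):
    """Return (assignment, finish): assignment[i] is the processor given to
    task i; finish[p] is processor p's total busy time."""
    finish = [0.0] * len(speeds)
    assignment = [None] * len(task_sizes)
    # Largest Task First: consider tasks in decreasing order of work.
    for i in sorted(range(len(task_sizes)), key=lambda i: -task_sizes[i]):
        # Minimum Finish Time: pick the processor that completes this task soonest.
        p = min(range(len(speeds)),
                key=lambda p: finish[p] + task_sizes[i] / speeds[p])
        finish[p] += task_sizes[i] / speeds[p]
        assignment[i] = p
    return assignment, finish

# One fast processor (speed 2.0) and one slow processor (speed 1.0).
assignment, finish = ltf_mft([4.0, 2.0, 2.0], [2.0, 1.0])
print(assignment, max(finish))   # makespan is the largest finish time
```

Note how the largest task lands on the fast processor, which is exactly the heterogeneity-sensitivity the abstract describes: the more the speeds differ, the more the assignment order matters.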
Performance Prediction and Scheduling for Parallel Applications on Multi-User Clusters
, 1998
On Performance Prediction of Parallel Computations with Precedent Constraints
, 1994
Abstract

Cited by 15 (0 self)
Performance analysis of concurrent executions in parallel systems has been recognized as a challenging problem. The aim of this research is to study approximate but efficient solution techniques for this problem. We model the structure of a parallel machine and the structure of the jobs executing on such a system. We investigate rich classes of jobs, which can be expressed by series, parallel-and, parallel-or, and probabilistic-fork constructs. We propose an efficient performance prediction method for these classes of jobs running in a parallel environment that is modeled by a standard queueing network model. The proposed prediction method is computationally efficient: it has polynomial complexity in both time and space. Both the time complexity and the space complexity are O(C²N²K), where C is the number of job classes in the system, the number of tasks in each job class is O(N), and K is the number of service centers in the queueing model. The accuracy of the approximation...
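The composition rules behind these job classes are easy to state on deterministic task times, even though the paper couples them with a queueing-network model: series stages add, a parallel-and waits for all branches (max), and a parallel-or finishes with the first branch (min). A minimal sketch, assuming fixed per-task times:

```python
# Illustrative evaluation of job-structure operators on deterministic task
# times. The paper combines these structures with a queueing network; this
# sketch only demonstrates the composition rules themselves.

def completion_time(job):
    kind = job[0]
    if kind == "task":                 # leaf: a fixed service demand
        return job[1]
    times = [completion_time(child) for child in job[1:]]
    if kind == "series":               # stages run one after another
        return sum(times)
    if kind == "parallel-and":         # all branches must finish
        return max(times)
    if kind == "parallel-or":          # the first branch to finish wins
        return min(times)
    raise ValueError(f"unknown operator: {kind}")

job = ("series",
       ("task", 1.0),
       ("parallel-and", ("task", 3.0), ("task", 2.0)),
       ("parallel-or", ("task", 5.0), ("task", 4.0)))
print(completion_time(job))   # 1.0 + 3.0 + 4.0 = 8.0
```

A probabilistic-fork would add a branch-probability-weighted expectation over the same recursion; with stochastic task times, the max and min no longer distribute over expectations, which is why approximate techniques are needed.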
A Deterministic Model for Parallel Program Performance Evaluation
, 1998
Abstract

Cited by 14 (5 self)
Parallelism Metrics: Previous studies have shown that a few key parameters such as the fraction of sequential work [Amdahl 1967], average parallelism [Eager et al. 1989], and variance of parallelism [Sevcik 1989] can each provide a concise yet powerful characterization of program performance. For example, Eager et al. have derived bounds on speedup for arbitrary, work-conserving, task scheduling functions using the average parallelism (A) alone [Eager et al. 1989]. Sevcik has shown that for scheduling parallel jobs on a multiprocessor system, where only very concise information about the arriving jobs can be used, effective scheduling is possible using just two parameters, average parallelism and variance of parallelism [Sevcik 1989]. While simple parameters like the fraction of sequential work and Pmax can be easily estimated for a given program, other parameters like A and the variance of parallelism are much more difficult to obtain. (For example, consider estimating A for PSIM or ...
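The Eager et al. bounds referenced above are simple enough to evaluate directly: for a work-conserving scheduler on n processors, speedup is at most min(n, A) and at least nA / (n + A - 1). A small sketch:

```python
# Speedup bounds from average parallelism A alone (Eager et al. 1989),
# for work-conserving schedulers on n processors:
#   upper bound: min(n, A)
#   lower bound: n * A / (n + A - 1)

def speedup_bounds(n, A):
    upper = min(n, A)
    lower = n * A / (n + A - 1)
    return lower, upper

# A program with average parallelism 4 on 8 processors: speedup is
# guaranteed to lie between 32/11 (about 2.91) and 4.
lo, hi = speedup_bounds(n=8, A=4.0)
print(lo, hi)
```

The gap between the two bounds is what motivates richer characterizations such as Sevcik's variance of parallelism: A alone cannot pin the speedup down further.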