Analyzing the Behavior and Performance of Parallel Programs
Univ. of Wisconsin-Madison, UW CS Tech. Rep., 1993
Cited by 37 (5 self)
An analytical performance model for parallel programs can provide qualitative insight as well as efficient quantitative evaluation and prediction of parallel program performance. While stochastic models for parallel programs can represent execution time variance due to communication and resource contention delays, a qualitative assessment of previous models shows that the stochastic assumption makes it extremely difficult to compute synchronization costs and overall execution times. This thesis first re-evaluates the need for the stochastic assumption by examining the influence of non-deterministic communication and resource contention delays on execution times in parallel programs. An analytical model of program behavior, combined with detailed program measurements, provides compelling evidence that in shared-memory programs on current systems as well as programs with similar granularity on foreseeable future systems, such delays introduce extremely low variance into the execution tim...
A Tabu Search Approach to Task Scheduling on Heterogeneous Processors under Precedence Constraints
1994
Cited by 31 (9 self)
Parallel programs may be represented as a set of interrelated sequential tasks. When multiprocessors are used to execute such programs, the parallel portion of the application can be speeded up by an appropriate allocation of processors to the tasks of the application. Given a parallel application defined by a task precedence graph, the goal of task scheduling (or processor assignment) is thus the minimization of the makespan of the application. In a heterogeneous multiprocessor system, task scheduling consists in determining which tasks will be assigned to each processor, as well as the execution order of the tasks assigned to each processor. In this work, we apply the tabu search metaheuristic to the solution of the task scheduling problem on a heterogeneous multiprocessor environment under precedence constraints. The topology of the Mean Value Analysis solution package for product form queueing networks is used as the framework for performance evaluation. We show that tabu search ob...
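As a rough illustration of the objective described in this abstract, the sketch below evaluates the makespan of one candidate solution: a fixed assignment of tasks to processors plus an execution order on each processor, under precedence constraints. The function name `makespan`, the data layout, and the choice to ignore communication costs are assumptions made for this example; it is not the paper's tabu search or its evaluation framework. A search heuristic would call such an evaluator repeatedly on neighbouring solutions.

```python
def makespan(exec_time, preds, schedule):
    """Return the completion time of the last task.

    exec_time[task][proc] -> run time of task on that processor (heterogeneous)
    preds[task]           -> iterable of tasks that must finish first
    schedule[proc]        -> ordered list of tasks assigned to that processor
    Communication costs are ignored (an assumption of this sketch).
    """
    finish = {}                               # task -> finish time
    next_idx = {p: 0 for p in schedule}       # next position to run on each processor
    proc_free = {p: 0.0 for p in schedule}    # time at which each processor is idle
    remaining = sum(len(tasks) for tasks in schedule.values())

    while remaining:
        progressed = False
        for p, tasks in schedule.items():
            i = next_idx[p]
            if i == len(tasks):
                continue
            t = tasks[i]
            t_preds = list(preds.get(t, ()))
            # A task can start once its processor is free and all predecessors are done.
            if all(q in finish for q in t_preds):
                start = max([proc_free[p]] + [finish[q] for q in t_preds])
                finish[t] = start + exec_time[t][p]
                proc_free[p] = finish[t]
                next_idx[p] += 1
                remaining -= 1
                progressed = True
        if not progressed:
            # The per-processor orders contradict the precedence graph.
            raise ValueError("schedule is infeasible for the given precedence graph")

    return max(finish.values(), default=0.0)


# Hypothetical diamond-shaped task graph: A -> B, A -> C, B -> D, C -> D.
exec_time = {
    "A": {0: 2, 1: 4},
    "B": {0: 3, 1: 6},
    "C": {0: 4, 1: 2},
    "D": {0: 1, 1: 2},
}
preds = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}

two_procs = makespan(exec_time, preds, {0: ["A", "B", "D"], 1: ["C"]})
one_proc = makespan(exec_time, preds, {0: ["A", "B", "C", "D"], 1: []})
```

In this made-up instance, moving task C to the second processor shortens the schedule relative to running everything on processor 0.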
The Influence of Random Delays on Parallel Execution Times
In Proc. 1993 ACM SIGMETRICS Conf. on Measurement and Modelling of Computer Systems, 1993
Cited by 30 (2 self)
Stochastic models are widely used for the performance evaluation of parallel programs and systems. The stochastic assumptions in such models are intended to represent non-deterministic processing requirements as well as random delays due to inter-process communication and resource contention. In this paper, we provide compelling analytical and experimental evidence that in current and foreseeable shared-memory programs, communication delays introduce negligible variance into the execution time between synchronization points. Furthermore, we show using direct measurements of variance that other sources of randomness, particularly non-deterministic computational requirements, also do not introduce significant variance in many programs. We then use two examples to demonstrate the implications of these results for parallel program performance prediction models, as well as for general stochastic models of parallel systems.
Structural Prediction Models for High-Performance Distributed Applications
Proceedings of the Cluster Computing Conference (CCC '97), 1997
Cited by 27 (10 self)
We present a structural performance model that uses application profiles and component models to predict an application's performance on a set of distributed resources. We decompose application performance in accordance with the structure of the application: that is, into interacting component models that correspond to component tasks. Then, using the application profile and available information as guides, we select models for each component appropriately. As a proof of concept, we have implemented this approach for two distributed applications, a master-slave genetic algorithm code and a red-black stencil successive over-relaxation code. Our predictions are within 10% of actual time. Context Clusters of distributed machines have become a common platform for high performance applications, but remain a challenging environment in which to achieve good performance. One reason for this is the difficulty of predicting an application's execution time in this variable setting, where only mi...
A Probabilistic Approach to Parallel System Performance Modelling
1995
Cited by 18 (10 self)
For the development of efficient parallel applications, fast but reliable performance predictions are essential. Many existing modelling formalisms are either not directly suited to model parallel applications, or too expensive. This paper describes several extensions and improvements to a previously introduced methodology, based on an extension of queueing networks. The set of machine model building blocks is extended, a new algorithm for the prediction of multiple-class parallel section completion times is introduced, and it is shown how programs containing conditional statements at the program level and memory hierarchies at the machine level are modelled. The concepts introduced in this paper are illustrated by a number of examples throughout the paper, and a case study comparing the predictions to measurements carried out on an actual parallel machine.
Parallel program performance prediction using deterministic task graph analysis
ACM Trans. Comput. Syst., 2004
Cited by 18 (0 self)
In this paper, we consider analytical techniques for predicting detailed performance characteristics of a single shared memory parallel program for a particular input. Analytical models for parallel programs have been successful at providing simple qualitative insights and bounds on program scalability, but have been less successful in practice for providing detailed insights and metrics for program performance (leaving these to measurement or simulation). We develop a conceptually simple modeling technique called deterministic task graph analysis that provides detailed performance prediction for shared-memory programs with arbitrary task graphs, a wide variety of task scheduling policies, and significant communication and resource contention. Unlike many previous models that are stochastic models, our model assumes deterministic task execution times (while retaining the use of stochastic models for communication and resource contention). This assumption is supported by a previous study of the influence of non-deterministic delays in parallel programs. We evaluate our model in three ways. First, an experimental evaluation shows that our analysis technique is accurate and efficient for a variety of shared-memory programs, including programs with large and/or complex task graphs, sophisticated task scheduling, highly non-uniform task
Static and Dynamic Processor Scheduling Disciplines in Heterogeneous Parallel Architectures
1995
Cited by 16 (1 self)
Most parallel jobs cannot be fully parallelized. In a homogeneous parallel machine --- one in which all processors are identical --- the serial fraction of the computation has to be executed at the speed of any of the identical processors, limiting the speedup that can be obtained due to parallelism. In a heterogeneous architecture, the sequential bottleneck can be greatly reduced by running the sequential part of the job or even the critical tasks in a faster processor. This paper uses Markov chain based models to analyze the performance of static and dynamic processor assignment policies for heterogeneous architectures. Parallel jobs are assumed to be described by acyclic directed task graphs. A new static processor assignment policy, called Largest Task First Minimum Finish Time (LTFMFT), is introduced. The analysis shows that this policy is very sensitive to the degree of heterogeneity of the architecture, and that it outperforms all other policies analyzed. Three dynamic assignmen...
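The sequential-bottleneck argument in this abstract can be made concrete with a simple Amdahl-style calculation. The sketch below is a deliberately simplified illustration, not the Markov chain models the paper uses: it assumes a job with serial fraction `f`, a baseline of one unit-speed processor, and a heterogeneous machine that replaces one of its `n` processors with one that is `k` times faster. The function names and the assumption that the parallel part scales perfectly with total processing capacity are hypothetical.

```python
def speedup_homogeneous(f, n):
    """Amdahl's law: serial fraction f, n identical unit-speed processors."""
    return 1.0 / (f + (1.0 - f) / n)


def speedup_with_fast_processor(f, n, k):
    """Speedup with n - 1 unit-speed processors plus one processor of speed k.

    Assumptions of this sketch: the serial part runs on the fast processor,
    and the parallel part is divided perfectly over the total capacity
    (n - 1) + k.
    """
    serial_time = f / k
    parallel_time = (1.0 - f) / ((n - 1) + k)
    return 1.0 / (serial_time + parallel_time)


homogeneous = speedup_homogeneous(0.1, 10)
heterogeneous = speedup_with_fast_processor(0.1, 10, 4)
```

With these illustrative values (10% serial work, 10 processors, one of them 4 times faster), the heterogeneous configuration gives roughly twice the speedup of the homogeneous one, because the serial term shrinks from `f` to `f / k`.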
Analyzing Concurrent and Fault-Tolerant Software using Stochastic Reward Nets
Journal of Parallel and Distributed Computing, 1992
Cited by 14 (4 self)
We present two software applications and develop models for them. The first application considers a producer-consumer tasking system with an intermediate buffer task and studies how the performance is affected by different selection policies when multiple tasks are ready to synchronize. The second application studies the reliability of a fault-tolerant software system using the recovery block scheme. The model is incrementally augmented by considering clustered failures or the effective arrival rate of inputs to the system. We use stochastic reward nets, a variant of stochastic Petri nets, to model the two software applications. In both models, each quantity to be computed is defined in terms of either the expected value of a reward rate in steady-state or at a given time `, or as the expected value of the accumulated reward until absorption or until a given time `. This allows extreme flexibility while maintaining a rigorous formalization of these quantities. 1 Introduction Many appli...

