## Automatic scalability analysis of parallel programs based on modeling techniques (1994)

Venue: Computer Performance Evaluation: Modelling Techniques and Tools (LNCS 794)

Citations: 18 (1 self)

### BibTeX

@INPROCEEDINGS{Malony94automaticscalability,

author = {Allen D. Malony and Vassilis Mertsiotakis and Andreas Quick},

title = {Automatic scalability analysis of parallel programs based on modeling techniques},

booktitle = {Computer Performance Evaluation: Modelling Techniques and Tools (LNCS 794)},

year = {1994},

pages = {139--158},

publisher = {Springer-Verlag}

}

### Abstract

When implementing parallel programs for parallel computer systems, the performance scalability of these programs should be tested and analyzed on different computer configurations and problem sizes. Since a complete scalability analysis is too time-consuming and is limited to existing systems, extensions of modeling approaches can be considered for analyzing the behavior of parallel programs under different problem and system scenarios. In this paper, a method for automatic scalability analysis using modeling is presented. Initially, we identify the important problems that arise when attempting to apply modeling techniques to scalability analysis. Based on this study, we define the Parallelization Description Language (PDL) that is used to describe parallel execution attributes of a generic program workload. Based on a parallelization description, stochastic models like graph models or Petri net models can be automatically generated from a generic model to analyze performance for scaled parallel systems as well as scaled input data. The complexity of the graph models produced depends significantly on the type of parallel computation described. We present several computation classes where tractable graph models can be generated and then compare the results of these automatically scaled models with their exact solutions using the PEPP modeling tool.
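To make the idea of evaluating a scaled graph model concrete, the following is a minimal sketch of series-parallel reduction on a fork-join model. It is not PDL and not the PEPP implementation: the function names, the two-point runtime distribution in `scaled_chunk`, and the workload numbers are all assumptions chosen for illustration. Runtime distributions are discrete dictionaries mapping a time to its probability; tasks in series are combined by summing runtimes, and independent tasks joined at a barrier are combined by taking the maximum.

```python
from itertools import product


def series(a, b):
    """Distribution of the total runtime of two tasks executed one after the other."""
    out = {}
    for (ta, pa), (tb, pb) in product(a.items(), b.items()):
        out[ta + tb] = out.get(ta + tb, 0.0) + pa * pb
    return out


def parallel(a, b):
    """Distribution of the runtime of two independent tasks that join at a barrier."""
    out = {}
    for (ta, pa), (tb, pb) in product(a.items(), b.items()):
        t = max(ta, tb)
        out[t] = out.get(t, 0.0) + pa * pb
    return out


def mean(dist):
    """Mean runtime of a discrete distribution {time: probability}."""
    return sum(t * p for t, p in dist.items())


def scaled_chunk(total_work, p):
    """Hypothetical per-processor workload: total_work / p, or 50% longer with probability 0.25."""
    base = total_work / p
    return {base: 0.75, 1.5 * base: 0.25}


def fork_join(setup, chunk, p):
    """Reduce the model 'setup, then p independent chunks in parallel, then join' to one distribution."""
    joined = chunk
    for _ in range(p - 1):
        joined = parallel(joined, chunk)
    return series(setup, joined)


# Assumed workload: one time unit of serial setup and 24 time units of divisible work.
setup = {1.0: 1.0}
runtime = {p: mean(fork_join(setup, scaled_chunk(24.0, p), p)) for p in (1, 2, 4, 8)}
speedup = {p: runtime[1] / runtime[p] for p in runtime}

for p in sorted(runtime):
    print(f"p={p}: mean runtime {runtime[p]:.3f}, speedup {speedup[p]:.2f}")
```

Because the model is reduced structurally, no state space is built; the cost grows with the number of distinct runtime values rather than with the number of task interleavings. Changing `p` or `total_work` corresponds to scaling the system or the problem size.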

### Citations

258 | A Class of Generalized Stochastic Petri Nets for the Performance Evaluation of Multiprocessor Systems
- Marsan, Balbo, et al.
Citation Context ...rical analysis is time consuming and is limited to existing parallel computer systems. Modeling parallel programs with discrete event models like stochastic graph models [15] or stochastic Petri nets [1] is a well-known and proven method to analyze a program’s dynamic behavior. It can be used to predict the program’s execution time [16], and, by changing model parameters, help to understand the progr...

48 | Implementing a Parallel C++ Runtime System for Scalable Parallel Systems
- Bodin, Beckman, et al.
- 1993
Citation Context ...t numbers of processors or problem sizes. As done with performance monitoring, we also want to use modeling to analyze different parts of the program in order to obtain a detailed scalability profile [2, 11]. A significant advantage of modeling vs. monitoring is that model-based analysis is not restricted to existing systems and does not necessarily require access to existing systems for experimentatio...

27 | Bounds for the mean runtime of parallel programs
- Hartleb, Mertsiotakis
- 1992
Citation Context ...ess because models are solved using series-parallel reduction instead of creating a state space [17]. Results obtained with bounding methods implemented in PEPP are shown in Table 2 and Figure 14. In [7] we have shown for various graph structures that the bounding methods implemented in PEPP are very accurate. In PEPP, three different bounding methods are implemented in order to select the best bound...

26 | A method for performance prediction of parallel programs
- Sotz
- 1990
Citation Context ...models like stochastic graph models [15] or stochastic Petri nets [1] is a well-known and proven method to analyze a program’s dynamic behavior. It can be used to predict the program’s execution time [16], and, by changing model parameters, help to understand the program’s general performance behavior, to investigate reasons for performance bottlenecks, or to identify program errors. When using mode...

24 | Formal Description, Time and Performance Analysis: A Framework
- Herzog
- 1990
Citation Context ...e Methodology. Modeling programs to be executed on parallel or distributed systems is too complicated a task to develop a model from scratch. For this reason Herzog proposed a “three step methodology” [9] to reduce modeling complexity. Instead of creating a single, monolithic model for each combination of workload, machine configuration, and load distribution, a workload model is developed independent...

23 | Dynamic processor self-scheduling for general parallel nested loops
- Fang, Tang, et al.
- 1990

22 | Stochastic bounds on execution times of parallel programs
- Yazici-Pekergin, Vincent
- 1991
Citation Context ...jobs. The execution time is determined by the interarrival time of new jobs and the runtime distribution of one pipeline stage. The macropipeline task graphs have also been referred to as mesh graphs [21] and are characteristic of wavefront computations. Our scalability approach is similar to that for neighbor synchronization (Figure 5). We identify sets of independent tasks and separate their paralle...

16 | PEPP: performance evaluation of parallel programs. User’s guide, Version 3.1
- Dauphin, Hartleb, et al.
- 1992
Citation Context ...c modeling tools. To evaluate the efficacy of our techniques, we have integrated methods for model generation and scalability analysis into our tool PEPP (Performance Evaluation of Parallel Programs) [4]. Based on the parallelization description language, PDL, a model generator for other model targets (like stochastic Petri net models) can be implemented in a similar manner. The remainder of the pape...

11 | Reliability and performability techniques and tools: A survey
- Malhotra, Trivedi
- 1993
Citation Context ...ly impact scaled graph model complexity, resulting in solution intractability, if exact execution behavior is modeled. Although modeling techniques have been developed that are “largeness tolerant” [17] (i.e., can deal to some extent with graph complexity), the process of creating a correct and accurate graph model is non-trivial. In order to overcome these model generation and evaluation problems, ...

10 | An approach to monitoring and modeling of multiprocessor and multicomputer systems
- Hofmann, Klar, et al.
- 1988
Citation Context ...t numbers of processors or problem sizes. As done with performance monitoring, we also want to use modeling to analyze different parts of the program in order to obtain a detailed scalability profile [2, 11]. A significant advantage of modeling vs. monitoring is that model-based analysis is not restricted to existing systems and does not necessarily require access to existing systems for experimentatio...

9 | Performance Measurement Tools in a Multiprocessor Environment
- Burkhart, Millen
- 1989
Citation Context ...he results are presented in a speedup chart. As done with performance monitoring, modeling can classify different parts of the program in order to obtain a detailed scalability profile (loss analysis [3, 2]). The relative influence of the different program phases on the program’s execution time can be determined in the model. For this, the execution time of all program phases not considered should be se...

9 | Parallel loop constructs for multiprocessors
- Davies
- 1981

9 | Performance Prediction of Parallel Programs
- Wabnig, Kotsis, et al.
- 1993
Citation Context ...ph in Figure 6; the graph shown here is for a 6 x 6 matrix. Given a large matrix, the graph would consist of several thousands of nodes, making certain solution techniques computationally intractable [19]. However, we can transform the generic model to a simpler scaled model. Again, our standard technique can be applied in this case by identifying independent tasks at different iteration levels. Howev...

8 | Stochastic analysis of parallel programs for hierarchical multiprocessor systems
- Kleinöder
- 1982
Citation Context ...6 8 36.7 73.4 146.8 9 33 66 132 10 25.7 51.4 102.8 Table 3: Results of Approximate Modeling. Results obtained by evaluating our scaled models are upper bounds of the mean runtime. Kleinöder has proved [12] that an upper bound is obtained by inserting arcs. In this case, the added barrier synchronization arcs are causing higher execution times. The deletion of arcs leads to a lower bound, because exe...

7 | Stochastic Graph Models for Performance Evaluation of Parallel Programs and the Evaluation Tool PEPP
- Hartleb
- 1993
Citation Context ...ns by applying efficient solution methods including a series-parallel structure solver, an approximate state space analysis, and bounding methods to obtain upper and lower bounds of the mean runtime [8]. In order to model measured runtimes, numerical runtime distributions are allowed in all three cases. The following example illustrates how solutions to large graph models are calculated using boundi...

6 | A Hybrid, Combinatorial Method of Solving Performance and Reliability Models
- Sahner
- 1986
Citation Context ...d problem testcases. Such empirical analysis is time consuming and is limited to existing parallel computer systems. Modeling parallel programs with discrete event models like stochastic graph models [15] or stochastic Petri nets [1] is a well-known and proven method to analyze a program’s dynamic behavior. It can be used to predict the program’s execution time [16], and, by changing model parameters,...

5 | High-Performance Compilers for
- Wolfe
- 1996
Citation Context ...neric Model Scaled Model Figure 6: Scaling Fork-Join, Broadcast-Reduction Models. 3.2.4 Paired Synchronization. Our last computation class is commonly found in parallel loops with static dependencies [20]. In general, we are considering parallelizable loop statements, where the loop body consists essentially of independent and dependent parts. The independent parts can be executed at any time. The dep...

4 | Stochastic modeling of scaled parallel programs
- Malony, Mertsiotakis, et al.
- 1994
Citation Context ...ted scaled models were gained from our experience in model evaluation using various bounding methods, since most of them apply modifications to the models until they are series-parallel reducible. In [13] different bounding methods are compared in order to obtain good scaled models. 4 The Parallelization Description Language PDL. In order to carry out scalability analysis of a parallel program based on...

3 | A New Approach to Behavior Analysis of Parallel Programs Based on Monitoring
- Quick
- 1993
Citation Context ...mple illustrates how solutions to large graph models are calculated using bounding methods. For systematic monitoring of parallel and distributed programs, PEPP implements the M²-cycle methodology [14]. Here, a functional model of a program to be measured is used for event selection, automatic program instrumentation, and event trace evaluation, allowing the functional model to be extended into a p...

2 | Synchronization Problems in Hierarchically Organized Multiprocessor Computer Systems
- Herzog, Hofmann
- 1979
Citation Context ...ructure (Figure 3(a)). The main characteristic of this computation class is that a processor can start the i-th iteration only after the (i-1)-th iteration has finished on its neighbor processors [10]. (Although Figure 3(a) shows only two neighbor processes, in general, the number of neighbor tasks can be greater than two.) The generic graph model for the parallel computation class with neighbor s...
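
The bounding argument quoted in the citation context for Kleinöder [12] above (inserting arcs yields an upper bound on the mean runtime, deleting arcs yields a lower bound) can be checked on a very small task graph. The sketch below is an illustration under assumed inputs, not the PEPP bounding methods: the four-task graph, the two-point runtime distribution, and all function names are hypothetical. It enumerates every combination of task runtimes and computes the exact mean of the longest path for the original graph, for a variant with one added barrier arc, and for a variant with one arc deleted.

```python
from itertools import product


def makespan(durations, arcs):
    """Longest path through a task graph whose nodes are numbered in topological order."""
    finish = {}
    for node in sorted(durations):
        start = max((finish[u] for u, v in arcs if v == node), default=0.0)
        finish[node] = start + durations[node]
    return max(finish.values())


def mean_runtime(dists, arcs):
    """Exact mean makespan, enumerating all combinations of discrete task runtimes."""
    nodes = sorted(dists)
    total = 0.0
    for combo in product(*(dists[n].items() for n in nodes)):
        prob = 1.0
        durations = {}
        for n, (t, p) in zip(nodes, combo):
            durations[n] = t
            prob *= p
        total += prob * makespan(durations, arcs)
    return total


# Assumed example: two processors, two iterations each.
# Tasks 0 and 1 are iteration 1; tasks 2 and 3 are iteration 2.
task = {1.0: 0.5, 2.0: 0.5}
dists = {n: task for n in range(4)}

original = [(0, 2), (1, 3), (0, 3)]   # one-sided neighbor dependency, not series-parallel
with_barrier = original + [(1, 2)]    # arc inserted: full barrier between the iterations
without_sync = [(0, 2), (1, 3)]       # arc deleted: two independent chains

exact = mean_runtime(dists, original)
upper = mean_runtime(dists, with_barrier)
lower = mean_runtime(dists, without_sync)
print(f"lower {lower:.4f} <= exact {exact:.4f} <= upper {upper:.4f}")
```

The ordering holds for every individual combination of runtimes, since an extra arc can only delay a task and a removed arc can only release it earlier, so it also holds for the means. Both modified graphs are series-parallel reducible, which is why such modifications make large models tractable.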