A Hierarchical Approach to Workload Characterization for Parallel Systems
, 1995
Cited by 23 (9 self)
Performance evaluation studies are to be an integral part of the design and tuning of parallel applications. Their structure and their behavior are the dominating factors. We propose a hierarchical approach to the systematic characterization of the workload of a parallel system, to be kept as modular and flexible as possible. The methodology is based on three different, but related, layers: the application, the algorithm, and the routine layer. For each of these layers different characteristics representing functional, sequential, parallel, and quantitative descriptions have been identified. Taking also architectural and mapping features into consideration, the hierarchical workload characterization can be used for any type of performance studies. 1 Introduction The main reason to use parallel systems is to get more performance, i.e. either to be able to solve larger problems or to solve given problems in shorter time. So, in fact, performance is the driving force to deve...
A Graphical Toolset for Simulation Modelling of Parallel Systems
- Parallel Computing
, 1996
Cited by 20 (5 self)
In this paper, a simulation model for incorporation within a performance-oriented parallel software development environment is presented. This development environment is composed of a graphical design tool, a simulation facility, and a visualisation tool. Simulation allows parallel program performance to be predicted and design alternatives to be compared. The target parallel system models a virtual machine composed of a cluster of workstations interconnected by a local area network. The simulation model architecture is modular and extensible, which allows re-configuration of the platform. The model description and the validation experiments which have been conducted to assess the correctness and the accuracy of the model are also presented. 1 Introduction The key obstacle to the widespread adoption of parallel computing is the difficulty in program development. Firstly, an application has to be decomposed into parallel objects (processes, or tasks) according to the computational model ...
N-MAP: A virtual processor discrete event simulation tool for performance prediction in CAPSE
- In 28th Annual Hawaii International Conference on Systems Sciences
, 1995
Cited by 17 (6 self)
The CAPSE (Computer Aided Parallel Software Engineering) environment aims to assist a performance oriented parallel program development approach by integrating tools for performance prediction in the design phase, analytical or simulation based performance analysis in the detailed specification and coding phase, and finally monitoring in the testing and correction phase. In this work, the N-MAP tool as part of the CAPSE environment is presented. N-MAP covers the crucial aspect of performance prediction to support a performance oriented, incremental development process of parallel applications such that implementation design choices can be investigated far ahead of the full coding of the application. Methodologically, N-MAP in an automatic parse and translate step generates a simulation program from a skeletal SPMD program, with which the programmer expresses just the constituent and performance critical program parts, subject to an incremental refinement. The simulated execution of the SPMD skeleton supports a variety of performance studies. We demonstrate the use and performance of the N-MAP tool by developing a linear system solver for the CM-5.
Simulation Modelling of Parallel Systems
, 1996
Cited by 15 (2 self)
In this paper, a simulation model for incorporation within a performance-oriented parallel software development environment is presented. This development environment is composed of a graphical design tool, a simulation facility, and a visualisation tool. Simulation allows a parallel program's performance to be predicted and design alternatives to be compared. The target parallel system models a virtual machine composed of a cluster of workstations interconnected by a local area network. The simulation model architecture is modular and extensible, which allows the re-configuration of the platform. The model description and the validation experiments which have been conducted to assess the correctness and the accuracy of the model are also presented. 1 Introduction The key obstacle to the widespread adoption of parallel computing is the difficulty in program development. Firstly, an application has to be decomposed into parallel objects (processes, or tasks) according to the computation...
Performance Prediction and Scheduling for Parallel Applications on Multi-User Clusters
, 1998
A Comparative Evaluation of Techniques for Studying Parallel System Performance
, 1994
Cited by 9 (3 self)
This paper presents a comparative and qualitative survey of techniques for evaluating parallel systems. We also survey metrics that have been proposed for capturing and quantifying the details of complex parallel system interactions. Experimentation, theoretical/analytical modeling and simulation are three frequently used techniques in performance evaluation. Experimentation uses real or synthetic workloads, usually called benchmarks, to measure and analyze their performance on actual hardware. Theoretical and analytical models are used to abstract details of a parallel system, providing the view of a simplified system parameterized by a limited number of degrees of freedom that are kept tractable. Simulation and related performance monitoring/visualization tools have become extremely popular because of their ability to capture the dynamic nature of the interaction between applications and architectures. We first present the figures of merit that are important for any performance evaluation technique. With respect to these figures of merit, we survey the three techniques and make a qualitative comparison of their pros and cons. In particular, for each of the above techniques we discuss: representative case studies; the underlying models that are used for the workload and the architecture; the feasibility and ease of quantifying standard performance metrics from the available statistics; the accuracy/validity of the output statistics; and the cost/effort that is expended in each evaluation strategy.
Speedy: An Integrated Performance Extrapolation Tool for pC++ Programs
- In Quantitative Evaluation of Computing and Communication Systems: Proceedings of the 8th International Conference on Modelling Techniques and Tools for Computer Performance Evaluation, volume 977 of Lecture Notes in Computer Science
, 1995
Cited by 8 (1 self)
Performance extrapolation is the process of evaluating the performance of a parallel program in a target execution environment using performance information obtained for the same program in a different environment. Performance extrapolation techniques are suited for rapid performance tuning of parallel programs, particularly when the target environment is unavailable. This paper describes one such technique that was developed for data-parallel C++ programs written in the pC++ language. In pC++, the programmer can distribute a collection of objects to various processors and can have methods invoked on those objects execute in parallel. Using performance extrapolation in the development of pC++ applications allows tuning decisions to be made in advance of detailed execution measurements. The pC++ language system includes τ, an integrated environment for analyzing and tuning the performance of pC++ programs. This paper presents Speedy, a new addition to τ, that predicts the performa...
Performance Oriented Development of SPMD Programs Based on Task Structure Specifications
- Parallel Processing: CONPAR94--VAPP VI, LNCS 854
, 1994
Cited by 8 (2 self)
An incremental development process for parallel SPMD programs driven by performance engineering activities is proposed. We provide a methodology and set of computerized tools to support the implementation design phase and early evaluation of skeletal program designs from a performance point of view, such that performance critical design choices can be investigated far ahead of the full coding of the application. The technique and the use of our tools are demonstrated by developing a parallel program for Ax = b (where A is real, n × n) by means of a series of Householder transformations (HA)x = Hb using p processors. Although a target architecture independent specification is the starting point of the implementation, we show how the incremental refinement process successively improves performance prediction for dedicated distributed memory target systems, until a full, performance efficient implementation is reached. As an example target platform, we study a CM-5 being programmed w...
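For readers unfamiliar with the numerical kernel named in this abstract, the following is a minimal serial sketch of solving Ax = b by a series of Householder transformations (HA)x = Hb. It is not the authors' parallel SPMD program and does not distribute work over p processors; the function name `householder_solve` is hypothetical, and the sketch assumes A is square and nonsingular.

```python
import math

def householder_solve(A, b):
    """Solve A x = b for square, nonsingular A (illustrative, serial)."""
    n = len(A)
    A = [[float(v) for v in row] for row in A]  # work on copies
    b = [float(v) for v in b]
    for k in range(n - 1):
        # Build the unit reflector v from column k, rows k..n-1.
        col = [A[i][k] for i in range(k, n)]
        norm = math.sqrt(sum(c * c for c in col))
        if norm == 0.0:
            continue  # nothing to eliminate in this column
        v = col[:]
        v[0] += math.copysign(norm, col[0])  # sign choice avoids cancellation
        vnorm = math.sqrt(sum(c * c for c in v))
        v = [c / vnorm for c in v]
        m = n - k
        # Apply H = I - 2 v v^T to the trailing columns of A ...
        for j in range(k, n):
            dot = sum(v[i] * A[k + i][j] for i in range(m))
            for i in range(m):
                A[k + i][j] -= 2.0 * dot * v[i]
        # ... and to the right-hand side, so that (HA)x = Hb still holds.
        dot = sum(v[i] * b[k + i] for i in range(m))
        for i in range(m):
            b[k + i] -= 2.0 * dot * v[i]
    # A is now upper triangular: finish with back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(A[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / A[i][i]
    return x
```

Each pass zeroes the entries below the diagonal in one column, so after n - 1 reflections the system is upper triangular and back substitution recovers x.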
Abstracting network characteristics and locality properties of parallel systems
- In Proceedings of the First International Symposium on High Performance Computer Architecture
, 1995
Cited by 7 (5 self)
Abstracting features of parallel systems is a technique that has been traditionally used in theoretical and analytical models for program development and performance evaluation. In this paper, we explore the use of abstractions in execution-driven simulators in order to speed up simulation. In particular, we evaluate abstractions for the interconnection network and locality properties of parallel systems in the context of simulating cache-coherent shared memory (CC-NUMA) multiprocessors. We use the recently proposed LogP model to abstract the network. We abstract locality by modeling a cache at each processing node in the system which is maintained coherent, without modeling the overheads associated with coherence maintenance. Such an abstraction tries to capture the true communication characteristics of the application without modeling any hardware-induced artifacts. Using a suite of applications and three network topologies simulated on a novel simulation platform, we show that the latency overhead modeled by LogP is fairly accurate. On the other hand, the contention overhead can become pessimistic when the applications display sufficient communication locality. Our abstraction for data locality closely models the behavior of the actual system over the chosen range of applications. The simulation model which incorporated these abstractions was around 250-300% faster than the simulation of the actual machine.
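As background on the LogP model mentioned in this abstract, the sketch below shows how its four standard parameters (latency L, per-message overhead o, gap g, processor count P) yield simple message-cost estimates. It is an illustration of the general model only, not the simulator described in the paper; the class and function names and the parameter values in the usage note are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LogP:
    L: float  # network latency for a small message
    o: float  # processor overhead to send or receive one message
    g: float  # minimum gap between consecutive messages at one processor
    P: int    # number of processors

def point_to_point(m: LogP) -> float:
    """End-to-end time of one small message: send overhead, latency, receive overhead."""
    return m.o + m.L + m.o

def pipelined_messages(m: LogP, n: int) -> float:
    """Time until the last of n back-to-back messages between one pair is received."""
    if n <= 0:
        return 0.0
    # The sender can start a new message every max(g, o) time units.
    return (n - 1) * max(m.g, m.o) + point_to_point(m)
```

With the illustrative values `LogP(L=6, o=2, g=4, P=8)`, one message costs 10 time units and three pipelined messages cost 18. Note that this basic form charges a fixed latency per message and does not by itself model contention inside the network.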

