Performance Prediction of Parallel Processing Systems: The PAMELA Methodology
In Proceedings of the 7th ACM International Conference on Supercomputing, 1993
Cited by 37 (17 self)
Abstract
In this paper we present a new methodology for the performance prediction of parallel programs on parallel platforms ranging from shared-memory to distributed-memory (vector) machines. The methodology comprises a procedural program and machine specification paradigm based on Pamela (PerformAnce ModEling LAnguage), along with a performance calculus, called "serialization analysis". This calculus extends conventional parallel program analysis technology by explicitly accounting for resource contention, yet at the low evaluation cost typical for static techniques. It is shown that, where conventional techniques introduce fundamental errors, predictions from serialization analysis remain realistic. Apart from the merits of the methodology itself, this high reliability/cost ratio makes Pamela an attractive candidate for compile-time application within the performance prediction hierarchy often found in parallel programming environments. 1 Introduction The performance of a concurrent syste...
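The basic idea of accounting for resource contention in a static estimate can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the paper: N tasks run fully in parallel, each task spends some time computing privately and some time holding shared resources, and a resource of multiplicity m can serve at most m tasks at once. A contention-free critical-path estimate takes the slowest task. A serialization bound divides the total demand on each resource by its multiplicity. Taking the maximum of the two gives a simplified lower bound in the spirit of serialization analysis, not Pamela's full calculus. The names `Task`, `critical_path_estimate`, and `serialization_estimate` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Task:
    """One parallel task (hypothetical model, not Pamela syntax)."""
    compute: float                                          # private work, no contention
    usage: Dict[str, float] = field(default_factory=dict)   # time each shared resource is held


def critical_path_estimate(tasks: List[Task]) -> float:
    """Contention-free estimate: all tasks run in parallel, so the slowest one decides."""
    return max(t.compute + sum(t.usage.values()) for t in tasks)


def serialization_estimate(tasks: List[Task], multiplicity: Dict[str, int]) -> float:
    """Lower bound that also accounts for mutual exclusion on shared resources.

    A resource with multiplicity m serves at most m tasks at a time, so the
    total demand placed on it, divided by m, bounds the execution time from below.
    """
    demand: Dict[str, float] = {}
    for t in tasks:
        for resource, held in t.usage.items():
            demand[resource] = demand.get(resource, 0.0) + held
    serialization = max((d / multiplicity[r] for r, d in demand.items()), default=0.0)
    return max(critical_path_estimate(tasks), serialization)


if __name__ == "__main__":
    # 8 tasks, each computing for 1.0 and holding a single shared bus for 0.5
    tasks = [Task(compute=1.0, usage={"bus": 0.5}) for _ in range(8)]
    print(critical_path_estimate(tasks))              # 1.5: ignores contention
    print(serialization_estimate(tasks, {"bus": 1}))  # 4.0: bus accesses serialize
```

With one bus, the contention-free estimate (1.5) understates the bound (4.0) by more than a factor of two. Raising the bus multiplicity to 8 makes the two estimates agree.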
A Library-Based Program Development Environment for Parallel Image Processing
In Proceedings of the Scalable Parallel Libraries Conference, Mississippi State University, 1993
Cited by 7 (3 self)
Abstract
Cloner is an image processing prototyping environment that helps users design new parallel image processing algorithms for a target machine by building on and modifying existing library algorithms. In this paper we show the Cloner user interface, discuss how guided access is accomplished, and provide an example of how Cloner supports the rapid development of high performance codes. The example demonstrates how menu options and queries are used to guide a user to select an appropriate 2-dimensional FFT algorithm based on image size and available machine resources.
Implementation of Parallel Image Processing Algorithms in the Cloner Environment
1994
Cited by 5 (3 self)
Abstract
Cloner is a prototyping environment for computer vision and image processing (CVIP) algorithms and tasks. It is being designed to allow users to take advantage of the computing power provided by parallel processing systems without requiring an extensive understanding of the underlying architecture. In this paper, we focus on the use of Cloner to achieve high-performance implementations for a class of low-level CVIP algorithms. INTRODUCTION Parallel computers have demonstrated remarkable potential for achieving high performance at a reasonable hardware cost for many applications. In particular, computer vision and image processing (CVIP) algorithms have two attributes that make them ideal for parallel implementation. First, they are usually computationally intensive, making parallel processing an attractive approach [18]. Second, CVIP algorithms typically operate on large data sets, either on pixels in low-level image processing or on potentially large model databases in high-level vis...
Compiling Performance Models from Parallel Programs
In Proceedings of the 8th ACM International Conference on Supercomputing, 1994
Cited by 4 (0 self)
Abstract
A technique is described to automatically compile performance models in the course of program translation. The performance models are fully symbolic in order to preserve as much diagnostic information as possible. Although compiled statically, the models account for the effects of resource contention, due to the introduction of a novel algorithm within the symbolic compilation scheme. It is shown that the compilation approach fundamentally outperforms traditional static estimation procedures in terms of precision at a negligible increase in cost. This claim is illustrated by a case study of an LU factorization algorithm on a multiprocessor. 1 Introduction Low-cost, compile-time performance prediction provides essential, early feedback to enable program and machine parameter optimization by both the user and the compiler. In this paper we present a technique to automatically compile a symbolic performance model which accurately predicts the execution time of a parallel program given a...
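To make the notion of a fully symbolic model concrete, here is a minimal sketch in which a model is built once as an expression over named parameters and can then be printed (for diagnosis) or evaluated cheaply for many parameter values. The `Sym` class, the `param` and `smax` helpers, and the example loop model are hypothetical illustrations, not the paper's compilation scheme or its contention algorithm. The model assumes N iterations spread evenly over P processors, each computing for `t_c` and then holding a single shared memory for `t_m`.

```python
from typing import Callable, Dict

Env = Dict[str, float]


class Sym:
    """A symbolic performance expression: readable text plus an evaluator."""

    def __init__(self, text: str, fn: Callable[[Env], float]) -> None:
        self.text = text
        self.fn = fn

    def __add__(self, other: "Sym") -> "Sym":
        return Sym(f"({self.text} + {other.text})", lambda env: self.fn(env) + other.fn(env))

    def __mul__(self, other: "Sym") -> "Sym":
        return Sym(f"({self.text} * {other.text})", lambda env: self.fn(env) * other.fn(env))

    def __truediv__(self, other: "Sym") -> "Sym":
        return Sym(f"({self.text} / {other.text})", lambda env: self.fn(env) / other.fn(env))

    def __str__(self) -> str:
        return self.text

    def __call__(self, **env: float) -> float:
        return self.fn(env)


def param(name: str) -> Sym:
    """A free model parameter, bound only at evaluation time."""
    return Sym(name, lambda env: env[name])


def smax(a: Sym, b: Sym) -> Sym:
    """Maximum of two symbolic terms (e.g. computation vs. contention bound)."""
    return Sym(f"max({a.text}, {b.text})", lambda env: max(a.fn(env), b.fn(env)))


# Hypothetical parallel-loop model: N iterations over P processors (N divisible
# by P); each iteration computes for t_c, then holds one shared memory for t_m.
N, P, t_c, t_m = param("N"), param("P"), param("t_c"), param("t_m")
model = smax((N / P) * (t_c + t_m), N * t_m)

if __name__ == "__main__":
    print(model)                                     # max(((N / P) * (t_c + t_m)), (N * t_m))
    print(model(N=64.0, P=16.0, t_c=7.0, t_m=0.25))  # 29.0: computation dominates
    print(model(N=64.0, P=16.0, t_c=3.0, t_m=1.0))   # 64.0: memory contention dominates
```

Because the model keeps its structure, the printed form shows which term of the `max` is responsible for a prediction, rather than only a number.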
The Role of Models, Software Tools, and Applications in High Performance Computing
1995
Cited by 3 (1 self)
Abstract
In this paper we identify and discuss technical issues we consider crucial to the HPCC program. The focus is on the usefulness of scalable parallel computers for National Challenge problems. We identify three interrelated aspects of usefulness: performance, programmability, and the role of an application-driven design philosophy. We discuss the importance of algorithm design and computational model development and advocate the design of libraries and software environments to bridge the gap between algorithm designer and application programmer. Finally, we consider the role of applications for solving National Challenge problems. This work was supported by the Advanced Research Projects Agency under contract DABT63-92-C-0022. The content of the information does not necessarily reflect the position or policy of the United States Government and no official endorsement should be inferred. 1 Introduction During the last several years significant progress has been made on the Grand Challe...
Performance Estimation for Embedded Systems
2000
Cited by 2 (2 self)
Abstract
In this document we propose a symbolic performance modeling technique to be used as the basis of the JOSES cost estimator. The approach is inspired by the need for highly parametric cost models in the initial stages of parallel program design, where absolute prediction accuracy is of lower priority than solution cost, and where symbolic feedback on the effects of user mapping decisions and machine parameters is of primary concern. As illustrated by the case study, the symbolic approach provides good feedback on the effects of partitioning choices as well as the influence of computation and communication parameters on application performance.
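The kind of feedback described here can be sketched with a tiny parametric model. The model below is a hypothetical stand-in, not the JOSES cost estimator: `work` units are split evenly over `p` processors at `t_comp` per unit, and communication costs `t_comm` per additional processor. Sweeping `p` shows which partitioning choice the model favors and whether computation or communication dominates there. `cost_terms` and `best_partition` are illustrative names.

```python
from typing import Dict, Tuple


def cost_terms(p: int, work: float, t_comp: float, t_comm: float) -> Tuple[float, float]:
    """Hypothetical parametric cost model, split into its two terms."""
    computation = work * t_comp / p     # shrinks as the work is partitioned
    communication = (p - 1) * t_comm    # grows with the number of partitions
    return computation, communication


def best_partition(max_p: int, work: float, t_comp: float, t_comm: float) -> Tuple[int, float, str]:
    """Sweep p = 1..max_p and return (best p, predicted time, dominant term)."""
    totals: Dict[int, float] = {
        p: sum(cost_terms(p, work, t_comp, t_comm)) for p in range(1, max_p + 1)
    }
    best = min(totals, key=totals.get)
    comp, comm = cost_terms(best, work, t_comp, t_comm)
    dominant = "computation" if comp >= comm else "communication"
    return best, totals[best], dominant


if __name__ == "__main__":
    print(best_partition(32, work=1000.0, t_comp=1.0, t_comm=10.0))
    # (10, 190.0, 'computation')
```

With a far more expensive network (for example `t_comm=1000.0`), the same sweep recommends a single partition, which is the kind of parameter sensitivity a symbolic model makes visible.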
Simulation of Static and Dynamic Task Scheduling on Multiprocessor Systems
1994
Cited by 1 (0 self)
CONTENTS
1 INTRODUCTION
2 THE PROBLEM OF TASK SCHEDULING ON PARALLEL PROCESSING SYSTEMS
  2.1 Defining Task Scheduling on Parallel Processing Systems
  2.2 Review of Research on the Problem of Task Scheduling
3 PROGRAM MODEL: THE HIERARCHICAL TASK GRAPH
  3.1 Introduction
  3.2 Directed Acyclic Graphs
  3.3 Task Graph Representation of Computer Programs
  3.4 Hierarchical Representation of Task Graphs
  3.5 Hierarchical Task Graph Generator ...
Parallelizing Matrix Chain Products
1997
Cited by 1 (1 self)
Abstract
The problem of finding an optimal product sequence for sequential multiplication of matrices (the matrix chain ordering problem, MCOP) is well-known and has been studied for a long time. In this paper, we consider the problem of finding an optimal product schedule for evaluating a chain of matrix products on a parallel computer (the matrix chain scheduling problem, MCSP). The difference between MCSP and MCOP is that MCOP considers a product sequence for single-processor systems and MCSP considers a sequence of concurrent matrix products for parallel systems. The approach of parallelizing each matrix product after finding an optimal product sequence for single-processor systems does not always guarantee a minimal evaluation time since each parallelized matrix product may use processors inefficiently. We introduce a processor scheduling algorithm for MCSP which attempts to minimize the evaluation time of a chain of matrix products on a parallel computer, even at the expense of a slight i...
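The sequential problem the abstract starts from, MCOP, has a classic O(n³) dynamic-programming solution, sketched below for reference. It solves only MCOP (one processor, with the number of scalar multiplications as the cost), not the paper's MCSP scheduling algorithm, and `matrix_chain_order` is a hypothetical name. Matrix A_k has shape `dims[k-1]` × `dims[k]`.

```python
import math
from typing import List, Tuple


def matrix_chain_order(dims: List[int]) -> Tuple[int, str]:
    """Classic MCOP dynamic program.

    Matrix A_k (1-based) has shape dims[k-1] x dims[k]. Returns the minimal
    number of scalar multiplications and an optimal parenthesization.
    """
    n = len(dims) - 1
    cost = [[0] * n for _ in range(n)]    # cost[i][j]: best cost for A_{i+1}..A_{j+1}
    split = [[0] * n for _ in range(n)]   # split[i][j]: k where the last product splits

    for length in range(2, n + 1):        # solve shorter subchains first
        for i in range(n - length + 1):
            j = i + length - 1
            cost[i][j] = math.inf
            for k in range(i, j):
                # (dims[i] x dims[k+1]) times (dims[k+1] x dims[j+1])
                c = cost[i][k] + cost[k + 1][j] + dims[i] * dims[k + 1] * dims[j + 1]
                if c < cost[i][j]:
                    cost[i][j], split[i][j] = c, k

    def paren(i: int, j: int) -> str:
        if i == j:
            return f"A{i + 1}"
        k = split[i][j]
        return f"({paren(i, k)}{paren(k + 1, j)})"

    return cost[0][n - 1], paren(0, n - 1)


if __name__ == "__main__":
    print(matrix_chain_order([30, 35, 15, 5, 10, 20, 25]))
    # (15125, '((A1(A2A3))((A4A5)A6))')
```

As the abstract points out, parallelizing each product of this sequential optimum does not necessarily minimize the evaluation time on a parallel computer.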
Exploiting Parallelism in SETL Programs
1993
Cited by 1 (0 self)
CONTENTS
1 INTRODUCTION
2 BACKGROUND
  2.1 Languages, Programs, Machines, and Executions
  2.2 Our Shared-Memory Multiprocessor: the Alliant fx2800
    2.2.1 The fxc C Compiler
    2.2.2 Concurrency
    2.2.3 Synchronization
  2.3 ISETL
3 OPTIMIZATIONS
  3.1 Parallel Optimizations ...

