
On The Quest For Perfect Load Balance In Loop-Based Parallel Computations (1998)

by Rizos Sakellariou

Results 1 - 10 of 10

Supporting Timing Analysis by Automatic Bounding of Loop Iterations

by Christopher Healy, Mikael Sjödin, Viresh Rustagi, David Whalley, Robert Van Engelen - Journal of Real-Time Systems, 2000
Cited by 32 (5 self)
Static timing analyzers, which are used to analyze real-time systems, need to know the minimum and maximum number of iterations associated with each loop in a real-time program so that accurate timing predictions can be obtained. This paper describes three complementary methods to support timing analysis by bounding the number of loop iterations. First, an algorithm is presented that determines the minimum and maximum number of iterations of loops with multiple exits. Even when the number of iterations cannot be exactly determined, it is desirable to know the lower and upper iteration bounds. Second, when the number of iterations is dependent on unknown values of variables, the user is asked to provide bounds for these variables. These bounds are used to determine the minimum and maximum number of iterations. Specifying the values of variables is less error-prone than specifying the number of loop iterations directly. Finally, a method is given to tightly predict the execution time of in...
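
The second method above (deriving iteration bounds from user-supplied ranges of variables) can be sketched for the simplest case: a single-exit counting loop `for (i = lo; i < hi; i += step)` with a positive step. This is a minimal illustration under those assumptions, not the authors' algorithm; `trip_count` and `iteration_bounds` are hypothetical names, and loops with multiple exits are not handled.

```python
from itertools import product


def trip_count(lo: int, hi: int, step: int) -> int:
    """Iterations of `for (i = lo; i < hi; i += step)`; zero when lo >= hi.

    Hypothetical helper: assumes a single exit and a positive step.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    # ceil((hi - lo) / step) in integer arithmetic, clamped at zero trips
    return max(0, (hi - lo + step - 1) // step)


def iteration_bounds(lo_range, hi_range, step):
    """Minimum and maximum trip counts when the user only supplies inclusive
    bounds (lo_min, lo_max) and (hi_min, hi_max) for the loop variables."""
    # trip_count never increases as lo grows and never decreases as hi grows,
    # so both extremes are reached at corners of the box of admissible values.
    counts = [trip_count(lo, hi, step) for lo, hi in product(lo_range, hi_range)]
    return min(counts), max(counts)
```

For example, `iteration_bounds((0, 2), (8, 10), 2)` returns `(3, 5)`: the loop runs 3 times when `lo = 2, hi = 8` and 5 times when `lo = 0, hi = 10`. Bounding the variables rather than the trip count directly mirrors the abstract's point that variable ranges are easier for a user to state correctly.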

Performance Improvement Through Overhead Analysis: A Case Study in Molecular Dynamics

by Graham D. Riley, J. Mark Bull, John R. Gurd - In Proc. 11th ACM International Conference on Supercomputing, ACM, 1997
Cited by 8 (3 self)
A method is presented for incremental development of high performance parallel programs, using an application from molecular dynamics as a case study. The method uses the technique of overhead analysis, which aims to explain experimental observations using models of execution behaviour, as a means of determining successive steps in program development. The case study illustrates how this technique can be applied to the complexities of real-world parallel computations. 1 Introduction Parallel computers have so far failed to fulfil their promise of providing cheap high performance computing. In part, this is due to the high cost of software development required to find an acceptable implementation of an application which runs efficiently on a given parallel system: this has been termed the 'best' implementation problem [4]. Execution time is, in general, unpredictable and, often, decisions made early in the development process have profound effects on the ultimate performance of an ap...

Automatic Utilization of Constraints for Timing Analysis

by Christopher A. Healy, David B. Whalley - Florida State University, 1999
Cited by 4 (1 self)
Users of real-time systems are interested not only in obtaining correct computations from their programs but also in timely responses. Responses given past a deadline are not acceptable. A real-time system often comprises a set of tasks that are statically scheduled. Therefore, it is necessary to determine a program's execution...

A Compile-Time Partitioning Strategy for Non-Rectangular Loop Nests

by Rizos Sakellariou - In Proceedings of the 11th International Parallel Processing Symposium (Geneva), 1997
Cited by 3 (2 self)
This paper presents a compile-time scheme for partitioning non-rectangular loop nests which consist of inner loops whose bounds depend on the index of the outermost, parallel loop. The minimisation of load imbalance, on the basis of symbolic cost estimates, is considered the main objective; however, options which may increase other sources of overhead are avoided. Experimental results on a virtual shared memory computer are also presented.
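
To make the load-balance objective concrete, the sketch below splits the outer, parallel loop of a triangular nest (inner loop `for j in range(i + 1)`, so outer iteration `i` costs `i + 1` inner iterations) into contiguous blocks of roughly equal estimated cost using prefix sums. It is a generic greedy illustration, not the partitioning scheme of the paper: `partition_outer_loop` and `block_costs` are hypothetical names, and per-iteration costs are assumed to be known numerically rather than as symbolic expressions.

```python
from bisect import bisect_left
from itertools import accumulate


def partition_outer_loop(costs, p):
    """Split outer iterations 0..len(costs)-1 (costs non-empty) into p contiguous
    blocks of roughly equal total cost. Returns boundaries [0, b1, ..., len(costs)];
    block k covers iterations bounds[k] .. bounds[k+1]-1."""
    prefix = list(accumulate(costs))  # prefix[i] = cost of iterations 0..i
    total = prefix[-1]
    bounds = [0]
    for k in range(1, p):
        target = total * k / p  # ideal cumulative cost at the k-th boundary
        i = bisect_left(prefix, target)  # first i with prefix[i] >= target
        # End the block before or after iteration i, whichever lands closer to target.
        if i > 0 and target - prefix[i - 1] <= prefix[i] - target:
            cut = i
        else:
            cut = i + 1
        bounds.append(min(max(cut, bounds[-1]), len(costs)))  # keep bounds monotone
    bounds.append(len(costs))
    return bounds


def block_costs(costs, bounds):
    """Total cost assigned to each block."""
    return [sum(costs[a:b]) for a, b in zip(bounds, bounds[1:])]


# Hypothetical triangular nest: outer i in 0..7, inner j in 0..i, so cost is i + 1.
costs = [i + 1 for i in range(8)]
bounds = partition_outer_loop(costs, 4)
print(bounds, block_costs(costs, bounds))
```

This prints `[0, 4, 5, 7, 8] [10, 5, 13, 8]`. The most loaded block costs 13, compared with 15 for four equal-width blocks of two outer iterations (costs 3, 7, 11, 15). The greedy nearest-target cut is not guaranteed to minimise the maximum block cost; an exact contiguous split would need a different search.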

Parametric Timing Estimation With Newton-Gregory Formulae

by Robert Van Engelen, Kyle Gallivan, Burt Walsh
Cited by 2 (0 self)
To determine safe and tight worst-case execution time (WCET) estimates of scientific and multimedia codes that spend most of their execution time on executing loop iterations, efficient and accurate loop iteration count estimation methods are required. To support dynamic scheduling decisions based on WCET estimations, an effective loop iteration count estimation method should generate parametric formulae that can be evaluated at runtime. Therefore, the loop iteration count estimation methods utilized for WCET estimation must be effective in analyzing loops with symbolic bounds, non-rectangular loops, zero-trip loops, loops with multiple critical paths, and loops with non-unit strides. In this paper we present a novel approach to parametric WCET estimation to handle loops with both affine and nonaffine loop bounds in an efficient manner using a formulation based on Newton-Gregory interpolating polynomials. 1 Introduction and Related Work Static worst-case execution time (WCET) estimates are used in real-time system scheduling and dynamic
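
Fitting a parametric iteration-count formula can be illustrated with Newton's forward-difference (Newton-Gregory) interpolation: sample the count of a loop nest at n = 0, 1, 2, ..., take forward differences, and evaluate f(n) = sum over k of C(n, k) times the k-th forward difference at 0. The sketch assumes the count is a single polynomial in n for all n >= 0 (no piecewise behaviour from zero-trip loops) of degree at most len(samples) - 1. The loop nest and function names are hypothetical, and this is a textbook illustration rather than the authors' formulation.

```python
from math import comb


def count_iterations(n: int) -> int:
    """Brute-force count for the hypothetical nest: for i in 0..n-1, for j in i..n-1."""
    return sum(1 for i in range(n) for _ in range(i, n))


def forward_differences(samples):
    """Leading forward differences [f(0), Δf(0), Δ²f(0), ...] from f(0), f(1), ..."""
    diffs, row = [], list(samples)
    while row:
        diffs.append(row[0])
        row = [b - a for a, b in zip(row, row[1:])]
    return diffs


def newton_gregory(diffs, n: int) -> int:
    """Evaluate Newton's forward-difference formula f(n) = Σ_k C(n, k) Δ^k f(0), n >= 0."""
    return sum(comb(n, k) * d for k, d in enumerate(diffs))


# Four samples suffice for a count of degree <= 3; this nest is quadratic in n.
diffs = forward_differences([count_iterations(n) for n in range(4)])
print(diffs)                       # [0, 1, 1, 0]
print(newton_gregory(diffs, 100))  # 5050
```

The resulting formula is n + n(n-1)/2 = n(n+1)/2, which can be evaluated at run time in constant time. A final difference of zero is consistent with, though does not prove, the degree assumption.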

A General Approach for Tight Timing Predictions of Non-Rectangular Loops

by Christopher Healy, Robert Van Engelen, David Whalley - WIP Proceedings of the IEEE Real-Time Technology and Applications Symposium, 1999
Cited by 1 (1 self)
Static timing analyzers need to know the number of iterations associated with each loop in a real-time program so that accurate timing predictions can be obtained. The number of iterations of non-rectangular loops varies due to dependencies on counter variables of outer loops. These loops have long presented a problem for timing analyzers since the resulting timing predictions are typically quite loose. This paper presents a general and efficient method for obtaining tight timing predictions of such loops. The total number of iterations executed by an inner loop inside a loop nest can be expressed in terms of summations. Equations representing such loops can be efficiently solved given that certain restrictions are met. We outline an approach for formulating the summations representing the total number of iterations of a loop, a method for solving the equation containing the summations, and a technique for integrating this method into an existing timing analyzer.
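
As a hand-worked instance of expressing an inner loop's total iteration count as nested summations and solving them in closed form, consider a hypothetical nest `for i = 1..n, for j = 1..i, for k = 1..j` (an illustrative example, not one from the paper). Because every bound is affine in the outer indices and every range is non-empty for n >= 1, each summation collapses to a polynomial:

```latex
\begin{aligned}
N(n) &= \sum_{i=1}^{n} \sum_{j=1}^{i} \sum_{k=1}^{j} 1
      = \sum_{i=1}^{n} \sum_{j=1}^{i} j
      = \sum_{i=1}^{n} \frac{i(i+1)}{2} \\
     &= \frac{1}{2}\left( \frac{n(n+1)(2n+1)}{6} + \frac{n(n+1)}{2} \right)
      = \frac{n(n+1)(n+2)}{6}
\end{aligned}
```

For n = 2 this gives 4, matching a direct count: 1 iteration for i = 1, plus 1 + 2 for i = 2.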

Compiler Support for Parallel Program Performance Prediction

by Rizos Sakellariou, Vikram Adve, 2001
Cited by 1 (0 self)
This talk illustrates how parallelizing compiler technology can be used to greatly simplify and, in many cases, to fully automate the process of applying diverse modeling techniques to model a parallel application. First, we will present the key features of a program workload representation that has been designed to support both compiler synthesis and detailed performance prediction. Then, we will present compiler techniques that we used to derive a concise form of this workload description automatically for a given program. The focus of these techniques is HPF programs compiled to MPI using the Rice dHPF compiler; issues related to explicit message-passing codes will also be discussed.

A Geometric Approach for Partitioning N-Dimensional Non-Rectangular Iteration Spaces

by Arun Kejariwal, Alexandru Nicolau, Constantine D. Polychronopoulos
Cited by 1 (0 self)
Parallel loops account for the greatest percentage of program parallelism. The degree to which parallelism can be exploited and the amount of overhead involved during parallel execution of a nested loop directly depend on partitioning, i.e., the way the different iterations of a parallel loop are distributed across different processors. Thus, partitioning of parallel loops is of key importance for high performance and efficient use of multiprocessor systems. Although a significant amount of work has been done in partitioning and scheduling of rectangular iteration spaces, the problem of partitioning non-rectangular iteration spaces (e.g., triangular and trapezoidal iteration spaces) has not been given enough attention so far. In this paper, we present a geometric approach for partitioning N-dimensional non-rectangular iteration spaces for optimizing performance on parallel processor systems. Speedup measurements for kernels (loop nests) of linear algebra packages are presented.
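
The geometric intuition can be sketched for the two-dimensional triangular case: for the iteration space 0 <= j <= i < n, the slab of outer iterations [0, x) has an area of about x^2/2, so cutting the outer axis at x_k = n * sqrt(k/p) gives each of p processors roughly equal area. This is a minimal sketch under that continuous-area assumption; the function names are hypothetical, and it does not reproduce the N-dimensional method of the paper.

```python
from math import sqrt


def triangular_cuts(n: int, p: int) -> list[int]:
    """Outer-axis cut points for the triangle 0 <= j <= i < n, giving each of
    p processors about the same area (continuous approximation)."""
    # Area of the slab i in [0, x) is roughly x**2 / 2 out of n**2 / 2 in total,
    # so the k-th cut solves x**2 = (k / p) * n**2, i.e. x = n * sqrt(k / p).
    return [0] + [round(n * sqrt(k / p)) for k in range(1, p)] + [n]


def block_work(cuts: list[int]) -> list[int]:
    """Exact iteration count per block when the inner loop runs j = 0..i."""
    return [sum(i + 1 for i in range(a, b)) for a, b in zip(cuts, cuts[1:])]


cuts = triangular_cuts(100, 4)
print(cuts)              # [0, 50, 71, 87, 100]
print(block_work(cuts))  # [1275, 1281, 1272, 1222]
```

By contrast, four equal-width blocks of 25 outer iterations would receive 325, 950, 1575 and 2200 iterations, so the last processor would do almost seven times the work of the first.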

In conjunction with: International Symposium on Code Generation and Optimization (CGO). Sponsored by: ACM SIGMICRO and ACM SIGPLAN. Acknowledgments

by Tom Vander Aa, Francisco Barat, John Cavazos, Nitin Chandrachoodan, Henk Corporaal, Heiko Falk, Murali Jayapala, Tor Jeremiassen, Ossi Kalevo, Hee-seok Kim, Hong-seok Kim, Yoshinori Takeuchi, Gary Tyson, 2008
The organizers would like to thank all the people who made this year's Workshop for Optimization of DSP and Embedded Systems possible: ➔ The authors who submitted a paper. We had a total of 21 submissions this year. Out of these we were able to select 9 good quality papers. ➔ The program committee as well as the external reviewers for their excellent reviews. Although we had quite a number of papers, we were still able to get at least four reviews per paper. ➔ Rodric Rabbah and Martien de Jong for agreeing to give the keynote and an invited talk, respectively. ➔ The CGO organizers, especially Chandra Krintz for taking care of all the logistics and for hosting the workshop. ➔ ACM for sponsoring the event.

A Novel Approach for Partitioning Iteration Spaces with Variable Densities

by Arun Kejariwal Alex
Efficient partitioning of parallel loops plays a critical role in high performance and efficient use of multiprocessor systems. Although a significant amount of work has been done in partitioning and scheduling of loops with rectangular iteration spaces, the problem of partitioning non-rectangular iteration spaces (e.g., triangular and trapezoidal iteration spaces) with variable densities has not been addressed so far to the best of our knowledge. In this paper, we present a mathematical model for partitioning N-dimensional non-rectangular iteration spaces with variable densities. We present a unimodular loop transformation and a geometric approach for partitioning an iteration space along an axis corresponding to the outermost loop across a given number of processors to achieve near-optimal performance, i.e., near-optimal load balance across different processors. We present a case study to illustrate the effectiveness of our approach. Categories and Subject Descriptors: D.1.3 [Software]: Programming Techniques: parallel programming