Results 1-10 of 10
Supporting Timing Analysis by Automatic Bounding of Loop Iterations
Journal of Real-Time Systems, 2000
Cited by 34 (5 self)
Static timing analyzers, which are used to analyze real-time systems, need to know the minimum and maximum number of iterations associated with each loop in a real-time program so accurate timing predictions can be obtained. This paper describes three complementary methods to support timing analysis by bounding the number of loop iterations. First, an algorithm is presented that determines the minimum and maximum number of iterations of loops with multiple exits. Even when the number of iterations cannot be exactly determined, it is desirable to know the lower and upper iteration bounds. Second, when the number of iterations is dependent on unknown values of variables, the user is asked to provide bounds for these variables. These bounds are used to determine the minimum and maximum number of iterations. Specifying the values of variables is less error prone than specifying the number of loop iterations directly. Finally, a method is given to tightly predict the execution time of in...
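As a rough illustration of the second method described above (deriving iteration bounds from user-supplied bounds on a variable), the following sketch handles only the simplest case: a single-exit counting loop `for (i = start; i < limit; i += step)` whose `limit` is known to lie in a given range. The function names and the loop shape are assumptions made for this example; this is not the paper's algorithm, which also covers loops with multiple exits.

```python
def trip_count(start: int, limit: int, step: int) -> int:
    """Iterations of `for (i = start; i < limit; i += step)`, step > 0."""
    if step <= 0:
        raise ValueError("step must be positive")
    # Ceiling division of (limit - start) by step, clamped at zero
    # so that a loop whose condition fails immediately counts as 0.
    return max(0, -(-(limit - start) // step))


def iteration_bounds(start: int, limit_min: int, limit_max: int, step: int):
    """Min and max iterations when only a range for `limit` is known.

    The trip count is non-decreasing in `limit`, so the extremes are
    reached at the two ends of the user-supplied range.
    """
    if limit_min > limit_max:
        raise ValueError("limit_min must not exceed limit_max")
    return (trip_count(start, limit_min, step),
            trip_count(start, limit_max, step))


if __name__ == "__main__":
    # Hypothetical user input: the limit is somewhere between 5 and 10.
    print(iteration_bounds(start=0, limit_min=5, limit_max=10, step=3))
```

Bounding the variable rather than the iteration count keeps the arithmetic in one place, which is in the spirit of the abstract's remark that specifying variable values is less error prone.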
Performance Improvement Through Overhead Analysis: A Case Study in Molecular Dynamics
Proc. 11th ACM International Conference on Supercomputing, ACM, 1997
Cited by 8 (3 self)
A method is presented for incremental development of high performance parallel programs, using an application from molecular dynamics as a case study. The method uses the technique of overhead analysis, which aims to explain experimental observations using models of execution behaviour, as a means of determining successive steps in program development. The case study illustrates how this technique can be applied to the complexities of real-world parallel computations.
1 Introduction
Parallel computers have so far failed to fulfil their promise of providing cheap high performance computing. In part, this is due to the high cost of software development required to find an acceptable implementation of an application which runs efficiently on a given parallel system: this has been termed the `best' implementation problem [4]. Execution time is, in general, unpredictable and, often, decisions made early in the development process have profound effects on the ultimate performance of an ap...
Automatic Utilization of Constraints for Timing Analysis
Florida State University, 1999
Cited by 4 (1 self)
Users of real-time systems are not only interested in obtaining correct computations from their programs, but timely responses as well. Responses that are given past a deadline are not acceptable. A real-time system is often comprised of a set of tasks that are statically scheduled. Therefore, it is necessary to determine a program’s execution
A Compile-Time Partitioning Strategy for Non-Rectangular Loop Nests
In Proceedings of the 11th International Parallel Processing Symposium (Geneva, 1997
Cited by 3 (2 self)
This paper presents a compile-time scheme for partitioning non-rectangular loop nests which consist of inner loops whose bounds depend on the index of the outermost, parallel loop. The minimisation of load imbalance, on the basis of symbolic cost estimates, is considered the main objective; however, options which may increase other sources of overhead are avoided. Experimental results on a virtual shared memory computer are also presented.
Parametric Timing Estimation With Newton-Gregory Formulae
Cited by 2 (0 self)
To determine safe and tight worst-case execution time (WCET) estimates of scientific and multimedia codes that spend most of their execution time on loop iterations, efficient and accurate loop iteration count estimation methods are required. To support dynamic scheduling decisions based on WCET estimations, an effective loop iteration count estimation method should generate parametric formulae that can be evaluated at runtime. Therefore, the loop iteration count estimation methods utilized for WCET estimation must be effective in analyzing loops with symbolic bounds, non-rectangular loops, zero-trip loops, loops with multiple critical paths, and loops with non-unit strides. In this paper we present a novel approach to parametric WCET estimation to handle loops with both affine and non-affine loop bounds in an efficient manner using a formulation based on Newton-Gregory interpolating polynomials.
1 Introduction and Related Work
Static worst-case execution time (WCET) estimates are used in real-time system scheduling and dynamic
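To make the idea of a parametric formula built from Newton-Gregory interpolation more concrete, here is a minimal sketch of the textbook forward-difference form, p(n) = sum over k of Δ^k f(0) · C(n, k). It assumes the iteration count is a polynomial in a single parameter n and that exact counts are available at n = 0, 1, 2, ...; the function names are hypothetical, and this is not claimed to be the formulation used in the paper.

```python
from math import comb


def forward_differences(samples):
    """Leading entries of the forward-difference table.

    `samples` holds f(0), f(1), ..., f(m); the result holds
    f(0), Δf(0), Δ²f(0), ..., Δ^m f(0).
    """
    coeffs = []
    row = list(samples)
    while row:
        coeffs.append(row[0])
        # Next row of the table: differences of neighbouring entries.
        row = [b - a for a, b in zip(row, row[1:])]
    return coeffs


def newton_gregory_eval(coeffs, n):
    """Evaluate p(n) = sum_k coeffs[k] * C(n, k) for an integer n >= 0."""
    return sum(c * comb(n, k) for k, c in enumerate(coeffs))


if __name__ == "__main__":
    # Iteration counts of a triangular nest measured for n = 0..3
    # (illustrative values: 0, 1, 3, 6).
    coeffs = forward_differences([0, 1, 3, 6])
    # The parametric formula can now be evaluated for a larger n at runtime.
    print(coeffs, newton_gregory_eval(coeffs, 10))
```

The interpolant is exact only when the true count is a polynomial of degree lower than the number of samples; zero-trip cases and non-polynomial bounds would need separate handling.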
A General Approach for Tight Timing Predictions of Non-Rectangular Loops
WIP Proceedings of the IEEE Real-Time Technology and Applications Symposium, 1999
Cited by 1 (1 self)
Static timing analyzers need to know the number of iterations associated with each loop in a real-time program so accurate timing predictions can be obtained. The number of iterations of non-rectangular loops varies due to dependencies on counter variables of outer loops. These loops have long presented a problem for timing analyzers since the resulting timing predictions are typically quite loose. This paper presents a general and efficient method for obtaining tight timing predictions of such loops. The total number of iterations executed by an inner loop inside a loop nest can be expressed in terms of summations. Equations representing such loops can be efficiently solved given that certain restrictions are met. We outline an approach for formulating the summations representing the total number of iterations of a loop, a method for solving the equation containing the summations, and a technique for integrating this method into an existing timing analyzer.
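The statement that the total inner-loop iteration count can be written as a summation is easy to check on the simplest non-rectangular case. The sketch below assumes a triangular nest `for i in 1..n: for j in i..n`, whose inner loop runs n - i + 1 times per outer iteration, giving a total of n(n + 1)/2. The nest and the function names are chosen for illustration and are not taken from the paper.

```python
def count_by_enumeration(n: int) -> int:
    """Count inner-loop iterations of the triangular nest directly."""
    total = 0
    for i in range(1, n + 1):
        for _j in range(i, n + 1):  # inner bound depends on outer index i
            total += 1
    return total


def count_closed_form(n: int) -> int:
    """Solved summation: sum_{i=1}^{n} (n - i + 1) = n * (n + 1) / 2."""
    return n * (n + 1) // 2 if n > 0 else 0


if __name__ == "__main__":
    for n in (0, 1, 4, 100):
        print(n, count_by_enumeration(n), count_closed_form(n))
```

A rectangular estimate would charge the inner loop its maximum of n iterations on every outer iteration, about n² in total, which is roughly twice the true count; that gap is the kind of looseness the abstract refers to.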
Compiler Support for Parallel Program Performance Prediction
2001
Cited by 1 (0 self)
This talk illustrates how parallelizing compiler technology can be used to greatly simplify and, in many cases, to fully automate the process of applying diverse modeling techniques to model a parallel application. First, we will present the key features of a program workload representation that has been designed to support both compiler synthesis and detailed performance prediction. Then, we will present compiler techniques that we used to derive a concise form of this workload description automatically for a given program. The focus of these techniques is HPF programs compiled to MPI using the Rice dHPF compiler; issues related to explicit message-passing codes will also be discussed.
A Geometric Approach for Partitioning N-Dimensional Non-Rectangular Iteration Spaces
Cited by 1 (0 self)
Abstract. Parallel loops account for the greatest percentage of program parallelism. The degree to which parallelism can be exploited and the amount of overhead involved during parallel execution of a nested loop directly depend on partitioning, i.e., the way the different iterations of a parallel loop are distributed across different processors. Thus, partitioning of parallel loops is of key importance for high performance and efficient use of multiprocessor systems. Although a significant amount of work has been done in partitioning and scheduling of rectangular iteration spaces, the problem of partitioning non-rectangular iteration spaces (e.g., triangular or trapezoidal iteration spaces) has not been given enough attention so far. In this paper, we present a geometric approach for partitioning N-dimensional non-rectangular iteration spaces for optimizing performance on parallel processor systems. Speedup measurements for kernels (loop nests) of linear algebra packages are presented.
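To show why non-rectangular iteration spaces need care when partitioned, the following sketch splits the outer loop into contiguous chunks of roughly equal work using prefix sums. It is a simple greedy heuristic written for this illustration, not the geometric approach of the paper; `partition_outer_loop` and the per-iteration `work` list are hypothetical names and inputs.

```python
def partition_outer_loop(work, parts):
    """Split outer iterations 0..len(work)-1 into `parts` contiguous chunks.

    `work[i]` is the inner-loop iteration count of outer iteration i.
    A cut is placed as soon as the running total reaches the next
    multiple of total/parts. Returns a list of (start, end) half-open
    ranges; a chunk may be empty if a single iteration dominates.
    """
    if parts < 1:
        raise ValueError("parts must be at least 1")
    total = sum(work)
    bounds = [0]
    acc = 0
    for i, w in enumerate(work):
        acc += w
        # Integer comparison of acc >= total * len(bounds) / parts.
        while len(bounds) < parts and acc * parts >= total * len(bounds):
            bounds.append(i + 1)
    while len(bounds) <= parts:
        bounds.append(len(work))
    return list(zip(bounds[:-1], bounds[1:]))


if __name__ == "__main__":
    # Triangular space: outer iteration i runs i + 1 inner iterations.
    work = [i + 1 for i in range(8)]
    for start, end in partition_outer_loop(work, 2):
        print((start, end), sum(work[start:end]))
```

For this triangular example an equal split by outer index would assign 10 and 26 units of work to the two processors, while the work-based cut assigns 21 and 15.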
In conjunction with: International Symposium on Code Generation and Optimization (CGO). Sponsored by: ACM SIGMICRO and ACM SIGPLAN. Acknowledgments
2008
The organizers would like to thank all the people who made this year's Workshop for Optimization of DSP and Embedded Systems possible:
➔ The authors who submitted a paper. We had a total of 21 submissions this year. Out of these we were able to select 9 good quality papers.
➔ The program committee as well as the external reviewers for their excellent reviews. Although we had quite a number of papers, we were still able to get at least four reviews per paper.
➔ Rodric Rabbah and Martien de Jong for agreeing to give the keynote and an invited talk, respectively.
➔ The CGO organizers, especially Chandra Krintz, for taking care of all the logistics and for hosting the workshop.
➔ ACM for sponsoring the event.
A Novel Approach for Partitioning Iteration Spaces with Variable Densities
Efficient partitioning of parallel loops plays a critical role in high performance and efficient use of multiprocessor systems. Although a significant amount of work has been done in partitioning and scheduling of loops with rectangular iteration spaces, the problem of partitioning non-rectangular iteration spaces (e.g., triangular or trapezoidal iteration spaces) with variable densities has not been addressed so far to the best of our knowledge. In this paper, we present a mathematical model for partitioning N-dimensional non-rectangular iteration spaces with variable densities. We present a unimodular loop transformation and a geometric approach for partitioning an iteration space along an axis corresponding to the outermost loop across a given number of processors to achieve near-optimal performance, i.e., to achieve near-optimal load balance across different processors. We present a case study to illustrate the effectiveness of our approach.
Categories and Subject Descriptors D.1.3 [Software]: Programming Techniques - parallel programming;