Results 1–10 of 18
Supporting Timing Analysis by Automatic Bounding of Loop Iterations
Journal of Real-Time Systems, 2000
"... . Static timing analyzers, which are used to analyze realtime systems, need to know the minimum and maximum number of iterations associated with each loop in a realtime program so accurate timing predictions can be obtained. This paper describes three complementary methods to support timing analy ..."
Abstract

Cited by 36 (6 self)
Static timing analyzers, which are used to analyze real-time systems, need to know the minimum and maximum number of iterations associated with each loop in a real-time program so accurate timing predictions can be obtained. This paper describes three complementary methods to support timing analysis by bounding the number of loop iterations. First, an algorithm is presented that determines the minimum and maximum number of iterations of loops with multiple exits. Even when the number of iterations cannot be exactly determined, it is desirable to know the lower and upper iteration bounds. Second, when the number of iterations is dependent on unknown values of variables, the user is asked to provide bounds for these variables. These bounds are used to determine the minimum and maximum number of iterations. Specifying the values of variables is less error prone than specifying the number of loop iterations directly. Finally, a method is given to tightly predict the execution time of in...
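The interval reasoning described above can be illustrated in a few lines of Python. This is a hedged sketch, not the paper's algorithm: the loop shape, the function names, and the rule for combining exits are all assumptions made for illustration.

```python
import math

def trip_count_bounds(lo, step, n_min, n_max):
    """Min/max trip counts of `for (i = lo; i < n; i += step)` when the
    exit bound n is only known to lie in the user-supplied interval
    [n_min, n_max]. The count is monotone in n and clamped at zero."""
    count = lambda n: max(0, math.ceil((n - lo) / step))
    return count(n_min), count(n_max)

def multi_exit_bounds(per_exit):
    """Combine per-exit (min, max) trip counts for a loop with several
    exits: the loop stops at whichever exit fires first, so both the
    lower and the upper bound are component-wise minima over the exits
    (an illustrative assumption, not the paper's exact rule)."""
    return (min(lo for lo, _ in per_exit),
            min(hi for _, hi in per_exit))
```

For example, `trip_count_bounds(0, 1, 5, 10)` yields the bounds `(5, 10)`, and a loop with one exit bounded by `(5, 10)` and another bounded by `(3, 8)` gets the combined bounds `(3, 8)`.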
Performance Improvement Through Overhead Analysis: A Case Study in Molecular Dynamics
Proc. 11th ACM International Conference on Supercomputing, ACM, 1997
"... A method is presented for incremental development of high performance parallel programs, using an application from molecular dynamics as a case study. The method uses the technique of overhead analysis, which aims to explain experimental observations using models of execution behaviour, as a means o ..."
Abstract

Cited by 8 (3 self)
A method is presented for incremental development of high performance parallel programs, using an application from molecular dynamics as a case study. The method uses the technique of overhead analysis, which aims to explain experimental observations using models of execution behaviour, as a means of determining successive steps in program development. The case study illustrates how this technique can be applied to the complexities of real-world parallel computations. 1 Introduction Parallel computers have so far failed to fulfil their promise of providing cheap high performance computing. In part, this is due to the high cost of software development required to find an acceptable implementation of an application which runs efficiently on a given parallel system: this has been termed the `best' implementation problem [4]. Execution time is, in general, unpredictable and, often, decisions made early in the development process have profound effects on the ultimate performance of an ap...
Symbolic Evaluation of Sums for Parallelising Compilers
In IMACS World Congress on Scientific Computation, Modelling and Applied Mathematics, Wissenschaft & Technik, 1997
"... The evaluation of sums over polynomials when symbolic, i.e., unknown, variables are involved in the bounds of the sums is considered. Such sums typically occur when analysing, in computer programs, the properties of loops which can be executed in parallel. Existing packages for symbolic mathematical ..."
Abstract

Cited by 8 (0 self)
The evaluation of sums over polynomials when symbolic, i.e., unknown, variables are involved in the bounds of the sums is considered. Such sums typically occur when analysing, in computer programs, the properties of loops which can be executed in parallel. Existing packages for symbolic mathematical computations are not capable of handling these sums properly. The problems which may arise are identified and an algorithm to overcome them is presented.
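Sums of this kind can be evaluated exactly in pure Python with rational arithmetic. The following is a generic interpolation-based sketch, not the paper's algorithm, and it sidesteps the complications the paper addresses (symbolic bounds that may cross, producing zero-trip loops) by assuming the upper bound n is a non-negative integer:

```python
from fractions import Fraction

def sum_poly(coeffs):
    """Closed form of S(n) = sum_{i=1}^{n} p(i) for a polynomial p given
    by `coeffs` (lowest degree first). S is a polynomial of degree
    deg(p) + 1, so it is recovered exactly by interpolating through
    deg(p) + 2 sample points with exact rational arithmetic.
    Returns S's coefficients, lowest degree first."""
    d = len(coeffs)                       # deg(p) + 1
    p = lambda i: sum(Fraction(c) * i**k for k, c in enumerate(coeffs))
    xs = list(range(d + 1))               # sample at n = 0 .. d
    ys, acc = [Fraction(0)], Fraction(0)  # S(0) = 0 (empty sum)
    for m in range(1, d + 1):
        acc += p(m)
        ys.append(acc)
    # Lagrange interpolation, accumulated in coefficient form.
    out = [Fraction(0)] * (d + 1)
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        basis, denom = [Fraction(1)], Fraction(1)
        for m, xm in enumerate(xs):
            if m == j:
                continue
            denom *= xj - xm
            new = [Fraction(0)] * (len(basis) + 1)
            for k, b in enumerate(basis):  # multiply basis by (n - xm)
                new[k] -= xm * b
                new[k + 1] += b
            basis = new
        for k, b in enumerate(basis):
            out[k] += yj / denom * b
    return out
```

For instance, `sum_poly([0, 1])` (the inner trip count of a triangular loop, p(i) = i) returns `[0, 1/2, 1/2]`, i.e. the familiar closed form n(n+1)/2.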
Compile-time minimisation of load imbalance in loop nests
In Proceedings of the International Conference on Supercomputing, 1997
"... Parallelising compilers typically need some performance estimation capability in order to evaluate the tradeoffs between different transformations. Such a capability requires sophisticated techniques for analysing the program and providing quantitative estimates to the compiler’s internal cost m ..."
Abstract

Cited by 5 (0 self)
Parallelising compilers typically need some performance estimation capability in order to evaluate the trade-offs between different transformations. Such a capability requires sophisticated techniques for analysing the program and providing quantitative estimates to the compiler’s internal cost model. Making use of techniques for symbolic evaluation of the number of iterations in a loop, this paper describes a novel compile-time scheme for partitioning loop nests in such a way that load imbalance is minimised. The scheme is based on a property of the class of canonical loop nests, namely that, upon partitioning into essentially equal-sized partitions along the index of the outermost loop, these can be combined in such a way as to achieve a balanced distribution of the computational load in the loop nest as a whole. A technique for handling non-canonical loop nests is
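The "combine equal-sized partitions" property can be sketched as follows, under the simplifying assumption that iteration i of the outer loop costs work proportional to i (a triangular nest); the function name and the exact pairing scheme are illustrative assumptions, not the paper's scheme:

```python
def folded_partition(n, p):
    """Distribute outer-loop indices 0..n-1 over p processors by cutting
    the index range into 2p equal-sized chunks and pairing chunk k with
    chunk 2p-1-k, so that a cheap early chunk is matched with an
    expensive late one. Returns one index list per processor."""
    chunks = [list(range(c * n // (2 * p), (c + 1) * n // (2 * p)))
              for c in range(2 * p)]
    return [chunks[k] + chunks[2 * p - 1 - k] for k in range(p)]
```

With n = 8 and p = 2, processor 0 receives indices {0, 1, 6, 7} and processor 1 receives {2, 3, 4, 5}; if iteration i costs i work units, both loads come to 14, whereas a plain block split would give 6 versus 22.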
Automatic Utilization of Constraints for Timing Analysis
Florida State University, 1999
"... Users of realtime systems are not only interested in obtaining correct computations from their programs, but timely responses as well. Responses that are given past a deadline is not acceptable. Arealtime system is often comprised of a set of tasks that are statically scheduled. Therefore, it is n ..."
Abstract

Cited by 4 (1 self)
Users of real-time systems are not only interested in obtaining correct computations from their programs, but timely responses as well. Responses given past a deadline are not acceptable. A real-time system is often composed of a set of tasks that are statically scheduled. Therefore, it is necessary to determine a program’s execution
Parametric Timing Estimation With Newton-Gregory Formulae
"... To determine safe and tight worstcase execution time (WCET) estimates of scientific and multimedia codes that spent most of the execution time on executing loop iterations, efficient and accurate loop iteration count estimation methods are required. To support dynamic scheduling decisions based on ..."
Abstract

Cited by 3 (0 self)
To determine safe and tight worst-case execution time (WCET) estimates of scientific and multimedia codes that spend most of their execution time executing loop iterations, efficient and accurate loop iteration count estimation methods are required. To support dynamic scheduling decisions based on WCET estimations, an effective loop iteration count estimation method should generate parametric formulae that can be evaluated at run time. Therefore, the loop iteration count estimation methods utilized for WCET estimation must be effective in analyzing loops with symbolic bounds, non-rectangular loops, zero-trip loops, loops with multiple critical paths, and loops with non-unit strides. In this paper we present a novel approach to parametric WCET estimation to handle loops with both affine and non-affine loop bounds in an efficient manner using a formulation based on Newton-Gregory interpolating polynomials. 1 Introduction and Related Work Static worst-case execution time (WCET) estimates are used in real-time system scheduling and dynamic
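The Newton-Gregory forward formula at the heart of this approach recovers a closed-form iteration count from a few sampled trip counts. The following is a generic sketch of that interpolation machinery only (not the paper's full WCET method), assuming the count is polynomial in the parameter:

```python
from fractions import Fraction
from math import comb

def forward_differences(samples):
    """Leading forward differences Δ^k f(0) of f(0), f(1), f(2), ..."""
    diffs, row = [], [Fraction(s) for s in samples]
    while row:
        diffs.append(row[0])
        row = [b - a for a, b in zip(row, row[1:])]
    return diffs

def newton_gregory(diffs, n):
    """Newton-Gregory forward formula: f(n) = Σ_k Δ^k f(0) · C(n, k)."""
    return sum(d * comb(n, k) for k, d in enumerate(diffs))
```

Sampling the total iteration count of the triangular nest `for i in range(n): for j in range(i)` at n = 0..3 gives 0, 0, 1, 3; the resulting difference table reproduces n(n-1)/2 for any n, e.g. 45 at n = 10.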
A Compile-Time Partitioning Strategy for Non-Rectangular Loop Nests
In Proceedings of the 11th International Parallel Processing Symposium (Geneva), 1997
"... This paper presents a compiletime scheme for partitioning nonrectangular loop nests which consist of inner loops whose bounds depend on the index of the outermost, parallel loop. The minimisation of load imbalance, on the basis of symbolic cost estimates, is considered the main objective; however, ..."
Abstract

Cited by 3 (2 self)
This paper presents a compile-time scheme for partitioning non-rectangular loop nests which consist of inner loops whose bounds depend on the index of the outermost, parallel loop. The minimisation of load imbalance, on the basis of symbolic cost estimates, is considered the main objective; however, options which may increase other sources of overhead are avoided. Experimental results on a virtual shared memory computer are also presented.
Parallelising Serial Code: A Comparison Of Three High-Performance Parallel Programming Methods
1997
"... ..."
A Geometric Approach for Partitioning N-Dimensional Non-Rectangular Iteration Spaces
"... Abstract. Parallel loops account for the greatest percentage of program parallelism. The degree to which parallelism can be exploited and the amount of overhead involved during parallel execution of a nested loop directly depend on partitioning, i.e., the way the different iterations of a parallel l ..."
Abstract

Cited by 1 (0 self)
Parallel loops account for the greatest percentage of program parallelism. The degree to which parallelism can be exploited and the amount of overhead involved during parallel execution of a nested loop directly depend on partitioning, i.e., the way the different iterations of a parallel loop are distributed across different processors. Thus, partitioning of parallel loops is of key importance for high performance and efficient use of multiprocessor systems. Although a significant amount of work has been done on partitioning and scheduling of rectangular iteration spaces, the problem of partitioning non-rectangular iteration spaces, e.g. triangular or trapezoidal iteration spaces, has not been given enough attention so far. In this paper, we present a geometric approach for partitioning N-dimensional non-rectangular iteration spaces for optimizing performance on parallel processor systems. Speedup measurements for kernels (loop nests) of linear algebra packages are presented.
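For the two-dimensional triangular case, the geometric idea can be sketched in a few lines: the cumulative "area" (work) of the iteration space up to outer index c grows like c²/2, so equal-area cut points fall at n·√(k/p). The function below is a hypothetical illustration of that observation, not the paper's N-dimensional algorithm:

```python
import math

def triangular_cuts(n, p):
    """Outer-index cut points for `for i in range(n): for j in range(i)`
    giving p contiguous bands of roughly equal area of the triangular
    iteration space: the area up to index c is ~c^2/2, so the k-th cut
    sits at n * sqrt(k/p), rounded to the nearest integer index."""
    return [round(n * math.sqrt(k / p)) for k in range(p + 1)]
```

For example, `triangular_cuts(100, 4)` gives `[0, 50, 71, 87, 100]`; the four bands then contain 1225, 1260, 1256 and 1209 inner iterations respectively, whereas equal-width bands of 25 outer indices would range from 300 to 2175.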
Compiler Support for Parallel Program Performance Prediction
2001
"... This talk illustrates how parallelizing compiler technology can be used to greatly simplify and, in many cases, to fully automate the process of applying diverse modeling techniques to model a parallel application. First, we will present the key features of a program workload representation that h ..."
Abstract

Cited by 1 (0 self)
This talk illustrates how parallelizing compiler technology can be used to greatly simplify and, in many cases, to fully automate the process of applying diverse modeling techniques to model a parallel application. First, we will present the key features of a program workload representation that has been designed to support both compiler synthesis and detailed performance prediction. Then, we will present compiler techniques that we used to derive a concise form of this workload description automatically for a given program. The focus of these techniques is HPF programs compiled to MPI using the Rice dHPF compiler; issues related to explicit message-passing codes will also be discussed.