Combined Selection of Tile Sizes and Unroll Factors Using Iterative Compilation
, 2000
Cited by 78 (9 self)
Loop tiling and unrolling are two important program transformations to exploit locality and expose instruction level parallelism, respectively. However, these transformations are not independent and each can adversely affect the goal of the other. Furthermore, the best combination will vary dramatically from one processor to the next. In this paper, we therefore address the problem of how to select tile sizes and unroll factors simultaneously. We approach this problem in an architecturally adaptive manner by means of iterative compilation, where we generate many versions of a program and decide upon the best by actually executing them and measuring their execution time. We evaluate several iterative strategies based on genetic algorithms, random sampling and simulated annealing. We compare the levels of optimization obtained by iterative compilation to several well-known static techniques and show that we outperform each of them on a range of benchmarks across a variety of ar...
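The random-sampling strategy mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `tiled_unrolled_matmul`, `time_variant` and `random_search` are hypothetical, the kernel is a pure-Python stand-in for a compiler-generated program version, and the chunked inner loop only imitates unrolling.

```python
import random
import time


def tiled_unrolled_matmul(a, b, n, tile, unroll):
    """Multiply two n x n matrices (lists of lists).

    The j and k loops are tiled with size `tile`; the k loop inside a tile is
    processed in chunks of `unroll` elements (a stand-in for real unrolling).
    """
    c = [[0.0] * n for _ in range(n)]
    for jj in range(0, n, tile):
        j_end = min(jj + tile, n)
        for kk in range(0, n, tile):
            k_end = min(kk + tile, n)
            for i in range(n):
                a_i, c_i = a[i], c[i]
                for k in range(kk, k_end, unroll):
                    k_hi = min(k + unroll, k_end)
                    for j in range(jj, j_end):
                        acc = 0.0
                        for k2 in range(k, k_hi):
                            acc += a_i[k2] * b[k2][j]
                        c_i[j] += acc
    return c


def time_variant(a, b, n, tile, unroll, repeats=3):
    """Run one program version several times and return the fastest time."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        tiled_unrolled_matmul(a, b, n, tile, unroll)
        best = min(best, time.perf_counter() - start)
    return best


def random_search(measure, tile_sizes, unroll_factors, budget, seed=0):
    """Sample up to `budget` (tile, unroll) pairs, measure each by calling
    `measure(tile, unroll)`, and return the fastest pair with its time."""
    rng = random.Random(seed)
    candidates = [(t, u) for t in tile_sizes for u in unroll_factors]
    sample = rng.sample(candidates, min(budget, len(candidates)))
    best, best_time = None, float("inf")
    for tile, unroll in sample:
        elapsed = measure(tile, unroll)
        if elapsed < best_time:
            best, best_time = (tile, unroll), elapsed
    return best, best_time


if __name__ == "__main__":
    n = 48
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    best, elapsed = random_search(
        lambda t, u: time_variant(a, b, n, t, u),
        tile_sizes=[4, 8, 16, 32],
        unroll_factors=[1, 2, 4, 8],
        budget=8,
    )
    print("best (tile, unroll):", best, "time:", elapsed)
```

Passing the timing function in as `measure` keeps the search independent of how a version is built and run, so the same loop could in principle drive a real compiler and target machine.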
The effect of cache models on iterative compilation for combined tiling and unrolling
, 2004
Iterative Compilation and Performance Prediction for Numerical Applications
, 2004
Cited by 7 (5 self)
As the current rate of improvement in processor performance far exceeds that of memory performance, memory latency is the dominant overhead in many performance-critical applications. In many cases, automatic compiler-based approaches to improving memory performance are limited and programmers frequently resort to manual optimisation techniques. However, this process is tedious and time-consuming. Furthermore, a diverse range of rapidly evolving hardware makes the optimisation process even more complex. It is often hard to predict the potential benefits of different optimisations, and there are no simple criteria for when to stop optimising, i.e. when optimal memory performance has been achieved or sufficiently approached. This thesis presents a platform-independent optimisation approach for numerical applications based on iterative feedback-directed program restructuring, using a new, reasonably fast and accurate performance prediction technique for guiding optimisations. New strategies for searching the optimisation space, by means of
Automating Selective Dynamic Compilation
, 2002
Cited by 5 (1 self)
Run-time specialization of programs can potentially improve execution time by exploiting the (semi-)invariance of values. Several research prototypes have been developed that enable the user to apply run-time specialization using annotations in the source code. While they were a great improvement over previous manual systems, in reality their use has been limited because writing annotations that achieve good program speedups is a challenging task for humans, requiring a deep understanding of the program's characteristics and of the applicable run-time optimizations and their effects on the underlying computer architecture. In this
Fast and Accurate Method for Determining a Lower Bound on Execution Time
, 2004
Cited by 4 (3 self)
In performance-critical applications, memory latency is frequently the dominant overhead. In many cases, automatic compiler-based optimizations to improve memory performance are limited and programmers frequently resort to manual optimization techniques. However, this process is tedious and time-consuming. Furthermore, as the potential benefit from optimization is unknown, there is no way to judge the amount of effort worth expending, nor when the process can stop, i.e. when optimal memory performance has been achieved or sufficiently approached. Architecture simulators can provide such information, but designing an accurate model of an existing architecture is difficult and simulation times are excessively long. In this article, we propose and implement a technique that is both fast and reasonably accurate for estimating a lower bound on execution time for scientific applications. This technique has been tested on a wide range of programs from the SPEC benchmark suite and two commercial applications, where it has been used to guide a manual optimization process and iterative compilation. We compare our technique with a simulator with ideal memory behaviour and demonstrate that our technique provides comparable information on memory performance and yet is over two orders of magnitude faster. We further show that our technique is considerably more accurate than hardware counters. KEY WORDS: memory performance; optimization tool; memory latency analysis
Incorporating Cache Models in Iterative Compilation for Combined Tiling and Unrolling
, 2000
Cited by 2 (2 self)
In this paper we further investigate the notion of iterative compilation, in which the problem of determining the optimal program transformation is approached by generating many versions of the source program and by searching for the best by actually executing these versions on the target hardware to measure their execution time. In previous work we have shown that this approach can obtain high levels of optimization, outperforming existing static techniques significantly. In this paper we address how to incorporate static models in the search procedure in order to reduce the number of program executions. We focus on cache models since exploitation of the memory hierarchy is very important in obtaining execution speed. First, we show that by using these models alone and no profiling, far lower levels of optimization are obtained than by using profiling information. Second, we show that including accurate cache models can reduce the number of program executions by 50% and sti...
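One way to picture a static model cutting down the number of executions is to rank all candidate versions with a cheap estimate and run only the most promising share. The sketch below is an illustration under stated assumptions, not the method of the paper: `model_guided_search` and `toy_cache_estimate` are hypothetical names, the footprint formula is a crude placeholder rather than a real cache model, and the toy estimate ignores the unroll factor entirely.

```python
def toy_cache_estimate(candidate, cache_elems=1024):
    """Hypothetical static cost for a (tile, unroll) pair; lower is better.

    Assumes three tile-by-tile blocks must fit in a cache holding
    `cache_elems` elements, and that small tiles cost extra loop overhead.
    The unroll factor is ignored by this placeholder model.
    """
    tile, _unroll = candidate
    footprint = 3 * tile * tile
    overflow = max(0, footprint - cache_elems)
    return overflow + cache_elems / tile


def model_guided_search(candidates, estimate, measure, keep_fraction=0.5):
    """Rank candidates by the static estimate, execute only the best
    `keep_fraction` of them, and return (best, best_time, executions)."""
    ranked = sorted(candidates, key=estimate)
    keep = max(1, int(len(ranked) * keep_fraction))
    shortlist = ranked[:keep]
    # Only the shortlisted versions are actually run and timed.
    timings = {cand: measure(cand) for cand in shortlist}
    best = min(timings, key=timings.get)
    return best, timings[best], len(shortlist)


if __name__ == "__main__":
    candidates = [(t, u) for t in (4, 8, 16, 32, 64) for u in (1, 2, 4)]
    # Synthetic stand-in for "run the version and time it".
    fake_measure = lambda cand: abs(cand[0] - 16) + abs(cand[1] - 4)
    print(model_guided_search(candidates, toy_cache_estimate, fake_measure))
```

With `keep_fraction=0.5` roughly half of the versions are executed; whether the best version survives the pruning depends entirely on how accurate the estimate is.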
Using Iterative Compilation to Reduce Energy Consumption
- In Proc. of ASCI 2004
, 2004
Cited by 2 (0 self)
The rapid pace of architectural change in processors puts compiler technology under enormous stress. This is emphasized by new demands placed on compilers, such as reducing static code size, energy consumption or power dissipation. Iterative compilation has been proposed as an approach to find the best sequence of optimizations (such as loop transformations) for an application, in order to improve its performance. In this paper, we study both the effect of loop transformations on energy consumption and the possibility of using iterative compilation to find the best compiled code for energy and for the combination of energy and performance. From the analyzed benchmarks, we conclude that performance improvement goes together with decreased energy consumption. Iterative compilation therefore seems a promising approach to the compilation-for-energy problem, but a larger set of loop transformations and their combinations needs to be studied for a definitive conclusion.
Microarchitectural and Compile-Time Optimizations for Performance Improvement of Procedural and Object-Oriented Languages
- Northeastern University
, 2000
Cited by 2 (0 self)
Applications, and their associated programming models, have had a profound influence on computer architecture evolution. Programs developed in procedural languages (e.g., C and Fortran) have traditionally served this role. The popularity of the Object Oriented Programming (OOP) paradigm has been growing rapidly, especially through the use of languages such as C++ and Java. OOP languages support the concepts of data encapsulation, polymorphism and inheritance, which promise to increase code reuse and result in more reliable code. Applications developed in object-oriented languages exhibit different execution behavior compared to their procedural language counterparts. We focus our work on two primary differences encountered as we move to applications developed in OO languages: i) the increased number of procedures and their higher calling frequencies, and ii) the increased use of indirect branches. Equipped with a set of C and C++ benchmark applications, we propose microarchitectural m...
Evaluating the Relationship Between the Usefulness and Accuracy of Profiles
- In Proc. Workshop on Duplicating, Deconstructing, and Debunking
, 2003
Cited by 1 (0 self)
The relationship between how accurately a profile predicts future program behavior and how useful it is for profile-directed optimization is not straightforward. We gathered extensive data on the results of profile-driven optimization using two different optimization systems (cc [1] and alto [4]) and selected benchmarks and benchmark runs from the SPEC95 and SPEC2000 suites. Instead of following the traditional SPEC guidelines of training only with the designated "train" profiles and gathering performance statistics with the designated "reference" benchmark runs, we evaluate nearly all possible combinations of training and evaluation runs. We summarize the usefulness of basic block profiles in this wider context, evaluate the reliability of the results that we derived from using a range of evaluation runs, and evaluate the apparently uncontroversial claim that more accurate basic block profiles are connected to better profile-driven optimization performance. We find that while in the alto optimization context there is a significant correlation between more accurate profiles and more useful profiles, no such correlation exists in the cc system.
Overcoming the Challenges to Feedback-Directed Optimization
Feedback-directed optimization (FDO) is a general term used to describe any technique that alters a program's execution based on tendencies observed in its present or past runs. This paper reviews the current state of affairs in FDO and discusses the challenges inhibiting further acceptance of these techniques. It also argues that current trends in hardware and software technology have resulted in an execution environment where immutable executables and traditional static optimizations are no longer sufficient. It explains how we can improve the effectiveness of our optimizers by increasing our understanding of program behavior, and it provides examples of temporal behavior that we can (or could in the future) exploit during optimization.

