Rapidly selecting good compiler optimizations using performance counters
- In Proceedings of the 5th Annual International Symposium on Code Generation and Optimization (CGO)
, 2007
Cited by 22 (10 self)
Applying the right compiler optimizations to a particular program can have a significant impact on program performance. Due to the non-linear interaction of compiler optimizations, however, determining the best setting is nontrivial. Several techniques have been proposed that search the space of compiler options to find good solutions; however, such approaches can be expensive. This paper proposes a different approach that uses performance counters to determine good compiler optimization settings. This is achieved by learning a model off-line which can then be used to determine good settings for any new program. We show that such an approach outperforms the state-of-the-art and is two orders of magnitude faster on average. Furthermore, we show that our performance-counter-based approach outperforms techniques based on static code features. Finally, we show that such improvements are stable across varying input data sets. Using our technique we achieve a 10% improvement over the highest optimization setting of the commercial PathScale EKOPath 2.3.1 optimizing compiler on the SPEC benchmark suite on a recent AMD Athlon 64 3700+ platform in just three evaluations.
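The idea in the abstract above, learning off-line which settings suit which counter profile and then proposing a few settings for a new program, might be sketched as a nearest-neighbour lookup. This is only an illustration under stated assumptions: the counter names, the numbers, the `config_*` labels and the nearest-neighbour model itself are hypothetical and are not taken from the paper.

```python
import math

# Hypothetical offline training set: each entry pairs a program's
# performance-counter profile (events per instruction) with the setting
# that an offline search found best for it. All values are made up.
TRAINING = [
    ({"l1_miss": 0.02, "branch_mispred": 0.01}, "config_a"),
    ({"l1_miss": 0.20, "branch_mispred": 0.02}, "config_b"),
    ({"l1_miss": 0.05, "branch_mispred": 0.15}, "config_c"),
]


def distance(a, b):
    """Euclidean distance between two counter profiles with the same keys."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in a))


def candidate_settings(counters, training=TRAINING, k=3):
    """Return the settings of the k most similar training programs, nearest first."""
    ranked = sorted(training, key=lambda entry: distance(counters, entry[0]))
    return [setting for _, setting in ranked[:k]]


# Profile of a new, unseen program (hypothetical values).
new_program = {"l1_miss": 0.18, "branch_mispred": 0.03}
print(candidate_settings(new_program, k=3))
```

Returning a short ranked list rather than a single answer mirrors the "three evaluations" mentioned in the abstract: each candidate would be compiled and timed once, and the fastest kept.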
Method-specific dynamic compilation using logistic regression
- of ACM SIGPLAN Conferences on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA'06)
, 2006
Cited by 18 (2 self)
Determining the best set of optimizations to apply to a program has been a long-standing problem for compiler writers. To reduce ...
Automatic performance model construction for the fast software exploration of new hardware designs
- In ACM International Conference on Compilers, Architecture and Synthesis for Embedded Systems
, 2006
Building a practical iterative interactive compiler
- In 1st Workshop on Statistical and Machine Learning Approaches Applied to Architectures and Compilation (SMART’07), colocated with HiPEAC 2007 conference
, 2007
Cited by 9 (4 self)
Current compilers fail to deliver satisfactory levels of performance on modern processors, due to rapidly evolving hardware, fixed and black-box optimization heuristics, simplistic hardware models, inability to fine-tune the application of transformations, and highly dynamic behavior of the system. This analysis suggests revisiting the structure and interactions of optimizing compilers. Building on the empirical knowledge accumulated from previous iterative optimization prototypes, we propose to open the compiler, exposing its control and decision mechanisms to external optimization heuristics. We suggest a simple, practical, and non-intrusive way to modify current compilers, allowing an external tool to access and modify all compiler optimization decisions. To avoid the pitfall of revealing all the compiler intermediate representation and libraries to a point where it would rigidify the whole internals and stiffen further evolution, we choose to control the decision process itself, granting access only to the high-level features needed to effectively make a decision. This restriction is compatible with our fine-tuning and fine-grained interaction, and allows to tune
Microarchitecture sensitive empirical models for compiler optimizations
- In CGO
, 2007
Cited by 8 (0 self)
This paper proposes the use of empirical modeling techniques for building microarchitecture-sensitive models for compiler optimizations. The models we build relate program performance to settings of compiler optimization flags, associated heuristics and key microarchitectural parameters. Unlike traditional analytical modeling methods, this relationship is learned entirely from data obtained by measuring performance at a small number of carefully selected compiler/microarchitecture configurations. We evaluate three different learning techniques in this context, viz. linear regression, adaptive regression splines and radial basis function networks. We use the generated models to a) predict program performance at arbitrary compiler/microarchitecture configurations, b) quantify the significance of complex interactions between optimizations and the microarchitecture, and c) efficiently search for ’optimal’ settings of optimization flags and heuristics for any given microarchitectural configuration. Our evaluation using benchmarks from the SPEC CPU2000 suite suggests that accurate models (<5% average error in prediction) can be generated using a reasonable number of simulations. We also find that using compiler settings prescribed by a model-based search can improve program performance by as much as 19% (with an average of 9.5%) over highly optimized binaries.
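As a rough illustration of the simplest of the three techniques named in the abstract above (linear regression), the sketch below fits execution time to a few measured configurations, predicts an unmeasured one, and searches the flag settings for a fixed microarchitecture. The two flags, the single cache parameter and every timing value are hypothetical, and a purely linear model like this one cannot capture the interaction effects the abstract mentions.

```python
from itertools import product


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_linear(configs, times):
    """Least-squares fit of time = w0 + sum_j w_j * x_j via the normal equations."""
    rows = [[1.0] + [float(x) for x in cfg] for cfg in configs]
    n = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    xty = [sum(r[i] * t for r, t in zip(rows, times)) for i in range(n)]
    return solve(xtx, xty)


def predict(weights, cfg):
    """Predicted time for one configuration (flag values followed by arch values)."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], cfg))


def best_flags(weights, n_flags, arch):
    """Exhaustively search binary flag settings for a fixed microarchitecture."""
    return min(product([0, 1], repeat=n_flags),
               key=lambda flags: predict(weights, flags + tuple(arch)))


# Hypothetical measurements: (unroll, inline, large_cache) -> seconds.
configs = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
times = [10.0, 8.0, 11.0, 7.0, 6.0]

weights = fit_linear(configs, times)
print(predict(weights, (1, 0, 1)))        # prediction for an unmeasured point
print(best_flags(weights, 2, arch=(1,)))  # best flags when large_cache = 1
```

Only five of the eight possible configurations are measured here; the model fills in the rest, which is the point of replacing exhaustive measurement with a learned predictor.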
Evaluating heuristic optimization phase order search algorithms
- In Proceedings of the International Symposium on Code Generation and Optimization (CGO’07)
, 2007
Cited by 7 (2 self)
Program-specific or function-specific optimization phase sequences are universally accepted to achieve better overall performance than any fixed optimization phase ordering. A number of heuristic phase order space search algorithms have been devised to find customized phase orderings achieving high performance for each function. However, to make this approach of iterative compilation more widely accepted and deployed in mainstream compilers, it is essential to modify existing algorithms, or develop new ones that find near-optimal solutions quickly. As a step in this direction, in this paper we attempt to identify and understand the important properties of some commonly employed heuristic search methods by using information collected during an exhaustive exploration of the phase order search space. We compare the performance obtained by each algorithm with all others, as well as with the optimal phase ordering performance. Finally, we show how we can use the features of the phase order space to improve existing algorithms as well as devise new and better-performing search algorithms.
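One commonly used heuristic of the kind the abstract above compares is hill climbing over phase sequences. The sketch below is a generic steepest-descent version, not the paper's implementation: the phase names `a`, `b`, `c`, the fixed sequence length and the `toy_cost` function (distance to a hidden best ordering) are all hypothetical stand-ins for compiling and measuring a real function.

```python
def neighbors(seq, phases):
    """Yield every sequence that differs from seq in exactly one position."""
    for i in range(len(seq)):
        for p in phases:
            if p != seq[i]:
                yield seq[:i] + (p,) + seq[i + 1:]


def hill_climb(start, phases, cost):
    """Steepest-descent hill climbing; stops at the first local minimum."""
    current = tuple(start)
    current_cost = cost(current)
    evaluations = 1
    while True:
        best, best_cost = current, current_cost
        for cand in neighbors(current, phases):
            c = cost(cand)
            evaluations += 1
            if c < best_cost:
                best, best_cost = cand, c
        if best == current:
            return current, current_cost, evaluations
        current, current_cost = best, best_cost


PHASES = ("a", "b", "c")              # hypothetical optimization phases
HIDDEN_BEST = ("b", "a", "c", "a")    # hypothetical optimum, unknown to the search


def toy_cost(seq):
    """Toy stand-in for measured cost: 100 plus the mismatches with HIDDEN_BEST."""
    return 100 + sum(1 for x, y in zip(seq, HIDDEN_BEST) if x != y)


sequence, cost_value, evaluations = hill_climb(("a",) * 4, PHASES, toy_cost)
print(sequence, cost_value, evaluations)
```

The toy cost surface has a single minimum, so the search always succeeds here; real phase order spaces have many local minima, which is why the number of evaluations and the quality of the result both need to be compared against an exhaustive baseline.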
In search of near-optimal optimization phase orderings
- In Proceedings of the 2006 ACM Conference on Languages, Compilers, and Tools for Embedded Systems
, 2006
Cited by 3 (2 self)
Phase ordering is a long-standing challenge for traditional optimizing compilers. Varying the order of applying optimization phases to a program can produce different code, with potentially significant performance variation amongst them. A key insight to addressing the phase ordering problem is that many different optimization sequences produce the same code. In an earlier study, we used this observation to restate the phase ordering problem to concentrate on finding all distinct function instances that can be produced due to different phase orderings, instead of attempting to generate code for all possible optimization sequences. Using a novel search algorithm we were able to show that it is possible to exhaustively enumerate the set of all possible function instances that can be produced by different phase orderings in our compiler for most of the functions in our benchmark suite [1]. Finding the optimal function instance within this set for almost any dynamic measure of performance still appears impractical since that would involve execution/simulation of all generated function instances. To find the dynamically optimal function instance we exploit the observation that the enumeration space for a function typically contains a very small number of distinct control flow paths. We simulate only one function instance from each group of function instances having the identical control flow, and use that information to estimate the dynamic performance of the remaining functions in that group. We further show that the estimated dynamic frequency counts obtained by using our method correlate extremely well with simulated processor cycle counts. Thus, by using our measure of dynamic frequencies to identify a small number of the best performing function instances we can often find the optimal phase ordering for a function within a reasonable amount of time.
Finally, we perform a case study to evaluate how adept our genetic algorithm is for finding optimal phase orderings within our compiler, and demonstrate how the algorithm can be improved.
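The grouping step described in the abstract above (simulate one function instance per distinct control flow, then reuse its block execution counts for the rest of the group) might look roughly like the following. The data layout, the `simulate` callback and the estimate itself (execution count times static instruction count, summed over basic blocks) are assumptions made for illustration rather than the authors' exact measure.

```python
from collections import defaultdict


def estimate_dynamic_counts(instances, simulate):
    """Estimate a dynamic instruction count for every function instance.

    instances: list of dicts with 'name', 'cfg' (hashable control-flow
    signature) and 'block_sizes' (static instructions per basic block).
    simulate: callable returning per-block execution counts for one
    instance; it is called only once per distinct control flow.
    """
    groups = defaultdict(list)
    for inst in instances:
        groups[inst["cfg"]].append(inst)

    estimates = {}
    simulated = 0
    for members in groups.values():
        block_counts = simulate(members[0])  # one representative per group
        simulated += 1
        for inst in members:
            estimates[inst["name"]] = sum(
                count * size
                for count, size in zip(block_counts, inst["block_sizes"])
            )
    return estimates, simulated


# Hypothetical function instances produced by different phase orderings.
INSTANCES = [
    {"name": "f1", "cfg": "A", "block_sizes": [3, 5, 2]},
    {"name": "f2", "cfg": "A", "block_sizes": [3, 4, 2]},
    {"name": "f3", "cfg": "B", "block_sizes": [6, 2]},
]

# Hypothetical simulator results: block execution counts per control flow.
FAKE_COUNTS = {"A": [1, 100, 1], "B": [1, 50]}


def fake_simulate(inst):
    return FAKE_COUNTS[inst["cfg"]]


estimates, simulated = estimate_dynamic_counts(INSTANCES, fake_simulate)
print(estimates, simulated)
```

Three instances are ranked here with only two simulations; the saving grows with the number of instances that share a control flow.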
Techniques and tools for dynamic optimization
- Parallel and Distributed Processing Symposium, 2006. IPDPS 2006. 20th International
, 2006
Cited by 3 (0 self)
Traditional code optimizers have produced significant performance improvements over the past forty years. While promising avenues of research still exist, traditional static and profiling techniques have reached the point of diminishing returns. The main problem is that these approaches have only a limited view of the program and have difficulty taking advantage of the actual run-time behavior of a program. We are addressing this problem through the development of a dynamic optimization system suited for aggressive optimization, using the full power of the most beneficial optimizations. We have designed our optimizer to operate using a software dynamic translation (SDT) execution system. Difficult challenges in this research include reducing SDT overhead and determining what optimizations to apply and where in the code to apply them. Another challenge is having the necessary tools to ensure the reliability of software that is dynamically optimized. In this paper, we describe our efforts in reducing overhead in SDT and efficient techniques for instrumenting the application code. We also describe our approach to determining which optimizations should be applied and where. We discuss other fundamental issues in developing a dynamic optimizer and finally present a basic debugger for SDT systems.
Self-Configuring Applications for Heterogeneous Systems: Program Composition Using Cognitive Techniques
- Proceedings of the IEEE
, 2008
Cited by 3 (0 self)
This paper describes several challenges facing programmers of future edge computing systems, the complex and diverse multi- and many-core devices that will soon exemplify commodity mainstream systems. To call attention to programming challenges ahead, this paper focuses on the most complex of such architectures: integrated, power-conserving systems, inherently parallel and heterogeneous, with distributed address spaces. When programming such complex systems, several new concerns arise, such as computation partitioning across functional units, data movement and synchronization, managing a diversity of programming models for different devices, and reusing existing legacy and library software. We observe that many of these challenges are also faced in programming applications for large-scale, heterogeneous distributed computing environments, and solutions used in practice as well as future research directions in distributed computing can be adapted to edge computing environments. Further, optimization decisions are inherently complex due to large search spaces of possible solutions and the difficulty of predicting performance on increasingly complex architectures. Cognitive techniques are well-suited for managing systems of such complexity. We discuss how recent trends of using cognitive techniques for code mapping and optimization support this point. We describe how cognitive techniques could provide a fundamentally new programming paradigm for complex heterogeneous systems, where programmers design self-configuring applications and the system automates optimization decisions and manages the allocation of heterogeneous resources to codes.
Index Terms: Optimizing compilers, Learning systems, Computer architectures, Distributed computing, Multi-core
Practical Run-time Adaptation with Procedure Cloning to Enable Continuous Collective Compilation
Cited by 2 (1 self)
Iterative feedback-directed optimization is now a popular technique to obtain better performance and code size improvements for statically compiled programs over the default settings in a compiler. The offline evaluation of multiple optimization strategies for a given program is a potentially costly operation. The number of iterations typically grows with the complexity of the program transformation search space, and with the number of input datasets used for performance assessment. In addition, as the behavior of a program can vary considerably across different datasets, it is often preferable to generate different optimization versions, covering the full spectrum of the program’s representative datasets. Continuous and collective optimization are targeted at these issues. Continuous optimization searches for the best program transformation at run-time, taking advantage of the phase behavior of programs to evaluate multiple optimization versions within a single run, and dynamically adapting to changing execution contexts. Collective optimization interleaves optimization iterations with program executions

