Results 1 - 10 of 66
COLE: Compiler Optimization Level Exploration
"... Modern compilers implement a large number of optimizations which all interact in complex ways, and which all have a different impact on code quality, compilation time, code size, energy consumption, etc. For this reason, compilers typically provide a limited number of standard optimization levels, s ..."
Abstract
-
Cited by 40 (2 self)
- Add to MetaCart
(Show Context)
Modern compilers implement a large number of optimizations which all interact in complex ways, and which all have a different impact on code quality, compilation time, code size, energy consumption, etc. For this reason, compilers typically provide a limited number of standard optimization levels, such as -O1, -O2, -O3 and -Os, that combine various optimizations providing a number of trade-offs between multiple objective functions (such as code quality, compilation time and code size). The construction of these optimization levels, i.e., choosing which optimizations to activate at each level, is a manual process typically done using high-level heuristics based on the compiler developer’s experience. This paper proposes COLE, Compiler Optimization Level Exploration, a framework for automatically finding Pareto-optimal optimization levels through multi-objective evolutionary search. Our experimental results using GCC and the SPEC CPU benchmarks show that the automatic construction of optimization levels is feasible in practice, and in addition, yields better optimization levels than GCC’s manually derived (-Os, -O1, -O2 and -O3) optimization levels, as well as the optimization levels obtained through random sampling. We also demonstrate that COLE can be used to gain insight into the effectiveness of compiler optimizations as well as to better understand a benchmark’s sensitivity to compiler optimizations.
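As a rough editorial illustration of the kind of search the abstract describes, the sketch below runs a tiny multi-objective evolutionary loop over GCC flag sets and keeps the Pareto-optimal ones; the flag list, the random stand-in for measurement, and all search parameters are assumptions for illustration, not COLE's actual setup.

# Hypothetical sketch of a multi-objective evolutionary search over GCC flag sets.
import random

FLAGS = ["-funroll-loops", "-ftree-vectorize", "-finline-functions",
         "-fomit-frame-pointer", "-fgcse", "-fpeel-loops"]

def measure(flag_set):
    """Placeholder for compiling a benchmark with the given flags and
    returning (execution time, code size); replace with real measurements."""
    rng = random.Random(hash(flag_set))
    return rng.uniform(1.0, 2.0), rng.uniform(10.0, 20.0)

def dominates(a, b):
    """a is at least as good as b in every objective and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(population):
    scored = [(cand, measure(cand)) for cand in population]
    return [c for c, s in scored
            if not any(dominates(t, s) for _, t in scored if t != s)]

def mutate(flag_set):
    toggled = set(flag_set)
    toggled.symmetric_difference_update({random.choice(FLAGS)})  # flip one optimization
    return frozenset(toggled)

population = [frozenset(random.sample(FLAGS, 3)) for _ in range(20)]
for _ in range(30):
    parents = pareto_front(population)                 # keep non-dominated levels
    children = [mutate(random.choice(parents)) for _ in range(20)]
    population = list(set(parents) | set(children))

for level in pareto_front(population):
    print(sorted(level), measure(level))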
MILEPOST GCC: machine learning based research compiler
- In GCC, 2008
"... Tuning hardwired compiler optimizations for rapidly evolving hardware makes porting an optimizing compiler for each new platform extremely challenging. Our radical approach is to develop a modular, extensible, self-optimizing compiler that automatically learns the best optimization heuristics based ..."
Abstract
-
Cited by 34 (12 self)
- Add to MetaCart
Tuning hardwired compiler optimizations for rapidly evolving hardware makes porting an optimizing compiler for each new platform extremely challenging. Our radical approach is to develop a modular, extensible, self-optimizing compiler that automatically learns the best optimization heuristics based on the behavior of the platform. In this paper we describe MILEPOST GCC, a machine-learning-based compiler that automatically adjusts its optimization heuristics to improve the execution time, code size, or compilation time of specific programs on different architectures. Our preliminary experimental results show that it is possible to considerably reduce the execution time of the MiBench benchmark suite on a range of platforms entirely automatically.
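To make the idea concrete, here is a minimal sketch of learning a mapping from program features to optimization settings and reusing it for an unseen program; the feature names, the toy training set, and the nearest-neighbour predictor are illustrative assumptions rather than MILEPOST's actual features or learner.

# Toy sketch: reuse the flags of the most similar previously seen program.
import math

# training data: (static program features, flag combination that worked best) -- invented values
training = [
    ({"basic_blocks": 120, "loops": 14, "calls": 40}, "-O3 -funroll-loops"),
    ({"basic_blocks": 15,  "loops": 1,  "calls": 90}, "-Os"),
    ({"basic_blocks": 300, "loops": 45, "calls": 10}, "-O3 -ftree-vectorize"),
]

def distance(a, b):
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0) - b.get(k, 0)) ** 2 for k in keys))

def predict_flags(features):
    """Return the flags of the nearest previously seen program."""
    _, flags = min(training, key=lambda t: distance(t[0], features))
    return flags

print(predict_flags({"basic_blocks": 100, "loops": 12, "calls": 35}))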
Mapping parallelism to multi-cores: a machine learning based approach
- In Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP ’09), 2009
"... Abstract The efficient mapping of program parallelism to multi-core processors is highly dependent on the underlying architecture. This paper proposes a portable and automatic compiler-based approach to mapping such parallelism using machine learning. It develops two predictors: a data sensitive an ..."
Abstract
-
Cited by 30 (3 self)
- Add to MetaCart
The efficient mapping of program parallelism to multi-core processors is highly dependent on the underlying architecture. This paper proposes a portable and automatic compiler-based approach to mapping such parallelism using machine learning. It develops two predictors, a data-sensitive and a data-insensitive predictor, to select the best mapping for parallel programs. They predict the number of threads and the scheduling policy for any given program using a model learnt off-line. By using low-cost profiling runs, they predict the mapping for a new, unseen program across multiple input data sets. We evaluate our approach by selecting parallelism mapping configurations for OpenMP programs on two representative but different multi-core platforms (the Intel Xeon and the Cell processors). Performance of our technique is stable across programs and architectures. On average, it delivers above 96% of the maximum available performance on both platforms. It achieves, on average, a 37% (up to 17.5 times) performance improvement over the OpenMP runtime default scheme on the Cell platform. Compared to two recent prediction models, our predictors achieve better performance with a significantly lower profiling cost.
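A minimal sketch of the two-predictor structure mentioned above: a data-insensitive predictor that uses only code features, and a data-sensitive one that refines it with cheap profiling of the actual input. The features, thresholds and labels are invented for illustration; the paper learns these decisions off-line.

# Illustrative two-stage predictor for (thread count, OpenMP schedule); all rules are invented.
def data_insensitive_predict(code_features):
    """Pick (thread count, schedule) from code features alone."""
    if code_features["loop_body_instructions"] < 50:
        return 4, "static"          # small bodies: low overhead, even split
    return 8, "dynamic"             # large bodies: balance load dynamically

def data_sensitive_predict(code_features, profile_features):
    """Refine the mapping using a cheap profiling run on the actual data set."""
    threads, schedule = data_insensitive_predict(code_features)
    if profile_features["iterations"] < 1000:
        threads = min(threads, 2)   # too little work to feed many cores
    if profile_features["iteration_time_cv"] > 0.5:
        schedule = "dynamic"        # irregular iterations: rebalance at run time
    return threads, schedule

print(data_sensitive_predict({"loop_body_instructions": 120},
                             {"iterations": 500, "iteration_time_cv": 0.7}))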
Collective optimization
- In Proceedings of the International Conference on High Performance Embedded Architectures & Compilers (HiPEAC), 2009
"... Abstract. Iterative compilation is an efficient approach to optimize programs on rapidly evolving hardware, but it is still only scarcely used in practice due to a necessity to gather a large number of runs often with the same data set and on the same environment in order to test many different opti ..."
Abstract
-
Cited by 25 (12 self)
- Add to MetaCart
(Show Context)
Iterative compilation is an efficient approach to optimizing programs on rapidly evolving hardware, but it is still only scarcely used in practice due to the need to gather a large number of runs, often with the same data set and in the same environment, in order to test many different optimizations and to select the most appropriate ones. Naturally, in many cases, users cannot afford a training phase, will run each data set once, develop new programs which are not yet known, and may regularly change the environment the programs are run on. In this article, we propose to overcome this practical obstacle using Collective Optimization, where the task of optimizing a program leverages the experience of many other users, rather than being performed in isolation, and often redundantly, by each user. Collective optimization is an unobtrusive approach, where performance information obtained after each run is sent back to a central database, which is then queried for optimization suggestions, and the program is then recompiled accordingly. We show that it is possible to learn across data sets, programs and architectures in non-dynamic environments using static function cloning and run-time adaptation, without even a reference run to compute speedups over the baseline optimization. We also show that it is possible to simultaneously learn and improve performance, since there are no longer two separate training and test phases, as in most studies. We demonstrate that extensively relying on competition among pairs of optimizations (program reaction to optimizations) provides a robust and efficient method for capturing the impact of optimizations, and for reusing this knowledge across data sets, programs and environments. We implemented our approach in GCC and will publicly disseminate it in the near future.
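The loop the abstract describes can be pictured with the following sketch: each run's result is reported to a shared repository, and the next compilation asks it for the best-performing flag combination seen so far. The in-memory dictionary standing in for the central database, and its schema, are assumptions made for illustration.

# Minimal collective-optimization loop with a dict standing in for the central database.
from collections import defaultdict

repository = defaultdict(list)   # (program, flags) -> observed run times

def report_run(program, flags, run_time):
    repository[(program, flags)].append(run_time)

def suggest_flags(program, default="-O2"):
    candidates = {flags: sum(t) / len(t)
                  for (prog, flags), t in repository.items() if prog == program}
    return min(candidates, key=candidates.get) if candidates else default

# each user's run contributes one data point; there is no separate training phase
report_run("susan_corners", "-O2", 4.1)
report_run("susan_corners", "-O2 -funroll-loops", 3.6)
report_run("susan_corners", "-O3", 3.8)
print(suggest_flags("susan_corners"))    # -> "-O2 -funroll-loops"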
Predictive Modeling in a Polyhedral Optimization Space
- In International Symposium on Code Generation and Optimization (CGO ’11), 2011
"... Significant advances in compiler optimization have been made in recent years, enabling many transformations such as tiling, fusion, parallelization and vectorization on imperfectly nested loops. Nevertheless, the problem of finding the best combination of loop transformations remains a major challen ..."
Abstract
-
Cited by 18 (3 self)
- Add to MetaCart
(Show Context)
Significant advances in compiler optimization have been made in recent years, enabling many transformations such as tiling, fusion, parallelization and vectorization on imperfectly nested loops. Nevertheless, the problem of finding the best combination of loop transformations remains a major challenge. Polyhedral models for compiler optimization have demonstrated strong potential for enhancing program performance, in particular for compute-intensive applications. But existing static cost models to optimize polyhedral transformations have significant limitations, and iterative compilation has become a very promising alternative to these models for finding the most effective transformations. Since the number of polyhedral optimization alternatives can be enormous, however, it is often impractical to iterate over a significant fraction of the entire space of polyhedrally transformed variants. Recent research has focused on iterating over this search space either with manually constructed heuristics or with automatic but very expensive search algorithms (e.g., genetic algorithms) that can eventually find good points in the polyhedral space. In this paper, we propose the use of machine learning to address the problem of selecting the best polyhedral optimizations. We show that these models can quickly find high-performance program variants in the polyhedral space, without resorting to extensive empirical search. We introduce models that take as input a characterization of a program based on its dynamic behavior, and predict the performance of aggressive high-level polyhedral transformations that include tiling, parallelization and vectorization. We allow for a minimal empirical search on the target machine, discovering on average 83% of the search-space-optimal combinations in at most 5 runs. Our end-to-end framework is validated using numerous benchmarks on two multi-core platforms.
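The "predict, then empirically verify a handful" strategy can be sketched as below: a learned model scores candidate polyhedral variants from a dynamic characterization of the program, and only the top-ranked few are compiled and timed. The scoring function, its weights and the feature names are invented stand-ins for the trained model.

# Rank transformed variants by a stand-in performance model, keep only the top few.
def predicted_speedup(program_features, variant):
    """Placeholder for the learned model; weights are invented."""
    score = 1.0
    if variant["tile"] and program_features["cache_miss_rate"] > 0.05:
        score += 0.8
    if variant["parallel"]:
        score += 0.4 * program_features["cores"]
    if variant["vectorize"] and program_features["simd_friendly"]:
        score += 0.6
    return score

def select_candidates(program_features, variants, budget=5):
    ranked = sorted(variants,
                    key=lambda v: predicted_speedup(program_features, v),
                    reverse=True)
    return ranked[:budget]          # only these are actually compiled and timed

variants = [{"tile": t, "parallel": p, "vectorize": v}
            for t in (False, True) for p in (False, True) for v in (False, True)]
features = {"cache_miss_rate": 0.08, "cores": 4, "simd_friendly": True}
for v in select_candidates(features, variants):
    print(v)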
Evaluating iterative optimization across 1000 data sets
- In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2010
"... While iterative optimization has become a popular compiler optimization approach, it is based on a premise which has never been truly evaluated: that it is possible to learn the best compiler optimizations across data sets. Up to now, most iterative optimization studies find the best optimizations t ..."
Abstract
-
Cited by 15 (7 self)
- Add to MetaCart
(Show Context)
While iterative optimization has become a popular compiler optimization approach, it is based on a premise which has never been truly evaluated: that it is possible to learn the best compiler optimizations across data sets. Up to now, most iterative optimization studies find the best optimizations through repeated runs on the same data set. Only a handful of studies have attempted to exercise iterative optimization on a few tens of data sets. In this paper, we truly put iterative compilation to the test for the first time by evaluating its effectiveness across a large number of data sets. We therefore compose KDataSets, a data set suite with 1000 data sets for 32 programs, which we release to the public. We characterize the diversity of KDataSets, and subsequently use it to evaluate iterative optimization. We demonstrate that it is possible to derive a robust iterative optimization strategy across data sets: for all 32 programs, we find that there exists at least one combination of compiler optimizations that achieves 86% or more of the best possible speedup across all data sets using Intel’s ICC (83% for GNU’s GCC). This optimal combination is program-specific and yields speedups up to 1.71 on ICC and 2.23 on GCC over the highest optimization level (-fast and -O3, respectively). This finding makes the task of optimizing programs across data sets much easier than previously anticipated, and it paves the way for the practical and reliable usage of iterative optimization. Finally, we derive pre-shipping and post-shipping optimization strategies for software vendors.
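The cross-data-set selection can be illustrated with the following sketch, which picks the flag combination whose worst-case fraction of the best achievable speedup across data sets is highest; the speedup table and flag names are fabricated for illustration and are not KDataSets measurements.

# Pick the flag combination that is most robust across data sets (fabricated numbers).
speedups = {               # speedups[flags][data_set] over a common baseline
    "-O3 -funroll-loops":   {"ds1": 1.20, "ds2": 1.05, "ds3": 1.30},
    "-O3 -ftree-vectorize": {"ds1": 1.10, "ds2": 1.25, "ds3": 1.00},
    "-O3 -fipa-pta":        {"ds1": 1.15, "ds2": 1.15, "ds3": 1.20},
}

data_sets = ["ds1", "ds2", "ds3"]
best_per_ds = {ds: max(s[ds] for s in speedups.values()) for ds in data_sets}

def worst_case_fraction(flags):
    return min(speedups[flags][ds] / best_per_ds[ds] for ds in data_sets)

robust = max(speedups, key=worst_case_fraction)
print(robust, round(worst_case_fraction(robust), 2))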
Milepost GCC: machine learning enabled self-tuning compiler
2009
"... Contact: ..."
(Show Context)
Scenario based optimization: A framework for statically enabling online optimizations
- In Proceedings of the 2009 International Symposium on Code Generation and Optimization (CGO ’09)
"... Abstract—Online optimization allows the continuous restructuring and adaptation of an executing application using live information about its execution environment. The further advancement of performance monitoring hardware presents new opportunities for online optimization techniques. While managed ..."
Abstract
-
Cited by 12 (3 self)
- Add to MetaCart
(Show Context)
Online optimization allows the continuous restructuring and adaptation of an executing application using live information about its execution environment. The continuing advancement of performance monitoring hardware presents new opportunities for online optimization techniques. While managed runtime environments lend themselves nicely to online and dynamic optimizations, such techniques remain difficult to achieve successfully at the binary level. Dynamic binary optimizers introduce virtualization layers whose overhead is often prohibitive. Another challenge comes from the lack of source-level information that could significantly aid optimization. In this paper we present a new static/dynamic collaborative approach to online optimization that takes advantage of the strength of static optimization and the adaptive nature of dynamic optimization techniques. We call this hybrid optimization framework Scenario Based Optimization (SBO). Statically, we multiversion and specialize functions for different dynamic scenarios that can be identified by monitoring microarchitectural events. Using these events to infer the current scenario, we dynamically reroute execution to the relevant code tuned to that scenario. We have implemented our static SBO infrastructure in GCC 4.3.1 and designed our Dynamic Introspection Engine using the Perfmon2 infrastructure. To demonstrate the effectiveness of our Scenario Based Optimization framework, we have designed an SBO optimization we call Online Aiding and Abetting of Aggressive Optimizations (OAAAO). Using SPEC2006, this optimization shows a speedup of 7% to 10% for a number of benchmarks and, in our best case (h264ref), a 17% speedup over native execution compiled at the -O2 optimization level.
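A toy rendering of the multiversioning-plus-rerouting idea, transcribed into Python for brevity (the paper works at the GCC/binary level with Perfmon2): several specialized versions of a function exist ahead of time, and a monitor that watches a hardware event counter swaps the active version at run time. The function bodies and the random stand-in for reading a performance counter are assumptions for illustration.

# Toy scenario-based dispatch: a monitor reroutes calls to the version matching the scenario.
import random

def hot_loop_cache_friendly(data):      # version assumed to be tuned for high locality
    return sum(data)

def hot_loop_cache_hostile(data):       # illustrative alternative strategy for frequent misses
    return sum(sorted(data))

active_version = hot_loop_cache_friendly

def read_llc_miss_rate():
    """Stand-in for querying a performance-monitoring counter."""
    return random.random()

def introspection_tick():
    """Periodically infer the current scenario and reroute execution."""
    global active_version
    if read_llc_miss_rate() > 0.3:
        active_version = hot_loop_cache_hostile
    else:
        active_version = hot_loop_cache_friendly

for _ in range(5):
    introspection_tick()
    print(active_version.__name__, active_version(list(range(100))))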
Finding representative sets of optimizations for adaptive multiversioning applications
- In 3rd Workshop on Statistical and Machine Learning Approaches Applied to Architectures and Compilation (SMART’09), co-located with the HiPEAC’09 conference, 2009
"... Abstract. Iterative compilation is a widely adopted technique to optimize programs for different constraints such as performance, code size and power consumption in rapidly evolving hardware and software environments. However, in case of statically compiled programs, it is often restricted to optimi ..."
Abstract
-
Cited by 11 (6 self)
- Add to MetaCart
(Show Context)
Iterative compilation is a widely adopted technique to optimize programs for different constraints such as performance, code size and power consumption in rapidly evolving hardware and software environments. However, in the case of statically compiled programs, it is often restricted to optimizations for a specific data set and may not be applicable to applications that exhibit different run-time behavior across program phases, multiple data sets, or when executed in heterogeneous, reconfigurable and virtual environments. Several frameworks have recently been introduced to tackle these problems and enable run-time optimization and adaptation for statically compiled programs based on static function multiversioning and monitoring of online program behavior. In this article, we present a novel technique to select a minimal set of representative optimization variants (function versions) for such frameworks while avoiding performance loss across available data sets and code-size explosion. We developed a novel mapping mechanism using popular decision-tree or rule-induction based machine learning techniques to rapidly select the best code versions at run time based on data set features and to minimize selection overhead. These techniques enable the creation of self-tuning static binaries or libraries that adapt to changing behavior and environments at run time using staged compilation, that do not require complex recompilation frameworks, and that effectively outperform traditional single-version non-adaptable code.
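The version-selection problem described above can be sketched as a small greedy cover: keep the fewest versions such that every data set still has one within a tolerated slowdown of its best. The run-time table, version names and tolerance are invented for illustration, and the paper's decision-tree dispatch is reduced here to just the selection step.

# Greedy selection of a minimal representative set of function versions (invented numbers).
run_time = {                 # run_time[version][data_set]
    "v_base":   {"small": 1.0, "large": 1.0,  "sparse": 1.0},
    "v_unroll": {"small": 0.7, "large": 0.95, "sparse": 1.1},
    "v_vector": {"small": 0.9, "large": 0.6,  "sparse": 1.2},
    "v_sparse": {"small": 1.1, "large": 1.1,  "sparse": 0.5},
}
TOLERANCE = 1.05             # accept up to 5% loss versus the best version

data_sets = list(next(iter(run_time.values())))
best = {ds: min(v[ds] for v in run_time.values()) for ds in data_sets}

def covered(versions, ds):
    return any(run_time[v][ds] <= TOLERANCE * best[ds] for v in versions)

selected = []
while not all(covered(selected, ds) for ds in data_sets):
    # pick the version that newly covers the most data sets
    gain = lambda v: sum(1 for ds in data_sets
                         if not covered(selected, ds) and covered(selected + [v], ds))
    selected.append(max(run_time, key=gain))

print(selected)              # e.g. ['v_unroll', 'v_vector', 'v_sparse']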
Computer-aided design of high-performance algorithms
2008
"... High-performance algorithms play an important role in many areas of computer science and are core components of many software systems used in real-world applications. Traditionally, the creation of these algorithms requires considerable expertise and experience, often in combination with a substanti ..."
Abstract
-
Cited by 10 (7 self)
- Add to MetaCart
High-performance algorithms play an important role in many areas of computer science and are core components of many software systems used in real-world applications. Traditionally, the creation of these algorithms requires considerable expertise and experience, often in combination with a substantial amount of trial and error. Here, we outline a new approach to the process of designing high-performance algorithms that is based on the use of automated procedures for exploring potentially very large spaces of candidate designs. We contrast this computer-aided design approach with the traditional approach and discuss why it can be expected to yield better-performing, yet simpler algorithms. Finally, we sketch out the high-level design of a software environment that supports our new design approach. Existing work on algorithm portfolios, algorithm selection, algorithm configuration and parameter tuning, as well as general methods for discrete and continuous optimisation, fits naturally into our design approach and can be integrated into the proposed software environment.