Results 1 - 10
of
27
A Practical Method for Quickly Evaluating Program Optimizations
- In Proceedings of the International Conference on High Performance Embedded Architectures & Compilers (HiPEAC 2005
, 2005
"... This article aims at making iterative optimization practical and usable by speeding up the evaluation of a large range of optimizations. Instead of using a full run to evaluate a single program optimization, we take advantage of periods of stable performance, called phases. For that purpose, we prop ..."
Abstract
-
Cited by 25 (9 self)
- Add to MetaCart
This article aims at making iterative optimization practical and usable by speeding up the evaluation of a large range of optimizations. Instead of using a full run to evaluate a single program optimization, we take advantage of periods of stable performance, called phases. For that purpose, we propose a low-overhead phase detection scheme geared toward fast optimization space pruning, using code instrumentation and versioning implemented in a production compiler. Our approach is driven by simplicity and practicality. We show that a simple phase detection scheme can be sufficient for optimization space pruning. We also show it is possible to search for complex optimizations at run-time without resorting to sophisticated dynamic compilation frameworks. Beyond iterative optimization, our approach also enables one to quickly design selftuned applications.
Online performance auditing: using hot optimizations without getting burned
- In Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation
, 2006
"... As hardware complexity increases and virtualization is added at more layers of the execution stack, predicting the performance impact of optimizations becomes increasingly difficult. Production compilers and virtual machines invest substantial development effort in performance tuning to achieve good ..."
Abstract
-
Cited by 24 (2 self)
- Add to MetaCart
As hardware complexity increases and virtualization is added at more layers of the execution stack, predicting the performance impact of optimizations becomes increasingly difficult. Production compilers and virtual machines invest substantial development effort in performance tuning to achieve good performance for a range of benchmarks. Although optimizations typically perform well on average, they often have unpredictable impact on running time, sometimes degrading performance significantly. Today’s VMs perform sophisticated feedback-directed optimizations, but these techniques do not address performance degradations, and they actually make the situation worse by making the system more unpredictable. This paper presents an online framework for evaluating the effectiveness of optimizations, enabling an online system to automatically identify and correct performance anomalies that occur at runtime. This work opens the door for a fundamental shift in the way optimizations are developed and tuned for online systems, and may allow the body of work in offline empirical optimization search to be applied automatically at runtime. We present our implementation and evaluation of this system in a product Java VM.
Rapidly selecting good compiler optimizations using performance counters
- In Proceedings of the 5th Annual International Symposium on Code Generation and Optimization (CGO
, 2007
"... Applying the right compiler optimizations to a particular program can have a significant impact on program performance. Due to the non-linear interaction of compiler optimizations, however, determining the best setting is nontrivial. There have been several proposed techniques that search the space ..."
Abstract
-
Cited by 22 (10 self)
- Add to MetaCart
Applying the right compiler optimizations to a particular program can have a significant impact on program performance. Due to the non-linear interaction of compiler optimizations, however, determining the best setting is nontrivial. There have been several proposed techniques that search the space of compiler options to find good solutions; however such approaches can be expensive. This paper proposes a different approach using performance counters as a means of determining good compiler optimization settings. This is achieved by learning a model off-line which can then be used to determine good settings for any new program. We show that such an approach outperforms the state-ofthe-art and is two orders of magnitude faster on average. Furthermore, we show that our performance counter based approach outperforms techniques based on static code features. Finally, we show that such improvements are stable across varying input data sets. Using our technique we achieve a 10 % improvement over the highest optimization setting of the commercial PathScale EKOPath 2.3.1 optimizing compiler on the SPEC benchmark suite on a recent AMD Athlon 64 3700+ platform in just three evaluations. 1
Method-specific dynamic compilation using logistic regression
- of ACM SIGPLAN Conferences on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA'06
, 2006
"... Abstract Determining the best set of optimizations to apply to a programhas been a long standing problem for compiler writers. To reduce ..."
Abstract
-
Cited by 18 (2 self)
- Add to MetaCart
Abstract Determining the best set of optimizations to apply to a programhas been a long standing problem for compiler writers. To reduce
Automatic performance model construction for the fast software exploration of new hardware designs
- In ACM International Conference on Compilers, Architecture and Synthesis for Embedded Systems
, 2006
"... ..."
Fast compiler optimisation evaluation using code-feature based performance prediction
- In CF
, 2007
"... Performance tuning is an important and time consuming task which may have to be repeated for each new application and platform. Although iterative optimisation can automate this process, it still requires many executions of different versions of the program. As execution time is frequently the limit ..."
Abstract
-
Cited by 10 (2 self)
- Add to MetaCart
Performance tuning is an important and time consuming task which may have to be repeated for each new application and platform. Although iterative optimisation can automate this process, it still requires many executions of different versions of the program. As execution time is frequently the limiting factor in the number of versions or transformed programs that can be considered, what is needed is a mechanism that can automatically predict the performance of a modified program without actually having to run it. This paper presents a new machine learning based technique to automatically predict the speedup of a modified program using a performance model based on the code features of the tuned programs. Unlike previous approaches it does not require any prior learning over a benchmark suite. Furthermore, it can be used to predict the performance of any tuning and is not restricted to a prior seen transformation space. We show that it can deliver predictions with a high correlation coefficient and can be used to dramatically reduce the cost of search.
Automatic feature generation for machine learning based optimizing compilation
- In Code Generation and Optimization (CGO
, 2009
"... Recent work has shown that machine learning can automate and in some cases outperform hand crafted compiler optimizations. Central to such an approach is that machine learning techniques typically rely upon summaries or features of the program. The quality of these features is critical to the accura ..."
Abstract
-
Cited by 10 (0 self)
- Add to MetaCart
Recent work has shown that machine learning can automate and in some cases outperform hand crafted compiler optimizations. Central to such an approach is that machine learning techniques typically rely upon summaries or features of the program. The quality of these features is critical to the accuracy of the resulting machine learned algorithm; no machine learning method will work well with poorly chosen features. However, due to the size and complexity of programs, theoretically there are an infinite number of potential features to choose from. The compiler writer now has to expend effort in choosing the best features from this space. This paper develops a novel mechanism to automatically find those features which most improve the quality of the machine learned heuristic. The feature space is described by a grammar and is then searched with genetic programming and predictive modeling. We apply this technique to loop unrolling in GCC 4.3.1 and evaluate our approach on a Pentium 6. On a benchmark suite of 57 programs, GCC’s hard-coded heuristic achieves only 3 % of the maximum performance available, while a state of the art machine learning approach with hand-coded features obtains 59%. Our feature generation technique is able to achieve 76% of the maximum available speedup, outperforming existing approaches. I.
Building a practical iterative interactive compiler
- In 1st Workshop on Statistical and Machine Learning Approaches Applied to Architectures and Compilation (SMART’07), colocated with HiPEAC 2007 conference
, 2007
"... Abstract. Current compilers fail to deliver satisfactory levels of performance on modern processors, due to rapidly evolving hardware, fixed and black-box optimization heuristics, simplistic hardware models, inability to fine-tune the application of transformations, and highly dynamic behavior of th ..."
Abstract
-
Cited by 9 (4 self)
- Add to MetaCart
Abstract. Current compilers fail to deliver satisfactory levels of performance on modern processors, due to rapidly evolving hardware, fixed and black-box optimization heuristics, simplistic hardware models, inability to fine-tune the application of transformations, and highly dynamic behavior of the system. This analysis suggests to revisit the structure and interactions of optimizing compilers. Building on the empirical knowledge accumulated from previous iterative optimization prototypes, we propose to open the compiler, exposing its control and decision mechanisms to external optimization heuristics. We suggest a simple, practical, and non-intrusive way to modify current compilers, allowing an external tool to access and modify all compiler optimization decisions. To avoid the pitfall of revealing all the compiler intermediate representation and libraries to a point where it would rigidify the whole internals and stiffen further evolution, we choose to control the decision process itself, granting access to the only high-level features needed to effectively take a decision. This restriction is compatible with our fine-tuning and fine-grained interaction, and allows to tune
Microarchitecture sensitive empirical models for compiler optimizations
- In CGO
, 2007
"... This paper proposes the use of empirical modeling techniques for building microarchitecture sensitive models for compiler optimizations. The models we build relate program performance to settings of compiler optimization flags, associated heuristics and key microarchitectural parameters. Unlike trad ..."
Abstract
-
Cited by 8 (0 self)
- Add to MetaCart
This paper proposes the use of empirical modeling techniques for building microarchitecture sensitive models for compiler optimizations. The models we build relate program performance to settings of compiler optimization flags, associated heuristics and key microarchitectural parameters. Unlike traditional analytical modeling methods, this relationship is learned entirely from data obtained by measuring performance at a small number of carefully selected compiler/microarchitecture configurations. We evaluate three different learning techniques in this context viz. linear regression, adaptive regression splines and radial basis function networks. We use the generated models to a) predict program performance at arbitrary compiler/microarchitecture configurations, b) quantify the significance of complex interactions between optimizations and the microarchitecture, and c) efficiently search for ’optimal’ settings of optimization flags and heuristics for any given microarchitectural configuration. Our evaluation using benchmarks from the SPEC CPU2000 suits suggests that accurate models (< 5 % average error in prediction) can be generated using a reasonable number of simulations. We also find that using compiler settings prescribed by a model-based search can improve program performance by as much as 19 % (with an average of 9.5%) over highly optimized binaries. 1.
Quick and practical run-time evaluation of multiple program optimizations
- Transactions on High Performance Embedded Architectures and Compilation Techniques (HiPEAC
"... Abstract. This article aims at making iterative optimization practical and usable by speeding up the evaluation of a large range of optimizations. Instead of using a full run to evaluate a single program optimization, we take advantage of periods of stable performance, called phases. For that purpos ..."
Abstract
-
Cited by 7 (0 self)
- Add to MetaCart
Abstract. This article aims at making iterative optimization practical and usable by speeding up the evaluation of a large range of optimizations. Instead of using a full run to evaluate a single program optimization, we take advantage of periods of stable performance, called phases. For that purpose, we propose a low-overhead phase detection scheme geared toward fast optimization space pruning, using code instrumentation and versioning implemented in a production compiler. Our approach is driven by simplicity and practicality. We show that a simple phase detection scheme can be sufficient for optimization space pruning. We also show it is possible to search for complex optimizations at run-time without resorting to sophisticated dynamic compilation frameworks. Beyond iterative optimization, our approach also enables one to quickly design selftuned applications. Considering 5 representative SpecFP2000 benchmarks, our approach speeds up iterative search for the best program optimizations by a factor of 32 to 962. Phase prediction is 99.4% accurate on average, with an overhead of only 2.6%. The resulting self-tuned implementations bring an average speed-up of 1.4. 1

