Results 1 - 10 of 38
A scalable auto-tuning framework for compiler optimization
- In Proceedings of the 23rd IEEE International Parallel and Distributed Processing Symposium (IPDPS)
"... We describe a scalable and general-purpose framework for auto-tuning compiler-generated code. We combine Active Harmony’s parallel search backend with the CHiLL compiler transformation framework to generate in parallel a set of alternative implementations of computation kernels and automatically sel ..."
Abstract
-
Cited by 49 (10 self)
- Add to MetaCart
(Show Context)
We describe a scalable and general-purpose framework for auto-tuning compiler-generated code. We combine Active Harmony’s parallel search backend with the CHiLL compiler transformation framework to generate in parallel a set of alternative implementations of computation kernels and automatically select the best-performing one. The resulting system achieves performance of compiler-generated code comparable to the fully automated version of the ATLAS library for the tested kernels. Performance for various kernels is 1.4 to 3.6 times faster than the native Intel compiler without search. Our search algorithm simultaneously evaluates different combinations of compiler optimizations and converges to solutions in only a few tens of search steps.
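To make the search idea concrete, here is a minimal Python sketch of a batch-parallel parameter search; the candidate space, the random-batch strategy, and time_variant() are illustrative stand-ins, not Active Harmony's actual search algorithm or CHiLL's code generator.

    import itertools
    import random

    def time_variant(params):
        # Placeholder cost function: in the real system each candidate would
        # be compiled (e.g., by a transformation framework) and timed on the
        # target machine.
        tile, unroll = params
        return abs(tile - 64) / 64.0 + abs(unroll - 4) / 4.0 + random.random() * 0.01

    # Hypothetical tuning space: tile sizes crossed with unroll factors.
    SPACE = list(itertools.product([16, 32, 64, 128, 256], [1, 2, 4, 8]))

    def search(steps=20, batch=4):
        best = min(random.sample(SPACE, batch), key=time_variant)
        for _ in range(steps):
            # Each step evaluates a batch of variants "in parallel" and keeps
            # the fastest configuration seen so far.
            best = min(random.sample(SPACE, batch) + [best], key=time_variant)
        return best

    print("best (tile, unroll):", search())

Each batch maps naturally onto parallel evaluation nodes, which is what allows a search of this shape to converge in a few tens of steps.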
The Polyhedral Model Is More Widely Applicable Than You Think
"... Abstract. The polyhedral model is a powerful framework for automatic optimization and parallelization. It is based on an algebraic representation of programs, allowing to construct and search for complex sequences of optimizations. This model is now mature and reaches production compilers. The main ..."
Abstract
-
Cited by 33 (9 self)
- Add to MetaCart
(Show Context)
The polyhedral model is a powerful framework for automatic optimization and parallelization. It is based on an algebraic representation of programs that makes it possible to construct and search for complex sequences of optimizations. This model is now mature and is reaching production compilers. The main limitation of the polyhedral model is known to be its restriction to statically predictable, loop-based program parts. This paper removes this limitation, allowing the model to operate on general data-dependent control flow. We embed control and exit predicates as first-class citizens of the algebraic representation, from program analysis to code generation. Complementing previous (partial) attempts in this direction, our work concentrates on extending the code generation step and does not compromise the expressiveness of the model. We present experimental evidence that our extension is relevant for program optimization and parallelization, showing performance improvements on benchmarks that were thought to be out of reach of the polyhedral model.
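As a conceptual illustration of predication (not the paper's algorithm), the sketch below turns a data-dependent early exit into an explicit exit predicate, so the loop keeps a statically known iteration space while preserving semantics:

    def original(a):
        total = 0
        for i in range(len(a)):
            if a[i] < 0:        # data-dependent early exit
                break
            total += a[i]
        return total

    def predicated(a):
        total = 0
        alive = True            # exit predicate, now a first-class value
        for i in range(len(a)): # full, statically known iteration space
            alive = alive and a[i] >= 0
            if alive:           # guarded execution instead of control flow
                total += a[i]
        return total

    assert original([3, 1, -2, 5]) == predicated([3, 1, -2, 5]) == 4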
Combined Iterative and Model-driven Optimization in an Automatic Parallelization Framework
"... Abstract—Today’s multi-core era places significant demands on an optimizing compiler, which must parallelize programs, exploit memory hierarchy, and leverage the ever-increasing SIMD capabilities of modern processors. Existing model-based heuristics for performance optimization used in compilers are ..."
Abstract
-
Cited by 26 (5 self)
- Add to MetaCart
(Show Context)
Today’s multi-core era places significant demands on an optimizing compiler, which must parallelize programs, exploit memory hierarchy, and leverage the ever-increasing SIMD capabilities of modern processors. Existing model-based heuristics for performance optimization used in compilers are limited in their ability to identify profitable parallelism/locality trade-offs and usually lead to sub-optimal performance. To address this problem, we distinguish optimizations for which effective model-based heuristics and profitability estimates exist from optimizations that require empirical search to achieve good performance in a portable fashion. We have developed a completely automatic framework in which we focus the empirical search on the set of valid possibilities for fusion/code motion, and rely on model-based mechanisms to perform tiling, vectorization and parallelization on the transformed program. We demonstrate the effectiveness of this approach in terms of strong performance improvements on a single target as well as performance portability across different target architectures.
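A hedged sketch of that hybrid strategy: empirically enumerate the (small) space of fusion structures, while a fixed model chooses tile sizes for each fused group. The statement names, model_tile_size(), and the cost stand-in are illustrative assumptions, not the paper's framework:

    STATEMENTS = ["S1", "S2", "S3"]

    def partitions(items):
        # Enumerate ways of splitting the statement sequence into fused groups.
        if not items:
            yield []
            return
        for k in range(1, len(items) + 1):
            for tail in partitions(items[k:]):
                yield [items[:k]] + tail

    def model_tile_size(group):
        # Stand-in for a model-based heuristic (e.g., footprint vs. cache size).
        return 32 * len(group)

    def empirical_time(plan):
        # Placeholder for compiling and actually timing the transformed program.
        return sum(1.0 / model_tile_size(g) for g in plan) + 0.1 * len(plan)

    best = min(partitions(STATEMENTS), key=empirical_time)
    print("best fusion structure:", best)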
Loop transformations: convexity, pruning and optimization
- SIGPLAN Notices
, 2011
"... Abstract High-level loop transformations are a key instrument in mapping computational kernels to effectively exploit resources in modern processor architectures. However, determining appropriate compositions of loop transformations to achieve this remains a significantly challenging task; current ..."
Abstract
-
Cited by 22 (3 self)
- Add to MetaCart
(Show Context)
High-level loop transformations are a key instrument in mapping computational kernels to effectively exploit resources in modern processor architectures. However, determining appropriate compositions of loop transformations remains a significant challenge; current compilers may achieve significantly lower performance than hand-optimized programs. To address this fundamental challenge, we first present a convex characterization of all distinct, semantics-preserving, multidimensional affine transformations. We then bring together algebraic, algorithmic, and performance analysis results to design a tractable optimization algorithm over this highly expressive space. The framework has been implemented and validated experimentally on a representative set of benchmarks run on state-of-the-art multi-core platforms.
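To see in miniature what a convex space of legal schedules looks like, consider a toy one-dimensional affine schedule theta(i, j) = a*i + b*j that must advance every dependence by at least one step; the dependences below are made up for the example and are far simpler than the paper's multidimensional characterization:

    # Dependences as (source iteration, target iteration) pairs.
    deps = [((0, 0), (1, 0)),   # S(i, j) -> S(i+1, j)
            ((0, 0), (0, 1))]   # S(i, j) -> S(i, j+1)

    def legal(a, b):
        # Legality: theta(target) - theta(source) >= 1 for every dependence.
        return all(a * (t[0] - s[0]) + b * (t[1] - s[1]) >= 1 for s, t in deps)

    # Enumerate a small box of coefficients; the legal ones are exactly the
    # integer points of a convex polyhedron.
    valid = [(a, b) for a in range(-2, 3) for b in range(-2, 3) if legal(a, b)]
    print(valid)   # e.g. (1, 1) is legal, (-1, 0) is not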
Predictive Modeling in a Polyhedral Optimization Space
- In International Symposium on Code Generation and Optimization (CGO'11)
, 2011
"... Significant advances in compiler optimization have been made in recent years, enabling many transformations such as tiling, fusion, parallelization and vectorization on imperfectly nested loops. Nevertheless, the problem of finding the best combination of loop transformations remains a major challen ..."
Abstract
-
Cited by 18 (3 self)
- Add to MetaCart
(Show Context)
Significant advances in compiler optimization have been made in recent years, enabling many transformations such as tiling, fusion, parallelization and vectorization on imperfectly nested loops. Nevertheless, the problem of finding the best combination of loop transformations remains a major challenge. Polyhedral models for compiler optimization have demonstrated strong potential for enhancing program performance, in particular for compute-intensive applications. But existing static cost models to optimize polyhedral transformations have significant limitations, and iterative compilation has become a very promising alternative to these models for finding the most effective transformations. Since the number of polyhedral optimization alternatives can be enormous, however, it is often impractical to iterate over a significant fraction of the entire space of polyhedrally transformed variants. Recent research has focused on iterating over this search space either with manually constructed heuristics or with automatic but very expensive search algorithms (e.g., genetic algorithms) that can eventually find good points in the polyhedral space. In this paper, we propose the use of machine learning to address the problem of selecting the best polyhedral optimizations. We show that these models can quickly find high-performance program variants in the polyhedral space, without resorting to extensive empirical search. We introduce models that take as input a characterization of a program based on its dynamic behavior, and predict the performance of aggressive high-level polyhedral transformations that include tiling, parallelization and vectorization. We allow for a minimal empirical search on the target machine, discovering on average 83% of the search-space-optimal combinations in at most 5 runs. Our end-to-end framework is validated using numerous benchmarks on two multi-core platforms.
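The following sketch conveys the flavor of the approach with a hand-rolled nearest-neighbor model: characterize a program by a dynamic-feature vector, find its nearest neighbors among previously tuned programs, and empirically re-test only the transformation sequences that were best for those neighbors. The feature vectors, transformation labels, and the choice of k-NN are fabricated stand-ins for the paper's models and data:

    import math

    # (feature vector from dynamic behavior, best-known transformation ids)
    TRAINING = [
        ((0.9, 0.1, 0.4), ["tile32+vec", "tile32+par"]),
        ((0.2, 0.8, 0.5), ["fuse+par",   "tile64+par"]),
        ((0.5, 0.5, 0.9), ["tile16+vec", "fuse+vec"]),
    ]

    def candidates(features, k=2, runs=5):
        # Rank training programs by feature distance, then pool the variants
        # that worked best for the k nearest ones.
        nearest = sorted(TRAINING, key=lambda t: math.dist(t[0], features))[:k]
        picks = [seq for _, seqs in nearest for seq in seqs]
        return picks[:runs]   # minimal empirical search: time only these

    print(candidates((0.85, 0.2, 0.45)))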
Loop Transformation Recipes for Code Generation and Auto-Tuning
"... Abstract. In this paper, we describe transformation recipes, which provide a high-level interface to the code transformation and code generation capability of a compiler. These recipes can be generated by compiler decision algorithms or savvy software developers. This interface is part of an auto-tu ..."
Abstract
-
Cited by 18 (2 self)
- Add to MetaCart
In this paper, we describe transformation recipes, which provide a high-level interface to the code transformation and code generation capability of a compiler. These recipes can be generated by compiler decision algorithms or savvy software developers. This interface is part of an auto-tuning framework that explores a set of different implementations of the same computation and automatically selects the best-performing implementation. Along with the original computation, a transformation recipe specifies a range of implementations of the computation resulting from composing a set of high-level code transformations. In our system, an underlying polyhedral framework coupled with transformation algorithms takes this set of transformations, composes them and automatically generates correct code. We first describe an abstract interface for transformation recipes, which we propose to facilitate interoperability with other transformation frameworks. We then focus on the specific transformation recipe interface used in our compiler and present performance results on its application to kernel and library tuning and to tuning of key computations in high-end applications. We also show how this framework can be used to generate and auto-tune parallel OpenMP or CUDA code from a high-level specification.
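As a rough picture of what such a recipe could look like as data, here is a sketch loosely modeled on CHiLL-style scripts; the transform names, parameter spelling, and the instantiate() helper are assumptions for illustration, not the system's real interface:

    recipe = {
        "computation": "matmul",          # kernel the recipe applies to
        "transforms": [
            ("permute", {"order": [0, 2, 1]}),
            ("tile",    {"loop": 0, "size": "TI"}),  # TI, TJ, U are free
            ("tile",    {"loop": 1, "size": "TJ"}),  # tuning parameters
            ("unroll",  {"loop": 2, "factor": "U"}),
        ],
        "parameters": {"TI": [16, 32, 64], "TJ": [16, 32, 64], "U": [1, 2, 4]},
    }

    def instantiate(recipe, binding):
        # Substitute concrete values for the free parameters, yielding one
        # member of the implementation space the auto-tuner explores.
        return [(name, {k: binding.get(v, v) if isinstance(v, str) else v
                        for k, v in args.items()})
                for name, args in recipe["transforms"]]

    print(instantiate(recipe, {"TI": 32, "TJ": 64, "U": 2}))

The unbound parameters are what connect the recipe to the auto-tuner: each binding defines one point in the search space, and the polyhedral backend is responsible for generating correct code for it.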
Graphite two years after: First lessons learned from real-world polyhedral compilation
- In GCC Research Opportunities Workshop (GROW’10)
, 2010
"... Abstract. Modern compilers are responsible for adapting the semantics of source programs into a form that makes efficient use of a highly complex, hetero-geneous machine. This adaptation amounts to solve an optimization problem in a huge and unstructured search space, while predicting the performanc ..."
Abstract
-
Cited by 12 (6 self)
- Add to MetaCart
Modern compilers are responsible for adapting the semantics of source programs into a form that makes efficient use of a highly complex, heterogeneous machine. This adaptation amounts to solving an optimization problem in a huge and unstructured search space, while predicting the performance outcome of complex sequences of program transformations. The polyhedral model of compilation is aimed at these challenges. Its geometrical, non-inductive semantics enables the construction of better-structured optimization problems and precise analytical models. Recent work demonstrated the scalability of the main polyhedral algorithms to real-world programs. Its integration into production compilers is under way, pioneered by the graphite branch of the GNU Compiler Collection (GCC). Two years after the effective beginning of the project, this paper reports on original questions and innovative solutions that arose during the design and implementation of graphite.
Polyhedral extraction tool
- In Second International Workshop on Polyhedral Compilation Techniques (IMPACT’12)
, 2012
"... We present a new library for extracting a polyhedral model from C source. The library is based on clang, the LLVM C frontend, and isl, a library for manipulating quasi-affine sets and relations. The use of clang for parsing the C code brings advanced diagnostics and full support for C99. The use of ..."
Abstract
-
Cited by 8 (4 self)
- Add to MetaCart
(Show Context)
We present a new library for extracting a polyhedral model from C source. The library is based on clang, the LLVM C frontend, and isl, a library for manipulating quasi-affine sets and relations. The use of clang for parsing the C code brings advanced diagnostics and full support for C99. The use of isl allows for easy construction and a powerful and compact representation of the polyhedral model. Besides allowing arbitrary piecewise quasi-affine index expressions and conditions, the library also supports some data-dependent constructs and has special treatment for unsigned integers. The library has been successfully used to obtain polyhedral models for use in an equivalence checker, a tool for constructing polyhedral process networks, a parallelizer targeting GPUs and an interactive polyhedral environment.
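For a flavor of the representation such a tool produces, the sketch below builds an iteration domain and an access relation with islpy (the Python bindings for isl); the C kernel and the particular sets are an assumed example, not the tool's actual output format:

    import islpy as isl

    # for (i = 0; i < n; i++)
    #   for (j = 0; j <= i; j++)
    #     A[i][j] = A[i][j] + 1;    /* statement S */
    domain = isl.Set("[n] -> { S[i, j] : 0 <= i < n and 0 <= j <= i }")
    access = isl.Map("[n] -> { S[i, j] -> A[i, j] }")

    # Quasi-affine sets and relations compose cleanly: restricting the access
    # relation to the iteration domain shows which array elements are touched.
    touched = access.intersect_domain(domain).range()
    print(touched)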
Autotuning GEMM kernels for the Fermi GPU
- IEEE Transactions on Parallel and Distributed Systems
"... Abstract—In recent years, the use of graphics chips has been recognized as a viable way of accelerating scientific and engineering applications, even more so since the introduction of the Fermi architecture by NVIDIA, with features essential to numerical computing, such as fast double precision arit ..."
Abstract
-
Cited by 8 (0 self)
- Add to MetaCart
(Show Context)
In recent years, the use of graphics chips has been recognized as a viable way of accelerating scientific and engineering applications, even more so since the introduction of the Fermi architecture by NVIDIA, with features essential to numerical computing, such as fast double precision arithmetic and memory protected with error correction codes. As a crucial component of numerical software packages such as LAPACK and ScaLAPACK, the general dense matrix multiplication routine is one of the most important workloads to be implemented on these devices. This paper presents a methodology for producing matrix multiplication kernels tuned for a specific architecture through a canonical process of heuristic autotuning, based on generation of multiple code variants and selection of the fastest ones through benchmarking. The key contribution of this work is its method for generating the search space; specifically, for pruning it to a manageable size. Performance numbers match or exceed other available implementations.
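A hedged sketch of the variant-generation-plus-pruning step: the Fermi resource limits below are real device parameters, but the thread-mapping and shared-memory formulas are simplified assumptions, not the paper's exact constraints:

    import itertools

    SHARED_MEM_BYTES = 48 * 1024   # per SM on Fermi
    MAX_THREADS      = 1024        # per thread block

    def variants():
        for tm, tn, tk in itertools.product([16, 32, 64, 96, 128], repeat=3):
            threads = (tm // 4) * (tn // 4)      # assumed 4x4 work per thread
            smem = 2 * (tm * tk + tk * tn) * 4   # double-buffered float tiles
            if threads > MAX_THREADS:
                continue                         # prune: launch config invalid
            if smem > SHARED_MEM_BYTES:
                continue                         # prune: tiles don't fit
            yield (tm, tn, tk)

    space = list(variants())
    print(len(space), "variants survive pruning; benchmark these, keep fastest")

Pruning by hardware constraints before any benchmarking is what keeps the empirical phase tractable: only configurations that can actually launch and fit in shared memory are ever compiled and timed.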
When Polyhedral Transformations Meet SIMD Code Generation
"... Data locality and parallelism are critical optimization objectives for performance on modern multi-core machines. Both coarse-grain parallelism (e.g., multi-core) and fine-grain parallelism (e.g., vector SIMD) must be effectively exploited, but despite decades of progress at both ends, current compi ..."
Abstract
-
Cited by 7 (2 self)
- Add to MetaCart
(Show Context)
Data locality and parallelism are critical optimization objectives for performance on modern multi-core machines. Both coarse-grain parallelism (e.g., multi-core) and fine-grain parallelism (e.g., vector SIMD) must be effectively exploited, but despite decades of progress at both ends, current compiler optimization schemes that attempt to address data locality and both kinds of parallelism often fail at one of the three objectives. We address this problem by proposing a 3-step framework that aims for integrated data locality, multi-core parallelism and SIMD execution of programs. We define the concept of vectorizable codelets, with properties tailored to achieve effective SIMD code generation for the codelets. We leverage the power of a modern high-level transformation framework to restructure a program to expose good ISA-independent vectorizable codelets, exploiting multi-dimensional data reuse.
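As a conceptual sketch only: a "vectorizable codelet" is taken here to be an innermost loop with unit-stride, dependence-free accesses, and NumPy's whole-array form stands in for the SIMD code a compiler backend would emit:

    import numpy as np

    def scalar_codelet(a, b, c):
        n = len(c)
        for i in range(n):          # unit stride, no loop-carried dependence:
            c[i] = a[i] * b[i]      # a codelet a backend can map to SIMD
        return c

    def simd_codelet(a, b, c):
        np.multiply(a, b, out=c)    # whole-loop vector operation
        return c

    a, b = np.arange(8.0), np.arange(8.0)
    assert np.array_equal(scalar_codelet(a, b, np.empty(8)),
                          simd_codelet(a, b, np.empty(8)))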