CHiLL: A framework for composing high-level loop transformations (2008)

by C Chen, J Chame, M Hall

Results 1 - 10 of 38

A scalable auto-tuning framework for compiler optimization

by Ananta Tiwari, Chun Chen, Jacqueline Chame, Mary Hall, Jeffrey K. Hollingsworth - In Proceedings of the 23rd IEEE International Parallel and Distributed Processing Symposium (IPDPS)
"... We describe a scalable and general-purpose framework for auto-tuning compiler-generated code. We combine Active Harmony’s parallel search backend with the CHiLL compiler transformation framework to generate in parallel a set of alternative implementations of computation kernels and automatically sel ..."
Abstract - Cited by 49 (10 self) - Add to MetaCart
We describe a scalable and general-purpose framework for auto-tuning compiler-generated code. We combine Active Harmony’s parallel search backend with the CHiLL compiler transformation framework to generate, in parallel, a set of alternative implementations of computation kernels and automatically select the best-performing one. The resulting system achieves performance of compiler-generated code comparable to the fully automated version of the ATLAS library for the tested kernels. Performance for various kernels is 1.4 to 3.6 times faster than the native Intel compiler without search. Our search algorithm simultaneously evaluates different combinations of compiler optimizations and converges to solutions in only a few tens of search steps.

Citation Context

...odes rapidly during the search by adjusting parameter values, without costly compiler reanalysis. It also demands that the compiler have a clean interface to a separate parameter search engine. CHiLL [5, 6], a polyhedral loop transformation and code generation framework, provides such capability for composing high-level loop transformations with a script interface to describe the transformations and sea...
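
As an aside for readers unfamiliar with this style of autotuning, the sketch below shows a minimal empirical search over tile size and unroll factor. It is only an illustration of the idea; it does not use the Active Harmony or CHiLL APIs, and measure() is a synthetic stand-in for compiling a code variant and timing it on the target machine.

    import random

    # Discrete parameter space; the values are illustrative, not from the paper.
    TILES = [16, 32, 64, 128]
    UNROLLS = [1, 2, 4, 8]

    def measure(tile, unroll):
        # Stand-in for generating a variant (e.g. from a transformation script)
        # and benchmarking it: a synthetic cost surface with a little noise.
        return abs(tile - 64) / 64.0 + abs(unroll - 4) / 4.0 + random.uniform(0, 0.05)

    def hill_climb(steps=30):
        # Simple local search: evaluate neighbouring parameter settings and move
        # to the best one, stopping when no neighbour improves the current point.
        ti, ui = random.randrange(len(TILES)), random.randrange(len(UNROLLS))
        best = measure(TILES[ti], UNROLLS[ui])
        for _ in range(steps):
            neighbours = [(ti + di, ui + du)
                          for di in (-1, 0, 1) for du in (-1, 0, 1)
                          if 0 <= ti + di < len(TILES) and 0 <= ui + du < len(UNROLLS)]
            cost, i, j = min((measure(TILES[i], UNROLLS[j]), i, j) for i, j in neighbours)
            if cost >= best:
                break
            best, ti, ui = cost, i, j
        return TILES[ti], UNROLLS[ui], best

    if __name__ == "__main__":
        print(hill_climb())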

The Polyhedral Model Is More Widely Applicable Than You Think

by Mohamed-walid Benabderrahmane, Louis-noël Pouchet, Albert Cohen, Cédric Bastoul
"... Abstract. The polyhedral model is a powerful framework for automatic optimization and parallelization. It is based on an algebraic representation of programs, allowing to construct and search for complex sequences of optimizations. This model is now mature and reaches production compilers. The main ..."
Abstract - Cited by 33 (9 self) - Add to MetaCart
The polyhedral model is a powerful framework for automatic optimization and parallelization. It is based on an algebraic representation of programs that makes it possible to construct and search for complex sequences of optimizations. The model is now mature and is reaching production compilers. Its main known limitation is its restriction to statically predictable, loop-based program parts. This paper removes that limitation, allowing the model to operate on general data-dependent control flow. We embed control and exit predicates as first-class citizens of the algebraic representation, from program analysis to code generation. Complementing previous (partial) attempts in this direction, our work concentrates on extending the code generation step and does not compromise the expressiveness of the model. We present experimental evidence that our extension is relevant for program optimization and parallelization, showing performance improvements on benchmarks that were thought to be out of reach of the polyhedral model.

Citation Context

...ive use of the polyhedral model to compile for multicore architectures, including GCC 4.4 and IBM XL. Compilers based on the polyhedral model — including recent research tools like PoCC [29] or CHiLL [8] — target code parts that exactly fit the affine constraints of the model. Only loop nests with affine bounds and conditional expressions can be translated to a polyhedral representation. The reason b...
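
The key idea in this entry, embedding control and exit predicates so that data-dependent control flow fits an otherwise affine loop, can be illustrated outside the polyhedral formalism as follows. This is an invented toy example, not code from the paper.

    # Original form: the exit condition depends on the data, so the trip count
    # is not statically predictable and the loop is not affine.
    def first_negative(a):
        i = 0
        while i < len(a) and a[i] >= 0:
            i += 1
        return i

    # Predicated form: iterate over the full affine range 0..n-1 and guard the
    # body with an exit predicate, which is the spirit of making control and
    # exit predicates first-class in the representation.
    def first_negative_predicated(a):
        n = len(a)
        done = False
        result = n
        for i in range(n):        # statically predictable iteration domain
            if not done:          # exit predicate guarding the statement
                if a[i] < 0:
                    result = i
                    done = True
        return result

    assert first_negative([3, 1, -2, 5]) == first_negative_predicated([3, 1, -2, 5]) == 2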

Combined Iterative and Model-driven Optimization in an Automatic Parallelization Framework

by Louis-noël Pouchet, Uday Bondhugula, Cédric Bastoul, Albert Cohen, J. Ramanujam
"... Abstract—Today’s multi-core era places significant demands on an optimizing compiler, which must parallelize programs, exploit memory hierarchy, and leverage the ever-increasing SIMD capabilities of modern processors. Existing model-based heuristics for performance optimization used in compilers are ..."
Abstract - Cited by 26 (5 self) - Add to MetaCart
Today’s multi-core era places significant demands on an optimizing compiler, which must parallelize programs, exploit memory hierarchy, and leverage the ever-increasing SIMD capabilities of modern processors. Existing model-based heuristics for performance optimization used in compilers are limited in their ability to identify profitable parallelism/locality trade-offs and usually lead to sub-optimal performance. To address this problem, we distinguish optimizations for which effective model-based heuristics and profitability estimates exist from optimizations that require empirical search to achieve good performance in a portable fashion. We have developed a completely automatic framework in which we focus the empirical search on the set of valid possibilities to perform fusion/code motion, and rely on model-based mechanisms to perform tiling, vectorization and parallelization on the transformed program. We demonstrate the effectiveness of this approach in terms of strong performance improvements on a single target as well as performance portability across different target architectures.

Citation Context

...be found via empirical search. Powerful semi-automatic polyhedral frameworks have been designed as building blocks for compiler construction or (autotuned) library generation systems [37], [38], [1], [39], [40]. They capture partitioning, but they neither define automatic iteration schemes nor integrate a model-based heuristic to construct profitable parallelization and tiling strategies. T...
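
To make the split between empirical and model-driven decisions more tangible, here is a toy driver that enumerates fusion structures (contiguous groupings of statements) empirically and delegates everything else to a placeholder model-based pass. All names and the timing function are hypothetical; a real framework would check dependences and actually compile and run each variant.

    STATEMENTS = ["S1", "S2", "S3"]   # loop nests that may be fused or distributed

    def partitions(items):
        # Enumerate contiguous partitions of the statement sequence,
        # a simple proxy for candidate fusion/distribution structures.
        if not items:
            yield []
            return
        for k in range(1, len(items) + 1):
            for tail in partitions(items[k:]):
                yield [items[:k]] + tail

    def legal(partition):
        # Placeholder: a real system tests data dependences here.
        return True

    def model_based_passes(partition):
        # Placeholder for tiling, vectorization and parallelization decided
        # by static heuristics rather than by search.
        return partition

    def time_variant(variant):
        # Placeholder for compiling and running the variant; synthetic cost here.
        return sum(len(group) ** 2 for group in variant)

    def search():
        best = None
        for p in filter(legal, partitions(STATEMENTS)):
            cost = time_variant(model_based_passes(p))
            if best is None or cost < best[0]:
                best = (cost, p)
        return best

    print(search())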

Loop transformations: convexity, pruning and optimization

by Louis-Noël Pouchet, Uday Bondhugula, Cédric Bastoul, Albert Cohen, J. Ramanujam, P. Sadayappan, Nicolas Vasilache - SIGPLAN Notices, 2011
"... Abstract High-level loop transformations are a key instrument in mapping computational kernels to effectively exploit resources in modern processor architectures. However, determining appropriate compositions of loop transformations to achieve this remains a significantly challenging task; current ..."
Abstract - Cited by 22 (3 self) - Add to MetaCart
High-level loop transformations are a key instrument in mapping computational kernels to effectively exploit resources in modern processor architectures. However, determining appropriate compositions of loop transformations to achieve this remains a significantly challenging task; current compilers may achieve significantly lower performance than hand-optimized programs. To address this fundamental challenge, we first present a convex characterization of all distinct, semantics-preserving, multidimensional affine transformations. We then bring together algebraic, algorithmic, and performance analysis results to design a tractable optimization algorithm over this highly expressive space. The framework has been implemented and validated experimentally on a representative set of benchmarks run on state-of-the-art multi-core platforms.

Citation Context

... of transformations presented in this paper, while focusing the search only on semantics-preserving transformation candidates. Several powerful semi-automatic frameworks based on the polyhedral model [9, 11, 20, 24, 43] have been proposed; these frameworks are able to capture fusion structures, but do not construct profitable parallelization and tiling strategies using a model-based heuristic. R-Stream is a source-to...
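
For context, the legality constraint that underlies such convex search spaces can be stated as follows (this is the standard formulation found across the polyhedral literature, not the paper's specific encoding): for every dependence between an instance x_S of statement S and an instance x_T of statement T, the multidimensional affine schedules Theta must keep the target after the source,

    \forall (\vec{x}_S, \vec{x}_T) \in \mathcal{D}_{S \rightarrow T} :
    \quad \Theta_T(\vec{x}_T) \;\succ_{\mathrm{lex}}\; \Theta_S(\vec{x}_S),

where D_{S -> T} is the dependence polyhedron and the relation is lexicographic ordering; the paper's contribution is a convex characterization of the set of all schedules satisfying these constraints.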

Predictive Modeling in a Polyhedral Optimization Space

by Eunjung Park, Louis-Noël Pouchet, John Cavazos, Albert Cohen, et al. - International Symposium on Code Generation and Optimization (CGO'11), 2011
"... Significant advances in compiler optimization have been made in recent years, enabling many transformations such as tiling, fusion, parallelization and vectorization on imperfectly nested loops. Nevertheless, the problem of finding the best combination of loop transformations remains a major challen ..."
Abstract - Cited by 18 (3 self) - Add to MetaCart
Significant advances in compiler optimization have been made in recent years, enabling many transformations such as tiling, fusion, parallelization and vectorization on imperfectly nested loops. Nevertheless, the problem of finding the best combination of loop transformations remains a major challenge. Polyhedral models for compiler optimization have demonstrated strong potential for enhancing program performance, in particular for compute-intensive applications. However, existing static cost models for polyhedral transformations have significant limitations, and iterative compilation has become a very promising alternative for finding the most effective transformations. Since the number of polyhedral optimization alternatives can be enormous, it is often impractical to iterate over a significant fraction of the entire space of polyhedrally transformed variants. Recent research has focused on iterating over this search space either with manually constructed heuristics or with automatic but very expensive search algorithms (e.g., genetic algorithms) that can eventually find good points in the polyhedral space. In this paper, we propose the use of machine learning to address the problem of selecting the best polyhedral optimizations. We show that these models can quickly find high-performance program variants in the polyhedral space, without resorting to extensive empirical search. We introduce models that take as input a characterization of a program based on its dynamic behavior, and predict the performance of aggressive high-level polyhedral transformations that include tiling, parallelization and vectorization. We allow for a minimal empirical search on the target machine, discovering on average 83% of the search-space-optimal combinations in at most 5 runs. Our end-to-end framework is validated using numerous benchmarks on two multi-core platforms.

Citation Context

...lexity of the restructuring transformations automatically generated by the polyhedral framework. Regarding iterative compilation in the polyhedral model, Chen et al. provided the CHiLL infrastructure [9], a polyhedral loop transformation and code generation framework. Tiwari et al. [36] coupled the Active Harmony search engine to automatically tune some high-level transformation parameters, such as t...
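
At a very high level, the modeling approach amounts to training a regressor on pairs of (program characterization, transformation descriptor) with measured speedup as the target, then ranking unseen candidates and running only the top few. The sketch below uses scikit-learn with entirely synthetic data; the twelve-dimensional feature vector is a placeholder, not the paper's feature set.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)

    # Synthetic training set: each row concatenates a program characterization
    # (e.g. dynamic performance counters) with an encoding of one polyhedral
    # optimization sequence; the target is the measured speedup of that variant.
    X_train = rng.random((200, 12))
    y_train = rng.random(200) * 4.0

    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X_train, y_train)

    # For a new program, score the candidate transformation sequences and
    # empirically evaluate only the most promising ones.
    candidates = rng.random((50, 12))
    ranking = np.argsort(model.predict(candidates))[::-1]
    print("variants to benchmark first:", ranking[:5])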

Loop Transformation Recipes for Code Generation and Auto-Tuning

by Mary Hall, Jacqueline Chame, Chun Chen, Jaewook Shin, Gabe Rudy, Malik Murtaza Khan
"... Abstract. In this paper, we describe transformation recipes, which provide a high-level interface to the code transformation and code generation capability of a compiler. These recipes can be generated by compiler decision algorithms or savvy software developers. This interface is part of an auto-tu ..."
Abstract - Cited by 18 (2 self) - Add to MetaCart
In this paper, we describe transformation recipes, which provide a high-level interface to the code transformation and code generation capability of a compiler. These recipes can be generated by compiler decision algorithms or savvy software developers. This interface is part of an auto-tuning framework that explores a set of different implementations of the same computation and automatically selects the best-performing implementation. Along with the original computation, a transformation recipe specifies a range of implementations of the computation resulting from composing a set of high-level code transformations. In our system, an underlying polyhedral framework coupled with transformation algorithms takes this set of transformations, composes them and automatically generates correct code. We first describe an abstract interface for transformation recipes, which we propose to facilitate interoperability with other transformation frameworks. We then focus on the specific transformation recipe interface used in our compiler and present performance results on its application to kernel and library tuning and tuning of key computations in high-end applications. We also show how this framework can be used to generate and auto-tune parallel OpenMP or CUDA code from a high-level specification.
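
To give a flavour of what such a recipe might look like, the following is an invented Python encoding of an ordered list of transformations with parameters left open for the autotuner. It mimics the general shape of recipe interfaces but is not actual CHiLL syntax.

    # A recipe: ordered (transformation, arguments) pairs for one loop nest.
    # String values such as "TI" or "UF" are free parameters bound by the tuner.
    recipe = [
        ("permute",  {"order": ["j", "i", "k"]}),
        ("tile",     {"loop": "i", "size": "TI"}),
        ("tile",     {"loop": "j", "size": "TJ"}),
        ("unroll",   {"loop": "k", "factor": "UF"}),
        ("datacopy", {"array": "B"}),
    ]

    def instantiate(recipe, params):
        # Substitute concrete values for the recipe's free parameters.
        bound = []
        for name, args in recipe:
            bound.append((name, {k: params.get(v, v) if isinstance(v, str) else v
                                 for k, v in args.items()}))
        return bound

    print(instantiate(recipe, {"TI": 64, "TJ": 32, "UF": 4}))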

Graphite two years after: First lessons learned from real-world polyhedral compilation

by Konrad Trifunovic, Albert Cohen, David Edelsohn, Li Feng, Tobias Grosser, Harsha Jagasia, Razya Ladelsky - In GCC Research Opportunities Workshop (GROW'10), 2010
"... Abstract. Modern compilers are responsible for adapting the semantics of source programs into a form that makes efficient use of a highly complex, hetero-geneous machine. This adaptation amounts to solve an optimization problem in a huge and unstructured search space, while predicting the performanc ..."
Abstract - Cited by 12 (6 self) - Add to MetaCart
Modern compilers are responsible for adapting the semantics of source programs into a form that makes efficient use of a highly complex, heterogeneous machine. This adaptation amounts to solving an optimization problem in a huge and unstructured search space, while predicting the performance outcome of complex sequences of program transformations. The polyhedral model of compilation is aimed at these challenges. Its geometrical, non-inductive semantics enables the construction of better-structured optimization problems and precise analytical models. Recent work demonstrated the scalability of the main polyhedral algorithms to real-world programs. Its integration into production compilers is under way, pioneered by the graphite branch of the GNU Compiler Collection (GCC). Two years after the effective beginning of the project, this paper reports on original questions and innovative solutions that arose during the design and implementation of graphite.

Polyhedral extraction tool

by Sven Verdoolaege, Tobias Grosser - In Second International Workshop on Polyhedral Compilation Techniques (IMPACT'12), 2012
"... We present a new library for extracting a polyhedral model from C source. The library is based on clang, the LLVM C frontend, and isl, a library for manipulating quasi-affine sets and relations. The use of clang for parsing the C code brings advanced diagnostics and full support for C99. The use of ..."
Abstract - Cited by 8 (4 self) - Add to MetaCart
We present a new library for extracting a polyhedral model from C source. The library is based on clang, the LLVM C frontend, and isl, a library for manipulating quasi-affine sets and relations. The use of clang for parsing the C code brings advanced diagnostics and full support for C99. The use of isl allows for an easy construction and a powerful and compact representation of the polyhedral model. Besides allowing arbitrary piecewise quasi-affine index expressions and conditions, the library also supports some data dependent constructs and has special treatment for unsigned integers. The library has been successfully used to obtain polyhedral models for use in an equivalence checker, a tool for constructing polyhedral process networks, a parallelizer targeting GPUs and an interactive polyhedral environment.

Citation Context

...he input language is similar to Fortran and the parser includes some advanced features such as induction variable recognition and forward substitution of scalars. Like the pet predecessor pers, CHiLL [13] uses SUIF for parsing, and hence has no support for C99, and appears to handle even fewer constructs than pers. The LooPo [16] project includes a polyhedral par...
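
As a small taste of the compact representation that isl provides, the snippet below builds the iteration domain of a triangular loop nest with the islpy Python bindings (assumed to be installed) and intersects it with an additional quasi-affine condition.

    import islpy as isl

    # Iteration domain of:  for (i = 0; i < n; i++) for (j = 0; j <= i; j++) S(i, j)
    domain = isl.Set("[n] -> { S[i, j] : 0 <= i < n and 0 <= j <= i }")

    # Quasi-affine conditions compose naturally, e.g. keep only even i.
    even_i = isl.Set("[n] -> { S[i, j] : i mod 2 = 0 }")

    print(domain.intersect(even_i))
    print(domain.is_empty())   # False: the domain has points whenever n >= 1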

Autotuning GEMM kernels for the Fermi GPU

by Jakub Kurzak, Stanimire Tomov, Jack Dongarra - IEEE Transactions on Parallel and Distributed Systems
"... Abstract—In recent years, the use of graphics chips has been recognized as a viable way of accelerating scientific and engineering applications, even more so since the introduction of the Fermi architecture by NVIDIA, with features essential to numerical computing, such as fast double precision arit ..."
Abstract - Cited by 8 (0 self) - Add to MetaCart
In recent years, the use of graphics chips has been recognized as a viable way of accelerating scientific and engineering applications, even more so since the introduction of the Fermi architecture by NVIDIA, with features essential to numerical computing, such as fast double precision arithmetic and memory protected with error correction codes. As a crucial component of numerical software packages such as LAPACK and ScaLAPACK, the general dense matrix multiplication routine is one of the most important workloads to be implemented on these devices. This paper presents a methodology for producing matrix multiplication kernels tuned for a specific architecture, through a canonical process of heuristic autotuning, based on generation of multiple code variants and selection of the fastest ones through benchmarking. The key contribution of this work is the method for generating the search space; specifically, pruning it to a manageable size. Performance numbers match or exceed other available implementations.

Citation Context

...UDA-CHiLL source-to-source compiler transformation and code generation framework, which transforms sequential loop nests to high-performance GPU code, based on the polyhedral transformation system CHiLL [9]. Autotuning was used to explore a small parameter space (tiling in multiples of 16, up to 128). The Fermi SGEMM A B kernel was produced with performance slightly lower than CUBLAS, due to not using textu...
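
The pruning step mentioned above can be pictured as follows: enumerate candidate tile shapes in multiples of 16 and immediately discard those that exceed the hardware budget. The limits and the thread/footprint formulas below are generic illustrative numbers, not the actual Fermi constraints or the paper's exact generator.

    import itertools

    # Illustrative per-multiprocessor limits for a generic GPU.
    SHARED_MEM_BYTES = 48 * 1024
    MAX_THREADS = 1024
    ELEM_BYTES = 4                      # single precision

    def shared_footprint(mt, nt, kt):
        # Tiles of A (mt x kt) and B (kt x nt) staged in shared memory.
        return (mt * kt + kt * nt) * ELEM_BYTES

    def candidates():
        sizes = range(16, 129, 16)      # multiples of 16, up to 128
        for mt, nt, kt in itertools.product(sizes, sizes, (8, 16, 32)):
            threads = (mt // 4) * (nt // 4)   # assume each thread computes a 4x4 block
            if threads > MAX_THREADS or shared_footprint(mt, nt, kt) > SHARED_MEM_BYTES:
                continue
            yield (mt, nt, kt)

    print(len(list(candidates())), "variants survive pruning")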

When Polyhedral Transformations Meet SIMD Code Generation

by Martin Kong, Richard Veras, Kevin Stock, Franz Franchetti, Louis-noël Pouchet
"... Data locality and parallelism are critical optimization objectives for performance on modern multi-core machines. Both coarse-grain parallelism (e.g., multi-core) and fine-grain parallelism (e.g., vector SIMD) must be effectively exploited, but despite decades of progress at both ends, current compi ..."
Abstract - Cited by 7 (2 self) - Add to MetaCart
Data locality and parallelism are critical optimization objectives for performance on modern multi-core machines. Both coarse-grain parallelism (e.g., multi-core) and fine-grain parallelism (e.g., vector SIMD) must be effectively exploited, but despite decades of progress at both ends, current compiler optimization schemes that attempt to address data locality and both kinds of parallelism often fail at one of the three objectives. We address this problem by proposing a 3-step framework, which aims for integrated data locality, multi-core parallelism and SIMD execution of programs. We define the concept of vectorizable codelets, with properties tailored to achieve effective SIMD code generation for the codelets. We leverage the power of a modern high-level transformation framework to restructure a program to expose good ISA-independent vectorizable codelets, exploiting multi-dimensional data reuse. Then,

Citation Context

...5, 26, 28, 30, 43]. These works usually focus on the back-end part, that is, the actual SIMD code generation from a parallel loop [15, 28, 30], or on the high-level loop transformation angle only [12, 26, 38, 40]. To the best of our knowledge, our work is the first to address both problems simultaneously by setting a well-defined interface between a powerful polyhedral high-level transformation engine and a s...
