Results 1–10 of 10
Code generation in the polyhedral model is easier than you think
In IEEE Intl. Conf. on Parallel Architectures and Compilation Techniques (PACT’04), 2004
Abstract
Cited by 167 (16 self)
Many advances in automatic parallelization and optimization have been achieved through the polyhedral model. It has been extensively shown that this computational model provides convenient abstractions to reason about and apply program transformations. Nevertheless, the complexity of code generation has long been a deterrent for using polyhedral representation in optimizing compilers. First, code generators have a hard time coping with generated code size and control overhead that may spoil theoretical benefits achieved by the transformations. Second, this step is usually time-consuming, hampering the integration of the polyhedral framework in production compilers or feedback-directed, iterative optimization schemes. Moreover, current code generation algorithms only cover a restrictive set of possible transformation functions. This paper discusses a general transformation framework able to deal with non-unimodular, non-invertible, non-integral or even non-uniform functions. It presents several improvements to a state-of-the-art code generation algorithm. Two directions are explored: generated code size and code generator efficiency. Experimental evidence proves the ability of the improved method to handle real-life problems.
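The core task the abstract describes, scanning the integer points of an iteration domain in the order imposed by an affine schedule, can be illustrated with a toy sketch. The domain, the skewing schedule, and the bounds below are invented for the example and are not the paper's algorithm:

```python
# Toy illustration of polyhedral code generation: the "generated code"
# in scan_skewed is the loop nest a code generator would emit for the
# transformed coordinates (t, j) of a skewing schedule.

def domain_points(n):
    """Iteration domain D = {(i, j) : 0 <= i <= n, i <= j <= n}."""
    return {(i, j) for i in range(n + 1) for j in range(i, n + 1)}

def scan_skewed(n):
    """Scan D under the schedule (i, j) -> (i + j, j); the original
    iterator is recovered as i = t - j."""
    visited = []
    for t in range(0, 2 * n + 1):               # t = i + j ranges over [0, 2n]
        lo = max(0, t - n, (t + 1) // 2)        # bounds projected from D
        hi = min(t, n)
        for j in range(lo, hi + 1):
            visited.append((t - j, j))          # i = t - j
    return visited

assert set(scan_skewed(4)) == domain_points(4)  # same points, new order
```

Deriving exact loop bounds (here `lo` and `hi`) rather than guarding every point with an `if` is precisely the code-size and control-overhead concern the abstract raises.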
Iterative optimization in the polyhedral model: Part II, multidimensional time
In PLDI ’08: Proceedings of the 2008 ACM SIGPLAN Conference on Programming Language Design and Implementation. USA: ACM
Abstract
Cited by 55 (16 self)
High-level loop optimizations are necessary to achieve good performance over a wide variety of processors. Their performance impact can be significant because they involve in-depth program transformations aiming to sustain a balanced workload over the computational, storage, and communication resources of the target architecture. Therefore, it is mandatory that the compiler accurately models the target architecture and the effects of complex code restructuring. However, because optimizing compilers (1) use simplistic performance models that abstract away many of the complexities of modern architectures, (2) rely on inaccurate dependence analysis, and (3) lack frameworks to express complex interactions of transformation sequences, they typically uncover only a fraction of the peak performance available on many applications. We propose a complete iterative framework to address these issues. We rely on the polyhedral model to construct and traverse a large and expressive search space. This space encompasses only legal, distinct versions resulting from the restructuring of any static control loop nest. We first propose a feedback-driven iterative heuristic tailored to the search space properties of the polyhedral model. Though it quickly converges to good solutions for small kernels, larger benchmarks containing higher-dimensional spaces are more challenging, and our heuristic misses opportunities for significant performance improvement. Thus, we introduce the use of a genetic algorithm with specialized operators that leverage the polyhedral representation of program dependences. We provide experimental evidence that the genetic algorithm effectively traverses huge optimization spaces, achieving good performance improvements on large loop nests.
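The shape of such a search, a population of candidate transformations evolved under a fitness signal, can be sketched in miniature. Everything here is invented for illustration and is far simpler than the paper's legality-aware operators: the candidates are just interchange orders of a 3-deep nest, and the fitness is a stand-in that rewards placing an assumed stride-1 loop (loop 2) innermost:

```python
import random
from itertools import permutations

# Toy genetic search over loop-interchange orders (mutation-only sketch).

def fitness(order):
    return order.index(2)          # deeper placement of loop 2 scores higher

def mutate(order, rng):
    a, b = rng.sample(range(3), 2) # swap two loop levels
    child = list(order)
    child[a], child[b] = child[b], child[a]
    return child

def evolve(generations=20, seed=0):
    rng = random.Random(seed)
    pop = [list(p) for p in permutations(range(3))]   # all 6 orders
    best = max(pop, key=fitness)
    for _ in range(generations):
        pop = [mutate(rng.choice(pop), rng) for _ in range(len(pop))]
        cand = max(pop, key=fitness)
        if fitness(cand) > fitness(best):             # elitism on the best
            best = cand
    return best

assert evolve()[-1] == 2           # loop 2 ends up innermost
```

In the real framework the decisive ingredient is that every candidate is a legal schedule by construction; a toy fitness and swap mutation only convey the search loop's structure.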
Improving data locality by chunking
In CC’12, Intl. Conference on Compiler Construction, LNCS 2622, 2003
Abstract
Cited by 28 (10 self)
Cache memories were invented to decouple fast processors from slow memories. However, this decoupling is only partial, and many researchers have attempted to improve cache use by program optimization. Potential benefits are significant since both energy dissipation and performance highly depend on the traffic between memory levels. But modeling the traffic is difficult; this observation has led to the use of heuristic methods for steering program transformations. In this paper, we propose another approach: we simplify the cache model and we organize the target program in such a way that an asymptotic evaluation of the memory traffic is possible. This information is used by our optimization algorithm in order to find the best reordering of the program operations, at least in an asymptotic sense. Our method optimizes both temporal and spatial locality. It can be applied to any static control program with arbitrary dependences. The optimizer has been partially implemented and applied to non-trivial programs. We present experimental evidence that the number of cache misses is drastically reduced with corresponding performance improvements.
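The effect of chunking on memory traffic can be made concrete with a small simulation. The example below is invented, not the paper's algorithm: it counts misses in a fully associative LRU cache for a column-wise sweep over a row-major array, before and after blocking the rows into chunks that fit the cache:

```python
from collections import OrderedDict

LINE = 8          # array elements per cache line
LINES = 16        # cache capacity in lines (fully associative, LRU)

def misses(addresses):
    cache, miss = OrderedDict(), 0
    for a in addresses:
        line = a // LINE
        if line in cache:
            cache.move_to_end(line)          # refresh LRU position
        else:
            miss += 1
            cache[line] = None
            if len(cache) > LINES:
                cache.popitem(last=False)    # evict least recently used
    return miss

def column_sweep(n):                 # touches a[i][j] column by column
    return [i * n + j for j in range(n) for i in range(n)]

def chunked_sweep(n, chunk):         # same accesses, rows blocked in chunks
    return [i * n + j
            for ii in range(0, n, chunk)
            for j in range(n)
            for i in range(ii, min(ii + chunk, n))]

n = 64
assert misses(column_sweep(n)) > misses(chunked_sweep(n, LINES))
```

With these parameters the plain sweep thrashes (every access misses, 4096 misses), while the chunked order reuses each line across eight columns (512 misses): an eight-fold traffic reduction from reordering alone.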
Unimodular Transformations of Non-Perfectly Nested Loops
In Parallel Computing, 1997
Abstract
Cited by 21 (4 self)
A framework is described in which a class of imperfectly nested loops can be restructured using unimodular transformations. In this framework, an imperfect loop nest is converted to a perfect loop nest using Abu-Sufah's Non-Basic-to-Basic-Loop transformation. Conditions for the legality of this transformation and techniques for their verification are discussed. An iteration space, which extends the usual concept so as to represent explicitly the executions of individual statements, is proposed to model the converted loop nest. Since the converted loop nest is a perfect loop nest, data dependences can be extracted and the optimal transformation can be selected for parallelism and/or locality in the normal manner. To generate the restructured code for a unimodular transformation, a code generation method is provided that, by construction, produces restructured code free of if statements. Keywords: Unimodular transformation; Imperfect loop nest; Data dependence; Non-Basic-to-Ba...
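The legality criterion at the heart of such frameworks is compact: a unimodular transformation is legal exactly when every dependence distance vector remains lexicographically positive under it. A toy sketch with an invented 2-deep nest (not the paper's framework):

```python
# Legality check for unimodular loop transformations on a 2-deep nest.

def matvec(T, v):
    return [sum(T[r][c] * v[c] for c in range(len(v))) for r in range(len(T))]

def lex_positive(v):
    for x in v:
        if x != 0:
            return x > 0
    return False          # the zero vector is not lexicographically positive

def legal(T, deps):
    """T is legal iff every dependence distance vector stays
    lexicographically positive under T."""
    return all(lex_positive(matvec(T, d)) for d in deps)

deps = [[1, -1], [0, 1]]            # distance vectors of the toy nest
skew = [[1, 0], [1, 1]]             # (i, j) -> (i, i + j): loop skewing
swap = [[0, 1], [1, 0]]             # (i, j) -> (j, i): loop interchange

assert legal(skew, deps)            # skewing preserves both dependences
assert not legal(swap, deps)        # interchange reverses [1, -1]
```

Both matrices have determinant ±1, so each maps integer iteration points bijectively to integer points, which is what makes the distance-vector test sufficient.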
Code Generation in the Polytope Model
In IEEE PACT, 1998
Abstract
Cited by 20 (1 self)
Automatic parallelization of nested loops, based on a mathematical model, the polytope model, has been improved significantly over the last decade: state-of-the-art methods allow flexible distributions of computations in space and time, which lead to high-quality parallelism. However, these methods have not found their way into practical parallelizing compilers due to the lack of code generation schemes which are able to deal with the newfound flexibility. To close this gap is the purpose of this paper.
1. Introduction. In recent years, methods for automatic parallelization of nested loops based on a mathematical model, the polytope model [9, 12], have been improved significantly. The focus has been on identifying good schedules, i.e., distributions of computations in time, e.g., [6, 8], and allocations, i.e., distributions of computations in space, e.g., [5, 14]. Thus, the space-time mapping, i.e., the combination of schedule and allocation, derived by state-of-the-art techniques oft...
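The schedule/allocation split mentioned above can be shown on a classic invented example: for an n × n nest with dependences (1, 0) and (0, 1), the schedule t(i, j) = i + j groups iterations into wavefronts, and the allocation p(i, j) = j spreads each wavefront across processors:

```python
from collections import defaultdict

# Space-time mapping sketch: iterations sharing a time step t = i + j
# form a wavefront whose members are mutually independent.

def wavefronts(n):
    fronts = defaultdict(list)
    for i in range(n):
        for j in range(n):
            fronts[i + j].append((i, j))
    return [fronts[t] for t in sorted(fronts)]

deps = [(1, 0), (0, 1)]
assert all(di + dj > 0 for di, dj in deps)   # every dependence advances time
assert len(wavefronts(4)) == 7               # time steps t = 0 .. 2n - 2
```

Because each dependence strictly increases t, all iterations within one wavefront may execute in parallel; generating compact loop code that enumerates these wavefronts is exactly the code generation problem the paper addresses.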
Set and Relation Manipulation for the Sparse Polyhedral Framework
Abstract
Cited by 3 (2 self)
The Sparse Polyhedral Framework (SPF) extends the Polyhedral Model by using the uninterpreted function call abstraction for the compile-time specification of runtime reordering transformations such as loop and data reordering and sparse tiling approaches that schedule irregular sets of iterations across loops. The Polyhedral Model represents sets of iteration points in imperfectly nested loops with unions of polyhedra and represents loop transformations with affine functions applied to such polyhedral sets. Existing tools such as ISL, Cloog, and Omega manipulate polyhedral sets and affine functions; however, the ability to represent sets and functions where some of the constraints include uninterpreted function calls, such as those needed in the SPF, is nonexistent or severely restricted. This paper presents algorithms for manipulating sets and relations with uninterpreted function symbols to enable the Sparse Polyhedral Framework. The algorithms have been implemented in an open source C++ library called IEGenLib (The Inspector/Executor Generator Library).
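The key idea, a relation whose constraints mention a function left uninterpreted until runtime, can be sketched in a few lines. The representation below is invented for illustration and is not IEGenLib's API:

```python
# A relation R = {[i] -> [j] : j = sigma(i)} where sigma is an
# uninterpreted function at compile time; membership can only be
# tested once an interpretation of sigma is supplied.

class Relation:
    def __init__(self, constraint):
        self.constraint = constraint   # callable(x, y, funcs) -> bool

    def contains(self, x, y, funcs):
        return self.constraint(x, y, funcs)

R = Relation(lambda x, y, f: y[0] == f["sigma"](x[0]))

perm = [2, 0, 1]                       # a reordering discovered at runtime
funcs = {"sigma": lambda i: perm[i]}
assert R.contains((1,), (0,), funcs)   # sigma(1) = 0
assert not R.contains((1,), (1,), funcs)
```

An inspector computes `perm` at runtime; the compile-time framework manipulates `R` symbolically, which is why set and relation operations must tolerate the opaque `sigma` rather than require affine constraints.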
Reordering methods for data locality improvement
In Workshop on Compilers for Parallel Computers, 2003
Abstract
Cited by 1 (0 self)
Cache memories were invented to decouple fast processors from slow memories. However, this decoupling is only partial, and many researchers have attempted to improve cache use by program optimization. Potential benefits are significant since both energy dissipation and performance highly depend on the traffic between memory levels. But modeling the traffic is difficult; this observation has led to the use of heuristic methods for steering program transformations. In this paper, we propose another approach: we simplify the cache model and we organize the target program in such a way that an asymptotic evaluation of the memory traffic is possible. This information is used by our optimization algorithm in order to find the best reordering of the program operations, at least in an asymptotic sense. Our method optimizes both temporal and spatial locality. It can be applied to any static control program with arbitrary dependences. The optimizer has been partially implemented and applied to non-trivial programs. We present experimental evidence that the number of cache misses is drastically reduced with corresponding performance improvements.
Eigenvectors-Based Parallelisation of Nested Loops with Affine Dependences
Abstract
This paper presents a method for parallelising nested loops with affine dependences. The data dependences of a program are represented exactly using a dependence matrix rather than an imprecise dependence abstraction. By a careful analysis of the eigenvectors and eigenvalues of the dependence matrix, we detect the parallelism inherent in the program, partition the iteration space of the program into sequential and parallel regions and generate parallel code to execute these regions. For a class of programs considered in the paper, the proposed method can expose more coarse-grain and fine-grain parallelism than a hyperplane-based loop transformation.
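The linear-algebra step the abstract relies on, extracting eigenpairs of a small dependence matrix, can be shown for the 2×2 case via the characteristic polynomial. The matrix below is an invented toy, not one from the paper:

```python
import math

# Eigenpairs of [[a, b], [c, d]] from l^2 - (a + d) l + (ad - bc) = 0,
# real eigenvalues assumed. An eigenvector direction that the dependence
# matrix maps into itself is a candidate partitioning axis.

def eig2(a, b, c, d):
    tr, det = a + d, a * d - b * c
    disc = math.sqrt(tr * tr - 4 * det)
    out = []
    for lam in ((tr + disc) / 2, (tr - disc) / 2):
        # a nonzero solution v of (A - lam I) v = 0
        v = (b, lam - a) if b != 0 else (lam - d, c) if c != 0 else (1.0, 0.0)
        out.append((lam, v))
    return out

pairs = eig2(2, 1, 0, 3)              # toy dependence matrix [[2,1],[0,3]]
for lam, (vx, vy) in pairs:
    # verify D v = lam v componentwise
    assert abs(2 * vx + 1 * vy - lam * vx) < 1e-9
    assert abs(3 * vy - lam * vy) < 1e-9
```

Here the eigenvalues 2 and 3 with eigenvectors (1, 0) and (1, 1) identify invariant directions; for larger matrices a numerical library would replace the closed-form solve.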
Code Optimization in the Polyhedron Model – Improving the Efficiency of Parallel Loop Nests
Abstract
This thesis was supported by the DFG through the LooPo/HPF project. The text was typeset using the teTeX package and XEmacs; most figures were created using the open source program xfig. Most experiments were conducted using the free HPF compiler ADAPTOR and the open source GNU Fortran compiler g77. The LooPo project makes extensive use of the free software packages Polylib, CLooG, PIP, and Omega. Therefore, thanks are due to all authors of the respective packages! Special thanks are due to all members of the LooPo project, former and present: from the beginning, it has always been a pleasure to be part of this team. I am particularly indebted to Dipl.-Inf. Thomas Wondrak. He conducted and analyzed an incredible number of experiments. He also provided valuable assistance during the work on his diploma thesis and later on, not only by doing much of the “dirty work” but also with the valuable discussions we had and with his insightful questions. I am also most grateful to both of my advisors, Christian Lengauer, Ph.D., who gave me the opportunity to work on this thesis, and Priv.-Doz. Dr. Martin Griebl, who went through lengthy discussions with me. He helped me with theoretical and technical aspects of this thesis and always had a word of encouragement for me. Finally, I have to thank my fiancée, Elke