Code generation in the polyhedral model is easier than you think
 In IEEE Intl. Conf. on Parallel Architectures and Compilation Techniques (PACT’04)
, 2004
Cited by 113 (17 self)
Many advances in automatic parallelization and optimization have been achieved through the polyhedral model. It has been extensively shown that this computational model provides convenient abstractions to reason about and apply program transformations. Nevertheless, the complexity of code generation has long been a deterrent for using polyhedral representation in optimizing compilers. First, code generators have a hard time coping with generated code size and control overhead that may spoil theoretical benefits achieved by the transformations. Second, this step is usually time consuming, hampering the integration of the polyhedral framework in production compilers or feedback-directed, iterative optimization schemes. Moreover, current code generation algorithms only cover a restrictive set of possible transformation functions. This paper discusses a general transformation framework able to deal with non-unimodular, non-invertible, non-integral or even non-uniform functions. It presents several improvements to a state-of-the-art code generation algorithm. Two directions are explored: generated code size and code generator efficiency. Experimental evidence proves the ability of the improved method to handle real-life problems.
Counting Solutions to Linear and Nonlinear Constraints through Ehrhart Polynomials: Applications to Analyze and Transform Scientific Programs
, 1996
Cited by 97 (0 self)
In order to produce efficient parallel programs, optimizing compilers need to include an analysis of the initial sequential code. When analyzing loops with affine loop bounds, many computations are relevant to the same general problem: counting the number of integer solutions of selected free variables in a set of linear and/or nonlinear parameterized constraints. For example, computing the number of flops executed by a loop, of memory locations touched by a loop, of cache lines touched by a loop, or of array elements that need to be transmitted from a processor to another during the execution of a loop, is useful to determine if a loop is load balanced, evaluate message traffic and allocate message buffers. The objective of the presented method is to evaluate symbolically, in terms of symbolic constants (the size parameters), this number of integer solutions. By modeling the considered counting problem as a union of rational convex polytopes, the number of included integer points is ...
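As a small illustration of the counting problem this abstract describes, the sketch below compares a brute-force count of the iterations of a triangular loop nest with its closed-form polynomial in the size parameter n. The loop nest, the function names, and the polynomial n(n+1)/2 are illustrative assumptions chosen for this example; they are not taken from the paper, and the brute-force enumeration is not the paper's symbolic method.

```python
def count_iterations(n):
    """Brute-force count of integer points (i, j) with 1 <= i <= n and 1 <= j <= i.

    This is the number of iterations of the (hypothetical) loop nest
        for i in 1..n: for j in 1..i: body
    """
    return sum(1 for i in range(1, n + 1) for j in range(1, i + 1))


def closed_form(n):
    """Closed-form count for the same triangle, as a polynomial in n: n(n+1)/2."""
    return n * (n + 1) // 2


# The enumeration and the polynomial agree for every tested parameter value.
for n in range(0, 8):
    assert count_iterations(n) == closed_form(n)
```

The point of a symbolic count is that `closed_form` can be evaluated (or reasoned about) without running the loop, which is what makes it usable at compile time for the estimates listed in the abstract.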
Parametric Analysis of Polyhedral Iteration Spaces
 JOURNAL OF VLSI SIGNAL PROCESSING
, 1998
Cited by 68 (13 self)
In the area of automatic parallelization of programs, analyzing and transforming loop nests with parametric affine loop bounds requires fundamental mathematical results. The most common geometrical model of iteration spaces, called the polytope model, is based on mathematics dealing with convex and discrete geometry, linear programming, combinatorics and geometry of numbers. In this paper, we present automatic methods for computing the parametric vertices and the Ehrhart polynomial, i.e. a parametric expression of the number of integer points, of a polytope defined by a set of parametric linear constraints. These methods have many applications in analysis and transformations of nested loop programs. The paper is illustrated with exact symbolic array dataflow analysis, estimation of execution time, and with the computation of the maximum available parallelism of given loop nests.
Interprocedural array regions analyses
, 1995
Cited by 66 (8 self)
In order to perform powerful program optimizations, an exact interprocedural analysis of array data flow is needed. For that purpose, two new types of array region are introduced. IN and OUT regions represent the sets of array elements, the values of which are imported to or exported from the current statement or procedure. Among the various applications are: compilation of communications for message-passing machines, array privatization, compile-time optimization of local memory or cache behavior in hierarchical memory machines.
A practical automatic polyhedral parallelizer and locality optimizer
 In PLDI ’08: Proceedings of the ACM SIGPLAN 2008 conference on Programming language design and implementation
, 2008
Cited by 63 (2 self)
We present the design and implementation of an automatic polyhedral source-to-source transformation framework that can optimize regular programs (sequences of possibly imperfectly nested loops) for parallelism and locality simultaneously. Through this work, we show the practicality of analytical model-driven automatic transformation in the polyhedral model. Unlike previous polyhedral frameworks, our approach is an end-to-end fully automatic one driven by an integer linear optimization framework that takes an explicit view of finding good ways of tiling for parallelism and locality using affine transformations. The framework has been implemented into a tool to automatically generate OpenMP parallel code from C program sections. Experimental results from the tool show very high performance for local and parallel execution on multicores, when compared with state-of-the-art compiler frameworks from the research community as well as the best native production compilers. The system also enables the easy use of powerful empirical/iterative optimization for general arbitrarily nested loop sequences.
Formalized Methodology for Data Reuse Exploration for Low-Power Hierarchical Memory Mappings
, 1997
Cited by 62 (3 self)
Efficient use of an optimized custom memory hierarchy to exploit temporal locality in the data accesses can have a very large impact on the power consumption in data dominated applications. In the past, experiments have demonstrated that this task is crucial in a complete low-power memory management methodology. But effective formalized techniques to deal with this specific task have not been addressed yet. In this paper, the surprisingly large design freedom available for the basic problem is explored in-depth and the outline of a systematic solution methodology is proposed. The efficiency of the methodology is illustrated on a real-life motion estimation application. The results obtained for this application show power reductions of about 85% for the memory subsystem compared to the case without a custom memory hierarchy. These large gains justify that data reuse and memory hierarchy decisions should be taken early in the design flow. Keywords Specialissuelowpower97, systemleve...
Parameterized Polyhedra and their Vertices
 International Journal of Parallel Programming
, 1995
Cited by 43 (11 self)
Algorithms specified for parametrically sized problems are more general purpose and more reusable than algorithms for fixed sized problems. For this reason, there is a need for representing and symbolically analyzing linearly parameterized algorithms. An important class of parallel algorithms can be described as systems of parameterized affine recurrence equations (PARE). In this representation, linearly parameterized polyhedra are used to describe the domains of variables. This paper describes an algorithm which computes the set of parameterized vertices of a polyhedron, given its representation as a system of parameterized inequalities. This provides an important tool for the symbolic analysis of the parameterized domains used to define variables and computation domains in PARE's. A library of operations on parameterized polyhedra based on the Polyhedral Library has been written in C and is freely distributed. 1 Introduction In order to improve the performance of scientific programs...
Precise Widening Operators for Convex Polyhedra
 Static Analysis: Proceedings of the 10th International Symposium, volume 2694 of Lecture Notes in Computer Science
, 2003
Cited by 42 (9 self)
Convex polyhedra constitute the most used abstract domain among those capturing numerical relational information. Since the domain of convex polyhedra admits infinite ascending chains, it has to be used in conjunction with appropriate mechanisms for enforcing and accelerating convergence of the fixpoint computation. Widening operators provide a simple and general characterization for such mechanisms. For the domain of convex polyhedra, the original widening operator proposed by Cousot and Halbwachs amply deserves the name of standard widening since most analysis and verification tools that employ convex polyhedra also employ that operator. Nonetheless, there is an unfulfilled demand for more precise widening operators. In this paper, after a formal introduction to the standard widening where we clarify some aspects that are often overlooked, we embark on the challenging task of improving on it. We present a framework for the systematic definition of new and precise widening operators for convex polyhedra. The framework is then instantiated so as to obtain a new widening operator that combines several heuristics and uses the standard widening as a last resort so that it is never less precise. A preliminary experimental evaluation has yielded promising results.
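To make the role of widening concrete, here is a minimal sketch on intervals, which can be seen as the one-dimensional special case of convex polyhedra. It is a deliberately simplified stand-in, not the polyhedral operators discussed in the paper: the widening keeps a bound of the old interval only if the new interval still respects it, and otherwise drops that bound to infinity. All names (`join`, `widen`, `analyze_counter_loop`) and the analyzed toy loop are hypothetical.

```python
import math


def join(a, b):
    """Smallest interval containing both intervals a and b."""
    return (min(a[0], b[0]), max(a[1], b[1]))


def widen(old, new):
    """Interval widening: keep each bound of `old` that `new` does not violate,
    and replace a violated bound by -inf / +inf so that chains cannot grow forever."""
    lo = old[0] if old[0] <= new[0] else -math.inf
    hi = old[1] if old[1] >= new[1] else math.inf
    return (lo, hi)


def analyze_counter_loop():
    """Fixpoint for the toy program `x = 0; while ...: x = x + 1` at the loop head."""
    init = (0, 0)
    state = init
    while True:
        after_body = (state[0] + 1, state[1] + 1)  # effect of x = x + 1
        candidate = join(init, after_body)         # merge entry and back edge
        widened = widen(state, candidate)
        if widened == state:                       # post-fixpoint reached
            return state
        state = widened
```

Without `widen`, the upper bound would grow by one on every iteration (0, 1, 2, ...) and the analysis would not terminate; with it, the computation stabilizes after two rounds at the cost of losing any finite upper bound. That loss of precision is the kind of effect more precise widening operators aim to reduce.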
Iterative optimization in the polyhedral model: Part II, multidimensional time
 IN PLDI ’08: PROCEEDINGS OF THE 2008 ACM SIGPLAN CONFERENCE ON PROGRAMMING LANGUAGE DESIGN AND IMPLEMENTATION. USA: ACM
Cited by 40 (15 self)
High-level loop optimizations are necessary to achieve good performance over a wide variety of processors. Their performance impact can be significant because they involve in-depth program transformations that aim to sustain a balanced workload over the computational, storage, and communication resources of the target architecture. Therefore, it is mandatory that the compiler accurately models the target architecture and the effects of complex code restructuring. However, because optimizing compilers (1) use simplistic performance models that abstract away many of the complexities of modern architectures, (2) rely on inaccurate dependence analysis, and (3) lack frameworks to express complex interactions of transformation sequences, they typically uncover only a fraction of the peak performance available on many applications. We propose a complete iterative framework to address these issues. We rely on the polyhedral model to construct and traverse a large and expressive search space. This space encompasses only legal, distinct versions resulting from the restructuring of any static control loop nest. We first propose a feedback-driven iterative heuristic tailored to the search space properties of the polyhedral model. Though it quickly converges to good solutions for small kernels, larger benchmarks containing higher dimensional spaces are more challenging and our heuristic misses opportunities for significant performance improvement. Thus, we introduce the use of a genetic algorithm with specialized operators that leverage the polyhedral representation of program dependences. We provide experimental evidence that the genetic algorithm effectively traverses huge optimization spaces, achieving good performance improvements on large loop nests.
Let’s Study Whole-Program Cache Behaviour Analytically
 In Proceedings of International Symposium on High-Performance Computer Architecture (HPCA 8)
, 2002