Results 1  10
of
87
Generation of Efficient Nested Loops from Polyhedra
 International Journal of Parallel Programming
, 2000
"... Automatic parallelization in the polyhedral model is based on affine transformations from an original computation domain (iteration space) to a target spacetime domain, often with a different transformation for each variable. Code generation is an often ignored step in this process that has a signi ..."
Abstract

Cited by 92 (4 self)
 Add to MetaCart
Automatic parallelization in the polyhedral model is based on affine transformations from an original computation domain (iteration space) to a target spacetime domain, often with a different transformation for each variable. Code generation is an often ignored step in this process that has a significant impact on the quality of the final code. It involves making a tradeoff between code size and control code simplification/optimization. Previous methods of doing code generation are based on loop splitting, however they have nonoptimal behavior when working on parameterized programs. We present a general parameterized method for code generation based on dual representation of polyhedra. Our algorithm uses a simple recursion on the dimensions of the domains, and enables fine control over the tradeoff between code size and control overhead.
Compaan: Deriving Process Networks from Matlab for Embedded Signal Processing Architectures
 IN PROCEEDINGS OF THE 8TH INTERNATIONAL WORKSHOP ON HARDWARE/SOFTWARE CODESIGN (CODES
, 2000
"... This paper presents the Compaan tool that automatically transforms a nested loop program written in Matlab into a processnetwork specification. The process ..."
Abstract

Cited by 67 (13 self)
 Add to MetaCart
(Show Context)
This paper presents the Compaan tool that automatically transforms a nested loop program written in Matlab into a processnetwork specification. The process
Parameterized Polyhedra and their Vertices
 International Journal of Parallel Programming
, 1995
"... Algorithms specified for parametrically sized problems are more general purpose and more reusable than algorithms for fixed sized problems. For this reason, there is a need for representing and symbolically analyzing linearly parameterized algorithms. An important class of parallel algorithms can be ..."
Abstract

Cited by 50 (11 self)
 Add to MetaCart
Algorithms specified for parametrically sized problems are more general purpose and more reusable than algorithms for fixed sized problems. For this reason, there is a need for representing and symbolically analyzing linearly parameterized algorithms. An important class of parallel algorithms can be described as systems of parameterized affine recurrence equations (PARE). In this representation, linearly parameterized polyhedra are used to describe the domains of variables. This paper describes an algorithm which computes the set of parameterized vertices of a polyhedron, given its representation as a system of parameterized inequalities. This provides an important tool for the symbolic analysis of the parameterized domains used to define variables and computation domains in PARE's. A library of operations on parameterized polyhedra based on the Polyhedral Library has been written in C and is freely distributed. 1 Introduction In order to improve the performance of scientific programs...
Counting integer points in parametric polytopes using Barvinok’s rational functions
 Algorithmica
, 2007
"... Abstract Many compiler optimization techniques depend on the ability to calculate the number of elements that satisfy certain conditions. If these conditions can be represented by linear constraints, then such problems are equivalent to counting the number of integer points in (possibly) parametric ..."
Abstract

Cited by 44 (9 self)
 Add to MetaCart
(Show Context)
Abstract Many compiler optimization techniques depend on the ability to calculate the number of elements that satisfy certain conditions. If these conditions can be represented by linear constraints, then such problems are equivalent to counting the number of integer points in (possibly) parametric polytopes. It is well known that the enumerator of such a set can be represented by an explicit function consisting of a set of quasipolynomials each associated with a chamber in the parameter space. Previously, interpolation was used to obtain these quasipolynomials, but this technique has several disadvantages. Its worstcase computation time for a single quasipolynomial is exponential in the input size, even for fixed dimensions. The worstcase size of such a quasipolynomial (measured in bits needed to represent the quasipolynomial) is also exponential in the input size. Under certain conditions this technique even fails to produce a solution. Our main contribution is a novel method for calculating the required quasipolynomials analytically. It extends an existing method, based on Barvinok’s decomposition,
Generating Cache Hints for Improved Program Efficiency
 JOURNAL OF SYSTEMS ARCHITECTURE
, 2004
"... One of the new extensions in EPIC architectures are cache hints. On each memory instruction, two kinds of hints can be attached: a source cache hint and a target cache hint. The source hint indicates the true latency of the instruction, which is used by the compiler to improve the instruction schedu ..."
Abstract

Cited by 34 (4 self)
 Add to MetaCart
One of the new extensions in EPIC architectures are cache hints. On each memory instruction, two kinds of hints can be attached: a source cache hint and a target cache hint. The source hint indicates the true latency of the instruction, which is used by the compiler to improve the instruction schedule. The target hint indicates at which cache levels it is profitable to retain data, allowing to improve cache replacement decisions at run time. A compiletime method is presented which calculates appropriate cache hints. Both kind of hints are based on the locality of the instruction, measured by the reuse distance metric. Two
Precise Data Locality Optimization of Nested Loops
 J. SUPERCOMPUT
, 2002
"... A significant source for enhancing application performance and for reducing power consumption in embedded processor applications is to improve the usage of the memory hierarchy. In this paper, a temporal and spatial locality optimization framework of nested loops is proposed, driven by parameterized ..."
Abstract

Cited by 30 (3 self)
 Add to MetaCart
(Show Context)
A significant source for enhancing application performance and for reducing power consumption in embedded processor applications is to improve the usage of the memory hierarchy. In this paper, a temporal and spatial locality optimization framework of nested loops is proposed, driven by parameterized cost functions. The considered loops can be imperfectly nested. New data layouts are propagated through the connected references and through the loop nests as constraints for optimizing the next connected reference in the same nest or in the other ones. Unlike many existing methods, special attention is paid to TLB (Translation Lookaside Buffer) effectiveness since TLB misses can take from tens to hundreds of processor cycles. Our approach only considers active data, that is, array elements that are actually accessed by a loop, in order to prevent useless memory loads and take advantage of storage compression and temporal locality. Moreover, the same data transformation is not necessarily applied to a whole array. Depending on the referenced data subsets, the transformation can result in different data layouts for a same array. This can significantly improve the performance since a priori incompatible references can be simultaneously optimized. Finally, the process does not only consider the innermost loop level but all levels. Hence, large strides when control returns to the enclosing loop are avoided in several cases, and better optimization is provided in the case of a small index range of the innermost loop.
Analytical Computation of Ehrhart Polynomials: Enabling more Compiler Analyses and Optimizations
 In CASES
, 2004
"... Many optimization techniques, including several targeted specifically at embedded systems, depend on the ability to calculate the number of elements that satisfy certain conditions. If these conditions can be represented by linear constraints, then such problems are equivalent to counting the number ..."
Abstract

Cited by 30 (8 self)
 Add to MetaCart
Many optimization techniques, including several targeted specifically at embedded systems, depend on the ability to calculate the number of elements that satisfy certain conditions. If these conditions can be represented by linear constraints, then such problems are equivalent to counting the number of integer points in (possibly) parametric polytopes. It is well known that this parametric count can be represented by a set of Ehrhart polynomials. Previously, interpolation was used to obtain these polynomials, but this technique has several disadvantages. Its worstcase computation time for a single Ehrhart polynomial is exponential in the input size, even for fixed dimensions. The worstcase size of such an Ehrhart polynomial (measured in bits needed to represent the polynomial) is also exponential in the input size. Under certain conditions this technique even fails to produce a solution.
Handling Memory Cache Policy with Integer Points Countings
, 1997
"... . We propose an automatic method allowing the computation of useful information on the distribution of data in the memory cache. The number of distinct memory locations and cache lines touched by a loop, and the number of processors resulting from a distribution (CYCLIC, BLOCK, REPLICATE) from a ..."
Abstract

Cited by 20 (0 self)
 Add to MetaCart
. We propose an automatic method allowing the computation of useful information on the distribution of data in the memory cache. The number of distinct memory locations and cache lines touched by a loop, and the number of processors resulting from a distribution (CYCLIC, BLOCK, REPLICATE) from array elements are computed. It is shown that these problems are relevant to the same general mathematical problem, which is the counting of the exact number of integer points resulting from linear and nonlinear mappings of polytopes. 1 Introduction The construction of realistic and efficient parallel programs requires the extraction and the use of much information, concerning as well the potential parallelism and the needed resources of the algorithm, as the availability of resources on the target architecture. But these are generally not trivial to determine. Nowadays processors are designed with a memory hierarchy organized into several levels, each of which is smaller, faster, and ...
Experiences with enumeration of integer projections of parametric polytopes
, 2004
"... Abstract. Many compiler optimization techniques depend on the ability to calculate the number of integer values that satisfy a given set of linear constraints. This count (the enumerator of a parametric polytope) is a function of the symbolic parameters that may appear in the constraints. In an ex ..."
Abstract

Cited by 20 (7 self)
 Add to MetaCart
(Show Context)
Abstract. Many compiler optimization techniques depend on the ability to calculate the number of integer values that satisfy a given set of linear constraints. This count (the enumerator of a parametric polytope) is a function of the symbolic parameters that may appear in the constraints. In an extended problem (the “integer projection ” of a parametric polytope), some of the variables that appear in the constraints may be existentially quantified and then the enumerated set corresponds to the projection of the integer points in a parametric polytope. This paper shows how to reduce the enumeration of the integer projection of parametric polytopes to the enumeration of parametric polytopes. Two approaches are described and experimentally compared. Both can solve problems that were considered very difficult to solve analytically. 1