## Control Flow driven Code Hoisting at the Source Code Level

### BibTeX

@MISC{Falk_controlflow,

author = {Heiko Falk},

title = {Control Flow driven Code Hoisting at the Source Code Level},

year = {}

}

### Abstract

This paper presents a novel source code optimization technique called advanced code hoisting, which moves portions of code from inner loops to outer ones. In contrast to existing code motion techniques, it takes control flow into account: depending on the conditions of if-statements, moving an expression can increase the number of times that expression is executed. The paper gives formal descriptions of the polyhedral models used for control flow analysis, which make it possible to suppress a code motion in such a situation. Because source code transformations are inherently portable, very detailed benchmarking on 8 different processors was performed. Applying the implemented techniques to real-life multimedia benchmarks leads to average speed-ups of 25.5%–52% and energy savings of 33.4%–74.5%. Furthermore, advanced code hoisting leads to improved pipeline and cache behavior and smaller code sizes.
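The trade-off described in the abstract can be illustrated with a small sketch. It is not the paper's polyhedral analysis: it simply enumerates a two-level loop nest by brute force and compares how often an expression would execute inside a guarded inner loop versus hoisted into the outer loop body. The function names, the rectangular loop bounds, and the assumption that the expression depends only on the outer index `i` are all hypothetical choices made for illustration.

```python
def count_inner(n_outer, n_inner, guard):
    """Executions of the expression if it stays inside the guarded inner loop."""
    return sum(
        1
        for i in range(n_outer)
        for j in range(n_inner)
        if guard(i, j)
    )


def count_hoisted(n_outer):
    """Executions if the expression is moved to the outer loop body (once per i)."""
    return n_outer


def should_hoist(n_outer, n_inner, guard):
    """Hoist only if that strictly reduces the number of executions."""
    return count_hoisted(n_outer) < count_inner(n_outer, n_inner, guard)


def often(i, j):
    # Hypothetical guard that holds in most inner iterations.
    return j >= 2


def rarely(i, j):
    # Hypothetical guard that holds in exactly one iteration of the nest.
    return i == 0 and j == 0


hoist_often = should_hoist(10, 10, often)    # 80 guarded executions vs. 10 hoisted
hoist_rarely = should_hoist(10, 10, rarely)  # 1 guarded execution vs. 10 hoisted
```

With the `often` guard the move pays off, while with the `rarely` guard hoisting would raise the execution count from 1 to 10 and should be suppressed. Brute-force enumeration is used here only because it is easy to check by hand; it scales with the size of the iteration space, which is why a symbolic counting method is attractive for real loop nests.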

### Citations

1023 | Advanced Compiler Design and Implementation
- Muchnick
- 1997
Citation Context ...ern compilers are equipped with a large amount of different optimizations, among which common subexpression elimination (CSE) and loop-invariant code motion (LICM) have proven to be highly beneficial [17]. This paper presents a new source code optimization called advanced code hoisting (ACH). This technique is an elaborate combination of the already known CSE and LICM optimizations with a formal mathe...

111 | A library for doing polyhedral operations
- Wilde
- 1993
Citation Context ...on operators for polytopes are used in order to construct PFOR_m and PIF_m. Unfortunately, polytopes are not closed under the union operator. Instead, we use finite unions of polyhedra as proposed in [20] for which the union operator is closed. Hence, PΓ formally is not a polytope, but a finite union of polytopes. For the sake of simplicity, we keep on using the notion of polytopes instead of their fi...

74 | Parametric analysis of polyhedral iteration spaces
- Clauss, Loechner
- 1998
Citation Context ... a polytope is #P-complete [14] in terms of its number of linear (in-)equations and their dimensions. In order to determine the execution frequency of an expression expr, the techniques described in [5] for the computation of a polytope’s size are applied to PΓ. A detailed description of these techniques is omitted here since it is beyond the scope of this paper. In short, the parametric vertices ...

30 | PolyLib: A library for manipulating parameterized polyhedra
- Loechner
- 1999
Citation Context ...imes of only a few CPU seconds. 4. Benchmarking Results The techniques presented in the previous section are fully automated using the SUIF intermediate format [21] and the polyhedral library Polylib [15]. ACH is applied to the source codes of two representative benchmarks having passed the DTSE transformations (cf. Section 2). The CAVITY benchmark is a medical tomography image processor [2], and the ...

30 | QSDPCM – a new technique in scene adaptive coding
- Strobach
- 1988
Citation Context ... the source codes of two representative benchmarks having passed the DTSE transformations (cf. Section 2). The CAVITY benchmark is a medical tomography image processor [2], and the QSDPCM application [19] performs scene adaptive coding. The efficiency of the polyhedral analysis employed for ACH is apparent by virtue of the low runtimes required for the optimization of these benchmarks. Using a Pentium...

16 | Control Flow driven Splitting of Loop Nests at the Source Code Level
- Falk, Marwedel
- 2003
Citation Context ...of optimized data layouts and temporal locality improvement is presented in [16]. In this article, geometric models and algorithms are used to minimize TLB misses. Loop nest splitting as presented in [7, 8] uses polyhedral models in order to represent if-statements nested in loops. A complex analysis is performed in order to detect ranges of iterations of the loops such that all if-statements are satisf...

15 | Partial Redundancy Elimination Driven by a Cost-Benefit Analysis
- Horspool, Ho
- 1997
Citation Context ...erations of the loops such that all if-statements are satisfied. Using these iteration ranges, a loop nest is split in order to minimize if-statement executions. Partial Redundancy Elimination (PRE) [12, 10, 3] moves conditionally executed expressions outside their conditional scopes to enable the elimination of partial redundancies along frequently executed paths in a program. The most important disadvanta...

14 | Compiler Transformations for High-Performance Computing
- Bacon et al.
- 1994
Citation Context ...ection 3 presents the analytical models for advanced code hoisting. Section 4 describes the benchmarking results, and Section 5 summarizes and concludes this paper. 2. Related Work Since CSE and LICM [1, 17] have been known for many years and can be found in any optimizing compiler, the discussion of these simple transformations is omitted here. However, literature clearly states that both CSE and LICM a...

13 | Optimal and efficient speculation-based partial redundancy elimination
- Cai, Xue
Citation Context ...erations of the loops such that all if-statements are satisfied. Using these iteration ranges, a loop nest is split in order to minimize if-statement executions. Partial Redundancy Elimination (PRE) [12, 10, 3] moves conditionally executed expressions outside their conditional scopes to enable the elimination of partial redundancies along frequently executed paths in a program. The most important disadvanta...

11 | Data access and storage management for embedded programmable processors
- Catthoor et al.
- 2002
Citation Context ... explicitly by human programmers but is generated automatically by a compiler, the programmer is often unaware of the overhead due to memory accesses. The DTSE framework [4] of source code optimizations aims at the optimized exploitation of memory hierarchies and thus has the effect of making addressing code exp... [Figure 3: DTSE stages causing Control & Addressing Overhead; stage label: Memory Allocation & Assignment]

11 | Some algorithmic problems in polytope theory
- Kaibel, Pfetsch
- 2003
Citation Context ...rding to definition 6. The execution frequency of the expression expr = γM+1 is equal to the size of PΓ: #expr = |PΓ|. The computation of the number of points included in a polytope is #P-complete [14] in terms of its number of linear (in-)equations and their dimensions. In order to determine the execution frequency of an expression expr, the techniques described in [5] for the computation of a po...

7 | Source Code Optimization Techniques for Data Flow Dominated Embedded Software
- FALK, P
Citation Context ...ble Using complete induction, theorem 1 can be proven. Due to the lack of space, the proof – as well as the ones for the following theorems – is not given in this paper. Instead, they can be found in [6]. For a given if-statement γm ∈ ϒ, the following polytope PIF_m is generated: Definition 8 Let Γ = (γ1,...,γM,γM+1) be a nest of control flow structures composed of a loop nest Λ and a set of if-stateme...

6 | Combined data partitioning and loop nest splitting for energy consumption minimization
- Falk, Verma
- 2004
Citation Context ...of optimized data layouts and temporal locality improvement is presented in [16]. In this article, geometric models and algorithms are used to minimize TLB misses. Loop nest splitting as presented in [7, 8] uses polyhedral models in order to represent if-statements nested in loops. A complex analysis is performed in order to detect ranges of iterations of the loops such that all if-statements are satisf...

6 | An Overview
- Wilson, Franch et al.
- 1995
Citation Context ...pplications leads to feasibly short runtimes of only a few CPU seconds. 4. Benchmarking Results The techniques presented in the previous section are fully automated using the SUIF intermediate format [21] and the polyhedral library Polylib [15]. ACH is applied to the source codes of two representative benchmarks having passed the DTSE transformations (cf. Section 2). The CAVITY benchmark is a medical ...

5 | Analysis of High-level Address Code Transformations for Programmable Processors
- Gupta, Miranda et al.
- 2000
Citation Context ...he same value as ai, but A′ is generated such that a maximal reuse of computations using CSEs is achieved. As a consequence, these algebraic transformations open up opportunities for CSE as stated in [11]. For the ADOPT transformations, experiments involving the manual application of a CSE combined with conventional LICM are reported. It is the contribution of this paper that a formal problem definiti...

5 | Formalized methodology for data reuse exploration for low-power hierarchical memory mappings
- Wuytack et al.
- 1998
Citation Context ...encies requires the insertion of if-statements depending on the loops’ index variables so as to access the appropriate memory locations at every point in time. The main idea of data reuse exploration [22] is to insert copies of the most frequently accessed parts of arrays in order to improve temporal locality. This also leads to a degraded control flow due to the insertion of if-statements so as to se...

4 | Automatic Segmentation of Cardiac MR Images
- Bister, Y. Taeymans et al.
- 1989
Citation Context ... requires the generation of complex addressing code reflecting the mapping of data elements to positions within an array. The effect of these parts of the DTSE methodology on a source code taken from [2] is illustrated in Figure 4. Since many different data elements are stored at the same addresses after in-place mapping, the same kind of address computations is performed several times at various l...

1 | Memory Size Reduction through Storage Order Optimization for embedded parallel Multimedia Applications
- Greef, F. Catthoor et al.
- 1997
Citation Context ... locality. This also leads to a degraded control flow due to the insertion of if-statements so as to select all relevant non-contiguous parts of an array that will be held in a copy. In-place mapping [9] aims at reusing physical memory by mapping different array elements that are not alive at the same time to the same memory location. This optimization requires the generation of complex addressing co...

1 | Path Profile Guided Partial Redundancy Elimination Using Speculation
- Gupta, A. Berson et al.
- 1998
Citation Context ...erations of the loops such that all if-statements are satisfied. Using these iteration ranges, a loop nest is split in order to minimize if-statement executions. Partial Redundancy Elimination (PRE) [12, 10, 3] moves conditionally executed expressions outside their conditional scopes to enable the elimination of partial redundancies along frequently executed paths in a program. The most important disadvanta...

1 | Word-level Algebraic Optimisation Techniques for Accelerator Data-Paths and custom Address Generators
- Janssen
- 2000
Citation Context ...of address computations is performed several times at various locations in a DTSE optimized code. This is already acknowledged by the address optimization (ADOPT) phase. During regularity improvement [13], given address expressions A = {a1,...,an} are transformed to A′ = {a′1,...,a′n}. Every expression a′i computes the same value as ai, but A′ is generated such that a maximal reuse of computati...

1 | Precise Data Locality Optimization of nested Loops
- Loechner, Meister et al.
Citation Context ...ently used in order to represent memory accesses or iteration spaces of loop nests. An approach for simultaneous generation of optimized data layouts and temporal locality improvement is presented in [16]. In this article, geometric models and algorithms are used to minimize TLB misses. Loop nest splitting as presented in [7, 8] uses polyhedral models in order to represent if-statements nested in loop...

1 | An accurate and fine grain instruction-level energy model supporting software optimizations
- Steinke, Knauer et al.
- 2001
Citation Context ... on memory accesses and energy consumption using an instruction-level energy model [18] for the ARM7 core considering bit-toggles and off-chip memories and having an accuracy of 1.7%. The first four columns of Figure 10 depict the relative number ... [Figure 10: Energy Consumption; column labels: Instr Read, Data Read, Data Write, Mem Acces...] [Preceding chart: percentage axis over Sun, Pentium, HP, MIPS, TriMedia, TI C6x, ARM7 thumb, ARM7 arm, Average, for CAVITY and QSDPCM]