## Analytic Models and Empirical Search: A Hybrid Approach to Code Optimization (2005)

### Download Links

- [polaris.cs.uiuc.edu]
- [iss.ices.utexas.edu]
- [www.ece.lsu.edu]
- [www.csc.lsu.edu]
- [www.eecis.udel.edu]
- DBLP

### Other Repositories/Bibliography

Venue: Proceedings of the International Workshop on Languages and Compilers for Parallel Computing (LCPC)

Citations: 10 (0 self)

### BibTeX

@INPROCEEDINGS{Epshteyn05analyticmodels,
  author = {Arkady Epshteyn and Maria Garzaran and Gerald DeJong and David Padua and Gang Ren and Xiaoming Li and Kamen Yotov and Keshav Pingali},
  title = {Analytic Models and Empirical Search: A Hybrid Approach to Code Optimization},
  booktitle = {Proceedings of the International Workshop on Languages and Compilers for Parallel Computing (LCPC)},
  year = {2005}
}

### Abstract

Compilers employ system models, sometimes implicitly, to make code optimization decisions. These models are analytic: they reflect their implementor's understanding of and beliefs about the system. While their decisions can be made almost instantaneously, unless the model is perfect those decisions may be flawed.

### Citations

515 | FFTW: An adaptive software architecture for the FFT
- Frigo, Johnson
- 1998
Citation Context: ...del. But it is well suited to library generation where the high cost of optimal configuration decisions can be paid once. Well-known library generators that employ empirical optimization include FFTW [12], ATLAS [17], PhiPAC [3] and SPIRAL [19]. An alternative decision procedure is an adaptive hybrid which includes only the prior information from the designer which he or she is most confident of. The ...

244 | Optimizing matrix multiply using phipac: a portable, high-performance, ansi c coding methodology
- Bilmes, Asanovic, et al.
- 1997
Citation Context: ...d to library generation where the high cost of optimal configuration decisions can be paid once. Well-known library generators that employ empirical optimization include FFTW [12], ATLAS [17], PhiPAC [3] and SPIRAL [19]. An alternative decision procedure is an adaptive hybrid which includes only the prior information from the designer which he or she is most confident of. The rest is then filled in e...

227 | Optimizing Compilers for Modern Architectures
- Allen, Kennedy
- 2001
Citation Context: ...NU elements of matrix B and stores the result into a MU × NU sub-matrix of C. MU and NU are optimization parameters that must be chosen so that MU + NU + MU × NU fit in the registers of the processor [2]. To improve register allocation, ATLAS uses scalar replacement [5]: each element of A, B and C that is accessed in the unrolled micro-MMM code is assigned to a scalar. The array accesses in the micro...
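The register-blocking constraint described in this context (MU + NU + MU × NU values held in registers, with array accesses replaced by scalars) can be sketched in C. The 2×2 block size, the packed operand layout, and the function name below are illustrative assumptions for this page, not ATLAS's actual generated code:

```c
/* Sketch of a register-blocked micro-MMM with scalar replacement,
 * assuming MU = NU = 2, so MU + NU + MU*NU = 8 values are live in
 * registers. A is assumed packed in MU-wide column panels and B in
 * NU-wide row panels (a hypothetical layout for illustration). */
enum { MU = 2, NU = 2 };

void micro_mmm(int K, const double *A, const double *B, double *C, int ldc)
{
    /* Scalar replacement: the MU x NU block of C and the current
     * fragments of A and B live in scalars the compiler can keep
     * in registers. */
    double c00 = 0.0, c01 = 0.0, c10 = 0.0, c11 = 0.0;
    for (int k = 0; k < K; k++) {
        double a0 = A[k * MU + 0], a1 = A[k * MU + 1]; /* MU elements of A */
        double b0 = B[k * NU + 0], b1 = B[k * NU + 1]; /* NU elements of B */
        c00 += a0 * b0;  c01 += a0 * b1;
        c10 += a1 * b0;  c11 += a1 * b1;
    }
    C[0 * ldc + 0] += c00;  C[0 * ldc + 1] += c01;
    C[1 * ldc + 0] += c10;  C[1 * ldc + 1] += c11;
}
```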

220 | Tile size selection using cache organization and data layout
- Coleman, McKinley
- 1995
Citation Context: ...of smaller sub-blocks. The size of each sub-block is NB × NB, where NB is an optimization parameter that needs to be chosen so that the working set of the sub-blocks being multiplied fits in the cache [4,7,18]. We call the resulting code mini-MMM. Register blocking: The mini-MMM code itself is blocked and then unrolled to optimize the utilization of the registers. The resulting code, that we call micro-M...
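The cache-blocking decomposition this context describes (choose NB so the sub-blocks' working set fits in cache, then run a mini-MMM per block pair) can be sketched as follows; the NB value, row-major layout, and function name are illustrative assumptions, not ATLAS's generated code:

```c
/* Sketch of cache blocking for MMM: the n x n product C += A * B is
 * decomposed into NB x NB sub-block products (mini-MMMs). NB = 4 is
 * illustrative; ATLAS picks NB so the working set fits in L1 cache. */
enum { NB = 4 };

void blocked_mmm(int n, const double *A, const double *B, double *C)
{
    for (int jj = 0; jj < n; jj += NB)      /* iterate over column blocks */
        for (int kk = 0; kk < n; kk += NB)  /* iterate over inner blocks  */
            /* mini-MMM on the current pair of NB-wide blocks */
            for (int i = 0; i < n; i++)
                for (int j = jj; j < jj + NB && j < n; j++) {
                    double sum = C[i * n + j];
                    for (int k = kk; k < kk + NB && k < n; k++)
                        sum += A[i * n + k] * B[k * n + j];
                    C[i * n + j] = sum;
                }
}
```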

214 | Improving register allocation for subscripted variables
- Callahan, Carr, et al.
- 1990
Citation Context: ...trix of C. MU and NU are optimization parameters that must be chosen so that MU + NU + MU × NU fit in the registers of the processor [2]. To improve register allocation, ATLAS uses scalar replacement [5]: each element of A, B and C that is accessed in the unrolled micro-MMM code is assigned to a scalar. The array accesses in the micro-MMM code are replaced by these scalar variables. ATLAS expects tha...

162 | The Bayesian Choice
- Robert
- 2001
Citation Context: ...the probability distribution over β after we see the sample D using Bayes' rule as follows: P(β|D) = P(D|β)π(β)/P(D) ∝ P(D|β)π(l1, l2), and picking the regression curve β̂ that maximizes P(β|D) [14]. Notice that maximizing the posterior involves a trade-off between fitting the data (P(D|β)) and respecting the prediction of the model (π(l1, l2)). P(D|β) = (1/√(2πσ²))^n · e^(−Σ_{i=1..n} (performan...
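The trade-off in this context, fitting the measured data (P(D|β)) against the analytic model's prior (π), can be sketched as a MAP grid search. The one-dimensional parameter, the Gaussian prior, and all names below are simplifying assumptions for illustration, not the paper's actual formulation:

```c
#include <math.h>

/* Illustrative MAP sketch: over a grid of candidate slopes beta for a
 * model y = beta * x, maximize log P(D|beta) + log pi(beta), balancing
 * fit to the measured data against a Gaussian prior centered on the
 * analytic model's predicted parameter value. */
double map_estimate(const double *x, const double *y, int n,
                    double sigma,        /* measurement noise std. dev.   */
                    double model_beta,   /* analytic model's prediction   */
                    double prior_sigma)  /* confidence in that prediction */
{
    double best_beta = model_beta, best_logp = -INFINITY;
    for (double beta = model_beta - 5.0; beta <= model_beta + 5.0; beta += 0.01) {
        double logp = 0.0;
        for (int i = 0; i < n; i++) {    /* Gaussian log-likelihood */
            double r = y[i] - beta * x[i];
            logp -= r * r / (2.0 * sigma * sigma);
        }
        double d = beta - model_beta;    /* Gaussian log-prior */
        logp -= d * d / (2.0 * prior_sigma * prior_sigma);
        if (logp > best_logp) { best_logp = logp; best_beta = beta; }
    }
    return best_beta;
}
```

A tight `prior_sigma` keeps the estimate near the model's prediction; a loose one lets the data dominate, which is exactly the fit-versus-model trade-off the snippet describes.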

128 |
Iteration Space Tiling for Memory Hierarchies
- Wolfe
- 1989
Citation Context: ...his transformation converts matrix multiplication into a sequence of smaller matrix multiplications. Blocking can be accomplished by a loop transformation called tiling, which was introduced by Wolfe [18]. ATLAS applies blocking at the cache and the register level: Cache Blocking: ATLAS uses blocking to decompose the matrix multiplication of large matrices into the multiplication of smaller sub-bloc...

119 | Compiler optimization-space exploration
- Triantafyllis, Vachharajani, et al.
- 2003
Citation Context: ...brary installation time (for instance, small blocking parameter values will be selected if the user only multiplies small matrices). 2) Efficient adaptation can be applied at the time of compilation. [16] describes a compile-time optimization framework that employs empirical search which receives performance feedback from a fast estimator. 3) The space of possible versions can be too large even for on...

96 | Adaptive optimizing compilers for the 21st century
- Cooper, Subramanian, et al.
- 2002
Citation Context: ...k Machine learning has been applied to construct adaptive compiler optimizers before. Cooper et al., for example, use genetic algorithms to search through sequences of optimizing code transformations [9]. Using genetic algorithms (and other machine learning optimization algorithms) can be time-consuming in a large space of possible optimizations. These techniques have also been extended to search for...

91 | A comparison of empirical and model-driven optimization
- Yotov, Li, et al.
- 2003
Citation Context: ...performs a near-exhaustive sampling of a region of the parameter space. It is this module that we replace in our experiments. In one experimental condition it is replaced by a leading analytic model [20], in a second it is replaced by our adaptive system, and in a third the original ATLAS routine is employed. In all three cases the remainder of the MMM generation code is unchanged as are the routines...

88 | SPL: A Language and Compiler for DSP algorithms
- Xiong, Johnson, et al.
- 2001
Citation Context: ...neration where the high cost of optimal configuration decisions can be paid once. Well-known library generators that employ empirical optimization include FFTW [12], ATLAS [17], PhiPAC [3] and SPIRAL [19]. An alternative decision procedure is an adaptive hybrid which includes only the prior information from the designer which he or she is most confident of. The rest is then filled in empirically. The ...

63 | Dynamic feedback: An effective technique for adaptive computing
- Diniz, Rinard
- 1997
Citation Context: ...earch involves measuring the performance of various versions of pre-compiled code during the sampling phase of the execution, and then using the best version during the (much longer) production phase [11]. Note that runtime searching tailors the optimization system to requirements of the user that are not available at library installation time (for instance, small blocking parameter values will be selected...

54 | Is search really necessary to generate high-performance BLAS
- Yotov, Li, et al.
- 2005
Citation Context: ...We also compare the experimental results obtained by our approach with the results obtained by Yotov's model. Thus, in this section we summarize it. A further description of the model can be found in [20,21]. The model depends on accurate estimates of machine parameters that include the L1 cache and line size, the number of registers, the latency of the multiply instruction, the existence of a fused mult...

51 | (Pen)-ultimate tiling
- Boulet, Darte, et al.
- 1994
Citation Context: ...of smaller sub-blocks. The size of each sub-block is NB × NB, where NB is an optimization parameter that needs to be chosen so that the working set of the sub-blocks being multiplied fits in the cache [4,7,18]. We call the resulting code mini-MMM. Register blocking: The mini-MMM code itself is blocked and then unrolled to optimize the utilization of the registers. The resulting code, that we call micro-M...

51 | Investigating Explanation-Based Learning
- DeJong
- 1993
Citation Context: ...is the motivation and the subject of our current research which we offer as the first tentative steps along a lengthy but, we believe, promising path. We employ an Explanation-Based Learning paradigm [10]. Empirical results are treated as illustrations or manifestations of a deeper pattern to be discovered. They are explained in terms of the existing partial model and therefore serve to refine the mod...

46 | A dynamically tuned sorting library
- Li, Garzaran, et al.
- 2004
Citation Context: ...be time-consuming in a large space of possible optimizations. These techniques have also been extended to search for entire versions of algorithms, as opposed to just code transformations. Li et al. [13] present a two-phase algorithm for optimizing sorting. The first (offline) phase performs a search to construct a mapping from the parameters of a sorted array (its data entropy and size) to the best-...

32 | Improving software pipelining with unroll-and-jam
- Carr, Ding, et al.
- 1996
Citation Context: ...the processor can continue executing instructions that do not depend on the missed data. A larger block size also increases the opportunity for higher ILP and for the compiler to reorder instructions [6]. Notice that tiling for L2 may not always be the best choice, because large tiles can result in more time spent in the cleanup code, which can degrade performance for some of the codes calling the MM...

27 | A framework for adaptive algorithm selection in STAPL
- Thomas, Tanase, et al.
- 2005
Citation Context: ...ATLAS Search, Model, and Adaptive Search. apply the best sorting algorithm to the given array at runtime. A similar framework was applied by Thomas et al. to optimize parallel matrix multiplication [15]. An important feature which distinguishes our approach to searching is explicit integration of information from the analytic model to guide the search, thereby reducing its time. We believe that adap...

9 | Investigating Adaptive Compilation using the MIPSPro Compiler
- Cooper, Waterman
- 2003
Citation Context: ...<= N; i++) for (k = 1; k <= K; k++) C[i][j] = C[i][j] + A[i][k] * B[k][j] (Fig. 1. Matrix Multiplication Code). The code implementing a MMM is shown in Figure 1. Yotov et al. [20,21] and Cooper et al. [8] found that computing this matrix multiplication using the library generated by ATLAS results in higher performance than that obtained when the naive MMM implementation in Figure 1 is compiled using a...
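For reference, the naive triply nested loop of Figure 1 (the baseline that the generated library outperforms) can be rendered as compilable C; the 0-indexed, row-major layout and the function name are our choices here, not the paper's listing:

```c
/* Naive MMM corresponding to Figure 1: C += A * B for an M x K matrix A
 * and a K x N matrix B, stored row-major, with no blocking or unrolling. */
void naive_mmm(int M, int N, int K,
               const double *A, const double *B, double *C)
{
    for (int i = 0; i < M; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < K; k++)
                C[i * N + j] += A[i * K + k] * B[k * N + j];
}
```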