Results 1 -
2 of
2
Is Search Really Necessary to Generate High-Performance BLAS?
, 2005
"... Abstract — A key step in program optimization is the estimation of optimal values for parameters such as tile sizes and loop unrolling factors. Traditional compilers use simple analytical models to compute these values. In contrast, library generators like ATLAS use global search over the space of p ..."
Abstract
-
Cited by 32 (7 self)
- Add to MetaCart
Abstract — A key step in program optimization is the estimation of optimal values for parameters such as tile sizes and loop unrolling factors. Traditional compilers use simple analytical models to compute these values. In contrast, library generators like ATLAS use global search over the space of parameter values by generating programs with many different combinations of parameter values, and running them on the actual hardware to determine which values give the best performance. It is widely believed that traditional model-driven optimization cannot compete with search-based empirical optimization because tractable analytical models cannot capture all the complexities of modern high-performance architectures, but few quantitative comparisons have been done to date. To make such a comparison, we replaced the global search engine in ATLAS with a model-driven optimization engine, and measured the relative performance of the code produced by the two systems on a variety of architectures. Since both systems use the same code generator, any differences in the performance of the code produced by the two systems can come only from differences in optimization parameter values. Our experiments show that model-driven optimization can be surprisingly effective, and can generate code with performance comparable to that of code generated by ATLAS using global search. Index Terms — program optimization, empirical optimization, model-driven optimization, compilers, library generators, BLAS, high-performance computing
Exploring the structure of the space of compilation sequences using randomized search algorithms
- Proc of the 2004 Los Alamos Computer Science Institute (LACSI) Symposium
, 2004
"... Modern optimizing compilers apply a fixed sequence of optimizations, which we call a compilation sequence, to each program that they compile. These compilers let the user modify their behavior in a small number of specified ways, using command-line flags (e.g.,-O1,-O2,...). For five years, we have b ..."
Abstract
-
Cited by 5 (1 self)
- Add to MetaCart
Modern optimizing compilers apply a fixed sequence of optimizations, which we call a compilation sequence, to each program that they compile. These compilers let the user modify their behavior in a small number of specified ways, using command-line flags (e.g.,-O1,-O2,...). For five years, we have been working with compilers that automatically select an appropriate compilation sequence for each input program. These adaptive compilers discover a good compilation sequence tailored to the input program, the target machine, and a user-chosen objective function. We have shown, as have others, that program-specific sequences can produce better results than any single universal sequence [1, 23, 7, 10, 21] Our adaptive compiler looks for compilation sequences in a large and complex search space. Its typical compilation sequence includes 10 passes (with possible repeats) chosen from the 16 available—there are 16 10 or 1,099,511,627,776 such sequences. To learn about the properties of such spaces, we have studied subspaces that consist of 10 passes drawn from a set of 5 (5 10 or 9,765,625 sequences). These 10of-5 subspaces are small enough that we can analyze them thoroughly but large enough to reflect important properties of the full spaces. This paper reports, in detail, on our analysis of several of these subspaces and on the consequences of those observed properties for the design of search algorithms. 1 Compilation Sequences Compilers operate by applying a fixed sequence of optimizations, called a compilation sequence, to all programs. The compiler writer must select ten to twenty optimizations from the hundreds that have been pro-

