Results 1–6 of 6
Is Search Really Necessary to Generate High-Performance BLAS?
, 2005
Abstract

Cited by 42 (8 self)
Abstract — A key step in program optimization is the estimation of optimal values for parameters such as tile sizes and loop unrolling factors. Traditional compilers use simple analytical models to compute these values. In contrast, library generators like ATLAS use global search over the space of parameter values by generating programs with many different combinations of parameter values, and running them on the actual hardware to determine which values give the best performance. It is widely believed that traditional model-driven optimization cannot compete with search-based empirical optimization because tractable analytical models cannot capture all the complexities of modern high-performance architectures, but few quantitative comparisons have been done to date. To make such a comparison, we replaced the global search engine in ATLAS with a model-driven optimization engine, and measured the relative performance of the code produced by the two systems on a variety of architectures. Since both systems use the same code generator, any differences in the performance of the code produced by the two systems can come only from differences in optimization parameter values. Our experiments show that model-driven optimization can be surprisingly effective, and can generate code with performance comparable to that of code generated by ATLAS using global search.
Index Terms — program optimization, empirical optimization, model-driven optimization, compilers, library generators, BLAS, high-performance computing
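The empirical approach the abstract attributes to ATLAS can be illustrated with a minimal sketch: generate a candidate kernel for each parameter value, time each one on the actual machine, and keep the fastest. The pure-Python blocked matrix-multiply kernel, the matrix size, and the candidate tile sizes below are all illustrative assumptions, not ATLAS's actual code generator.

```python
import time

def blocked_matmul(A, B, n, tile):
    """Blocked (tiled) multiply of two n x n matrices stored as lists of lists."""
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, n, tile):
            for jj in range(0, n, tile):
                for i in range(ii, min(ii + tile, n)):
                    Ai, Ci = A[i], C[i]
                    for k in range(kk, min(kk + tile, n)):
                        a, Bk = Ai[k], B[k]
                        for j in range(jj, min(jj + tile, n)):
                            Ci[j] += a * Bk[j]
    return C

def search_tile_size(n=64, candidates=(4, 8, 16, 32)):
    """ATLAS-style empirical search: time each candidate tile size, keep the fastest."""
    A = [[float(i + j) for j in range(n)] for i in range(n)]
    B = [[float(i - j) for j in range(n)] for i in range(n)]
    timings = {}
    for tile in candidates:
        start = time.perf_counter()
        blocked_matmul(A, B, n, tile)
        timings[tile] = time.perf_counter() - start
    best = min(timings, key=timings.get)
    return best, timings

best, timings = search_tile_size()
print("best tile size on this machine:", best)
```

A model-driven engine, by contrast, would compute the tile size directly from cache parameters instead of running the candidates, which is exactly the trade-off the paper measures.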
Exploring the structure of the space of compilation sequences using randomized search algorithms
Proc. of the 2004 Los Alamos Computer Science Institute (LACSI) Symposium
, 2004
Abstract

Cited by 8 (1 self)
Modern optimizing compilers apply a fixed sequence of optimizations, which we call a compilation sequence, to each program that they compile. These compilers let the user modify their behavior in a small number of specified ways, using command-line flags (e.g., -O1, -O2, ...). For five years, we have been working with compilers that automatically select an appropriate compilation sequence for each input program. These adaptive compilers discover a good compilation sequence tailored to the input program, the target machine, and a user-chosen objective function. We have shown, as have others, that program-specific sequences can produce better results than any single universal sequence [1, 23, 7, 10, 21]. Our adaptive compiler looks for compilation sequences in a large and complex search space. Its typical compilation sequence includes 10 passes (with possible repeats) chosen from the 16 available: there are 16^10, or 1,099,511,627,776, such sequences. To learn about the properties of such spaces, we have studied subspaces that consist of 10 passes drawn from a set of 5 (5^10, or 9,765,625, sequences). These 10-of-5 subspaces are small enough that we can analyze them thoroughly but large enough to reflect important properties of the full spaces. This paper reports, in detail, on our analysis of several of these subspaces and on the consequences of those observed properties for the design of search algorithms.
1 Compilation Sequences
Compilers operate by applying a fixed sequence of optimizations, called a compilation sequence, to all programs. The compiler writer must select ten to twenty optimizations from the hundreds that have been pro ...
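The space sizes quoted in the abstract follow directly from counting sequences with repetition: choosing a sequence of 10 passes from a pool of k distinct passes gives k^10 possibilities. A two-line check confirms the figures:

```python
# Full space: sequences of 10 passes (repeats allowed) from 16 optimizations.
full_space = 16 ** 10
print(full_space)   # 1099511627776, i.e. the 1,099,511,627,776 in the abstract

# The "10-of-5" subspaces: 10 passes drawn from a pool of 5.
subspace = 5 ** 10
print(subspace)     # 9765625
```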
unknown title
Abstract
A performance comparison between the Earth Simulator and other terascale systems on a characteristic ASCI workload ‡
PERFORMANCE OPTIMIZATION OF SYMMETRIC FACTORIZATION ALGORITHMS
Abstract
Abstract. Nonlinear optimization algorithms that use Newton's method to determine the search direction exhibit quadratic convergence locally. In the predominant case where the Hessian is positive definite, Cholesky factorization is a computationally efficient algorithm for evaluating the Newton search direction −[∇²f(x^(k))]^(−1) ∇f(x^(k)). If the Hessian is indefinite, then modified Cholesky algorithms make use of symmetric indefinite factorization to perturb the Hessian such that it is sufficiently positive definite and reasonably well-conditioned, while preserving as much as possible the information contained in the Hessian. This paper measures and compares the performance of algorithms implementing Cholesky factorization, symmetric indefinite factorization, and modified Cholesky factorization. From these performance data we estimate the work (runtime) involved in symmetric pivoting and in modifying the symmetric indefinite factorization. Furthermore, we evaluate the effect of the degree of indefiniteness of the symmetric matrix on performance. For each of these matrix factorizations we developed routines that implement a variety of performance optimization techniques, including loop reordering, blocking, and the use of tuned Basic Linear Algebra Subroutines.
1. Introduction. Nonlinear ...
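For the positive definite case the abstract describes, the Cholesky route avoids forming the inverse Hessian: factor H = L Lᵀ, then solve two triangular systems. A small NumPy sketch, with a made-up 2×2 Hessian and gradient purely for illustration (it uses NumPy's general solver rather than a dedicated triangular solve, and is not the paper's tuned implementation):

```python
import numpy as np

def newton_direction(H, g):
    """Newton search direction d = -H^{-1} g via Cholesky factorization.

    Factor H = L L^T (valid when H is positive definite), then solve
    L y = -g followed by L^T d = y, instead of inverting H.
    """
    L = np.linalg.cholesky(H)    # raises LinAlgError if H is not positive definite
    y = np.linalg.solve(L, -g)   # solve L y = -g
    d = np.linalg.solve(L.T, y)  # solve L^T d = y
    return d

# Illustrative positive definite Hessian and gradient (assumed values).
H = np.array([[4.0, 1.0],
              [1.0, 3.0]])
g = np.array([1.0, 2.0])
d = newton_direction(H, g)       # satisfies H d = -g
```

When H is indefinite, np.linalg.cholesky fails, which is precisely the situation where the modified Cholesky and symmetric indefinite factorizations compared in the paper take over.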
High Performance Computing Education for Students in Computational Engineering
Abstract
Numerical simulation using high performance computing has become a key technology for many scientific disciplines. Consequently, high performance computing courses constitute an essential component of the undergraduate and graduate programs in Computational Engineering at the University of Erlangen-Nuremberg. These courses are also offered as optional courses in other degree programs, such as for majors in computer science.
1 The Erlangen Computational Engineering Program
The courses in high performance computing at the University of Erlangen-Nuremberg are primarily motivated by the Computational Engineering (CE) program, which was initiated by the Department of Computer Science in 1997 as a prototype two-year postgraduate program leading to a Master's degree. The corresponding undergraduate program was started in 1999. Together, these two programs accept approximately 30 new undergraduate students and 45 graduate students annually. The traditional German university degree in the sciences and the engineering disciplines is the Diplom, which corresponds approximately to the academic level of a Master's degree in the US.
Studienarbeit: Cache Optimizations for the Lattice Boltzmann Method in 2D
Abstract
[Declaration of authorship, translated from German; the text begins mid-sentence:] ... sources, and that the work has not been submitted in the same or a similar form to any other examination authority, nor been accepted by one as part of an examination. All passages that were adopted verbatim or in substance are marked as such.