Results 1 -
2 of
2
ARCHITECTURE-AWARE CLASSICAL TAYLOR SHIFT BY 1
, 2005
"... We present algorithms that outperform straightforward implementations of classical Taylor shift by 1. For input polynomials of low degrees a method of the SACLIB library is faster than straightforward implementations by a factor of at least 2; for higher degrees we develop a method that is faster th ..."
Abstract
-
Cited by 15 (2 self)
- Add to MetaCart
We present algorithms that outperform straightforward implementations of classical Taylor shift by 1. For input polynomials of low degrees a method of the SACLIB library is faster than straightforward implementations by a factor of at least 2; for higher degrees we develop a method that is faster than straightforward implementations by a factor of up to 7. Our Taylor shift algorithm requires more word additions than straightforward implementations but it reduces the number of cycles per word addition by reducing memory tra c and the number of carry computations. The introduction of signed digits, suspended normalization, radix reduction, and delayed carry propagation enables our algorithm to take advantage of the technique of register tiling which is commonly used by optimizing compilers. While our algorithm is written in a high-level language, it depends on several parameters that can be tuned to the underlying architecture.
Loop Transformation Recipes for Code Generation and Auto-Tuning
"... Abstract. In this paper, we describe transformation recipes, which provide a high-level interface to the code transformation and code generation capability of a compiler. These recipes can be generated by compiler decision algorithms or savvy software developers. This interface is part of an auto-tu ..."
Abstract
-
Cited by 4 (1 self)
- Add to MetaCart
Abstract. In this paper, we describe transformation recipes, which provide a high-level interface to the code transformation and code generation capability of a compiler. These recipes can be generated by compiler decision algorithms or savvy software developers. This interface is part of an auto-tuning framework that explores a set of different implementations of the same computation and automatically selects the best-performing implementation. Along with the original computation, a transformation recipe specifies a range of implementations of the computation resulting from composing a set of high-level code transformations. In our system, an underlying polyhedral framework coupled with transformation algorithms takes this set of transformations, composes them and automatically generates correct code. We first describe an abstract interface for transformation recipes, which we propose to facilitate interoperability with other transformation frameworks. We then focus on the specific transformation recipe interface used in our compiler and present performance results on its application to kernel and library tuning and tuning of key computations in high-end applications. We also show how this framework can be used to generate and auto-tune parallel OpenMP or CUDA code from a high-level specification. 1

