ARCHITECTURE-AWARE CLASSICAL TAYLOR SHIFT BY 1
2005
Cited by 15 (2 self)
We present algorithms that outperform straightforward implementations of classical Taylor shift by 1. For input polynomials of low degrees a method of the SACLIB library is faster than straightforward implementations by a factor of at least 2; for higher degrees we develop a method that is faster than straightforward implementations by a factor of up to 7. Our Taylor shift algorithm requires more word additions than straightforward implementations but it reduces the number of cycles per word addition by reducing memory traffic and the number of carry computations. The introduction of signed digits, suspended normalization, radix reduction, and delayed carry propagation enables our algorithm to take advantage of the technique of register tiling which is commonly used by optimizing compilers. While our algorithm is written in a high-level language, it depends on several parameters that can be tuned to the underlying architecture.
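For orientation, the straightforward classical method that the abstract uses as its baseline can be sketched as below. This is a minimal Python illustration, not the register-tiled algorithm of the paper; the function name `taylor_shift_by_1` and the low-to-high coefficient order are assumptions made for this example. For a polynomial p of degree n it computes the coefficients of p(x + 1) with n(n + 1)/2 coefficient additions.

```python
def taylor_shift_by_1(coeffs):
    """Return the coefficients of p(x + 1), given those of p(x).

    coeffs[k] is the coefficient of x**k (low-to-high order, an
    assumption of this sketch). The input list is not modified.
    """
    a = list(coeffs)
    n = len(a) - 1  # degree of the polynomial
    # Classical scheme: n passes of running additions, each pass one
    # step shorter than the previous one.
    for i in range(n):
        for j in range(n - 1, i - 1, -1):
            a[j] += a[j + 1]
    return a


if __name__ == "__main__":
    # p(x) = x^2, so p(x + 1) = x^2 + 2x + 1
    print(taylor_shift_by_1([0, 0, 1]))
```

Every operation in the inner loop is an addition of two integers whose size grows with the pass count, which is why memory traffic and carry handling, the costs named in the abstract, dominate the running time for large inputs.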
HIGH-PERFORMANCE IMPLEMENTATIONS OF THE DESCARTES METHOD
2006
Cited by 8 (0 self)
The Descartes method for polynomial real root isolation can be performed with respect to monomial bases and with respect to Bernstein bases. The first variant uses Taylor shift by 1 as its main subalgorithm, the second uses de Casteljau’s algorithm. When applied to integer polynomials, the two variants have co-dominant, almost tight computing time bounds. Implementations of either variant can obtain speed-ups over previous state-of-the-art implementations by more than an order of magnitude if they use features of the processor architecture. We present an implementation of the Bernstein-bases variant of the Descartes method that automatically generates architecture-aware high-level code and leaves further optimizations to the compiler. We compare the performance of our implementation, algorithmically tuned implementations of the monomial and Bernstein variants, and architecture-unaware implementations of both variants on four different processor architectures and for three classes of input polynomials.
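To illustrate how the monomial variant relies on Taylor shift by 1, here is a minimal Python sketch of the sign-variation test that the Descartes method applies to an interval, shown for (0, 1). It is a textbook-style illustration under stated assumptions, not the implementation described in the abstract: the function names and the low-to-high coefficient order are hypothetical choices made for this example, and the subdivision loop of the full method is omitted.

```python
def taylor_shift_by_1(coeffs):
    """Coefficients of p(x + 1); coeffs[k] multiplies x**k."""
    a = list(coeffs)
    n = len(a) - 1
    for i in range(n):
        for j in range(n - 1, i - 1, -1):
            a[j] += a[j + 1]
    return a


def sign_variations(coeffs):
    """Count sign changes in the coefficient sequence, skipping zeros."""
    signs = [c > 0 for c in coeffs if c != 0]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)


def descartes_test_unit_interval(coeffs):
    """Upper bound on the number of roots of p in the open interval (0, 1).

    Reversing the coefficients gives x**n * p(1/x), which moves the
    roots in (0, 1) to (1, infinity); a Taylor shift by 1 then moves
    them to (0, infinity), where Descartes' rule of signs applies.
    """
    transformed = taylor_shift_by_1(list(reversed(coeffs)))
    return sign_variations(transformed)


if __name__ == "__main__":
    # p(x) = 2x - 1 has exactly one root, 1/2, in (0, 1)
    print(descartes_test_unit_interval([-1, 2]))
```

A result of 0 means the interval contains no root and a result of 1 means it contains exactly one; for larger values the method bisects the interval and repeats the test on each half.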

