Results 1 - 4 of 4
ARCHITECTURE-AWARE CLASSICAL TAYLOR SHIFT BY 1
, 2005
Abstract

Cited by 17 (2 self)
We present algorithms that outperform straightforward implementations of classical Taylor shift by 1. For input polynomials of low degrees a method of the SACLIB library is faster than straightforward implementations by a factor of at least 2; for higher degrees we develop a method that is faster than straightforward implementations by a factor of up to 7. Our Taylor shift algorithm requires more word additions than straightforward implementations, but it reduces the number of cycles per word addition by reducing memory traffic and the number of carry computations. The introduction of signed digits, suspended normalization, radix reduction, and delayed carry propagation enables our algorithm to take advantage of the technique of register tiling, which is commonly used by optimizing compilers. While our algorithm is written in a high-level language, it depends on several parameters that can be tuned to the underlying architecture.
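Classical Taylor shift by 1 computes the coefficients of p(x + 1) from those of p(x); the textbook method is n rounds of synthetic division, which reduces to nested pairwise coefficient additions — exactly the word additions whose cost the paper's architecture-aware variant restructures. A minimal Python sketch of the classical method (not the paper's tuned algorithm):

```python
def taylor_shift_1(coeffs):
    """Given coefficients a[0..n] of p(x) in ascending order,
    return the coefficients of p(x + 1).

    Classical method: n rounds of synthetic division, each a
    sweep of pairwise additions a[j] += a[j+1]."""
    a = list(coeffs)
    n = len(a) - 1
    for i in range(n):
        for j in range(n - 1, i - 1, -1):
            a[j] += a[j + 1]
    return a

# p(x) = x^2  ->  p(x + 1) = x^2 + 2x + 1
print(taylor_shift_1([0, 0, 1]))  # [1, 2, 1]
```

The inner loop is pure integer addition on (potentially multiprecision) coefficients, which is why reducing carries and memory traffic per word addition pays off at higher degrees.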
An Analysis of Lehmer's Euclidean GCD Algorithm
 Proceedings Of The 1995 International Symposium On Symbolic And Algebraic Computation
, 1995
Abstract

Cited by 7 (3 self)
Let u and v be positive integers. We show that a slightly modified version of D. H. Lehmer's greatest common divisor algorithm will compute gcd(u, v) (with u ≥ v) using at most O((log u log v)/k + k log v + log u + k²) bit operations and O(log u + k² 2^k) space, where k is the number of bits in the multiprecision base of the algorithm. This is faster than Euclid's algorithm by a factor that is roughly proportional to k. Letting n be the number of bits in u and v, and setting k = ⌊(log n)/4⌋, we obtain a subquadratic running time of O(n²/log n) in linear space.

1 Introduction

Let u and v be positive integers. The greatest common divisor (GCD) of u and v is the largest integer d such that d divides both u and v. The most well-known algorithm for computing GCDs is the Euclidean Algorithm. Much is known about this algorithm: the number of iterations required is Θ(log v), and the worst-case running time is Θ(log u log v), where time is measured in bit operation...
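For reference, the baseline that Lehmer's method accelerates is the plain Euclidean algorithm; Lehmer's variant simulates runs of these multiprecision division steps using only the leading digits of u and v, which is where the roughly factor-of-k speedup comes from. A sketch of the baseline only (not Lehmer's algorithm):

```python
def euclid_gcd(u, v):
    """Plain Euclidean algorithm: Theta(log v) iterations and
    Theta(log u log v) bit operations in the worst case.
    Lehmer's variant replaces most of these multiprecision
    divisions with single-precision steps on leading digits."""
    while v:
        u, v = v, u % v
    return u

print(euclid_gcd(252, 105))  # 21
```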
An analysis of the generalized binary GCD algorithm
 HIGH PRIMES AND MISDEMEANORS, LECTURES IN HONOUR OF HUGH COWIE
, 2007
Abstract

Cited by 5 (2 self)
In this paper we analyze a slight modification of Jebelean’s version of the k-ary GCD algorithm. Jebelean had shown that on n-bit inputs, the algorithm runs in O(n²) time. In this paper, we show that the average running time of our modified algorithm is O(n²/log n). This analysis involves exploring the behavior of spurious factors introduced during the main loop of the algorithm. We also introduce a Jebelean-style left-shift k-ary GCD algorithm with a similar complexity that performs well in practice.
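The k-ary algorithms generalize the binary (k = 2) GCD, which uses only shifts, comparisons, and subtractions; the k-ary step cancels more low-order digits per iteration at the price of the spurious factors the paper analyzes. A sketch of the binary base case only, not Jebelean's algorithm:

```python
def binary_gcd(u, v):
    """Binary (Stein) GCD for non-negative integers: only shifts,
    comparisons, and subtractions. k-ary variants generalize the
    'strip factors of 2' step to factors of k, possibly introducing
    spurious odd factors that must be accounted for."""
    if u == 0:
        return v
    if v == 0:
        return u
    # Factor out the power of two common to u and v.
    shift = ((u | v) & -(u | v)).bit_length() - 1
    u >>= (u & -u).bit_length() - 1  # make u odd
    while v:
        v >>= (v & -v).bit_length() - 1  # make v odd
        if u > v:
            u, v = v, u
        v -= u  # difference of odd numbers is even
    return u << shift

print(binary_gcd(12, 18))  # 6
```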
Isoefficiency and the Parallel Descartes Method
 Proc. 17th Annual Symp. Computational Geometry, ACM
, 2001
Abstract

Cited by 3 (0 self)
Introduction The efficiency of a parallel algorithm with input x on P ≥ 1 processors is defined as E(x, P) = T(x, 1) / (P · T(x, P)), where T(x, P) denotes the time it takes to perform the computation using P processors and T(x, 1) is the sequential execution time. The efficiency of many parallel algorithms decreases when the number of processors increases and the sequential execution time is fixed; likewise, the efficiency increases when the sequential computing time increases and the number of processors is fixed. The term scalability refers to this change of efficiency (Sahni & Thanvantri, 1996). Intuitively, a parallel algorithm is scalable if it stays efficient when the number of processors and the sequential execution time are both increased...
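The definition above translates directly into code; the timing numbers below are illustrative, not from the paper:

```python
def efficiency(t_seq, t_par, p):
    """Parallel efficiency E(x, P) = T(x, 1) / (P * T(x, P)):
    the speedup T(x, 1) / T(x, P) divided by the processor count."""
    return t_seq / (p * t_par)

# Perfect linear speedup: 4 processors, 4x faster -> E = 1.0.
print(efficiency(100.0, 25.0, 4))  # 1.0
# Sublinear speedup: 8 processors, only 5x faster -> E = 0.625.
print(efficiency(100.0, 20.0, 8))  # 0.625
```

E = 1 corresponds to perfect linear speedup; the scalability question is how fast the sequential work T(x, 1) must grow with P to hold E at a fixed level.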