Results 1–10 of 16
An Iterated Eigenvalue Algorithm for Approximating Roots of Univariate Polynomials
J. Symbolic Comput., 2001
Abstract

Cited by 21 (0 self)
We present an iterative algorithm that approximates all roots of a univariate polynomial. The iteration uses floating-point eigenvalue computation of a generalized companion matrix. Under some assumptions, we show that the algorithm approximates the roots within about log_{ρ/ε} κ(P) iterations, where ε is the relative error of floating-point arithmetic, ρ is the relative separation of the roots, and κ(P) is the condition number of the polynomial. Each iteration requires an n×n floating-point eigenvalue computation, where n is the polynomial degree, and evaluation of the polynomial to floating-point accuracy at up to n points. We describe a careful implementation of the algorithm, including many techniques that contribute to its practical efficiency. On some hard examples of ill-conditioned polynomials, e.g. high-degree Wilkinson polynomials, the implementation is an order of magnitude faster than the Bini-Fiorentino implementation MPSolve.
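As a point of reference for the approach above, the following minimal sketch approximates polynomial roots as eigenvalues of the ordinary (not generalized) companion matrix using NumPy; the paper's iterated generalized-companion scheme is not reproduced here.

```python
# Sketch only: roots of a polynomial as eigenvalues of its companion matrix.
import numpy as np

def roots_via_companion(coeffs):
    """coeffs = [a_n, ..., a_1, a_0] with a_n != 0; returns approximate roots."""
    a = np.asarray(coeffs, dtype=float)
    a = a / a[0]                      # make the polynomial monic
    n = len(a) - 1
    C = np.zeros((n, n))
    C[1:, :-1] = np.eye(n - 1)        # subdiagonal of ones
    C[:, -1] = -a[:0:-1]              # last column: -a_0, -a_1, ..., -a_{n-1}
    return np.linalg.eigvals(C)       # floating-point eigenvalue computation

# p(x) = x^2 - 3x + 2 has roots 1 and 2
print(sorted(roots_via_companion([1, -3, 2]).real))
```

A single eigenvalue call like this already attains floating-point accuracy for well-conditioned polynomials; the iteration described in the abstract refines the generalized companion matrix to cope with ill-conditioned cases.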
ARCHITECTURE-AWARE CLASSICAL TAYLOR SHIFT BY 1
2005
Abstract

Cited by 17 (2 self)
We present algorithms that outperform straightforward implementations of classical Taylor shift by 1. For input polynomials of low degree, a method from the SACLIB library is faster than straightforward implementations by a factor of at least 2; for higher degrees we develop a method that is faster by a factor of up to 7. Our Taylor shift algorithm requires more word additions than straightforward implementations, but it reduces the number of cycles per word addition by reducing memory traffic and the number of carry computations. The introduction of signed digits, suspended normalization, radix reduction, and delayed carry propagation enables our algorithm to take advantage of the register tiling technique commonly used by optimizing compilers. While our algorithm is written in a high-level language, it depends on several parameters that can be tuned to the underlying architecture.
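For orientation, here is the "straightforward" classical Taylor shift by 1 that the paper's tuned algorithms are measured against: the Ruffini-Horner triangle of pairwise coefficient additions that computes p(x+1) from p(x). This is a baseline sketch, not the architecture-aware method of the paper.

```python
# Classical Taylor shift by 1 via repeated synthetic division by (x - 1):
# the resulting Taylor coefficients about x = 1 are the coefficients of p(x+1).
def taylor_shift_by_1(a):
    """a[k] is the coefficient of x^k; returns the coefficients of p(x+1)."""
    a = list(a)
    n = len(a) - 1
    for i in range(n):
        for j in range(n - 1, i - 1, -1):
            a[j] += a[j + 1]          # each inner pass is one synthetic division
    return a

# (x + 1)^2 = x^2 + 2x + 1
print(taylor_shift_by_1([0, 0, 1]))   # -> [1, 2, 1]
```

The inner loop performs the word additions whose scheduling and carry handling the paper optimizes.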
Polynomial Root Finding Using Iterated Eigenvalue Computation
In Proc. ISSAC, New York, ACM, 2001
Abstract

Cited by 15 (1 self)
We present a novel iterative algorithm that approximates all roots of a univariate polynomial. The iteration uses floating-point eigenvalue computation of a generalized companion matrix. Under some assumptions, we show that the algorithm approximates the roots to floating-point accuracy within about log_{ρ/ε} κ(P) iterations, where ε is the relative error of floating-point arithmetic, ρ is the relative separation of the roots, and κ(P) is the condition number of the polynomial. Each iteration requires an n×n floating-point eigenvalue computation, where n is the polynomial degree, and evaluation of the polynomial to floating-point accuracy at n points. On some hard examples of ill-conditioned polynomials, e.g. high-degree Wilkinson polynomials, a careful implementation of the algorithm is an order of magnitude faster than the best alternative. 1 Introduction The algorithmic problem of approximating the roots of a univariate polynomial, presented by its coefficients, is classic in numeri...
Energy-Efficient Software Implementation of Long Integer Modular Arithmetic
CRYPTOGRAPHIC HARDWARE AND EMBEDDED SYSTEMS - CHES 2005
Abstract

Cited by 13 (2 self)
This paper investigates the performance and energy characteristics of software algorithms for long integer arithmetic. We analyze and compare the number of RISC-like processor instructions (e.g. single-precision multiplication, addition, load, and store instructions) required for the execution of different algorithms such as schoolbook multiplication, Karatsuba and Comba multiplication, and Montgomery reduction. Our analysis shows that a combination of Karatsuba-Comba multiplication and Montgomery reduction (the so-called KCM method) achieves better performance than other algorithms for modular multiplication. Furthermore, we present a simple model for comparing the energy efficiency of arithmetic algorithms. This model uses the clock cycles and average current consumption of the base instructions to estimate the overall amount of energy consumed during the execution of an algorithm. Our experiments, conducted on a StrongARM SA-1100 processor, indicate that a 1024-bit KCM multiplication consumes about 22% less energy than other modular multiplication techniques.
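The Karatsuba half of the KCM method replaces one multiplication of the split halves with additions. A hedged sketch on Python integers (the word size `w=32` is an illustrative assumption; the paper's Comba scheduling and Montgomery reduction are not reproduced):

```python
# Karatsuba multiplication: three recursive products instead of four.
def karatsuba(x, y, w=32):
    """Multiply non-negative integers; w-bit operands are the base case."""
    if x < (1 << w) or y < (1 << w):
        return x * y                              # single-precision multiply
    m = max(x.bit_length(), y.bit_length()) // 2
    xh, xl = x >> m, x & ((1 << m) - 1)           # split into high/low halves
    yh, yl = y >> m, y & ((1 << m) - 1)
    z2 = karatsuba(xh, yh)                        # high product
    z0 = karatsuba(xl, yl)                        # low product
    z1 = karatsuba(xh + xl, yh + yl) - z2 - z0    # middle via one multiply
    return (z2 << (2 * m)) + (z1 << m) + z0

a, b = 123456789123456789, 987654321987654321
assert karatsuba(a, b) == a * b
```

The extra additions in `z1` are the trade-off the paper's instruction-count analysis quantifies against the saved single-precision multiplications.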
A Binary Recursive Gcd Algorithm
Proceedings of ANTS'04, Lecture Notes in Computer Science, 2004
Abstract

Cited by 12 (1 self)
Abstract. The binary algorithm is a variant of the Euclidean algorithm that performs well in practice. We present a quasi-linear time recursive algorithm that computes the greatest common divisor of two integers by simulating a slightly modified version of the binary algorithm. The structure of our algorithm is very close to that of the well-known Knuth-Schönhage fast gcd algorithm; although it does not improve on its O(M(n) log n) complexity, the description and the proof of correctness are significantly simpler in our case. This leads to a simpler implementation and to better running times.
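The classical (non-recursive) binary gcd that the recursive algorithm simulates replaces divisions with shifts and subtractions; a sketch:

```python
# Binary gcd: only shifts, comparisons, and subtractions.
def binary_gcd(a, b):
    if a == 0:
        return b
    if b == 0:
        return a
    shift = 0
    while (a | b) & 1 == 0:          # factor out common powers of two
        a >>= 1
        b >>= 1
        shift += 1
    while a & 1 == 0:                # make a odd
        a >>= 1
    while b != 0:
        while b & 1 == 0:
            b >>= 1
        if a > b:
            a, b = b, a
        b -= a                       # both odd, so b - a is even
    return a << shift

print(binary_gcd(48, 36))  # -> 12
```

The quasi-linear variant in the paper gains its speed by applying this elimination pattern to the leading bits only, in the style of Knuth-Schönhage half-gcd recursion.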
Fast and Efficient Generation of Loop Bounds
PROCEEDINGS OF PARCO '93, ELSEVIER SCIENCE PUBLISHERS (NORTH-HOLLAND), 1993
Abstract

Cited by 9 (1 self)
Current loop generation techniques are based on Fourier-Motzkin pairwise elimination, which is known to be very memory- and computation-intensive. In this paper we explore an alternative: the use of parametric linear programming, which makes it possible to compute distinct loop bounds separately and leads to a parallel algorithm for loop generation.
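The Fourier-Motzkin elimination that the paper replaces can be sketched directly: eliminating the innermost loop variable from a system A x <= c yields the bounds of the enclosing loop, and the pairwise lower/upper combinations are the source of the memory blow-up. The constraint encoding below is an illustrative assumption, not the paper's representation.

```python
# One Fourier-Motzkin elimination step. Each constraint is (coeffs, rhs)
# meaning coeffs . x <= rhs, with coeffs[-1] the variable to eliminate.
from fractions import Fraction

def eliminate_last(constraints):
    lowers, uppers, rest = [], [], []
    for coeffs, rhs in constraints:
        c = coeffs[-1]
        if c > 0:
            uppers.append((coeffs, rhs))          # yields an upper bound
        elif c < 0:
            lowers.append((coeffs, rhs))          # yields a lower bound
        else:
            rest.append((coeffs[:-1], rhs))       # variable absent
    for lc, lr in lowers:                         # pairwise combinations
        for uc, ur in uppers:
            lam, mu = Fraction(uc[-1]), Fraction(-lc[-1])
            coeffs = [mu * u + lam * l for u, l in zip(uc[:-1], lc[:-1])]
            rest.append((coeffs, mu * ur + lam * lr))
    return rest

# Triangle 0 <= j <= i <= 10 over (i, j): eliminating j leaves 0 <= i <= 10.
tri = [([0, -1], 0),        # -j <= 0
       ([-1, 1], 0),        # j - i <= 0
       ([1, 0], 10)]        # i <= 10
print(eliminate_last(tri))
```

With `len(lowers) * len(uppers)` new constraints per eliminated variable, the system can grow doubly exponentially over a loop nest, which motivates the parametric-LP alternative.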
HIGH-PERFORMANCE IMPLEMENTATIONS OF THE DESCARTES METHOD
2006
Abstract

Cited by 9 (0 self)
The Descartes method for polynomial real root isolation can be performed with respect to monomial bases or with respect to Bernstein bases. The first variant uses Taylor shift by 1 as its main subalgorithm; the second uses de Casteljau's algorithm. When applied to integer polynomials, the two variants have codominant, almost tight computing time bounds. Implementations of either variant can obtain speedups over previous state-of-the-art implementations by more than an order of magnitude if they use features of the processor architecture. We present an implementation of the Bernstein-basis variant of the Descartes method that automatically generates architecture-aware high-level code and leaves further optimizations to the compiler. We compare the performance of our implementation, of algorithmically tuned implementations of the monomial and Bernstein variants, and of architecture-unaware implementations of both variants on four different processor architectures and for three classes of input polynomials.
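The two subalgorithms the abstract contrasts can be sketched in a few lines: a sign-variation count on Bernstein coefficients (the Descartes test) and one de Casteljau subdivision at t = 1/2, here in exact rational arithmetic. This is an illustrative sketch, not the paper's architecture-aware code.

```python
# Descartes test and de Casteljau subdivision on Bernstein coefficients.
from fractions import Fraction

def sign_variations(bs):
    """Sign changes in the coefficient sequence, ignoring zeros."""
    signs = [b for b in bs if b != 0]
    return sum(1 for u, v in zip(signs, signs[1:]) if u * v < 0)

def de_casteljau_split(bs):
    """Split Bernstein coefficients on [0,1] into those on [0,1/2] and [1/2,1]."""
    bs = [Fraction(b) for b in bs]
    left, right = [bs[0]], [bs[-1]]
    while len(bs) > 1:
        bs = [(u + v) / 2 for u, v in zip(bs, bs[1:])]   # pairwise averages
        left.append(bs[0])
        right.append(bs[-1])
    return left, right[::-1]

# [-1, 0, 1] is p(t) = 2t - 1 in degree-2 Bernstein form: one sign change,
# hence at most one root in (0, 1); the root is t = 1/2.
print(sign_variations([-1, 0, 1]))                 # -> 1
print(de_casteljau_split([-1, 0, 1]))
```

The method recurses with `de_casteljau_split` until each interval's coefficient sequence has 0 or 1 sign variations.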
Cs: a MuPAD package for counting and randomly generating combinatorial structures
In Proceedings of FPSAC'98, 1998
Abstract

Cited by 3 (0 self)
We present a new computer algebra package that can count and generate combinatorial structures of various types, provided that these structures can be described by a specification, as defined in [7]. Résumé (translated): We present a new computer algebra module dedicated to the counting and uniform random generation of decomposable combinatorial structures. 1 What is CS? CS is a computer algebra package devoted to the handling of combinatorial structures. Its main features are the following: given a combinatorial specification of a class of decomposable structures (in the sense of [7]), CS is able to count and uniformly draw at random the structures of any given size n. It can also give some properties of the associated generating series, such as recurrences and differential equations. A specification of a class of combinatorial structures, as defined in [7], is a set of productions made from basic objects (atoms) (Epsilon and Z, of size 0 and 1 respectively) and from
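To illustrate the kind of counting such a package derives from a specification (this sketch is not the CS package's interface): for the specification B = Epsilon + Z * B * B (binary trees counted by internal nodes), the counting sequence satisfies the standard convolution recurrence.

```python
# Counting structures of size n for B = Epsilon + Z * B * B.
from functools import lru_cache

@lru_cache(maxsize=None)
def count(n):
    if n == 0:
        return 1                      # Epsilon: one structure of size 0
    # Z * B * B: one atom (size 1) plus two substructures of total size n - 1
    return sum(count(k) * count(n - 1 - k) for k in range(n))

print([count(n) for n in range(6)])   # -> [1, 1, 2, 5, 14, 42] (Catalan numbers)
```

Uniform random generation then follows the same decomposition, picking the split size k with probability proportional to count(k) * count(n - 1 - k).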
An Efficient LLL Gram Using Buffered Transformations
Abstract

Cited by 3 (3 self)
Abstract. In this paper we introduce an improved variant of the LLL algorithm. Using the Gram matrix to avoid the expensive correction steps necessary in the Schnorr-Euchner algorithm, and introducing buffered transformations, allows us to obtain a major improvement in reduction time. Unlike previous work, we achieve this improvement while obtaining a strong reduction result and maintaining the stability of the reduction algorithm.
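For orientation, a compact textbook LLL (delta = 3/4) in exact rational arithmetic; the paper's variant works on the Gram matrix and buffers transformations for speed, neither of which this sketch attempts.

```python
# Textbook LLL reduction of a linearly independent integer basis.
from fractions import Fraction

def lll(basis, delta=Fraction(3, 4)):
    b = [[Fraction(x) for x in v] for v in basis]
    n = len(b)

    def gram_schmidt():
        # mu coefficients and squared norms of the Gram-Schmidt vectors.
        mu = [[Fraction(0)] * n for _ in range(n)]
        bstar, norms = [], []
        for i in range(n):
            v = list(b[i])
            for j in range(i):
                mu[i][j] = sum(x * y for x, y in zip(b[i], bstar[j])) / norms[j]
                v = [x - mu[i][j] * y for x, y in zip(v, bstar[j])]
            bstar.append(v)
            norms.append(sum(x * x for x in v))
        return mu, norms

    k = 1
    while k < n:
        mu, norms = gram_schmidt()
        for j in range(k - 1, -1, -1):                 # size-reduce b_k
            q = round(mu[k][j])
            if q:
                b[k] = [x - q * y for x, y in zip(b[k], b[j])]
                mu[k][j] -= q                          # keep mu consistent
                for i in range(j):
                    mu[k][i] -= q * mu[j][i]
        if norms[k] >= (delta - mu[k][k - 1] ** 2) * norms[k - 1]:
            k += 1                                     # Lovasz condition holds
        else:
            b[k - 1], b[k] = b[k], b[k - 1]            # swap and backtrack
            k = max(k - 1, 1)
    return [[int(x) for x in v] for v in b]

print(lll([[1, 1, 1], [-1, 0, 2], [3, 5, 6]]))
# -> [[0, 1, 0], [1, 0, 1], [-1, 0, 2]]
```

Recomputing the full Gram-Schmidt data each pass, as here, is exactly the cost that Gram-matrix bookkeeping and buffered transformations are designed to avoid.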
Using the Parallel Karatsuba Algorithm for Long Integer Multiplication and Division
In European Conference on Parallel Processing, 1997
Abstract

Cited by 2 (0 self)
We experiment with sequential and parallel versions of the Karatsuba multiplication algorithm, implemented under the paclib computer algebra system on a Sequent Symmetry shared-memory architecture. In comparison with the classical multiplication algorithm, the sequential version gives a speedup of 2 at 50 words, rising to 5 at 500 words. On 9 processors, the parallel Karatsuba algorithm exhibits a combined speedup of 10 (50 words) up to 40 (500 words). Moreover, we use the Karatsuba algorithm within long integer division with remainder, using a recent divide-and-conquer technique that delays part of the dividend updates until they can be performed by multiplications between large operands. The sequential algorithm is about two times slower than Karatsuba multiplication and shows a speedup of 2 at 200 words and of 3 at 500 words when compared to the classical division method. Using parallel multiplication on 9 processors leads to a combined speedup of almost 3 at 100 words and more th...