Results 1 – 8 of 8
Evaluating polynomials in several variables and their derivatives on a GPU computing processor, 2012
Hardware Acceleration Technologies in Computer Algebra: Challenges and Impact, 2013
Abstract

Cited by 2 (1 self)
The objective of high performance computing (HPC) is to ensure that the computational power of hardware resources is well utilized to solve a problem. Various techniques are usually employed to achieve this goal. Improving algorithms to reduce the number of arithmetic operations, modifying data access patterns or rearranging data to reduce memory traffic, optimizing code at all levels, and designing parallel algorithms with smaller span or reduced overhead are some of the areas that HPC researchers are working on. In this thesis, we investigate HPC techniques for the implementation of basic routines in computer algebra targeting hardware acceleration technologies. We start with a sorting algorithm and its application to sparse matrix-vector multiplication, focusing on cache complexity issues. Since basic routines in computer algebra often provide a lot of fine-grained parallelism, we then turn our attention to manycore architectures, on which we consider dense polynomial and matrix operations ranging from plain to fast arithmetic. Most of these operations are combined within a bivariate system solver running entirely on a graphics processing unit (GPU).
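The abstract names sparse matrix-vector multiplication as the application of the cache-complexity work but does not show the data layout; as a point of reference only, a minimal compressed sparse row (CSR) sketch, with all names my own and no relation to the thesis code, looks like:

```python
def csr_spmv(values, col_idx, row_ptr, x):
    """y = A @ x for a matrix stored in compressed sparse row form.

    values  - nonzero entries of A, stored row by row
    col_idx - column index of each nonzero in `values`
    row_ptr - row_ptr[i]:row_ptr[i+1] slices row i out of `values`

    CSR keeps each row's nonzeros contiguous in memory, the kind of
    memory-traffic consideration cache-complexity analyses target.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y
```

For example, the matrix [[1, 0, 2], [0, 3, 0]] is stored as `values=[1, 2, 3]`, `col_idx=[0, 2, 1]`, `row_ptr=[0, 2, 3]`.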
Orthogonalization on a General Purpose Graphics Processing Unit with Double Double and Quad Double Arithmetic, 2013
Abstract

Cited by 1 (1 self)
Our problem is to accurately solve linear systems on a general purpose graphics processing unit with double double and quad double arithmetic. The linear systems originate from the application of Newton’s method to polynomial systems. Newton’s method is applied as a corrector in a path following method, so the linear systems are solved in sequence and not simultaneously. One solution path may require the solution of thousands of linear systems. In previous work, we reported good speedups with our implementation to evaluate and differentiate polynomial systems on the NVIDIA Tesla C2050. Although the cost of evaluation and differentiation often dominates the cost of linear system solving in Newton’s method, because of the limited bandwidth of the communication between CPU and GPU, we cannot afford to send the linear system to the CPU for solving during path tracking. Because of large degrees, the Jacobian matrix may contain extreme values, requiring extended precision, which leads to a significant overhead. This overhead of multiprecision arithmetic is our main motivation to develop a massively parallel algorithm. To allow for overdetermined linear systems, we solve the linear systems in the least squares sense, computing the QR decomposition of the matrix by the modified Gram-Schmidt algorithm. We describe our implementation of the modified Gram-Schmidt orthogonalization method for the NVIDIA Tesla C2050, using double double and quad double arithmetic. Our experimental results show that the achieved speedups are sufficiently high to compensate for the overhead of one extra level of precision.
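The modified Gram-Schmidt method named in the abstract is a standard algorithm; a minimal plain double-precision CPU sketch of it (names mine, no relation to the authors’ CUDA implementation or their extended-precision arithmetic) is:

```python
import numpy as np

def mgs_qr(A):
    """QR factorization by modified Gram-Schmidt.

    Each column is normalized and then immediately subtracted out of
    all later columns, so every later column is orthogonalized against
    the already-computed q_k; this ordering is what makes the method
    "modified" and more numerically stable than classical Gram-Schmidt.
    """
    A = np.array(A, dtype=float)
    m, n = A.shape
    Q = A.copy()
    R = np.zeros((n, n))
    for k in range(n):
        R[k, k] = np.linalg.norm(Q[:, k])
        Q[:, k] /= R[k, k]
        for j in range(k + 1, n):
            R[k, j] = Q[:, k] @ Q[:, j]
            Q[:, j] -= R[k, j] * Q[:, k]
    return Q, R

def least_squares(A, b):
    """Solve min ||Ax - b|| via QR: back-solve R x = Q^T b."""
    Q, R = mgs_qr(A)
    return np.linalg.solve(R, Q.T @ np.asarray(b, dtype=float))
```

For an overdetermined system (more rows than columns) the same routine returns the least squares solution, which is the setting described in the abstract.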
On The Parallelization Of Integer Polynomial Multiplication, 2014
Abstract

Cited by 1 (1 self)
With the advent of hardware accelerator technologies, multicore processors and GPUs, much effort has been made to take advantage of these architectures by designing parallel algorithms. To achieve this goal, one needs to consider both algebraic complexity and parallelism, while also making efficient use of memory traffic and cache and reducing overheads in the implementations. Polynomial multiplication is at the core of many algorithms in symbolic computation, such as real root isolation, which is our main application here. In this thesis, we first investigate the multiplication of dense univariate polynomials with integer coefficients targeting multicore processors. Some of the proposed methods are based on well-known serial classical algorithms, whereas a novel algorithm is designed to make efficient use of the targeted hardware. Experimentation confirms our theoretical analysis. Second, we report on the first implementation of subproduct tree techniques on manycore architectures. These techniques are another application of polynomial multiplication, but over a prime field, and are used in multipoint evaluation and interpolation of polynomials with coefficients over a prime field.
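The subproduct tree technique mentioned above is standard in computer algebra; a minimal sketch of its use for multipoint evaluation over a prime field (plain quadratic arithmetic instead of the fast multiplication a real implementation would use, all names mine):

```python
def poly_mul(a, b, p):
    """Plain (schoolbook) product of coefficient lists modulo prime p."""
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] = (c[i + j] + ai * bj) % p
    return c

def poly_mod(a, m, p):
    """Remainder of a modulo a monic polynomial m, coefficients mod p."""
    a = a[:]
    d = len(m) - 1
    for i in range(len(a) - 1, d - 1, -1):
        c = a[i]
        if c:
            for j in range(d + 1):
                a[i - d + j] = (a[i - d + j] - c * m[j]) % p
    return a[:d]

def subproduct(points, p):
    """Product of (x - u) over the given points: one node of the tree."""
    m = [1]
    for u in points:
        m = poly_mul(m, [(-u) % p, 1], p)
    return m

def multipoint_eval(f, points, p):
    """Evaluate f at all points by descending a subproduct tree:
    split the points, reduce f modulo the subproduct of each half,
    and recurse; at a leaf, f mod (x - u) is just f(u)."""
    if len(points) == 1:
        r = poly_mod(f, [(-points[0]) % p, 1], p)
        return [r[0] if r else 0]
    mid = len(points) // 2
    left, right = points[:mid], points[mid:]
    fl = poly_mod(f, subproduct(left, p), p)
    fr = poly_mod(f, subproduct(right, p), p)
    return multipoint_eval(fl, left, p) + multipoint_eval(fr, right, p)
```

Coefficient lists are stored low degree first, so `[5, 2, 0, 1]` is x^3 + 2x + 5; the reduction down the tree replaces n separate Horner evaluations with O(log n) levels of polynomial remaindering.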
An application of regular chain theory to the study of limit cycles, INTERNATIONAL JOURNAL OF BIFURCATION AND CHAOS
Abstract
In this paper, the theory of regular chains and a triangular decomposition method relying on modular computations are presented in order to symbolically solve multivariate polynomial systems. Based on the focus values for dynamic systems obtained by using normal form theory, this method is applied to compute the limit cycles bifurcating from Hopf critical points. In particular, a quadratic planar polynomial system is used to demonstrate the solving process and to show how to obtain center conditions. The modular computations based on regular chains are applied to a cubic planar polynomial system to show the computation efficiency of this method, and to obtain all real solutions of nine limit cycles around a singular point. To the authors’ best knowledge, this is the first article to simultaneously provide a complete, rigorous proof for the existence of nine limit cycles in a cubic system and all real solutions for these limit cycles.
GPU acceleration of Newton’s method for large systems of polynomial equations in double double and quad double arithmetic, 2014
Dense Arithmetic over Finite Fields with the CUMODP Library
Abstract
CUMODP is a CUDA library for exact computations with dense polynomials over finite fields. A variety of operations, such as multiplication, division, computation of subresultants, multipoint evaluation, and interpolation, are provided. These routines are primarily designed to offer GPU support to polynomial system solvers, and a bivariate system solver is part of the library. The algorithms combine FFT-based and plain arithmetic, while the implementation strategy emphasizes reducing parallelism overheads and optimizing hardware usage.
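CUMODP's own API is not shown in the abstract. To illustrate what "FFT-based arithmetic over a prime field" refers to, here is a minimal radix-2 number-theoretic transform sketch (all names mine; it assumes p is prime, g is a primitive root of p, and the transform length divides p - 1, e.g. p = 97, g = 5):

```python
def ntt(a, omega, p):
    """Recursive radix-2 Cooley-Tukey transform over Z/pZ.
    len(a) must be a power of two and omega a len(a)-th root of unity."""
    n = len(a)
    if n == 1:
        return a[:]
    even = ntt(a[0::2], omega * omega % p, p)
    odd = ntt(a[1::2], omega * omega % p, p)
    out = [0] * n
    w = 1
    for k in range(n // 2):
        t = w * odd[k] % p
        out[k] = (even[k] + t) % p
        out[k + n // 2] = (even[k] - t) % p
        w = w * omega % p
    return out

def ntt_mul(a, b, p, g):
    """FFT-based product of coefficient lists over Z/pZ:
    transform both inputs, multiply pointwise, transform back."""
    n = 1
    while n < len(a) + len(b) - 1:
        n *= 2
    omega = pow(g, (p - 1) // n, p)          # primitive n-th root of unity
    fa = ntt(a + [0] * (n - len(a)), omega, p)
    fb = ntt(b + [0] * (n - len(b)), omega, p)
    fc = [x * y % p for x, y in zip(fa, fb)]
    inv_omega = pow(omega, p - 2, p)         # inverses via Fermat's little theorem
    inv_n = pow(n, p - 2, p)
    c = [x * inv_n % p for x in ntt(fc, inv_omega, p)]
    return c[:len(a) + len(b) - 1]
```

This brings the cost of a degree-n product down to O(n log n) modular operations, versus O(n^2) for the plain method; a library such as CUMODP chooses between the two regimes depending on size.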