## Exploiting Mixed Precision Floating Point Hardware in Scientific Computations (2007)


Citations: 3 (0 self)

### BibTeX

```bibtex
@MISC{Buttari07exploitingmixed,
  author = {Alfredo Buttari and Jack Dongarra and Jakub Kurzak and Julie Langou and Julien Langou and Piotr Luszczek and Stanimire Tomov},
  title  = {Exploiting Mixed Precision Floating Point Hardware in Scientific Computations},
  year   = {2007}
}
```


### Abstract

By using a combination of 32-bit and 64-bit floating point arithmetic, the performance of many dense and sparse linear algebra algorithms can be significantly enhanced while maintaining the 64-bit accuracy of the resulting solution. The approach presented here can apply not only to conventional processors but also ...

### Citations

1599 |
Iterative Methods for Sparse Linear Systems
- Saad
- 1996
Citation Context: ...re is a point where they become prohibitively high and direct sparse methods are no longer feasible. Iterative methods are a remedy, since only a few working vectors and the primary data are required [18, 19]. As an example, let us first consider the iterative refinement itself, described in Algorithm 1 as x_{i+1} = x_i + M(b - Ax_i), (1) where M is (LU)^{-1} P. Iterative methods of this form (i.e. where M does not dep...
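Update (1) is easy to prototype. The sketch below (our addition, not from the paper) iterates x_{i+1} = x_i + M(b - Ax_i) with a fixed approximate inverse M; the test matrix, perturbation size, and iteration count are all illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
A = rng.standard_normal((n, n)) + n * np.eye(n)  # diagonally dominant, well conditioned
b = rng.standard_normal(n)

# Fixed approximate inverse M, playing the role of (LU)^{-1} P from an
# inexact factorization: here, the inverse of a perturbed copy of A.
M = np.linalg.inv(A + 1e-3 * rng.standard_normal((n, n)))

x = np.zeros(n)
for _ in range(30):
    x = x + M @ (b - A @ x)  # x_{i+1} = x_i + M (b - A x_i)
```

Because M is fixed across iterations, this is a stationary (Richardson-type) method; convergence requires the spectral radius of I - MA to be below one.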

874 |
Accuracy and Stability of Numerical Algorithms
- Higham
- 2002
Citation Context: ...x^(1) ← x^(1)_(32); i ← 0; repeat: i ← i + 1; r^(i) ← b - Ax^(i); r^(i)_(32) ← r^(i); z^(i)_(32) ← SGETRS(L_(32), U_(32), P_(32), r^(i)_(32)); z^(i) ← z^(i)_(32); x^(i+1) ← x^(i) + z^(i); until x^(i) is accurate enough. Higham [5] gives error bounds for the single and double precision, iterative refinement algorithm when the entire algorithm is implemented with the same precision (single or double, respectively). He also gives...
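The mixed precision loop excerpted above can be mimicked in a few lines. In the sketch below (our addition) a float32 solve stands in for the SGETRF factorization and the SGETRS triangular solves, while the residual and the correction are carried in float64; the test matrix and iteration count are made up:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
A = rng.standard_normal((n, n)) + n * np.eye(n)  # well-conditioned test matrix
b = rng.standard_normal(n)

A32 = A.astype(np.float32)  # single precision copy ("factored" once in a real code)

def solve_sp(r):
    """Stand-in for SGETRS: solve using only the single precision data."""
    return np.linalg.solve(A32, r.astype(np.float32)).astype(np.float64)

x = solve_sp(b)              # x^(1): initial solve in single precision
for _ in range(10):
    r = b - A @ x            # residual in double precision
    x = x + solve_sp(r)      # single precision correction, double precision update
```

A production version would factor A32 once (SGETRF) and reuse the factors in every SGETRS call; re-solving from scratch here keeps the sketch short.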

551 | Applied Numerical Linear Algebra
- Demmel
- 1997
Citation Context: ...e refinement process is applied, which produces a correction to the computed solution at each iteration; this yields the basic iterative refinement algorithm (Algorithm 1). As Demmel points out [2], the non-linearity of the round-off error makes the iterative refinement process equivalent to Newton's method applied to the function f(x) = b - Ax. Provided that the system is not too ill-condition...
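Demmel's observation can be made explicit with a one-line derivation (our addition): for f(x) = b - Ax, Newton's method reads

```latex
f(x) = b - Ax,\qquad f'(x) = -A,\qquad
x_{i+1} = x_i - f'(x_i)^{-1} f(x_i) = x_i + A^{-1}(b - Ax_i),
```

which is exactly update (1) once the exact inverse A^{-1} is replaced by the computed M = (LU)^{-1} P.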

506 |
Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods
- Barrett, Berry, et al.
- 1994
Citation Context: ...re is a point where they become prohibitively high and direct sparse methods are no longer feasible. Iterative methods are a remedy, since only a few working vectors and the primary data are required [18, 19]. As an example, let us first consider the iterative refinement itself, described in Algorithm 1 as x_{i+1} = x_i + M(b - Ax_i), (1) where M is (LU)^{-1} P. Iterative methods of this form (i.e. where M does not dep...

351 |
Introduction to Matrix Computations
- Stewart
- 1973
Citation Context: ...ethods (Section 3). 1. Direct Methods for Solving Dense Systems 1.1. Algorithm Iterative refinement is a well known method for improving the solution of a linear system of equations of the form Ax = b [1]. The standard approach to the solution of dense linear systems is to use the LU factorization by means of Gaussian elimination. First, the coefficient matrix A is factorized into the product of a low...

292 | A flexible inner-outer preconditioned GMRES algorithm
- Saad
- 1993
Citation Context: ...ion, inner-outer iterative solver that is based on the restarted Generalized Minimal RESidual (GMRES) method. Namely, consider Algorithm 5, where for the outer loop we take the flexible GMRES (FGMRES [19, 22]) and for the inner loop the GMRES in single precision arithmetic (denoted by GMRES-SP). FGMRES, a minor modification to the standard GMRES, is meant to accommodate non-constant preconditioners. Note ...
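A faithful FGMRES-GMRES pairing is too long to reproduce here; the toy below (our addition) keeps only the structure, a double precision outer correction loop wrapped around a single precision inner solver, substituting a float32 Jacobi sweep for the inner GMRES. The diagonally dominant test matrix and sweep counts are made up:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 80
A = rng.standard_normal((n, n)) + n * np.eye(n)  # diagonally dominant test matrix
b = rng.standard_normal(n)

A32 = A.astype(np.float32)
d32 = np.diag(A32)

def inner_solve_sp(r, sweeps=20):
    """Inner solver in float32: Jacobi sweeps approximating A z = r."""
    r32 = r.astype(np.float32)
    z = r32 / d32
    for _ in range(sweeps):
        z = z + (r32 - A32 @ z) / d32
    return z.astype(np.float64)

x = np.zeros(n)
for _ in range(20):              # outer loop in double precision
    r = b - A @ x
    x = x + inner_solve_sp(r)
```

The inner solver only needs to reduce the residual approximately, which is precisely why a non-constant preconditioner (and hence FGMRES rather than plain GMRES) is required in the real algorithm.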

222 |
The multifrontal solution of indefinite sparse symmetric linear equations
- Duff, Reid
- 1973
Citation Context: ...on of sparse linear systems, which is commonly achieved with either direct or iterative methods. Most sparse direct methods for solving linear systems of equations are variants of either multifrontal [9] or supernodal [10] factorization approaches. Here, we focus only on multifrontal methods. For results on supernodal solvers see [11]. There are a number of freely available packages that implement mu...

157 | A fully asynchronous multifrontal solver using distributed dynamic scheduling - Amestoy, Duff, et al. - 2001 |

140 |
Rounding Errors in Algebraic Processes
- Wilkinson
- 1963
Citation Context: ... that the system is not too ill-conditioned, the algorithm produces a solution correct to the working precision. Iterative refinement is a fairly well understood concept and was analyzed by Wilkinson [3], Moler [4] and Stewart [1]. The algorithm can be modified to use a mixed precision approach. The factorization PA = LU and the solution of the triangular systems Ly = Pb and Ux = y are computed using sin...

123 | An unsymmetric-pattern multifrontal method for sparse LU factorization - Davis, Duff - 1997 |

121 | Multifrontal parallel distributed symmetric and unsymmetric solvers - Amestoy, Duff, et al. - 2000 |

77 | A Combined Unifrontal/Multifrontal Method for Unsymmetric Sparse Matrices - Davis, Duff - 1995 |

75 | Hybrid scheduling for the parallel solution of linear systems - Amestoy, Guermouche, et al. |

60 | A column pre-ordering strategy for the unsymmetric-pattern multifrontal method - Davis |

52 | Inexact preconditioned conjugate gradient method with inner-outer iteration - Golub, Ye - 1999 |

45 | A black box generalized Conjugate Gradient solver with inner iterations and variable-step preconditioning - Axelsson, Vassilevski - 1991 |

43 | Flexible conjugate gradients - Notay - 2000 |

38 | Exploiting the Performance of 32 bit Floating Point Arithmetic in Obtaining 64 bit Accuracy
- Langou, Luszczek, et al.
- 2006
Citation Context: ... precision arithmetic, with refinement performed in double precision arithmetic [5]. The error analysis in double precision, for our mixed precision algorithm (Algorithm 2), is given by Langou et al. [6]. The same technique can be applied to the case of symmetric, positive definite problems. Here, Cholesky factorization (LAPACK's SPOTRF routine) can be used in place of LU factorization (SGETRF), and ...
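The SPD variant sketched in this passage follows the same recipe. Below (our addition) np.linalg.cholesky on a float32 copy plays the role of SPOTRF and two triangular solves play the role of SPOTRS; the test problem is made up:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 60
G = rng.standard_normal((n, n))
A = G @ G.T + n * np.eye(n)       # symmetric positive definite test matrix
b = rng.standard_normal(n)

L32 = np.linalg.cholesky(A.astype(np.float32))  # stand-in for SPOTRF

def chol_solve_sp(r):
    """Stand-in for SPOTRS: forward/back substitution in float32."""
    y = np.linalg.solve(L32, r.astype(np.float32))
    return np.linalg.solve(L32.T, y).astype(np.float64)

x = chol_solve_sp(b)
for _ in range(10):
    r = b - A @ x                 # residual in double precision
    x = x + chol_solve_sp(r)
```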

25 |
GMRESR: a family of nested GMRES methods. Numerical Linear Algebra with Applications
- Vorst, Vuik
- 1994
Citation Context: ...mat, the nonzero matrix coefficients in single precision, 2 m_out number of vectors in double precision, and m_in number of vectors in single precision. The Generalized Conjugate Residuals (GCR) method [26, 28] is comparable to the FGMRES and can replace it successfully as the outer iterative solver. 3.2. Numerical Performance Similar to the case of sparse direct solvers, we demonstrate the numerical perfor...

21 |
Iterative refinement in floating point
- Moler
- 1967
Citation Context: ...ystem is not too ill-conditioned, the algorithm produces a solution correct to the working precision. Iterative refinement is a fairly well understood concept and was analyzed by Wilkinson [3], Moler [4] and Stewart [1]. The algorithm can be modified to use a mixed precision approach. The factorization PA = LU and the solution of the triangular systems Ly = Pb and Ux = y are computed using single precisi...

21 | Flexible inner-outer Krylov subspace methods
- Simoncini, Szyld
Citation Context: ...r to apply (e.g., in our case, using single precision arithmetic). Moreover, even if no faster matrix-vector product is available, speedup can often be observed due to improved convergence (e.g., see [23], where Simoncini and Szyld explain the possible benefits of FGMRES-GMRES over restarted GMRES). To illustrate the above concepts, we demonstrate the ideas with a mixed precision, inner-outer iterativ...

19 | New insights in GMRes-like methods with variable preconditioners
- Vuik
- 1995
Citation Context: ...mat, the nonzero matrix coefficients in single precision, 2 m_out number of vectors in double precision, and m_in number of vectors in single precision. The Generalized Conjugate Residuals (GCR) method [26, 28] is comparable to the FGMRES and can replace it successfully as the outer iterative solver. 3.2. Numerical Performance Similar to the case of sparse direct solvers, we demonstrate the numerical perfor...

17 |
Efficient high accuracy solutions with GMRES(m
- Turner, Walker
- 1992
Citation Context: ...tions can be performed more efficiently on it. Here, we go two steps further: we consider replacing not only M by an inner loop of incomplete iterative solver performed in single precision arithmetic [20], but also the outer loop by more sophisticated iterative methods (e.g., Krylov type). 3.1. Mixed Precision, Inner-Outer Iterative Solvers Note that replacing M by an iterative method leads to nestin...

13 |
Implementation of mixed precision in solving systems of linear equations on the CELL processor. Concurrency Computat.: Pract
- Kurzak, Dongarra
Citation Context: ...ision solver performs up to 7 and 11 times faster than the double precision peak in the unsymmetric and symmetric, positive definite cases respectively. Implementation details for this case can be found in [7, 8]. Figure 1. Performance of mixed precision, iterative refinement for unsymmetric problems on Intel Woodcrest. Figure 2. Performance of mixed precision, iterative refinement for symmetric, positive def...

13 | Relaxation strategies for nested Krylov methods - Eshof, Sleijpen, et al. - 2003 |

12 |
Progress in sparse matrix methods in large sparse linear systems on vector supercomputers, Intern
- Ashcraft, Grimes, et al.
- 1987
Citation Context: ...r systems, which is commonly achieved with either direct or iterative methods. Most sparse direct methods for solving linear systems of equations are variants of either multifrontal [9] or supernodal [10] factorization approaches. Here, we focus only on multifrontal methods. For results on supernodal solvers see [11]. There are a number of freely available packages that implement multifrontal methods....

2 |
Mixed precision dense linear system solver based on cholesky factorization for the CELL processor. Concurrency Computat. Pract. Exper. in preparation
- Kurzak, Dongarra
Citation Context: ...ision solver performs up to 7 and 11 times faster than the double precision peak in the unsymmetric and symmetric, positive definite cases respectively. Implementation details for this case can be found in [7, 8]. Figure 1. Performance of mixed precision, iterative refinement for unsymmetric problems on Intel Woodcrest. Figure 2. Performance of mixed precision, iterative refinement for symmetric, positive def...

2 |
Using mixed precision for sparse matrix computations to enhance the performance while achieving 64-bit accuracy
- Buttari, Dongarra, Kurzak, Luszczek, Tomov
- 2006
Citation Context: ...ing linear systems of equations are variants of either multifrontal [9] or supernodal [10] factorization approaches. Here, we focus only on multifrontal methods. For results on supernodal solvers see [11]. There are a number of freely available packages that implement multifrontal methods. We have chosen for our tests the software package called MUMPS [12–14]. The main reason for selecting this softwa...