Parallel Numerical Linear Algebra
, 1993
Cited by 542 (26 self)
We survey general techniques and open problems in numerical linear algebra on parallel architectures. We first discuss basic principles of parallel processing, describing the costs of basic operations on parallel machines, including general principles for constructing efficient algorithms. We illustrate these principles using current architectures and software systems, and by showing how one would implement matrix multiplication. Then, we present direct and iterative algorithms for solving linear systems of equations, linear least squares problems, the symmetric eigenvalue problem, the nonsymmetric eigenvalue problem, and the singular value decomposition. We consider dense, band and sparse matrices.
A proven correctly rounded logarithm in double-precision
 In Real Numbers and Computers, Schloss Dagstuhl
, 2004
Cited by 19 (9 self)
This article is a case study in the implementation of a portable, proven and efficient correctly rounded elementary function in double-precision. We describe the methodology used to achieve these goals in the crlibm library. There are two novel aspects to this approach. The first is the proof framework, and in general the techniques used to balance performance and provability. The second is the introduction of processor-specific optimization to get performance equivalent to the best current mathematical libraries, while trying to minimize the proof work. The implementation of the natural logarithm is detailed to illustrate these questions. Mathematics Subject Classification. 26-04, 65D15, 65Y99.
Assisted verification of elementary functions using Gappa
 In Proceedings of the 2006 ACM symposium on Applied computing
, 2006
Cited by 17 (6 self)
The implementation of a correctly rounded or interval elementary function needs to be proven carefully down to the very last details. The proof requires a tight bound on the overall error of the implementation with respect to the mathematical function. Such work is function-specific, concerns tens of lines of code for each function, and will usually be broken by the smallest change to the code (e.g. for maintenance or optimization purposes). Therefore, it is very tedious and error-prone if done by hand. This article discusses the use of the Gappa proof assistant in this context. Gappa has two main advantages over previous approaches: its input format is very close to the actual C code to validate, and it automates error evaluation and propagation using interval arithmetic. Besides, it can be used to incrementally prove complex mathematical properties pertaining to the C code. Yet it does not require any specific knowledge about automatic theorem proving, and thus is accessible to a wider community. Moreover, Gappa may generate a formal proof of the results that can be checked independently by a lower-level proof assistant like Coq, hence providing an even higher confidence in the certification of the numerical code.
Towards the post-ultimate libm
, 2005
Cited by 13 (8 self)
This article presents advances on the subject of correctly rounded elementary functions since the publication of the libultim mathematical library developed by Ziv at IBM. This library showed that the average performance and memory overhead of correct rounding could be made negligible. However, the worst-case overhead was still a factor of 1000 or more. It is shown here that, with current processor technology, this worst-case overhead can be kept within a factor of 2 to 10 of current best libms. This low overhead has very positive consequences on the techniques for implementing and proving correctly rounded functions, which are also studied. These results lift the last technical obstacles to a generalisation of (at least some) correctly rounded double precision elementary functions.
Trading off Parallelism and Numerical Stability
, 1992
Cited by 12 (5 self)
The fastest parallel algorithm for a problem may be significantly less stable numerically than the fastest serial algorithm. We illustrate this phenomenon by a series of examples drawn from numerical linear algebra. We also show how some of these instabilities may be mitigated by better floating point arithmetic.
Fast and Accurate Floating Point Summation with Application to Computational Geometry
 Numerical Algorithms
, 2002
Cited by 10 (0 self)
We present several simple algorithms for accurately computing the sum of n floating point numbers using a wider accumulator. Let f and F be the number of significant bits in the summands and the accumulator, respectively. Then assuming gradual underflow, no overflow, and round-to-nearest arithmetic, up to $\lfloor 2^{F-f}/(1-2^{-f}) \rfloor + 1$ numbers can be accurately added by just summing the terms in decreasing order of exponents, yielding a sum correct to within about 1.5 units in the last place. In particular, if the sum is zero, it is computed exactly. We apply this result to the floating point formats in the IEEE floating point standard, and investigate its performance. Our results show that in the absence of massive cancellation (the most common case) the cost of guaranteed accuracy is about 30-40% more than the straightforward summation. If massive cancellation does occur, the cost of computing the accurate sum is about a factor of ten. Finally we apply our algorithm in computing a robust geometric predicate (used in computational geometry), where our accurate summation algorithm improves the existing algorithm by a factor of two on a nearly coplanar set of points.
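The ordering idea in this abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the function names are hypothetical, the accumulator is an ordinary Python float (an IEEE double), and the example inputs are chosen so that they would also fit a narrower single-precision format. It only shows why adding terms in decreasing order of exponent lets large cancelling terms meet before small terms are absorbed.

```python
import math

def naive_sum(values):
    """Add the values in the order given, using a double accumulator."""
    acc = 0.0
    for x in values:
        acc += x
    return acc

def sum_decreasing_exponent(values):
    """Add the values in order of decreasing binary exponent.

    math.frexp(x) returns (m, e) with x == m * 2**e, so e serves as the
    sort key. sorted() is stable, so ties keep their input order.
    """
    ordered = sorted(values, key=lambda x: math.frexp(x)[1], reverse=True)
    acc = 0.0
    for x in ordered:
        acc += x
    return acc

data = [2.0 ** 60, 1.0, -(2.0 ** 60)]
print(naive_sum(data))                # 0.0: the 1.0 is absorbed by 2**60
print(sum_decreasing_exponent(data))  # 1.0: 2**60 and -2**60 cancel first
```

The sketch does not check the bound on the number of summands stated in the abstract; it only demonstrates the effect of the ordering on one small input.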
Some functions computable with a fused-mac
 in Proceedings of the 17th Symposium on Computer Arithmetic, P. Montuschi and E. Schwarz, Eds., Cape Cod
, 2005
Cited by 10 (3 self)
The fused multiply accumulate instruction (fused-mac) that is available on some current processors such as the Power PC or the Itanium eases some calculations. We give examples of some floating-point functions (such as ulp(x) or Nextafter(x, y)), or some useful tests, that are easily computable using a fused-mac. Then, we show that, with rounding to the nearest, the error of a fused-mac instruction is exactly representable as the sum of two floating-point numbers. We give an algorithm that computes that error.
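The representability claim in this abstract can be checked on a concrete input with exact rational arithmetic. The sketch below is not the algorithm from the paper: it emulates a correctly rounded a*x + y with Python's `fractions.Fraction` instead of a hardware instruction, and the function name `fma_with_error` is hypothetical. It assumes no overflow or underflow occurs.

```python
from fractions import Fraction

def fma_with_error(a, x, y):
    """Return (r, e1, e2) where r is a*x + y rounded to the nearest double
    and e1 + e2 is the rounding error, split into two doubles.

    Exact rationals stand in for a fused-mac: converting a Fraction to
    float rounds the exact value once, to nearest.
    """
    exact = Fraction(a) * Fraction(x) + Fraction(y)
    r = float(exact)                 # emulated fused-mac result
    err = exact - Fraction(r)        # exact rounding error
    e1 = float(err)                  # leading part of the error
    e2 = float(err - Fraction(e1))   # remaining part of the error
    # Expected to hold per the abstract's claim (no underflow/overflow).
    assert Fraction(e1) + Fraction(e2) == err
    return r, e1, e2

a = x = 1.0 + 2.0 ** -30
y = 2.0 ** -120
r, e1, e2 = fma_with_error(a, x, y)
print(r == 1.0 + 2.0 ** -29, e1 == 2.0 ** -60, e2 == 2.0 ** -120)
```

Here the exact value is 1 + 2^-29 + 2^-60 + 2^-120, so the error 2^-60 + 2^-120 does not fit in a single double but does fit in two.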
Emulation of a FMA and Correctly-Rounded Sums: Proved Algorithms Using Rounding to Odd
 IEEE Trans. Computers
, 2008
Cited by 8 (0 self)
Rounding to odd is a nonstandard rounding on floating-point numbers. By using it for some intermediate values instead of rounding to nearest, correctly rounded results can be obtained at the end of computations. We present an algorithm to emulate the fused multiply-and-add operator. We also present an iterative algorithm for computing the correctly rounded sum of a set of floating-point numbers under mild assumptions. A variation on both previous algorithms is the correctly rounded sum of any three floating-point numbers. This leads to efficient implementations, even when this rounding is not available. In order to guarantee the correctness of these properties and algorithms, we formally proved them using the Coq proof checker.
Automatic Generation of Staged Geometric Predicates
, 2002
Cited by 8 (0 self)
Algorithms in Computational Geometry and Computer Aided Design are often developed for the Real RAM model of computation, which assumes exactness of all the input arguments and operations. In practice, however, the exactness imposes tremendous limitations on the algorithms – even the basic operations become uncomputable, or prohibitively slow. In some important cases, however, the computations of interest are limited to determining the sign of polynomial expressions. In such circumstances, a faster approach is available: one can evaluate the polynomial in floating point first, together with some estimate of the rounding error, and fall back to exact arithmetic only if this error is too big to determine the sign reliably. A particularly efficient variation on this approach has been used by Shewchuk in his robust implementations of Orient and InSphere geometric predicates. We extend Shewchuk’s method to arbitrary polynomial expressions. The expressions are given as programs in a suitable source language featuring basic arithmetic operations of addition, subtraction, multiplication and squaring, which are to be perceived by the programmer as exact. The source language also allows for anonymous
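The filter-then-fallback approach described in this abstract can be sketched for a 2D orientation test. This is a hand-written two-stage illustration, not the generated staged predicates of the paper and not Shewchuk's code: the function name `orient2d` and the error coefficient `ERR_COEFF` are assumptions, with the coefficient chosen deliberately conservative for IEEE double arithmetic.

```python
from fractions import Fraction

EPS = 2.0 ** -53           # unit roundoff for IEEE doubles
ERR_COEFF = 4.0 * EPS      # assumed conservative error coefficient

def orient2d(a, b, c):
    """Return (sign, stage) for the orientation of points a, b, c.

    sign is +1 for counterclockwise, -1 for clockwise, 0 for collinear.
    stage is "filter" if floating point decided, "exact" otherwise.
    """
    acx, acy = a[0] - c[0], a[1] - c[1]
    bcx, bcy = b[0] - c[0], b[1] - c[1]
    detleft = acx * bcy
    detright = acy * bcx
    det = detleft - detright
    # Estimated rounding error of det; trust the sign only if |det| exceeds it.
    bound = ERR_COEFF * (abs(detleft) + abs(detright))
    if abs(det) > bound:
        return (1 if det > 0 else -1), "filter"
    # Fall back to exact rational arithmetic on the original inputs.
    ax, ay, bx, by, cx, cy = (Fraction(v) for v in (*a, *b, *c))
    exact = (ax - cx) * (by - cy) - (ay - cy) * (bx - cx)
    return (exact > 0) - (exact < 0), "exact"

print(orient2d((0.0, 0.0), (1.0, 0.0), (0.0, 1.0)))  # (1, 'filter')
print(orient2d((0.0, 0.0), (1.0, 1.0), (2.0, 2.0)))  # (0, 'exact')
```

Well-separated inputs are decided by the cheap floating-point stage, and only degenerate or nearly degenerate inputs pay for exact arithmetic.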
Certifying the floating-point implementation of an elementary function using Gappa
 IEEE Transactions on Computers
, 2011
Cited by 8 (3 self)
High confidence in floating-point programs requires proving numerical properties of final and intermediate values. One may need to guarantee that a value stays within some range, or that the error relative to some ideal value is well bounded. This certification may require a time-consuming proof for each line of code, and it is usually broken by the smallest change to the code, e.g., for maintenance or optimization purposes. Certifying floating-point programs by hand is, therefore, very tedious and error-prone. The Gappa proof assistant is designed to make this task both easier and more secure, due to the following novel features: It automates the evaluation and propagation of rounding errors using interval arithmetic. Its input format is very close to the actual code to validate. It can be used incrementally to prove complex mathematical properties pertaining to the code. It generates a formal proof of the results, which can be checked independently by a lower-level proof assistant like Coq. Yet it does not require any specific knowledge about automatic theorem proving, and thus is accessible to a wide community. This paper demonstrates the practical use of this tool for a widely used class of floating-point programs: implementations of elementary functions in a mathematical library.