## The Accuracy Of Floating Point Summation (1993)

Venue: SIAM J. Sci. Comput.

Citations: 46 (0 self)

### BibTeX

@ARTICLE{Higham93theaccuracy,

author = {Nicholas J. Higham},

title = {The Accuracy Of Floating Point Summation},

journal = {SIAM J. Sci. Comput},

year = {1993},

volume = {14},

pages = {783--799}

}

### Abstract

The usual recursive summation technique is just one of several ways of computing the sum of n floating point numbers. Five summation methods and their variations are analysed here. The accuracy of the methods is compared using rounding error analysis and numerical experiments. Four of the methods are shown to be special cases of a general class of methods, and an error analysis is given for this class. No one method is uniformly more accurate than the others, but some guidelines are given on the choice of method in particular cases.

Key words. floating point summation, rounding error analysis, orderings. AMS subject classifications. primary 65G05, secondary 65B10.

1. Introduction. Sums of floating point numbers are ubiquitous in scientific computing. They occur when evaluating inner products, means, variances, norms, and all kinds of nonlinear functions. Although, at first sight, summation might appear to offer little scope for algorithmic ingenuity, the usual "recursive summation...
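As a rough illustration of the abstract's point that recursive summation is only one option, the following Python sketch compares it with two alternatives named in the citation contexts on this page: pairwise summation and Kahan's compensated summation. The function names and the test data are illustrative assumptions, not taken from the paper, and `math.fsum` serves only as a reference value.

```python
import math


def recursive_sum(xs):
    """Left-to-right accumulation: s = ((x1 + x2) + x3) + ..."""
    s = 0.0
    for x in xs:
        s += x
    return s


def pairwise_sum(xs):
    """Sum adjacent pairs, then repeat on the partial sums.

    An unpaired last element is carried over to the next stage unchanged.
    """
    ys = list(xs)
    if not ys:
        return 0.0
    while len(ys) > 1:
        nxt = [ys[i] + ys[i + 1] for i in range(0, len(ys) - 1, 2)]
        if len(ys) % 2 == 1:
            nxt.append(ys[-1])
        ys = nxt
    return ys[0]


def compensated_sum(xs):
    """Kahan-style compensated summation.

    c holds an estimate of the rounding error of the previous addition
    and is fed back into the next term.
    """
    s = 0.0
    c = 0.0
    for x in xs:
        y = x - c          # apply the running correction
        t = s + y          # low-order bits of y may be lost here
        c = (t - s) - y    # recover an estimate of what was lost
        s = t
    return s


if __name__ == "__main__":
    # Hypothetical test data: one large term followed by many tiny ones.
    data = [1.0] + [1e-16] * 1000
    reference = math.fsum(data)
    for name, fn in [("recursive", recursive_sum),
                     ("pairwise", pairwise_sum),
                     ("compensated", compensated_sum)]:
        result = fn(data)
        print(f"{name:12s} {result!r}  abs. error {abs(result - reference):.3e}")
```

On this constructed data, recursive summation returns exactly 1.0, because each 1e-16 addend is smaller than half the spacing of doubles just above 1 and is rounded away. The other two methods retain most of that contribution. This is a single hand-picked case; as the abstract notes, no one method is uniformly more accurate than the others.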

### Citations

402 | What every computer scientist should know about floating-point arithmetic - Goldberg - 1991 |
Citation Context: ...ting point underflows occur; how to modify error analyses to allow for underflow is described by Demmel in [6]. An excellent tutorial on many aspects of floating point arithmetic is given by Goldberg [9]. In section 4 we summarise some existing results on statistical estimates of accuracy of summation methods. Numerical experiments are presented in section 6 and conclusions are given in section 7. 2....

177 | The Art of Computer - Knuth - 1973 |

124 | A floating-point technique for extending the available precision - Dekker - 1971 |

66 | Multi-Directional Search: A Direct Search Algorithm for Parallel Machines - Torczon - 1989 |
Citation Context: ...ation in this example). The errors for compensated summation are zero for all the n we tried! In the next set of tests we used a MATLAB implementation [12] of the multidirectional search (MDS) method [40, 41] which attempts to locate a maximizer of $f : \mathbb{R}^n \to \mathbb{R}$ using function values only. We applied the maximizer to f defined as the relative error of the sum computed in single precision by recursive summ...

63 | Further remarks on reducing truncation errors - Kahan - 1965 |
Citation Context: ...ng one of the numbers from the sum, and he made use of this estimate in a Runge-Kutta code in a program library for the EDSAC computer. Gill's estimate is valid for fixed point arithmetic only. Kahan [16] and Møller [32] both extended the idea to floating point arithmetic. Møller shows how to estimate $a + b - fl(a + b)$ in chopped arithmetic, while Kahan uses a slightly simpler estimate to derive a "co...

51 | The Symmetric Eigenvalue Problem, Prentice-Hall - Parlett - 1980 |

37 | Underflow and the reliability of numerical software - Demmel - 1984 |
Citation Context: ...how our analysis has to be modified to accommodate such machines. We will assume that no floating point underflows occur; how to modify error analyses to allow for underflow is described by Demmel in [6]. An excellent tutorial on many aspects of floating point arithmetic is given by Goldberg [9]. In section 4 we summarise some existing results on statistical estimates of accuracy of summation methods...

28 | The arithmetic of the digital computer: A new approach - Kulisch, Miranker - 1986 |
Citation Context: ...terminating when $x_n^{(k)}$ approximates $\sum_{i=1}^n x_i$ with relative error at most u. Kahan states that these algorithms appear to have average run times of order at least n log n. See [3], [19], [25] and [23] for further details and references. 4. Statistical Estimates of Accuracy. As we have noted, rounding error bounds can be very pessimistic, because they account for the worst-case propagation of error...

17 | On accurate floating-point summation - Malcolm - 1971 |
Citation Context: ...ator; if that accumulator overflows it is added to the next higher one and then reset to zero, and this cascade continues until no overflow occurs. Modifications of Wolfe's algorithm are presented in [28, 37]. Malcolm [28] gives a detailed error analysis to show that his method achieves a relative error of order u. A drawback of the algorithm is that it is strongly machine dependent. An interesting and cr...

13 | Rundungsfehleranalyse einiger Verfahren zur Summation endlicher Summen - Neumaier - 1974 |
Citation Context: ...r floating point summation, with the aim of answering the question "which methods achieve the best accuracy?". Several authors have used error analysis to compare summation methods (see, for example, [1, 2, 33, 39]). Here we give a more comprehensive treatment that highlights the relationships between different methods; in particular, we give an error analysis for a general class of methods that includes most o...

12 | Parallel algorithms for rounding exact summation of floating point - Leuprecht, Oberaigner - 1982 |
Citation Context: ...=1 x i , terminating when $x_n^{(k)}$ approximates $\sum_{i=1}^n x_i$ with relative error at most u. Kahan states that these algorithms appear to have average run times of order at least n log n. See [3], [19], [25] and [23] for further details and references. 4. Statistical Estimates of Accuracy. As we have noted, rounding error bounds can be very pessimistic, because they account for the worst-case propagation...

11 | Analysis of some known methods of improving the accuracy of floating-point sums - Linnainmaa - 1974 |
Citation Context: ...slightly simpler estimate to derive a "compensated summation" method for computing $\sum_{i=1}^n x_i$. The use of Kahan's method with a Runge-Kutta formula is described in [42] (see also the experiments in [26]). The estimate used by Kahan is perhaps best explained with the aid of a diagram. Let a and b be floating point numbers with $|a| \ge |b|$, let $\hat{s} = fl(a + b)$, and consider Figure 3.1, which uses boxes t...

11 | Knuth: The Art of Computer - E - 1973 |

10 | Floating-point computation of functions with maximum accuracy - Bohlender - 1977 |
Citation Context: ...) i = $\sum_{i=1}^n x_i$, terminating when $x_n^{(k)}$ approximates $\sum_{i=1}^n x_i$ with relative error at most u. Kahan states that these algorithms appear to have average run times of order at least n log n. See [3], [19], [25] and [23] for further details and references. 4. Statistical Estimates of Accuracy. As we have noted, rounding error bounds can be very pessimistic, because they account for the worst-case...

9 | A Process for the Step-by-Step Integration of Differential Equations - Gill - 1951 |
Citation Context: ...idered here, and in cases where there is heavy cancellation in the sum it can be expected to be the least accurate method. The final method that we examine has an interesting background. In 1951 Gill [8] noticed that the rounding error in the sum of two numbers could be estimated by subtracting one of the numbers from the sum, and he made use of this estimate in a Runge-Kutta code in a program librar...

9 | The accuracy of solutions to triangular systems - Higham - 1989 |
Citation Context: ...; $x_2$) $\approx 7 \times 10^5$, where $\mathrm{cond}(T, x) = \| |T^{-1}| |T| |x| \|_\infty / \|x\|_\infty \le \kappa_\infty(T)$ is the condition number that appears in a forward error bound for the substitution algorithm [11]. The forward error varies over the different summation methods by a factor 98 for $b_1$ and a factor 39 for $b_2$; these are the largest variations we observed in tests with a variety of different matri...

9 | Accurate floating-point summation - Linz - 1970 |
Citation Context: ...i ), where $|\theta_i| = O(u)$. The first method we consider is pairwise summation (also known as cascade summation), which was first discussed by McCracken and Dorn [29, pp. 61--63], Babuška [1] and Linz [27]. In this method the $x_i$ are summed in pairs, $y_i = x_{2i-1} + x_{2i}$, $i = 1 : [n/2]$ ($y_{[(n+1)/2]} = x_n$ if n is odd), and this pairwise summation process is repeated recursively on the $y_i$, i =...

9 | PC-MATLAB User's Guide, The MathWorks - Moler, Little, et al. - 1987 |
Citation Context: ...science." 6. Numerical Experiments. In this section we describe some numerical experiments that give further insight into the accuracy of summation methods. All the experiments were done using MATLAB [30], which uses IEEE standard double precision arithmetic with unit roundoff $u \approx 1.1 \times 10^{-16}$. First, we illustrate the behaviour of the methods on four classes of data $\{x_i\}$ chosen a priori. I...

9 | Reducing truncation errors using cascading accumulators - Ross - 1965 |
Citation Context: ...ator; if that accumulator overflows it is added to the next higher one and then reset to zero, and this cascade continues until no overflow occurs. Modifications of Wolfe's algorithm are presented in [28, 37]. Malcolm [28] gives a detailed error analysis to show that his method achieves a relative error of order u. A drawback of the algorithm is that it is strongly machine dependent. An interesting and cr...

8 | A note on floating-point summation of very many terms, Electron - Jankowski, Smoktunowicz, et al. - 1983 |
Citation Context: ...hat if $\sum_{i=1}^n |x_i| \gg |\sum_{i=1}^n x_i|$, compensated summation is not guaranteed to yield a small relative error. Another version of compensated summation is described in [14, 15, 21, 33, 34]. Here, instead of immediately feeding each correction back into the summation, the corrections are accumulated by recursive summation and then the global correction is added to the computed sum. For ...

7 | The accurate solution of certain continuous problems using only single precision arithmetic - Jankowski, Woźniakowski - 1985 |
Citation Context: ...hat if $\sum_{i=1}^n |x_i| \gg |\sum_{i=1}^n x_i|$, compensated summation is not guaranteed to yield a small relative error. Another version of compensated summation is described in [14, 15, 21, 33, 34]. Here, instead of immediately feeding each correction back into the summation, the corrections are accumulated by recursive summation and then the global correction is added to the computed sum. For ...

6 | A comparison of floating point summation methods - Gregory - 1972 |
Citation Context: ...none are extensive. Linz [27] compares recursive summation with pairwise summation for uniform random numbers on [0, 1] with n = 2048, averaging the errors over 20 trials, and Caprani [4] and Gregory [10] both conduct a similar experiment including compensated summation as well. Linnainmaa [26] applies recursive summation and compensated summation to series expansions, Simpson's rule for quadrature an...

5 | Implementation of a low round-off summation method - Caprani - 1971 |
Citation Context: ...$\bigl((x_1 + x_2) + (x_3 + x_4)\bigr) + (x_5 + x_6)$. Pairwise summation is attractive in parallel settings, because each of the $\lceil \log_2 n \rceil$ stages can be done in parallel [13, sec. 5.2.2]. Caprani [4] shows how to implement the method on a serial machine using temporary storage of size $\lceil \log_2 n \rceil$ (without overwriting the $x_i$). The error expression (3.3) holds for pairwise summation, but it is easy...

3 | Optimal design of efficient acoustic antenna arrays - Lasdon, Plummer, et al. - 1987 |
Citation Context: ...ds as special cases. This work was motivated by two applications in which the choice of summation method has been found to have an important influence on the performance of a numerical method. (1) In [24], Lasdon et al. derive an algorithm for solving an optimization problem that arises in the design of sonar arrays. The authors state [24, p. 145] that "The objective gradient $\nabla f$ in (4.1) is a sum of M...

3 | Pitfalls in computation - Stegun, Abramowitz - 1956 |
Citation Context: ...lities. (1) In the first example, $x_i$ is the ith term in the Taylor series expansion of $e^{-x}$ about the origin, with x = 2 (this series provides the classic example of "catastrophic cancellation" [38]). Results for n = 64 are given in Table 6.1. In this example, recursive summation with the decreasing ordering yields by far the best accuracy. There is severe cancellation in the sum and the decreas...

2 | Numerical stability in mathematical analysis - Babuška - 1969 |
Citation Context: ...r floating point summation, with the aim of answering the question "which methods achieve the best accuracy?". Several authors have used error analysis to compare summation methods (see, for example, [1, 2, 33, 39]). Here we give a more comprehensive treatment that highlights the relationships between different methods; in particular, we give an error analysis for a general class of methods that includes most o...

2 | Effect of rounding errors on the variable metric method - Dixon, Mills - 1994 |
Citation Context: ...ere eliminated by accumulating separately positive and negative terms (for each component of $\nabla f$) in the sum (4.1), adding them together only after all M terms had been processed." (2) Dixon and Mills [7] applied a quasi-Newton method to the extended Rosenbrock function $F(x_1, x_2, \dots, x_n) = \sum_{i=1}^{n/2} \bigl( 100 (x_{2i} - x_{2i-1}^2)^2 + (1 - x_{2i-1})^2 \bigr)$. (1.1) In SIAM J...

2 | Das Kahan-Babuskasche Summierungsverfahren in Triplex-ALGOL 60 - Nickel - 1970 |
Citation Context: ...hat if $\sum_{i=1}^n |x_i| \gg |\sum_{i=1}^n x_i|$, compensated summation is not guaranteed to yield a small relative error. Another version of compensated summation is described in [14, 15, 21, 33, 34]. Here, instead of immediately feeding each correction back into the summation, the corrections are accumulated by recursive summation and then the global correction is added to the computed sum. For ...

2 | Rounding error analysis of elementary numerical algorithms - Stummel - 1980 |
Citation Context: ...r floating point summation, with the aim of answering the question "which methods achieve the best accuracy?". Several authors have used error analysis to compare summation methods (see, for example, [1, 2, 33, 39]). Here we give a more comprehensive treatment that highlights the relationships between different methods; in particular, we give an error analysis for a general class of methods that includes most o...

2 | The accuracy of solutions to triangular systems - Higham - 1989 |

1 | How Cray's arithmetic hurts scientific computation (and what might be done about it). Manuscript prepared for the Cray User Group meeting in Toronto - Kahan - 1990 |
Citation Context: ...m the next smaller floating point number gives an answer that is either a factor of 2 too large or is zero, so the expression $fl(x + y) = (x + y)(1 + \delta)$ holds with $|\delta| = 1$ but not with $|\delta| = O(u)$ [20]. For machines without a guard digit we have to use the weaker model [44, p. 12] $fl(x \pm y) = x(1 + \alpha) \pm y(1 + \beta)$, $|\alpha|, |\beta| \le u$. (5.1) We now summarise the effect on the rounding error ana...

1 | Note on quasi double-precision - Møller - 1965 |

1 | Quasi double-precision in floating point addition - Møller - 1965 |
Citation Context: ...umbers from the sum, and he made use of this estimate in a Runge-Kutta code in a program library for the EDSAC computer. Gill's estimate is valid for fixed point arithmetic only. Kahan [16] and Møller [32] both extended the idea to floating point arithmetic. Møller shows how to estimate $a + b - fl(a + b)$ in chopped arithmetic, while Kahan uses a slightly simpler estimate to derive a "compensated summa...

1 | Best "ordering" for floating-point addition - Robertazzi, Schwartz - 1988 |
Citation Context: ...$\sum_{i=1}^n 1/i^2 \le \sum_{i=1}^\infty 1/i^2 = \pi^2/6 \approx 1.64$), and so pairwise summation has the larger error bound, by a factor $\approx \log_2 n$. (Expression (3.3) does not enable us to improve on the factor $\log_2 n$ in (3.7).) In [36] an "insertion adder" is proposed for the summation of positive numbers. This method can be applied equally well to arbitrary sums. First, the $x_i$ are sorted by order of increasing magnitude. Then $x_1$...

1 | The numerical stability in solution of differential equations - Vitasek |
Citation Context: ...ed arithmetic, while Kahan uses a slightly simpler estimate to derive a "compensated summation" method for computing $\sum_{i=1}^n x_i$. The use of Kahan's method with a Runge-Kutta formula is described in [42] (see also the experiments in [26]). The estimate used by Kahan is perhaps best explained with the aid of a diagram. Let a and b be floating point numbers with $|a| \ge |b|$, let $\hat{s} = fl(a + b)$, and consi...

1 | Floating-point computation of functions with maximum accuracy, IEEE Trans - Bohlender - 1977 |

1 | Implementation of a low round-off summation method - Caprani - 1971 |

1 | A floating-point technique for extending the available precision - Dekker - 1971 |

1 | The effect of rounding error on the variable metric method - Mills - 1990 |

1 | A process for the step-by-step integration of differential equations in an automatic digital computing machine - Gill - 1951 |

1 | A comparison of floating point summation methods - Gregory - 1972 |

1 | Parallel Computers, Adam - Jesshope - 1981 |

1 | A note on floating-point summation of very many terms - Jankowski, Smoktunowicz, Woźniakowski - 1983 |

1 | The accurate solution of certain continuous problems using only single precision arithmetic - Woźniakowski - 1985 |

1 | Further remarks on reducing truncation errors, Comm. ACM - Kahan - 1965 |

1 | Implementation of algorithms (lecture notes by - Rep - 1973 |

1 | How Cray's arithmetic hurts scientific computation (and what might be done about it), manuscript prepared for the Cray User Group meeting in Toronto - Kahan - 1990 |

1 | Summation algorithm with corrections and some of its applications, Math. Stos - Kiełbasiński - 1973 |

1 | Optimal design of efficient acoustic antenna arrays - Waren - 1987 |

1 | Analysis of some known methods of improving the accuracy of floating-point sums - Linnainmaa - 1974 |