Results 1–10 of 35
Accurate Sum and Dot Product
SIAM J. Sci. Comput., 2005
"... Algorithms for summation and dot product of floating point numbers are presented which are fast in terms of measured computing time. We show that the computed results are as accurate as if computed in twice or Kfold working precision, K 3. For twice the working precision our algorithms for summa ..."
Abstract

Cited by 64 (5 self)
 Add to MetaCart
Algorithms for summation and dot product of floating-point numbers are presented which are fast in terms of measured computing time. We show that the computed results are as accurate as if computed in twice or K-fold working precision, K ≥ 3. For twice the working precision our algorithms for summation and dot product are some 40% faster than the corresponding XBLAS routines while sharing similar error estimates. Our algorithms are widely applicable because they require only addition, subtraction and multiplication of floating-point numbers in the same working precision as the given data. Higher precision is unnecessary, the algorithms are straight loops without branches, and no access to mantissa or exponent is necessary.
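The key ingredient behind algorithms of this kind is an error-free transformation of the sum of two floats. As a minimal Python sketch (the function and variable names are ours, not the paper's), Knuth's TwoSum and the resulting compensated summation loop look like:

```python
def two_sum(a, b):
    # Error-free transformation: x + y == a + b exactly,
    # where x = fl(a + b) and y captures the rounding error.
    x = a + b
    z = x - a
    y = (a - (x - z)) + (b - z)
    return x, y

def sum2(p):
    # Compensated summation: accumulate all rounding errors in e,
    # giving a result as accurate as if computed in twice the
    # working precision.
    s, e = 0.0, 0.0
    for x in p:
        s, q = two_sum(s, x)
        e += q
    return s + e
```

On the ill-conditioned input `[1e16, 1.0, -1e16]` the plain left-to-right loop returns 0.0, while `sum2` recovers the exact value 1.0.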
On the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods
Journal of Computational and Graphical Statistics, 2010
"... We present a casestudy on the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods. Graphics cards, containing multiple Graphics Processing Units (GPUs), are selfcontained parallel computational devices that can be housed in conventional desktop and la ..."
Abstract

Cited by 18 (4 self)
 Add to MetaCart
We present a case study on the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods. Graphics cards, containing multiple Graphics Processing Units (GPUs), are self-contained parallel computational devices that can be housed in conventional desktop and laptop computers. For certain classes of Monte Carlo algorithms they offer massively parallel simulation, with the added advantage over conventional distributed multicore processors that they are cheap, easily accessible, easy to maintain, easy to code, dedicated local devices with low power consumption. On a canonical set of stochastic simulation examples including population-based Markov chain Monte Carlo methods and Sequential Monte Carlo methods, we find speedups from 35- to 500-fold over conventional single-threaded computer code. Our findings suggest that GPUs have the potential to facilitate the growth of statistical modelling into complex, data-rich domains through the availability of cheap and accessible many-core computation. We believe the speedups we observe should motivate wider ...
Accuracy Enhancement for Higher Derivatives using Chebyshev Collocation and a Mapping Technique
1994
"... We study a new method in reducing the roundoff error in computing derivatives using Chebyshev collocation methods. By using a grid mapping derived by Kosloff and TalEzer, and the proper choice of the parameter ff, the roundoff error of the kth derivative can be reduced from O(N 2k ) to O((N jln ..."
Abstract

Cited by 16 (0 self)
 Add to MetaCart
We study a new method for reducing the roundoff error in computing derivatives using Chebyshev collocation methods. By using a grid mapping derived by Kosloff and Tal-Ezer, and the proper choice of the parameter α, the roundoff error of the k-th derivative can be reduced from O(N^{2k}) to O((N |ln ε|)^k), where ε is the machine precision and N is the number of collocation points. This drastic reduction of roundoff error makes mapped Chebyshev methods competitive with any other algorithm in computing second or higher derivatives with large N. We also study several other aspects of the mapped Chebyshev differentiation matrix. We find that 1) the mapped Chebyshev methods require much fewer than π points per wavelength to resolve a wave, 2) the eigenvalues are less sensitive to perturbation by roundoff error, and 3) larger time steps can be used for solving PDEs. All these advantages of the mapped Chebyshev methods can be achieved while maintaining spectral accuracy.
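For reference, the Kosloff–Tal-Ezer grid transformation alluded to above is usually written as follows (stated here from the wider literature, not from this abstract itself):

```latex
x_j \;=\; g(\xi_j) \;=\; \frac{\arcsin(\alpha\,\xi_j)}{\arcsin\alpha},
\qquad \xi_j = \cos\!\left(\frac{j\pi}{N}\right),\quad j = 0,\dots,N,
```

where the ξ_j are the standard Chebyshev collocation points and the parameter α ∈ (0, 1) controls how strongly they are stretched toward a uniform grid as α → 1.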
A distillation algorithm for floating-point summation
SIAM J. Sci. Comput., 1999
"... Abstract. The addition of two or more floatingpoint numbers is fundamental to numerical computations. This paper describes an efficient “distillation ” style algorithm which produces a precise sum by exploiting the natural accuracy of compensated cancellation. The algorithm is applicable to all set ..."
Abstract

Cited by 11 (0 self)
 Add to MetaCart
The addition of two or more floating-point numbers is fundamental to numerical computations. This paper describes an efficient “distillation”-style algorithm which produces a precise sum by exploiting the natural accuracy of compensated cancellation. The algorithm is applicable to all sets of data but is particularly appropriate for ill-conditioned data, where standard methods fail due to the accumulation of rounding error and its subsequent exposure by cancellation. The method uses only standard floating-point arithmetic and does not rely on the radix used by the arithmetic model, the architecture of specific machines, or the use of accumulators.
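A minimal sketch of the distillation idea (our simplification, not the paper's exact algorithm): repeated error-free passes over the vector preserve the exact total while concentrating it into a few non-overlapping components, after which a naive sum is accurate:

```python
def two_sum(a, b):
    # Error-free transformation: x + y == a + b exactly.
    x = a + b
    z = x - a
    y = (a - (x - z)) + (b - z)
    return x, y

def distill_sum(p):
    # Each pass replaces adjacent pairs by (error, sum), pushing the
    # "heavy" part of the total toward the end of the vector without
    # changing the exact sum. Stop once a pass changes nothing.
    p = list(p)
    for _ in range(len(p) + 2):  # a few passes suffice in practice
        changed = False
        for i in range(1, len(p)):
            x, y = two_sum(p[i - 1], p[i])
            if (y, x) != (p[i - 1], p[i]):
                p[i - 1], p[i] = y, x
                changed = True
        if not changed:
            break
    return sum(p)  # components barely overlap; naive sum is accurate
```

On `[1e16, 1.0, -1e16]`, where plain summation loses the 1.0 entirely, the distilled vector yields the exact sum 1.0.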
Composition constants for raising the order of unconventional schemes for ordinary differential equations
Math. Comp., 1997
"... Abstract. Many models of physical and chemical processes give rise to ordinary differential equations with special structural properties that go unexploited by generalpurpose software designed to solve numerically a wide range of differential equations. If those properties are to be exploited fully ..."
Abstract

Cited by 10 (2 self)
 Add to MetaCart
Many models of physical and chemical processes give rise to ordinary differential equations with special structural properties that go unexploited by general-purpose software designed to solve numerically a wide range of differential equations. If those properties are to be exploited fully for the sake of better numerical stability, accuracy and/or speed, the differential equations may have to be solved by unconventional methods. This short paper publishes composition constants obtained by the authors to increase the efficiency of a family of mostly unconventional methods, called reflexive.
A comparison of methods for accurate summation
SIGSAM Bull., 2004
"... The summation of large sets of numbers is prone to serious rounding errors. Several methods of controlling these errors are compared, with respect to both speed and accuracy. It is found that the method of “Cascading Accumulators ” is the fastest of several accurate methods. The Double Compensation ..."
Abstract

Cited by 9 (0 self)
 Add to MetaCart
The summation of large sets of numbers is prone to serious rounding errors. Several methods of controlling these errors are compared, with respect to both speed and accuracy. It is found that the method of “Cascading Accumulators” is the fastest of several accurate methods. The Double Compensation method (in both single- and double-precision versions) is also perfectly accurate in all the tests performed. Although slower than the Cascade method, it is recommended when double-precision accuracy is required. C programs that implement both these methods are available in the BULLETIN online repository.
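For illustration, here is a Python sketch of Priest's doubly compensated summation, which we understand to be the basis of the “Double Compensation” method studied here (details may differ from the paper's C programs). Unlike ordinary Kahan summation, it carries the error of both additions forward, but it requires the summands in order of decreasing magnitude:

```python
def priest_sum(p):
    # Priest's doubly compensated summation (sketch). The input is
    # sorted by decreasing magnitude, as the algorithm requires.
    if not p:
        return 0.0
    q = sorted(p, key=abs, reverse=True)
    s, c = q[0], 0.0
    for x in q[1:]:
        y = c + x
        u = x - (y - c)      # rounding error of the first addition
        t = y + s
        v = y - (t - s)      # rounding error of the second addition
        z = u + v
        s = t + z
        c = z - (s - t)      # carry both error terms to the next step
    return s
```

On `[1e16, 1.0, -1e16]` ordinary Kahan summation returns 0.0 (the compensation itself is lost in the cancellation), while `priest_sum` returns the exact value 1.0.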
Accurate floating-point summation
2005
"... Given a vector of floatingpoint numbers with exact sum s, we present an algorithm for calculating a faithful rounding of s into the set of floatingpoint numbers, i.e. one of the immediate floatingpoint neighbors of s. If the s is a floatingpoint number, we prove that this is the result of our a ..."
Abstract

Cited by 8 (0 self)
 Add to MetaCart
Given a vector of floating-point numbers with exact sum s, we present an algorithm for calculating a faithful rounding of s into the set of floating-point numbers, i.e. one of the immediate floating-point neighbors of s. If s is a floating-point number, we prove that it is the result of our algorithm. The algorithm adapts to the condition number of the sum, i.e. it is very fast for mildly conditioned sums, with computing time increasing slowly in proportion to the condition number. All statements are also true in the presence of underflow. Furthermore, algorithms with K-fold accuracy are derived, where in that case the result is stored in a vector of K floating-point numbers. We also present an algorithm for rounding the sum s to the nearest floating-point number. Our algorithms are fast in terms of measured computing time because they require no special operations such as access to mantissa or exponent, contain no branch in the inner loop, and need no extra precision: the only operations used are standard floating-point addition, subtraction and multiplication in one working precision, for example double precision. Moreover, in contrast to other approaches, the algorithms are ideally suited for parallelization. We also sketch dot product algorithms with similar properties.
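Python's standard library offers a correctly rounded sum in `math.fsum` (implemented via Shewchuk's algorithm rather than the one described here), which makes the contrast with naive recursive summation easy to observe:

```python
import math

data = [1e16, 1.0, -1e16, 1e-8]

naive = sum(data)        # left-to-right: the 1.0 is absorbed and lost
exact = math.fsum(data)  # the exact total 1 + 1e-8, correctly rounded

# naive leaves only the tiny last term, while exact equals 1.0 + 1e-8
```

Here the exact sum is 1 + 10⁻⁸, so the naive loop is wrong by a factor of roughly 10⁸.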
On floatingpoint summation
SIAM Rev., 1995
"... In this paper we focus on some general error analysis results in floatingpoint summation. We emphasize analysis useful from both a scientific and a teaching point of view. Keywords: Floatingpoint summation, rounding errors, orderings. AMS subject classification. primary 65G05, secondary 65B10. 1 ..."
Abstract

Cited by 8 (0 self)
 Add to MetaCart
In this paper we focus on some general error analysis results in floating-point summation. We emphasize analysis useful from both a scientific and a teaching point of view. Keywords: floating-point summation, rounding errors, orderings. AMS subject classification: primary 65G05, secondary 65B10.
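One of the classic teaching points in this kind of analysis is that the summation order matters: summing in increasing order of magnitude lets small terms accumulate before they meet a large one. A small illustration (the example data are ours):

```python
data = [1e16] + [1.0] * 100

# Decreasing order of magnitude: each 1.0 is absorbed by the large
# leading term and lost to rounding.
s_dec = 0.0
for x in data:
    s_dec += x

# Increasing order of magnitude: the 100 small terms accumulate
# first, so their total survives the final large addition.
s_inc = 0.0
for x in reversed(data):
    s_inc += x

# s_dec == 1e16, while s_inc == 1e16 + 100
```

Here the decreasing-order sum drops all one hundred unit terms, while the increasing-order sum keeps them exactly.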
Accuracy Versus Time: A Case Study with Summation Algorithms
In PASCO, 2010
"... In this article, we focus on numerical algorithms for which, in practice, parallelism and accuracy do not cohabit well. In order to increase parallelism, expressions are reparsed, implicitly using mathematical laws like associativity, and this reduces the accuracy. Our approach consists in focusing ..."
Abstract

Cited by 6 (4 self)
 Add to MetaCart
In this article we focus on numerical algorithms for which, in practice, parallelism and accuracy do not coexist well. In order to increase parallelism, expressions are re-parsed, implicitly using mathematical laws like associativity, and this reduces the accuracy. Our approach consists of focusing on summation algorithms and performing an exhaustive study: we generate all the algorithms equivalent to the original one and compatible with our relaxed time constraint. Next we compute the worst errors which may arise during their evaluation, for several relevant sets of data. Our main conclusion is that relaxing the time constraint very slightly, by choosing algorithms whose critical paths are a bit longer than optimal, makes it possible to greatly improve the accuracy. We extend these results to the case of bounded parallelism and to accurate summation algorithms that use compensation techniques.
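The kind of re-parsing discussed here can be illustrated by the classic pairwise (tree-shaped) reassociation of a sum, whose worst-case rounding error grows like O(log n) instead of the O(n) of the left-to-right loop, and whose two halves can be evaluated in parallel (a generic illustration, not one of the paper's generated algorithms):

```python
def pairwise_sum(p):
    # Tree-shaped reassociation of the summation: split the vector in
    # half, sum each half recursively, and add the two partial sums.
    # Valid mathematically by associativity; in floating point it
    # changes (usually improves) the accumulated rounding error.
    if len(p) <= 2:
        return sum(p)
    m = len(p) // 2
    return pairwise_sum(p[:m]) + pairwise_sum(p[m:])
```

The critical path of this tree has depth ⌈log₂ n⌉ additions, versus n − 1 for the sequential loop, which is exactly the parallelism-versus-shape trade-off the study explores.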
Ultimately Fast Accurate Summation
2009
"... We present two new algorithms FastAccSum and FastPrecSum, one to compute a faithful rounding of the sum of floatingpoint numbers and the other for a result “as if” computed in Kfold precision. Faithful rounding means the computed result either is one of the immediate floatingpoint neighbors of th ..."
Abstract

Cited by 5 (0 self)
 Add to MetaCart
We present two new algorithms, FastAccSum and FastPrecSum, one to compute a faithful rounding of the sum of floating-point numbers and the other for a result “as if” computed in K-fold precision. Faithful rounding means the computed result either is one of the immediate floating-point neighbors of the exact result or is equal to the exact sum if this is a floating-point number. The algorithms are based on our previous algorithms AccSum and PrecSum and improve them by up to 25%. The first algorithm adapts to the condition number of the sum; i.e., the computing time is proportional to the difficulty of the problem. The second algorithm does not need extra memory, and the computing time depends only on the number of summands and K. Both algorithms are the fastest known in terms of flops. They allow good instruction-level parallelism so that they are also fast in terms of measured computing time. The algorithms require only standard floating-point addition, subtraction, and multiplication in one working precision, for example, double precision.