A proven correctly rounded logarithm in double-precision
In Real Numbers and Computers, Schloss Dagstuhl, 2004
Cited by 19 (9 self)
Abstract. This article is a case study in the implementation of a portable, proven, and efficient correctly rounded elementary function in double-precision. We describe the methodology used to achieve these goals in the crlibm library. There are two novel aspects to this approach. The first is the proof framework, and in general the techniques used to balance performance and provability. The second is the introduction of processor-specific optimization to get performance equivalent to the best current mathematical libraries, while trying to minimize the proof work. The implementation of the natural logarithm is detailed to illustrate these questions. Mathematics Subject Classification. 26-04, 65D15, 65Y99.
A parameterized floating-point exponential function for FPGAs
In IEEE International Conference on Field-Programmable Technology (FPT’05), 2005
Cited by 19 (8 self)
A parameterized floating-point exponential operator is presented. In single precision, it uses a small fraction of the FPGA’s resources, has a smaller latency than its software equivalent on a high-end processor, and offers ten times the throughput in its pipelined version. Previous work had shown that FPGAs could use massive parallelism to offset the poor performance of their basic floating-point operators compared to their equivalents in processors. As this work shows, when evaluating an elementary function, the flexibility of FPGAs provides much better performance than the processor without even resorting to parallelism.
Formal verification of IA-64 division algorithms
Proceedings, Theorem Proving in Higher Order Logics (TPHOLs), LNCS 1869, 2000
Cited by 18 (4 self)
Abstract. The IA-64 architecture defers floating-point and integer division to software. To ensure correctness and maximum efficiency, Intel provides a number of recommended algorithms that can be called as subroutines or inlined by compilers and assembly-language programmers. All these algorithms have been subjected to formal verification using the HOL Light theorem prover. As well as improving our level of confidence in the algorithms, the formal verification process has led to a better understanding of the underlying theory, allowing some significant efficiency improvements.
Towards the post-ultimate libm
2005
Cited by 13 (8 self)
This article presents advances on the subject of correctly rounded elementary functions since the publication of the libultim mathematical library developed by Ziv at IBM. This library showed that the average performance and memory overhead of correct rounding could be made negligible. However, the worst-case overhead was still a factor of 1000 or more. It is shown here that, with current processor technology, this worst-case overhead can be kept within a factor of 2 to 10 of the current best libms. This low overhead has very positive consequences for the techniques for implementing and proving correctly rounded functions, which are also studied. These results lift the last technical obstacles to a generalisation of (at least some) correctly rounded double-precision elementary functions.
Some functions computable with a fused-mac
In Proceedings of the 17th Symposium on Computer Arithmetic, P. Montuschi and E. Schwarz, Eds., Cape Cod, 2005
Cited by 10 (3 self)
The fused multiply-accumulate instruction (fused-mac) that is available on some current processors, such as the PowerPC or the Itanium, eases some calculations. We give examples of some floating-point functions (such as ulp(x) or Nextafter(x, y)), or some useful tests, that are easily computable using a fused-mac. Then, we show that, with rounding to the nearest, the error of a fused-mac instruction is exactly representable as the sum of two floating-point numbers. We give an algorithm that computes that error.
Theorems on efficient argument reductions
Proceedings of the 16th IEEE Symposium on Computer Arithmetic (ARITH-16), 2003
Cited by 10 (4 self)
A commonly used argument reduction technique in elementary function computations begins with two positive floating-point numbers α and γ that approximate the (usually irrational, but not necessarily) numbers 1/C and C, e.g., C = 2π for trigonometric functions and C = ln 2 for e^x. Given an argument x to the function of interest, it extracts z defined by xα = z + ς with z = k·2^(−N) and |ς| ≤ 2^(−N−1), where k and N are integers and N ≥ 0 is preselected, and then computes u = x − zγ. Usually zγ takes more bits than the working precision provides for its significand, and thus the exact x − zγ may not be representable by a floating-point number of the same precision. This causes a performance penalty when the working precision is the highest available on the underlying hardware, since considerable extra work is then needed to get all the bits of x − zγ right. This paper presents theorems showing that, under mild conditions that are easily met on today’s computer hardware and that still allow α ≈ 1/C and γ ≈ C to almost the full working precision, x − zγ is a floating-point number of the same precision. An algorithmic procedure based on the theorems is obtained. The results will enhance performance, in particular on machines that have hardware support for fused multiply-add (fma) instructions.
A parameterizable floating-point logarithm operator for FPGAs
In 39th Asilomar Conference on Signals, Systems & Computers, IEEE Signal Processing Society, 2005
Cited by 10 (5 self)
As FPGAs are increasingly being used for floating-point computing, a parameterized floating-point logarithm operator is presented. In single precision, this operator uses a small fraction of the FPGA’s resources, has a smaller latency than its software equivalent on a high-end processor, and provides about ten times the throughput in its pipelined version. Previous work had shown that FPGAs could use massive parallelism to offset the poor performance of their basic floating-point operators compared to their equivalents in processors. As this work shows, when evaluating an elementary function, the flexibility of FPGAs provides much better performance than the processor without even resorting to parallelism. The presented operator is freely available from
Parameterized floating-point logarithm and exponential functions for FPGAs
Microprocessors and Microsystems, Special Issue on FPGA-based Reconfigurable Computing
Cited by 10 (4 self)
As FPGAs are increasingly being used for floating-point computing, the feasibility of a library of floating-point elementary functions for FPGAs is discussed. An initial implementation of such a library contains parameterized operators for the logarithm and exponential functions. In single precision, those operators use a small fraction of the FPGA’s resources, have a smaller latency than their software equivalents on a high-end processor, and provide about ten times the throughput in their pipelined versions. Previous work had shown that FPGAs could use massive parallelism to offset the poor performance of their basic floating-point operators compared to their equivalents in processors. As this work shows, when evaluating an elementary function, the flexibility of FPGAs provides much better performance than the processor without even resorting to parallelism. The presented library is freely available from
Efficient polynomial L∞ approximations
In Proceedings of the 18th IEEE Symposium on Computer Arithmetic (ARITH-18). IEEE Computer
Cited by 9 (0 self)
We address the problem of computing a good polynomial approximation with floating-point coefficients to a function, with respect to the supremum norm. This is a key step in most function evaluation schemes. We present a fast and efficient method, based on lattice basis reduction, that often gives the best polynomial possible and most of the time returns a very good approximation.
Certifying the floating-point implementation of an elementary function using Gappa
IEEE Transactions on Computers, 2011
Cited by 8 (3 self)
High confidence in floating-point programs requires proving numerical properties of final and intermediate values. One may need to guarantee that a value stays within some range, or that the error relative to some ideal value is well bounded. This certification may require a time-consuming proof for each line of code, and it is usually broken by the smallest change to the code, e.g., for maintenance or optimization purposes. Certifying floating-point programs by hand is, therefore, very tedious and error-prone. The Gappa proof assistant is designed to make this task both easier and more secure, thanks to the following novel features: it automates the evaluation and propagation of rounding errors using interval arithmetic; its input format is very close to the actual code to validate; it can be used incrementally to prove complex mathematical properties pertaining to the code; and it generates a formal proof of the results, which can be checked independently by a lower-level proof assistant like Coq. Yet it does not require any specific knowledge about automatic theorem proving, and thus is accessible to a wide community. This paper demonstrates the practical use of this tool for a widely used class of floating-point programs: implementations of elementary functions in a mathematical library.
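The kind of fact Gappa establishes can be illustrated with a toy bound (the following is C, not Gappa's input language, and the 1.5u figure is our own back-of-the-envelope bound, not a result from the paper): for x in [0, 1], the floating-point evaluation of y = x*(1−x) incurs one rounding in the subtraction and one in the product, so a standard-model argument bounds its absolute error by roughly 1.25u, u = 2^−53.

```c
#include <float.h>
#include <math.h>

/* Toy illustration of the kind of property Gappa certifies (written
 * in C, NOT in Gappa's input language).  For x in [0, 1], evaluating
 * y = x*(1-x) in double takes one subtraction and one multiplication;
 * a pen-and-paper standard-model argument bounds the absolute error
 * by about 1.5u with u = DBL_EPSILON/2.  This function measures the
 * observed error on a grid, using long double as the reference. */
static double max_err_x1mx(void)
{
    double worst = 0.0;
    for (int i = 0; i <= 100000; i++) {
        double x = i / 100000.0;
        double y = x * (1.0 - x);                     /* double eval */
        long double exact =
            (long double)x * (1.0L - (long double)x); /* reference   */
        double e = (double)fabsl((long double)y - exact);
        if (e > worst) worst = e;
    }
    return worst;
}
```

Gappa turns this kind of hand argument into a machine-checked proof, and a sampled check like the one above into a guarantee over the whole interval.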