Results 1–10 of 205
MPFR: A multiple-precision binary floating-point library with correct rounding
ACM Trans. Math. Softw., 2007
Abstract
Cited by 80 (16 self)
This paper presents a multiple-precision binary floating-point library, written in the ISO C language, and based on the GNU MP library. Its particularity is to extend to arbitrary-precision ideas from the IEEE 754 standard, by providing correct rounding and exceptions. We demonstrate how these strong semantics are achieved — with no significant slowdown with respect to other arbitrary-precision tools — and discuss a few applications where such a library can be useful. Categories and Subject Descriptors: D.3.0 [Programming Languages]: General—Standards; G.1.0 [Numerical Analysis]: General—computer arithmetic, multiple precision arithmetic; G.1.2 [Numerical Analysis]: Approximation—elementary and special function approximation; G.4 [Mathematics of Computing]: Mathematical Software—algorithm design, efficiency, portability
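The correct-rounding contract MPFR provides — every operation returns the representable value demanded by the active rounding mode, with IEEE-style sticky exception flags — can be sketched with Python's standard `decimal` module. This is a decimal rather than binary analogy, chosen only to illustrate the semantics of a context holding precision, rounding direction, and flags; MPFR itself is a binary library written in C.

```python
from decimal import Decimal, getcontext, ROUND_DOWN, ROUND_HALF_EVEN, Inexact

# The context carries the working precision, the rounding direction,
# and sticky exception flags, mirroring the IEEE 754 ideas that the
# abstract describes extending to arbitrary precision.
ctx = getcontext()
ctx.prec = 10                      # work with 10 significant digits

ctx.rounding = ROUND_HALF_EVEN     # round to nearest, ties to even
to_nearest = Decimal(2) / Decimal(3)

ctx.rounding = ROUND_DOWN          # round toward zero
toward_zero = Decimal(2) / Decimal(3)

# Each result is correctly rounded for its mode, and the sticky
# Inexact flag records that rounding actually occurred.
print(to_nearest, toward_zero, bool(ctx.flags[Inexact]))
# prints: 0.6666666667 0.6666666666 True
```

The two modes produce results differing in the last digit, which is exactly the distinction a correctly rounded library must guarantee for every operation.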
The Symmetric Table Addition Method for Accurate Function Approximation
Journal of VLSI Signal Processing, 1999
Abstract
Cited by 37 (2 self)
This paper presents a high-speed method for computing elementary functions using parallel table lookups and multi-operand addition. Increasing the number of tables and inputs to the multi-operand adder significantly reduces the amount of memory required. Symmetry and leading zeros in the table coefficients are used to reduce the amount of memory even further. This method has a closed-form solution for the table entries and can be applied to any differentiable function. For 24-bit operands, this method requires two to three orders of magnitude less memory than conventional table lookups. Keywords: Elementary functions, table lookups, approximations, multi-operand addition, computer arithmetic, hardware design. 1. Introduction: Elementary function approximations are important in scientific computing, computer graphics, and digital signal processing applications. In the systolic array implementation of Cholesky decomposition presented in [1], 30% of the cells approximate reciprocals...
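The lookup-and-add idea behind this family of methods can be sketched in software: split the input into fields, tabulate first-order values indexed by the high and middle fields, and tabulate slope corrections indexed by the high and low fields, so that evaluation is two lookups and one addition. The 12-bit input, the 4/4/4 split, and the target function sin on [0, 1) below are illustrative choices, not the paper's parameters, and the symmetry and leading-zero compressions it describes are omitted.

```python
import math

N0, N1, N2 = 4, 4, 4               # high / middle / low field widths
f, fprime = math.sin, math.cos     # function approximated on [0, 1)

def build_tables():
    A, B = {}, {}
    for x0 in range(1 << N0):
        for x1 in range(1 << N1):
            base = x0 / 2**N0 + x1 / 2**(N0 + N1)
            # centre of the sub-interval covered by (x0, x1, *)
            mid = base + (2**N2 - 1) / 2 / 2**(N0 + N1 + N2)
            A[x0, x1] = f(mid)                 # initial-value table
        centre = (x0 + 0.5) / 2**N0            # slope uses x0 only
        for x2 in range(1 << N2):
            offset = (x2 - (2**N2 - 1) / 2) / 2**(N0 + N1 + N2)
            B[x0, x2] = fprime(centre) * offset  # offset table
    return A, B

def approx(A, B, bits):
    # two table lookups and one addition per evaluation
    x0 = bits >> (N1 + N2)
    x1 = (bits >> N2) & ((1 << N1) - 1)
    x2 = bits & ((1 << N2) - 1)
    return A[x0, x1] + B[x0, x2]

A, B = build_tables()
err = max(abs(approx(A, B, i) - f(i / 2**12)) for i in range(1 << 12))
```

With these widths the two tables hold 2^8 entries each, versus 2^12 for a direct lookup table, which is the memory saving this family of methods targets.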
A Machine-Checked Theory of Floating Point Arithmetic
1999
Abstract
Cited by 35 (5 self)
Intel is applying formal verification to various pieces of mathematical software used in Merced, the first implementation of the new IA-64 architecture. This paper discusses the development of a generic floating-point library giving definitions of the fundamental terms and containing formal proofs of important lemmas. We also briefly describe how this has been used in the verification effort so far. 1. Introduction: IA-64 is a new 64-bit computer architecture jointly developed by Hewlett-Packard and Intel, and the forthcoming Merced chip from Intel will be its first silicon implementation. To avoid some of the limitations of traditional architectures, IA-64 incorporates a unique combination of features, including an instruction format encoding parallelism explicitly, instruction predication, and speculative/advanced loads [4]. Nevertheless, it also offers full upwards compatibility with IA-32 (x86) code. IA-64 incorporates a number of floating-point operations, the centerpi...
Some improvements on multipartite table methods
15th IEEE Symposium on Computer Arithmetic, 2001
Abstract
Cited by 32 (10 self)
This paper presents a unified view of most previous table-lookup-and-addition methods: bipartite tables, SBTM, STAM, and multipartite methods. This new definition allows a more accurate computation of the error entailed by these methods. Being more general, it also allows an exhaustive design-space exploration, which has been implemented and leads to tables smaller than previously published ones by up to 50%. Some results have been synthesised for Virtex FPGAs and are discussed in this paper.
Table-based polynomials for fast hardware function evaluation
16th IEEE International Conference on Application-Specific Systems, Architectures, and Processors (ASAP'05), 2005
Abstract
Cited by 32 (13 self)
Many general table-based methods for the evaluation in hardware of elementary functions have been published. The bipartite and multipartite methods implement a first-order approximation of the function using only table lookups and additions. Recently, a single-multiplier second-order method of similar inspiration has also been published. This paper extends such methods to approximations of arbitrary order, using adders, small multipliers, and very small ad-hoc powering units. We obtain implementations that are both smaller and faster than previously published approaches. This paper also deals with the FPGA implementation of such methods. Previous work has consistently shown that increasing the approximation degree leads to not only smaller but also faster designs, as the reduction of the table size means a reduction of its lookup time, which compensates for the addition and multiplication time. The experiments in this paper suggest that this still holds when going from order 2 to order 3, but no longer when using higher-order approximations, where a trade-off appears.
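A software sketch of the order-2 case: one table of per-segment coefficients, evaluated in Horner form so that only small multiplications by the segment offset remain. The segment count, the target function, and the use of plain Taylor coefficients at segment centres are illustrative assumptions; the paper's hardware uses truncated multipliers and ad-hoc powering units that are not modelled here.

```python
import math

SEG_BITS = 6                 # 64 segments on [0, 1); illustrative

def build():
    n = 1 << SEG_BITS
    tbl = []
    for s in range(n):
        c = (s + 0.5) / n    # expand around the segment centre
        # degree-2 Taylor coefficients of sin at the centre
        tbl.append((math.sin(c), math.cos(c), -math.sin(c) / 2))
    return tbl

def approx(tbl, x):
    n = len(tbl)
    s = min(int(x * n), n - 1)           # segment index from high bits
    t = x - (s + 0.5) / n                # small offset from the centre
    c0, c1, c2 = tbl[s]
    return c0 + t * (c1 + t * c2)        # Horner evaluation

tbl = build()
err = max(abs(approx(tbl, i / 4096) - math.sin(i / 4096))
          for i in range(4096))
```

Raising the degree shrinks the table (fewer, wider segments give the same accuracy) at the cost of extra multiply steps, which is exactly the trade-off the abstract describes.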
Optimizing hardware function evaluation
IEEE Transactions on Computers, 2005
Abstract
Cited by 24 (4 self)
We present a methodology and an automated system for function evaluation unit generation. Our system selects the best function evaluation hardware for a given function, accuracy requirements, technology mapping, and optimization metrics, such as area, throughput, and latency. Function evaluation f(x) typically consists of range reduction and the actual evaluation on a small convenient interval such as [0, π/2) for sin(x). We investigate the impact of hardware function evaluation with range reduction for a given range and precision of x and f(x) on area and speed. An automated bit-width optimization technique for minimizing the sizes of the operators in the data paths is also proposed. We explore a vast design space for fixed-point sin(x), log(x), and √x accurate to one unit in the last place using MATLAB and ASC, A Stream Compiler for Field-Programmable Gate Arrays (FPGAs). In this study, we implement over 2,000 placed-and-routed FPGA designs, resulting in over 100 million Application-Specific Integrated Circuit (ASIC) equivalent gates. We provide optimal function evaluation results for range and precision combinations between 8 and 48 bits. Index Terms: Computer arithmetic, elementary function approximation, gate arrays, minimax approximation and algorithms, optimization.
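The range-reduction step mentioned above can be sketched for sin: fold the argument into [0, π/2) plus a quadrant index, evaluate on that small interval, and fix the sign and phase from the quadrant. This is the naive scheme; production implementations use extended-precision representations of π/2 (e.g. Payne–Hanek reduction) so that accuracy survives very large arguments.

```python
import math

def sin_via_reduction(x):
    # Reduce x to r in [0, pi/2) plus a quadrant index k.
    k = math.floor(x / (math.pi / 2))
    r = x - k * (math.pi / 2)
    q = k % 4
    # Evaluate on the small convenient interval; math.sin/math.cos
    # stand in for the core approximation a hardware unit would use.
    if q == 0:
        return math.sin(r)
    if q == 1:
        return math.cos(r)
    if q == 2:
        return -math.sin(r)
    return -math.cos(r)
```

The core evaluator then only ever sees arguments in [0, π/2), which is what makes small fixed tables and low-degree polynomials sufficient.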
A proven correctly rounded logarithm in double-precision
In Real Numbers and Computers, Schloss Dagstuhl, 2004
Abstract
Cited by 24 (10 self)
This article is a case study in the implementation of a portable, proven, and efficient correctly rounded elementary function in double-precision. We describe the methodology used to achieve these goals in the crlibm library. There are two novel aspects to this approach. The first is the proof framework, and in general the techniques used to balance performance and provability. The second is the introduction of processor-specific optimization to get performance equivalent to the best current mathematical libraries, while trying to minimize the proof work. The implementation of the natural logarithm is detailed to illustrate these questions. Mathematics Subject Classification: 26-04, 65D15, 65Y99.
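The standard route to correct rounding for an elementary function is Ziv's iteration: evaluate with some extra working precision, check whether the resulting error interval still rounds unambiguously at the target precision, and if not retry with more precision. A decimal sketch using Python's `decimal` module, whose `ln` is correctly rounded at the working precision; the digit counts and precision steps are arbitrary choices, and the paper's contribution is precisely the proof work that bounds how much precision the loop can ever need.

```python
from decimal import Decimal, getcontext

TARGET = 17   # significant digits we want correctly rounded

def cr_ln(x, target=TARGET):
    work = target + 10
    while True:
        getcontext().prec = work
        y = Decimal(x).ln()                    # correctly rounded at `work`
        lo, hi = y.next_minus(), y.next_plus() # enclosure of the true ln(x)
        getcontext().prec = target
        if (+lo) == (+hi):   # both ends round identically: result is safe
            return +y
        work += 10           # otherwise retry with more working precision
```

Because the true value lies between `lo` and `hi`, agreement of their roundings guarantees the returned value is the correct rounding, sidestepping the table maker's dilemma for this argument.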
Assisted verification of elementary functions using Gappa
In Proceedings of the 2006 ACM Symposium on Applied Computing, 2006
Abstract
Cited by 22 (8 self)
The implementation of a correctly rounded or interval elementary function needs to be proven carefully in the very last details. The proof requires a tight bound on the overall error of the implementation with respect to the mathematical function. Such work is function-specific, concerns tens of lines of code for each function, and will usually be broken by the smallest change to the code (e.g. for maintenance or optimization purposes). Therefore, it is very tedious and error-prone if done by hand. This article discusses the use of the Gappa proof assistant in this context. Gappa has two main advantages over previous approaches: its input format is very close to the actual C code to validate, and it automates error evaluation and propagation using interval arithmetic. Besides, it can be used to incrementally prove complex mathematical properties pertaining to the C code. Yet it does not require any specific knowledge about automatic theorem proving, and thus is accessible to a wider community. Moreover, Gappa may generate a formal proof of the results that can be checked independently by a lower-level proof assistant like Coq, hence providing an even higher confidence in the certification of the numerical code.
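The kernel of what Gappa automates is interval propagation: every intermediate quantity of the code is replaced by an enclosure, and arithmetic on enclosures yields a sound bound on the final error. A toy version in Python (this is not Gappa's input language, and Gappa adds the expression rewriting and formal-proof output that make the bounds tight and checkable):

```python
class Interval:
    """Closed interval [lo, hi]; every operation returns an enclosure."""
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
    def __add__(self, o):
        return Interval(self.lo + o.lo, self.hi + o.hi)
    def __sub__(self, o):
        # subtracting an interval flips its endpoints
        return Interval(self.lo - o.hi, self.hi - o.lo)
    def __mul__(self, o):
        p = (self.lo * o.lo, self.lo * o.hi, self.hi * o.lo, self.hi * o.hi)
        return Interval(min(p), max(p))

x = Interval(0.0, 1.0)
r = x * x - x          # true range of x*x - x on [0, 1] is [-0.25, 0]
```

The enclosure computed here is [-1, 1]: sound but loose, because the two occurrences of x are treated as independent. Gappa tightens such bounds by rewriting expressions (here, x*x - x = -x*(1 - x)) before propagating, which is why naive interval arithmetic alone is not enough for these proofs.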
The Computation of Transcendental Functions on the IA-64 Architecture
Intel Technology Journal, 1999
Abstract
Cited by 21 (2 self)
The fast and accurate evaluation of transcendental functions (e.g. exp, log, sin, and atan) is vitally important in many fields of scientific computing. Intel provides a software library of these functions that can be called from both the C and FORTRAN programming languages. By exploiting some of the key features of the IA-64 floating-point architecture, we have been able to provide double-precision transcendental functions that are highly accurate yet can typically be evaluated in between 50 and 70 clock cycles. In this paper, we discuss some of the design principles and implementation details of these functions.