Results 11–20 of 124
A parameterized floating-point exponential function for FPGAs
In IEEE International Conference on Field-Programmable Technology (FPT'05), 2005
"... A parameterized floatingpoint exponential operator is presented. In singleprecision, it uses a small fraction of the FPGA’s resources and has a smaller latency than its software equivalent on a highend processor, and ten times the throughput in pipelined version. Previous work had shown that FPGA ..."
Abstract

Cited by 19 (8 self)
 Add to MetaCart
A parameterized floating-point exponential operator is presented. In single precision, it uses a small fraction of the FPGA's resources and has a smaller latency than its software equivalent on a high-end processor, and ten times the throughput in its pipelined version. Previous work had shown that FPGAs could use massive parallelism to balance the poor performance of their basic floating-point operators compared to the equivalent in processors. As this work shows, when evaluating an elementary function, the flexibility of FPGAs provides much better performance than the processor without even resorting to parallelism.
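The classical structure behind such an exponential operator can be sketched in software. The following is an illustrative outline only, not the paper's design: the hardware operator uses tables and fixed-point datapaths, whereas this sketch uses floats, and the function name and the polynomial degree are invented for the example. The idea is the standard range reduction e^x = 2^k · e^r with a small residual r evaluated by a polynomial:

```python
import math

def exp_range_reduced(x, degree=10):
    """Sketch: e^x = 2^k * e^r with |r| <= ln(2)/2, then a truncated
    Taylor polynomial for e^r, mimicking the table/polynomial split
    that a hardware operator would implement with ROMs and multipliers."""
    k = round(x / math.log(2))        # integer exponent of the 2^k factor
    r = x - k * math.log(2)           # small residual, |r| <= ln(2)/2
    # Horner-style evaluation of the degree-10 Taylor polynomial of e^r
    poly = 1.0
    for i in range(degree, 0, -1):
        poly = 1.0 + r * poly / i
    return math.ldexp(poly, k)        # poly * 2**k, exact scaling
```

Because |r| is at most ln(2)/2 ≈ 0.347, a degree-10 polynomial already reaches roughly double-precision accuracy; a hardware version would instead tabulate parts of e^r to shrink the multipliers.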
Return of the hardware floating-point elementary function
In 18th Symposium on Computer Arithmetic, IEEE, 2007
"... The study of specific hardware circuits for the evaluation of floatingpoint elementary functions was once an active research area, until it was realized that these functions were not frequent enough to justify dedicating silicon to them. Research then turned to software functions. This situation ma ..."
Abstract

Cited by 18 (10 self)
 Add to MetaCart
The study of specific hardware circuits for the evaluation of floating-point elementary functions was once an active research area, until it was realized that these functions were not frequent enough to justify dedicating silicon to them. Research then turned to software functions. This situation may be about to change again with the advent of reconfigurable coprocessors based on field-programmable gate arrays. Such coprocessors now have a capacity that allows them to accommodate double-precision floating-point computing. Hardware operators for elementary functions targeted to such platforms have the potential to vastly outperform software functions, and will not permanently waste silicon resources. This article studies the optimization, for this target technology, of operators for the exponential and logarithm functions up to double precision. These operators are freely available from www.ens-lyon.fr/LIP/Arenaire/.
Keywords: floating-point elementary functions, hardware
Assisted verification of elementary functions using Gappa
In Proceedings of the 2006 ACM Symposium on Applied Computing, 2006
"... The implementation of a correctly rounded or interval elementary function needs to be proven carefully in the very last details. The proof requires a tight bound on the overall error of the implementation with respect to the mathematical function. Such work is function specific, concerns tens of lin ..."
Abstract

Cited by 17 (6 self)
 Add to MetaCart
The implementation of a correctly rounded or interval elementary function needs to be proven carefully in the very last details. The proof requires a tight bound on the overall error of the implementation with respect to the mathematical function. Such work is function-specific, concerns tens of lines of code for each function, and will usually be broken by the smallest change to the code (e.g. for maintenance or optimization purposes). Therefore, it is very tedious and error-prone if done by hand. This article discusses the use of the Gappa proof assistant in this context. Gappa has two main advantages over previous approaches: its input format is very close to the actual C code to validate, and it automates error evaluation and propagation using interval arithmetic. Besides, it can be used to incrementally prove complex mathematical properties pertaining to the C code. Yet it does not require any specific knowledge about automatic theorem proving, and thus is accessible to a wider community. Moreover, Gappa may generate a formal proof of the results that can be checked independently by a lower-level proof assistant like Coq, hence providing an even higher confidence in the certification of the numerical code.
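The mechanism Gappa automates, propagating error bounds by interval arithmetic, can be illustrated with a toy sketch. This is not Gappa's engine (which uses directed rounding and rewriting rules to get provably safe bounds); it only shows how an enclosure of an expression follows mechanically from enclosures of its inputs:

```python
class Interval:
    """Toy interval arithmetic: the core bound-propagation idea that
    Gappa automates. Real tools round lo down and hi up; this sketch
    uses ordinary floats, so it is illustrative only."""
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def __add__(self, other):
        # Sum of intervals: add the endpoints
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        # Product: the extrema are among the four endpoint products
        p = [self.lo * other.lo, self.lo * other.hi,
             self.hi * other.lo, self.hi * other.hi]
        return Interval(min(p), max(p))

    def __repr__(self):
        return f"[{self.lo}, {self.hi}]"

# Example: bound x*y + x for x in [1, 2] and y in [-0.5, 0.5]
x = Interval(1.0, 2.0)
y = Interval(-0.5, 0.5)
result = x * y + x    # encloses every possible value of the expression
```

Gappa applies the same propagation to rounding-error terms of each floating-point operation in the C code, which is what makes the per-function proofs mechanical rather than hand-written.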
Reciprocation, Square Root, Inverse Square Root, and Some Elementary Functions Using Small Multipliers
1997
"... This paper deals with the computation of reciprocals, square roots, inverse square roots, and some elementary functions using small tables, small multipliers, and for some functions, a final "large" (almost fulllength) multiplication. We propose a method that allows fast evaluation of these functio ..."
Abstract

Cited by 17 (5 self)
 Add to MetaCart
This paper deals with the computation of reciprocals, square roots, inverse square roots, and some elementary functions using small tables, small multipliers, and, for some functions, a final "large" (almost full-length) multiplication. We propose a method that allows fast evaluation of these functions in double-precision arithmetic. The strength of this method is that the same scheme allows the computation of all these functions. Our method is mainly interesting for designing special-purpose circuits, since it does not allow a simple implementation of the four rounding modes required by the IEEE-754 standard for floating-point arithmetic.
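The general small-table idea can be illustrated with the most familiar variant, a table-seeded Newton-Raphson reciprocal. Note this is not the paper's scheme (which uses a different table-and-multiply structure covering several functions at once); the function name, table size, and iteration count below are invented for the sketch:

```python
def reciprocal(x, table_bits=8, iterations=2):
    """Sketch of 1/x via a small table seed refined by Newton-Raphson.
    Assumes 1 <= x < 2, i.e. a normalized floating-point significand.
    Illustrative only: a hardware design would use a ROM for the seed
    and fixed-width multipliers for the iterations."""
    assert 1.0 <= x < 2.0
    # Seed: reciprocal of the midpoint of x's table interval
    # (in hardware, a lookup indexed by the top table_bits fraction bits)
    index = int((x - 1.0) * (1 << table_bits))
    midpoint = 1.0 + (index + 0.5) / (1 << table_bits)
    y = 1.0 / midpoint
    # Each Newton step y <- y*(2 - x*y) roughly doubles the correct bits:
    # an 8-bit seed reaches ~32 bits after two steps
    for _ in range(iterations):
        y = y * (2.0 - x * y)
    return y
```

The quadratic convergence is why a small table suffices: the table width trades directly against the number of multiplications.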
Hierarchical segmentation schemes for function evaluation
In IEEE Conference on Field-Programmable Technology, 2003
"... This paper presents a method for evaluating functions based on piecewise polynomial approximation with a novel hierarchical segmentation scheme. The use of a novel hierarchy scheme of uniform segments and segments with size varying by powers of two enables us to approximate nonlinear regions of a fu ..."
Abstract

Cited by 15 (6 self)
 Add to MetaCart
This paper presents a method for evaluating functions based on piecewise polynomial approximation with a novel hierarchical segmentation scheme. The use of a hierarchy of uniform segments and segments with sizes varying by powers of two enables us to approximate the non-linear regions of a function particularly well. This partitioning is automated: efficient look-up tables and their coefficients are generated for a given function, input range, order of the polynomials, desired accuracy and finite-precision constraints. We describe an algorithm to find the optimum number of segments and the placement of their boundaries, which is used to analyze the properties of a function and to benchmark our approach. Our method is illustrated using three non-linear compound functions, √(−log(x)), x·log(x) and a high-order rational function. We present results for various operand sizes between 8 and 24 bits for first- and second-order polynomial approximations.
StReAm: Object-Oriented Programming of Stream Architectures using PAMBlox
In Field-Programmable Logic and Applications, LNCS 1896, 2000
"... Simplifying the programming models is paramount to the success of reconfigurable computing. We apply the principles of objectoriented programming to the design of stream architectures for reconfigurable computing. The resulting tool, StReAm, is a domain specific compiler on top of the objectorient ..."
Abstract

Cited by 13 (1 self)
 Add to MetaCart
Simplifying the programming models is paramount to the success of reconfigurable computing. We apply the principles of object-oriented programming to the design of stream architectures for reconfigurable computing. The resulting tool, StReAm, is a domain-specific compiler on top of the object-oriented module generation environment PAMBlox. Combining module generation with a high-level programming tool in C++ gives the programmer the convenience to explore the flexibility of FPGAs at the arithmetic level and to write the algorithms in the same language and environment. Stream architectures consist of the pipelined dataflow graph mapped directly to hardware. Data streams through the implementation of the dataflow graph with only minimal control logic overhead. The main advantage of stream architectures is a clock frequency equal to the data rate, leading to very low power consumption.
Proposal for a standardization of mathematical function implementation in floating-point arithmetic
Numerical Algorithms, 2004
"... ..."
Towards the post-ultimate libm
2005
"... This article presents advances on the subject of correctly rounded elementary functions since the publication of the libultim mathematical library developed by Ziv at IBM. This library showed that the average performance and memory overhead of correct rounding could be made negligible. However, the ..."
Abstract

Cited by 13 (8 self)
 Add to MetaCart
This article presents advances on the subject of correctly rounded elementary functions since the publication of the libultim mathematical library developed by Ziv at IBM. This library showed that the average performance and memory overhead of correct rounding could be made negligible. However, the worst-case overhead was still a factor of 1000 or more. It is shown here that, with current processor technology, this worst-case overhead can be kept within a factor of 2 to 10 of the current best libms. This low overhead has very positive consequences on the techniques for implementing and proving correctly rounded functions, which are also studied. These results lift the last technical obstacles to a generalisation of (at least some) correctly rounded double-precision elementary functions.
A Hardware Gaussian Noise Generator Using the Box-Muller Method and Its Error Analysis
IEEE Trans. Computers, 2006
"... Abstract—We present a hardware Gaussian noise generator based on the BoxMuller method that provides highly accurate noise samples. The noise generator can be used as a key component in a hardwarebased simulation system, such as for exploring channel code behavior at very low bit error rates, as lo ..."
Abstract

Cited by 13 (6 self)
 Add to MetaCart
We present a hardware Gaussian noise generator based on the Box-Muller method that provides highly accurate noise samples. The noise generator can be used as a key component in a hardware-based simulation system, such as for exploring channel code behavior at very low bit error rates, as low as 10^-12 to 10^-13. The main novelties of this work are accurate analytical error analysis and bit-width optimization for the elementary functions involved in the Box-Muller method. Two 16-bit noise samples are generated every clock cycle and, due to the accurate error analysis, every sample is analytically guaranteed to be accurate to one unit in the last place. An implementation on a Xilinx Virtex-4 XC4VLX100-12 FPGA occupies 1,452 slices, three block RAMs, and 12 DSP slices, and is capable of generating 750 million samples per second at a clock speed of 375 MHz. The performance can be improved by exploiting concurrent execution: 37 parallel instances of the noise generator at 95 MHz on a Xilinx Virtex-II Pro XC2VP100-7 FPGA generate seven billion samples per second, over 200 times faster than software running on an Intel Pentium 4 3 GHz PC. The noise generator is currently being used at the Jet Propulsion Laboratory, NASA, to evaluate the performance of low-density parity-check codes for deep-space communications.
Index Terms: algorithms implemented in hardware, computer arithmetic, error analysis, elementary function approximation, field-programmable gate arrays, minimax approximation and algorithms, optimization, random number generation, simulation.
Approximation Theory and Approximation Practice
"... — the constructive approximation of functions. ..."