Some improvements on multipartite table methods
- 15th IEEE Symposium on Computer Arithmetic
, 2001
Cited by 28 (9 self)
This paper presents a unified view of most previous table-lookup-and-addition methods: bipartite tables, SBTM, STAM and multipartite methods. This new definition allows a more accurate computation of the error entailed by these methods. Being more general, it also allows an exhaustive design space exploration which has been implemented, and leads to tables smaller than previously published ones by up to 50%. Some results have been synthesised for Virtex FPGAs, and are discussed in this paper.
Table-based polynomials for fast hardware function evaluation
- 16th IEEE International Conference on Application-Specific Systems, Architectures, and Processors (ASAP’05)
, 2005
Cited by 20 (9 self)
Many general table-based methods for the evaluation in hardware of elementary functions have been published. The bipartite and multipartite methods implement a first-order approximation of the function using only table lookups and additions. Recently, a single-multiplier second-order method of similar inspiration has also been published. This paper extends such methods to approximations of arbitrary order, using adders, small multipliers, and very small ad-hoc powering units. We obtain implementations that are both smaller and faster than previously published approaches. This paper also deals with the FPGA implementation of such methods. Previous work has consistently shown that increasing the approximation degree leads to not only smaller but also faster designs, as the reduction of the table size meant a reduction of its lookup time, which compensated for the addition and multiplication time. The experiments in this paper suggest that this still holds when going from order 2 to order 3, but no longer when using higher-order approximations, where a tradeoff appears.
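As a rough illustration of the first-order "table lookups and additions" idea mentioned in this abstract, the following Python sketch builds a bipartite approximation of a function on [0, 1). It is not taken from the paper: the names `build_bipartite` and `eval_bipartite`, the bit split `(n0, n1, n2)`, and the use of unrounded floating-point table entries are all assumptions made for the example.

```python
import math

def build_bipartite(f, df, n0, n1, n2):
    """Build the two tables of a bipartite approximation of f on [0, 1).

    The n-bit input is split into high (n0), middle (n1) and low (n2) bits.
    Hypothetical helper: entries are plain floats, not rounded fixed-point words.
    """
    ulp = 2.0 ** -(n0 + n1 + n2)
    half_low = (2 ** n2 - 1) / 2 * ulp  # centre of the low-bit range
    # Table of initial values: f at the centre of each (high, middle) segment.
    tiv = [f((a << n2) * ulp + half_low) for a in range(2 ** (n0 + n1))]
    # Table of offsets: one slope per high-bit value, times the centred low bits.
    to = []
    for x0 in range(2 ** n0):
        slope = df((x0 + 0.5) * 2.0 ** -n0)
        to.append([slope * (x2 * ulp - half_low) for x2 in range(2 ** n2)])
    return tiv, to

def eval_bipartite(tables, x_int, n0, n1, n2):
    """Approximate f(x_int / 2**n) with two lookups and one addition."""
    tiv, to = tables
    a = x_int >> n2                  # high and middle bits index the TIV
    x0 = x_int >> (n1 + n2)          # high bits select the slope row
    x2 = x_int & ((1 << n2) - 1)     # low bits select the offset
    return tiv[a] + to[x0][x2]
```

With a 4/4/4 split of a 12-bit input, the two tables hold 256 + 256 entries instead of the 4096 of a single table, and for `math.sin` the error stays below 1e-4 on every input. A real design would also round the table entries and account for that rounding in the error budget.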
Hierarchical segmentation schemes for function evaluation
- In IEEE Conference on Field-Programmable Technology
, 2003
Cited by 12 (4 self)
This paper presents a method for evaluating functions based on piecewise polynomial approximation with a novel hierarchical segmentation scheme. The hierarchy combines uniform segments with segments whose size varies by powers of two, which enables us to approximate nonlinear regions of a function particularly well. This partitioning is automated: efficient look-up tables and their coefficients are generated for a given function, input range, order of the polynomials, desired accuracy and finite precision constraints. We describe an algorithm to find the optimum number of segments and the placement of their boundaries, which is used to analyze the properties of a function and to benchmark our approach. Our method is illustrated using three non-linear compound functions, √(−log(x)), x log(x) and a high order rational function. We present results for various operand sizes between 8 and 24 bits for first and second order polynomial approximations.
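To make the benefit of power-of-two segment sizes concrete, here is a small Python sketch that compares uniform segments against segments that halve in width towards zero. It is a simplification, not the paper's hierarchical scheme or its boundary-placement algorithm: the helper names, the chord-based first-order fit, and the stand-in function `math.sqrt` (chosen because it is strongly nonlinear near 0) are assumptions for illustration.

```python
import math

def pow2_segment(x, levels):
    """Return (lo, hi) of the power-of-two segment of [0, 1) containing x."""
    floor_edge = 2.0 ** -levels
    if x < floor_edge:
        return 0.0, floor_edge       # everything below 2**-levels shares one segment
    _, e = math.frexp(x)             # x = m * 2**e with 0.5 <= m < 1
    return 2.0 ** (e - 1), 2.0 ** e

def uniform_segment(x, count):
    """Return (lo, hi) of the uniform segment of [0, 1) containing x."""
    i = min(int(x * count), count - 1)
    return i / count, (i + 1) / count

def chord(f, lo, hi, x):
    """First-order approximation: the line through (lo, f(lo)) and (hi, f(hi))."""
    return f(lo) + (f(hi) - f(lo)) * (x - lo) / (hi - lo)

def max_error(f, segment_of, samples=4096):
    """Largest absolute chord error over an evenly spaced sample of [0, 1)."""
    worst = 0.0
    for i in range(samples):
        x = i / samples
        lo, hi = segment_of(x)
        worst = max(worst, abs(f(x) - chord(f, lo, hi, x)))
    return worst
```

Calling `max_error(math.sqrt, lambda x: pow2_segment(x, 7))` and `max_error(math.sqrt, lambda x: uniform_segment(x, 8))` compares two partitions with eight segments each; the power-of-two partition gives the smaller worst-case error because it spends its narrow segments where the function bends most.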
Return of the hardware floating-point elementary function
- in 18th Symposium on Computer Arithmetic. IEEE
, 2007
Cited by 12 (5 self)
The study of specific hardware circuits for the evaluation of floating-point elementary functions was once an active research area, until it was realized that these functions were not frequent enough to justify dedicating silicon to them. Research then turned to software functions. This situation may be about to change again with the advent of reconfigurable co-processors based on field-programmable gate arrays. Such co-processors now have a capacity that allows them to accommodate double-precision floating-point computing. Hardware operators for elementary functions targeted to such platforms have the potential to vastly outperform software functions, and will not permanently waste silicon resources. This article studies the optimization, for this target technology, of operators for the exponential and logarithm functions up to double-precision. These operators are freely available from www.ens-lyon.fr/LIP/Arenaire/. Keywords: Floating-point elementary functions, hardware
Multipartite tables in JBits for the evaluation of functions on FPGAs
- Proc. IEEE Int. Parallel and Distributed Processing Symp
, 2002
Cited by 4 (1 self)
This paper presents the implementation, on Virtex FPGAs, of a core generator for arbitrary numeric functions in fixed-point format. The cores use the state-of-the-art multipartite table method, which allows input and output precisions in the range of 8 to 24 bits on current Virtex chips. The implementation uses the JBits API to embed elaborate optimisation techniques in the description of the hardware.
Adaptive range reduction for hardware function evaluation
- In Proc. IEEE Int’l Conf. on Field-Programmable Technology
, 2004
Cited by 3 (1 self)
Function evaluation f(x) typically consists of range reduction and the actual function evaluation on a small interval. In this paper, we investigate optimization of range reduction given the range and precision of x and f(x). For every function evaluation there exists a convenient interval such as [0,π/2) for sin(x). The adaptive range reduction method, which we propose in this work, involves deciding whether range reduction can be used effectively for a particular design. The decision depends on the function being evaluated, precision, and optimization metrics such as area, latency and throughput. In addition, the input and output ranges affect which function evaluation method is preferable: polynomial, table-based, or a combination of the two. We explore this vast design space of adaptive range reduction for fixed-point sin(x), log(x) and √x accurate to one unit in the last place using MATLAB and ASC, A Stream Compiler. These tools enable us to study over 1000 designs resulting in over 40 million Xilinx equivalent circuit gates, in a few hours’ time. The final objective is to progress towards a fully automated library that provides optimal function evaluation hardware units given input/output range and precision.
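The two-step structure described in this abstract (range reduction, then evaluation on a small interval) can be sketched in Python for sin(x) and the interval [0, π/2). This is only a floating-point illustration under stated assumptions, not the paper's fixed-point designs or its adaptive decision procedure: the function names are hypothetical, and `core` stands in for whatever small-interval evaluator (table, polynomial, or both) a design would use.

```python
import math

HALF_PI = math.pi / 2

def reduce_to_first_quadrant(x):
    """Split x into a quadrant index (0..3) and a remainder r in [0, pi/2).

    The bound on r holds up to floating-point rounding.
    """
    k = math.floor(x / HALF_PI)
    r = x - k * HALF_PI
    return k % 4, r

def sin_via_reduction(x, core=math.sin):
    """Evaluate sin(x) using `core` only on arguments in [0, pi/2]."""
    quadrant, r = reduce_to_first_quadrant(x)
    if quadrant % 2 == 1:
        r = HALF_PI - r              # sin(pi/2 + r) = sin(pi/2 - r)
    y = core(r)
    return -y if quadrant >= 2 else y  # the second half-period is negated
```

For example, `sin_via_reduction(4.0)` reduces the argument to quadrant 2 and returns the negated core value. Whether this reduction step pays off in hardware is exactly the kind of question the abstract says depends on the function, precision, and optimization metric.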
Programmable Numerical Function Generators Based on Quadratic Approximation: Architecture and Synthesis Method
- Proc. Asia and South Pacific Design Automation Conf. (ASPDAC ’06)
, 2006
Cited by 3 (2 self)
This paper presents an architecture and a synthesis method for programmable numerical function generators (NFGs) for trigonometric, logarithmic, square root, and reciprocal functions. Our NFG partitions a given domain of the function into non-uniform segments using an LUT cascade, and approximates the given function by a quadratic polynomial for each segment. Thus, we can implement fast and compact NFGs for a wide range of functions. Implementation results on an FPGA show that: 1) our NFGs require only 4% of the memory needed by NFGs based on the linear approximation with non-uniform segmentation; and 2) our NFGs require only 22% of the memory needed by NFGs based on the 5th-order approximation with uniform segmentation. Our automatic synthesis system generates such compact NFGs quickly.
Arbitrary function approximation in HDLs with application to the N-body problem
- In 2003 IEEE International Conference on Field-Programmable Technology (FPT)
, 2003
Cited by 2 (0 self)
A module generator is described that allows for the generation of synthesizable VHDL modules which implement arbitrary functions in fixed point precision using the Symmetric Table Addition Method (STAM). This module generator was interfaced to a high level synthesis tool “fly” which automatically generates fully-pipelined circuits from a Perl-like language. The resulting system was applied to the N-body problem and results are presented. It was found that a function generator module is a very useful addition to a hardware description language.
"Partially rounded" Small-Order Approximations for Accurate, . . .
, 2002
Cited by 2 (0 self)
We aim at evaluating elementary and special functions using small tables and small, rectangular, multipliers. To do that, we show how accurate polynomial approximations whose order-1 coefficients are small in size (a few bits only) can be computed. We compare the obtained results with similar work in the recent literature.
Low Precision Table Based Complex Reciprocal Approximation
Cited by 1 (0 self)
A recently proposed complex valued division algorithm [1] designed for efficient hardware implementations requires a prescaling step by a constant factor. The authors mention techniques for obtaining this prescaling factor; these serve to justify the feasibility of the algorithm but are inadequate for obtaining efficient implementations. Table based solutions are formulated in this paper for obtaining the prescaling factor, a low precision reciprocal approximation for a complex value, using techniques adopted from univariate function approximations. Two separate designs are proposed, one using a single table (a reference design) and another using generalized multipartite tables. The main contribution of this work is the extension of generalized multipartite table methods to a function of two variables. The multipartite tables derived were up to 67% more memory efficient than their single table counterparts.

