Results 1–10 of 25
A parameterized floating-point exponential function for FPGAs
 IN IEEE INTERNATIONAL CONFERENCE ON FIELD-PROGRAMMABLE TECHNOLOGY (FPT’05)
, 2005
Abstract
Cited by 21 (10 self)
A parameterized floating-point exponential operator is presented. In single precision, it uses a small fraction of the FPGA’s resources, has a smaller latency than its software equivalent on a high-end processor, and provides ten times the throughput in its pipelined version. Previous work had shown that FPGAs could use massive parallelism to balance the poor performance of their basic floating-point operators compared to the equivalent in processors. As this work shows, when evaluating an elementary function, the flexibility of FPGAs provides much better performance than the processor without even resorting to parallelism.
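The classical range reduction behind such an exponential operator can be sketched in software. The following Python model is illustrative only (not the paper's implementation; the polynomial degree and reduction constants are assumptions): it reduces the argument so that only a low-degree polynomial is needed on a small interval, which is what makes a compact FPGA datapath feasible.

```python
import math

def exp_sketch(x):
    # Range reduction: e^x = 2^k * e^r with k = round(x / ln 2),
    # so r = x - k*ln(2) lies in [-ln(2)/2, ln(2)/2].
    k = round(x / math.log(2))
    r = x - k * math.log(2)
    # On this small interval a degree-5 polynomial (here a truncated
    # Taylor series) already reaches ~1e-6 relative accuracy; a hardware
    # operator would use a minimax polynomial or a table-based method.
    e_r = 1 + r + r**2/2 + r**3/6 + r**4/24 + r**5/120
    # Reconstruction: multiplying by 2^k is just an exponent shift.
    return math.ldexp(e_r, k)
```

In hardware, the exponent shift and the small polynomial both map cheaply onto LUTs and embedded multipliers, which is why the operator can beat a processor without parallelism.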
When FPGAs are better at floating-point than microprocessors
 Proceedings of the International ACM/SIGDA Symposium on Field-Programmable Gate Arrays
, 2008
Abstract
Cited by 15 (5 self)
It has been shown that FPGAs could outperform high-end microprocessors on floating-point computations thanks to massive parallelism. However, most previous studies reimplement in the FPGA the operators present in a processor. This is a safe and relatively straightforward approach, but it doesn’t exploit the greater flexibility of the FPGA. This article is a survey of the many ways in which the FPGA implementation of a given floating-point computation can be not only faster, but also more accurate than its microprocessor counterpart. Techniques studied here include custom precision, specific accumulator design, dedicated architectures for coarser operators which have to be implemented in software on processors, and others. A real-world biomedical application illustrates these claims. This study also points to how current FPGA fabrics could be enhanced for better floating-point support.
A parameterizable floating-point logarithm operator for FPGAs
 IN 39TH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS & COMPUTERS. IEEE SIGNAL PROCESSING SOCIETY
, 2005
Abstract
Cited by 12 (6 self)
As FPGAs are increasingly being used for floating-point computing, a parameterized floating-point logarithm operator is presented. In single precision, this operator uses a small fraction of the FPGA’s resources, has a smaller latency than its software equivalent on a high-end processor, and provides about ten times the throughput in its pipelined version. Previous work had shown that FPGAs could use massive parallelism to balance the poor performance of their basic floating-point operators compared to the equivalent in processors. As this work shows, when evaluating an elementary function, the flexibility of FPGAs provides much better performance than the processor without even resorting to parallelism. The presented operator is freely available from
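The dual range reduction for the logarithm can be sketched the same way. This is an illustrative Python model, not the operator itself; the centering threshold and series length are assumptions chosen for clarity:

```python
import math

def log_sketch(x):
    # Split x = m * 2^e with m in [0.5, 1).
    m, e = math.frexp(x)
    # Center m in [sqrt(1/2), sqrt(2)) so the series argument stays small.
    if m < math.sqrt(0.5):
        m *= 2.0
        e -= 1
    # ln(m) = 2*(u + u^3/3 + u^5/5 + u^7/7 + ...) with u = (m-1)/(m+1);
    # here |u| <= 0.172, so four terms give roughly 1e-7 accuracy.
    u = (m - 1) / (m + 1)
    ln_m = 2 * (u + u**3/3 + u**5/5 + u**7/7)
    # Reconstruction: ln(x) = ln(m) + e*ln(2).
    return ln_m + e * math.log(2)
```

As with the exponential, the work splits into an exponent extraction (free in hardware) and a short polynomial on a narrow interval.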
An FPGA-specific Approach to Floating-Point Accumulation and Sum-of-Products
 Field-Programmable Technology, IEEE
, 2008
Abstract
Cited by 12 (7 self)
This article studies two common situations where the flexibility of FPGAs allows one to design application-specific floating-point operators which are more efficient and more accurate than those offered by processors and GPUs. First, for applications involving the addition of a large number of floating-point values, an ad hoc accumulator is proposed. By tailoring its parameters to the numerical requirements of the application, it can be made arbitrarily accurate, at an area cost comparable to that of a standard floating-point adder, and at a higher frequency. The second example is the sum-of-products operation, which is the building block of matrix computations. A novel architecture is proposed that feeds the previous accumulator out of a floating-point multiplier whose rounding logic has been removed, again improving the area/accuracy trade-off. These architectures are implemented within the FloPoCo generator, freely available under the LGPL.
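The idea behind such an accumulator can be modeled in a few lines: instead of rounding after every floating-point addition, each input is aligned once onto a wide fixed-point grid whose additions are exact. The sketch below is illustrative, not the paper's architecture; the `LSB` weight is an assumed parameter, and a Python integer stands in for the wide hardware register.

```python
from fractions import Fraction

LSB = -60   # weight of the accumulator's least significant bit (assumed)

def accumulate(values):
    # A Python integer models the wide fixed-point register, so every
    # addition below is exact; each input is aligned onto the 2^LSB
    # grid exactly once (Fraction makes the scaling itself exact too).
    acc = 0
    for v in values:
        acc += round(Fraction(v) * 2**(-LSB))
    return float(acc * Fraction(2)**LSB)   # one rounding, at the very end
```

Summing many tiny values against large cancelling ones, a naive float loop loses the small contributions, while this accumulator's only error is one half-ulp of the 2^LSB grid per input, regardless of the summation order.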
Parameterized floating-point logarithm and exponential functions for FPGAs
 Microprocessors and Microsystems, Special Issue on FPGA-based Reconfigurable Computing
Abstract
Cited by 11 (5 self)
As FPGAs are increasingly being used for floating-point computing, the feasibility of a library of floating-point elementary functions for FPGAs is discussed. An initial implementation of such a library contains parameterized operators for the logarithm and exponential functions. In single precision, those operators use a small fraction of the FPGA’s resources, have a smaller latency than their software equivalent on a high-end processor, and provide about ten times the throughput in their pipelined versions. Previous work had shown that FPGAs could use massive parallelism to balance the poor performance of their basic floating-point operators compared to the equivalent in processors. As this work shows, when evaluating an elementary function, the flexibility of FPGAs provides much better performance than the processor without even resorting to parallelism. The presented library is freely available from
Floating Point or LNS: Choosing the Right Arithmetic on an Application Basis
 In Proceedings of the 9th EUROMICRO Conference on Digital System Design
, 2006
Abstract
Cited by 7 (3 self)
For applications requiring a large dynamic range, real numbers may be represented either in floating-point (FP) or in the logarithmic number system (LNS). Which system is best for a given application is difficult to know in advance, because the cost and performance of LNS operators depend on the target accuracy in a highly nonlinear way. When in doubt, designers will choose floating-point. This article demonstrates a methodology for a better-informed choice thanks to FPLibrary, a freely available dual FP/LNS arithmetic operator library. FPLibrary may be used in the prototyping phase of an application to obtain, with low design effort, accurate measures of the performance, cost, and accuracy of both the LNS and FP approaches. Two case studies demonstrate the benefits of this methodology.
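The trade-off such a library exposes can be seen in a small model of LNS arithmetic (illustrative Python, not FPLibrary's interface): multiplication degenerates into a fixed-point addition, while addition needs the nonlinear function f(d) = log2(1 + 2^d), which a hardware LNS adder must tabulate or approximate, and whose cost grows quickly with precision.

```python
import math

def lns(x):
    # An LNS value is (sign, log2|x|); the log is fixed-point in hardware.
    return (math.copysign(1.0, x), math.log2(abs(x)))

def from_lns(v):
    s, l = v
    return s * 2.0**l

def lns_mul(a, b):
    # Multiplication (and division) is just a fixed-point add (subtract).
    return (a[0] * b[0], a[1] + b[1])

def lns_add(a, b):
    # Same-sign addition: log2(x + y) = hi + f(d) with d = lo - hi <= 0
    # and f(d) = log2(1 + 2^d). In hardware f comes from a table or
    # polynomial -- the source of the nonlinear cost/accuracy behavior.
    assert a[0] == b[0], "subtraction needs a second table (cotransformation)"
    hi, lo = max(a[1], b[1]), min(a[1], b[1])
    return (a[0], hi + math.log2(1.0 + 2.0**(lo - hi)))
```

This asymmetry explains why the best representation is application-dependent: multiply-heavy datapaths favor LNS, add-heavy ones favor FP.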
A Dual-Purpose Real/Complex Logarithmic Number System ALU
 19TH IEEE INTERNATIONAL SYMPOSIUM ON COMPUTER ARITHMETIC
, 2009
Abstract
Cited by 6 (3 self)
The real Logarithmic Number System (LNS) allows fast and inexpensive multiplication and division but more expensive addition and subtraction as precision increases. Recent advances in higher-order and multipartite table methods, together with cotransformation, allow real LNS ALUs to be implemented effectively on FPGAs for a wide variety of medium-precision special-purpose applications. The Complex LNS (CLNS) is a generalization of LNS which represents complex values in log-polar form. CLNS is a more compact representation than traditional rectangular methods, reducing the cost of buses and memory in intensive complex-number applications like the FFT; however, prior CLNS implementations were either slow CORDIC-based or expensive 2D-table-based approaches. This paper attempts to leverage the recent advances made in real-valued LNS units for the more specialized context of CLNS. It proposes a novel approach that reduces the cost of CLNS addition by reusing a conventional real-valued LNS ALU with specialized CLNS hardware that is smaller than the real-valued LNS ALU to which it is attached. The resulting ALU is much less expensive than prior fast CLNS units, at the cost of some extra delay. The extra hardware added to the ALU implements trigonometry-related functions and may be useful in LNS applications other than CLNS. The novel algorithm proposed here is implemented using the FloPoCo library (which incorporates recent HOTBM advances in function-unit generation), and FPGA synthesis results are reported.
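The log-polar representation that makes CLNS compact is easy to model. The sketch below is illustrative Python and does not cover the paper's contribution (the hardware addition unit): a complex value is stored as (log2 of magnitude, angle), so complex multiplication collapses into two independent additions.

```python
import cmath
import math

def clns(z):
    # Log-polar encoding: z = 2^l * e^(j*a), stored as (l, a).
    return (math.log2(abs(z)), cmath.phase(z))

def from_clns(v):
    l, a = v
    return 2.0**l * cmath.exp(1j * a)

def clns_mul(p, q):
    # One real adder per component replaces the 4-multiply, 2-add
    # rectangular complex product.
    return (p[0] + q[0], p[1] + q[1])
```

Addition, by contrast, requires a nonlinear function of both the magnitude and angle differences, which is exactly where the paper reuses a real-valued LNS ALU rather than building a 2D table.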
Multiplicative square root algorithms for FPGAs
 INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE LOGIC AND APPLICATIONS
, 2010
Abstract
Cited by 3 (3 self)
Most current square root implementations for FPGAs use a digit-recurrence algorithm, which is well suited to their LUT structure. However, recent computing-oriented FPGAs include embedded multipliers and RAM blocks which can also be used to implement quadratic-convergence algorithms, very high radix digit recurrences, or polynomial approximation algorithms. The cost of these solutions is evaluated and compared, and a complete implementation of a polynomial approach is presented within the open-source FloPoCo framework. This polynomial approach achieves a shorter latency and a higher frequency than the digit-recurrence approach, and improves over previous multiplicative approaches. However, the cost of IEEE-compliant correct rounding is shown to be very high.
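A quadratic-convergence, multiplier-only scheme of the kind compared here can be sketched as follows. This is an illustrative Python model, not the paper's implementation; a hardware version would seed the iteration from a small table and run a fixed, much smaller number of iterations.

```python
import math

def sqrt_mult(x):
    # Split x = m * 2^e, m in [0.5, 1), and make e even so that
    # 2^(e/2) is an exact exponent shift.
    m, e = math.frexp(x)
    if e % 2:
        m *= 2.0
        e -= 1
    # Newton iteration y <- y*(3 - m*y*y)/2 converges quadratically to
    # 1/sqrt(m), using only multiplications and a shift (no division).
    y = 1.0
    for _ in range(8):
        y = y * (3.0 - m * y * y) / 2.0
    # sqrt(x) = m * (1/sqrt(m)) * 2^(e/2)
    return m * y * 2.0**(e // 2)
```

The iteration maps directly onto embedded multipliers, which is why these blocks make multiplicative square roots competitive with the traditional LUT-based digit recurrence.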
Luk: “FPGA Designs with Optimized Logarithmic Arithmetic”
Abstract
Cited by 3 (0 self)
Abstract—Using a general polynomial approximation approach, we present an arithmetic library generator for the logarithmic number system (LNS). The generator produces optimized LNS arithmetic libraries that improve significantly over previous LNS designs in area and latency. We also provide area cost estimation and bit-accurate simulation tools that facilitate comparison between LNS and floating-point designs.
Index Terms—Reconfigurable hardware, special-purpose and application-based systems, computer systems organization, computer arithmetic, general, numerical analysis, mathematics of computing.
FPGA-based acceleration of the computations involved in transcranial magnetic stimulation
 In Southern Programmable Logic Conference
, 2008
Abstract
Cited by 3 (2 self)
In recent years, interest in magnetic stimulation of human nervous tissue has increased, because this technique has proved its utility and applicability both as a diagnostic and as a treatment instrument. Research in this domain aims at eliminating some disadvantages of the technique: the lack of focalization of the stimulated region of the human body and the reduced efficiency of the energy transfer from the stimulating coil to the tissue. Designing better stimulation coils is so far a trial-and-error process, relying on very compute-intensive simulations. In software, such a simulation has a very long running time (several hours for complicated coil geometries). This paper proposes and demonstrates an FPGA-based hardware implementation of this simulation, which reduces the computation time by 2–3 orders of magnitude. Thanks to this powerful tool, some significant improvements in the design of the coils have already been obtained.