A library of parameterizable floating-point cores for FPGAs and their application to scientific computing
In Proc. of International Conference on Engineering Reconfigurable Systems and Algorithms, 2005
Cited by 22 (9 self)
Abstract: Advances in field-programmable gate arrays (FPGAs), which are the platform of choice for reconfigurable computing, have made it possible to use FPGAs in increasingly many areas of computing, including complex scientific applications. These applications demand high performance and high-precision floating-point arithmetic. Until now, most research has not focused on compliance with IEEE Standard 754, concentrating instead on custom formats and bit widths. In this paper, we present double-precision floating-point cores that are parameterized by their degree of pipelining and by the features of IEEE Standard 754 that they implement. We then analyze the effects of supporting the standard when these cores are used in an FPGA-based accelerator for the Lennard-Jones force and potential calculations that are part of molecular dynamics (MD) simulations.
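For reference, the pairwise Lennard-Jones potential and force that such an accelerator computes can be modeled in ordinary double precision as below. This is a minimal software sketch, not the paper's hardware cores: it assumes reduced units (epsilon = sigma = 1 by default), no cutoff radius and no periodic boundaries, and the function names `lj_pair` and `total_potential` are hypothetical. A model like this could serve as a golden reference when measuring how omitting IEEE 754 features in a core changes the result.

```python
import math
from itertools import combinations


def lj_pair(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> tuple[float, float]:
    """Return (potential, force magnitude) for one particle pair at distance r.

    V(r) = 4*eps*((sigma/r)**12 - (sigma/r)**6)
    F(r) = -dV/dr = 24*eps*(2*(sigma/r)**12 - (sigma/r)**6) / r   (positive = repulsive)
    """
    sr6 = (sigma / r) ** 6
    sr12 = sr6 * sr6
    potential = 4.0 * epsilon * (sr12 - sr6)
    force = 24.0 * epsilon * (2.0 * sr12 - sr6) / r
    return potential, force


def total_potential(positions, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Sum V over all unordered particle pairs (assumed: no cutoff, no periodic boundaries)."""
    return sum(lj_pair(math.dist(p, q), epsilon, sigma)[0]
               for p, q in combinations(positions, 2))
```

At r = sigma the potential is zero and the force is 24*epsilon/sigma; at r = 2^(1/6)*sigma the force vanishes and the potential reaches its minimum of -epsilon.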
Exponential: Implementation Trade-Offs for Hundred-Bit Precision
2000
Cited by 5 (0 self)
The development of processors has given rise to problems that need more than double-precision arithmetic. Some of them are known to require very long multiple-precision numbers, but for others, doubling the available precision to about 100 bits is sufficient. We offer insight into the development of a library for the exponential function. Since the hardware performs all arithmetic operations on 53 bits, our exponential has to be based on a polynomial or a rational approximation. Our routines ...
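The abstract does not say how results of about 100 bits are obtained from 53-bit hardware. One common technique, assumed here and not necessarily the authors' approach, is double-double arithmetic. In this scheme each value is an unevaluated sum hi + lo of two doubles, built from error-free transformations (`two_sum` and Dekker's `two_prod`). The sketch reduces x = k*ln2 + r and evaluates a truncated Taylor series for exp(r) with Horner's rule. A minimax polynomial or a rational approximation, as the abstract suggests, would need fewer terms. All names and the term count are illustrative choices.

```python
import math
from fractions import Fraction

# --- Error-free transformations on IEEE doubles (53-bit significands) ---

def two_sum(a: float, b: float) -> tuple[float, float]:
    """s + e == a + b exactly, where s = fl(a + b)."""
    s = a + b
    bb = s - a
    return s, (a - (s - bb)) + (b - bb)

def quick_two_sum(a: float, b: float) -> tuple[float, float]:
    """Like two_sum, but assumes |a| >= |b|."""
    s = a + b
    return s, b - (s - a)

def split(a: float) -> tuple[float, float]:
    """Dekker split of a into two halves of at most 26 bits (no FMA needed)."""
    t = 134217729.0 * a  # 2**27 + 1
    hi = t - (t - a)
    return hi, a - hi

def two_prod(a: float, b: float) -> tuple[float, float]:
    """p + e == a * b exactly, where p = fl(a * b)."""
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    return p, ((ah * bh - p) + ah * bl + al * bh) + al * bl

# --- Double-double arithmetic: a value is the unevaluated sum (hi, lo) ---

def dd_add(a, b):
    s, e = two_sum(a[0], b[0])
    t, f = two_sum(a[1], b[1])
    s, e = quick_two_sum(s, e + t)
    return quick_two_sum(s, e + f)

def dd_mul(a, b):
    p, e = two_prod(a[0], b[0])
    e += a[0] * b[1] + a[1] * b[0]
    return quick_two_sum(p, e)

def dd_from_fraction(q: Fraction):
    hi = float(q)
    return hi, float(q - Fraction(hi))

LN2_DD = (0.6931471805599453, 2.3190468138462996e-17)  # ln 2 to about 106 bits
N_TERMS = 27  # |r| <= ln2/2, so the first omitted Taylor term is far below 2**-106
COEFFS = [dd_from_fraction(Fraction(1, math.factorial(i))) for i in range(N_TERMS + 1)]

def dd_exp(x: float):
    """exp(x) for a double x, returned as a double-double (hi, lo)."""
    k = round(x / LN2_DD[0])                       # range reduction: x = k*ln2 + r
    k_ln2 = dd_mul((float(k), 0.0), LN2_DD)
    r = dd_add((x, 0.0), (-k_ln2[0], -k_ln2[1]))
    p = COEFFS[-1]                                 # Horner on the truncated series
    for c in reversed(COEFFS[:-1]):
        p = dd_add(dd_mul(p, r), c)
    return math.ldexp(p[0], k), math.ldexp(p[1], k)  # exact scaling by 2**k
```

The `hi` component alone is close to the ordinary double-precision `math.exp(x)`. The `lo` component carries roughly another 53 bits.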
Automating Custom-Precision Function Evaluation for Embedded Processors
Cited by 4 (2 self)
Due to resource and power constraints, embedded processors often cannot afford dedicated floating-point units. For instance, the IBM PowerPC processor embedded in Xilinx Virtex-II Pro FPGAs supports only emulated floating-point arithmetic, which leads to slow operation when floating-point arithmetic is needed. This paper presents a customizable mathematical library that uses fixed-point arithmetic for elementary function evaluation. We approximate functions with polynomial or rational approximations, depending on the user-defined accuracy requirements. The data representation of the inputs and outputs is compatible with the IEEE single-precision and double-precision floating-point formats. Results show that our 32-bit polynomial method achieves a speedup of over 80 times over the single-precision mathematical library from Xilinx, while our 64-bit polynomial method achieves a speedup of over 30 times.
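The sketch below makes fixed-point polynomial evaluation concrete. It evaluates exp(x) on [0, 0.5) with integer-only arithmetic in a Q1.30 format (30 fractional bits in a signed 32-bit word, with 64-bit products). It is a hedged illustration, not the paper's library. It uses a truncated Taylor series rather than the minimax or rational approximations such a library would more likely use. The function `choose_degree` only mimics the idea of picking the polynomial degree from a user-defined accuracy target. All names, the format and the domain are assumptions.

```python
import math

FRAC_BITS = 30           # Q1.30: sign, 1 integer bit, 30 fractional bits in a 32-bit word
ONE = 1 << FRAC_BITS
X_MAX = 0.5              # assumed input domain [0, X_MAX)

def to_fixed(v: float) -> int:
    return int(round(v * ONE))

def to_float(q: int) -> float:
    return q / ONE

def fixed_mul(a: int, b: int) -> int:
    """Q1.30 x Q1.30 -> product with 60 fractional bits (fits 64 bits), truncated to Q1.30."""
    return (a * b) >> FRAC_BITS

def choose_degree(tol: float, x_max: float = X_MAX) -> int:
    """Smallest Taylor degree n whose remainder bound e^x_max * x_max^(n+1)/(n+1)! is below tol."""
    n = 0
    while math.exp(x_max) * x_max ** (n + 1) / math.factorial(n + 1) >= tol:
        n += 1
    return n

def make_coeffs(degree: int) -> list[int]:
    """Taylor coefficients 1/i! of exp, rounded to Q1.30."""
    return [to_fixed(1.0 / math.factorial(i)) for i in range(degree + 1)]

def exp_fixed(x_q: int, coeffs: list[int]) -> int:
    """Horner evaluation using only integer multiply, shift and add."""
    acc = coeffs[-1]
    for c in reversed(coeffs[:-1]):
        acc = fixed_mul(acc, x_q) + c
    return acc
```

With a tolerance of 1e-7, `choose_degree` selects degree 8. A tighter tolerance raises the degree and therefore the number of multiply-add steps, which is the accuracy/speed trade-off a user-defined accuracy setting controls.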
High-Performance Floating-Point Divide
In Proceedings of the Euromicro Symposium on Digital System Design, 2001
Cited by 3 (1 self)
In modern processors, floating-point divide operations often take 20 to 25 clock cycles, five times the latency of multiplication. Multiplicative algorithms with quadratic convergence are typically used for high-performance division. A divide unit based on the multiplicative Newton-Raphson iteration is proposed. This unit uses a higher-order Newton-Raphson reciprocal approximation to compute the quotient quickly, efficiently and with high throughput. It achieves fast execution by computing the square, cube and higher powers of the approximation directly, which is much faster than the traditional approach of serial multiplications. Additionally, the second-, third- and higher-order terms are computed simultaneously, further reducing the divide latency. Significant hardware reductions have been identified that reduce the overall computation and, therefore, the area required for implementation and the power consumed by the computation. The proposed unit is designed to achieve the desired quotient precision in a single iteration, which allows it to be fully pipelined for maximum throughput.
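The algebra behind a higher-order reciprocal step fits in a few lines. Start from a seed x0 close to 1/b with error e = 1 - b*x0. The series 1/b = x0/(1 - e) = x0*(1 + e + e^2 + ...), truncated after e^k, gives x1 with residual 1 - b*x1 = e^(k+1). One step with terms up to e^7 therefore turns a 7-bit seed into better than 53 bits. This standard series form is assumed here; the abstract does not give formulas. The sketch checks the algebra with exact `Fraction` arithmetic and models neither hardware rounding nor the paper's term reductions. The names `seed_reciprocal` (a truncated reciprocal standing in for a lookup table), `higher_order_reciprocal` and `divide` are hypothetical.

```python
from fractions import Fraction

def seed_reciprocal(b: Fraction, bits: int = 8) -> Fraction:
    """Hypothetical lookup-table seed: 1/b truncated to `bits` fractional bits, b in [1, 2)."""
    scale = 1 << bits
    return Fraction(int(scale / b), scale)

def higher_order_reciprocal(b: Fraction, x0: Fraction, order: int) -> Fraction:
    """One higher-order Newton-Raphson step.

    With e = 1 - b*x0, 1/b = x0 / (1 - e) = x0 * (1 + e + e**2 + ...).
    Keeping terms through e**order leaves the residual 1 - b*x1 = e**(order + 1).
    In hardware the powers e**2, e**3, ... would be formed in parallel; here they are summed.
    """
    e = 1 - b * x0
    return x0 * sum(e ** k for k in range(order + 1))

def divide(a: Fraction, b: Fraction, order: int = 7) -> Fraction:
    """Quotient a/b formed as a * (1/b) after a single higher-order iteration."""
    x1 = higher_order_reciprocal(b, seed_reciprocal(b), order)
    return a * x1
```

Because the truncated seed satisfies 0 <= e < 2^-7 for b in [1, 2), `order=7` leaves a residual below 2^-56.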
Small FPGA polynomial approximations with 3-bit coefficients and low-precision estimations of the powers of x
2005
Adaptive FPGA NoC-based Architecture for Multispectral Image Correlation
In Proc. IS&T CGIV&MCS’08, IS&T’s Fourth European Conference on Colour in Graphics, Imaging, and Vision, and MCS’2008, the 10th International Symposium on Multispectral Colour Science, 2008
Cited by 2 (2 self)
An adaptive FPGA architecture based on the NoC (Network-on-Chip) approach is used for multispectral image correlation. This architecture must contain several distance algorithms, selected according to the characteristics of the spectral images and the required precision of the authentication. The distance algorithms must therefore be analyzed in terms of algorithmic complexity, result precision, execution time and adaptability of the implementation. This paper compares these distance computation algorithms on one spectral database. The results of an RGB algorithm implementation are discussed.
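The abstract does not name the distance algorithms being compared. As an illustration only, two measures often used for spectral matching, Euclidean distance and the spectral angle, are sketched below as a plain software model rather than the FPGA NoC implementation. The choice of measures and the names `euclidean_distance`, `spectral_angle` and `best_match` are assumptions.

```python
import math

def euclidean_distance(a, b):
    """Magnitude-sensitive distance between two sampled spectra."""
    if len(a) != len(b):
        raise ValueError("spectra must have the same number of bands")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def spectral_angle(a, b):
    """Angle in radians between spectra viewed as vectors; insensitive to overall scaling."""
    if len(a) != len(b):
        raise ValueError("spectra must have the same number of bands")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return math.acos(max(-1.0, min(1.0, dot / norm)))  # clamp guards against rounding

def best_match(query, database, metric):
    """Index of the database spectrum closest to `query` under `metric`."""
    return min(range(len(database)), key=lambda i: metric(query, database[i]))
```

Consider a query `[0.1, 0.4, 0.8, 0.5]` against a database holding a brighter copy `[0.2, 0.8, 1.6, 1.0]` and a flat spectrum `[0.5, 0.5, 0.5, 0.5]`. The spectral angle picks the scaled copy, while Euclidean distance picks the flat spectrum. The result of the authentication therefore depends on which algorithm the architecture selects.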
Hardware Operators for Function Evaluation Using Sparse-Coefficient Polynomials, in "Electronic Letters", to appear, 2006
M.: Variable-precision floating-point division and square root
In: Workshop on High Performance Embedded Computing, 2004
A Simple Improvement to the Viterbi and ...
Abstract: It is well known that the Viterbi and Viterbi Monomial-Based Phase Estimator, which includes the Mth-Power Estimator, performs poorly for cross QAM signals. However, it is shown here that allowing the power of the monomial to be negative yields much improved performance at medium to high signal-to-noise ratios (SNR). Monte Carlo simulations are used to demonstrate the efficacy of this simple, novel extension for 32- and 128-QAM systems. In principle, this extension can also be applied to other constellations, e.g., (4,12)-PSK. Keywords: synchronization, blind phase estimation, quadrature amplitude modulation, blind carrier phase recovery.
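In its textbook form, the Viterbi and Viterbi estimator for a constellation with M-fold rotational symmetry forms the statistic S = sum over k of F(|r_k|)*exp(j*M*arg r_k). It estimates the carrier phase as arg(S)/M, up to a constellation-dependent offset and a 2*pi/M ambiguity. The monomial version uses F(rho) = rho^p. The sketch below implements that form for cross 32-QAM with M = 4 and lets p be negative, as the abstract proposes. The value p = -1 is an arbitrary illustration, since the abstract does not give the exponents used. The offset is computed numerically from the noiseless constellation rather than in closed form, and all function names are hypothetical.

```python
import cmath
import math
import random

def cross_32qam():
    """Cross 32-QAM: a 6x6 grid of odd integer points with the four corners removed."""
    levels = (-5, -3, -1, 1, 3, 5)
    return [complex(i, q) for i in levels for q in levels
            if not (abs(i) == 5 and abs(q) == 5)]

def vv_statistic(samples, p, m=4):
    """Sum of |r|**p * exp(j*m*arg r): the monomial nonlinearity with power p."""
    return sum(abs(r) ** p * cmath.exp(1j * m * cmath.phase(r)) for r in samples)

def vv_estimate(samples, constellation, p, m=4):
    """Blind phase estimate, defined modulo 2*pi/m."""
    # Reference phase of the statistic for the unrotated constellation (equiprobable symbols).
    ref = cmath.phase(vv_statistic(constellation, p, m))
    theta = (cmath.phase(vv_statistic(samples, p, m)) - ref) / m
    period = 2 * math.pi / m
    return (theta + period / 2) % period - period / 2  # wrap into [-pi/m, pi/m)

def simulate(theta=0.1, p=-1.0, n=5000, noise_std=0.05, seed=1):
    """Rotate random symbols by theta, add complex Gaussian noise, and estimate theta."""
    rng = random.Random(seed)
    const = cross_32qam()
    rot = cmath.exp(1j * theta)
    rx = [rng.choice(const) * rot + complex(rng.gauss(0, noise_std), rng.gauss(0, noise_std))
          for _ in range(n)]
    return vv_estimate(rx, const, p)
```

With random symbols, the statistic also picks up a data-dependent term, because pairs such as 1+3j and 3+1j cancel only on average. Finite blocks therefore carry pattern-dependent self-noise. Sweeping p in `simulate` is a small-scale version of the Monte Carlo comparison the abstract describes.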
High-Radix Iterative Algorithm for Powering Computation
A high-radix composite algorithm for the computation of the powering function (x^y) is presented in this paper. The algorithm consists of a sequence of overlapped operations: (i) digit-recurrence logarithm, (ii) left-to-right carry-free (LRCF) multiplications, and (iii) on-line exponential. A redundant number system is used, and the selection in (i) and (iii) is done by rounding, except in the first iteration, where selection by table lookup is necessary to guarantee the convergence of the recurrences. A sequential implementation of the algorithm is proposed, and the execution times and hardware requirements are estimated for single- and double-precision floating-point computations for radix ..., showing that powering can be computed with performance similar to that of high-radix CORDIC algorithms.
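The composition x^y = exp(y * ln x) behind this algorithm can be illustrated with the simplest digit recurrences. The logarithm uses multiplicative normalization and the exponential uses additive normalization, both selecting digits d_j in {0, 1} against a table of ln(1 + 2^-j). This is a hedged sketch, not the paper's method: it is radix 2, non-redundant and sequential, runs in floating point, and does not overlap the steps with LRCF multiplication. The table and all function names are hypothetical.

```python
import math

N_DIGITS = 52  # one digit per fraction bit of a double
# Hypothetical lookup table: (j, ln(1 + 2**-j)) for j = 1..N_DIGITS
LN_TABLE = [(j, math.log1p(2.0 ** -j)) for j in range(1, N_DIGITS + 1)]
LN2 = math.log(2.0)

def dr_log(x: float) -> float:
    """ln(x), x > 0, by radix-2 multiplicative normalization."""
    m, e = math.frexp(x)          # x = m * 2**e with 0.5 <= m < 1
    t, acc = m, 0.0
    for j, ln_f in LN_TABLE:
        f = 1.0 + 2.0 ** -j
        if t * f <= 1.0:          # select d_j = 1 while t stays <= 1
            t *= f
            acc += ln_f
    # m * prod(selected factors) = t, close to 1, so ln m = ln t - acc ~ (t - 1) - acc
    return e * LN2 - acc + (t - 1.0)

def dr_exp(z: float) -> float:
    """exp(z) via z = k*ln2 + r and radix-2 additive normalization of r."""
    k = math.floor(z / LN2)
    r = z - k * LN2               # 0 <= r < ln 2, up to rounding
    result = 1.0
    for j, ln_f in LN_TABLE:
        if r >= ln_f:             # select d_j = 1 while the residual stays >= 0
            r -= ln_f
            result *= 1.0 + 2.0 ** -j
    return math.ldexp(result * (1.0 + r), k)  # exp(residual) ~ 1 + residual

def power(x: float, y: float) -> float:
    """x**y for x > 0, computed as the composition exp(y * ln x)."""
    if x <= 0.0:
        raise ValueError("this sketch handles only x > 0")
    return dr_exp(y * dr_log(x))
```

For example, `power(2.0, 10.0)` returns a value within a few units in the last place of 1024. The paper's high-radix, redundant version retires several bits per iteration instead of one.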