Results 1 - 10
of
54
An Analysis Of Division Algorithms And Implementations
- IEEE Transactions on Computers
, 1995
"... Floating-point division is generally regarded as a low frequency, high latency operation in typical floating-point applications. However, the increasing emphasis on high performance graphics and the industry-wide usage of performance benchmarks forces processor designers to pay close attention to al ..."
Abstract
-
Cited by 44 (8 self)
- Add to MetaCart
Floating-point division is generally regarded as a low frequency, high latency operation in typical floating-point applications. However, the increasing emphasis on high performance graphics and the industry-wide usage of performance benchmarks forces processor designers to pay close attention to all aspects of floating-point computation. Many algorithms are suitable for implementing division in hardware. This paper presents four major classes of algorithms in a unified framework, namely digit recurrence, functional iteration, very high radix, and variable latency. Digit recurrence algorithms, the most common of which is SRT, use subtraction as the fundamental operator, and they converge to a quotient linearly. Division by functional iteration converges to a quotient quadratically using multiplication. Very high radix division algorithms are similar to digit recurrence algorithms, but they incorporate multiplication to reduce the latency. Variable latency division algorithms reduce the...
A Mechanically Checked Proof of the Correctness of the Kernel of the AMD5K86 Floating-Point Division Algorithm
- IEEE Transactions on Computers
, 1996
"... We describe a mechanically checked proof of the correctness of the kernel of the floating point division algorithm used on the AMD5K 86 microprocessor. The kernel is a non-restoring division algorithm that computes the floating point quotient of two double extended precision floating point numbers, ..."
Abstract
-
Cited by 28 (11 self)
- Add to MetaCart
We describe a mechanically checked proof of the correctness of the kernel of the floating point division algorithm used on the AMD5K 86 microprocessor. The kernel is a non-restoring division algorithm that computes the floating point quotient of two double extended precision floating point numbers, p and d (d 6= 0), with respect to a rounding mode, mode. The algorithm is defined in terms of floating point addition and multiplication. First, two NewtonRaphson iterations are used to compute a floating point approximation of the reciprocal of d. The result is used to compute four floating point quotient digits in the 24,,17 format (24 bits of precision and 17 bit exponents) which are then summed using appropriate rounding modes. We prove that if p and d are 64,,15 (possibly denormal) floating point numbers, d 6= 0 and mode specifies one of six rounding procedures and a desired precision 0 ! n 64, then the output of the algorithm is p=d rounded according to mode. We prove that every int...
Design Issues in Division and Other Floating-Point Operations
- IEEE Transactions on Computers
, 1997
"... Floating-point division is generally regarded as a low frequency, high latency operation in typical floating-point applications. However, in the worst case, a high latency hardware floating-point divider can contribute an additional 0.50 CPI to a system executing SPECfp92 applications. This paper ..."
Abstract
-
Cited by 19 (7 self)
- Add to MetaCart
Floating-point division is generally regarded as a low frequency, high latency operation in typical floating-point applications. However, in the worst case, a high latency hardware floating-point divider can contribute an additional 0.50 CPI to a system executing SPECfp92 applications. This paper presents the system performance impact of floating-point division latency for varying instruction issue rates. It also examines the performance implications of shared multiplication hardware, shared square root, on-the-fly rounding and conversion, and fused functional units. Using a system level study as a basis, it is shown how typical floating-point applications can guide the designer in making implementation decisions and trade-offs.
A tool for unbiased comparison between logarithmic and floating-point arithmetic
- LIP, École Normale Supérieure de
, 2004
"... arithmetic ..."
Design Issues In High Performance Floating Point Arithmetic Units
, 1996
"... In recent years computer applications have increased in their computational complexity. The industry-wide usage of performance benchmarks, such as SPECmarks, forces processor designers to pay particular attention to implementation of the floating point unit, or FPU. Special purpose applications, suc ..."
Abstract
-
Cited by 17 (3 self)
- Add to MetaCart
In recent years computer applications have increased in their computational complexity. The industry-wide usage of performance benchmarks, such as SPECmarks, forces processor designers to pay particular attention to implementation of the floating point unit, or FPU. Special purpose applications, such as high performance graphics rendering systems, have placed further demands on processors. High speed floating point hardware is a requirement to meet these increasing demands. This work examines the state-of-the-art in FPU design and proposes techniques for improving the performance and the performance/area ratio of future FPUs. In recent FPUs, emphasis has been placed on designing ever-faster adders and multipliers, with division receiving less attention. The design space of FP dividers is large, comprising five different classes of division algorithms: digit recurrence, functional iteration, very high radix, table look-up, and variable latency. While division is an infrequent operation...
Reciprocation, Square Root, Inverse Square Root, and Some Elementary Functions Using Small Multipliers
, 1997
"... This paper deals with the computation of reciprocals, square roots, inverse square roots, and some elementary functions using small tables, small multipliers, and for some functions, a final "large" (almost full-length) multiplication. We propose a method that allows fast evaluation of these functio ..."
Abstract
-
Cited by 14 (5 self)
- Add to MetaCart
This paper deals with the computation of reciprocals, square roots, inverse square roots, and some elementary functions using small tables, small multipliers, and for some functions, a final "large" (almost full-length) multiplication. We propose a method that allows fast evaluation of these functions in double precision arithmetic. The strength of this method is that the same scheme allows the computation of all these functions. Our method is mainly interesting for designing special purpose circuits, since it does not allow a simple implementation of the four rounding modes required by the IEEE-754 standard for floating-point arithmetic.
SRT Division Architectures and Implementations
- IN PROC. 13TH IEEE SYMP. COMPUTER ARITHMETIC
, 1997
"... SRT dividers are common in modern floating point units. Higher division performance is achieved by retiring more quotient bits in each cycle. Previous research has shown that realistic stages are limited to radix-2 and radix-4. Higher radix dividers are therefore formed by a combination of low-radix ..."
Abstract
-
Cited by 11 (2 self)
- Add to MetaCart
SRT dividers are common in modern floating point units. Higher division performance is achieved by retiring more quotient bits in each cycle. Previous research has shown that realistic stages are limited to radix-2 and radix-4. Higher radix dividers are therefore formed by a combination of low-radix stages. In this paper, we present an analysis of the effects of radix-2 and radix-4 SRT divider architectures and circuit families on divider area and performance. We show the performance and area results for a wide variety of divider architectures and implementations. We conclude that divider performance is only weakly sensitive to reasonable choices of architecture but significantly improved by aggressive circuit techniques.
Verification of IEEE Compliant Subtractive Division Algorithms
- FORMAL METHODS IN COMPUTER-AIDED DESIGN (FMCAD '96)
, 1996
"... A parameterized definition of subtractive floating point division algorithms is presented and verified using PVS. The general algorithm is proven to satisfy a formal definition of an IEEE standard for floating point arithmetic. The utility of the general specification is illustrated using a numb ..."
Abstract
-
Cited by 11 (1 self)
- Add to MetaCart
A parameterized definition of subtractive floating point division algorithms is presented and verified using PVS. The general algorithm is proven to satisfy a formal definition of an IEEE standard for floating point arithmetic. The utility of the general specification is illustrated using a number of different instances of the general algorithm.
Complex division with prescaling of the operands
- Proceedings of ASAP’03: 14th IEEE Conference on ApplicationSpecific Systems, Architectures and Processors
, 2003
"... We adapt the radix-r digit-recurrence division algorithm to complex division. By prescaling the operands, we make the selection of quotient digits simple. This leads to a simple hardware implementation, and allows correct rounding of complex quotient. To reduce large prescaling tables required for r ..."
Abstract
-
Cited by 9 (7 self)
- Add to MetaCart
We adapt the radix-r digit-recurrence division algorithm to complex division. By prescaling the operands, we make the selection of quotient digits simple. This leads to a simple hardware implementation, and allows correct rounding of complex quotient. To reduce large prescaling tables required for radices greater than 4, we adapt the bipartite-table method to multiple-operand functions. 1.
Design Issues in Floating-Point Division
, 1994
"... Floating-point division is generally regarded as a low frequency, high latency operation in typical floating-point applications. However, the increasing emphasis on high performance graphics and the industry-wide usage of performance benchmarks, such as SPECmarks, forces processor designers to pay c ..."
Abstract
-
Cited by 8 (2 self)
- Add to MetaCart
Floating-point division is generally regarded as a low frequency, high latency operation in typical floating-point applications. However, the increasing emphasis on high performance graphics and the industry-wide usage of performance benchmarks, such as SPECmarks, forces processor designers to pay close attention to all aspects of floating-point computation. This paper presents the algorithms often utilized for floating-point division, and it also presents implementation alternatives available for designers. Using a system level study as a basis, it is shown how typical floating-point applications can guide the designer in making implementation decisions and trade-offs.

