An Analysis Of Division Algorithms And Implementations
 IEEE Transactions on Computers
, 1995
Cited by 53 (8 self)
Floating-point division is generally regarded as a low frequency, high latency operation in typical floating-point applications. However, the increasing emphasis on high performance graphics and the industry-wide usage of performance benchmarks forces processor designers to pay close attention to all aspects of floating-point computation. Many algorithms are suitable for implementing division in hardware. This paper presents four major classes of algorithms in a unified framework, namely digit recurrence, functional iteration, very high radix, and variable latency. Digit recurrence algorithms, the most common of which is SRT, use subtraction as the fundamental operator, and they converge to a quotient linearly. Division by functional iteration converges to a quotient quadratically using multiplication. Very high radix division algorithms are similar to digit recurrence algorithms, but they incorporate multiplication to reduce the latency. Variable latency division algorithms reduce the...
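The linear convergence of digit recurrence mentioned in this abstract can be illustrated with a minimal sketch. It uses plain radix-2 restoring division rather than SRT (a deliberate simplification, not the paper's method), exact rationals in place of hardware registers, and a hypothetical function name `restoring_divide`. Each step performs one trial subtraction and retires one quotient bit, so the error bound halves per step.

```python
from fractions import Fraction


def restoring_divide(dividend: Fraction, divisor: Fraction, bits: int) -> Fraction:
    """Radix-2 restoring division: one quotient bit per subtraction step.

    Assumes 0 <= dividend < divisor, so the quotient lies in [0, 1).
    """
    if not (0 <= dividend < divisor):
        raise ValueError("expected 0 <= dividend < divisor")
    remainder = dividend
    quotient = Fraction(0)
    for i in range(1, bits + 1):
        remainder *= 2                  # shift the partial remainder left
        if remainder >= divisor:        # trial subtraction succeeds: bit is 1
            remainder -= divisor
            quotient += Fraction(1, 2 ** i)
    return quotient
```

After `bits` steps the result is the true quotient truncated to `bits` fractional bits, so the error is below 2**-bits. Doubling the precision therefore doubles the number of steps.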
A Mechanically Checked Proof of the Correctness of the Kernel of the AMD5K86 Floating-Point Division Algorithm
 IEEE Transactions on Computers
, 1996
Cited by 30 (11 self)
We describe a mechanically checked proof of the correctness of the kernel of the floating point division algorithm used on the AMD5K86 microprocessor. The kernel is a non-restoring division algorithm that computes the floating point quotient of two double extended precision floating point numbers, p and d (d ≠ 0), with respect to a rounding mode, mode. The algorithm is defined in terms of floating point addition and multiplication. First, two Newton-Raphson iterations are used to compute a floating point approximation of the reciprocal of d. The result is used to compute four floating point quotient digits in the 24,,17 format (24 bits of precision and 17 bit exponents) which are then summed using appropriate rounding modes. We prove that if p and d are 64,,15 (possibly denormal) floating point numbers, d ≠ 0 and mode specifies one of six rounding procedures and a desired precision 0 < n ≤ 64, then the output of the algorithm is p/d rounded according to mode. We prove that every int...
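The Newton-Raphson reciprocal step named in this abstract can be sketched as follows. This is not the AMD5K86 kernel: it uses exact rationals instead of rounded floating point formats, assumes the divisor is already scaled into [1, 2), and the starting guess of 3/4 and the names `reciprocal_newton` and `relative_error` are hypothetical choices for illustration. It only shows why two iterations go a long way: the relative error is squared at each step.

```python
from fractions import Fraction


def reciprocal_newton(d: Fraction, iterations: int,
                      x0: Fraction = Fraction(3, 4)) -> Fraction:
    """Approximate 1/d with the iteration x <- x * (2 - d * x).

    Assumes 1 <= d < 2; x0 is an illustrative starting guess.
    """
    if not (1 <= d < 2):
        raise ValueError("expected 1 <= d < 2")
    x = x0
    for _ in range(iterations):
        x = x * (2 - d * x)   # one multiply-subtract-multiply refinement
    return x


def relative_error(d: Fraction, x: Fraction) -> Fraction:
    """Return |1 - d*x|, the relative error of x as an estimate of 1/d."""
    return abs(1 - d * x)
```

For d = 3/2 the starting error is 1/8, which becomes 1/64 after one iteration and 1/4096 after two, since 1 - d*x' = (1 - d*x)**2.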
Design Issues in Division and Other Floating-Point Operations
 IEEE Transactions on Computers
, 1997
Cited by 23 (7 self)
Floating-point division is generally regarded as a low frequency, high latency operation in typical floating-point applications. However, in the worst case, a high latency hardware floating-point divider can contribute an additional 0.50 CPI to a system executing SPECfp92 applications. This paper presents the system performance impact of floating-point division latency for varying instruction issue rates. It also examines the performance implications of shared multiplication hardware, shared square root, on-the-fly rounding and conversion, and fused functional units. Using a system level study as a basis, it is shown how typical floating-point applications can guide the designer in making implementation decisions and tradeoffs.
A tool for unbiased comparison between logarithmic and floating-point arithmetic
 LIP, École Normale Supérieure de
, 2004
"... arithmetic ..."
Design Issues In High Performance Floating Point Arithmetic Units
, 1996
Cited by 17 (3 self)
In recent years computer applications have increased in their computational complexity. The industry-wide usage of performance benchmarks, such as SPECmarks, forces processor designers to pay particular attention to implementation of the floating point unit, or FPU. Special purpose applications, such as high performance graphics rendering systems, have placed further demands on processors. High speed floating point hardware is a requirement to meet these increasing demands. This work examines the state-of-the-art in FPU design and proposes techniques for improving the performance and the performance/area ratio of future FPUs. In recent FPUs, emphasis has been placed on designing ever-faster adders and multipliers, with division receiving less attention. The design space of FP dividers is large, comprising five different classes of division algorithms: digit recurrence, functional iteration, very high radix, table lookup, and variable latency. While division is an infrequent operation...
Reciprocation, Square Root, Inverse Square Root, and Some Elementary Functions Using Small Multipliers
, 1997
Cited by 17 (5 self)
This paper deals with the computation of reciprocals, square roots, inverse square roots, and some elementary functions using small tables, small multipliers, and for some functions, a final "large" (almost full-length) multiplication. We propose a method that allows fast evaluation of these functions in double precision arithmetic. The strength of this method is that the same scheme allows the computation of all these functions. Our method is mainly interesting for designing special purpose circuits, since it does not allow a simple implementation of the four rounding modes required by the IEEE-754 standard for floating-point arithmetic.
SRT Division Architectures and Implementations
 IN PROC. 13TH IEEE SYMP. COMPUTER ARITHMETIC
, 1997
Cited by 11 (2 self)
SRT dividers are common in modern floating point units. Higher division performance is achieved by retiring more quotient bits in each cycle. Previous research has shown that realistic stages are limited to radix-2 and radix-4. Higher radix dividers are therefore formed by a combination of low-radix stages. In this paper, we present an analysis of the effects of radix-2 and radix-4 SRT divider architectures and circuit families on divider area and performance. We show the performance and area results for a wide variety of divider architectures and implementations. We conclude that divider performance is only weakly sensitive to reasonable choices of architecture but significantly improved by aggressive circuit techniques.
Verification of IEEE Compliant Subtractive Division Algorithms
 FORMAL METHODS IN COMPUTER-AIDED DESIGN (FMCAD '96)
, 1996
Cited by 11 (1 self)
A parameterized definition of subtractive floating point division algorithms is presented and verified using PVS. The general algorithm is proven to satisfy a formal definition of an IEEE standard for floating point arithmetic. The utility of the general specification is illustrated using a number of different instances of the general algorithm.
High Performance Rotation Architectures Based On Radix-4 CORDIC Algorithm
, 1997
Cited by 10 (5 self)
Traditionally, CORDIC algorithms have employed radix-2 in the first n/2 microrotations (n is the precision in bits) in order to preserve a constant scale factor. In this work we will present a full radix-4 CORDIC algorithm in rotation mode and circular coordinates and its corresponding selection function, and we will propose an efficient technique for the compensation of the non-constant scale factor. Three radix-4 CORDIC architectures are implemented: a) a word serial architecture based on the zero skipping technique; b) a pipelined architecture; and c) an application specific architecture (the angles are known beforehand). The first two are general purpose implementations in redundant arithmetic (carry-save), whereas the last one is a simplification of the first two. The proposed architectures are time and/or area efficient when compared with already existing CORDIC architectures. 1. Introduction The CORDIC (COordinate Rotation DIgital Computer) algorithm was introduced by Volder [...
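For orientation, the conventional radix-2 CORDIC in rotation mode and circular coordinates, which this abstract takes as its baseline, can be sketched as below. This is not the paper's radix-4 algorithm or its selection function. It uses ordinary floats rather than carry-save arithmetic, and the name `cordic_rotate` and the default of 32 microrotations are assumptions for illustration.

```python
import math


def cordic_rotate(x: float, y: float, angle: float, n: int = 32) -> tuple[float, float]:
    """Rotate (x, y) by `angle` radians using n radix-2 microrotations.

    Converges for |angle| up to roughly 1.74 rad (the sum of atan(2**-i)).
    """
    z = angle                  # residual angle still to be rotated
    scale = 1.0                # accumulated gain of the microrotations
    for i in range(n):
        sigma = 1.0 if z >= 0 else -1.0        # digit is always +1 or -1
        t = 2.0 ** -i
        x, y = x - sigma * y * t, y + sigma * x * t   # shift-and-add step
        z -= sigma * math.atan(t)
        scale *= math.sqrt(1.0 + t * t)        # same for either sign of sigma
    return x / scale, y / scale                # compensate the constant gain
```

Because every radix-2 digit is +1 or -1, the gain `scale` does not depend on the angle and can be precomputed. With a larger digit set, as in radix-4, the gain varies with the digits chosen, which is the non-constant scale factor the abstract refers to.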
Design Issues in Floating-Point Division
, 1994
Cited by 9 (3 self)
Floating-point division is generally regarded as a low frequency, high latency operation in typical floating-point applications. However, the increasing emphasis on high performance graphics and the industry-wide usage of performance benchmarks, such as SPECmarks, forces processor designers to pay close attention to all aspects of floating-point computation. This paper presents the algorithms often utilized for floating-point division, and it also presents implementation alternatives available for designers. Using a system level study as a basis, it is shown how typical floating-point applications can guide the designer in making implementation decisions and tradeoffs.