## Division/Square-Root Using Comparison Multiples

### BibTeX

@MISC{Nikmehr_division/square-rootusing,
    author = {Hooman Nikmehr},
    title = {Division/Square-Root Using Comparison Multiples},
    year = {}
}

### Abstract

A new implementation of minimally redundant radix-4 floating-point SRT division/square-root (division/sqrt), with the recurrence in the signed-digit format, is introduced. The implementation is based on the comparison multiples idea. In the proposed approach, the magnitude of the quotient (root) digit is calculated by comparing the truncated partial remainder with two limited-precision multiples of the divisor (partial root). The digit sign is determined from the polarity of the truncated partial remainder. A timing evaluation using logic synthesis (Synopsys DC with the Artisan 0.18 µm typical library) shows a latency of 2.5 ns for the recurrence of the proposed division/sqrt. This is less than that of the conventional implementation.
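
The digit selection described in the abstract can be illustrated with a small software model. The following is a minimal sketch, not the paper's circuit: it uses exact rational arithmetic instead of a truncated carry-save remainder, and the two comparison multiples `m1 = d/2` and `m2 = 3d/2` are assumed thresholds chosen here because they keep the remainder within the radix-4 bound |w| <= (2/3)d. The function names are hypothetical.

```python
from fractions import Fraction


def select_digit(y, d):
    """Pick a digit in {-2, ..., 2} for the shifted remainder y = 4*w.

    The magnitude comes from two comparisons against multiples of d;
    the sign comes from the polarity of y.
    """
    m1 = d / 2      # assumed lower comparison multiple (not from the paper)
    m2 = 3 * d / 2  # assumed upper comparison multiple (not from the paper)
    magnitude = int(abs(y) >= m1) + int(abs(y) >= m2)
    return -magnitude if y < 0 else magnitude


def srt4_divide(x, d, steps):
    """Model the recurrence w[j+1] = 4*w[j] - q[j+1]*d for x, d in [0.5, 1)."""
    d = Fraction(d)
    w = Fraction(x) / 4  # initial scaling keeps |w[0]| <= (2/3)*d
    quotient = Fraction(0)
    digits = []
    for j in range(1, steps + 1):
        y = 4 * w                  # shifted partial remainder
        q = select_digit(y, d)     # signed quotient digit
        w = y - q * d              # next partial remainder
        quotient += Fraction(q, 4 ** j)
        digits.append(q)
    # Undo the initial division of x by 4.
    return 4 * quotient, digits, w
```

With these thresholds, after n steps the returned quotient differs from x/d by at most (8/3)·4^(-n). A hardware implementation would compare only a few leading bits of the remainder and of the multiples, which this exact model does not attempt to reproduce.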

### Citations

239 |
Computer Arithmetic Algorithms and Hardware Designs
- Parhami
- 2000
Citation Context ...d. In the recurrence (1), the dividend x and the divisor d are two normalised binary numbers in the range [0.5, 1). Also, $q_{j+1}$ represents the quotient digit in the signed-digit (SD) redundant format =-=[12]-=- selected from the minimally redundant radix-4 digit set $\{\bar{2}, \bar{1}, 0, 1, 2\}$, where $\bar{m} = -m$. (2) In (1), the next partial remainder (PR) $w[j+1]$ is represented in carry-save (CS) redundant format [12].... |

175 |
Logical Effort: Designing Fast CMOS Circuits
- Sutherland, Sproull, et al.
- 1999
Citation Context ...cient functioning modules with higher concurrency among them, a quicker circuit for radix-4 FP SRT division/sqrt is obtained. The time delay estimations carried out using the method of logical effort =-=[5]-=- and the logic synthesis show a considerable decrease in the execution time with respect to conventional implementations. 2 BACKGROUND Some surveys [6, 2] show that most VLSI implementations of FP divi... |

108 |
Division and Square Root: Digit Recurrence Algorithms and Implementations
- Ercegovac, Lang
- 1994
Citation Context ...t digit selection (QDS) function [1], which is traditionally implemented using the lookup table method. In this method, the QDS function is realised in a table totally implemented with a PLA or a ROM =-=[10]-=-. Since the QDS function is the main part of the critical path, any improvement in this circuit effectively decreases the delay of FP SRT division. Ercegovac and Lang [10] cover almost all issues in d... |

65 | Division Algorithms and Implementations
- Oberman, Flynn
- 1997
Citation Context ...omputations, almost all recent microprocessors and digital signal processors, perform in hardware all four fundamental arithmetic operations, namely addition, subtraction, multiplication and division =-=[1]-=-. Studying the processors’ architectures and implementations reveals that of the four operations, division is not performed as fast as addition, subtraction and multiplication [2]. In 1994, Intel lost... |

51 |
A New Class of Digital Division Methods
- Robertson
- 1958
Citation Context ...ws that most VLSI implementations of FP division are based on digit recurrence division algorithms known as SRT (SRT division algorithm was introduced independently by D. Sweeney [7], J. E. Robertson =-=[8]-=- and T. D. Tocher [9]). SRT division is an iterative algorithm with linear convergence toward the quotient. In this algorithm, the quotient digit selection (QDS) function calculates a fix Braden Phill... |

31 |
Techniques of multiplication and division for automatic binary computers
- Tocher
- 1958
Citation Context ...lementations of FP division are based on digit recurrence division algorithms known as SRT (SRT division algorithm was introduced independently by D. Sweeney [7], J. E. Robertson [8] and T. D. Tocher =-=[9]-=-). SRT division is an iterative algorithm with linear convergence toward the quotient. In this algorithm, the quotient digit selection (QDS) function calculates a fix Braden Phillips and Cheng-Chew Li... |

24 | Bit-level analysis of an SRT divider circuit
- Bryant
Citation Context ...ion is not performed as fast as addition, subtraction and multiplication [2]. In 1994, Intel lost $475 Million due to an error in the division part of the Pentium microprocessor’s floating-point unit =-=[3, 4]-=-. This fiasco highlights that the algorithms, architectures and realisations proposed for division are still immature, requiring more investigation and attention especially when designing modern high ... |

21 |
A Fast VLSI Adder Architecture
- Srinivas, Parhi
- 1992
Citation Context ...mber to 2’s complement format and check the most significant bit (sign bit). The identity between BSD to binary (2’s complement) conversion and the binary addition processes is now clearly understood =-=[22,23,24,25]-=-. Consider binary subtraction Z = X − Y, where X and Y are two n-bit 2’s complement numbers. Using the BSD representation definition, the composite number (X − Y) can be interpreted as a BSD number, w... |

17 |
A New Carry-Free Division Algorithm and its Application to a Single Chip 1024-b RSA Processor
- Vandemeulebroecke, Vanzieledhem
- 1990
Citation Context ...mber to 2’s complement format and check the most significant bit (sign bit). The identity between BSD to binary (2’s complement) conversion and the binary addition processes is now clearly understood =-=[22,23,24,25]-=-. Consider binary subtraction Z = X − Y, where X and Y are two n-bit 2’s complement numbers. Using the BSD representation definition, the composite number (X − Y) can be interpreted as a BSD number, w... |

15 |
Choices of Operand Truncation in the SRT Division Algorithm
- Burgess, Williams
- 1995
Citation Context ...a CS adder. The critical path passes through the black components. After the well-publicised Pentium FDIV bug in 1994 [3], considerable effort has been put into analysing the QDS function lookup table =-=[13]-=-, studying its implementation [14] and verification [3, 15]. In addition, developing alternative approaches to implementing FP SRT division has been another agenda [10]. Recently, to implement the QDS... |

14 |
Area and Performance Tradeoffs in Floating-Point Divide and Square-Root Implementations
- Soderquist, Leeser
- 1996
Citation Context ...ication and division [1]. Studying the processors’ architectures and implementations reveals that of the four operations, division is not performed as fast as addition, subtraction and multiplication =-=[2]-=-. In 1994, Intel lost $475 Million due to an error in the division part of the Pentium microprocessor’s floating-point unit [3, 4]. This fiasco highlights that the algorithms, architectures and realis... |

14 | Fast Low-Energy VLSI Binary Addition
- Parhi
- 1997
Citation Context ...mber to 2’s complement format and check the most significant bit (sign bit). The identity between BSD to binary (2’s complement) conversion and the binary addition processes is now clearly understood =-=[22,23,24,25]-=-. Consider binary subtraction Z = X − Y, where X and Y are two n-bit 2’s complement numbers. Using the BSD representation definition, the composite number (X − Y) can be interpreted as a BSD number, w... |

11 | Modular verification of SRT division
- Rueß, Shankar, et al.
- 1999
Citation Context ...omponents. After the well-publicised Pentium FDIV bug in 1994 [3], considerable effort has been put into analysing the QDS function lookup table [13], studying its implementation [14] and verification =-=[3, 15]-=-. In addition, developing alternative approaches to implementing FP SRT division has been another agenda [10]. Recently, to implement the QDS function, a method using selection constants is widely emp... |

9 | The Design and Implementation of a High-Performance Floating-Point Divider
- Oberman, Quach, et al.
- 1994
Citation Context ...ons carried out using the method of logical effort [5] and the logic synthesis show a considerable decrease in the execution time with respect to conventional implementations. 2 BACKGROUND Some surveys =-=[6, 2]-=- show that most VLSI implementations of FP division are based on digit recurrence division algorithms known as SRT (SRT division algorithm was introduced independently by D. Sweeney [7], J. E. Robert... |

6 |
A tale of two numbers
- Moler
- 1995
Citation Context ...ion is not performed as fast as addition, subtraction and multiplication [2]. In 1994, Intel lost $475 Million due to an error in the division part of the Pentium microprocessor’s floating-point unit =-=[3, 4]-=-. This fiasco highlights that the algorithms, architectures and realisations proposed for division are still immature, requiring more investigation and attention especially when designing modern high ... |

6 |
754-1985 IEEE standard for binary floating-point arithmetic
- IEEE
- 1985
Citation Context ...[ j]} 4 can be calculated by a 6-bit binary adder. 6 COMBINED DIVISION AND SQRT The IEEE 754 standard requires the designers to implement both division and sqrt in the FP units of the microprocessors =-=[11]-=-. Given the ... [figure residue: block diagram with MUX, QDS* or RDS*, PR Sign Det and On-the-Fly Conversion blocks] ... |

5 |
High Speed Arithmetic in a Parallel Device
- Cocke, Sweeney
- 1957
Citation Context ...me surveys [6, 2] show that most VLSI implementations of FP division are based on digit recurrence division algorithms known as SRT (SRT division algorithm was introduced independently by D. Sweeney =-=[7]-=-, J. E. Robertson [8] and T. D. Tocher [9]). SRT division is an iterative algorithm with linear convergence toward the quotient. In this algorithm, the quotient digit selection (QDS) function calculat... |

5 |
A CMOS Floating-Point Multiplier
- Uya, Kaneko, et al.
- 1984
Citation Context |

4 |
Fast radix-4 retimed division with selection by comparisons
- Antelo, Lang, et al.
- 2002
Citation Context ...comparisons, we develop the following alternative, which has more flexibility and results in a simpler implementation. A similar discussion on the comparison multiples method is given by Antelo et al =-=[16]-=- as: An alternative implementation is based on comparisons of the residual estimate with truncated multiples of the divisor, however, this implementation is rarely used in practice because it requires... |

3 |
A new algorithm for division in hardware
- Kantabutra
- 1996
Citation Context ...ilation of the truncated redundant residual and comparison, so no advantage is obtained with respect to the implementation with selection constants. Despite these quotes, there are reports of radix-2 =-=[18]-=- and radix-8 [19] SRT division, based on the comparison multiples idea, that show relatively improved response time. Moreover, Jensen [20] reports that although a highly optimised divider implemented ... |

3 |
Radix-2 SRT division algorithm with simple quotient digit selection
- Burgess
- 1991
Citation Context ...he set {M1} 5 and {M2} 5, or the set ¬ {M1} 5 and ¬ {M2} 5, to the adders. The comparison results are two BSD numbers with up to 8 digits. However, investigation shows that no representation overflow =-=[21]-=- happens when calculating (12). Therefore, the results could be represented in 7 digits instead. This makes the size of ... Figure 5: An architecture for n-digit BSD ... |

3 |
Architectures for floating-point division
- Nikmehr
- 2005
Citation Context ...dix-4 FP SRT division. QDS ∗ refers to the QDS function without the PR sign detector. 3.8 Buffers Since the comparators, the comparison sign detectors and the coder are on the FP divider critical path =-=[28]-=-, one way to decrease the recurrence critical path delay might be to minimise the fan out of the circuit supplying the QDS function’s comparators with {4w[ j]} 5 and {Mk} 5. So, as shown in Figure 4, ... |

3 |
Module to Perform Multiplication, Division, and Square Root in Systolic Arrays for Matrix Computations
- Ercegovac, Lang
- 1991
Citation Context ...teration. number of similarities between the recurrences of these two, it is very normal to implement a combined circuit that performs both operations. Examples of such implementations can be found in =-=[29, 30]-=-. Comparing the specification of FP SRT division and sqrt reveals that to match the recurrences, it is sufficient to modify the sqrt PR such that $w_{\text{new}}[j] = \frac{w_{\text{old}}[j]}{2}$. (36) Having called new w[ j]... |

2 |
Design issues in radix-4 SRT square root and divide unit
- Burgess, Hinds
- 2001
Citation Context ... the signs are manipulated by a coder to obtain the correct value for q j+1. One recent implementation for the QDS function using comparators and selection constants is disclosed by Burgess and Hinds =-=[17]-=-. The QDS function is part of the divide/square root unit used in a vector processing chip called ARM VFPJ. As shown in Figure 3, to prevent carry propagation while comparing, w[ j] is represented in ... |

2 | Alternative Implementations of SRT Division and Square Root Algorithms
- Jensen
- 1998
Citation Context ...constants. Despite these quotes, there are reports of radix-2 [18] and radix-8 [19] SRT division, based on the comparison multiples idea, that show relatively improved response time. Moreover, Jensen =-=[20]-=- reports that although a highly optimised divider implemented using the conventional approach is just slightly faster than its un-optimised counterpart implemented using the comparison multiples idea,... |

2 |
High Performance VLSI Architecture for Division and Square Root
- McQuillan, McCanny, et al.
- 1991
Citation Context ...teration. number of similarities between the recurrences of these two, it is very normal to implement a combined circuit that performs both operations. Examples of such implementations can be found in =-=[29, 30]-=-. Comparing the specification of FP SRT division and sqrt reveals that to match the recurrences, it is sufficient to modify the sqrt PR such that $w_{\text{new}}[j] = \frac{w_{\text{old}}[j]}{2}$. (36) Having called new w[ j]... |

1 |
Measuring the Complexity of
- Oberman, Flynn
- 1995
Citation Context ...es through the black components. After the well-publicised Pentium FDIV bug in 1994 [3], considerable effort has been put into analysing the QDS function lookup table [13], studying its implementation =-=[14]-=- and verification [3, 15]. In addition, developing alternative approaches to implementing FP SRT division has been another agenda [10]. Recently, to implement the QDS function, a method using selectio... |

1 |
New Theory for High-Radix Division
- “A
- 1997
Citation Context ...uncated redundant residual and comparison, so no advantage is obtained with respect to the implementation with selection constants. Despite these quotes, there are reports of radix-2 [18] and radix-8 =-=[19]-=- SRT division, based on the comparison multiples idea, that show relatively improved response time. Moreover, Jensen [20] reports that although a highly optimised divider implemented using the convent... |

1 |
Computer Arithmetic: Principles
- Hwang
- 1979
Citation Context ...ore efficient implementation for the sign detectors. Figure 5 represents an architecture for such sign detectors. The architecture is derived from the fundamental definition of the binary subtraction =-=[26]-=-. According to this definition, when performing n-bit subtraction Z = X − Y, the n-th carry sent out from the subtractor can be recognised as the inverse of the sign of Z. This sign is equal to the po... |
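
The last citation context above states that, for an n-bit subtraction Z = X − Y, the n-th carry out of the subtractor is the inverse of the sign of Z. A minimal sketch of that property follows, assuming X and Y are treated as unsigned n-bit values and the subtraction is formed as X + NOT(Y) + 1; the function name is hypothetical and this is a software check, not the Figure 5 architecture.

```python
def sign_by_carry_out(x, y, n):
    """Return 1 if x - y is negative, else 0, using only the n-th carry.

    Assumes x and y are unsigned n-bit integers (an assumption of this
    sketch, chosen so the carry-out alone decides the sign).
    """
    mask = (1 << n) - 1
    if not (0 <= x <= mask and 0 <= y <= mask):
        raise ValueError("operands must fit in n bits")
    total = x + ((~y) & mask) + 1   # X + NOT(Y) + 1 equals 2**n + x - y
    carry_out = (total >> n) & 1    # 1 exactly when x >= y
    return 1 - carry_out            # sign of Z is the inverted carry
```

Because only the carry chain is needed, the difference itself never has to be produced, which is the reason a carry generator can serve as a sign detector.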