### Table 16: Relative cost of different divide/square root implementations

"... In PAGE 39: ... Area and Performance of Specific Methods It is enlightening to consider the performance improvement of the individual square/root divide implementations, taking into account the area investment. Table16 reproduces the relative area estimates from Section 6 for easy reference. Table 17 shows the cumulative improvement of the benchmark execution time, across all configurations.... ..."

### Table 8: Relative cost of different divide/square root implementations

"... In PAGE 31: ... Table8 shows estimates of the relative areas of the four canonical implementations based on standard cell technology. Only the areas of circuitry devoted exclusively to divide/square root functionality are covered.... ..."

### Table 6: Divide/square root performance of multiply-accumulate implementations

in An Area/Performance Comparison of Subtractive and Multiplicative Divide/Square Root Implementations

1995

"... In PAGE 7: ...Table 6: Divide/square root performance of multiply- accumulate implementations The IBM RS/6000 series uses unique algorithms for the Newton-Raphson iterations to accommodate the structure of the multiply-accumulate unit. Division and square root latencies for the 8-bit seed Newton-Raphson implementa- tion in Table6 are identical to the actual processor. The 16-bit seed Newton-Raphson figures are obtained from es- timates based on available information about the division and square root algorithms [9].... ..."

### Table 10: Divide/square root performance of chained implementations

"... In PAGE 35: ... The particular example in this study is inspired by the Mips R4400 microprocessor [MWV92, Sim93]. carry sum multiply add/ round register file Operation Latency Throughput add 4 3 multiply 8 4 Figure 18: Chained add-multiply configuration The latencies of division and square root for the different implementation alternatives are given in Table10 . The third implementation, which is in boldface, is closest to the actual configuration of the Mips R4400.... ..."

### Table 7: Relative cost of different divide/square root implementations

"... In PAGE 9: ... An alternative approach uses standard-cell technology to estimate the areas of different implementations, includ- ing the more sophisticated options [14]. Table7 shows the results of this study, with the values normalized to the size of the 8-bit seed Goldschmidt variant. According to these results, a radix-16 SRT unit need only be 45% larger than a radix-4 design in the same technology.... ..."

### Table 1: Area comparison of two divide/square root implementations

in An Area/Performance Comparison of Subtractive and Multiplicative Divide/Square Root Implementations

1995

"... In PAGE 5: ... Nevertheless, we shall attempt to give some basis for comparing the different implementations. Table1 compares the size of the hard- ware required for division in the Weitek 3364 and Texas Instruments 8847 arithmetic coprocessors (not including shared components); the figures are based on measure- ments of microphotographs [8]. The chips have similar die sizes and device densities, and although the multiplication algorithms are different, both have two-pass arrays which take up approximately 22% of the chip area.... ..."

### Table 9: Area comparison of two divide/square root implementations

### Table 12: Divide/square root performance of independent implementations

"... In PAGE 37: ...Table12 . For radix-4 division, there is a simple but powerful optimization in effect.... ..."