### Table 1 shows the number of parallel multiply/accumulate operations that fit into the FPGA on Riley for various operand precisions. Both fixed and arbitrary multipliers are shown.

1997

"... In PAGE 6: ... Table 1 - No of taps in FPGA for filters. To realise the extra performance obtainable by parallelism in the FPGA, the system architecture must support dataflows that ensure the multipliers/accumulators are kept supplied with data. The location of data in the system is key to partitioning.... ..."

Cited by 1
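
As a rough illustration of why each filter tap corresponds to one multiply/accumulate operation, the following Python sketch computes a direct-form FIR filter as an explicit loop of MACs. The function name `fir_filter` and the example coefficients are illustrative assumptions, not taken from the cited paper; on an FPGA the inner loop would typically be unrolled into parallel hardware multipliers rather than run sequentially.

```python
def fir_filter(coeffs, samples):
    """Direct-form FIR: y[n] = sum over k of coeffs[k] * samples[n - k].

    Each tap (one entry of coeffs) costs one multiply/accumulate per
    output sample. Samples before the start of the input are treated as 0.
    """
    outputs = []
    for n in range(len(samples)):
        acc = 0
        for k, h in enumerate(coeffs):
            if n - k >= 0:
                acc += h * samples[n - k]  # one multiply/accumulate per tap
        outputs.append(acc)
    return outputs


if __name__ == "__main__":
    # Feeding a unit impulse returns the coefficients themselves.
    print(fir_filter([1, 2, 3], [1, 0, 0, 0]))
```

With N taps, every output sample needs N multiply/accumulate operations, so the number of MACs that fit in the device bounds how many taps can be evaluated in parallel.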

### Table 6: Divide/square root performance of multiply-accumulate implementations

in An Area/Performance Comparison of Subtractive and Multiplicative Divide/Square Root Implementations

1995

"... In PAGE 7: ... Table 6: Divide/square root performance of multiply-accumulate implementations. The IBM RS/6000 series uses unique algorithms for the Newton-Raphson iterations to accommodate the structure of the multiply-accumulate unit. Division and square root latencies for the 8-bit seed Newton-Raphson implementation in Table 6 are identical to the actual processor. The 16-bit seed Newton-Raphson figures are obtained from estimates based on available information about the division and square root algorithms [9].... ..."
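
To make the role of the seed concrete, here is a minimal sketch of the textbook Newton-Raphson reciprocal iteration, written so that each step has the `a * b + c` shape of a multiply-accumulate. This is not the RS/6000 algorithm, which the snippet describes as unique to that processor; the function names, the truncated-reciprocal seed (standing in for a lookup table), and the restriction of the divisor to [0.5, 1) are all assumptions made for illustration.

```python
import math


def seed_reciprocal(d, seed_bits):
    """Hypothetical lookup-table seed: 1/d truncated to seed_bits fractional bits."""
    scale = 1 << seed_bits
    return math.floor(scale / d) / scale


def newton_reciprocal(d, seed_bits, iterations):
    """Approximate 1/d for a normalized divisor d in [0.5, 1)."""
    if not 0.5 <= d < 1.0:
        raise ValueError("this sketch assumes d is normalized to [0.5, 1)")
    x = seed_reciprocal(d, seed_bits)
    for _ in range(iterations):
        e = 1.0 - d * x  # residual, shaped like a multiply-accumulate
        x = x + x * e    # correction, shaped like a multiply-accumulate
    return x


if __name__ == "__main__":
    d = 0.75
    for bits, steps in ((8, 3), (16, 2)):
        approx = newton_reciprocal(d, bits, steps)
        print(bits, steps, abs(approx - 1.0 / d))
```

Each iteration roughly squares the relative error, so a 16-bit seed reaches a given precision in one fewer iteration than an 8-bit seed. Python evaluates the multiply and the add as separate rounded operations, so this sketch only mirrors the dataflow of a hardware multiply-accumulate unit, not its rounding behaviour.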

### Table 14: Divide/square root performance of multiply-accumulate implementations

"... In PAGE 37: ...2), due to the structure of the multiply-accumulate unit and the method used to resolve last-digit accuracy. Divide and square root latencies for the 8-bit seed Newton-Raphson implementation in Table 14 are identical to the actual processor. The 16-bit seed Newton-Raphson figures are obtained from estimates based on available information about the division... ..."

### Table 5: Improvement in execution time [%] on the Givens rotation benchmark, by implementation, for the multiply-accumulate configuration

"... In PAGE 8: ... Not only does this make for a more uniform comparison, but seems like a feasible implementation since the RS/6000 actually has a longer cycle time than the PA7200 while using a comparable fabrication technology.

Table 4: Divide/square root performance of multiply-accumulate implementations

| Algorithm | Divide latency | Square root latency |
|---|---|---|
| 8-bit seed Newton-Raphson | 19 | 22 |
| 16-bit seed Newton-Raphson | 14 | 17 |
| radix-4 SRT | 15 | 15 |
| radix-16 SRT | 8 | 8 |

The results, shown in Table 5, display the same pattern as for the independent add-multiply case, albeit with even greater contrast. The longer latencies of the Newton-Raphson operations mean that the subtractive implementations perform even better by comparison.... ..."

### TABLE II Computational complexity of various MSSNR implementations. MACs are real multiply-and-accumulates and adds are real additions (or subtractions). brute force Wu, et al. [8]

### Table 2: Complexity of a fixed multiplier with digit-size four and coefficient 1839 (excluding CPA stage). No pre-accumulation is used.

"... In PAGE 3: ... Again, note that the registers includes the pipeline registers. A fixed coefficient multiplier with coefficient 1839 and digit-size four was also designed and the metrics of the different implementations are shown in Table 2. No pre-accumulation could be used as the number of partial products for any significance level was smaller than three.... ..."

### Table 4: Divide/square root performance of multiply-accumulate implementations

### TABLE II PRIMARY FEATURES OF A FOUR-BIT MULTIPLIER ACCUMULATOR

1997

Cited by 1

### Table 7 Implementation results for the different FIR filters

"... In PAGE 54: ... With this setup, the speed at which the filters break is not limited by the multipliers. Table 7 reports the maximum speeds at which each of the filter implementations was able to run. The implementations using the constant multipliers have a better clock speed ... ..."

### Table 1. Synthesis results comparing average power consumed in a multiplier in folding 65-tap, 129-tap bandpass FIR filters on to an architecture with a given number of multipliers and adders. Several scenarios are compared which include folding beginning from a direct-form filter DFG and folding from a transpose filter DFG with varying unfolding factors to uncover common data-operands. Coefficient reordering is also resorted to in all the cases in an attempt to further reduce power consumption in the multipliers.

2000

"... In PAGE 4: ... The multiplier used in our simulations was a 16×16 Booth-recoded Wallace-Tree multiplier. The results of our simulation are tabulated in Table 1. Table 1 presents a comparison of average power consumed in a multiplier for folding from a transpose FIR filter DFG with various unfolding factors for a given number of hardware multipliers and adders.... ..."

Cited by 2