### Table 3. Performance of FPGA-based designs for 256-FFT and 1000-FFT

"... In PAGE 8: ... of the problem size. The required memory is O(N). Our design can solve larger problems (with reduced throughput) with fixed hardware. In Figure 3 and in Table 3 the execution times of various designs for 256-FFT are shown. The input samples are 16-bit data for all the designs. ... In PAGE 9: ... (from [12]) In Table 3 the performance of five FPGA-based implementations is shown. Three of them are from "The Fastest FFT in the West" [12]. ... In PAGE 9: ... Even though it computes a 1000-point FFT, the performance is not attractive compared with our approach. Table 3 also shows the area requirements of these implementations. Our design is faster than the earlier FPGA-based implementations. ... In PAGE 9: ... The implementations in Table 3 are designs optimized for a particular problem size and device features and need to be redesigned for larger problems. Our design can also handle larger problems with the same fixed hardware by increasing the memory. ... ..."
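The excerpt describes a hardware design, but the property it claims — O(N) working storage and one fixed datapath that handles any power-of-two size — is the same one an iterative in-place radix-2 Cooley-Tukey FFT has in software. As a hedged illustration (not the paper's architecture), a sketch:

```python
import cmath

def fft_inplace(a):
    """Iterative radix-2 Cooley-Tukey FFT, in place.
    O(N) storage; the same code handles any power-of-two N,
    analogous to fixed hardware scaling only its memory."""
    n = len(a)
    assert n and n & (n - 1) == 0, "length must be a power of two"
    # Bit-reversal permutation so butterflies can run in place.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # log2(N) butterfly stages of increasing span.
    length = 2
    while length <= n:
        w_len = cmath.exp(-2j * cmath.pi / length)
        for start in range(0, n, length):
            w = 1.0
            for k in range(start, start + length // 2):
                u = a[k]
                v = a[k + length // 2] * w
                a[k] = u + v
                a[k + length // 2] = u - v
                w *= w_len
        length <<= 1
    return a
```

A larger transform costs more passes through the same loop — the software analogue of the reduced throughput the snippet mentions.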

### Table 3 Comparisons of Hardware Requirements for 256-point FFT with Different Architectures

"... In PAGE 8: ... Table 3 shows the comparisons of hardware requirements for supporting 256-point FFT with different FFT architectures. Our proposed architecture needs 1 complex multiplier and 16 complex adders for the 256-point FFT computations in WiMAX systems. ... ..."
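Getting by with a single complex multiplier is possible because each radix-2 butterfly needs exactly one complex multiply and two complex adds, so one multiplier can be time-shared across the whole transform. A hedged back-of-envelope count (not taken from the paper) of what that multiplier must cover:

```python
import math

def radix2_op_counts(n):
    """Complex-operation counts for an n-point radix-2 FFT.
    There are (n/2)*log2(n) butterflies; each uses 1 complex
    multiply and 2 complex adds (trivial twiddles not discounted)."""
    stages = int(math.log2(n))
    butterflies = (n // 2) * stages
    return {"multiplies": butterflies, "adds": 2 * butterflies}
```

For n = 256 this gives 1024 complex multiplies, so an architecture with one multiplier needs at least that many multiply cycles per transform, trading time for area.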

### Table 1. Synthesis area results for a 2048-point, 10-bit input and 12-bit output FFT processor.

### Table 1. FFT Power Comparison

"... In PAGE 4: ... The power consumption of the fabricated FFT-4 and completed implementation of the FFT-16 has been used to estimate the overall power efficiency of a 1024-point FFT. These results are shown in comparison with other FFT designs in Table 1. In addition to these reductions in power consumption, we achieved a remarkably high sustained throughput as can be observed in Table 2. ... ..."

### Table 1: Pruning experiments

1998

"... In PAGE 3: ... Dynamic reordering methods are implemented by synthesis operations, as internal BDD cost functions do not apply. In a first set of experiments (see Table 1) we show the effect of pruning and heuristic minimization under the variable ordering given by the PLAs. The results obtained by re-implementing the algorithm for MO PSDKRO minimization from [3] are marked MO. ... In PAGE 4: ... In general better results are obtained from the time-consuming sifting method R2. The results show significant improvements to many previously reported results MO in Table 1. Columns SO show the results from Single Output (SO) minimization (corresponding to OS nodes at top, Section 5), using the PLA variable orderings. ... ..."

Cited by 2


### Table 1. Scheduling for a 256-point FFT. Num. Cycles is the size of the schedule required for implementing the transfer to the next stage.

### Table 7: Results of computational benchmark for the mesh-spectral 2D FFT application, running on a single node of the IBM SP2 using Fortran. Grid size is 800 by 800 points. Times are in milliseconds.

1996

"... In PAGE 24: ... The computational benchmark measures values for the following times: Toverhead (start and terminate process), Treadfft (set up grid and read input data), Tinit (initialize for FFT), Trowfft (perform FFTs on rows), Tcolfft (perform FFTs on columns), and Twritefft (write output data). Results are given in Table 7. Observe that results for this benchmark are independent of the choice of archetype implementation. ... ..."
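The row-then-column phases the benchmark times are the standard row-column decomposition of a 2-D transform. A minimal sketch of that structure (using a naive 1-D DFT as a stand-in for the benchmark's FFT routine, so the names here are illustrative, not the archetype's API):

```python
import cmath

def dft(x):
    """Naive 1-D DFT, O(n^2); stands in for the per-row/column FFT."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def fft2d(grid):
    """Row-column 2-D transform: 1-D transforms on every row
    (the Trowfft phase), then on every column (the Tcolfft phase)."""
    rows = [dft(r) for r in grid]            # transform each row
    cols = [dft(list(c)) for c in zip(*rows)]  # transpose, transform columns
    return [list(r) for r in zip(*cols)]     # transpose back
```

The two phases are independent passes over the grid, which is why the benchmark can time them separately.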

Cited by 3

### Table 1: Direct convolution timings with a 128×128 workspace

"... imagine more complex environments where the gap in the running times of the FFT-based and the linear algorithm will be reduced or even inverted. We emphasize that we ran our experiments with algorithms implemented in software on a conventional single-processor architecture. A hardware or parallel implementation of the FFT algorithm would certainly lower the 'break-even' complexity where the FFT-based algorithm becomes preferable to the linear algorithm."

"... In PAGE 13: ... The direct algorithm augments the workspace bitmap and pads it with zeros to avoid problems at boundary configurations when the convolution is computed. Table 1 summarizes experimental results for a 128×128 workspace. The FFT-based algorithm computes the bitmap CSPACE in approximately 90 seconds. ... In PAGE 13: ... The FFT-based algorithm computes the bitmap CSPACE in approximately 90 seconds. Column 2 of Table 1 shows the time required ... ..."
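The two algorithms being timed are the classic pair: direct convolution versus convolution through the FFT, where zero padding (as the snippet notes) prevents wrap-around at the boundaries. A 1-D sketch of both, under the assumption of power-of-two padding (the paper's 2-D bitmap version follows the same pattern):

```python
import cmath

def fft(a):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return a[:]
    even, odd = fft(a[0::2]), fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def ifft(a):
    """Inverse FFT via conjugation."""
    n = len(a)
    return [x.conjugate() / n for x in fft([v.conjugate() for v in a])]

def conv_direct(f, g):
    """Direct linear convolution: O(len(f) * len(g))."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj
    return out

def conv_fft(f, g):
    """FFT-based convolution: zero-pad both inputs to a power of two
    >= len(f)+len(g)-1 so circular convolution equals linear."""
    m = len(f) + len(g) - 1
    n = 1
    while n < m:
        n <<= 1
    F = fft([complex(x) for x in f] + [0j] * (n - len(f)))
    G = fft([complex(x) for x in g] + [0j] * (n - len(g)))
    return [x.real for x in ifft([a * b for a, b in zip(F, G)])[:m]]
```

For sparse or small kernels the direct loop wins; as the kernel grows, the O(n log n) FFT route takes over — the 'break-even' complexity the caption discusses.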

### Table 5. Resources usage of 256-point FFT implementations on Virtex II FPGAs

"... In PAGE 6: ... We chose three types of 256-point FFT implementations with 16-bit input on Virtex II for comparison. The resources usage and performance of these implementations in a Virtex-II FPGA are summarized in Table 5. Consider the fastest pipelined implementation in Table 5 as an example. ... ..."