### Table 1: Optimized PCR reaction based on the datapath of Figure 1 and the dataflow graph of Figure 5.

2001

"... In PAGE 12: ...Table 1: Optimized PCR reaction based on the datapath of Figure 1 and the dataflow graph of Figure 5. The optimized PCR program of Table1 was easy to derive since there is only one mixer in the system.... In PAGE 12: ... The PCR program contains a total of 15 INPUT and MIX operations. From Table1 , we note that an upper bound on the processing time is 15 minutes. Each time slot is of length 0.... In PAGE 14: ... The optimum processing time is 9.6 minutes, 50% faster than the PCR program of Table1 . The optimized schedule is given below: Time (minutes) Operations Representation Path path1, C-E-J-H-G-D Path path2, C-E-J-H-F-D apos; Path path3, C-E-I-A Path path4, C-E-A apos; Path path5, D apos;-F-H-K Path path6, A apos;-F-H-K Path path7, A-G-F-D apos; Definition 0 Load partition map INPUT Tris-HCl I1 0.... ..."

Cited by 14

### Table 1. Execution Behavior Of Scheduled Dataflow

"... In PAGE 12: ...Table1... ..."

### Table 2: Technology Mapping results

"... In PAGE 8: ... The results show that the Boolean approach reduces the number of matching algorithm calls, nd smaller area circuits in better CPU time, and reduces the initial network graph because generic 2-input base function are used. Table2 presents a comparison between SIS and Land for the library 44-2.genlib, which is distributed with the SIS package.... ..."

### Table 10 Results for FDM2, graph partitioning.

"... In PAGE 17: ... Be- cause of the similarity with Nested Dissection, we expect the performance to be satisfactory. In Table10 we show the results obtained for matrix FDM2 with the standard PMETIS executable code with default partitioning parameters. Here p denotes the number of sub- domains, or graph partitions.... ..."

### Table 3: Discrepancies between hardware and dataflow-graph behaviors

2002

"... In PAGE 6: ... This differs from hardware, because the output is always updated when the state changes. Figure 6 and Table3 illustrate this problem for the case of an enabled unit-delay (register) block and a toggling enable signal. Note that the output in the dataflow-graph matches the output in the hardware only when the enable signal is high.... ..."

Cited by 3

### Table 3. Optimal distribution on a cluster of eight two way SMPs using graph partitioning

"... In PAGE 4: ...44 To understand the effectiveness of the assignment, it is now attempted to obtain an optimal task assignment on the same cluster using graph partitioning using a standard graph partitioning tool like Metis. Table3 shows the details of the distribution. For this distribution, each module is assumed to be represented by a vertex with a ... ..."

Cited by 1

### Table 2 Execution time for computing weighted graphs

"... In PAGE 7: ... The computation of the graph takes O(jV j log jV j) time [20]. Table2 shows the execution time for computing our extended SIG on surfel-based models with di erent sizes. The algorithm implemented with non-optimized C++ code has been executed on an Intel Pentium 4 2.... ..."

### TABLE 2. Steps for computing optimal partitions with Rmax = 4

1997

Cited by 2

### Table 6: Graph partitions of random graphs generated by cutting the hypercube and grid em-

"... In PAGE 16: ... We #0Cnd that the bisection widths for hypercube embeddings are about the same for all hyperplanes whereas for grid embeddings, the two partitions dividing the grid in half vertically and horizontally give the best partitions. Table6 shows how the Mob hypercube and grid embedding algorithms perform as graph- partitioning algorithms. The data for random graphs on the performance of the Mob graph- partitioning algorithm and the KL graph-partitioning algorithm is taken from our study of local search graph-partitioning heuristics in #5B19,21#5D.... In PAGE 17: ...Table6 by the percentage of all edges that cross the cut between A and B.We found that 16-to-1 grid and hypercube embeddings with our Mob-based heuristics produced bisection widths comparable to those for the Mob heuristic for graph-partitioning.... In PAGE 17: ... The performance of the Mob embedding algorithms interpreted as graph-partitioning algorithms is remarkable, considering that Mob is optimizing the #5Cwrong quot; cost function. While the data in Table6 cannot show conclusively how good the Mob embedding algorithms are, the existence of a better graph-embedding algorithm would also imply the existence of a better graph-partitioning algorithm. 3.... ..."

### Table 5: Synthesis based approach vs. Heuristic

8 Conclusion This paper presented a fast and efficient heuristic to optimize the throughput of a task graph that is partitioned across multiple FPGAs. The technique presented in this paper is also applicable in multi-way ASIC partitioning where there are fixed number of ASIC chips available with a fixed area bounds. The tasks are behavioral/algorithmic code segments and can be implemented on the FPGAs in several different ways. The heuristic efficiently uses the partition information and intelligently explores the design space of each task to select a suitable candidate. Implementations for tasks are chosen such that the throughput of the task graph is maximized, while at the same time honoring all area constraints posed by the FPGAs. We also presented an area estimation heuristic that computes the minimum area required for each partition segment. Experimental results illustrate that the heuristic is fast, can handle large task graphs with several design options, importantly the heuristic is effective and produces only 2-10% off-optimal throughput for various examples. The throughput optimization heuristic we presented is part of a behavioral partitioning framework for multi-FPGA architectures.

"... In PAGE 13: ... As designs get larger the ga run times become expensive. Table5 shows the results of comparison of the synthesis based approach for throughput optimization (Approach I, presented in Section 4) verses our heuristic. The table reports, for each example, the average throughput produced by both approaches for 1000 di erent executions.... ..."