### Table 1. Scheduling results after applying the various resource utilization strategies for pred2 and pred0 1 from the MPEG-1 Prediction block and tiler from the GIMP image processing tool

in Dynamic Conditional Branch Balancing during the High-Level Synthesis of Control-Intensive Designs

2003

"... In PAGE 5: ...han 5 user seconds on a 1.6 GHz PC running Linux. For all the experiments we have used a priority-based list scheduler [16]. The scheduling results for these three functions are presented in Table 1, in terms of the number of states in the finite state machine controller and the cycles on the longest path through the design. The longest path through a conditional is the longer of the two branches and for loops, it's the length of the loop body multiplied by loop iterations.... In PAGE 5: ... All resources are single cycle except the multiplier (2 cycles) and the divider (4 cycles). The first row in Table 1 lists the results for the baseline case, i.e.... ..."

Cited by 4
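The longest-path rule quoted in this excerpt (a conditional contributes its longer branch; a loop contributes its body length multiplied by the iteration count) can be sketched recursively. This is an illustrative model only; the node structure and field names below are hypothetical, not taken from the paper.

```python
# Hypothetical sketch of the longest-path rule from the excerpt:
# a conditional's longest path is its longer branch; a loop's is
# body length * iteration count. Node layout is illustrative.

def longest_path(node):
    """Return cycles on the longest path through a schedule node."""
    kind = node["kind"]
    if kind == "op":
        return node["cycles"]
    if kind == "seq":
        return sum(longest_path(c) for c in node["body"])
    if kind == "if":
        return max(longest_path(node["then"]), longest_path(node["else"]))
    if kind == "loop":
        return node["iterations"] * longest_path(node["body"])
    raise ValueError(f"unknown node kind: {kind}")

design = {"kind": "seq", "body": [
    {"kind": "op", "cycles": 1},
    {"kind": "if",
     "then": {"kind": "op", "cycles": 2},   # e.g. a 2-cycle multiply
     "else": {"kind": "op", "cycles": 4}},  # e.g. a 4-cycle divide
    {"kind": "loop", "iterations": 3,
     "body": {"kind": "op", "cycles": 2}},
]}

print(longest_path(design))  # 1 + max(2, 4) + 3*2 = 11
```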


### Table 1. Summary of Tasks Parallelized

1999

"... In PAGE 4: ... In order to select the number of processors one would reasonably want to engage, mprocs, and to estimate the potential increase in efficiency, it is important to know which tasks are parallelized in a given ITOUGH2-PVM application. Table 1 lists the various analyses performed by ITOUGH2 and the minimization algorithms available, and indicates which forward simulations can be run in parallel, and which ones must be executed sequentially.... In PAGE 5: ... For example, if mprocs=n=8 and 7 processors of equivalent speed and work load are available, it is reasonable to select only nprocs=4 to avoid 6 processors being idle for 50% of the time during the calculation of the Jacobian. Similar restrictions apply to most algorithms listed in Table 1 (Finsterle, 1998), with the notable exception of grid search and Monte Carlo simulations, in which no waiting times exist, i.e.... In PAGE 6: ...e., mprocs (Table 1), the general characteristics of the minimization algorithm, and the number of hosts available, are known at the time of the run. Others, especially the workload on the computers in the cluster, are difficult to predict.... In PAGE 7: ... (1997) and Finsterle (1999c). The parameter estimation problem is solved using the Levenberg-Marquardt algorithm, limiting the number of child processes to mprocs=n=6 (see Table 1). Five iterations are performed for this benchmark example, requiring a total of 36 forward simulations, namely an initial run, 5 evaluations of the Jacobian matrix at 6 runs each, plus 5 runs to test whether the Levenberg-Marquardt step was successful (no unsuccessful steps are performed in the first 5 iterations).... In PAGE 7: ... According to Table 1, 6 of the 36 runs must be executed in sequence, and 30 can be performed in parallel. The parallel runs are expected to require about the same time as spent by 30/nprocs forward runs.... ..."
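The run counts quoted in this excerpt can be checked with a small back-of-envelope script. Assuming every forward simulation takes the same wall-clock time t is an idealization, and `expected_time` is a hypothetical helper for illustration, not part of ITOUGH2.

```python
# Back-of-envelope check of the run counts quoted above, assuming every
# forward simulation costs the same wall-clock time t (an idealization).
import math

n = 6           # parameters -> mprocs = n child processes per Jacobian
iterations = 5

serial_runs = 1 + iterations      # initial run + one step-test run per iteration
parallel_runs = iterations * n    # Jacobian evaluations, n runs each
total = serial_runs + parallel_runs
print(total, serial_runs, parallel_runs)  # 36 6 30

def expected_time(nprocs, t=1.0):
    # Serial runs execute in sequence; the parallel runs take roughly
    # ceil(30 / nprocs) forward-run times, matching the excerpt's estimate.
    return serial_runs * t + math.ceil(parallel_runs / nprocs) * t

print(expected_time(6))  # 6 + 5 = 11 forward-run times
print(expected_time(3))  # 6 + 10 = 16
```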

### Table 2. Parallel computation efficiency

in Fabled Boundary Condition Forecast Algorithm in Multigrid Domain Decomposition Parallel Computing

"... In PAGE 5: ...tep of 2nd grid is h = 0.5/N. The step of 3rd grid is h = 0.25/N etc. Table 2 shows the parallel computation efficiency data of model 1 and model 3. When N > 2, the efficiency of model 3 is 12.... ..."

### Table 2. Efficiency measures.

1997

"... In PAGE 8: ... Program speed-up. Table 2 shows the efficiency measures of the parallel model implementation using different grid sizes on 2, 4, 8, 16, and 32 PEs. These measures show how each single processor is efficiently used during the execution of the... ..."

Cited by 9

### Table 1. Overview of each cluster system used to test the efficiency and performance gains of the Parallel-Horus framework.

"... In PAGE 5: ...cluster systems in Europe and one in Australia. Table 1 provides the specific characteristics of each cluster. More recently, we ran these applications at an even larger scale, concurrently using more than 20 cluster systems located on three different continents (see http://www.... ..."

### Table 2 shows the speedup and the efficiency of the parallel algorithm

"... In PAGE 7: ...

| Nr. | 21164/500/256x256 Speedup (Efficiency) | 21264/500/256x256 Speedup (Efficiency) | 21164/500/1000x1000 Speedup (Efficiency) | 21264/500/1000x1000 Speedup (Efficiency) |
|-----|----------------------------------------|----------------------------------------|------------------------------------------|------------------------------------------|
| 4 | 1,57817 (0,78909) | 1,06446 (0,53223) | 2,03657 (1,01828) | 1,28538 (0,64269) |
| 5 | 2,56767 (1,28383) | 1,73186 (0,86593) | 3,35237 (1,67618) | 2,11585 (1,05793) |
| 6 | 2,34550 (0,78183) | 1,58201 (0,52734) | 4,54209 (1,51403) | 2,86675 (0,95558) |
| 7 | 2,36571 (0,78857) | 1,59564 (0,53188) | 4,63311 (1,54437) | 2,92420 (0,97473) |
| 8 | 2,82883 (0,70721) | 1,90801 (0,47700) | 3,35146 (0,83787) | 2,11528 (0,52882) |
| 9 | 3,12352 (0,62470) | 2,10678 (0,42136) | 4,00895 (0,80179) | 2,53025 (0,50605) |

Table 2: Average speedup and efficiency values in the PVM environment

Figure 7: Average run times for the smoothing of the 256x256 (left) and the 1000x1000 (right) image. (Two charts of Time [s] versus Processing Elements; the plotted values are not recoverable from the text.) ... ..."
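The quantities tabulated in results like the one above follow the standard definitions: speedup S(p) = T_serial / T_parallel(p) and efficiency E(p) = S(p) / p. The sketch below uses made-up timings for illustration and is not tied to the paper's process counts or measurements.

```python
# Standard speedup/efficiency definitions behind tables like the one above.
# The timings below are illustrative, not measurements from the paper.

def speedup(t_serial, t_parallel):
    """S(p) = T_serial / T_parallel(p)."""
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, p):
    """E(p) = S(p) / p, i.e. average useful utilization per processor."""
    return speedup(t_serial, t_parallel) / p

# Hypothetical example: a 100 s serial run finishing in 30 s on 4 PEs.
s = speedup(100.0, 30.0)        # ~3.33
e = efficiency(100.0, 30.0, 4)  # ~0.83
print(round(s, 2), round(e, 2))
```

Efficiency above 1.0 (as in some entries of the table) usually indicates superlinear speedup, commonly attributed to cache effects when the per-processor working set shrinks.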

### Table 8: Performance Result in Parallel Run-Time (seconds) on a 3 × 20 Process Grid

in A Poly-Algorithm for Parallel Dense Matrix Multiplication on Two-Dimensional Process Grid Topologies

1993

"... In PAGE 48: ... Because in the small process count case, the broadcast is less expensive and thus the MM algorithms perform well. Table 8 lists the performance result on a 3 × 20 process grid (the parallel run time of the fastest algorithm in each test case is highlighted using bold letters). Although the total number of processes in this test case is almost the same as that of the 7 × 9 grid case, the ratio of grid shapes is quite different.... ..."

### Table 6: Performance Result in Parallel Run-Time (seconds) on a 4 × 5 Process Grid

in A Poly-Algorithm for Parallel Dense Matrix Multiplication on Two-Dimensional Process Grid Topologies

1993

"... In PAGE 48: ... The case 5 is a symmetric case of the case 1 with M = 4000 and N = 500. Table 6 lists the performance result on a 4 × 5 process grid (the parallel run time of the fastest algorithm in each test case is highlighted using bold letters). Because P < Q, we expect the column versions of the MM algorithm are better than the row versions.... ..."