### Table 10: Test matrix generated from a discretization on a 64 × 64 grid: Laplace's equation. Times shown in the table are in microseconds. The experiments are performed on the BBN TC2000. In the parallel distributed Cimmino solver the number of generated subsystems is, for numerical reasons, related to the structure of the problem and not to the number of available computing elements. Therefore, we implemented a scheduler that statically distributes tasks to the computing elements, trying to keep the workload balanced among the processing elements and to take advantage of the available interconnection networks. Part of our current research objective is to test the scheduler in a heterogeneous environment using 11

"... In PAGE 11: ... Finally, we developed an implementation where a single process performs the steps of the Block-CG, and only the matrix-matrix products that involve the iteration matrix are performed in parallel (Master-Slave: centralized). In Arioli, Drummond, Duff, and Ruiz (1994a), we present results obtained for the three implementations using PVM 3 on a BBN TC2000 computer (see Table 10) and a heterogeneous network of IBM RS6000 and SUN Sparc 10 workstations. Laplace Matrix 4096 x 4096 (Block size = 4, 171 iterations). Elapsed time of sequential version = 279142... ..."

### Table 2: Specifications of the Eleven Heterogeneous Computers

"... In PAGE 12: ... 5.2 Applications. A small heterogeneous local network of 11 different Solaris and Linux workstations, shown in Table 2, is used in the experiments. The network is based on 100 Mbit Ethernet with a switch enabling parallel communications between the computers.... In PAGE 15: ... 7. Determination of a set with relatively few points used to build the speed functions of the processors X2-X5, whose specifications are shown in Table 2. As few as 6 points and 5 points are used to build an efficient speed function for matrix multiplication and LU factorization, respectively, with a deviation of approximately 5% from other speed functions built with a larger number of points.... In PAGE 15: ... Though the absolute speed must be obtained by multiplication of two dense non-square matrices, we observed that our serial version gives almost the same speeds for multiplication of two dense square matrices if the number of elements in a dense non-square matrix is the same as the number of elements in a dense square matrix. This is illustrated in Table 3 for computers X2-X5, whose specifications are shown in Table 2. Thus speed functions of the processors built using dense square matrices will be the same as those built using dense non-square matrices.... In PAGE 17: ... However, allocating to these computers a task whose size is greater than 36000000 for matrix-matrix multiplication or 81000000 for LU factorization will result in severe performance degradation of the parallel application. For each of these two applications, the largest problem size that can be solved on the network of heterogeneous computers shown in Table 2 is just the sum of the largest sizes of the tasks that can be solved on each computer. There are three important issues in selecting a set of points to build a speed function of a processor: 1.... In PAGE 18: ... Speeds of the processors are assumed to be zero for problem sizes beyond their upper bounds. Speed functions for matrix multiplication were obtained using three sets of 6, 7, and 8 points, and speed functions for LU factorization using three sets of 5, 7, and 8 points, for the computers X2-X5 whose specifications are shown in Table 2. It can be seen that 6 points and 5 points are enough to build efficient speed functions that fall within acceptable limits of deviation for matrix multiplication and LU factorization, respectively.... ..."
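The approach of building a processor's speed function from only a handful of benchmark points can be sketched as a piecewise-linear interpolation with speed forced to zero beyond the processor's upper bound. The point values, function names, and upper bound below are hypothetical illustrations, not the measurements behind Table 2:

```python
from bisect import bisect_right

def make_speed_function(points, upper_bound):
    """Build a piecewise-linear speed function s(x) from a few measured
    (problem_size, speed) points, in the spirit of the paper's 5-6
    points per processor. Speed is taken as zero for problem sizes
    beyond the processor's upper bound (the paging region)."""
    pts = sorted(points)
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]

    def speed(x):
        if x > upper_bound:
            return 0.0                     # beyond the upper bound
        if x <= xs[0]:
            return ys[0]
        if x >= xs[-1]:
            return ys[-1]
        i = bisect_right(xs, x) - 1        # segment containing x
        t = (x - xs[i]) / (xs[i + 1] - xs[i])
        return ys[i] + t * (ys[i + 1] - ys[i])

    return speed

# Hypothetical 6-point speed function for matrix multiplication
# (problem size in elements, speed in MFLOPS)
s = make_speed_function(
    [(1e6, 250.0), (4e6, 240.0), (9e6, 230.0),
     (16e6, 210.0), (25e6, 180.0), (36e6, 120.0)],
    upper_bound=36e6)
```

With such a function per processor, a data-partitioning algorithm can query `s(x)` for any candidate task size instead of assuming a single constant speed.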

### Table 1: Application program performance by employing ATME. ATME is developed to deal with conditional task scheduling. Based on the ideas introduced in this paper, we develop an environment, named ATME, to automate conditional scheduling support. ATME adjusts the scheduling policy between different executions to improve program performance, by predicting the task model from models used in past executions. Experimental results indicate that application programs employing ATME to automate the scheduling issues can not only free the programmer from considering those operational issues, but also achieve good parallel execution times.

1997

"... In PAGE 4: ... We introduce the term AIR to measure the difference in execution efficiency between that achieved by ATME and the ideal one: AIR = (Ideal Exec. Time - Actual Exec. Time) / Ideal Exec. Time. A positive value for AIR represents ATME performing better than when the task attributes are precisely known prior to execution, while a negative AIR shows the opposite. Table 1 shows the execution time of application programs when employing ATME to automate task scheduling. The results illustrate the performance of ATME in different situations represented by the value of AvePMRatio.... ..."
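The AIR metric described in the excerpt is a simple relative difference; a minimal sketch (function and variable names are ours, not ATME's):

```python
def air(ideal_exec_time: float, actual_exec_time: float) -> float:
    """AIR = (Ideal Exec. Time - Actual Exec. Time) / Ideal Exec. Time.

    Positive AIR: the ATME-scheduled run beat the schedule built with
    precisely known task attributes; negative AIR: it fell short."""
    return (ideal_exec_time - actual_exec_time) / ideal_exec_time

# E.g. ideal 100 s, actual 110 s -> AIR = -0.1 (10% slower than ideal)
```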

Cited by 2

### Table 1. Summary of Representative Scheduling Algorithms and Their Features

in Open Issues and Challenges in Security-aware Real-Time Scheduling for Distributed Systems

"... In PAGE 12: ...3 to provide performance guarantees to a wide variety of applications while achieving high utilization of system resources shared by these applications executing in a dynamic heterogeneous computing environment. We summarize in Table 1 the most relevant scheduling algorithms described in the literature. It is noted from Table 1 that SAREH differs from the existing algorithms in that it is a closed-loop, dynamic, real-time, security-aware algorithm designed for heterogeneous distributed systems. 4.... ..."

### Table 3. Aggregate improvements in schedule length

2005

"... In PAGE 24: ... The quality criterion indicating improvement (decrease) in schedule length for each problem instance, when our placement-aware priority function is used compared to placement-unaware LPF, is: 100 x (T_longest_path - T_heu) / T_heu. Figure 10 shows that our placement-aware priority function consistently generates better schedules. Table 3 summarizes the results for 120 problem instances. Each entry in the table represents data from a set of instances.... In PAGE 24: ...86%. As is clear from Table 3, while a simple longest-path heuristic works reasonably well with small graphs and few columns, our heuristic clearly generates superior (shorter) schedules, increasingly so as the problem size grows. The key difference is that LPF also tries to improve schedule length by prefetch, but only after selecting the task to be scheduled, while our heuristic considers placement implications in task selection.... ..."
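The quality criterion quoted above is a plain relative-decrease percentage; as a sketch (names are illustrative, not the paper's):

```python
def schedule_improvement_pct(t_longest_path: float, t_heu: float) -> float:
    """Percentage decrease in schedule length when the placement-aware
    heuristic (t_heu) is compared against the longest-path-first
    schedule: 100 * (t_longest_path - t_heu) / t_heu."""
    return 100.0 * (t_longest_path - t_heu) / t_heu

# E.g. LPF schedule of 120 cycles vs placement-aware 100 cycles -> 20.0%
```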

Cited by 2

### Table 1. Performance improvements from scheduler optimizations.

"... In PAGE 7: ...2. Table 1 presents the performance results of these optimizations applied in different combinations. The column labeled convergent shows the results for Convergent scheduling, described in Section 5.... In PAGE 7: ...y 12.4% over critical path re-computation (CR). The locality-aware optimizations bias the placement of load instructions (CRBL) and instructions that write to the register file (CRBLO). Table 1... ..."

### Table 2: Data to estimate scheduling headroom for predicate-aware scheduler

2003

"... In PAGE 7: ... The gap between these lower and upper bounds constitutes the headroom for the predicate-aware scheduler. Table 2 shows an estimate of the predicate-aware scheduler headroom on acyclic (columns 2-7) and cyclic (columns 8-13) pa-ready regions. The data presented is averaged over all benchmarks.... In PAGE 8: ... Columns 8-13 show similar data for the cyclic regions; resource-constrained schedule length is defined by ResMII, and latency-constrained schedule length is defined by RecMII for cyclic regions. Relative to the pa-ready acyclic region schedule length on the baseline 4(6)-wide machine (with a cmpp latency of 1 cycle), we see from Table 2 that on average the critical path length for cmpp latencies of 1, 2 and 3 cycles respectively is 24%(10.... In PAGE 8: ... This explains the degradation in PALS speedup as we go from a 4-wide to a 6-wide baseline machine for fixed cmpp latency. In the case of cyclic regions, the PAMS lower bound is determined by the resource-constrained schedule length of the predicate-aware processor (ResMII, see Table 2). Latency-constrained schedule length (RecMII) is not a limiting factor for either 4-wide or 6-wide machine models.... In PAGE 8: ... Latency-constrained schedule length (RecMII) is not a limiting factor for either 4-wide or 6-wide machine models. As Table 2 shows, RecMII, even for a cmpp latency of 3, is much smaller than ResMII. Therefore, as columns 10-17 of Table 3 show, PAMS achieves substantial speedups for all cmpp latencies (18%, 16%, 18%).... In PAGE 9: ... For cmpp latencies of 2 and 3, most of the performance improvement comes from PAMS. The speedup achieved on the entire application is smaller than the speedup achieved on pa-ready cyclic regions alone, since, as Table 2 shows, these regions constitute on average only 36% of the total application baseline execution time. 5.... ..."
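ResMII and RecMII in the excerpt are the standard lower bounds on the initiation interval of a cyclic (software-pipelined) schedule; a generic sketch of how they combine, with made-up numbers, not the paper's machine model:

```python
from math import ceil

def res_mii(op_counts, resource_counts):
    """Resource-constrained lower bound (ResMII): for each resource
    class, the number of ops needing it divided by the units
    available, rounded up; the most contended resource binds."""
    return max(ceil(op_counts[r] / resource_counts[r]) for r in op_counts)

def ii_lower_bound(res, rec):
    """A cyclic schedule can start a new iteration no more often than
    every max(ResMII, RecMII) cycles."""
    return max(res, rec)

# Hypothetical region: 10 ALU ops on 4 ALUs, 6 memory ops on 2 units
res = res_mii({"alu": 10, "mem": 6}, {"alu": 4, "mem": 2})
print(ii_lower_bound(res, rec=2))  # prints 3; RecMII of 2 does not bind
```

This mirrors the excerpt's observation: when RecMII is much smaller than ResMII, the resource bound alone determines the achievable schedule length.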

Cited by 2

### Table 3. Factors, levels, and application makespan for WQRxx (levels shown for task heterogeneity 0% and 100%, by mean task size)

2007

"... In PAGE 16: ... The goal was to establish how the different factors influence the results. In Table 3, we can observe how each factor (at its extreme levels) affects the application makespan under WQRxx scheduling (the best scheduler on average). Table 3.... In PAGE 16: ... Equation 2 is the solution of the linear system derived from Table 3, where XG is the mean task size factor, XT is the task heterogeneity factor, XM is the machine heterogeneity factor, and the other terms are combinations of interactions among those factors: makespan = 105376898 + 37144934 XG + 858248.... ..."
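A linear model like Equation 2 can be derived from a two-level full-factorial table by the standard contrast method. Below is a sketch with factors coded -1/+1 and purely illustrative responses; the coefficient names follow the excerpt's XG/XT/XM factors but none of the values are the paper's:

```python
from itertools import product

def factorial_fit(makespans):
    """Fit makespan = b0 + bG*XG + bT*XT + bM*XM + all interaction
    terms from a 2^3 full-factorial design with factors coded -1/+1.
    `makespans` lists the 8 responses in the order produced by
    product((-1, 1), repeat=3) over (XG, XT, XM). Because the design
    matrix is orthogonal, each coefficient is simply the contrast
    (signed sum of responses) divided by the number of runs."""
    runs = list(product((-1, 1), repeat=3))
    names = ["b0", "bG", "bT", "bM", "bGT", "bGM", "bTM", "bGTM"]
    cols = [[1, g, t, m, g * t, g * m, t * m, g * t * m]
            for (g, t, m) in runs]
    return {name: sum(cols[i][j] * makespans[i] for i in range(8)) / 8
            for j, name in enumerate(names)}
```

Feeding it the 8 makespans measured at the extreme factor levels recovers the intercept, the three main effects, and the four interaction terms in one pass.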

Cited by 1

### Table 3. Average scheduling time per task

in A dynamic critical path algorithm for scheduling scientific workflow applications on global grids

2007

"... In PAGE 7: ....3.3 Scheduling time. Table 3 shows the average scheduling time (in milliseconds) for one task of parallel, fork-join and random workflows to generate a single schedule for different scheduling techniques. To generate a single schedule, Myopic, Min-Min, Max-Min and HEFT require nearly 1 millisecond for each task irrespective of workflow size and type, whereas the average scheduling time of one task for DCP-G is 16 to 17 milliseconds, and does not vary with the type of workflow as the task selection procedure is independent of workflow structure.... In PAGE 7: ... So, scheduling time increases with the size of the workflow. While it is possible to reschedule at regular intervals in GA and GRASP, Table 3 shows that the scheduling times for these are at least 100 times that of DCP-G, and increase with the size of the workflow as well. Hence, we did not incorporate rescheduling for GA and GRASP in the experiments for the dynamic environment.... ..."

Cited by 2

### Table 3: Computational Results in Parallel Scheduling Problem

2002

"... In PAGE 22: ... If a machine cannot be scheduled, a cut of the form (16) is added to the next assignment MILP problem. In Table 3, the problems are solved using complete MILP and CP formulations, as well as the hybrid model, with modified data given in Harjunkoski et al. (2000), where the original release dates, due dates and durations (Jain and Grossmann, 2000) have been arbitrarily changed and roughly multiplied by a factor of 10 to test the robustness of the method.... ..."

Cited by 2