### Table 1: Costs of communication primitives on the hypercube computer.

1995

"... In PAGE 10: ...themselves. Table1 shows the communication costs of these primitives on the hypercube computer. The parameter m denotes the message size in words, seq is a sequence of identi ers representing the processors in various dimensions over which the collective communication primitive is carried out.... In PAGE 29: ... It is clear that the pipelined version program runs faster than the program using broadcast operations. (a) matrix size 29 29 block size #PE = 2 #PE = 4 #PE = 8 #PE = 16 #PE = 32 21 21 226 (222) 117 (118) 63 (68) 35 (46) 22 (36) 22 22 134 (134) 69 (70) 36 (39) 20 (24) 12 (17) 23 23 104 (105) 54 (56) 28 (31) 15 (19) 10 (13) 24 24 92 (95) 48 (51) 27 (31) 15 (20) 9 (13) 25 25 88 (92) 48 (54) 28 (34) 17 (24) ** (b) matrix size 210 210 block size #PE = 2 #PE = 4 #PE = 8 #PE = 16 #PE = 32 21 21 1767 (1744) 900 (898) 466 (485) 250 (286) 141 (194) 22 22 1061 (1058) 537 (542) 276 (285) 144 (159) 79 (99) 23 23 824 (826) 418 (424) 214 (224) 113 (124) 63 (75) 24 24 726 (733) 372 (384) 195 (209) 105 (122) 59 (78) 25 25 686 (701) 359 (381) 195 (222) 111 (140) 65 (95) Table1 0: The simulation time (in units of seconds) of running a Gauss elimination algorithm for linear systems. The data that are not in parentheses are obtained from a pipelined version algorithm; the data that are in parentheses are obtained from an algorithm using broadcast operations.... ..."

Cited by 2
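The snippet above reports that the pipelined Gaussian elimination consistently beats the version using broadcast operations. A minimal sketch of why, under the standard linear communication-cost model (startup time ts plus tw per word); the values of ts, tw, and the segment count k are illustrative assumptions, not figures from the paper:

```python
from math import log2

def broadcast_time(p, m, ts=10.0, tw=1.0):
    """One-to-all broadcast over a binomial tree: log2(p) rounds,
    each forwarding the full m-word message."""
    return log2(p) * (ts + m * tw)

def pipelined_time(p, m, k, ts=10.0, tw=1.0):
    """The same m words split into k segments and pipelined along a
    chain of p processors: the first segment needs p-1 hops, and the
    remaining k-1 segments each follow one hop behind."""
    return (p - 1 + k - 1) * (ts + (m / k) * tw)

p, m = 32, 512
print(broadcast_time(p, m))        # 2610.0 -- 5 rounds x full message
print(pipelined_time(p, m, k=16))  # 1932.0 -- segments overlap in transit
```

With these illustrative parameters the pipelined transfer is cheaper once p and m are moderately large, matching the trend in Table 10.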

### Table 1. The two parallelisms involved in a single simulation time step: the Monte Carlo part and the Poisson part

"... In PAGE 9: ... Beyond 12 processors, this part takes more than 10 % of the total computing time. During potential exchange: the scheme described in Table1 has been modi ed for the SPMD implementation. Each processor broadcasts the potential values to all the processors.... ..."

### Table 4. Parameters for MPI reduction primitives: Reduce, Allreduce, Reduce scatter, and Parallel prefix

"... In PAGE 5: ... Another example: for D4=3 and D2=64 Kbytes, D8D7 represents AP 80% of latency. Table4 presents the estimated parameters of the model for MPI reduction routines. As expected, MPI Reduce is C7B4D0D3CVBED4B5, which means that it uses a tree-structured communication pattern (bottom-up traversing of the tree).... ..."

Cited by 1
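The tree-structured (bottom-up) reduction pattern inferred from the O(log p) fit can be sketched as follows; this is an illustrative round-by-round simulation of a binomial-tree reduce, not the actual MPI implementation:

```python
def tree_reduce(values, op=lambda a, b: a + b):
    """Bottom-up tree reduction: in each round, every odd-positioned
    surviving rank combines its partial result into its left
    neighbour, so p ranks finish in ceil(log2(p)) rounds."""
    rounds = 0
    while len(values) > 1:
        values = [op(values[i], values[i + 1]) if i + 1 < len(values) else values[i]
                  for i in range(0, len(values), 2)]
        rounds += 1
    return values[0], rounds

total, rounds = tree_reduce(list(range(8)))
print(total, rounds)  # 28 3 -- eight ranks reduce in three rounds
```

The round count grows logarithmically in the number of ranks, which is exactly the signature the snippet reads off the estimated model parameters.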

### Table 2: Times for Smith form computation for GL7(Z) matrices. From left to right: the dimensions of the matrix after reductions; the rank approximation by reductions; the time for reading and reducing the matrix; the time for the adaptive algorithm on the reduced matrix; the times for the original matrix (smooth form computation, adaptive algorithm); and the valence computation in parallel, given as the equivalent sequential time.


"... In PAGE 7: ... As the result for these matrices we give the number of invariant factors divisible by 2 and 3. In Table2 we give the times for the Smith form algorithms used. For cases with * no data are available or relevant.... ..."

### Table 2.5: The total computational complexity, Calc(n, p), for parallel Hessenberg reduction, by data distribution

### Table 5: Communication pattern of linear algebra kernels (the array dimensions for reduction and broadcast are of source and destination respectively).

"... In PAGE 12: ... Table 4 gives an overview of the data representation and layout for the dominating computations. Table5 shows the communication operations used along with their associated array ranks. Table 6 tabulates the computation to communication ratio in the main loop of each linear algebra benchmark.... ..."

### Table 1: Parallel Computers in Our Data Set

### Table 2: PSNR and Computation Cost Reduction

1997

"... In PAGE 6: ... It is clear that the HCVQ based system exhibits a superior performance than all the other schemes. Table2 shows the PSNR and the reduction in computational complexity, using 5 control points interpolation procedure, at di erent bit rate. 7.... ..."

Cited by 1

### Table 1. Communication pattern of linear algebra kernels (the array dimensions for reduction and broadcast are of source and destination respectively).

"... In PAGE 3: ... The linear algebra library function subset included in the HPFBench suite is comprised of eight routines. Table1 shows an overview of the data layout for the dominating computations and the communication operations used along with their associated array ranks. conj-grad uses the Conjugate Gradient method for the solution of a single instance of a tridiagonal system.... ..."

### Table 2. Computation efficiency improvement by reducing the redundancy

"... In PAGE 5: ... The same randomly generated systems are used. The average CPU times to find the optimal voltage schedules by apply- ing Algorithm 1 or Algorithm 2 to construct the NAP job sets are collected and shown in Table2 . Table 2 shows the dramatic reduction of computational cost by applying Algo- rithm 2, especially for systems with a large number of jobs.... ..."