### Table 1. Efficiency of parallel algorithms

1998

"... In PAGE 17: ... The Schwarz alternating procedure with overlapping has been used. The efficiency of parallel iterative algorithms is reported in Table 1 using the classical definition of efficiency: e = t1 / (tp · p), where tp denotes the computing time using p processors. Results are given for discretized domains with 25000 points.... In PAGE 17: ... Results are given for discretized domains with 25000 points. From Table 1 it can be seen that the efficiency of asynchronous iterations with order intervals is better than the efficiency of parallel synchronous iterations. Idle time due to synchro-... ..."
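The classical efficiency definition quoted in this snippet, e = t1 / (tp · p), is straightforward to compute; the sketch below illustrates it with made-up timings (the function name and the numbers are my own, not from the cited paper).

```python
def parallel_efficiency(t1, tp, p):
    """Classical parallel efficiency: serial time t1 divided by
    (time on p processors) * p.  A value of 1.0 is ideal scaling."""
    return t1 / (tp * p)

# Hypothetical example: 100 s serially, 30 s on 4 processors.
e = parallel_efficiency(100.0, 30.0, 4)
print(e)  # 0.8333... -> 83% efficiency
```

Asynchronous iterations tend to score higher on this measure precisely because they avoid the idle time at synchronization points that the excerpt mentions.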

Cited by 10

### Table 1: Floating-point performance characteristics of individual cores of modern, multi-core processor architectures. DGESV and SGESV are the LAPACK subroutines for dense system solution in double precision and single precision, respectively. Columns: Architecture, Clock, DP Peak, SP Peak, time(DGESV)/time(SGESV)

2007

"... In PAGE 2: ... When combined with the size of the register file of 128 registers, it is capable of delivering close to peak performance on many common computationally intensive workloads. Table 1 shows the difference in peak performance between single precision (SP) and double precision (DP) of four modern processor architectures; the last column reports the ratio between the time needed to solve a dense linear system in double and single precision by means of the LAPACK DGESV and SGESV routines, respectively. Following the recent trend in chip design, all of the presented processors are multi-core architectures.... In PAGE 6: ... For the Cell processor (see Figures 7 and 8), parallel implementations of Algorithms 2 and 3 have been produced in order to exploit the full computational power of the processor. Due to the large difference between the single precision and double precision floating point units (see Table 1), the mixed precision solver performs up to 7 and 11 times faster than the double precision peak in the unsymmetric and symmetric, positive definite cases respectively. Implementation details for this case can be found in [7, 8].... ..."
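The mixed-precision idea behind the SGESV/DGESV comparison — solve in fast single precision, then refine the result with double-precision residuals — can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation; in particular, a real solver would factor the single-precision matrix once and reuse the LU factors, rather than calling `np.linalg.solve` in every refinement step.

```python
import numpy as np

def mixed_precision_solve(A, b, iters=5):
    """Sketch of mixed-precision iterative refinement:
    initial solve in float32, residual correction in float64."""
    A32 = A.astype(np.float32)
    # Initial solution computed entirely in single precision.
    x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)
    for _ in range(iters):
        r = b - A @ x                                   # residual in double
        # Correction solved in single precision (the cheap step).
        dx = np.linalg.solve(A32, r.astype(np.float32)).astype(np.float64)
        x += dx
    return x
```

For a well-conditioned system this recovers double-precision accuracy while the dominant O(n^3) work happens in the faster single-precision unit — the source of the large speedups the excerpt reports on the Cell processor.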

### Table 3 The parallel complexity of the algorithms with the number of processors p less than the number of subproblems

1994

"... In PAGE 19: ... The parallel arithmetic complexity of GMRES, with p processors, is then approximately CGMRES(I, p) = I(I + 1)(n/p + log2(p)) + I·(n/p) + I·(5n/p), where the term I(I + 1)(n/p + log2(p)) is from the dot products and DAXPYs of the Gram-Schmidt process, the term (n/p)·I from the forming of the new approximate solution after I steps are complete, and (5n/p)·I from the matrix-vector multiply. The parallel arithmetic complexity is estimated in Table 2 for p equal to the number of subproblems and in Table 3 for p less than the number of subproblems. It is also important to consider the parallel communication complexity, though its impact is architecture dependent.... In PAGE 28: ... Curiously, increasing overlap seems to degrade convergence in the strongly indefinite case, whereas it always improves the convergence of definite operators. For instance, when H = 1/8 and the problem parameter is 300, overlaps of h, 2h, 3h (not listed in Table 3), and 4h lead to iteration counts of 35, 37, 43, and > 100, respectively. Loss of orthogonality likely plays a contributing role in the upturn.... ..."
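The complexity estimate can be checked term by term with a small helper (the function name is my own; the formula is the one reconstructed from the snippet):

```python
from math import log2

def c_gmres(I, p, n):
    """Parallel arithmetic complexity of I GMRES steps on p processors:
    Gram-Schmidt dot products/DAXPYs, solution update, matrix-vector multiply."""
    gram_schmidt = I * (I + 1) * (n / p + log2(p))
    update = I * n / p
    matvec = I * 5 * n / p
    return gram_schmidt + update + matvec
```

For a single processor (p = 1, so log2(p) = 0) the formula collapses to I(I + 1)n + 6nI, which makes the p-fold division of the vector work in the parallel case easy to see.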

Cited by 53

### Table 1. Performance comparison of clustering algorithms with and without iterative feature selection

"... In PAGE 8: ... Detailed analyses not given here also showed that the Markov Blanket filter imposes more influence on the stability and correctness of the clustering than the information gain filter, especially when the number of the features to be finally used is small. In a comparison of the clustering results using different approaches (Table 1), we can see that CLIFF outperforms both C3-means with feature selection and NCut without feature selection. The number of features selected and used to compute the affinity matrix during each iteration is chosen empirically, and the clustering result is sensitive to different choices of this number.... ..."

### Table 3. Parameters for the simulated multi-core system.

"... In PAGE 7: ...1 Environment We use an execution-driven simulator that models multi-core systems with MESI coherence and support for hardware or hybrid TM systems. Table 3 summarizes the parameters for the simulated CMP architecture. All operations, except loads and stores, have a CPI of 1.... ..."

### Table 10: Additive and multiplicative preconditioners, test3.i. We now compare additive and multiplicative methods used as preconditioners. Since we use a conjugate gradient method, we cannot use the (alternating) multiplicative Schwarz iteration; we have to use the symmetric variant instead. Compare the number of iterations. The previous remark on parallel computing and additive methods also applies here. Exercise 14 Size of overlap in the Schwarz iteration. (Table 11, test4.i)
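As a rough illustration of the "remark on parallel computing" alluded to above: in the additive Schwarz preconditioner each subdomain correction is computed independently and the results are summed, so the subdomain solves can run concurrently, whereas the multiplicative (and symmetrized) variants apply them sequentially. A minimal dense-matrix sketch of one additive application (my own, not from the cited notes):

```python
import numpy as np

def additive_schwarz_apply(A, r, subdomains):
    """One application z = M^{-1} r of the additive Schwarz preconditioner:
    solve the restriction of A on each (possibly overlapping) index set
    independently and sum the prolonged corrections."""
    z = np.zeros_like(r)
    for idx in subdomains:
        # Each local solve uses only data from its own subdomain,
        # which is why these iterations parallelize trivially.
        z[idx] += np.linalg.solve(A[np.ix_(idx, idx)], r[idx])
    return z
```

With non-overlapping subdomains this reduces to block-Jacobi; the multiplicative variant would instead update the residual r after each subdomain solve, introducing the sequential dependency the exercise text refers to.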

1996

Cited by 1

### Table 1: Algorithmic complexity of parallel parametric dissection on various architectures.

1997

"... In PAGE 16: ... The details of these algorithms are involved and may be found in our earlier technical report [13]. Table 1 summarizes our results. 7 Applications to Unstructured Meshes A portion of a 2-d unstructured mesh is shown in Figure 7.... ..."

Cited by 5

### Table 2 Speedups for the parallel implementation of Algorithms A1 and A2 for different parameter values and overlapping interval sizes H (for the values of H, see Table 1).

1996

"... In PAGE 16: ... In this situation, the convergence of Algorithms A1 and A2 is defined by the decrease rate of these residual errors. In Table 2, we give the speedups for Algorithms A1 and A2 with respect to the direct ("undecomposed") algorithm. In the table, we use the notation SA = td / tpA, where td is the execution time for the direct algorithm and tpA for the iterative algorithms by parallel processing.... In PAGE 16: ... In the table, we use the notation SA = td / tpA, where td is the execution time for the direct algorithm and tpA for the iterative algorithms by parallel processing. Based on the results given in Table 2, we discuss some issues related to the computational effectiveness of Algorithms A1 and A2.... In PAGE 16: ... Based on the results given in Table 2, we discuss some issues related to the computational effectiveness of Algorithms A1 and A2. In Table 2, we underline the maximum values of the speedups for the algorithms. The juxtaposition of the data from Tables... In PAGE 17: ... In this case, the maximum values of the speedups are relatively small, because they are achieved at large parameter values. From Table 2, it also follows that the parallel implementation of Algorithm A2 is in all cases faster than that of Algorithm A1. However, the speedups of the algorithms at 10^-1 do not differ much from one another.... In PAGE 17: ... This is because, when solving the test problem at these parameter values, a remarkable part of the computational cost of Algorithm A2 falls on the interfacial subproblems (the solution of these subproblems represents two sequential steps of the algorithm). One can see from Table 2 that at some values of the "critical" parameters the speedups of the algorithms are "superlinear" (greater than the number of processors used in parallel). This is, most probably, explained by the effect of the cache memory of the workstations used in the computations.... ..."
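The speedup notation SA = td / tpA from this snippet, including the "superlinear" case SA > p, can be illustrated with made-up timings (the numbers are hypothetical, not from the cited tables):

```python
def speedup(t_direct, t_parallel):
    """SA = td / tpA: time of the direct ('undecomposed') algorithm
    over the parallel time of the iterative algorithm."""
    return t_direct / t_parallel

p = 4                      # number of processors
s = speedup(100.0, 20.0)   # hypothetical timings
print(s, s > p)            # 5.0 True -> superlinear on 4 processors
```

Superlinear values typically indicate that the decomposed subproblems fit in cache while the direct solve does not — the cache-memory effect the excerpt proposes as the explanation.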

Cited by 1

### Table 1 Number of iterations required for various grid sizes and overlaps

"... In PAGE 7: ... Their parallel implementations will be presented elsewhere. Results The test results are gathered in Table 1. The overlap between the subdomains is labeled Ov and is the same for any two subdomains that overlap.... ..."