### Table 1. Some key parameters for the different quantum-dots studied here.

### Table 6 PARATEC results using 488 atom CdSe quantum dot on the evaluated platforms. X1E experiments were conducted using X1-compiled binary

in PLATFORMS

"... In PAGE 12: ... 6.2 Experimental Results Table 6 presents performance data for three CG steps of a 488 atom CdSe (cadmium selenide) quantum dot and a standard Local Density Approximation (LDA) run of ... Fig. 4 (a) Conduction band minimum electron state for a CdSe quantum dot of the type used in PARATEC experiments (Wang 2001).... In PAGE 13: ... Even though the 3-D FFT was written to minimize global communications, architectures with a poor balance between their bisection bandwidth and computational rate will suffer performance degradation at higher concurrencies. Table 6 shows that PARATEC achieves unprecedented performance on the ES system, sustaining 5.5 Tflop/s for 2048 processors.... ..."

### Table 4. Comparison of FS-PCG, FS-PCG-XR and FS-LOBPCG methods in finding 10 eigenstates around the gap of quantum dots of increasing size.

"... In PAGE 6: ... Therefore they are all robust. The timing results are given in Table 4. For each test case the number of atoms of the quantum dot and the order n of the corresponding matrix is given.... In PAGE 6: ... Comparison of FS-PCG, FS-PCG-XR and FS-LOBPCG methods in finding 10 eigenstates around the gap of quantum dots of increasing size. From Table 4, we observe that the three methods behave almost the same.... In PAGE 7: ... The blocked implementation of FS-LOBPCG in PESCAN should run faster and also scale to larger processor counts as latency is less of an issue in the communications part of the code. Another feature of FS-LOBPCG that is not stressed in Table 4 is its overwhelming superiority over FS-PCG when no preconditioner is available. In Table 5 Left, we illustrate this latter feature.... ..."

### Table 4. Comparison of FS-PCG, FS-PCG-XR and FS-LOBPCG methods in finding 10 eigenstates around the gap of quantum dots of increasing size

2005

"... In PAGE 6: ... Therefore they are all robust. The timing results are given in Table 4. For each test case the number of atoms of the quantum dot and the order n of the corresponding matrix is given.... In PAGE 6: ...Table 4. Comparison of FS-PCG, FS-PCG-XR and FS-LOBPCG methods in finding 10 eigenstates around the gap of quantum dots of increasing size. From Table 4, we observe that the three methods behave almost the same.... In PAGE 7: ... The blocked implementation of FS-LOBPCG in PESCAN should run faster and also scale to larger processor counts as latency is less of an issue in the communications part of the code. Another feature of FS-LOBPCG that is not stressed in Table 4 is its overwhelming superiority over FS-PCG when no preconditioner is available. In Table 5 Left, we illustrate this latter feature.... ..."

### Table 7 Comparison of FS-PCG and FS-LOBPCG with and without preconditioner to find mx = 10 eigenvalues of the quantum dots (83Cd, 81Se)

"... In PAGE 7: ... Another feature of FS-LOBPCG that is not stressed in Table 6 is its overwhelming superiority over FS-PCG when no preconditioner is available. In Table 7, we illustrate this latter feature. For the quantum dot (83Cd, 81Se), FS-LOBPCG runs 4 times faster than FS-PCG without preconditioner whereas it runs only 1.... ..."

### Table 6: PARATEC results using 488 atom CdSe quantum dot on the evaluated platforms. SSP results are shown as the aggregate performance of 4 SSPs to allow a direct comparison with MSP performance.

"... In PAGE 10: ... 6.1 Experimental Results Table 6 presents performance data for 3 CG steps of a 488 atom CdSe (Cadmium Selenide) quantum dot and a standard Local Density Approximation (LDA) run of PARATEC with a 35 Ry cutoff using norm-conserving pseudopotentials. A typical calculation would require at least 60 CG iterations to converge the charge density for a CdSe dot.... ..."

### Table 6 Comparison of FS-PCG, FS-PCG-XR and FS-LOBPCG methods in finding 10 eigenstates around the gap of quantum dots of increasing size

"... In PAGE 6: ... Therefore they are all robust. The timing results are given in Table 6. For each test case the number of atoms of the quantum dot and the order n of the corresponding matrix is given.... In PAGE 6: ...From Table 6, we observe that the three methods behave almost the same. The best method (in term of time) being either FS-PCG-XR or FS-LOBPCG.... In PAGE 7: ... The blocked implementation of FS-LOBPCG in PESCAN should run faster and also scale to larger processor counts as latency is less of an issue in the communications part of the code. Another feature of FS-LOBPCG that is not stressed in Table 6 is its overwhelming superiority over FS-PCG when no preconditioner is available. In Table 7, we illustrate this latter feature.... In PAGE 7: ...4 times faster with. Table 7 Comparison of FS-PCG and FS-LOBPCG with and without preconditioner to find mx = 10 eigenvalues of the quantum dots (83Cd, 81Se). (83Cd, 81Se), n = 34,143 [method: # matvec, time]: FS-PCG(200) precond: 15,096, 264 sec; FS-LOBPCG precond: 10,688, 210 sec; FS-PCG(200) no precond: 71,768, 1274 sec; FS-LOBPCG no precond: 17,810, 341 sec. For the four experiments presented in Table 6, the number of inner iteration that gives the minimum total time is always attained for a small number of outer iteration, this is illustrated in Table 8 for (232Cd, 235Se) where the minimum time is obtained for 6 outer iterations. Another and more practical way of stopping the inner iteration is in fixing the requested tolerance reached at the end of the inner loop.... ..."

### Table 4. Comparison of FS-PCG, FS-PCG-XR and FS-LOBPCG methods in finding 10 eigenstates around the gap of quantum dots of increasing size.

2005

"... In PAGE 8: ... Therefore they are all robust. The timing results are given in Table 4. For each test case the number of atoms of the quantum dot and the order n of the corresponding matrix is given.... In PAGE 8: ... The parameter for the number of iterations in the inner loop (nline) for FS-PCG and FS-PCG-XR is chosen to be the optimal one among the values 20, 50, 100, 200, and 500 and is given in brackets after the solver. From Table 4, we observe that the three methods behave almost the same. The best method (in term of time) being either FS-PCG-XR or FS-LOBPCG.... In PAGE 8: ... The blocked implementation of FS-LOBPCG in PESCAN should run faster and also scale to larger processor counts as latency is less of an issue in the communications part of the code. Another feature of FS-LOBPCG that is not stressed in Table 4 is its overwhelming superiority over FS-PCG when no preconditioner is available. In Table 5, we illustrate this latter feature.... In PAGE 8: ...4 times faster with. For the four experiments presented in Table 4, the number of inner iteration that gives the minimum total time is always attained for a small number of outer iteration, this is illustrated in Table 6 for (232Cd, 235Se) where the minimum time is obtained for 6 outer iterations. Another and more practical way of stopping the inner iteration is in fixing the requested tolerance reached at the end of the inner loop.... ..."

### Table 5. Comparison of FS-PCG and FS-LOBPCG with and without preconditioner to find mx = 10 eigenvalues of the quantum dots (83Cd, 81Se)

2005
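
No excerpt accompanies this entry, but the identically captioned Table 7 excerpt earlier in this listing quotes matvec counts and timings for the (83Cd, 81Se) dot. As a rough check on the "times faster" statements, the sketch below recomputes the FS-PCG(200) to FS-LOBPCG ratios from those quoted figures. The dictionary layout and the helper names `time_ratio` and `matvec_ratio` are illustrative assumptions, not part of PESCAN or of the cited papers.

```python
# Figures transcribed from the Table 7 excerpt in this listing:
# (solver, preconditioning) -> (number of matvecs, wall time in seconds).
runs = {
    ("FS-PCG(200)", "precond"): (15096, 264),
    ("FS-LOBPCG", "precond"): (10688, 210),
    ("FS-PCG(200)", "no precond"): (71768, 1274),
    ("FS-LOBPCG", "no precond"): (17810, 341),
}


def time_ratio(variant):
    """FS-PCG(200) time divided by FS-LOBPCG time for one preconditioning variant."""
    return runs[("FS-PCG(200)", variant)][1] / runs[("FS-LOBPCG", variant)][1]


def matvec_ratio(variant):
    """FS-PCG(200) matvec count divided by FS-LOBPCG matvec count."""
    return runs[("FS-PCG(200)", variant)][0] / runs[("FS-LOBPCG", variant)][0]


for variant in ("precond", "no precond"):
    print(f"{variant}: time ratio {time_ratio(variant):.2f}, "
          f"matvec ratio {matvec_ratio(variant):.2f}")
```

Without a preconditioner the time ratio comes out a little under 4, in line with the "4 times faster" wording in the excerpts; with a preconditioner both ratios fall well below 2.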

### Table 1. Ground state energies of a hydrogenic impurity in spherical quantum dot of infinite depth with a radius (R). (a) exact calculation [38, 39] (b) variational calculation [38], (c) QGA based on parameter optimization and (d) QGA based on wave function optimization.

"... In PAGE 10: ...2. Results and Discussion The results obtained from the calculations are presented comparatively in Table 1 along with the exact results.... In PAGE 10: ....2. Results and Discussion The results obtained from the calculations are presented comparatively in Table 1 along with the exact results. The first column in Table 1 corresponds to the radius of the well, i.e.... ..."