### Table 2: Analysis Statistics for Barnes-Hut

Table 2 presents several analysis statistics. For each parallel extent it gives the number of auxiliary call sites, the number of invoked methods, the number of independent pairs of methods, and the number of pairs that the compiler had to symbolically execute. All of the parallel extents have a significant number of auxiliary call sites; the compiler would be unable to parallelize any of the extents if the commutativity testing phase included the auxiliary operations. Most pairs of invoked methods are always independent, so the compiler has to symbolically execute relatively few pairs.
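
The pruning step this caption describes can be sketched roughly as follows. The method names and read/write sets below are illustrative only (loosely modeled on Barnes-Hut), and the disjointness test is a standard Bernstein-style independence check, not the paper's exact analysis:

```python
# Before symbolically executing method pairs, discard pairs whose
# read/write sets provably cannot conflict; only the rest need
# commutativity testing by symbolic execution.

def independent(rw1, rw2):
    """Two methods are independent if neither writes a location
    the other reads or writes (Bernstein's conditions)."""
    r1, w1 = rw1
    r2, w2 = rw2
    return not (w1 & (r2 | w2)) and not (w2 & (r1 | w1))

# Hypothetical (reads, writes) sets per method.
methods = {
    "body.force":   ({"pos"}, {"acc"}),        # reads pos, writes acc
    "body.move":    ({"acc", "pos"}, {"pos"}), # integrates position
    "cell.subdivp": ({"pos"}, set()),          # read-only test
}

# Unordered pairs of invoked methods.
pairs = [(a, b) for a in methods for b in methods if a <= b]
needs_symbolic = [(a, b) for a, b in pairs
                 if not independent(methods[a], methods[b])]
```

Here 4 of the 6 pairs conflict on their read/write sets; only those would proceed to the symbolic-execution phase.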

1996

"... In PAGE 15: ... Table 2: Analysis Statistics for Barnes-Hut. ..."

Cited by 46

### Table 1: Comparison of flow analysis capabilities of parallelizing compilers.

1992

"... In PAGE 14: ... A set of program constructs that may produce such GIVs are triangular and trapezoidal loops, which generate GIVs that can be characterized by polynomials in terms of enclosing loop index variables. None of the state-of-the-art parallelizing compilers (Table 1) that we studied could handle these cases, not even when the loop bounds were such that they result in linear induction variables. An example of a trapezoidal loop with a linear induction variable, together with its transformed code generated by Parafrase-2, is shown in Figure 11.... In PAGE 17: ... In the Perfect benchmarks, programs ADM, MG3D, and SPEC77 contain induction variables with multiple assignments. Surprisingly, some of the state-of-the-art parallelizing compilers (Table 1) do not handle such simple cases of induction variables.

    ! Original loop                  ! Transformed loop
    ia = 0                           do j = 1, n
    ib = 0                             a(j * p) = a(j * p) + 1
    do j = 1, n                        b(j * q) = b(j * q) + 1
      i = ia + p                     end do
      a(i) = a(i) + 1
      ia = i
      i = ib + q
      b(i) = b(i) + 1
      ib = i
      i = c(j)
    end do

Figure 18: Example of an induction variable with multiple assignments.... ..."
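
The substitution in Figure 18 can be checked with a small simulation. This is a sketch only: it replays the original loop, in which `i` is assigned from the carried variables `ia` and `ib`, and confirms that the subscripts it touches match the closed forms `j*p` and `j*q` used in the transformed loop (the values of `n`, `p`, `q` are made up):

```python
# Induction-variable substitution: the original loop carries i across
# iterations via ia and ib; substitution rewrites the subscripts as
# explicit functions of j, making each iteration independent.

n, p, q = 5, 3, 4

touched_a, touched_b = [], []
ia = ib = 0
for j in range(1, n + 1):
    i = ia + p
    touched_a.append(i)   # a(i) = a(i) + 1
    ia = i
    i = ib + q
    touched_b.append(i)   # b(i) = b(i) + 1
    ib = i

# The transformed loop would touch a(j*p) and b(j*q) directly.
assert touched_a == [j * p for j in range(1, n + 1)]
assert touched_b == [j * q for j in range(1, n + 1)]
```

Because the rewritten subscripts no longer depend on loop-carried state, the loop can be parallelized.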

Cited by 34

### Table II. Analysis Statistics for Barnes-Hut

1997

Cited by 47


### Table 1: Three Bounded-Geometric Distributions for N

1993

"... In PAGE 8: ... (Thus ρ = λD/P.) For most of our experiments we use the following bounded-geometric distribution for available job parallelism (similar to the distribution in [10]), with parameters Pmax and p: N = P with probability Pmax, and N = min(X, P) with probability 1 − Pmax, where X = Geometric(p). We consider three specific bounded-geometric distributions for N, which are given in Table1 . More details of these workloads are given in [12].... In PAGE 13: ... RPSAPF, RFCFS, and REQ were estimated by discrete-event simulation, and R∞FB was obtained from the analysis in [8]. Figure 2 plots the ratio R/R∞FB versus ρ = λD/P for the H, M, and L parallelism distributions given in Table1 and P = 20. Note that the Y-axis of the figure has a log scale.... In PAGE 17: ... The DPS bound is looser for the M workload (not shown) than for the H workload. The reason for the looseness of the bound under the H and M workloads is that processors are idle under RRP if there are only a few jobs in the system, each with low parallelism. As shown in Table1 , 10% of the H workload consists of sequential jobs (N = 1). For the M workload it can be shown that slightly more than 10% of the jobs have an available parallelism of 5 or less.... In PAGE 24: ... Range of Policy Performance in C0: Since PSAPF is optimal and PLAPF is pessimal in C0, under A1 and A2, the ratio of RPLAPF to RPSAPF measures the range of policy performance in C0. Figure 4 plots this ratio versus ρ = λD/P for the H, M, and L workloads of Table1 at P = 20 and P = 100. We note that the difference between PSAPF and PLAPF decreases as workload parallelism decreases.... In PAGE 34: ... Since D = P for this experiment, REQ(N≡P) = D/P = 1 at ρ = 0, and thus the ratio REQ/REQ(N≡P) = S/1 = S at ρ = 0. For the workloads in Table1 , S is lower for the M workload than for the H workload. Note, however, that there are many workloads with moderate N (not shown) that have higher mean service time than the H workload.... ..."
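
The bounded-geometric distribution quoted above is simple to sample. Below is a minimal sketch with illustrative parameters; `p_full` stands for the quoted Pmax (the probability mass placed on full parallelism P), and the geometric variate is truncated at P exactly as in the quoted definition:

```python
# N = P with probability p_full; otherwise N = min(X, P), X ~ Geometric(p_geo).
import random

def sample_N(P, p_full, p_geo, rng=random):
    """Draw one available-parallelism value from the bounded-geometric
    distribution with parameters (P, p_full, p_geo)."""
    if rng.random() < p_full:
        return P
    x = 1
    while rng.random() >= p_geo:  # trials until first success
        x += 1
    return min(x, P)

rng = random.Random(0)
samples = [sample_N(20, 0.1, 0.3, rng) for _ in range(1000)]
# Every draw is a valid degree of parallelism between 1 and P.
assert all(1 <= n <= 20 for n in samples)
```

Varying `p_full` and `p_geo` yields high-, moderate-, and low-parallelism workloads in the spirit of the H, M, and L distributions of Table 1.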

Cited by 5

### Table 1: PARALLELISM DISTRIBUTION FOR THE COMPILER-PARALLELIZED CMU BENCHMARKS

"... In PAGE 2: ... HPAM Sim [6] was then used to evaluate the percentage of time (measured in cycles) during which a benchmark executed with each distinct DoP. Table 1 shows the DoP time percentages for the CMU benchmarks (for clarity, percentages of less than 1% of the total number of cycles are omitted). The information included in Table 1 does not reflect the communication behavior of the benchmarks. In order to complement Table 1 and to gain insight into the communication behavior of the benchmarks, the ratio of the total time spent in computation to the total time spent in communication was estimated for each benchmark (Figure 1). This ratio (denoted CCratio) is obtained by tracing the communication and computation tasks for one processor (i.... This ratio depends on the underlying communication model, which is discussed in Section 3. Table 1 and Figure 1 show that the CMU benchmarks have different parallelism and communication distributions. The 1D Fast Fourier Transform (fft1) benchmark has two predominant DoPs: a large sequential portion (i.... Thus, the faster the first level, the more performance gain the heterogeneous machine will have over the homogeneous machine.
As shown in Table 1, airshed has five distinct DoPs including a small sequential portion. However, the parallel portions of airshed have a high communication overhead (Figure 1).... ..."
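
The CCratio described in the excerpt is just a ratio of traced time totals. A minimal sketch, using a made-up per-processor trace of computation and communication intervals:

```python
# CCratio = total computation time / total communication time,
# computed from one processor's trace of (kind, duration) events.

trace = [("comp", 120), ("comm", 15), ("comp", 300), ("comm", 45), ("comp", 80)]

comp = sum(t for kind, t in trace if kind == "comp")
comm = sum(t for kind, t in trace if kind == "comm")
cc_ratio = comp / comm  # high CCratio => computation dominates communication
```

A benchmark like airshed, with heavy communication in its parallel phases, would show a low CCratio under this measure.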

### Table 1: Loop bounds and array references in the Perfect Benchmarks®

For an example that requires symbolic dependence analysis, consider the code segment from program TRFD shown in Figure 3. After elimination of the generalized induction variable mijkl, the subscript of array xijkl in Figure 4 is a nonlinear function of the enclosing loop index variables, and contains an unknown symbolic term, num. Another unknown symbolic term, morb, also appears in the enclosing loop bounds. However, the compiler can use the following lemma, which is based on finite differences, to come up with a set of constraints sufficient to prove that the value of the subscript expression of the array xijkl is strictly increasing. Verification of these conditions at compile time indicates a lack of dependences. If the compiler is not able to prove the validity of these conditions, they may be verified at run time to support multi-version code.

Lemma 2.1 Suppose that L is a nest of length n of normalized loops, indexed by (i1, i2, ..., in) and characterized by the loop bounds:
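
The run-time fallback mentioned above can be sketched as follows. This is not the paper's actual test: the subscript function below is a hypothetical stand-in for the xijkl subscript, and the check simply verifies via first (finite) differences that the subscript sequence is strictly increasing once the symbolic terms (here `morb` and `num`) are known:

```python
# Multi-version code: run the parallel version only if the subscript
# sequence is proven strictly increasing at run time.

def strictly_increasing(values):
    """True if every first difference of the sequence is positive."""
    return all(b - a > 0 for a, b in zip(values, values[1:]))

def subscripts(morb, num):
    # Hypothetical stand-in for the array subscript as a function of the
    # enclosing loop indices, with morb bounding the outer loop.
    return [i * num + j for i in range(1, morb + 1)
                        for j in range(1, num + 1)]

if strictly_increasing(subscripts(morb=4, num=3)):
    run_parallel = True   # no two iterations touch the same element
else:
    run_parallel = False  # fall back to the serial version
```

A strictly increasing subscript means no element is written twice, so the compile-time dependence test that failed symbolically can be discharged dynamically.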

1993

"... In PAGE 4: ... Also, a large percentage of array references in these benchmarks contain symbolic terms other than the enclosing loop index variables. These measurements are summarized in Table 1. Parallelizing compilers handle some of these cases by applying classical optimization techniques such as constant propagation, induction variable substitution, and forward substitution.... ..."

Cited by 38

### Table 2. Compilation and analysis times.

2003

Cited by 3

### Table 2: Parallelism distribution for the hand-parallelized CMU benchmarks.

"... In PAGE 15: ... For the hand-parallelized version of fft2, of the 99.99% execution time segment with DoP = 65536 in Table 2, 58.... In PAGE 15: ... This parallelism is hard to detect with a parallelizing compiler because it requires extensive interprocedural analysis. As shown in Figure 2 and Table 2, airshed is mostly parallel and has a high CCratio. This suggests that airshed is best mapped onto a homogeneous one-level organization.... ..."
