### Table IV. Criteria that relate interaction with artifact

Columns: Criterion, Reference, Division. Interactive: each participant has a parallel electronic communication channel towards a group memory.

1999

Cited by 1

### Table 1: Processor states in parallel execution

Table 1 summarizes the breakdown of execution time. Some of the above categories can be further divided depending on the objective of profiling; a fine-grained profiling tool may want to include categories such as cache misses and page-fault overheads. In our implementation, we provide a finer division of communication overhead. The point is that non-scalable code can be classified into meaningful categories for program profiling.

"... In PAGE 4: ... Table 1: Processor states in parallel execution. We believe that it is important that the profiler provide a precise measurement of these categories and not just a summary judgement. In general, a simple verdict (such as poor scalability, load imbalance, or poor mapping of distributed arrays) cannot be made accurately, as there can be many possible causes of poor performance.... ..."
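A minimal sketch of reporting a per-category breakdown instead of a single verdict, assuming per-processor timings have already been collected. The category names (`computation`, `communication`, `idle`), the sample numbers, and the helpers `breakdown` and `imbalance` are hypothetical; they are not the categories of Table 1 or the profiler's actual interface.

```python
from collections import defaultdict

# Hypothetical per-processor timings in seconds, keyed by processor and then
# by execution-time category. Category names are illustrative assumptions.
Timings = dict[str, dict[str, float]]


def breakdown(timings: Timings) -> dict[str, float]:
    """Share of total processor time spent in each category (sums to 1.0)."""
    totals: dict[str, float] = defaultdict(float)
    for per_proc in timings.values():
        for category, seconds in per_proc.items():
            totals[category] += seconds
    grand_total = sum(totals.values())
    return {c: t / grand_total for c, t in sorted(totals.items())}


def imbalance(timings: Timings, category: str) -> float:
    """Max-to-mean ratio of one category across processors (1.0 = balanced)."""
    values = [per_proc.get(category, 0.0) for per_proc in timings.values()]
    mean = sum(values) / len(values)
    return max(values) / mean if mean > 0 else 1.0


# Hypothetical sample data for two processors.
sample: Timings = {
    "p0": {"computation": 8.0, "communication": 1.0, "idle": 1.0},
    "p1": {"computation": 6.0, "communication": 1.0, "idle": 3.0},
}
print(breakdown(sample))          # per-category shares of total time
print(imbalance(sample, "idle"))  # how unevenly idle time is spread
```

Keeping the shares and the imbalance ratio as separate numbers leaves the distinct causes of poor performance visible, rather than folding them into one label such as "poor scalability".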

### Table 9.4: Forbidden Operators for Communication Objects

### Table 2: Communication Overheads

1995

"... In PAGE 12: ... Several regular and irregular data partitioning methods have been implemented to compare the communication overheads. Table 2 presents average communication times of different data partitioning methods from 16 to 128 processors. Atom decomposition was used as the iteration partitioning algorithm.... In PAGE 12: ... BLOCK divides an array into contiguous chunks of size N/P and assigns one block to each processor, whereas CYCLIC specifies a round-robin division of an array and assigns every P-th element to the same processor. Table 2 shows that both BLOCK and CYCLIC do not exploit locality and, therefore, cause higher communication overheads. Weighted BLOCK divides an array into contiguous chunks with different sizes so that each chunk would have the same amount of computational work.... ..."

Cited by 47
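The three partitioning schemes described in the excerpt (BLOCK, CYCLIC, weighted BLOCK) can be sketched as owner computations. This is a minimal sketch, not the paper's implementation: the function names are hypothetical, BLOCK rounds the chunk size up when N is not a multiple of P, and the weighted BLOCK cut rule (greedy cuts on cumulative work) is one simple choice among several.

```python
from bisect import bisect_right
from itertools import accumulate


def block_owner(i: int, n: int, p: int) -> int:
    """BLOCK: contiguous chunks of about n/p elements, one chunk per processor."""
    chunk = -(-n // p)  # ceiling of n / p, so every element has an owner
    return i // chunk


def cyclic_owner(i: int, p: int) -> int:
    """CYCLIC: round-robin, so every p-th element lands on the same processor."""
    return i % p


def weighted_block_bounds(work: list[float], p: int) -> list[int]:
    """Weighted BLOCK: contiguous chunks of unequal length, cut so that each
    chunk carries roughly total_work / p. Returns each chunk's start index."""
    prefix = list(accumulate(work))  # cumulative work up to and including i
    total = prefix[-1]
    starts = [0]
    for k in range(1, p):
        target = total * k / p
        # Chunk k starts at the first index whose cumulative work exceeds target.
        starts.append(bisect_right(prefix, target))
    return starts


print([block_owner(i, 8, 4) for i in range(8)])
print([cyclic_owner(i, 4) for i in range(8)])
print(weighted_block_bounds([1, 1, 1, 1, 4], 2))  # one heavy element at the end
```

With uniform work, weighted BLOCK reduces to plain BLOCK; with one heavy element, the cut shifts so both chunks carry equal work even though their lengths differ.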

### Table 2: Communication Overheads

"... In PAGE 12: ... Several regular and irregular data partitioning methods have been implemented to compare the communication overheads. Table 2 presents average communication times of different data partitioning methods from 16 to 128 processors. Atom decomposition was used as the iteration partitioning algorithm.... In PAGE 12: ... BLOCK divides an array into contiguous chunks of size N/P and assigns one block to each processor, whereas CYCLIC specifies a round-robin division of an array and assigns every P-th element to the same processor. Table 2 shows that both BLOCK and CYCLIC do not exploit locality and, therefore, ... ¹ Not available due to memory limitation of iPSC/860... ..."

### Table 3: Communication and synchronisation cost for data distributions with p = 100

1994

"... In PAGE 21: ... It does not perform very well on small problems and even for larger problems there are superior distributions, such as the 'diagonal' distribution, which imposes an equal division of the matrix diagonal over the processors and hence causes a good load balance in the summation of partial sums. The results of Table 3 show that it is quite hard to achieve a low communication cost for general sparse matrices, i.e.... ..."

Cited by 81
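For a sparse matrix-vector product y = Ax, one common way to measure the communication cost of a data distribution in the BSP model is to count, per superstep, the maximum number of words any processor sends or receives (the h of an h-relation). A minimal sketch, assuming one fan-out superstep for components of x and one fan-in superstep for partial sums of y; the function names and the input format (each nonzero tagged with the processor storing it) are hypothetical, and this is not necessarily the exact cost model behind Table 3.

```python
# Each nonzero is (row i, column j, processor that stores a_ij).
Nonzero = tuple[int, int, int]


def h_relation(words: list[tuple[int, int]], p: int) -> int:
    """Max words sent or received by any processor; words are (src, dst) pairs."""
    sent = [0] * p
    recv = [0] * p
    for src, dst in words:
        sent[src] += 1
        recv[dst] += 1
    return max(max(sent), max(recv))


def spmv_h_relations(nonzeros: list[Nonzero], x_owner: list[int],
                     y_owner: list[int], p: int) -> tuple[int, int]:
    """Return (h_fanout, h_fanin); x_owner[j] / y_owner[i] own x_j / y_i."""
    # Fan-out: a processor needs x_j once for every column j in which it
    # stores at least one nonzero; remote components must be sent to it.
    need_x = {(j, proc) for _i, j, proc in nonzeros}
    fanout = [(x_owner[j], proc) for j, proc in need_x if x_owner[j] != proc]
    # Fan-in: a processor produces one partial sum per row i in which it
    # stores a nonzero; remote partial sums go to the owner of y_i.
    have_y = {(i, proc) for i, _j, proc in nonzeros}
    fanin = [(proc, y_owner[i]) for i, proc in have_y if y_owner[i] != proc]
    return h_relation(fanout, p), h_relation(fanin, p)


# 2x2 dense matrix on p = 2 processors, row i stored on processor i.
A_rows = [(0, 0, 0), (0, 1, 0), (1, 0, 1), (1, 1, 1)]
print(spmv_h_relations(A_rows, x_owner=[0, 1], y_owner=[0, 1], p=2))
```

In the standard BSP model each superstep with an h-relation costs about g·h + l, where g and l are machine parameters, so a distribution's communication cost is roughly g·(h_fanout + h_fanin) plus l per synchronisation. A row distribution pays only in the fan-out here; storing the same matrix by columns would move the cost to the fan-in.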

### Table 3: Divisibility of the discriminants

1993

"... In PAGE 12: ... is a norm in K (provided that $\left(\frac{-D}{N}\right) = +1$). This yields Table 2. Let $S$ be a finite set of primes (here 4 and 8 are assumed to be distinct primes). We define $N_p(S)$ to be the number of $D$ in $\mathcal{D}$ which are divisible by at least one prime of $S$. This quantity is tabulated in Table 3. From the above results, it is quite clear that bad numbers are those which are quadratic nonresidues modulo small primes, such as $N \equiv -1 \pmod{12}$, which kill off one third of our discriminants.... ..."

Cited by 124
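The count defined in the excerpt (how many discriminants are divisible by at least one element of S) is simple to compute, and the Legendre symbol shows concretely what "quadratic nonresidue modulo a small prime" means. A minimal sketch: the function names and the sample list of D values are hypothetical, Euler's criterion is used for the Legendre symbol, and plain divisibility is used, so the excerpt's convention of treating 4 and 8 as distinct primes is not reproduced.

```python
def count_divisible(discriminants: list[int], s: list[int]) -> int:
    """Number of D divisible by at least one modulus in s (the excerpt's N_p(S))."""
    return sum(1 for d in discriminants if any(d % q == 0 for q in s))


def legendre(a: int, p: int) -> int:
    """Legendre symbol (a/p) for an odd prime p, via Euler's criterion."""
    r = pow(a, (p - 1) // 2, p)  # r is 0, 1, or p - 1
    return -1 if r == p - 1 else r


sample_ds = [3, 4, 7, 8, 11, 15, 19, 20, 24]  # hypothetical D values
print(count_divisible(sample_ds, [3, 4]))
print(legendre(11, 3))  # 11 ≡ -1 (mod 12), so 11 ≡ 2 (mod 3): a nonresidue
```

A number N ≡ -1 (mod 12) is a nonresidue modulo 3, which is the kind of small-prime obstruction the excerpt describes.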