## Development of Parallel Methods for a 1024-Processor Hypercube (1988)

Venue: SIAM Journal on Scientific and Statistical Computing

Citations: 119 (3 self)

### BibTeX

@ARTICLE{Gustafson88developmentof,

author = {John L. Gustafson and Gary R. Montry and Robert E. Benner and C. W. Gear},

title = {Development of Parallel Methods for a 1024-Processor Hypercube},

journal = {SIAM Journal on Scientific and Statistical Computing},

year = {1988},

volume = {9},

pages = {609--638}

}

### Abstract

paper. JLG 1995)

### Citations

303
Analysis of Numerical Methods
- Isaacson, Keller
- 1966
Citation Context: ...computed explicitly from $F$ and $F_{\text{old}}$. Hence, only two timesteps need to be maintained in memory simultaneously (“leapfrog” method). There is ample literature regarding the convergence of this method [8, 15] as a function of $c^2$, $h$, and $\Delta t$. For example, it is necessary (but not sufficient) that $(\Delta t)^2 \le (h/c)^2 / 2$ (CFL condition). We use constant $c$ and $(\Delta t)^2 = (h/c)^2 / 2$ in our benchmark. ...
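
The leapfrog update described in the context above can be sketched in a few lines. This is a minimal illustration, not the paper's code: it assumes a two-dimensional square grid, a standard five-point Laplacian, and boundary values fixed at zero, none of which the excerpt confirms. The names `leapfrog_step` and `courant_sq` are hypothetical. The only property taken from the excerpt is that the new timestep depends on the current and previous timesteps alone, subject to the stated bound $(\Delta t)^2 \le (h/c)^2 / 2$.

```python
def leapfrog_step(f, f_old, courant_sq):
    """Advance the 2D wave equation by one timestep (illustrative sketch).

    f, f_old   : square lists of lists with the current and previous timesteps.
    courant_sq : (c * dt / h) ** 2. The bound quoted above is equivalent to
                 courant_sq <= 0.5 (necessary, not sufficient).
    Boundary values are held at zero, which is an assumption of this sketch.
    """
    if courant_sq > 0.5:
        raise ValueError("CFL condition violated: (c*dt/h)^2 must be <= 1/2")
    n = len(f)
    f_new = [[0.0] * n for _ in range(n)]
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            # Five-point discrete Laplacian (assumed stencil).
            laplacian = (f[i + 1][j] + f[i - 1][j]
                         + f[i][j + 1] + f[i][j - 1]
                         - 4.0 * f[i][j])
            # Only f and f_old are needed to form the new timestep.
            f_new[i][j] = 2.0 * f[i][j] - f_old[i][j] + courant_sq * laplacian
    return f_new


if __name__ == "__main__":
    grid = [[0.0] * 5 for _ in range(5)]
    grid[2][2] = 1.0                      # unit pulse at the grid center
    print(leapfrog_step(grid, grid, 0.5)[2])
```

Passing `courant_sq = 0.5` corresponds to the benchmark choice $(\Delta t)^2 = (h/c)^2 / 2$ quoted in the context.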

218
The Cosmic Cube
- Seitz
- 1985
Citation Context: ...ssors, rather than Single-Instruction, Multiple-Data (SIMD) systems of one-bit processors such as the Goodyear MPP or Connection Machine. The suitability of parallel architectures, such as hypercubes [20], of up to 64 processors has been demonstrated on a wide range of applications [5, 9, 10, 13, 14, 16]. The focus here is on the 1024-processor environment, which is very unforgiving of old-fashioned s...

198
Validity of the single-processor approach to achieving large-scale computing capabilities
- Amdahl
- 1967
Citation Context: ...y sequential aspect of a program. It also leads one to reexamine the traditional paradigm for measuring parallel processor performance. In this paper, we examine the relationship between Amdahl’s law [1] and two models of parallel performance [12]. We note that it can be much easier to achieve a high degree of parallelism than one might infer from Amdahl’s law. It is often stated that production scientific programs have a substantial (several ...

100
Flux corrected transport: Shasta, a fluid transport algorithm that works
- Boris, Book
- 1973
Citation Context: .... The solution of systems of hyperbolic equations often arises in simulations of fluid flow. One technique which has proved successful with hyperbolic fluid problems is Flux-Corrected Transport (FCT) [2, 7]. Such simulations model fluid behavior that is dominated either by large gradients or by strong shocks. The particular application here involves a nonconducting, compressible ideal gas under unstable...

66
Data communications in hypercubes
- Saad, Schultz
- 1989
Citation Context: ...e performed in $\log_2 P$ time using a series of exchanges across the dimensions of the cube. For example, the accumulation of inner products is performed efficiently by means of bidirectional exchanges [18] of values along successive dimensions of the hypercube, interspersed with summation of the newly-acquired values (Fig. 5). This algorithm requires the optimal number of communication steps, $\log_2 P$. ...
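
The exchange pattern in the context above can be simulated serially to see why $\log_2 P$ steps suffice. The sketch below is an assumption-laden illustration rather than the paper's implementation: a Python list stands in for the per-processor values, list indexing replaces message passing, and the name `hypercube_allsum` is hypothetical. In step $k$, each node pairs with the neighbor whose rank differs in bit $k$ and adds that neighbor's partial sum to its own.

```python
def hypercube_allsum(values):
    """Simulate a global sum by bidirectional exchange on a hypercube.

    values : one number per simulated processor; the count must be a power
             of two. Returns (per-node results, number of exchange steps).
    """
    p = len(values)
    if p == 0 or p & (p - 1):
        raise ValueError("number of processors must be a power of two")
    current = list(values)
    steps = 0
    dim = 1
    while dim < p:
        # Each node swaps partial sums with its neighbor across this
        # dimension (rank XOR dim) and adds the received value to its own.
        current = [current[rank] + current[rank ^ dim] for rank in range(p)]
        dim <<= 1
        steps += 1
    return current, steps


if __name__ == "__main__":
    sums, steps = hypercube_allsum([1, 2, 3, 4, 5, 6, 7, 8])
    print(sums, steps)
```

With eight simulated nodes the loop runs three times, and every node ends up holding the full sum, so no separate broadcast step is needed.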

11
Programming Concurrent Processors
- Fox
- 1989
Citation Context: ...essors such as the Goodyear MPP or Connection Machine. The suitability of parallel architectures, such as hypercubes [20], of up to 64 processors has been demonstrated on a wide range of applications [5, 9, 10, 13, 14, 16]. The focus here is on the 1024-processor environment, which is very unforgiving of old-fashioned serial programming habits. The large number of processors forces one to reexamine every sequential asp...

7
Development and analysis of scientific application programs on a 1024-processor hypercube. SAND 88-0317. Sandia National Laboratories
- Gustafson, et al.
- 1988
Citation Context: ...performance, parallel computing, structural analysis, supercomputing, wave mechanics. AMS(MOS) subject classifications. 65W05, 68M20, 68Q05, 68Q10. 1. Introduction. We are currently engaged in research [5] to develop new mathematical methods, algorithms, and application programs for execution on massively parallel systems. In this paper, massive parallelism refers to general-purpose Multiple-Instructio...

7
JAC – A Two-Dimensional Finite Element Computer Program for the Nonlinear Response
- Biffle
- 1984
Citation Context: ...ematical model and Preconditioned Conjugate Gradients (PCG) to solve the resulting large, sparse set of linear equations, $Ax = b$. These methods are used in the solid mechanics application program JAC [6], a highly-vectorized serial production program used on the CRAY X-MP, as well as the new, highly parallel BEAM program. [Figure: measured vs. linear speedup at 4, 16, 64, 256, and 1024 processors; plotted labels 3.99x, 15.9x, 63.3x, 253x, 1009x.] Jacobi (main diag...
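
For readers unfamiliar with PCG, the following is a textbook sketch of conjugate gradients with a Jacobi (main-diagonal) preconditioner applied to $Ax = b$. It is not the JAC or BEAM code and not the restructured variant the paper goes on to describe. It assumes a small dense symmetric positive definite matrix stored as a list of rows, and the names `jacobi_pcg`, `dot`, and `matvec` are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def matvec(a, v):
    """Dense matrix-vector product; `a` is a list of rows."""
    return [dot(row, v) for row in a]


def jacobi_pcg(a, b, tol=1e-10, max_iter=1000):
    """Solve a x = b for symmetric positive definite `a` (textbook PCG)."""
    n = len(b)
    inv_diag = [1.0 / a[i][i] for i in range(n)]    # Jacobi preconditioner
    x = [0.0] * n
    r = list(b)                                     # residual for x = 0
    z = [inv_diag[i] * r[i] for i in range(n)]      # preconditioned residual
    p = list(z)                                     # first search direction
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 <= tol:                 # convergence test
            break
        ap = matvec(a, p)
        alpha = rz / dot(p, ap)                     # step length
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * ap[i] for i in range(n)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        beta = rz_new / rz                          # direction update factor
        p = [z[i] + beta * p[i] for i in range(n)]
        rz = rz_new
    return x


if __name__ == "__main__":
    print(jacobi_pcg([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0]))
```

Each iteration needs a handful of inner products, which on a distributed machine are the points where processors must communicate.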

3
Amdahl's Law Re-Evaluated
- Gustafson
- 1988
Citation Context: ...eads one to reexamine the traditional paradigm for measuring parallel processor performance. In this paper, we examine the relationship between Amdahl’s law [1] and two models of parallel performance [12]. We note that it can be much easier to achieve a high degree of parallelism than one might infer from Amdahl’s law. It is often stated that production scientific programs have a substantial (several ...

3
Parallelizing conjugate gradient for the
- Seager
- 1986
Citation Context: ...on of three data items (§ 4.4): inner products $z \cdot b$, $z \cdot Az$, and $p \cdot Az$ used in PCG iteration and convergence test (step C11). Parallel PCG algorithms have been previously reported for the CRAY X-MP [19] and ELXSI 6400 [16]. Another investigation [3] recognized that the algorithm can be restructured to reduce memory and communication traffic, as well as synchronization. We find that, by precalculatin...

2
A modified conjugate gradient solver for very large systems
- Barkai, Moriarty, et al.
- 1985
Citation Context: ... $\cdot\, b$, $z \cdot Az$, and $p \cdot Az$ used in PCG iteration and convergence test (step C11). Parallel PCG algorithms have been previously reported for the CRAY X-MP [19] and ELXSI 6400 [16]. Another investigation [3] recognized that the algorithm can be restructured to reduce memory and communication traffic, as well as synchronization. We find that, by precalculating the quantity $Az$ in place of $Ap$ in the PCG ite...

1
A two-dimensional flux-corrected transport solver for convectively dominated flows
- Baer, Gross
- 1986
Citation Context: .... The solution of systems of hyperbolic equations often arises in simulations of fluid flow. One technique which has proved successful with hyperbolic fluid problems is Flux-Corrected Transport (FCT) [2, 7]. Such simulations model fluid behavior that is dominated either by large gradients or by strong shocks. The particular application here involves a nonconducting, compressible ideal gas under unstable...

1
Private communication, Sandia National Laboratories
- Barsis
- 1987
Citation Context: ...nd $p'$ to represent serial and parallel time spent on the parallel system, $s' + p' = 1$, then a uniprocessor requires time $s' + p'P$ to perform the task. This reasoning gives an alternative to Amdahl’s law [4], [12]: (2) Scaled speedup $= (s' + p'P) / (s' + p') = P + (1 - P)\,s'$. [Figure: speedup versus serial fraction, $s'$.] In contrast to the curve for (1), this function is simply a line, ...
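
A short numerical comparison makes the contrast between the two models concrete. The sketch below evaluates equation (2) as quoted above alongside the standard form of Amdahl's law, $1 / (s + (1 - s)/P)$. Treating that standard form as the paper's equation (1) is an assumption, since (1) does not appear in the excerpt. The function names are hypothetical.

```python
def amdahl_speedup(serial_fraction, processors):
    """Standard Amdahl's law: serial fraction measured on one processor."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)


def scaled_speedup(serial_fraction, processors):
    """Equation (2): serial fraction s' measured on the parallel system."""
    return processors + (1 - processors) * serial_fraction


if __name__ == "__main__":
    P = 1024
    for s in (0.0, 0.01, 0.02, 0.03, 0.04):
        print(f"s = {s:.2f}  Amdahl: {amdahl_speedup(s, P):8.1f}  "
              f"scaled: {scaled_speedup(s, P):8.1f}")
```

The scaled model falls off linearly in $s'$ with slope $1 - P$, whereas the Amdahl form drops steeply for even small serial fractions at $P = 1024$.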

1
Finite difference solutions of the acoustic wave equation on a concurrent processor, Caltech publication HM–89
- Clayton
- 1985
Citation Context: ...computed explicitly from $F$ and $F_{\text{old}}$. Hence, only two timesteps need to be maintained in memory simultaneously (“leapfrog” method). There is ample literature regarding the convergence of this method [8, 15] as a function of $c^2$, $h$, and $\Delta t$. For example, it is necessary (but not sufficient) that $(\Delta t)^2 \le (h/c)^2 / 2$ (CFL condition). We use constant $c$ and $(\Delta t)^2 = (h/c)^2 / 2$ in our benchmark. ...

1
Parallel processing on an ELXSI
- Montry, Benner
- 1987
Citation Context: ...essors such as the Goodyear MPP or Connection Machine. The suitability of parallel architectures, such as hypercubes [20], of up to 64 processors has been demonstrated on a wide range of applications [5, 9, 10, 13, 14, 16]. The focus here is on the 1024-processor environment, which is very unforgiving of old-fashioned serial programming habits. The large number of processors forces one to reexamine every sequential asp...

1
Derivation of stiffness matrices for problems in plane elasticity by Galerkin’s method
- Szabo, Lee
- 1969
Citation Context: ...[FIG. 11. Fluid dynamics problem speedups.] 7.2. Mathematical formulation. The differential equations of equilibrium in plane elasticity, which are used in the BEAM program, are presented in [21] with their finite element formulation. The equations can be summarized as (16a) $\alpha u_{xx} + \beta v_{xy} + G(u_{yy} + v_{xy}) + F_x = 0$ and (16b) $\beta u_{xy} + \alpha v_{yy} + G(u_{xy} + v_{xx}) + F_y = 0$. [Figure: scaled speedup plot.] ...