Reevaluating Amdahl's Law
Communications of the ACM, 1988
Abstract

Cited by 229 (4 self)
At Sandia National Laboratories, we are currently engaged in research involving massively parallel processing. There is considerable skepticism regarding the viability of massive parallelism; the skepticism centers around Amdahl’s law, an argument put forth by Gene Amdahl in 1967 [1] that even when the fraction of serial work in a given problem is small, say s, the maximum speedup obtainable from even an infinite number of parallel processors is only 1/s. We now have timing results for a 1024-processor system that demonstrate that the assumptions underlying Amdahl’s 1967 argument are inappropriate for the current approach to massive ensemble parallelism. If N is the number of processors, s is the amount of time spent (by a serial processor) on serial parts of a program, and p is the amount of time spent (by a serial processor) on parts of the program that can be done in parallel, then Amdahl’s law says that speedup is given by Speedup = (s + p) / (s + p/N) = 1 / (s + p/N), where we have set total time s + p = 1 for algebraic simplicity. For N = 1024, this is an unforgivingly steep function of s near s = 0 (see Figure 1). The steepness of the graph near s = 0 (approximately -N^2) implies that very few problems will experience even a 100-fold speedup. Yet for three very practical applications (s = 0.4-0.8 percent) used at Sandia, we have achieved speedup factors on a 1024-processor hypercube which we believe are unprecedented [2]: 1021 for beam stress analysis using conjugate gradients, 1020 for baffled surface wave simulation using explicit finite differences, and 1016 for unstable fluid flow using flux-corrected transport. How can this be, when Amdahl’s argument would predict otherwise?
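The unforgiving shape of this curve is easy to check numerically. A minimal sketch (the function name is ours, not from the paper):

```python
# Amdahl's law as stated in the abstract: Speedup = 1 / (s + p/N) with
# s + p = 1, so p = 1 - s. Near s = 0 the curve drops with slope about -N^2.

def amdahl_speedup(s, n):
    """Predicted speedup for serial fraction s on n processors (s + p = 1)."""
    return 1.0 / (s + (1.0 - s) / n)

# With N = 1024, even s = 0.4 percent caps the predicted speedup near 200,
# far below the roughly 1020-fold speedups reported for the Sandia codes.
print(amdahl_speedup(0.004, 1024))
```

For s = 0.004 this evaluates to about 201, which is exactly the tension the abstract highlights: the fixed-size Amdahl model predicts a roughly 200-fold speedup where a roughly 1020-fold speedup was measured.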
Development of Parallel Methods for a 1024-Processor Hypercube
SIAM Journal on Scientific and Statistical Computing, 1988
Experience with Automatic, Dynamic Load Balancing and Adaptive Finite Element Computation
1994
Abstract

Cited by 11 (2 self)
We describe a fine-grained, element-based data migration system that dynamically maintains global load balance on massively parallel MIMD computers, and is effective in the presence of changing work loads. Global load balance is achieved by overlapping neighborhoods of processors, where each neighborhood performs local load balancing. The method supports a large class of finite element and finite difference based applications and provides an automatic element management system into which applications are easily integrated. We test the system's effectiveness with an adaptive order ( ) refinement Discontinuous Galerkin finite element method for the solution of hyperbolic conservation laws on a 1024-processor nCUBE/2. The results show the significant reduction in execution time synergistically obtained by combining the automatic data migration system and the adaptive finite element method.
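The overlapping-neighborhood idea can be illustrated with a toy sketch. This is our own simplification (one diffusion-style round on a ring of processors with integer element counts), not the paper's actual migration system:

```python
# Toy sketch (assumptions: ring topology, integer element counts): each
# processor heavier than its right neighbor sheds half the imbalance to it.
# Repeated local rounds drive per-processor loads toward the global mean,
# which is the effect the neighborhood-based scheme achieves without any
# global coordination.

def local_balance_step(loads):
    """One local balancing round; returns the new per-processor loads."""
    n = len(loads)
    new = list(loads)
    for i in range(n):
        j = (i + 1) % n  # right neighbor on the ring
        if loads[i] > loads[j]:
            transfer = (loads[i] - loads[j]) // 2
            new[i] -= transfer
            new[j] += transfer
    return new

print(local_balance_step([12, 0, 4, 0]))  # -> [6, 6, 2, 2]
```

Note that the total element count is conserved by construction: every element subtracted from one processor is added to a neighbor.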
Performance Evaluation for Parallel Systems: A Survey
1997
Abstract

Cited by 8 (0 self)
Performance is often a key factor in determining the success of a parallel software system. Performance evaluation...
Toward the Design of Large-Scale, Shared-Memory Multiprocessors
Dept. of Comput. Sci., Univ. of Wisconsin-Madison, 1992
Abstract

Cited by 5 (0 self)
The state of the art in multiprocessing today employs thousands of high-performance microprocessors. As system sizes continue to grow, increasing care must be taken to design cost-efficient, balanced (i.e., scalable) systems. This thesis addresses the scalability of shared-memory multiprocessors, presenting a practical treatment of scalability, and proceeding to focus on aspects of two critical areas of large-scale system design: interconnection networks and cache coherence mechanisms. In these areas, pipelined-channel interconnection networks and pruning-cache directories are investigated, respectively. Pipelined-channel interconnection networks allow multiple bits to be simultaneously in flight on a single wire, decoupling channel throughput from channel latency. The first published performance analysis of the SCI ring, a new IEEE standard employing pipelined channels, is presented. This study serves as a proof of concept for pipelined-channel networks, demonstrating their very high p...
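The throughput/latency decoupling claim can be made concrete with a back-of-the-envelope model; the numbers and function names below are illustrative assumptions, not figures from the thesis:

```python
# With pipelining, many bits are in flight at once: total transfer time is
# one wire latency plus the message length at the channel's signaling rate.
# Without pipelining, each bit pays the full wire latency, so the wire
# latency bounds the achievable throughput.

def pipelined_time_ns(bits, rate_bits_per_ns, wire_latency_ns):
    """Transfer time when bits stream down the wire back-to-back."""
    return wire_latency_ns + bits / rate_bits_per_ns

def unpipelined_time_ns(bits, wire_latency_ns):
    """Transfer time when each bit must traverse the wire before the next."""
    return bits * wire_latency_ns

# 1024-bit message, 1 bit/ns signaling, 10 ns wire: pipelining pays the
# 10 ns latency once, not once per bit.
print(pipelined_time_ns(1024, 1.0, 10.0))   # 1034.0
print(unpipelined_time_ns(1024, 10.0))      # 10240.0
```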
Technical Note: Reevaluating Amdahl's Law
Abstract
in research involving massively parallel processing. There is considerable skepticism regarding the viability of massive parallelism; the skepticism centers around Amdahl’s law, an argument put forth by Gene Amdahl in 1967 [1] that even when the fraction of serial work in a given problem is small, say, s, the maximum speedup obtainable from even an infinite number of parallel processors is only 1/s. We now have timing results for a 1024-processor system that demonstrate that the assumptions underlying Amdahl’s 1967 argument are inappropriate for the current approach to massive ensemble parallelism. If N is the number of processors, s is the amount of time spent (by a serial processor) on serial parts of a program, and p is the amount of time spent (by a serial processor) on parts of the program that can be done in parallel, then Amdahl’s law says that speedup is given by Speedup = (s + p)/(s + p/N) = 1/(s + p/N), where we have set total time s + p = 1 for algebraic simplicity. For N = 1024 this is an unforgivingly steep function of s near s = 0 (see Figure 1). The steepness of the graph near s = 0 (approximately -N^2) implies that very few problems will experience even a 100-fold speedup. Yet, for three very practical applications (s = 0.4-0.8 percent) used at Sandia, we have achieved speedup factors on a 1024-processor hypercube that we believe are unprecedented [2]: 1021 for beam stress analysis using conjugate gradients, 1020 for baffled surface wave simulation using explicit finite differences, and 1016 for unstable fluid flow using flux-corrected transport. How can this be, when Amdahl’s argument would predict otherwise? The expression and graph both contain the implicit assumption that p is independent of N, which is virtually never the case. One does not take a fixed-sized problem and run it on various numbers of processors
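The snippet breaks off just before the paper's resolution: if the problem size is scaled with N so that the serial fraction s' is measured on the parallel machine (with s' + p' = 1), one obtains Gustafson's scaled speedup, s' + p'N = N + (1 - N)s', which grows almost linearly in N for small s'. A minimal numeric check (the function name is ours):

```python
# Gustafson's scaled speedup: with s' + p' = 1 measured on the parallel
# machine, scaled speedup = s' + p' * N = N + (1 - N) * s'. Unlike the
# fixed-size Amdahl curve, it is nearly linear in N for small s'.

def scaled_speedup(s_prime, n):
    """Gustafson's scaled speedup for serial fraction s_prime on n processors."""
    return n + (1 - n) * s_prime

# s' = 0.4 percent on 1024 processors gives about 1020, consistent with the
# 1016-1021 speedups measured at Sandia.
print(scaled_speedup(0.004, 1024))
```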