Results 1 - 7 of 7
Reevaluating Amdahl’s law
 Commun. ACM
, 1988
"... At Sandia National Laboratories, we are currently engaged in research involving massively parallel processing. There is considerable skepticism regarding the viability of massive parallelism; the skepticism centers around Amdahl’s law, an argument put forth by Gene Amdahl in 1967 [1] that even w ..."
Abstract

Cited by 290 (4 self)
At Sandia National Laboratories, we are currently engaged in research involving massively parallel processing. There is considerable skepticism regarding the viability of massive parallelism; the skepticism centers around Amdahl’s law, an argument put forth by Gene Amdahl in 1967 [1] that even when the fraction of serial work in a given problem is small, say, s, the maximum speedup obtainable from even an infinite number of parallel processors is only 1/s. We now have timing results for a 1024-processor system that demonstrate that the assumptions underlying Amdahl’s 1967 argument are inappropriate for the current approach to massive ensemble parallelism. If N is the number of processors, s is the amount of time spent (by a serial processor) on serial parts of a program, and p is the amount of time spent (by a serial processor) on parts of the program that can be done in parallel, then Amdahl’s law says that speedup is given by Speedup = (s + p)/(s + p/N) = 1/(s + p/N), where we have set total time s + p = 1 for algebraic simplicity. For N = 1024 this is an unforgivingly steep function of s near s = 0 (see Figure 1). The steepness of the graph near s = 0 (approximately N²) implies that very few problems will experience even a 100-fold speedup. Yet, for three very practical applications (s = 0.4–0.8 percent) used at Sandia, we have achieved speedup factors on a 1024-processor hypercube that we believe are unprecedented [2]: 1021 for beam stress analysis using conjugate gradients, 1020 for baffled surface wave simulation using explicit finite differences, and 1016 for unstable fluid flow using flux-corrected transport. How can this be, when Amdahl’s argument would predict otherwise? The expression and graph both contain the implicit assumption that p is independent of N, which is virtually never the case. One does not take a fixed-sized problem and run it on various numbers of processors ...
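The two speedup models contrasted in the abstract above can be sketched in a few lines of Python. The function names are mine; `scaled_speedup` is the fixed-time model the paper argues for when the parallel part p grows with N:

```python
def amdahl_speedup(s, n):
    """Fixed-size speedup: serial fraction s, n processors, total time s + p = 1."""
    p = 1.0 - s
    return 1.0 / (s + p / n)

def scaled_speedup(s, n):
    """Scaled speedup: the parallel part grows with n, so the serial fraction
    is measured on the parallel machine; equivalently n - (n - 1) * s."""
    return s + (1.0 - s) * n

n = 1024
# Even a 1 percent serial fraction caps fixed-size speedup near 91x on 1024 processors...
print(round(amdahl_speedup(0.01, n), 1))  # ≈ 91.2
# ...while the scaled model predicts roughly 1014x for the same s.
print(round(scaled_speedup(0.01, n), 1))  # ≈ 1013.8
```

This makes the "unforgivingly steep" point concrete: near s = 0 the fixed-size curve drops from 1024 at a rate of roughly N², while the scaled curve falls off only linearly in s.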
Development of Parallel Methods for a 1024-Processor Hypercube
 SIAM Journal on Scientific and Statistical Computing
, 1988
Experience with Automatic, Dynamic Load Balancing and Adaptive Finite Element Computation
, 1994
"... We describe a fine-grained, element-based data migration system that dynamically maintains global load balance on massively parallel MIMD computers, and is effective in the presence of changing workloads. Global load balance is achieved by overlapping neighborhoods of processors, where each neighbo ..."
Abstract

Cited by 11 (2 self)
We describe a fine-grained, element-based data migration system that dynamically maintains global load balance on massively parallel MIMD computers, and is effective in the presence of changing workloads. Global load balance is achieved by overlapping neighborhoods of processors, where each neighborhood performs local load balancing. The method supports a large class of finite element and finite difference based applications and provides an automatic element management system to which applications are easily integrated. We test the system's effectiveness with an adaptive order refinement Discontinuous Galerkin finite element method for the solution of hyperbolic conservation laws on a 1024-processor nCUBE 2. The results show the significant reduction in execution time synergistically obtained by combining the automatic data migration system and the adaptive finite element method.
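As a rough illustration of the neighborhood idea above, here is a toy diffusion-style sketch in which each processor repeatedly shifts part of its load surplus toward lighter neighbors; this is an assumption-laden stand-in, not the paper's element-migration system, and the ring topology and names are mine:

```python
def balance_step(loads, neighbors, alpha=0.5):
    """One local balancing step: each processor sends a fraction alpha of its
    load surplus, split across its neighbors, toward each lighter neighbor.
    Reads the old loads and applies all transfers at once (Jacobi-style)."""
    delta = [0.0] * len(loads)
    for i, nbrs in neighbors.items():
        for j in nbrs:
            if loads[i] > loads[j]:
                move = alpha * (loads[i] - loads[j]) / len(nbrs)
                delta[i] -= move
                delta[j] += move
    return [l + d for l, d in zip(loads, delta)]

loads = [10.0, 2.0, 4.0, 0.0]                        # work units per processor
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}  # 1-D ring of 4 processors
for _ in range(50):
    loads = balance_step(loads, ring)
# Total work is conserved while the imbalance shrinks toward the mean (4.0).
```

Because every transfer is purely local, no processor needs global load information, which is the property that makes such schemes attractive at 1024-processor scale.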
Performance Evaluation for Parallel Systems: A Survey
, 1997
"... Performance is often a key factor in determining the success of a parallel software system. Performance evaluation... ..."
Abstract

Cited by 10 (0 self)
Performance is often a key factor in determining the success of a parallel software system. Performance evaluation...
Toward the Design of Large-Scale, Shared-Memory Multiprocessors
 Dept. of Comput. Sci., Univ. of Wisconsin-Madison
, 1992
"... The state-of-the-art in multiprocessing today employs thousands of high-performance microprocessors. As system sizes continue to grow, increasing care must be taken to design cost-efficient, balanced (i.e. scalable) systems. This thesis addresses the scalability of shared-memory multiprocessors, pres ..."
Abstract

Cited by 5 (0 self)
The state-of-the-art in multiprocessing today employs thousands of high-performance microprocessors. As system sizes continue to grow, increasing care must be taken to design cost-efficient, balanced (i.e. scalable) systems. This thesis addresses the scalability of shared-memory multiprocessors, presenting a practical treatment of scalability, and proceeding to focus on aspects of two critical areas of large-scale system design: interconnection networks and cache coherence mechanisms. In these areas, pipelined-channel interconnection networks and pruning-cache directories are investigated, respectively. Pipelined-channel interconnection networks allow multiple bits to be simultaneously in flight on a single wire, decoupling channel throughput from channel latency. The first published performance analysis of the SCI ring, a new IEEE standard employing pipelined channels, is presented. This study serves as a proof-of-concept for pipelined-channel networks, demonstrating their very high p...
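The throughput/latency decoupling claimed for pipelined channels can be seen with a back-of-the-envelope model; the function and all numbers below are illustrative assumptions, not figures from the thesis:

```python
def transfer_time_ns(bits, latency_ns, bandwidth_gbps, pipelined):
    """Time to move `bits` across one channel (toy model).
    Non-pipelined: only one bit may be in flight, so every bit pays the wire latency.
    Pipelined: bits are streamed back-to-back, so the latency is paid only once."""
    per_bit_ns = 1.0 / bandwidth_gbps  # 1 Gbit/s -> 1 ns per bit
    if pipelined:
        return latency_ns + bits * per_bit_ns
    return bits * (latency_ns + per_bit_ns)

# 64-bit packet over a 10 ns wire at 1 Gbit/s:
print(transfer_time_ns(64, 10.0, 1.0, pipelined=True))   # 74.0 ns
print(transfer_time_ns(64, 10.0, 1.0, pipelined=False))  # 704.0 ns
```

In the pipelined case the wire latency becomes an additive constant rather than a per-bit multiplier, so sustained throughput is set by the signaling rate alone, however long the wire.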
VIDEO CONTENT ANALYSIS FOR AUTOMATED DETECTION AND TRACKING OF HUMANS IN CCTV SURVEILLANCE APPLICATIONS
"... The problems of achieving high detection rate with low false alarm rate for human detection and tracking in video sequences, performance scalability, and improving response time are addressed in this thesis. The underlying causes are the effect of scene complexity, human-to-human interactions, scale ..."
Abstract
The problems of achieving high detection rate with low false alarm rate for human detection and tracking in video sequences, performance scalability, and improving response time are addressed in this thesis. The underlying causes are the effect of scene complexity, human-to-human interactions, scale changes, and scene background-human interactions. A two-stage processing solution, namely, human detection, and human tracking with two novel pattern classifiers is presented. Scale-independent human detection is achieved by processing in the wavelet domain using square wavelet features. These features used to characterise human silhouettes at different scales are similar to rectangular features used in [Viola 2001]. At the detection stage two detectors are combined to improve detection rate. The first detector is based on the shape outline of humans extracted from the scene using a reduced-complexity outline extraction algorithm. A shape mismatch measure is used to differentiate between the human and the background class. The second detector uses rectangular features as primitives for silhouette description in the wavelet domain. The marginal distribution of features collocated at a particular position on a candidate human (a patch of the image) is used to ...
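Rectangular features of the [Viola 2001] kind are typically evaluated with an integral image, which reduces any rectangle sum to four array lookups; a minimal sketch of that standard primitive (function names are mine, and this is not the thesis's own code):

```python
def integral_image(img):
    """ii[y][x] = sum of img over all rows < y and cols < x.
    Padded with an extra zero row and column so lookups need no bounds checks."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the w-by-h rectangle with top-left corner (x, y):
    four lookups, regardless of rectangle size."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
print(rect_sum(ii, 0, 0, 2, 2))  # 1 + 2 + 4 + 5 = 12
print(rect_sum(ii, 1, 1, 2, 2))  # 5 + 6 + 8 + 9 = 28
```

The constant-time rectangle sum is what makes evaluating many rectangular features per candidate window affordable, which bears directly on the response-time concern raised in the abstract.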
Technical Note: Reevaluating Amdahl's Law
"... in research involving massively parallel processing. There is considerable skepticism regarding the viability of massive parallelism; the skepticism centers around Amdahl’s law, an argument put forth by Gene Amdahl in 1967 [1] that even when the fraction of serial work in a given problem is small, ..."