## Performance Evaluation for Parallel Systems: A Survey (1997)

Citations: | 9 - 0 self |

### BibTeX

@TECHREPORT{Hu97performanceevaluation,

author = {Lei Hu and Ian Gorton},

title = {Performance Evaluation for Parallel Systems: A Survey},

institution = {},

year = {1997}

}

### Abstract

Performance is often a key factor in determining the success of a parallel software system. Performance evaluation...

### Citations

1500 |
A k-means clustering algorithm
- Hartigan, Wong
Citation Context ... the workload components can be greatly reduced. Everitt [Everitt74] described various clustering techniques. Also, various algorithms that can be applied to workload characterization can be found in [Hartigan75]. They fall into two classes: hierarchical and nonhierarchical. The minimum spanning tree method is one of the most widely used hierarchical algorithms, while the k-means method is one of the most wid... |

859 | Virtual Time
- Jefferson
- 1985
Citation Context ...ut it also introduces several new problems, such as deadlock. Two different approaches are proposed. One is known as conservative approach [Chandy79], and the other optimistic (or Time Warp) approach [Jefferson85]. Misra’s survey paper [Misra86] describes the basic idea of PDES, and techniques of deadlock detection. Fujimoto’s paper [Fujimoto90] surveys existing approaches and analyzes the merits and drawbacks... |

731 |
Petri Net Theory and the Modeling of Systems
- Peterson
- 1981
Citation Context ...d this matter. Different firing rules are proposed to make the Petri net models suited for different modeling situations (we will discuss this later). For a formal definition, readers are referred to [Peterson81]. When Petri nets are used to model systems, the states of the system are defined by the markings. Transitions are used to represent events, and tokens in a place represent the condition that the corr... |

691 |
Parallel discrete event simulation
- Fujimoto
- 1990
Citation Context ...ach [Chandy79], and the other optimistic (or Time Warp) approach [Jefferson85]. Misra’s survey paper [Misra86] describes the basic idea of PDES, and techniques of deadlock detection. Fujimoto’s paper [Fujimoto90] surveys existing approaches, analyzes the merits and drawbacks of various techniques, and addresses a variety of amendments. These two papers are highly recommended to interested readers, who ... |

643 |
The art of computer systems performance analysis: techniques for experimental design, measurement, simulation, and modeling
- Jain
- 1991
Citation Context ...a. These criteria are called metrics. Different metrics may result in totally different performance values. Hence, selecting proper metrics to fairly evaluate the performance of a system is difficult [Jain91]. Understanding the metrics, their relationships, and their effects on performance parameters is the first step in performance studies. Next, selecting a proper workload is almost equally important. A system is ... |

497 |
Validity of the Single Processor Approach to Achieving Large Scale Computing Capabilities
- Amdahl
- 1967
Citation Context ...rtion. If the workload is normalized to 1, i.e., $W_1 + W_n = 1$, then $W_1 = \alpha$ and $W_n = 1 - \alpha$. The speedup is $S(n) = \frac{1}{\alpha + (1 - \alpha)/n} = \frac{n}{1 + (n - 1)\alpha}$. This is known as Amdahl’s law. In 1967, Amdahl [Amdahl67] made the observation that if α is the sequential fraction in an algorithm, then no matter how many processors are used, the speedup is upper bounded by 1/α. α is called the sequential bottleneck. Two obs... |
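
The speedup formula quoted in this context, S(n) = 1 / (α + (1 − α)/n), can be sketched in a few lines. This is only an illustrative sketch; the function name `amdahl_speedup` and its argument checks are assumptions and do not come from the survey.

```python
def amdahl_speedup(alpha: float, n: int) -> float:
    """Fixed-size speedup S(n) = 1 / (alpha + (1 - alpha) / n).

    alpha is the sequential fraction of the normalized workload and
    n is the number of processors (both names are illustrative).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / (alpha + (1.0 - alpha) / n)
```

For α > 0 the returned value stays below 1/α however large `n` becomes, which is the upper bound mentioned in the context.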

471 |
Cluster Analysis
- Everitt
- 1974
Citation Context ...ral representative components may be selected from each class, according to the size of the class (relative frequency). This way, the number of the workload components can be greatly reduced. Everitt [Everitt74] described various clustering techniques. Also, various algorithms that can be applied to workload characterization can be found in [Hartigan75]. They fall into two classes: hierarchical and nonhierar... |

442 | Design and implementation of the Sun Network Filesystem
- Sandberg, Goldberg, Kleiman, et al.
- 1985
Citation Context ... to accurately reflect some real system issues, such as disk caches. Shein et al. [Shein89] proposed a synthetic program, named NFSStone, to measure the performance of SUN’s NFS (Network File System) [Sandberg85]. It uses a single client that issues various requests for file operations in order to stress and measure the performance of the server. However, using only one client is not sufficient to stress the ... |

376 |
Distributed simulation: A case study in design and verification of distributed programs
- Chandy, Misra
- 1978
Citation Context ...me by partitioning the simulation among several processors, but it also introduces several new problems, such as deadlock. Two different approaches are proposed. One is known as conservative approach [Chandy79], and the other optimistic (or Time Warp) approach [Jefferson85]. Misra’s survey paper [Misra86] describes the basic idea of PDES, and techniques of deadlock detection. Fujimoto’s paper [Fujimoto90] s... |

328 | Performance of various computers using standard linear equations software, Report CS-89-85
- Dongarra
Citation Context ...in91]. The results are measured in Kdhrystones/s (Dhrystone Instructions Per Second). The disadvantages found in Whetstone also apply to this benchmark. The LINPACK benchmark was designed by Dongarra [Dongarra83]. It contains a number of programs that solve dense systems of linear equations using the LINPACK subroutine package. LINPACK is a general-purpose FORTRAN library of mathematical software for solving ... |

270 |
mixed networks of queues with different classes of customers, Journal of
- Baskett, Chandy, et al.
Citation Context ...ks are easier to analyze. Much work has been done to extend the product form solutions in order to use them in various networks. Some classical papers on this subject are [Jackson64], [Gordon67], and [Baskett75]. 6.1.3 Fundamental Operational Laws Buzen and Denning [Buzen76; Denning78] describe some influential laws, known as operational laws. These are general laws and can be applied to any network, becaus... |

258 | A Class of Generalized Stochastic Petri Nets for the Performance Evaluation of Multiprocessor Systems, retrieved from http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.90.8002 on 17/5/2013
- Marsan, Balbo, et al.
Citation Context ...bove, other proposals exist to associate time features with tokens (token-timed PNs) or even arcs (arc-timed PNs) [Ferscha95]. The stochastic Petri net (SPN), independently proposed by several authors [Ajmone-Marsan84], is another class of Petri nets. In this model, a random firing time is associated with each transition. SPNs play an important role in performance studies, because they are very powerful modeling to... |

242 | Distributed Discrete-Event Simulation
- Misra
- 1986
Citation Context ...oblems, such as deadlock. Two different approaches are proposed. One is known as conservative approach [Chandy79], and the other optimistic (or Time Warp) approach [Jefferson85]. Misra’s survey paper [Misra86] describes the basic idea of PDES, and techniques of deadlock detection. Fujimoto’s paper [Fujimoto90] surveys existing approaches and analyzes the merits and drawbacks of various techniques, and addr... |

241 |
Modeling and verification of time dependent systems using time petri nets
- Berthomieu, Diaz
- 1991
Citation Context ... general than the untimed PN. If we set the Min times of all transitions to zero, and the Max times to ∞, then these two models are equivalent. This approach has also been used by Berthomieu and Diaz [Berthomieu91]. They use an enumerative method to exhaustively validate the behavior of the model. Razouk [Razouk83] used firing times along with enabling times. A transition fires after the enabling time has elaps... |

231 | Re-evaluating Amdahl's Law
- Gustafson
- 1988
Citation Context .... The speedup becomes $S'(n) = \frac{W_1' + W_n'}{W_1 + W_n} = \frac{W_1 + nW_n}{W_1 + W_n}$. Letting $\alpha = W_1$ and $1 - \alpha = W_n$, we rewrite $S'(n)$ as $S'(n) = n + (1 - n)\alpha$. This fixed-time speedup is known as Gustafson’s scaled speedup [Gustafson88]. To keep the turnaround time unchanged, we have $W_n' = nW_n$. This means that the parallel part scales up linearly with the system size. Hence, Gustafson’s law supports scaled performance. In Amdahl’s ... |
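
As a small illustration of the fixed-time speedup S′(n) = n + (1 − n)α quoted in this context, the following sketch evaluates it directly. The function name `gustafson_speedup` and the input checks are assumptions made for this example only.

```python
def gustafson_speedup(alpha: float, n: int) -> float:
    """Fixed-time (scaled) speedup S'(n) = n + (1 - n) * alpha.

    alpha is the sequential fraction W_1 of the normalized workload and
    n is the number of processors (names are illustrative).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    return n + (1 - n) * alpha
```

Unlike the fixed-size case, this value grows linearly in `n` for any α < 1, because the parallel part of the workload is scaled with the system size.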

225 | PROTEUS: A High-Performance Parallel-Architecture Simulator - Brewer, Dellarocas, et al. - 1991 |

216 |
value analysis of closed multichain queueing networks
- Reiser, Lavenberg
Citation Context ... no need for them to wait in the queue (infinite servers available). Mean value analysis (MVA) was developed to analyze closed queueing networks. The original paper was written by Reiser and Lavenberg [Reiser80]. It gives the mean performance in a way similar to that described above. The main idea is based on a recursive algorithm. Given a system with N customers, its performance is computed using the performance fo... |

192 |
Performance analysis using stochastic Petri nets
- Molloy
- 1982
Citation Context ... be regarded as a semi-Markov process. Dugan’s ESPNs (Extended Stochastic Petri Nets) [Dugan84] use additional features, such as immediate transitions, inhibitor arcs, and probabilistic arcs. Molloy [Molloy82] uses exponentially distributed firing times associated with transitions. The author shows that the Petri nets so extended are isomorphic to continuous time Markov chains (MCs), and thus can be solved as ... |

181 |
Some Computer Organizations and Their Effectiveness
- Flynn
- 1972
Citation Context ...chitectures and exploit parallelism in order to meet the demands of high speed computing. Different systems have different architectures. Based on notions of instruction stream and data stream, Flynn [Flynn72] classified various computer architectures into four categories: SISD, SIMD, MIMD, and MISD. Of the four machine models, MIMD is most widely used for constructing machines for general-purpose computati... |

144 |
Concurrent programming: principles and practice
- Andrews
- 1991
Citation Context ... the modeling technique must have the ability to describe synchronization between parallel processes. Two forms of synchronization can be distinguished: mutual exclusion and condition synchronization [Andrews91a]. Product form queueing networks can describe mutual exclusion but cannot express condition synchronization [Jonkers94]. Because of this, other techniques must be used together with queueing models. U... |

144 | Modern factor analysis - Harman - 1967 |

138 |
Speedup versus efficiency in parallel systems
- Eager, Zahorian, et al.
- 1989
Citation Context ...speedup and efficiency. It is defined as the average number of processors that are busy during the execution time of the software system in question, given an unbounded number of available processors [Eager89]. According to this definition, the average parallelism can be written as $A = \left(\sum_{i=1}^{m} i \cdot t_i\right) \Big/ \left(\sum_{i=1}^{m} t_i\right)$. Eager et al. [Eager89] carefully examine the average parallelism and inves... |
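
A minimal sketch of the average parallelism as a time-weighted mean, A = (Σ i·t_i) / (Σ t_i). The snippet does not define t_i, so this example assumes that t_i is the time during which exactly i processors are busy; the function name `average_parallelism` is also hypothetical.

```python
def average_parallelism(times: list[float]) -> float:
    """Average parallelism A = sum(i * t_i) / sum(t_i).

    Assumption: times[i - 1] holds t_i, the time spent with exactly
    i processors busy, for i = 1..m.
    """
    total_time = sum(times)
    if total_time <= 0:
        raise ValueError("total execution time must be positive")
    weighted = sum(i * t for i, t in enumerate(times, start=1))
    return weighted / total_time
```

For example, a run that spends 2 time units at degree 1, none at degree 2, and 1 unit each at degrees 3 and 4 would be passed as `[2.0, 0.0, 1.0, 1.0]`.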

137 |
A proof for the queueing formula
- Little
- 1961
Citation Context ...0 = ρ. That is why ρ is called utilization by some authors (e.g., [Robertazzi94]). The mean number of customers in the system is given by $N = E[n] = \sum_{n=1}^{\infty} n p_n = \frac{\rho}{1 - \rho}$. Using Little’s law [Little61], $N = \lambda T$, the mean response time, $T$, is computed by $T = \frac{N}{\lambda} = \frac{1}{\mu - \lambda}$. Other statistics include the mean length of the queue, the mean waiting time, the idle time of the server, the busy period, and so ... |
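
The two results quoted here, N = ρ/(1 − ρ) and T = N/λ = 1/(µ − λ), can be checked numerically with a short sketch. It assumes a single-server queue with arrival rate λ, service rate µ, and ρ = λ/µ < 1, which is what the formulas suggest but the truncated snippet does not state; the function name `mm1_metrics` and the dictionary keys are hypothetical.

```python
def mm1_metrics(lam: float, mu: float) -> dict[str, float]:
    """Mean number in system and mean response time for a stable queue.

    Assumptions: lam is the arrival rate, mu the service rate,
    and rho = lam / mu must be below 1.
    """
    if lam <= 0 or mu <= 0:
        raise ValueError("rates must be positive")
    if lam >= mu:
        raise ValueError("queue is unstable unless lam < mu")
    rho = lam / mu
    mean_customers = rho / (1.0 - rho)          # N = rho / (1 - rho)
    mean_response_time = mean_customers / lam   # Little's law: T = N / lam
    return {
        "utilization": rho,
        "mean_customers": mean_customers,
        "mean_response_time": mean_response_time,
    }
```

With `lam=2.0` and `mu=5.0`, the response time computed through Little's law agrees with the closed form 1/(µ − λ).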

136 |
On Petri Nets with Deterministic and Exponentially Distributed Firing Times. Advances in Petri Nets, vol 266
- Marsan
- 1987
Citation Context ...alent to a continuous-time stochastic process. An example of using GSPNs in parallel program performance study can be found in [Balbo92a]. DSPNs (Deterministic and Stochastic Petri Nets) described in [Ajmone-Marsan87] are another class of SPNs, which allow both deterministic and random firing times, and thus are useful for many practical situations. The analysis tool has also been published [Lindemann94]. An appli... |

134 |
An empirical study of FORTRAN programs
- Knuth
- 1971
Citation Context ...get high efficiency, the problems are difficult to tackle properly with instruction mixes. Because of this, this technique has become obsolete. Some early papers are available to interested readers: [Knuth71], [ISE79], and [Febish81]. Synthetic Programs Synthetic programs, also called synthetic benchmarks, are programs designed to simulate real workload. They do no “useful” work, but consume amoun... |

120 | Performance Tradeoffs in Multithreaded Processors
- Agarwal
- 1992
Citation Context ... This phenomenon, or performance loss, is caused by so-called parallel overhead. Rayfield and Silverman attributed this to interprocessor communication and the sequential part of computation. Agarwal [Agarwal92] gave two reasons for the decreasing processor utilization. First, the cost of each memory access increases because network delays increase with system size. Second, as we strive for greater speedups ... |

119 |
Queuing Systems, Vol. I: Theory
- Kleinrock
- 1975
Citation Context ...from the measured data. In the test workload we can simulate the probabilities with the help of a random number generator. Detailed introductions to Markov models can be found in many books (see, e.g., [Kleinrock75]). The descriptions of the technique used in workload characterization are given in more detail in [Ferrari83; Jain91]. An application example is presented in [Agrawala78]. 4.2.4 Parallel Workload... |

119 |
Analysis of Asynchronous Concurrent Systems by Timed Petri Nets
- Ramchandani
- 1974
Citation Context .... One of the first efforts in this direction is represented by E-nets [Noe73], which associate with each transition a fixed time to specify the delay between the enabling and the firing. Ramchandani [Ramchandani74] associated a finite firing duration with each transition of the net. The firing rule of standard PNs is modified to account for the time it takes to fire a transition. Another modification is that a ... |

118 |
Computer Performance Modeling Handbook
- Lavenberg, editor
- 1983
Citation Context ...his is because whether we use simulation or analytical modeling, we must first construct a model for the system under study. Simulation is considered to be one powerful method to solve complex models [Lavenberg83]. This classification is based on the fact that measurement must be done on a real system, i.e., the system to be evaluated must exist and be available, while modeling does not require that. For the c... |

110 | Development of parallel methods for a 1024-processor hypercube - Gustafson, Montry, et al. - 1988 |

109 | Paradigms for process interaction in distributed programs
- Andrews
- 1991
Citation Context ...ever, there exist several other problems that must be considered in designing tools that monitor parallel programs. There are many parallel programming models available on computing systems (see, e.g., [Andrews91]), so tools should be highly flexible. In tuning applications, system effects must be distinguished from application bottlenecks [Ries93]. The primary motivation of building multiprocessor systems is t... |

102 |
Characterizations of parallelism in applications and their use in scheduling
- Sevcik
- 1989
Citation Context ...orkload. For the convenience of analysis, the parallelism profile can be rearranged by accumulating the time spent at each degree of parallelism. The resulting cumulative plot is referred to as shape [Sevcik89]. The following figure shows the shape of the parallelism profile shown in Figure 1 (section 3.1). [Figure: plot titled "Shape", axes "Processor" and "Time (%)"] Figure 2. The shape derived from paral... |

101 | Recoverability of communication protocols - implication of a theoretical study - Merlin, Farber - 1976 |

99 |
Computational algorithms for closed queueing networks with exponential servers
- Buzen
- 1973
Citation Context ... or $G(N) = \sum_{n_1 + \cdots + n_M = N} \prod_{i=1}^{M} y_i^{n_i}$. The problem with this formula is that to compute G(N) we must enumerate all the permutations of possible states. The convolution algorithm was thus developed by Buzen [Buzen73] to solve this problem. The idea is based on the following fact. If $g(n, k) = \sum_{n_1 + \cdots + n_k = n} \prod_{i=1}^{k} y_i^{n_i}$, then we can rewrite it in a recursive form: $g(n, k) = g(n, k - 1) + y_k \, g(n - 1, k)$, where... |
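
The recursion g(n, k) = g(n, k − 1) + y_k g(n − 1, k) quoted in this context can be sketched as a short dynamic program that returns G(N) = g(N, M). The boundary conditions g(0, k) = 1 and g(n, 0) = 0 for n > 0 are assumptions here, since the snippet is cut off before stating them; the function name `normalization_constant` is likewise hypothetical.

```python
def normalization_constant(y: list[float], total_customers: int) -> float:
    """Compute G(N) = g(N, M) with the convolution recursion.

    y[k - 1] holds y_k for station k = 1..M.
    Assumed boundary conditions: g(0, k) = 1 and g(n, 0) = 0 for n > 0.
    """
    if total_customers < 0:
        raise ValueError("total_customers must be non-negative")
    # g[n] holds g(n, k) for the stations processed so far (k = 0 initially).
    g = [1.0] + [0.0] * total_customers
    for y_k in y:
        for n in range(1, total_customers + 1):
            # g[n] still holds g(n, k - 1); g[n - 1] already holds g(n - 1, k).
            g[n] += y_k * g[n - 1]
    return g[total_customers]
```

Updating the array in place keeps the work at roughly M × N multiply-add steps, instead of enumerating every state with n_1 + ... + n_M = N.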

98 |
Designing efficient algorithms for parallel computers
- Quinn
- 1986
Citation Context ...e parallel algorithm for solving the 0/1 knapsack problem given in [Lee87] is highly scalable, for its isoefficiency function is $O(N \log N)$, while a frequently used parallel formulation of quicksort [Quinn87] has an exponential isoefficiency function and is thus poorly scalable. [Gupta91] presents a scalability analysis of four different algorithms, which shows that the isoefficiency function is a proper metric for ... |

97 | A Users' Guide to PICL: A Portable Instrumented Communication Library - Geist, Heath, et al. - 1992 |

90 | Analyzing Scalability of Parallel Algorithms and Architectures
- Kumar, Gupta
- 1991
Citation Context ...fferent conclusions. An architecture may be scalable for one algorithm, but not at all for another [Hwang93]. In the literature, much work on scalability analysis can be found. Kumar and Gupta [Kumar94] summarize the following situations where scalability analysis is found to be very useful. • Selecting the best algorithm-architecture combination for a problem under different constraints on the grow... |

85 | A study of the recoverability of computing systems - Merlin - 1974 |

85 | Measuring the parallelism available for very long instruction word architecture - Nicolau, Fisher - 1984 |

82 |
Predicting the Performance of Parallel Computations
- Mak, Lundstrom
- 1990
Citation Context ...allelism that can be obtained during the execution; and the problem size is a measure of the size of the data. Task graphs for representing workload are widely used in the literature (see, for example, [Mak90]). [Calzarossa93] gives two examples to illustrate a workload characterization process, where two typical parallel algorithms are analyzed. One is the block decomposition matrix multiplication from [F... |

77 |
Quartz: A tool for tuning parallel program performance
- Anderson, Lazowska
- 1990
Citation Context ...rmance. Obviously, good performance can easily be achieved when multiple independent sequential job streams are run. It is, however, quite difficult to get good performance from parallel applications [Anderson89]. In the discussions that follow, we assume that the entire computer is devoted to a single parallel application. In general, as more processors are added to a parallel processing system or used by a ... |

77 |
Computer systems performance evaluation
- Ferrari
- 1978
Citation Context ...oad. In modeling workload, representativeness is one of the main characteristics that should be considered. The representativeness of a workload model can be determined by an equivalence relationship [Ferrari78]. Given a system S, a set of performance indices L can be obtained from a certain workload W. Formally, a function $f_S$ can be defined for S, so that $L = f_S(W)$. If $W_r$ is a real workload and $L_r$ is the c... |

76 |
Reasoning about parallel architectures
- Collier
- 1992
Citation Context ... have no or little effect on it. Once a model has been completed, it can be solved. Quite often the model needs to be validated and modified if the result is too far away from the real world. Collier [Collier92] summarized the following four steps the modeling process must follow. (1) Decide what answers are sought. (2) Reduce the complexity of the real world to its essence, eliminating the irrelevant and ap... |

75 | On the scalability of FFT on parallel computers
- Gupta, Kumar
- 1990
Citation Context ...s isoefficiency function is a proper metric for scalability analysis. For a detailed description of the isoefficiency concept and its applications, the following papers are recommended: [Grama93] and [Gupta93a]. 3.4 Scalability Analysis Scalability analysis plays an important role in performance evaluation of large parallel systems. On large systems, one key issue is how to effectively utilize the proces... |

73 | Workload characterization: A survey
- Calzarossa, Serazzi
- 1993
Citation Context ...sses the design issues and a wide variety of techniques used by existing monitors. In addition, a new program execution monitor is presented, and the design and use of this new monitor are discussed. [Calzarossa93] discusses several methodologies and techniques for constructing workload models in terms of different system architectures, such as centralized systems, network-based systems and multiprocessor syste... |

70 |
Performance measurement intrusion and perturbation analysis
- Malony, Reed, et al.
- 1992
Citation Context ...mance overheads and may also affect the system’s dynamic behavior [Jelly94]. This is called instrumentation perturbation. Different kinds of perturbations, direct or indirect, are described by Malony [Malony92]. Besides, the accuracy of measurement also depends on several other things, such as the time of the measurement and the workload used. Thus, measurement techniques may offer very accurate data or very in... |

68 | PICL: A portable instrumented communications library, C reference manual - Geist, Heath, et al. - 1990 |

67 | The operational analysis of queueing network models - Denning, Buzen - 1978 |

67 | Approximate analysis of multiclass closed networks of queues - Schweitzer - 1979 |

64 |
Measuring Parallel Processor Performance
- Karp, Flatt
- 1990
Citation Context ...finition is useful in comparing different architectures for a given algorithm. It is, however, useless in comparing different algorithm-architecture pairs for solving the same problem. Karp and Flatt [Karp90] use the serial fraction $f$ as a metric for measuring the performance of a parallel system on a fixed-size problem. They define $f = \frac{1/S - 1/n}{1 - 1/n}$, where S is the speedup, and n is the number of processor... |
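
The serial fraction defined in this context, f = (1/S − 1/n) / (1 − 1/n), is straightforward to compute from a measured speedup. The sketch below is illustrative only; the function name `serial_fraction` and the input checks are assumptions.

```python
def serial_fraction(speedup: float, n: int) -> float:
    """Serial fraction f = (1/S - 1/n) / (1 - 1/n).

    speedup is the measured speedup S on n processors; n must exceed 1
    so that the denominator is nonzero.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return (1.0 / speedup - 1.0 / n) / (1.0 - 1.0 / n)
```

As a consistency check, feeding in the speedup S = n / (1 + (n − 1)α) for a sequential fraction α returns f = α, so a measured f that grows with `n` points to overhead beyond the sequential part.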