## The statistical properties of host load (extended version) (1999)

### Download Links

- [reports-archive.adm.cs.cmu.edu]

Venue: School of Computer Science, Carnegie Mellon University

Citations: 1 (1 self)

### BibTeX

@TECHREPORT{Dinda99thestatistical,

author = {Peter A. Dinda},

title = {The statistical properties of host load (extended version)},

institution = {School of Computer Science, Carnegie Mellon University},

year = {1999}

}

### Abstract

Understanding how host load changes over time is instrumental in predicting the execution time of tasks or jobs, such as in dynamic load balancing and distributed soft real-time systems. To improve this understanding, we collected week-long, 1 Hz resolution traces of the Digital Unix 5-second exponential load average on over 35 different machines including production and research cluster machines, compute servers, and desktop workstations. Separate sets of traces were collected at two different times of the year. The traces capture all of the dynamic load information available to user-level programs on these machines. We present a detailed statistical analysis of these traces here, including summary statistics, distributions, and time series analysis results. Two significant new results are that load is self-similar and that it displays epochal behavior. All of the traces exhibit a high degree of self-similarity with Hurst parameters ranging from 0.73 to 0.99, strongly biased toward the top of that range. The traces also display epochal behavior in that the local frequency content of the load signal remains quite stable for long periods of time (150-450 seconds mean) and changes abruptly at epoch boundaries. Despite these complex behaviors, we have found that relatively simple linear models are sufficient for short-range host load prediction.
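The Hurst parameter mentioned in the abstract can be estimated in several ways. As a rough illustration, the sketch below uses the aggregated-variance method, one common estimator; it is not necessarily the estimator used in the report. The function name `aggregated_variance_hurst`, the block sizes, and the synthetic white-noise input are all assumptions made for this example. For a series with no long-range dependence the estimate should fall near 0.5, in contrast to the 0.73 to 0.99 reported for the load traces.

```python
import math
import random
import statistics


def aggregated_variance_hurst(series, block_sizes):
    """Estimate the Hurst parameter H with the aggregated-variance method.

    For a self-similar series, the variance of the block means over blocks of
    size m scales roughly as m ** (2H - 2), so H = 1 + slope / 2, where slope
    is the least-squares slope of log(variance) against log(m).
    """
    log_m, log_var = [], []
    for m in block_sizes:
        n_blocks = len(series) // m
        if n_blocks < 2:
            continue  # too few blocks to compute a variance
        means = [sum(series[i * m:(i + 1) * m]) / m for i in range(n_blocks)]
        var = statistics.pvariance(means)
        if var > 0:
            log_m.append(math.log(m))
            log_var.append(math.log(var))
    if len(log_m) < 2:
        raise ValueError("need at least two usable block sizes")
    # Ordinary least-squares slope of log(variance) versus log(block size).
    mean_x = sum(log_m) / len(log_m)
    mean_y = sum(log_var) / len(log_var)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_m, log_var))
    sxx = sum((x - mean_x) ** 2 for x in log_m)
    return 1.0 + (sxy / sxx) / 2.0


if __name__ == "__main__":
    # Synthetic stand-in for a load trace (assumption: not real trace data).
    rng = random.Random(0)
    white_noise = [rng.gauss(0.0, 1.0) for _ in range(20000)]
    h = aggregated_variance_hurst(white_noise, [1, 2, 4, 8, 16, 32, 64, 128])
    print(f"estimated H for white noise: {h:.2f}")
```

To apply this to a measured trace, one would pass the list of 1 Hz load samples in place of `white_noise`. A single estimator can be misleading, so it is usual to compare several estimators before trusting a value.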

### Citations

6080 | A mathematical theory of communication
- Shannon
- 1948
Citation Context ...Figure 8(b)) is due to the fact that the underlying quantity being measured (ready queue length) is discrete. Load typically takes on 600-3000 unique values in these traces. Shannon’s entropy measure =-=[25]-=- indicates that the load traces can be encoded in 1.4 to 8.5 bits per value, depending on the trace. These observations and the histograms suggest that load spends most of its time in one of a small n... |

1765 | On the self-similar nature of Ethernet traffic
- Leland, Taqqu, et al.
- 1993
Citation Context ...processes. These stochastic processes model the sort of the mechanisms that give rise to self-similar signals. We shall avoid a mathematical treatment here, but interested readers may want to consult =-=[19]-=- or [20] for a treatment in the context of networking or [2] for its connection to fractal geometry, or [3] for a treatment from a linear time series point of view. Interestingly, self-similarity has ... |

874 | The Art of Computer Systems Performance Analysis
- Jain
- 1991
Citation Context ...ams share very few common characteristics and did not conform well to the analytic distributions we fit to them. Quantile-quantile plots are a powerful way to assess how a distribution fits data (cf. =-=[14]-=-, pp. 196–200.) The quantiles (the α quantile of a pdf (or histogram) is the value x at which 100α% of the probability (or data) falls to the left of x) of the data set are plotted against the quantiles... |

599 | Self-similarity through high-variability: Statistical analysis of Ethernet LAN traffic at the source level
- Willinger, Taqqu, et al.
- 1997
Citation Context ...or [2] for its connection to fractal geometry, or [3] for a treatment from a linear time series point of view. Interestingly, self-similarity has revolutionized network traffic modelling in the 1990s =-=[9, 19, 20, 28]-=-. The degree and nature of the self-similarity of a sequence is summarized by the Hurst parameter, H [13]. Intuitively, H describes the relative contribution of low and high frequency components to th... |

315 | Exploiting process lifetime distributions for dynamic load balancing
- Harchol-Balter, Downey
- 1997
Citation Context ...ere has been little work on characterizing the properties of load at fine resolutions. The available studies concentrate on understanding functions of load, such as availability [21] or job durations =-=[8, 18, 11]-=-. Furthermore, they deal with the coarse grain behavior of load — how it changes over minutes, hours and days. This paper is a first step to a better understanding the properties of load on real syste... |

305 | Introduction to Time Series and Forecasting
- Brockwell, Davis
- 2002
Citation Context ..., 1997 traces, (b) March, 1998 traces. Lack of seasonality: It is important to note that the epochal behavior of the load traces is not the same thing as seasonality in the time series analysis sense =-=[5, 4]-=-. Seasonality means that there are dominant (or at least visible) underlying periodic signals on top of which are layered other signals. It is not unreasonable to expect seasonality given that other s... |

290 | An introduction to long-memory time series models and fractional differencing, Journal of Time Series Analysis
- Granger, Joyeux
- 1980
Citation Context ...d conditions may be preferable to waiting for the adversity to be ameliorated over the long term. The self-similarity result also suggests certain modeling approaches, such as fractional ARIMA models =-=[12, 10, 3]-=- which can capture this property. (6) The traces display epochal behavior. The local frequency content of the load signal remains quite stable for long periods of time (150-450 seconds mean) and chang... |

213 | Long Term Storage Capacities of Reservoirs
- Hurst
- 1951
Citation Context ...estingly, self-similarity has revolutionized network traffic modelling in the 1990s [9, 19, 20, 28]. The degree and nature of the self-similarity of a sequence is summarized by the Hurst parameter, H =-=[13]-=-. Intuitively, H describes the relative contribution of low and high frequency components to the signal. Consider Figure 12(a), which plots the periodogram (the magnitude of the... |

209 | Fractional differencing
- Hosking
- 1981
Citation Context ...d conditions may be preferable to waiting for the adversity to be ameliorated over the long term. The self-similarity result also suggests certain modeling approaches, such as fractional ARIMA models =-=[12, 10, 3]-=- which can capture this property. (6) The traces display epochal behavior. The local frequency content of the load signal remains quite stable for long periods of time (150-450 seconds mean) and chang... |

136 | A Time Driven Scheduling Model for Real-Time Operating Systems
- Jensen, Locke, et al.
- 1985
Citation Context ...arises in a number of important contexts, such as dynamically load-balancing the tasks in a parallel program [24, 1, 26], and scheduling tasks to meet deadlines in a distributed soft real-time system =-=[15, 22, 23, 17]-=-. Host load has a significant effect on running time. Indeed, the running time of a compute bound task is directly related to the average load it encounters during execution. Determining a good mappin... |

120 | Load-balancing heuristics and process behavior
- Leland, Ott
- 1986
Citation Context ...ere has been little work on characterizing the properties of load at fine resolutions. The available studies concentrate on understanding functions of load, such as availability [21] or job durations =-=[8, 18, 11]-=-. Furthermore, they deal with the coarse grain behavior of load — how it changes over minutes, hours and days. This paper is a first step to a better understanding the properties of load on real syste... |

103 | The limited performance benefits of migrating active processes for load sharing, Performance Evaluation Review
- Lazowska, Eager, et al.
- 1998
Citation Context ...ere has been little work on characterizing the properties of load at fine resolutions. The available studies concentrate on understanding functions of load, such as availability [21] or job durations =-=[8, 18, 11]-=-. Furthermore, they deal with the coarse grain behavior of load — how it changes over minutes, hours and days. This paper is a first step to a better understanding the properties of load on real syste... |

99 | Estimators for long-range dependence: an empirical study
- Taqqu, Teverovsky, et al.
- 1995
Citation Context ...estimates for H are much larger than 0.5. We examined each of the load traces for self-similarity and estimated each one’s Hurst parameter. There are many different estimators for the Hurst parameter =-=[27]-=-, but there is no consensus on how to best estimate the Hurst parameter of a measured series. The most common technique is to use several Hurst parameter estimators and try to find agreement among the... |

98 | The available capacity of a privately owned workstation environment, Performance Evaluation
- Mutka, Livny
- 1991
Citation Context ...ortunately, to date there has been little work on characterizing the properties of load at fine resolutions. The available studies concentrate on understanding functions of load, such as availability =-=[21]-=- or job durations [8, 18, 11]. Furthermore, they deal with the coarse grain behavior of load — how it changes over minutes, hours and days. This paper is a first step to a better understanding the pro... |

93 | Statistical methods for data with long-range dependence
- Beran
- 1992
Citation Context ...d conditions may be preferable to waiting for the adversity to be ameliorated over the long term. The self-similarity result also suggests certain modeling approaches, such as fractional ARIMA models =-=[12, 10, 3]-=- which can capture this property. (6) The traces display epochal behavior. The local frequency content of the load signal remains quite stable for long periods of time (150-450 seconds mean) and chang... |

89 | Jade: A high-level, machine-independent language for parallel programming
- Rinard, Scales, et al.
- 1993
Citation Context ...dynamically changing loads (what we will call the mapping problem) is a basic problem that arises in a number of important contexts, such as dynamically load-balancing the tasks in a parallel program =-=[24, 1, 26]-=-, and scheduling tasks to meet deadlines in a distributed soft real-time system [15, 22, 23, 17]. Host load has a significant effect on running time. Indeed, the running time of a compute bound task i... |

76 | Dome: Parallel Programming in a Heterogeneous Multi-User Environment
- Arabe, Lowekamp, et al.
Citation Context ...dynamically changing loads (what we will call the mapping problem) is a basic problem that arises in a number of important contexts, such as dynamically load-balancing the tasks in a parallel program =-=[24, 1, 26]-=-, and scheduling tasks to meet deadlines in a distributed soft real-time system [15, 22, 23, 17]. Host load has a significant effect on running time. Indeed, the running time of a compute bound task i... |

43 | An Evaluation of Linear Models for Host Load Prediction
- Dinda, O’Hallaron
- 1999
Citation Context ...r completing this study, we evaluated linear models for predicting host load using the traces, finding that relatively simple autoregressive models are sufficient for short range host load prediction =-=[7]-=-. 2 Measurement methodology The load on a Unix system at any given instant is the number of processes that are running or are ready to run, which is the length of the ready queue maintained by the sch... |

37 | The statistical properties of host load
- Dinda
- 1999
Citation Context .... In this paper, we present a detailed statistical analysis of both sets of traces and contemplate the implications of the properties we find for the mapping problem. An earlier version of this paper =-=[6]-=-, concentrated on the first set of traces. The basic question is whether load traces that might seem at first glance to be random and unpredictable might have structure that could be exploited by a ma... |

37 | A comparison of queueing, cluster and distributed computing systems
- Kaplan, Nelson
- 1994
Citation Context ...Production Cluster: 13 hosts of the PSC’s “Supercluster”, including two front-end machines (axpfea, axpfeb), four interactive machines (axp0 through axp3), and seven batch machines scheduled by a DQS =-=[16]-=- variant (axp4 through axp10.) Research Cluster: eight machines in an experimental cluster in the CMCL (manchester-1 through manchester-8.) Compute servers: two high performance large memory machines ... |

35 | Automatic generation of parallel programs with dynamic load balancing
- Siegell, Steenkiste
- 1994
Citation Context ...dynamically changing loads (what we will call the mapping problem) is a basic problem that arises in a number of important contexts, such as dynamically load-balancing the tasks in a parallel program =-=[24, 1, 26]-=-, and scheduling tasks to meet deadlines in a distributed soft real-time system [15, 22, 23, 17]. Host load has a significant effect on running time. Indeed, the running time of a compute bound task i... |

16 | Predictable network computing
- Polze, Fohler, et al.
- 1997
Citation Context ...arises in a number of important contexts, such as dynamically load-balancing the tasks in a parallel program [24, 1, 26], and scheduling tasks to meet deadlines in a distributed soft real-time system =-=[15, 22, 23, 17]-=-. Host load has a significant effect on running time. Indeed, the running time of a compute bound task is directly related to the average load it encounters during execution. Determining a good mappin... |

14 | The impact of self-similarity on network performance analysis
- Morin
- 1995
Citation Context ...s. These stochastic processes model the sort of the mechanisms that give rise to self-similar signals. We shall avoid a mathematical treatment here, but interested readers may want to consult [19] or =-=[20]-=- for a treatment in the context of networking or [2] for its connection to fractal geometry, or [3] for a treatment from a linear time series point of view. Interestingly, self-similarity has revoluti... |

12 | Load Sharing in Soft Real-Time Distributed Computer Systems
- Kurose, Chipalkatti
- 1987
Citation Context ...arises in a number of important contexts, such as dynamically load-balancing the tasks in a parallel program [24, 1, 26], and scheduling tasks to meet deadlines in a distributed soft real-time system =-=[15, 22, 23, 17]-=-. Host load has a significant effect on running time. Indeed, the running time of a compute bound task is directly related to the average load it encounters during execution. Determining a good mappin... |

7 | Fractal structures and processes
- Bassingthwaighte, Beard, et al.
- 1995
Citation Context ...echanisms that give rise to self-similar signals. We shall avoid a mathematical treatment here, but interested readers may want to consult [19] or [20] for a treatment in the context of networking or =-=[2]-=- for its connection to fractal geometry, or [3] for a treatment from a linear time series point of view. Interestingly, self-similarity has revolutionized network traffic modelling in the 1990s [9, 19... |

3 | Analysis, modeling and generation of self-similar VBR video traffic
- Garrett, Willinger
- 1994
Citation Context ...or [2] for its connection to fractal geometry, or [3] for a treatment from a linear time series point of view. Interestingly, self-similarity has revolutionized network traffic modelling in the 1990s =-=[9, 19, 20, 28]-=-. The degree and nature of the self-similarity of a sequence is summarized by the Hurst parameter, H [13]. Intuitively, H describes the relative contribution of low and high frequency components to th... |

3 | Realtime CORBA: A white paper. http://www.omg.org
- GROUP
- 1996
Citation Context |