Results 1–10 of 74
Wide-Area Traffic: The Failure of Poisson Modeling
IEEE/ACM Transactions on Networking, 1995
Cited by 1413 (21 self)
Abstract: Network arrivals are often modeled as Poisson processes for analytic simplicity, even though a number of traffic studies have shown that packet interarrivals are not exponentially distributed. We evaluate 24 wide-area traces, investigating a number of wide-area TCP arrival processes (session and connection arrivals, FTP data connection arrivals within FTP sessions, and TELNET packet arrivals) to determine the error introduced by modeling them using Poisson processes. We find that user-initiated TCP session arrivals, such as remote-login and file-transfer, are well modeled as Poisson processes with fixed hourly rates, but that other connection arrivals deviate considerably from Poisson; that modeling TELNET packet interarrivals as exponential grievously underestimates the burstiness of TELNET traffic, but using the empirical Tcplib [Danzig et al., 1992] interarrivals preserves burstiness over many time scales; and that FTP data connection arrivals within FTP sessions come bunched into “connection bursts,” the largest of which are so large that they completely dominate FTP data traffic. Finally, we offer some results regarding how our findings relate to the possible self-similarity of wide-area traffic.
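The Poisson-versus-bursty distinction this abstract draws can be illustrated with a small, self-contained simulation: a Poisson process keeps its variance-to-mean count ratio near 1 at every aggregation scale, while heavy-tailed interarrivals do not. The Pareto gaps and tail index below are an illustrative stand-in, not the paper's empirical TELNET distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Interarrival gaps, both scaled to unit mean so the average rates match:
# exponential gaps -> Poisson process; Pareto (Lomax) gaps -> bursty process.
exp_gaps = rng.exponential(1.0, n)
alpha = 1.5                                        # illustrative heavy-tail index (< 2)
pareto_gaps = rng.pareto(alpha, n) * (alpha - 1)   # Lomax mean is 1/(alpha - 1)

def index_of_dispersion(gaps, window):
    """Variance-to-mean ratio of arrival counts in fixed-size windows."""
    t = np.cumsum(gaps)
    edges = np.arange(0.0, t[-1], window)
    counts, _ = np.histogram(t, edges)
    return counts.var() / counts.mean()

d_exp = index_of_dispersion(exp_gaps, 100.0)      # stays near 1
d_par = index_of_dispersion(pareto_gaps, 100.0)   # well above 1, grows with window
```

For the exponential gaps the ratio hovers near 1 regardless of window size; for the heavy-tailed gaps it keeps growing as the windows widen, which is the "burstiness over many time scales" the abstract refers to.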
Empirical properties of asset returns: stylized facts and statistical issues
Quantitative Finance, 2001
Cited by 155 (2 self)
Abstract: We present a set of stylized empirical facts emerging from the statistical analysis of price variations in various types of financial markets. We first discuss some general issues common to all statistical studies of financial time series. Various statistical properties of asset returns are then described: distributional properties, tail properties and extreme fluctuations, pathwise regularity, linear and nonlinear dependence of returns in time and across stocks. Our description emphasizes properties common to a wide variety of markets and instruments. We then show how these statistical properties invalidate many of the common statistical approaches used to study financial data sets and examine some of the statistical problems encountered in each case.
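Two of the stylized facts this abstract enumerates — heavy-tailed returns, and dependence in absolute returns despite nearly uncorrelated raw returns — can be reproduced with a toy volatility-clustering simulation. The GARCH(1,1) form and its parameters here are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 50_000
omega, a, b = 0.05, 0.10, 0.85           # illustrative GARCH(1,1) parameters

r = np.empty(T)
s2 = omega / (1 - a - b)                 # start at the unconditional variance
for t in range(T):
    r[t] = np.sqrt(s2) * rng.standard_normal()
    s2 = omega + a * r[t] ** 2 + b * s2  # conditional-variance recursion

def acf1(x):
    """Lag-1 sample autocorrelation."""
    x = x - x.mean()
    return float((x[:-1] * x[1:]).sum() / (x * x).sum())

kurt = float(((r - r.mean()) ** 4).mean() / r.var() ** 2)  # > 3 means fat tails
rho_r = acf1(r)              # raw returns: close to zero
rho_abs = acf1(np.abs(r))    # absolute returns: clearly positive (clustering)
```

The sample kurtosis exceeds the Gaussian value of 3, raw returns show negligible lag-1 correlation, and absolute returns show clear positive correlation — the combination of fat tails and volatility clustering the stylized-facts literature documents.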
On the Relevance of Long-Range Dependence in Network Traffic
1996
Cited by 152 (1 self)
Abstract: There is much experimental evidence that network traffic processes exhibit ubiquitous properties of self-similarity and long-range dependence (LRD), i.e., of correlations over a wide range of time scales. However, there is still considerable debate about how to model such processes and about their impact on network and application performance. In this paper, we argue that much recent modeling work has failed to consider the impact of two important parameters, namely the finite range of time scales of interest in performance evaluation and prediction problems, and first-order statistics such as the marginal distribution of the process.
Fast Approximation of Self-Similar Network Traffic
1995
Cited by 94 (0 self)
Abstract: Recent network traffic studies argue that network arrival processes are much more faithfully modeled using statistically self-similar processes instead of traditional Poisson processes [LTWW94a, PF94]. One difficulty in dealing with self-similar models is how to efficiently synthesize traces (sample paths) corresponding to self-similar traffic. We present a fast Fourier transform method for synthesizing approximate self-similar sample paths and assess its performance and validity. We find that the method is as fast or faster than existing methods and appears to generate a closer approximation to true self-similar sample paths than the other known fast method (Random Midpoint Displacement). We then discuss issues in using such synthesized sample paths for simulating network traffic, and how an approximation used by our method can dramatically speed up evaluation of Whittle's estimator for H, the Hurst parameter giving the strength of long-range dependence present in a self-similar time series.
Host Load Prediction Using Linear Models
2000
Cited by 65 (13 self)
Abstract: This paper evaluates linear models for predicting the Digital Unix five-second host load average from 1 to 30 seconds into the future. A detailed statistical study of a large number of long, fine-grain load traces from a variety of real machines leads to consideration of the Box-Jenkins models (AR, MA, ARMA, ARIMA) and the ARFIMA models (due to self-similarity). We also consider a simple windowed-mean model. The computational requirements of these models span a wide range, making some more practical than others for incorporation into an online prediction system. We rigorously evaluate the predictive power of the models by running a large number of randomized test cases on the load traces and then data-mining their results. The main conclusions are that load is consistently predictable to a very useful degree, and that the simple, practical models such as AR are sufficient for host load prediction. We recommend AR(16) models or better for host load prediction. We implement an online host load prediction system around the AR(16) model and evaluate its overhead, finding that it uses minuscule amounts of CPU time and network bandwidth.
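The core fitting step behind an AR-based predictor like the one this abstract recommends can be sketched in a few lines: estimate AR(16) coefficients by least squares and compare one-step-ahead error against simple baselines. The synthetic trace below is an illustrative stand-in for the paper's real Digital Unix load traces.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic stand-in for a host-load trace: a persistent AR(2) process.
T, p = 5_000, 16
x = np.zeros(T)
for t in range(2, T):
    x[t] = 0.7 * x[t - 1] + 0.2 * x[t - 2] + 0.1 * rng.standard_normal()

# Fit AR(p): x[t] ~ sum_i coef[i] * x[t - 1 - i], by least squares.
X = np.column_stack([x[p - 1 - i : T - 1 - i] for i in range(p)])
y = x[p:]
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

pred = X @ coef                                       # one-step-ahead predictions
ar_mse = float(np.mean((y - pred) ** 2))
last_mse = float(np.mean((y - x[p - 1 : -1]) ** 2))   # "same as last value" baseline
mean_mse = float(np.mean((y - x.mean()) ** 2))        # global-mean baseline
```

On a persistent trace the fitted AR model beats both baselines, mirroring the paper's conclusion that simple AR models capture most of the predictable structure; an online system would refit or update `coef` incrementally as new samples arrive.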
Fast, Approximate Synthesis of Fractional Gaussian Noise for Generating Self-Similar Network Traffic
ACM SIGCOMM Computer Communication Review, 1997
Cited by 58 (2 self)
Abstract: Recent network traffic studies argue that network arrival processes are much more faithfully modeled using statistically self-similar processes instead of traditional Poisson processes [LTWW94, PF95]. One difficulty in dealing with self-similar models is how to efficiently synthesize traces (sample paths) corresponding to self-similar traffic. We present a fast Fourier transform method for synthesizing approximate self-similar sample paths for one type of self-similar process, Fractional Gaussian Noise, and assess its performance and validity. We find that the method is as fast or faster than existing methods and appears to generate close approximations to true self-similar sample paths. We also discuss issues in using such synthesized sample paths for simulating network traffic, and how an approximation used by our method can dramatically speed up evaluation of Whittle's estimator for H, the Hurst parameter giving the strength of long-range dependence present in a self-similar time series.
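The spectral idea behind this kind of synthesis can be sketched as follows: draw a complex Gaussian spectrum whose power follows the target spectral density, then inverse-FFT to obtain an approximate sample path. For brevity this sketch substitutes a pure power law f^(1-2H) for the exact fractional-Gaussian-noise spectral density the paper uses, so it reproduces only the long-range correlation structure, not the exact FGN marginal spectrum.

```python
import numpy as np

def synth_lrd_noise(n, H, rng):
    """Approximate long-range-dependent Gaussian noise by spectral synthesis.

    Assumes a pure power-law spectrum S(f) ~ f^(1 - 2H), an illustrative
    stand-in for the exact FGN spectral density; meaningful for 0.5 < H < 1.
    """
    f = np.fft.rfftfreq(n)[1:]                 # positive frequencies
    power = f ** (1.0 - 2.0 * H)
    re = rng.standard_normal(len(f))
    im = rng.standard_normal(len(f))
    spec = np.sqrt(power / 2.0) * (re + 1j * im)
    spec = np.concatenate(([0.0], spec))       # zero DC component => zero mean
    x = np.fft.irfft(spec, n)
    return x / x.std()                         # normalize to unit variance

def agg_var(x, m):
    """Variance of the series block-averaged at aggregation level m."""
    k = len(x) // m
    return x[: k * m].reshape(k, m).mean(axis=1).var()

rng = np.random.default_rng(3)
x = synth_lrd_noise(2 ** 16, H=0.8, rng=rng)

# Variance-time check: for H = 0.8 the slope of log Var vs. log m should be
# near 2H - 2 = -0.4, versus -1 for uncorrelated noise.
slope = (np.log(agg_var(x, 256)) - np.log(agg_var(x, 4))) / np.log(64.0)
```

A variance-time slope well above -1 confirms the synthesized path carries long-range dependence; the paper's Whittle-estimator speedup and its fidelity comparison against Random Midpoint Displacement are beyond this sketch.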
An Extensible Toolkit for Resource Prediction in Distributed Systems
1999
Cited by 47 (21 self)
Abstract: RPS is a publicly available toolkit that allows a practitioner to straightforwardly create flexible online and offline resource prediction systems in which resources are represented by independent, periodically sampled, scalar-valued measurement streams. The systems predict the future values of such streams from past values and are composed at runtime out of a large and extensible set of communicating components, which are in turn constructed using RPS's extensible sensor, prediction, wavelet, and communication libraries. This paper describes the design, implementation, and performance of RPS. We have used RPS extensively to evaluate predictive models and build online prediction systems for host load, Windows performance data, and network bandwidth. The computation and communication overheads involved in such systems are quite low.
An Evaluation of Linear Models for Host Load Prediction
1998
Cited by 43 (7 self)
Abstract: This paper evaluates linear models for predicting the Digital Unix five-second load average from 1 to 30 seconds into the future. A detailed statistical study of a large number of load traces leads to consideration of the Box-Jenkins models (AR, MA, ARMA, ARIMA) and the ARFIMA models (due to self-similarity). These models, as well as a simple windowed-mean scheme, are evaluated by running a large number of randomized test cases on the load traces. The main conclusions are that load is consistently predictable to a useful degree, and that the simpler models such as AR are sufficient for doing this prediction.
The Statistical Properties of Host Load
Scientific Programming, 1998
Cited by 37 (3 self)
On estimation of the wavelet variance
Biometrika, 1995
Cited by 33 (4 self)
Abstract: The wavelet variance provides a scale-based decomposition of the process variance for a time series or random field. It has seen increasing use in geophysics, astronomy, genetics, hydrology, medical imaging, oceanography, soil science, signal processing and texture analysis. In practice, however, data collected in the form of a time series or random field often suffer from various types of contamination. We discuss the difficulties and limitations of existing contamination models (pure replacement models, additive outliers, level-shift models and innovation outliers that hide themselves in the original time series) for robust nonparametric estimates of second-order statistics. We then introduce a new model based upon the idea of scale-based multiplicative contamination. This model supposes that contamination can occur and affect data at certain scales, and thus arises naturally in multiscale processes and in the wavelet variance context. For this new contamination model, we develop a full M-estimation theory for the wavelet variance and derive its large-sample theory when the underlying time series or random field is Gaussian. Our approach treats the wavelet variance as a scale parameter and offers protection against contamination that operates additively on the log of squared wavelet coefficients and acts independently at different scales.
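The scale-by-scale and robustness ideas can be illustrated with a minimal Haar sketch: estimate the wavelet variance at a given scale from squared Haar detail coefficients, once with the usual mean of squares and once with a crude median-based M-type estimate that resists additive outliers. The filter, the contamination pattern, and the chi-squared-median constant below are illustrative; the paper's actual M-estimation theory is far more general.

```python
import numpy as np

def haar_details(x, scale):
    """Haar detail coefficients: differences of adjacent block means."""
    m = scale
    h = np.concatenate((np.full(m, 1.0), np.full(m, -1.0))) / (2.0 * m)
    return np.convolve(x, h, mode="valid")

def wavelet_variance(x, scale):
    """Classical estimate: mean of the squared detail coefficients."""
    return float(np.mean(haar_details(x, scale) ** 2))

def robust_wavelet_variance(x, scale):
    """Median-based M-type estimate; 0.4549 ~ median of chi-squared(1),
    which makes the estimate consistent for Gaussian data."""
    return float(np.median(haar_details(x, scale) ** 2) / 0.4549)

rng = np.random.default_rng(4)
x = rng.standard_normal(100_000)      # unit-variance white noise
xc = x.copy()
xc[::5000] += 50.0                    # sparse additive outliers (contamination)

v_clean = wavelet_variance(x, 1)          # ~ 0.5 at unit scale for white noise
v_cont = wavelet_variance(xc, 1)          # inflated by the outliers
v_robust = robust_wavelet_variance(xc, 1) # stays near the clean value
```

A handful of outliers noticeably inflates the mean-of-squares estimate while leaving the median-based estimate essentially unchanged, which is the kind of protection the abstract's M-estimation framework formalizes.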