Self-Similarity in World Wide Web Traffic: Evidence and Possible Causes
, 1996
Cited by 1023 (22 self)
Recently the notion of self-similarity has been shown to apply to wide-area and local-area network traffic. In this paper we examine the mechanisms that give rise to the self-similarity of network traffic. We present a hypothesized explanation for the possible self-similarity of traffic by using a particular subset of wide area traffic: traffic due to the World Wide Web (WWW). Using an extensive set of traces of actual user executions of NCSA Mosaic, reflecting over half a million requests for WWW documents, we examine the dependence structure of WWW traffic. While our measurements are not conclusive, we show evidence that WWW traffic exhibits behavior that is consistent with self-similar traffic models. Then we show that the self-similarity in such traffic can be explained based on the underlying distributions of WWW document sizes, the effects of caching and user preference in file transfer, the effect of user "think time", and the superimposition of many such transfers in a local area network. To do this we rely on empirically measured distributions both from our traces and from data independently collected at over thirty WWW sites.
Evidence for long-tailed distributions in the Internet
- In Proceedings of ACM SIGCOMM Internet Measurement Workshop
, 2001
Cited by 29 (0 self)
We review evidence that Internet traffic is characterized by long-tailed distributions of interarrival times, transfer times, burst sizes and burst lengths. We propose a new statistical technique for identifying long-tailed distributions, and apply it to a variety of datasets collected on the Internet. We find that there is little evidence that interarrival times and transfer times are long-tailed, but that there is some evidence for long-tailed burst sizes. We speculate on the causes of long-tailed bursts.
The Case for SRPT Scheduling in Web Servers
, 1998
Cited by 15 (5 self)
The Shortest-Remaining-Processing-Time (SRPT) scheduling policy is known to be the optimal policy for minimizing mean response time, but it is rarely employed in computing systems for a number of reasons. These reasons include: lack of knowledge of task size, fear of starvation of the large tasks, concern over pre-emption overhead, and lack of empirical evidence on the performance benefits of switching to SRPT. In this paper we argue that the special characteristics of Web servers and Web workloads make the usual objections to SRPT less persuasive. We start by
Size-based Scheduling Policies with Inaccurate Scheduling Information
- In Proc. of IEEE Mascots
, 2004
Cited by 13 (5 self)
Size-based scheduling policies such as SRPT have been studied since the 1960s and have been applied in various arenas including packet networks and web server scheduling. SRPT has been proven to be optimal in the sense that it yields, compared to any other conceivable strategy, the smallest mean value of occupancy and therefore also of waiting and delay time. One important prerequisite to applying size-based scheduling is to know the sizes of all jobs in advance, information that is unfortunately not always available.
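The optimality claim in the abstract above can be illustrated with a small simulation. The following is a minimal sketch, not code from any of the listed papers: it assumes a single preemptive server, job sizes known exactly in advance, and zero preemption overhead. The function names `mean_response_time_srpt` and `mean_response_time_fcfs` and the sample workload are hypothetical, chosen only for illustration.

```python
import heapq


def mean_response_time_srpt(jobs):
    """Mean response time under preemptive SRPT on one server.

    jobs: list of (arrival_time, size) pairs. Sizes are assumed to be
    known exactly, which is the prerequisite the abstract points out.
    """
    jobs = sorted(jobs)
    heap = []      # (remaining_time, arrival_time) for jobs in the system
    t = 0.0        # current simulated time
    i = 0          # index of the next job that has not arrived yet
    total = 0.0    # sum of response times of completed jobs
    n = len(jobs)
    while i < n or heap:
        if not heap:
            # Server is idle: jump ahead to the next arrival.
            t = max(t, jobs[i][0])
        # Admit every job that has arrived by time t.
        while i < n and jobs[i][0] <= t:
            heapq.heappush(heap, (jobs[i][1], jobs[i][0]))
            i += 1
        # Serve the job with the shortest remaining processing time,
        # but only until it finishes or the next job arrives.
        remaining, arrival = heapq.heappop(heap)
        next_arrival = jobs[i][0] if i < n else float("inf")
        run = min(remaining, next_arrival - t)
        t += run
        if run < remaining:
            # Interrupted by an arrival: put the job back with less work left.
            heapq.heappush(heap, (remaining - run, arrival))
        else:
            total += t - arrival
    return total / n


def mean_response_time_fcfs(jobs):
    """Mean response time under non-preemptive first-come-first-served."""
    t = 0.0
    total = 0.0
    for arrival, size in sorted(jobs):
        t = max(t, arrival) + size
        total += t - arrival
    return total / len(jobs)


# Hypothetical workload: one large job followed by two small ones.
workload = [(0, 10), (1, 1), (2, 1)]
srpt = mean_response_time_srpt(workload)
fcfs = mean_response_time_fcfs(workload)
```

On this workload FCFS makes both small jobs wait behind the large one, giving a mean response time of 10.0, while SRPT preempts the large job and gives 14/3 (about 4.67). The large job still finishes at time 12 in both cases here, which is the kind of observation the starvation discussion in these papers is concerned with; a single toy workload does not establish it in general.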
Lognormal and Pareto distributions in the Internet
- Comput. Commun.
, 2005
Cited by 9 (0 self)
Numerous studies have reported long-tailed distributions for various network metrics, including file sizes, transfer times, and burst lengths. We review techniques for identifying long-tailed distributions based on a sample, propose a new technique, and apply these methods to datasets used in previous reports. We find that the evidence for long tails is inconsistent, and that lognormal and other non-long-tailed models are usually sufficient to characterize network metrics. We discuss the implications of this result for current explanations of self-similarity in network traffic.
Looking at the Server Side of Peer-to-Peer Systems
- In 7th Workshop on Languages, Compilers, and Run-time Systems for Scalable Computers (LCR
, 2004
Cited by 8 (3 self)
Peer-to-peer systems have grown significantly in popularity over the last few years. An increasing number of research projects have been closely following this trend, looking at many of the paradigm's technical aspects. In the context of data-sharing services, efforts have focused on a variety of issues from object location and routing to fair sharing and peer lifespans. Overall, the majority of these projects have concentrated on either the whole P2P infrastructure or the client-side of peers. Little attention has been given to the peer's server-side, even when that side determines much of the everyday user's experience. In this paper, we make the case for looking at the server side of peers, focusing on the problem of scheduling with the intent of minimizing the average response time experienced by users. We start by characterizing server workload based on extensive trace collection and analysis. We then evaluate the performance and fairness of different scheduling policies through trace-driven simulations. Our results show that average response time can be dramatically reduced by more effectively scheduling the requests on the server-side of P2P systems.
Effects and Implications of File Size/Service Time Correlation on Web Server Scheduling Policies
- In Proc. of IEEE Mascots
, 2004
Cited by 8 (5 self)
Recently, size-based policies such as SRPT and FSP have been proposed for scheduling requests in web servers. SRPT and FSP are superior to policies that ignore request size, such as PS, in both efficiency and fairness given heavy-tailed service times. However, a central assumption that is usually made in implementing size-based policies in a web server is that the service time of a request is strongly correlated with the size of the file it serves. This paper shows how the performance of SRPT and FSP is affected by the degree of this correlation. We developed a simulator that supports both M/G/1/m and G/G/n/m queuing models. The simulator can be driven with trace data, which can be taken from the logs of modified Apache servers, or which can be produced by a workload generator we have developed that allows us to control the correlation. Using both trace data and generated data, we find that the degree of correlation has a dramatic effect on the performance of SRPT and FSP. In response, we propose and evaluate domain-based scheduling, a simple technique that better estimates connection times by making use of the source IP address of the request. Domain-based scheduling improves SRPT and FSP performance on web servers, particularly in regimes where correlation is low, thus making size-based policies such as these more broadly deployable.
Correlation on Web Server Scheduling Policies
, 2003
Recently, size-based policies such as SRPT and FSP have been proposed for scheduling requests in web servers. SRPT and FSP are superior to policies that ignore request size, such as PS, in both efficiency and fairness given heavy-tailed service times. However, a central assumption that is usually made in implementing size-based policies in a web server is that the service time of a request is strongly correlated with the size of the file it serves. This paper shows how the performance of SRPT and FSP is affected by the degree of this correlation. We developed a simulator that supports both M/G/1/m and G/G/n/m queuing models. The simulator can be driven with trace data, which can be taken from the logs of modified Apache servers, or which can be produced by a workload generator we have developed that allows us to control the correlation. Using both trace data and generated data, we find that the degree of correlation has a dramatic effect on the performance of SRPT and FSP. In response, we propose and evaluate domain-based scheduling, a simple technique that better estimates connection times by making use of the source IP address of the request. Domain-based scheduling improves SRPT and FSP performance on web servers, particularly in regimes where correlation is low, thus making size-based policies such as these more broadly deployable.
Improving Peer-to-Peer . . .
, 2008
We show how to significantly improve the mean response time seen by both uploaders and downloaders in peer-to-peer data-sharing systems. Our work is motivated by the observation that response times are largely determined by the performance of the peers serving the requested objects, that is, by the peers in their capacity as servers. With this in mind, we take a close look at this server side of peers, characterizing its workload by collecting and examining an extensive set of traces. Using trace-driven simulation, we demonstrate the promise and potential problems with scheduling policies based on shortest-remaining-processing-time (SRPT), the algorithm known to be optimal for minimizing mean response time. The key challenge to using SRPT in this context is determining request service times. In addressing this challenge, we introduce two new estimators that enable predictive SRPT scheduling policies that closely approach the performance of ideal SRPT. We evaluate our approach through extensive single-server and system-level simulation coupled with real Internet deployment and experimentation.

