Generating Representative Web Workloads for Network and Server Performance Evaluation
, 1997
Cited by 681 (8 self)
One role for workload generation is as a means for understanding how servers and networks respond to variation in load. This enables management and capacity planning based on current and projected usage. This paper applies a number of observations of Web server usage to create a realistic Web workload generation tool which mimics a set of real users accessing a server. The tool, called Surge (Scalable URL Reference Generator), generates references matching empirical measurements of 1) server file size distribution; 2) request size distribution; 3) relative file popularity; 4) embedded file references; 5) temporal locality of reference; and 6) idle periods of individual users. This paper reviews the essential elements required in the generation of a representative Web workload. It also addresses the technical challenges of satisfying this large set of simultaneous constraints on the properties of the reference stream, the solutions we adopted, and their associated accuracy. Finally, we present evidence that Surge exercises servers in a manner significantly different from other Web server benchmarks.
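As a rough illustration of what generating such a reference stream involves, the Python sketch below covers three of the six properties listed above: file sizes, relative popularity, and idle periods. It is not Surge itself. The Pareto and Zipf-like distributions, every parameter value, and all function names are assumptions made for the example; the abstract does not state them.

```python
import random

def make_file_sizes(n_files, alpha=1.2, min_size=1000, rng=None):
    """Draw file sizes (bytes) from a Pareto distribution (assumed shape)."""
    if rng is None:
        rng = random.Random()
    # paretovariate() returns values >= 1, so every size is >= min_size.
    return [int(min_size * rng.paretovariate(alpha)) for _ in range(n_files)]

def zipf_weights(n_files, exponent=1.0):
    """Popularity weight for the file of each rank (assumed Zipf-like law)."""
    return [1.0 / (rank ** exponent) for rank in range(1, n_files + 1)]

def generate_requests(n_requests, sizes, weights, idle_alpha=1.5, rng=None):
    """Return (timestamp, file_id, size) tuples for one simulated user."""
    if rng is None:
        rng = random.Random()
    file_ids = rng.choices(range(len(sizes)), weights=weights, k=n_requests)
    clock = 0.0
    stream = []
    for fid in file_ids:
        # Idle period before each request, heavy-tailed by assumption.
        clock += rng.paretovariate(idle_alpha)
        stream.append((clock, fid, sizes[fid]))
    return stream

if __name__ == "__main__":
    rng = random.Random(42)
    sizes = make_file_sizes(100, rng=rng)
    stream = generate_requests(10, sizes, zipf_weights(100), rng=rng)
    for timestamp, fid, size in stream:
        print(f"{timestamp:8.2f}s  file {fid:3d}  {size} bytes")
```

The sketch leaves out embedded references, temporal locality, and the distinction between the stored and requested size distributions, which are the constraints the abstract describes as hard to satisfy simultaneously.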
Cluster-Based Scalable Network Services
, 1997
Cited by 343 (34 self)
This paper has benefited from the detailed and perceptive comments of our reviewers, especially our shepherd Hank Levy. We thank Randy Katz and Eric Anderson for their detailed readings of early drafts of this paper, and David Culler for his ideas on TACC's potential as a model for cluster programming. Ken Lutz and Eric Fraser configured and administered the test network on which the TranSend scaling experiments were performed. Cliff Frost of the UC Berkeley Data Communications and Networks Services group allowed us to collect traces on the Berkeley dialup IP network and has worked with us to deploy and promote TranSend within Berkeley. Undergraduate researchers Anthony Polito, Benjamin Ling, and Andrew Huang implemented various parts of TranSend's user profile database and user interface. Ian Goldberg and David Wagner helped us debug TranSend, especially through their implementation of the rewebber
An Empirical Model of HTTP Network Traffic
, 1997
Cited by 210 (1 self)
The workload of the global Internet is dominated by the Hypertext Transfer Protocol (HTTP), an application protocol used by World Wide Web clients and servers. Simulation studies of this environment will require a model of the traffic patterns of the World Wide Web, in order to investigate the performance aspects of this increasingly popular application. We have developed an empirical model of network traffic produced by HTTP. Instead of relying on server or client logs, our approach is based on gathering packet traces of HTTP network conversations. Through traffic analysis, we have determined statistics and distributions for higher-level quantities such as the size of HTTP items retrieved, the number of items per "Web page", think time, and user browsing behavior. These quantities form a model that can then be used by simulations to mimic World Wide Web network applications in wide-area IP internetworks. Keywords: World Wide Web, HTTP, traffic model, traffic measurements, workload, Interne...
Rate of Change and other Metrics: a Live Study of the World Wide Web
, 1997
Cited by 176 (22 self)
Caching in the World Wide Web is based on two critical assumptions: that a significant fraction of requests reaccess resources that have already been retrieved; and that those resources do not change between accesses. We tested the validity of these assumptions, and their dependence on characteristics of Web resources, including access rate, age at time of reference, content type, resource size, and Internet top-level domain. We also measured the rate at which resources change, and the prevalence of duplicate copies in the Web. We quantified the potential benefit of a shared proxy-caching server in a large environment by using traces that were collected at the Internet connection points for two large corporations, representing significant numbers of references. Only 22% of the resources referenced in the traces we analyzed were accessed more than once, but about half of the references were to those multiply-referenced resources. Of this half, 13% were to a resource that had been modifi...
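The two reuse figures quoted above (the share of resources accessed more than once, and the share of references that go to those resources) are different ratios over the same trace. A minimal Python sketch of how they could be computed follows. The trace format (a flat list of URL strings) and the function name `reuse_stats` are assumptions for illustration, not the study's actual tooling.

```python
from collections import Counter

def reuse_stats(trace):
    """Return (fraction of distinct resources accessed more than once,
    fraction of all references made to those resources)."""
    if not trace:
        raise ValueError("trace must contain at least one reference")
    counts = Counter(trace)
    reused = {url for url, n in counts.items() if n > 1}
    frac_resources = len(reused) / len(counts)
    frac_references = sum(counts[url] for url in reused) / len(trace)
    return frac_resources, frac_references

if __name__ == "__main__":
    # Hypothetical trace: one popular resource and three one-off resources.
    trace = ["/a", "/b", "/a", "/c", "/a", "/d"]
    print(reuse_stats(trace))
```

In the hypothetical trace, one resource in four is reused but it accounts for three references in six, which is how a small reused fraction of resources can still carry a large share of the references.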
Mining Longest Repeating Subsequences To Predict World Wide Web Surfing
, 1999
Cited by 145 (3 self)
Modeling and predicting user surfing paths involves tradeoffs between model complexity and predictive accuracy. In this paper we explore predictive modeling techniques that attempt to reduce model complexity while retaining predictive accuracy. We show that compared to various Markov models, longest repeating subsequence models are able to significantly reduce model size while retaining the ability to make accurate predictions. In addition, sharp increases in the overall predictive capabilities of these models are achievable by modest increases to the number of predictions made. 1. Introduction Users surf the World Wide Web (WWW) by navigating along the hyperlinks that connect islands of content. If we could predict where surfers were going (that is, what they were seeking) we might be able to improve surfers' interactions with the WWW. Indeed, several research and industrial thrusts attempt to generate and utilize such predictions. These technologies include those for searching thro...
Heavy-Tailed Probability Distributions in the World Wide Web
- In A Practical Guide to Heavy Tails: Statistical Techniques and Applications
, 1998
Cited by 117 (10 self)
The explosion of the World Wide Web as a medium for information dissemination has made it important to understand its characteristics, in particular the distribution of its file sizes. This paper presents evidence that a number of file size distributions in the Web exhibit heavy tails, including files requested by users, files transmitted through the network, transmission durations of files, and files stored on servers. In addition, we argue that because of the presence of caching in the Web, the size distribution of transmitted files is primarily determined by the distribution of files available in the Web, and is relatively insensitive to the distribution of files requested by users. Finally, we discuss some of the implications of heavy-tailed transmission durations and relate these results to self-similarity in network traffic.
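A distribution is usually called heavy-tailed when P[X > x] decays roughly like x^(-alpha) for large x, with 0 < alpha < 2. One common way to estimate alpha from a sample of file sizes is the Hill estimator, sketched below in Python. Whether the paper uses this particular estimator is not stated in the abstract, and the function name and the choice of k are assumptions for illustration.

```python
import math
import random

def hill_estimator(samples, k):
    """Estimate the tail index alpha from the k largest observations.

    Uses alpha_hat = k / sum_{i=1..k} (ln X_(i) - ln X_(k+1)),
    where X_(1) >= X_(2) >= ... are the samples in descending order.
    """
    if not 0 < k < len(samples):
        raise ValueError("k must satisfy 0 < k < len(samples)")
    ordered = sorted(samples, reverse=True)
    threshold = math.log(ordered[k])  # ln X_(k+1) with 0-based indexing
    log_excess = sum(math.log(x) - threshold for x in ordered[:k])
    return k / log_excess

if __name__ == "__main__":
    # Synthetic check on Pareto data with a known tail index of 1.5.
    rng = random.Random(1)
    data = [rng.paretovariate(1.5) for _ in range(20000)]
    print(round(hill_estimator(data, k=2000), 2))
```

On real file-size data the estimate depends on k, so it is normally examined over a range of k values rather than read off at a single point.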
Digestor: Device-independent Access to the World Wide Web
- Proc. WWW-6
, 1997
Cited by 97 (2 self)
Digestor is a software system which automatically re-authors arbitrary documents from the World-Wide Web to display appropriately on small screen devices such as PDAs and cellular phones, providing device-independent access to the Web. Digestor is implemented as an HTTP proxy which dynamically re-authors requested Web pages using a heuristic planning algorithm and a set of structural page transformations to achieve the best looking document for a given display size. 1. Introduction Access to World-Wide Web documents from personal electronic devices has been demonstrated in research projects [2,10,17,18], and is now becoming a commercial reality. General Magic's Presto!Links for Sony's MagicLink, AllPen's NetHopper for the Newton and Sharp's MI-10 (Figure 1, shown at right), all provide WWW browsers for PDA class devices, while the Nokia 9000 Communicator and Samsung's Duett provide Web access capabilities from cellular phones. Figure 1. Digestor: Device-Independent Access to the W...
Reproduced and emergent genres of communication on the World-Wide Web
- The Information Society
, 1997
Cited by 78 (9 self)
The World Wide Web is growing quickly and being applied to many new types of communications. As a basis for studying organizational communications, Yates and Orlikowski (1992; Orlikowski & Yates, 1994) proposed using genres. They defined genres as “typified communicative actions characterized by similar substance and form and taken in response to recurrent situations” (Yates & Orlikowski, 1992, p. 299). They further suggested that communications in a new medium would show both reproduction and adaptation of existing communicative genres as well as the emergence of new genres. We studied these phenomena on the World Wide Web by examining 1000 randomly selected Web pages and categorizing the type of genre represented. Although many pages recreated genres familiar from traditional media, we also saw examples of genres being adapted to take advantage of the linking and interactivity of the new medium and novel genres emerging to fit the unique communicative needs of the audience. We suggest that Web-site designers consider the genres that are appropriate for their situation and attempt to reproduce or adapt familiar genres.
Summary of WWW Characterizations
- World Wide Web
, 1998
Cited by 78 (0 self)
To date there have been a number of efforts that attempt to characterize various aspects of the World Wide Web. This paper presents a summary of these efforts, highlighting regularities and invariants that have been discovered. Keywords: Statistics, Metrics, Analysis, and Modeling
Informetric analyses on the world wide web: methodological approaches to ‘webometrics’
- Journal of Documentation
, 1997
Cited by 55 (4 self)

