Results 1 - 10
of
116
Web Caching and Zipf-like Distributions: Evidence and Implications
- In INFOCOM
, 1999
"... This paper addresses two unresolved issues about web caching. The first issue is whether web requests from a fixed user community are distributed according to Zipf's law [22]. Several early studies have supported this claim [9], [5], while other recent studies have suggested otherwise [16], [2]. The ..."
Abstract
-
Cited by 715 (2 self)
- Add to MetaCart
This paper addresses two unresolved issues about web caching. The first issue is whether web requests from a fixed user community are distributed according to Zipf's law [22]. Several early studies have supported this claim [9], [5], while other recent studies have suggested otherwise [16], [2]. The ;econd issue relates to a number of recent studies on the characteristics of web proxy traces, which have shown that the hit-ratios and temporal locality of the traces exhibit certain asymptotic properties that are uniform across the different sets of the traces [43, [XO], [71, [XO], [XS]. In partlc- ular, the question is whether these properties are inherent to web accesses or whether they are simply an artifact of the traces. An answer to these unresolved issues will facili- tate both web cache resource planning and cache hierarchy design
Summary cache: A scalable wide-area web cache sharing protocol
, 1998
"... The sharing of caches among Web proxies is an important technique to reduce Web traffic and alleviate network bottlenecks. Nevertheless it is not widely deployed due to the overhead of existing protocols. In this paper we propose a new protocol called "Summary Cache"; each proxy keeps a summary of t ..."
Abstract
-
Cited by 596 (2 self)
- Add to MetaCart
The sharing of caches among Web proxies is an important technique to reduce Web traffic and alleviate network bottlenecks. Nevertheless it is not widely deployed due to the overhead of existing protocols. In this paper we propose a new protocol called "Summary Cache"; each proxy keeps a summary of the URLs of cached documents of each participating proxy and checks these summaries for potential hits before sending any queries. Two factors contribute to the low overhead: the summaries are updated only periodically, and the summary representations are economical -- as low as 8 bits per entry. Using trace-driven simulations and a prototype implementation, we show that compared to the existing Internet Cache Protocol (ICP), Summary Cache reduces the number of inter-cache messages by a factor of 25 to 60, reduces the bandwidth consumption by over 50%, and eliminates between 30 % to 95 % of the CPU overhead, while at the same time maintaining almost the same hit ratio as ICP. Hence Summary Cache enables cache sharing among a large number of proxies.
On the Scale and Performance of Cooperative Web Proxy Caching
- ACM Symposium on Operating Systems Principles
, 1999
"... While algorithms for cooperative proxy caching have been widely studied, little is understood about cooperative-caching performance in the large-scale World Wide Web environment. This paper uses both trace-based analysis and analytic modelling to show the potential advantages and drawbacks of inter- ..."
Abstract
-
Cited by 250 (15 self)
- Add to MetaCart
While algorithms for cooperative proxy caching have been widely studied, little is understood about cooperative-caching performance in the large-scale World Wide Web environment. This paper uses both trace-based analysis and analytic modelling to show the potential advantages and drawbacks of inter-proxy cooperation. With our traces, we evaluate quantitatively the performance-improvement potential of cooperation between 200 small-organization proxies within a university environment, and between two large-organization proxies handling 23,000 and 60,000 clients, respectively. With our model, we extend beyond these populations to project cooperative caching behavior in regions with millions of clients. Overall, we demonstrate that cooperative caching has performance benefits only within limited population bounds. We also use our model to examine the implications of future trends in Web-access behavior and traffic.
An Analysis of Internet Content Delivery Systems
, 2002
"... In the span of only a few years, the Internet has experienced an astronomical increase in the use of specialized content delivery systems, such as content delivery networks and peer-to-peer file sharing systems. Therefore, an understanding of content delivery on the Internet now requires a detailed ..."
Abstract
-
Cited by 239 (10 self)
- Add to MetaCart
In the span of only a few years, the Internet has experienced an astronomical increase in the use of specialized content delivery systems, such as content delivery networks and peer-to-peer file sharing systems. Therefore, an understanding of content delivery on the Internet now requires a detailed understanding of how these systems are used in practice. This paper examines content delivery from the point of view of four content delivery systems: HTTP web traffic, the Akamai content delivery network, and Kazaa and Gnutella peer-to-peer file sharing traffic. We collected a trace of all incoming and outgoing network traffic at the University of Washington, a large university with over 60,000 students, faculty, and staff. From this trace, we isolated and characterized traffic belonging to each of these four delivery classes. Our results (1) quantify the rapidly increasing importance of new content delivery systems, particularly peerto-peer networks, (2) characterize the behavior of these systems from the perspectives of clients, objects, and servers, and (3) derive implications for caching in these systems. 1
A survey of web caching schemes for the internet
- ACM Computer Communication Review
, 1999
"... The World Wide Web can be considered as a large distributed information system that provides access to shared data objects. As one of the most popular applications currently running on the Internet, the World Wide Web is of an exponential growth in size, which results in network congestion and serve ..."
Abstract
-
Cited by 200 (1 self)
- Add to MetaCart
The World Wide Web can be considered as a large distributed information system that provides access to shared data objects. As one of the most popular applications currently running on the Internet, the World Wide Web is of an exponential growth in size, which results in network congestion and server overloading. Web caching has been recognized as one of the effective schemes to alleviate the service bottleneck and reduce the network traffic, thereby minimize the user access latency. In this paper, we first describe the elements of a Web caching system and its desirable properties. Then, we survey the state-of-art techniques which have been used in Web caching systems. Finally, we discuss the research frontier
Workload Characterization of the 1998 World Cup Web Site
- IEEE Network
, 1999
"... Web, workload characterization, performance, servers, caching, World Cup © Copyright Hewlett-Packard Company 1999 This paper presents a detailed workload characterization study of the 1998 World Cup Web site. Measurements from this site were collected over a three month period. During this time the ..."
Abstract
-
Cited by 157 (5 self)
- Add to MetaCart
Web, workload characterization, performance, servers, caching, World Cup © Copyright Hewlett-Packard Company 1999 This paper presents a detailed workload characterization study of the 1998 World Cup Web site. Measurements from this site were collected over a three month period. During this time the site received 1.35 billion requests, making this the largest Web workload analyzed to date. By examining this extremely busy site and through comparison with existing characterization studies we are able to determine how Web server workloads are evolving. We find that improvements in the caching architecture of the World-Wide Web are changing the workloads of Web servers, but that major improvements to that architecture are still necessary. In particular, we uncover evidence that a better consistency mechanism is required for World-Wide Web caches.
A Scalable System for Consistently Caching Dynamic Web Data
, 1999
"... This paper presents a new approach for consistently caching dynamic Web data in order to improve performance. Our algorithm, which we call Data Update Propagation (DUP), maintains data dependence informationbetween cached objects and the underlying data which affect their values in a graph. When the ..."
Abstract
-
Cited by 123 (16 self)
- Add to MetaCart
This paper presents a new approach for consistently caching dynamic Web data in order to improve performance. Our algorithm, which we call Data Update Propagation (DUP), maintains data dependence informationbetween cached objects and the underlying data which affect their values in a graph. When the system becomes aware of a change to underlying data, graph traversal algorithms are applied to determine which cached objects are affected by the change. Cached objects which are found to be highly obsolete are then either invalidated or updated. DUP was a critical component at the official Web site for the 1998 Olympic Winter Games. By using DUP, we were able to achieve cache hit rates close to 100% compared with 80% for an earlier version of our system which did not employ DUP. As a result of the high cache hit rates, the Olympic Games Web site was able to serve data quickly even during peak request periods. 1 Introduction Web servers provide two types of data: static data from files st...
Beyond hierarchies: Design considerations for distributed caching on the internet
- in Proceedings of the 19th International Conference on Distributed Computing Systems (ICDCS
, 1998
"... Abstract In this paper, we examine several distributed caching strategies to improve the response time for accessing data over theInternet. By studying several Internet caches and workloads, we derive four basic design principles for large scale distributed ..."
Abstract
-
Cited by 100 (6 self)
- Add to MetaCart
Abstract In this paper, we examine several distributed caching strategies to improve the response time for accessing data over theInternet. By studying several Internet caches and workloads, we derive four basic design principles for large scale distributed
End-to-end WAN Service Availability
- In Proc. 3rd USITS
, 2001
"... This study seeks to understand how network failures affect the availability of service delivery across wide area networks and to evaluate classes of techniques for improving end-to-end service availability. Using several large-scale connectivity traces, we develop a model of network unavailability t ..."
Abstract
-
Cited by 96 (14 self)
- Add to MetaCart
This study seeks to understand how network failures affect the availability of service delivery across wide area networks and to evaluate classes of techniques for improving end-to-end service availability. Using several large-scale connectivity traces, we develop a model of network unavailability that includes key parameters such as failure location and failure duration. We then use trace-based simulation to evaluate several classes of techniques for coping with network unavailability. We find that caching alone is seldom effective at insulating services from failures but that the combination of mobile extension code and prefetching can improve average unavailability by as much as an order of magnitude for classes of service whose semantics support disconnected operation. We find that routing-based techniques may provide significant improvements, but that the improvements of many individual techniques are limited because they do not address all significant categories of network failures. By combining the techniques we examine, some systems may be able to reduce average unavailability by as much as one or two orders of magnitude.
Design considerations for distributed caching on the Internet
- In ICDCS
, 1999
"... In this paper, we describe the design and implementation of an integrated architecture for cache systems that scale to hundreds or thousands of caches with thousands to millions of users. Rather than simply try to maximize hit rates, we take an end-to-end approach to improving response time by also ..."
Abstract
-
Cited by 91 (17 self)
- Add to MetaCart
In this paper, we describe the design and implementation of an integrated architecture for cache systems that scale to hundreds or thousands of caches with thousands to millions of users. Rather than simply try to maximize hit rates, we take an end-to-end approach to improving response time by also considering hit times and miss times. We begin by studying several Internet caches and workloads, and we derive three core design principles for large scale distributed caches: (1) minimize the number of hops to locate and access data on both hits and misses, (2) share data among many users and scale to many caches, and (3) cache data close to clients. Our strategies for addressing these issues are built around a scalable, high-performance data-location service that tracks where objects are replicated. We describe how to construct such a service and how to use this service to provide direct access to remote data and push-based data replication. We evaluate our system through trace-driven simulation and find that these strategies together provide response time speedups of 1.27 to 2.43 compared to a traditional three-level cache hierarchy for a range of trace workloads and simulated environments. 1.

