Results 1 - 10
of
70
A survey of web caching schemes for the internet
- ACM Computer Communication Review
, 1999
"... The World Wide Web can be considered as a large distributed information system that provides access to shared data objects. As one of the most popular applications currently running on the Internet, the World Wide Web is of an exponential growth in size, which results in network congestion and serve ..."
Abstract
-
Cited by 200 (1 self)
- Add to MetaCart
The World Wide Web can be considered as a large distributed information system that provides access to shared data objects. As one of the most popular applications currently running on the Internet, the World Wide Web is of an exponential growth in size, which results in network congestion and server overloading. Web caching has been recognized as one of the effective schemes to alleviate the service bottleneck and reduce the network traffic, thereby minimize the user access latency. In this paper, we first describe the elements of a Web caching system and its desirable properties. Then, we survey the state-of-art techniques which have been used in Web caching systems. Finally, we discuss the research frontier
Mining Longest Repeating Subsequences To Predict World Wide Web Surfing
, 1999
"... Modeling and predicting user surfing paths involves tradeoffs between model complexity and predictive accuracy. In this paper we explore predictive modeling techniques that attempt to reduce model complexity while retaining predictive accuracy. We show that compared to various Markov models, longest ..."
Abstract
-
Cited by 145 (3 self)
- Add to MetaCart
Modeling and predicting user surfing paths involves tradeoffs between model complexity and predictive accuracy. In this paper we explore predictive modeling techniques that attempt to reduce model complexity while retaining predictive accuracy. We show that compared to various Markov models, longest repeating subsequence models are able to significantly reduce model size while retaining the ability to make accurate predictions. In addition, sharp increases in the overall predictive capabilities of these models are achievable by modest increases to the number of predictions made. 1. Introduction Users surf the World Wide Web (WWW) by navigating along the hyperlinks that connect islands of content. If we could predict where surfers were going (that is, what they were seeking) we might be able to improve surfers' interactions with the WWW. Indeed, several research and industrial thrusts attempt to generate and utilize such predictions. These technologies include those for searching thro...
The Content and Access Dynamics of a Busy Web Site: Findings and Implications
, 2000
"... In this paper, we study the dynamics of the MSNBC news site, one of the busiest Web sites in the Internet today. Unlike many other efforts that have analyzed client accesses as seen by proxies, we focus on the server end. We analyze the dynamics of both the server content and client accesses made to ..."
Abstract
-
Cited by 104 (9 self)
- Add to MetaCart
In this paper, we study the dynamics of the MSNBC news site, one of the busiest Web sites in the Internet today. Unlike many other efforts that have analyzed client accesses as seen by proxies, we focus on the server end. We analyze the dynamics of both the server content and client accesses made to the server. The former considers the content creation and modification process while the latter considers page popularity and locality in client accesses. Some of our key results are: (a) files tend to change little when they are modified, (b) a small set of files tends to get modified repeatedly, (c) file popularity follows a Zipf-like distribution with a parameter ff that is much larger than reported in previous, proxy-based studies, and (d) there is significant temporal stability in file popularity but not much stability in the domains from which clients access the popular content. We discuss the implications of these findings for techniques such as Web caching (including cache consisten...
The Cache Location Problem
- IEEE/ACM Transactions on Networking
"... This paper studies the problem of where to place network caches. Emphasis is given to caches that are transparent to the clients since they are easier to manage and they require no cooperation from the clients. Our goal is to minimize the overall flow or the average delay by placing a given number o ..."
Abstract
-
Cited by 102 (6 self)
- Add to MetaCart
This paper studies the problem of where to place network caches. Emphasis is given to caches that are transparent to the clients since they are easier to manage and they require no cooperation from the clients. Our goal is to minimize the overall flow or the average delay by placing a given number of caches in the network.
Web Prefetching Between Low-Bandwidth Clients and Proxies: Potential and Performance
, 1999
"... The majority of the Internet population access the World Wide Web via dial-up modem connections. Studies have shown that the limited modem bandwidth is the main contributor to latency perceived by users. In this paper, we investigate one approach to reduce latency: prefetching between caching proxie ..."
Abstract
-
Cited by 91 (0 self)
- Add to MetaCart
The majority of the Internet population access the World Wide Web via dial-up modem connections. Studies have shown that the limited modem bandwidth is the main contributor to latency perceived by users. In this paper, we investigate one approach to reduce latency: prefetching between caching proxies and browsers. The approach relies on the proxy to predict which cached documents a user might reference next, and takes advantage of the idle time between user requests to push or pull the documents to the user. Using traces of modem Web accesses, we evaluate the potential of the technique at reducing client latency, examine the design of prediction algorithms, and investigate their performance varying the parameters and implementation concerns. Our results show that prefetching combined with large browser cache and delta-compression can reduce client latency up to 23.4%. The reduction is achieved using the Prediction-by-Partial-Matching (PPM) algorithm, whose accuracy ranges from 40% to ...
A Scalable Web Cache Consistency Architecture
, 1999
"... The rapid increase in web usage has led to dramatically increased loads on the network infrastructure and on individual web servers. To ameliorate these mounting burdens, there has been much recent interest in web caching architectures and algorithms. Web caching reduces network load, server load, a ..."
Abstract
-
Cited by 88 (1 self)
- Add to MetaCart
The rapid increase in web usage has led to dramatically increased loads on the network infrastructure and on individual web servers. To ameliorate these mounting burdens, there has been much recent interest in web caching architectures and algorithms. Web caching reduces network load, server load, and the latency of responses. However, web caching has the disadvantage that the pages returned to clients by caches may be stale, in that they may not be consistent with the version currently on the server. In this paper we describe a scalable web cache consistency architecture that provides fairly tight bounds on the staleness of pages. Our architecture borrows heavily from the literature, and can best be described as an invalidation approach made scalable by using a caching hierarchy and application-level multicast routing to convey the invalidations. We evaluate this design with calculations and simulations, and compare it to several other approaches. 1 Introduction The world-wide-web h...
Prefetching Hyperlinks
, 1999
"... This paper develops a new method for prefetching Web pages into the client cache. Clients send reference information to Web servers, which aggregate the reference information in near-real-time and then disperse the aggregated information to all clients, piggybacked on GET responses. The information ..."
Abstract
-
Cited by 84 (0 self)
- Add to MetaCart
This paper develops a new method for prefetching Web pages into the client cache. Clients send reference information to Web servers, which aggregate the reference information in near-real-time and then disperse the aggregated information to all clients, piggybacked on GET responses. The information indicates how often hyperlink URLs embedded in pages have been previously accessed relative to the embedding page. Based on knowledge about which hyperlinks are generally popular, clients initiate prefetching of the hyperlinks and their embedded images according to any algorithm they prefer. Both client and server may cap the prefetching mechanism's space overhead and waste of network resources due to speculation. The result of these differences is improved prefetching: lower client latency (by 52.3%) and less wasted network bandwidth (24.0%).
Adaptive Leases: A Strong Consistency Mechanism for the World Wide Web
- IEEE Transactions on Knowledge and Data Engineering
, 1999
"... In this paper, we argue that weak cache consistency mechanisms supported by existing web proxy caches must be augmented by strong consistency mechanisms to support the growing diversity in application requirements. Existing strong consistency mechanisms are not appealing for web environments due to ..."
Abstract
-
Cited by 82 (16 self)
- Add to MetaCart
In this paper, we argue that weak cache consistency mechanisms supported by existing web proxy caches must be augmented by strong consistency mechanisms to support the growing diversity in application requirements. Existing strong consistency mechanisms are not appealing for web environments due to their large state space or control message overhead. We focus on the lease approach that balances these tradeoffs and present analytical models and policies for determining the optimal lease duration. We present extensions to the http protocol to incorporate leases and then implement our techniques in the Squid proxy cache and the Apache web server. Our experimental evaluation of the leases approach shows that: (i) our techniques impose modest overheads even for long leases (a lease duration of 1 hour requires state to be maintained for 1030 leases and imposes an per-object overhead of a control message every 33 minutes); (ii) leases yields a 138-425% improvement over existing strong consist...
Prefetching the Means for Document Transfer: A New Approach for Reducing Web Latency
"... User-perceived latency is recognized as the central performance problem in the Web. We systematically measure factors contributing to this latency, across several locations. Our study reveals that DNS query times, TCP connection establishment, and start-of-session delays at HTTP servers, more so tha ..."
Abstract
-
Cited by 68 (2 self)
- Add to MetaCart
User-perceived latency is recognized as the central performance problem in the Web. We systematically measure factors contributing to this latency, across several locations. Our study reveals that DNS query times, TCP connection establishment, and start-of-session delays at HTTP servers, more so than transmission time, are major causes of long waits. Wait due to these factors also afflicts high-bandwidth users and has detrimental effect on perceived performance.
Web Usage Mining: Discovery and Application of Interestin Patterns from Web Data
, 2000
"... Web Usage Mining is the application of data mining techniques to Web clickstream data in order to extract usage patterns. As Web sites continue to grow in size and complexity, the results of Web Usage Mining have become critical for a number of applications such as Web site design, business and mark ..."
Abstract
-
Cited by 56 (0 self)
- Add to MetaCart
Web Usage Mining is the application of data mining techniques to Web clickstream data in order to extract usage patterns. As Web sites continue to grow in size and complexity, the results of Web Usage Mining have become critical for a number of applications such as Web site design, business and marketing decision support, personalization, usability studies, and network trac analysis. The two major challenges involved in Web Usage Mining are preprocessing the raw data to provide an accurate picture of how a site is being used, and ltering the results of the various data mining algorithms in order to present only the rules and patterns that are potentially interesting. This thesis develops and tests an architecture and algorithms for performing Web Usage Mining. An evidence combination framework referred to as the information lter is developed to compare and combine usage, content, and structure information about a Web site. The information lter automatically identi es the discovered ...

