Results 1-10 of 169
Measurement, Modeling, and Analysis of a Peer-to-Peer File-Sharing Workload
2003
Cited by 333 (6 self)
Peer-to-peer (P2P) file sharing accounts for an astonishing volume of current Internet traffic. This paper probes deeply into modern P2P file sharing systems and the forces that drive them. By doing so, we seek to increase our understanding of P2P file sharing workloads and their implications for future multimedia workloads. Our research uses a three-tiered approach. First, we analyze a 200-day trace of over 20 terabytes of Kazaa P2P traffic collected at the University of Washington. Second, we develop a model of multimedia workloads that lets us isolate, vary, and explore the impact of key system parameters. Our model, which we parameterize with statistics from our trace, lets us confirm various hypotheses about file-sharing behavior observed in the trace. Third, we explore the potential impact of locality-awareness in Kazaa.
Democratizing content publication with Coral
In NSDI, 2004
Cited by 242 (22 self)
CoralCDN is a peer-to-peer content distribution network that allows a user to run a web site that offers high performance and meets huge demand, all for the price of a cheap broadband Internet connection. Volunteer sites that run CoralCDN automatically replicate content as a side effect of users accessing it. Publishing through CoralCDN is as simple as making a small change to the hostname in an object's URL; a peer-to-peer DNS layer transparently redirects browsers to nearby participating cache nodes, which in turn cooperate to minimize load on the origin web server. One of the system's key goals is to avoid creating hot spots that might dissuade volunteers and hurt performance. It achieves this through Coral, a latency-optimized hierarchical indexing infrastructure based on a novel abstraction called a distributed sloppy hash table, or DSHT.
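Since publishing through CoralCDN amounts to rewriting the hostname in an object's URL, a small Python sketch of that rewrite may help. It assumes the `.nyud.net` suffix used by public CoralCDN deployments; that suffix is not stated in the abstract, so it is a configurable parameter here, and `coralize` is a hypothetical helper name.

```python
from urllib.parse import urlsplit, urlunsplit

# ASSUMPTION: suffix used by public CoralCDN deployments; not given in the abstract.
DEFAULT_CORAL_SUFFIX = ".nyud.net"


def coralize(url: str, suffix: str = DEFAULT_CORAL_SUFFIX) -> str:
    """Hypothetical helper: append the Coral DNS suffix to a URL's hostname.

    Path, query, and fragment are left untouched, so the rewritten URL still
    names the same object; only DNS resolution is redirected into Coral.
    """
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError(f"URL has no hostname: {url!r}")
    if parts.port is not None or parts.username is not None:
        # Out of scope for this sketch: ports and credentials need extra rules.
        raise ValueError("this sketch only handles plain host[/path] URLs")
    netloc = parts.hostname + suffix
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

For example, `coralize("http://www.example.com/index.html")` returns `"http://www.example.com.nyud.net/index.html"`. Keeping the suffix as a parameter reflects that the exact form (with or without a port) is an assumption rather than something the abstract specifies.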
An Analysis of Internet Content Delivery Systems
2002
Cited by 239 (10 self)
In the span of only a few years, the Internet has experienced an astronomical increase in the use of specialized content delivery systems, such as content delivery networks and peer-to-peer file sharing systems. Therefore, an understanding of content delivery on the Internet now requires a detailed understanding of how these systems are used in practice. This paper examines content delivery from the point of view of four content delivery systems: HTTP web traffic, the Akamai content delivery network, and Kazaa and Gnutella peer-to-peer file sharing traffic. We collected a trace of all incoming and outgoing network traffic at the University of Washington, a large university with over 60,000 students, faculty, and staff. From this trace, we isolated and characterized traffic belonging to each of these four delivery classes. Our results (1) quantify the rapidly increasing importance of new content delivery systems, particularly peer-to-peer networks, (2) characterize the behavior of these systems from the perspectives of clients, objects, and servers, and (3) derive implications for caching in these systems.
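Deriving caching implications from a trace typically means replaying its requests through a simulated cache and measuring the hit ratio. A minimal Python sketch, assuming an LRU replacement policy and unit-size objects (both are simplifying assumptions, not details from the abstract); `lru_hit_ratio` is a hypothetical name.

```python
from collections import OrderedDict
from typing import Hashable, Iterable


def lru_hit_ratio(requests: Iterable[Hashable], capacity: int) -> float:
    """Replay a request trace through an LRU cache holding `capacity` objects."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    cache: OrderedDict = OrderedDict()
    hits = total = 0
    for obj in requests:
        total += 1
        if obj in cache:
            hits += 1
            cache.move_to_end(obj)  # mark as most recently used
        else:
            cache[obj] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used object
    return hits / total if total else 0.0
```

With the trace `["a", "b", "a", "c", "b", "a"]`, a capacity of 2 yields a hit ratio of 1/6, while a capacity of 3 yields 0.5. Real traces would also need object sizes and byte hit ratios, which this sketch omits.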
Squirrel: A decentralized peer-to-peer web cache
2002
Cited by 155 (2 self)
This paper presents a decentralized, peer-to-peer web cache called Squirrel. The key idea is to enable web browsers on desktop machines to share their local caches, to form an efficient and scalable web cache, without the need for dedicated hardware and the associated administrative cost. We propose and evaluate decentralized web caching algorithms for Squirrel, and discover that it exhibits performance comparable to a centralized web cache in terms of hit ratio, bandwidth usage and latency. It also achieves the benefits of decentralization, such as being scalable, self-organizing and resilient to node failures, while imposing low overhead on the participating nodes.
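A decentralized cache needs some rule that tells every desktop which peer is responsible for a given URL. The abstract does not say how Squirrel makes that assignment, so this Python sketch swaps in consistent hashing with a successor rule as one common decentralized choice; `HomeNodeRing` and the node names are hypothetical.

```python
import bisect
import hashlib


def _key(s: str) -> int:
    # 160-bit identifier derived from SHA-1, as in many DHT designs.
    return int.from_bytes(hashlib.sha1(s.encode("utf-8")).digest(), "big")


class HomeNodeRing:
    """Hypothetical consistent-hashing ring mapping each URL to one peer."""

    def __init__(self, nodes):
        self._ring = sorted((_key(n), n) for n in nodes)
        if not self._ring:
            raise ValueError("need at least one node")
        self._ids = [k for k, _ in self._ring]

    def home_node(self, url: str) -> str:
        # First node clockwise from the URL's key, wrapping around the ring.
        i = bisect.bisect_right(self._ids, _key(url)) % len(self._ids)
        return self._ring[i][1]
```

The useful property for a self-organizing cache is that when one desktop leaves, only the URLs it was responsible for move to a new peer; every other URL keeps its old home node.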
Characterizing user behavior and network performance in a public wireless LAN
In Proceedings of ACM SIGMETRICS, Marina Del Rey, 2002
Cited by 150 (14 self)
This paper presents and analyzes user behavior and network performance in a public-area wireless network using a workload captured at a well-attended ACM conference. The goals of our study are: (1) to extend our understanding of wireless user behavior and wireless network performance; (2) to characterize wireless users in terms of a parameterized model for use with analytic and simulation studies involving wireless LAN traffic; and (3) to apply our workload analysis results to issues in wireless network deployment, such as capacity planning, and potential network optimizations, such as algorithms for load balancing across multiple access points (APs) in a wireless network.
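As an illustration of what load balancing across multiple APs can mean, here is a Python sketch of a greedy least-loaded assignment. The policy, the user-count load metric, and the name `assign_least_loaded` are assumptions for illustration, not the algorithms evaluated in the paper.

```python
from typing import Dict, Iterable


def assign_least_loaded(
    users: Iterable[tuple[str, list[str]]], aps: Iterable[str]
) -> Dict[str, str]:
    """Hypothetical policy: attach each arriving user to the reachable AP with the fewest users."""
    load = {ap: 0 for ap in aps}
    assignment: Dict[str, str] = {}
    for user, reachable in users:
        candidates = [ap for ap in reachable if ap in load]
        if not candidates:
            continue  # no known AP can serve this user
        # Tie-break on AP name so the result is deterministic.
        best = min(candidates, key=lambda ap: (load[ap], ap))
        load[best] += 1
        assignment[user] = best
    return assignment
```

Counting users per AP is the crudest load measure; a policy driven by measured traffic would plug a different quantity into the same `min` selection.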
DNS Performance and the Effectiveness of Caching
IEEE/ACM Transactions on Networking, 2001
Cited by 127 (6 self)
This paper presents a detailed analysis of traces of domain name system (DNS) and associated TCP traffic collected
Giggle: A Framework for Constructing Scalable Replica Location Services
2002
Cited by 122 (36 self)
In wide area computing systems, it is often desirable to create remote read-only copies (replicas) of files. Replication can be used to reduce access latency, improve data locality, and/or increase robustness, scalability and performance for distributed applications. We define a replica location service (RLS) as a system that maintains and provides access to information about the physical locations of copies. An RLS typically functions as one component of a data grid architecture. This paper makes the following contributions. First, we characterize RLS requirements. Next, we describe a parameterized architectural framework, which we name Giggle (for GIGa-scale Global Location Engine), within which a wide range of RLSs can be defined. We define several concrete instantiations of this framework with different performance characteristics. Finally, we present initial performance results for an RLS prototype, demonstrating that RLS systems can be constructed that meet performance goals.
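At its core, an RLS answers the question "where are the physical copies of this file?" A minimal single-site Python sketch of that mapping; it is not the distributed Giggle framework, and `ReplicaCatalog`, its methods, and the example names are hypothetical.

```python
from collections import defaultdict


class ReplicaCatalog:
    """Toy replica location service: logical file name -> physical copies."""

    def __init__(self) -> None:
        # Each logical name maps to the set of locations holding a replica.
        self._replicas = defaultdict(set)

    def register(self, logical_name: str, physical_location: str) -> None:
        self._replicas[logical_name].add(physical_location)

    def unregister(self, logical_name: str, physical_location: str) -> None:
        copies = self._replicas.get(logical_name)
        if copies is None:
            return
        copies.discard(physical_location)
        if not copies:
            del self._replicas[logical_name]  # drop names with no remaining replicas

    def locate(self, logical_name: str) -> list[str]:
        # Sorted for deterministic output; .get avoids inserting empty entries.
        return sorted(self._replicas.get(logical_name, ()))
```

A wide-area RLS would partition or replicate this mapping across sites, which is exactly the design space the framework above parameterizes.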
Decoupling Computation and Data Scheduling in Distributed Data-Intensive Applications
2002
Cited by 121 (7 self)
In high energy physics, bioinformatics, and other disciplines, we encounter applications involving numerous, loosely coupled jobs that both access and generate large data sets. So-called Data Grids seek to harness geographically distributed resources for such large-scale data-intensive problems. Yet effective scheduling in such environments is challenging, due to a need to address a variety of metrics and constraints (e.g., resource utilization, response time, global and local allocation policies) while dealing with multiple, potentially independent sources of jobs and a large number of storage, compute, and network resources.
The Content and Access Dynamics of a Busy Web Site: Findings and Implications
2000
Cited by 104 (9 self)
In this paper, we study the dynamics of the MSNBC news site, one of the busiest Web sites on the Internet today. Unlike many other efforts that have analyzed client accesses as seen by proxies, we focus on the server end. We analyze the dynamics of both the server content and client accesses made to the server. The former considers the content creation and modification process while the latter considers page popularity and locality in client accesses. Some of our key results are: (a) files tend to change little when they are modified, (b) a small set of files tends to get modified repeatedly, (c) file popularity follows a Zipf-like distribution with a parameter α that is much larger than reported in previous, proxy-based studies, and (d) there is significant temporal stability in file popularity but not much stability in the domains from which clients access the popular content. We discuss the implications of these findings for techniques such as Web caching (including cache consisten...
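In a Zipf-like distribution, the file of popularity rank i receives a share of requests proportional to 1/i^α, so a larger α concentrates requests on fewer files. A Python sketch that estimates α from a request log by a least-squares fit of log(frequency) against log(rank); this fitting method is an assumption (simple and common, but biased in the sparse tail), not necessarily how the paper computed α, and `estimate_zipf_alpha` is a hypothetical name.

```python
import math
from collections import Counter
from typing import Hashable, Iterable


def estimate_zipf_alpha(requests: Iterable[Hashable]) -> float:
    """Negated least-squares slope of log(frequency) versus log(rank)."""
    counts = sorted(Counter(requests).values(), reverse=True)
    if len(counts) < 2:
        raise ValueError("need at least two distinct objects")
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx  # slope is -alpha for a Zipf-like law
```

For a log whose per-file counts are 1000, 500, 333, 250, and 200, the estimate comes out very close to 1.0. A maximum-likelihood estimator is usually preferred when the tail contains many rarely requested files.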
End-to-end WAN Service Availability
In Proc. 3rd USITS, 2001
Cited by 96 (14 self)
This study seeks to understand how network failures affect the availability of service delivery across wide area networks and to evaluate classes of techniques for improving end-to-end service availability. Using several large-scale connectivity traces, we develop a model of network unavailability that includes key parameters such as failure location and failure duration. We then use trace-based simulation to evaluate several classes of techniques for coping with network unavailability. We find that caching alone is seldom effective at insulating services from failures but that the combination of mobile extension code and prefetching can improve average unavailability by as much as an order of magnitude for classes of service whose semantics support disconnected operation. We find that routing-based techniques may provide significant improvements, but that the improvements of many individual techniques are limited because they do not address all significant categories of network failures. By combining the techniques we examine, some systems may be able to reduce average unavailability by as much as one or two orders of magnitude.
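Average unavailability here is the fraction of time a service cannot be reached, so "an order of magnitude" improvement means that fraction shrinks roughly tenfold. A Python sketch that computes unavailability from outage intervals in a connectivity trace, merging overlaps so that concurrent outages are not double-counted; the interval representation and the name `unavailability` are assumptions for illustration.

```python
from typing import Iterable, Tuple


def unavailability(outages: Iterable[Tuple[float, float]], start: float, end: float) -> float:
    """Fraction of the window [start, end) covered by at least one outage."""
    if end <= start:
        raise ValueError("empty observation window")
    # Clip each outage to the observation window, then sort by start time.
    clipped = sorted((max(s, start), min(e, end)) for s, e in outages)
    down = 0.0
    cur_s = cur_e = None
    for s, e in clipped:
        if e <= s:
            continue  # outage lies entirely outside the window
        if cur_e is None or s > cur_e:
            if cur_e is not None:
                down += cur_e - cur_s  # close the previous merged outage
            cur_s, cur_e = s, e
        else:
            cur_e = max(cur_e, e)  # overlapping outage: extend the merged interval
    if cur_e is not None:
        down += cur_e - cur_s
    return down / (end - start)
```

For outages `[(100, 105), (103, 110), (500, 503)]` over a 1000-second window, the merged downtime is 13 seconds, giving an unavailability of 0.013. Reducing a 100-second outage to 10 seconds in the same window takes unavailability from 0.1 to 0.01, which is one order of magnitude.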

