Results 1 - 10 of 32
Wide-area cooperative storage with CFS
, 2001
Cited by 778 (52 self)
The Cooperative File System (CFS) is a new peer-to-peer read-only storage system that provides provable guarantees for the efficiency, robustness, and load-balance of file storage and retrieval. CFS does this with a completely decentralized architecture that can scale to large systems. CFS servers provide a distributed hash table (DHash) for block storage. CFS clients interpret DHash blocks as a file system. DHash distributes and caches blocks at a fine granularity to achieve load balance, uses replication for robustness, and decreases latency with server selection. DHash finds blocks using the Chord location protocol, which operates in time logarithmic in the number of servers. CFS is implemented using the SFS file system toolkit and runs on Linux, OpenBSD, and FreeBSD. Experience on a globally deployed prototype shows that CFS delivers data to clients as fast as FTP. Controlled tests show that CFS is scalable: with 4,096 servers, looking up a block of data involves contacting only seven servers. The tests also demonstrate nearly perfect robustness and unimpaired performance even when as many as half the servers fail.
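The logarithmic lookup cost mentioned in this abstract can be illustrated with a small simulation. The following is a minimal sketch of Chord-style finger routing on a hashed identifier ring, not the CFS or DHash implementation: the 32-bit identifier space, the `server-N` and `block-N` names, and the helpers `ident`, `between`, `Ring`, and `average_hops` are all assumptions made for illustration.

```python
import bisect
import hashlib
import random

M = 32               # identifier bits; an assumption chosen to keep the sketch small
RING = 1 << M        # size of the circular identifier space


def ident(name: str) -> int:
    """Hash a name onto the ring (SHA-1, reduced to M bits)."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big") % RING


def between(x: int, a: int, b: int) -> bool:
    """True if x lies in the clockwise ring interval (a, b]."""
    if a < b:
        return a < x <= b
    return x > a or x <= b  # interval wraps past zero


class Ring:
    def __init__(self, n_servers: int):
        # Hypothetical server names; the set drops the rare identifier collision.
        self.ids = sorted({ident(f"server-{i}") for i in range(n_servers)})

    def successor(self, key: int) -> int:
        """First server at or clockwise after key."""
        i = bisect.bisect_left(self.ids, key)
        return self.ids[i % len(self.ids)]

    def lookup(self, start: int, key: int):
        """Route from server `start` to the server responsible for `key`.

        Returns (owner, hops), where hops counts how often the query is forwarded.
        """
        node, hops = start, 0
        while True:
            succ = self.successor((node + 1) % RING)
            if between(key, node, succ):
                return succ, hops
            # Forward to the farthest finger that still precedes the key.
            for k in reversed(range(M)):
                finger = self.successor((node + (1 << k)) % RING)
                if between(finger, node, (key - 1) % RING):
                    node, hops = finger, hops + 1
                    break
            else:
                raise RuntimeError("no finger precedes the key; ring is inconsistent")


def average_hops(n_servers: int, n_lookups: int, seed: int = 0) -> float:
    """Average number of forwards over lookups started at random servers."""
    rng = random.Random(seed)
    ring = Ring(n_servers)
    total = 0
    for i in range(n_lookups):
        key = ident(f"block-{i}")
        owner, hops = ring.lookup(rng.choice(ring.ids), key)
        assert owner == ring.successor(key)  # routing must agree with direct placement
        total += hops
    return total / n_lookups
```

Calling `average_hops(4096, 1000)` gives a rough feel for how the forward count grows with the logarithm of the ring size. The sketch ignores replication, caching, server selection, and failures, so its numbers should not be read as a reproduction of the paper's measurements.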
Democratizing content publication with Coral
- In NSDI
, 2004
Cited by 242 (22 self)
CoralCDN is a peer-to-peer content distribution network that allows a user to run a web site that offers high performance and meets huge demand, all for the price of a cheap broadband Internet connection. Volunteer sites that run CoralCDN automatically replicate content as a side effect of users accessing it. Publishing through CoralCDN is as simple as making a small change to the hostname in an object's URL; a peer-to-peer DNS layer transparently redirects browsers to nearby participating cache nodes, which in turn cooperate to minimize load on the origin web server. One of the system's key goals is to avoid creating hot spots that might dissuade volunteers and hurt performance. It achieves this through Coral, a latency-optimized hierarchical indexing infrastructure based on a novel abstraction called a distributed sloppy hash table, or DSHT.
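The "small change to the hostname in an object's URL" can be pictured as a simple rewrite that appends a CDN domain to the original host. The sketch below is only an illustration of that idea: the function name `coralize` and the suffix `cdn.example` are hypothetical placeholders, not the suffix or tooling used by the deployed system.

```python
from urllib.parse import urlsplit, urlunsplit


def coralize(url: str, suffix: str = "cdn.example") -> str:
    """Append a CDN suffix to the hostname of `url`.

    `suffix` is a placeholder domain chosen for this sketch. Any port or
    user info in the original URL is dropped to keep the example short.
    """
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError(f"URL has no hostname: {url!r}")
    netloc = f"{parts.hostname}.{suffix}"
    # Path, query, and fragment are carried over unchanged.
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

For example, `coralize("http://www.example.org/a/b.html?x=1")` yields `http://www.example.org.cdn.example/a/b.html?x=1`. A DNS layer that is authoritative for the suffix domain could then direct the browser to a nearby cache node, as the abstract describes.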
Squirrel: A decentralized peer-to-peer web cache
, 2002
Cited by 155 (2 self)
This paper presents a decentralized, peer-to-peer web cache called Squirrel. The key idea is to enable web browsers on desktop machines to share their local caches, to form an efficient and scalable web cache, without the need for dedicated hardware and the associated administrative cost. We propose and evaluate decentralized web caching algorithms for Squirrel, and discover that it exhibits performance comparable to a centralized web cache in terms of hit ratio, bandwidth usage and latency. It also achieves the benefits of decentralization, such as being scalable, self-organizing and resilient to node failures, while imposing low overhead on the participating nodes.
PRACTI replication
- In Proc. NSDI
, 2006
Cited by 41 (14 self)
We present PRACTI, a new approach for large-scale replication. PRACTI systems can replicate or cache any subset of data on any node (Partial Replication), provide a broad range of consistency guarantees (Arbitrary Consistency), and permit any node to send information to any other node (Topology Independence). A PRACTI architecture yields two significant advantages. First, by providing all three PRACTI properties, it enables better trade-offs than existing mechanisms that support at most two of the three desirable properties. The PRACTI approach thus exposes new points in the design space for replication systems. Second, the flexibility of PRACTI protocols simplifies the design of replication systems by allowing a single architecture to subsume a broad range of existing systems and to reduce development costs for new ones. To illustrate both advantages, we use our PRACTI prototype to emulate existing server replication, client-server, and object replication systems and to implement novel policies that improve performance for mobile users, web edge servers, and grid computing by as much as an order of magnitude.
Analysis and Characterization of Large-Scale Web Server Access Patterns and Performance
- World Wide Web
, 1999
Cited by 40 (7 self)
In this paper we develop a general methodology for characterizing the access patterns of Web server requests based on a time-series analysis of finite collections of observed data from real systems. Our approach is used together with the access logs from the IBM Web site for the Olympic Games to demonstrate some of its advantages over previous methods and to construct a particular class of benchmarks for large-scale heavily-accessed Web server environments. We then apply an instance of this class of benchmarks to analyze aspects of large-scale Web server performance, demonstrating some additional problems with commonly used methods to evaluate Web server performance at different request traffic intensities.
A Survey of Proxy Cache Evaluation Techniques
- In Proceedings of the Fourth International Web Caching Workshop (WCW99)
, 1999
Cited by 28 (8 self)
Proxy caches are increasingly used around the world to reduce bandwidth requirements and alleviate delays associated with the World-Wide Web. In order to compare proxy cache performances, objective measurements must be made. In this paper, we define a space of proxy evaluation methodologies based on source of workload used and form of algorithm implementation. We then survey recent publications and show their locations within this space.
1 Introduction
Proxy caches are increasingly used around the world to reduce bandwidth and alleviate delays associated with the World-Wide Web. This paper describes the space of proxy cache evaluation methodologies and places current research within that space. The primary contributions of this paper are threefold: 1) definition and description of the space of evaluation techniques; 2) appraisal of the different methods within that space; and 3) a survey of cache evaluation techniques from the research literature. In the next section we provide backgro...
Freeloader: Scavenging desktop storage resources for scientific data
- In Proceedings of Supercomputing
, 2005
Cited by 23 (11 self)
High-end computing is suffering a data deluge from experiments, simulations, and apparatus that creates overwhelming application dataset sizes. End-user workstations—despite more processing power than ever before—are ill-equipped to cope with such data demands due to insufficient secondary storage space and I/O rates. Meanwhile, a large portion of desktop storage is unused. We present the FreeLoader framework, which aggregates unused desktop storage space and I/O bandwidth into a shared cache/scratch space, for hosting large, immutable datasets and exploiting data access locality. Our experiments show that FreeLoader is an appealing low-cost solution to storing massive datasets, by delivering higher data access rates than traditional storage facilities. In particular, we present novel data striping techniques that allow FreeLoader to efficiently aggregate a workstation’s network communication bandwidth and local I/O bandwidth. In addition, the performance impact on the native workload of donor machines is small and can be effectively controlled.
HashCache: Cache Storage for the Next Billion
Cited by 16 (6 self)
We present HashCache, a configurable cache storage engine designed to meet the needs of cache storage in the developing world. With the advent of cheap commodity laptops geared for mass deployments, developing regions are poised to become major users of the Internet, and given the high cost of bandwidth in these parts of the world, they stand to gain significantly from network caching. However, current Web proxies are incapable of providing large storage capacities while using small resource footprints, a requirement for the integrated multi-purpose servers needed to effectively support developing-world deployments. HashCache presents a radical departure from the conventional wisdom in network cache design, and uses 6 to 20 times less memory than current techniques while still providing comparable or better performance. As such, HashCache can be deployed in configurations not attainable with current approaches, such as having multiple terabytes of external storage cache attached to low-powered machines. HashCache has been successfully deployed in two locations in Africa, and further deployments are in progress.
PRACTI Replication for Large-Scale Systems
, 2004
Cited by 10 (7 self)
Many replication mechanisms for large-scale distributed systems exist, but they require a designer to compromise a system's replication policy (e.g., by requiring full replication of all data to all nodes), consistency policy (e.g., by supporting per-object coherence but not multi-object consistency), or topology policy (e.g., by assuming a hierarchical organization of nodes). In this paper, we present the first PRACTI (Partial Replication, Arbitrary Consistency, and Topology Independence) mechanisms for replication in large-scale systems. These new mechanisms allow construction of systems that replicate or cache any data on any node, that provide a broad range of consistency and coherence guarantees, and that permit any node to communicate with any other node at any time. Our evaluation of a prototype suggests that by disentangling mechanism from policy, PRACTI replication enables better trade-offs for system designers than possible with existing mechanisms. For example, for one workload we study, PRACTI's partial replication reduces bandwidth requirements by over an order of magnitude compared to full replication for nodes that only care about a subset of the system's data.
Constructing collaborative desktop storage caches for large scientific datasets
- ACM Transactions on Storage (TOS)
, 2006
Cited by 8 (5 self)

