Planetlab: An overlay testbed for broad-coverage services
ACM SIGCOMM Computer Communication Review, 2003
Cited by 237 (3 self)
PlanetLab is a global overlay network for developing and accessing broad-coverage network services. Our goal is to grow to 1000 geographically distributed nodes, connected by a diverse collection of links. PlanetLab allows multiple services to run concurrently and continuously, each in its own slice of PlanetLab. This paper describes our initial implementation of PlanetLab, including the mechanisms used to implement virtualization, and the collection of core services used to manage PlanetLab.
Operating System Support for Planetary-Scale Network Services
2004
Cited by 179 (17 self)
PlanetLab is a geographically distributed overlay network designed to support the deployment and evaluation of planetary-scale network services. Two high-level goals shape its design. First, to enable a large research community to share the infrastructure, PlanetLab provides distributed virtualization, whereby each service runs in an isolated slice of PlanetLab’s global resources. Second, to support competition among multiple network services, PlanetLab decouples the operating system running on each node from the network-wide services that define PlanetLab, a principle referred to as unbundled management. This paper describes how PlanetLab realizes the goals of distributed virtualization and unbundled management, with a focus on the OS running on each node.
Opportunistic Use of Content Addressable Storage for Distributed File Systems
In Proceedings of the 2003 USENIX Annual Technical Conference, 2003
Cited by 46 (11 self)
Motivated by the prospect of readily available Content Addressable Storage (CAS), we introduce the concept of file recipes. A file's recipe is a first-class file system object listing content hashes that describe the data blocks composing the file. File recipes provide applications with instructions for reconstructing the original file from available CAS data blocks. We describe one such application of recipes, the CASPER distributed file system. A CASPER client opportunistically fetches blocks from nearby CAS providers to improve its performance when the connection to a file server traverses a low-bandwidth path. We use measurements of our prototype to evaluate its performance under varying network conditions. Our results demonstrate significant improvements in execution times of applications that use a network file system. We conclude by describing fuzzy block matching, a promising technique for using approximately matching blocks on CAS providers to reconstitute the exact desired contents of a file at a client.
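The recipe idea in this abstract can be illustrated with a minimal sketch. It is not the CASPER implementation: the fixed 4096-byte block size, the choice of SHA-1, the dictionary standing in for a nearby CAS provider, and the names make_recipe, rebuild, and fetch_from_server are all assumptions made for illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size; the abstract does not state one


def make_recipe(data, block_size=BLOCK_SIZE):
    """Return the list of content hashes, one per block of the file."""
    return [
        hashlib.sha1(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def rebuild(recipe, cas, fetch_from_server):
    """Reassemble a file from its recipe.

    cas is a dict mapping hash -> block (a stand-in for a nearby CAS
    provider). fetch_from_server(index) is a hypothetical fallback that
    returns block number `index` over the slow path to the file server.
    """
    parts = []
    for index, digest in enumerate(recipe):
        block = cas.get(digest)
        # Verify the block against the recipe before trusting it.
        if block is None or hashlib.sha1(block).hexdigest() != digest:
            block = fetch_from_server(index)
        parts.append(block)
    return b"".join(parts)
```

In this sketch a client asks the server only for blocks that the CAS lookup cannot supply, which is the opportunistic behavior the abstract describes.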
Redundancy Elimination Within Large Collections of Files
2004
Cited by 45 (2 self)
Ongoing advancements in technology lead to ever-increasing storage capacities. In spite of this, optimizing storage usage can still provide rich dividends. Several techniques based on delta-encoding and duplicate block suppression have been shown to reduce storage overheads, with varying requirements for resources such as computation and memory. We propose a new scheme for storage reduction that reduces data sizes with an effectiveness comparable to the more expensive techniques, but at a cost comparable to the faster but less effective ones. The scheme, called Redundancy Elimination at the Block Level (REBL), leverages the benefits of compression, duplicate block suppression, and delta-encoding to eliminate a broad spectrum of redundant data in a scalable and efficient manner. REBL generally encodes more compactly than compression (up to a factor of 14) and a combination of compression and duplicate suppression (up to a factor of 6.7). REBL also encodes similarly to a technique based on delta-encoding, reducing overall space significantly in one case. Furthermore, REBL uses super-fingerprints, a technique that reduces the data needed to identify similar blocks while dramatically reducing the computational requirements of matching the blocks: it turns comparisons into hash table lookups. As a result, using super-fingerprints to avoid enumerating matching data objects decreases computation in the resemblance detection phase of REBL by up to a couple orders of magnitude.
A five-year study of file-system metadata
In Proceedings of the 5th USENIX Conference on File and Storage Technologies. USENIX Association, 2007
Cited by 37 (4 self)
For five years, we collected annual snapshots of file-system metadata from over 60,000 Windows PC file systems in a large corporation. In this article, we use these snapshots to study temporal changes in file size, file age, file-type frequency, directory size, namespace structure, file-system population, storage capacity and consumption, and degree of file modification. We present a generative model that explains the namespace structure and the distribution of directory sizes. We find significant temporal trends relating to the popularity of certain file types, the origin of file content, the way the namespace is used, and the degree of variation among file systems, as well as more pedestrian changes in size and capacities. We give examples of consequent lessons for designers of file systems and related software.
Design tradeoffs in applying content addressable storage to enterprise-scale systems based on virtual machines
In Proc. USENIX Annual Technical Conference, 2006
Duplicate data elimination in a san file system
In Proceedings of the 21st IEEE / 12th NASA Goddard Conference on Mass Storage Systems and Technologies (MSST), 2004
Cited by 8 (0 self)
Duplicate Data Elimination (DDE) is our method for identifying and coalescing identical data blocks in Storage Tank, a SAN file system. On-line file systems pose a unique set of performance and implementation challenges for this feature. Existing techniques, which are used to improve both storage and network utilization, do not satisfy these constraints. Our design employs a combination of content hashing, copy-on-write, and lazy updates to achieve its functional and performance goals. DDE executes primarily as a background process. The design also builds on Storage Tank’s FlashCopy function to ease implementation. We include an analysis of selected real-world data sets that is aimed at demonstrating the space-saving potential of coalescing duplicate data. Our results show that DDE can reduce storage consumption by up to 80% in some application environments. The analysis explores several additional features, such as the impact of varying file block size and the contribution of whole file duplication to the net savings.
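Coalescing identical blocks by content hash can be sketched as a single background pass. This is an illustrative toy, not Storage Tank's design: the two dictionaries, the function name coalesce, and the use of SHA-256 are assumptions, and the copy-on-write and lazy-update machinery mentioned in the abstract is omitted.

```python
import hashlib


def coalesce(block_map, storage):
    """Point logical blocks with identical content at one physical block.

    block_map maps logical block id -> physical block id.
    storage maps physical block id -> bytes.
    Both are updated in place; returns the number of physical blocks freed.
    """
    canonical = {}  # content digest -> physical id that is kept
    for logical in sorted(block_map):
        physical = block_map[logical]
        digest = hashlib.sha256(storage[physical]).digest()
        # The first physical block seen with this content becomes canonical.
        keep = canonical.setdefault(digest, physical)
        block_map[logical] = keep
    # Physical blocks that no logical block references can be reclaimed.
    live = set(block_map.values())
    dead = [p for p in storage if p not in live]
    for p in dead:
        del storage[p]
    return len(dead)
```

After such a pass several logical blocks may share one physical block, so a later write to any of them would have to copy the block first; that step is not shown here.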
Improving Duplicate Elimination in Storage Systems
Cited by 7 (1 self)
Minimizing the amount of data that must be stored and managed is a key goal for any storage architecture that purports to be scalable. One way to achieve this goal is to avoid maintaining duplicate copies of the same data. Eliminating redundant data at the source by not writing data which has already been stored not only reduces storage overheads, but can also improve bandwidth utilization. For these reasons, in the face of today’s exponentially growing data volumes, redundant data elimination techniques have assumed critical significance in the design of modern storage systems. Intelligent object partitioning techniques identify data that are new when objects are updated, and transfer only those chunks to a storage server. In this paper, we propose a new object partitioning technique, called fingerdiff, that improves upon existing schemes in several important respects. Most notably fingerdiff dynamically chooses a partitioning strategy for a data object based on its similarities with previously stored objects in order to improve storage and bandwidth utilization. We present a detailed evaluation of fingerdiff, and other existing object partitioning schemes, using a set of real-world workloads. We show that for these workloads, the duplicate elimination strategies employed by fingerdiff improve storage utilization on average by 25%, and bandwidth utilization on average by 40% over comparable techniques.
Improving Mobile Database Access over Wide-area Networks without Degrading Consistency
In Proceedings of the 5th International Conference on Mobile Systems, Applications and Services, 2007
Cited by 6 (1 self)
We report on the design, implementation, and evaluation of a system called Cedar that enables mobile database access with good performance over low-bandwidth networks. This is accomplished without degrading consistency. Cedar exploits the disk storage and processing power of a mobile client to compensate for weak connectivity. Its central organizing principle is that even a stale client replica can be used to reduce data transmission volume from a database server. The reduction is achieved by using content addressable storage to discover and elide commonality between client and server results. This organizing principle allows Cedar to use an optimistic approach to solving the difficult problem of database replica control. For laptop-class clients, our experiments show that Cedar improves the throughput of read-write workloads by 39% to as much as 224% while reducing response time by 28% to as much as 79%.
Jumbo Store: Providing Efficient Incremental Upload and Versioning for a Utility Rendering Service
Cited by 5 (1 self)
We have developed a new storage system called the Jumbo Store (JS) based on encoding directory tree snapshots as graphs called HDAGs whose nodes are small variable-length chunks of data and whose edges are hash pointers. We store or transmit each node only once and encode using landmark-based chunking plus some new tricks. This leads to very efficient incremental upload and storage of successive snapshots: we report compression factors over 16x for real data; a comparison shows that our incremental upload sends only 1/5 as much data as Rsync. To demonstrate the utility of the Jumbo Store, we have integrated it into HP Labs’ prototype Utility Rendering Service (URS), which accepts rendering data in the form of directory tree snapshots from small teams of animators, renders one or more requested frames using a processor farm, and then makes the rendered frames available for download. Efficient incremental upload is crucial to the URS’s usability and responsiveness because of the teams’ slow Internet connections. We report on the JS’s performance during a major field test of the URS where the URS was offered to 11 groups of animators for 10 months during an animation showcase to create high-quality short animations.
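Landmark-based chunking, in its generic form, places chunk boundaries wherever a hash of the most recent bytes matches a fixed pattern, so boundaries depend on content rather than on byte offsets. The following is a rough sketch of that general idea only; it does not reproduce the Jumbo Store's algorithm or its "new tricks", and the window size, mask, polynomial hash, and function name are all assumptions.

```python
WINDOW = 16          # bytes covered by the rolling hash (assumed)
MASK = 0x3F          # a boundary is declared when hash & MASK == 0 (assumed)
BASE = 257
MOD = (1 << 31) - 1


def landmark_chunks(data, window=WINDOW, mask=MASK):
    """Split data into variable-length chunks at content-defined landmarks."""
    chunks = []
    start = 0
    h = 0
    top = pow(BASE, window - 1, MOD)  # weight of the oldest byte in the window
    for i, byte in enumerate(data):
        if i >= window:
            # Slide the window: drop the byte that just fell out of it.
            h = (h - data[i - window] * top) % MOD
        h = (h * BASE + byte) % MOD
        # Require at least `window` bytes per chunk to avoid tiny chunks.
        if i + 1 - start >= window and (h & mask) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes with no landmark
    return chunks
```

Because the hash covers only the last few bytes, an insertion early in a file typically changes the chunks near the edit while later boundaries fall in the same places, which is what lets unchanged chunks be stored or sent only once.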

