File system support for delta compression (2000)

by J MacDonald

Results 1 - 10 of 37

Self-securing Storage: Protecting Data in Compromised Systems

by John D. Strunk, Garth R. Goodson, Michael L. Scheinholtz, Craig A. N. Soules, Gregory R. Ganger - SYMPOSIUM ON OPERATING SYSTEMS DESIGN AND IMPLEMENTATION, 2000
Cited by 118 (17 self)
Self-securing storage prevents intruders from undetectably tampering with or permanently deleting stored data. To accomplish this, self-securing storage devices internally audit all requests and keep old versions of data for a window of time, regardless of the commands received from potentially compromised host operating systems. Within the window, system administrators have this valuable information for intrusion diagnosis and recovery. Our implementation, called S4, combines log-structuring with journal-based metadata to minimize the performance costs of comprehensive versioning. Experiments show that self-securing storage devices can deliver performance that is comparable with conventional storage systems. In addition, analyses indicate that several weeks worth of all versions can reasonably be kept on state-of-the-art disks, especially when differencing and compression technologies are employed.

Metadata efficiency in versioning file systems

by Craig A. N. Soules, Garth R. Goodson, John D. Strunk, Gregory R. Ganger - Conference on File and Storage Technologies (San Francisco, CA, 31 March–02 April 2003), 2003
Cited by 75 (11 self)
Rights to individual papers remain with the author or the author's employer. Permission is granted for noncommercial reproduction of the work for educational or research purposes. This copyright notice must be included in the reproduced paper. USENIX acknowledges all trademarks herein.

Compactly Encoding Unstructured Inputs with Differential Compression

by Miklos Ajtai, Randal Burns, Ronald Fagin, Darrell D. E. Long, Larry Stockmeyer - JOURNAL OF THE ACM, 2002
Cited by 35 (8 self)
The subject of this article is differential compression, the algorithmic task of finding common strings between versions of data and using them to encode one version compactly by describing it as a set of changes from its companion. A main goal of this work is to present new differencing algorithms that (i) operate at a fine granularity (the atomic unit of change), (ii) make no assumptions about the format or alignment of input data, and (iii) in practice use linear time, use constant space, and give good compression. We present new algorithms, which do not always compress optimally but use considerably less time or space than existing algorithms. One new algorithm runs in O(n) time and O(1) space in the worst case (where each unit of space contains n# bits), as compared to

Remote incremental linking for energy-efficient reprogramming of sensor networks

by Joel Koshy - In Proceedings of the second European Workshop on Wireless Sensor Networks, 2005
Cited by 26 (4 self)
With sensor networks expected to be deployed for long periods of time, the ability to reprogram them remotely is necessary for providing new services, fixing bugs, and enhancing applications and system software. Given the envisioned scales of future sensor network deployments, their restricted accessibility, and the limited energy and computing resources of sensors, transmitting raw binary images is inefficient. We present a technique to minimize the cost of application evolution by remotely and incrementally linking updated modules at the base station, and distributing deltas of the pre-linked software modules. This paper provides details of our implementation, some preliminary results, and surveys critical research issues in developing a comprehensive framework for reprogramming sensor networks.

Metadata Efficiency in a Comprehensive Versioning File System

by Craig A. N. Soules, Garth R. Goodson, John D. Strunk, Gregory R. Ganger - In Proceedings of USENIX Conference on File and Storage Technologies, 2002
Cited by 21 (2 self)
A comprehensive versioning file system creates and retains a new file version for every WRITE or other modification request. The resulting history of file modifications provides a detailed view to tools and administrators seeking to investigate a suspect system state. Conventional versioning systems do not efficiently record the many prior versions that result. In particular, the versioned metadata they keep consumes almost as much space as the versioned data. This paper examines two space-efficient metadata structures for versioning file systems and describes their integration into the Comprehensive Versioning File System (CVFS). Journal-based metadata encodes each metadata version into a single journal entry; CVFS uses this structure for inodes and indirect blocks, reducing the associated space requirements by 80%. Multiversion b-trees extend the per-entry key with a timestamp and keep current and historical entries in a single tree; CVFS uses this structure for directories, reducing the associated space requirements by 99%. Experiments with CVFS verify that its current-version performance is similar to that of non-versioning file systems. Although access to historical versions is slower than conventional versioning systems, checkpointing is shown to mitigate this effect.
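The journal-based metadata idea in this abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that each journal entry records the field that changed and its previous value, and that a historical version is rebuilt by undoing entries backward from the current state. The class and method names (JournaledMetadata, update, as_of) are hypothetical and are not taken from CVFS.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

_MISSING = object()  # marks "field did not exist before this change"


@dataclass
class JournaledMetadata:
    """Current metadata plus an undo journal (illustrative, not CVFS code)."""
    current: Dict[str, Any] = field(default_factory=dict)
    # Each entry: (timestamp, field name, value before the change).
    journal: List[Tuple[int, str, Any]] = field(default_factory=list)

    def update(self, timestamp: int, name: str, value: Any) -> None:
        # Assumes timestamps are supplied in non-decreasing order.
        self.journal.append((timestamp, name, self.current.get(name, _MISSING)))
        self.current[name] = value

    def as_of(self, timestamp: int) -> Dict[str, Any]:
        """Rebuild the metadata as it stood at `timestamp`."""
        state = dict(self.current)
        # Undo every change made after `timestamp`, newest first.
        for ts, name, old in reversed(self.journal):
            if ts <= timestamp:
                break
            if old is _MISSING:
                state.pop(name, None)
            else:
                state[name] = old
        return state
```

In this sketch the current version is read directly, while an older version costs one pass over the newer journal entries, which is in line with the abstract's remark that historical access is slower and that checkpointing can mitigate the cost.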

Evaluation of Efficient Archival Storage Techniques

by Lawrence L. You, Christos Karamanolis, 2004
Cited by 17 (2 self)
The ever-increasing volume of archival data that need to be retained for long periods of time has motivated the design of low-cost, high-efficiency storage systems. Inter-file compression has been proposed as a technique to improve storage efficiency by exploiting the high degree of similarity among archival data. We evaluate the two main inter-file compression techniques, data chunking and delta encoding, and compare them with traditional intra-file compression. We report on experimental results from a range of representative archival data sets.
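As a rough illustration of the data chunking technique named in this abstract, the sketch below stores each distinct chunk once and keeps a per-file list of chunk digests. It uses fixed-size chunks for brevity, which is a simplifying assumption and not necessarily the chunking scheme the paper evaluates; CHUNK_SIZE and the ChunkStore class are hypothetical names.

```python
import hashlib
from typing import Dict, List

CHUNK_SIZE = 8  # hypothetical; real systems use much larger chunks


class ChunkStore:
    """Stores each distinct chunk once, keyed by its SHA-256 digest."""

    def __init__(self) -> None:
        self.chunks: Dict[str, bytes] = {}      # digest -> chunk bytes
        self.files: Dict[str, List[str]] = {}   # file name -> chunk digests

    def put(self, name: str, data: bytes) -> None:
        recipe = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            # A chunk already present (from any file) is not stored again.
            self.chunks.setdefault(digest, chunk)
            recipe.append(digest)
        self.files[name] = recipe

    def get(self, name: str) -> bytes:
        return b"".join(self.chunks[d] for d in self.files[name])

    def stored_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())
```

Two files that share their first sixteen bytes would, under these assumptions, store those two chunks only once. Fixed-size chunking loses that sharing as soon as an insertion shifts the data, which is one reason delta encoding is evaluated as an alternative.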

Why can't I find my files? New methods for automating attribute assignment

by Craig A. N. Soules, Gregory R. Ganger - PROCEEDINGS OF THE NINTH WORKSHOP ON HOT TOPICS IN OPERATING SYSTEMS, 2003
Cited by 16 (2 self)
Attribute-based naming enables powerful search and organization tools for ever-increasing user data sets. However, such tools are only useful in combination with accurate attribute assignment. Existing systems rely on user input and content analysis, but they have enjoyed minimal success. This paper discusses new approaches to automatically assigning attributes to files, including several forms of context analysis, which has been highly successful in the Google web search engine. With extensions like application hints (e.g., web links for downloaded files) and inter-file relationships, it should be possible to infer useful attributes for many files, making attribute-based search tools more effective.

Improved File Synchronization Techniques for Maintaining Large Replicated Collections over Slow Networks

by Torsten Suel, Patrick Noel, Dimitre Trendafilov - IN PROC. OF THE INT. CONF. ON DATA ENGINEERING, 2004
Cited by 14 (5 self)
We study the problem of maintaining large replicated collections of files or documents in a distributed environment with limited bandwidth. This problem arises in a number of important applications, such as synchronization of data between accounts or devices, content distribution and web caching networks, web site mirroring, storage networks, and large scale web search and mining. At the core of the problem lies the following challenge, called the file synchronization problem: given two versions of a file on different machines, say an outdated and a current one, how can we update the outdated version with minimum communication cost, by exploiting the significant similarity between the versions? While a popular open source tool for this problem called rsync is used in hundreds of thousands of installations, there have been only very few attempts to improve upon this tool in practice. In this paper,

Cluster-Based Delta Compression of a Collection of Files

by Zan Ouyang, Nasir Memon, Torsten Suel, Dimitre Trendafilov - In Third Int. Conf. on Web Information Systems Engineering, 2002
Cited by 13 (5 self)
Delta compression techniques are commonly used to succinctly represent an updated version of a file with respect to an earlier one. In this paper, we study the use of delta compression in a somewhat different scenario, where we wish to compress a large collection of (more or less) related files by performing a sequence of pairwise delta compressions. The problem of finding an optimal delta encoding for a collection of files by taking pairwise deltas can be reduced to the problem of computing a branching of maximum weight in a weighted directed graph, but this solution is inefficient and thus does not scale to larger file collections. This motivates us to propose a framework for cluster-based delta compression that uses text clustering techniques to prune the graph of possible pairwise delta encodings. To demonstrate the efficacy of our approach, we present experimental results on collections of web pages. Our experiments show that cluster-based delta compression of collections provides significant improvements in compression ratio as compared to individually compressing each file or using tar+gzip, at a moderate cost in efficiency.

Algorithms for Delta Compression and Remote File Synchronization

by Torsten Suel, Nasir Memon - In Khalid Sayood, editor, Lossless Compression Handbook, 2002
Cited by 13 (8 self)
Delta compression and remote file synchronization techniques are concerned with efficient file transfer over a slow communication link in the case where the receiving party already has a similar file (or files). This problem arises naturally, e.g., when distributing updated versions of software over a network or synchronizing personal files between different accounts and devices. More generally, the problem is becoming increasingly common in many network-based applications where files and content are widely replicated, frequently modified, and cut and reassembled in different contexts and packagings.
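To make the delta compression setting concrete, here is a minimal sketch in which the sender encodes a new version as copy and add commands against a reference file the receiver already holds. It is a simple greedy matcher written for illustration, not an algorithm from the handbook chapter; SEED, encode_delta, and apply_delta are hypothetical names.

```python
from typing import Dict, List, Tuple

SEED = 4  # hypothetical minimum match length, in bytes

# A command is ("copy", offset_in_reference, length) or ("add", literal_bytes).
Command = Tuple


def encode_delta(reference: bytes, version: bytes) -> List[Command]:
    """Describe `version` as copies from `reference` plus literal bytes."""
    # Index the first occurrence of every SEED-byte substring of the reference.
    index: Dict[bytes, int] = {}
    for i in range(len(reference) - SEED + 1):
        index.setdefault(reference[i:i + SEED], i)

    commands: List[Command] = []
    literal = bytearray()
    pos = 0
    while pos < len(version):
        start = index.get(version[pos:pos + SEED])
        if start is None:
            # No match here: this byte must be sent literally.
            literal.append(version[pos])
            pos += 1
            continue
        # Extend the match for as long as both inputs agree.
        length = SEED
        while (start + length < len(reference)
               and pos + length < len(version)
               and reference[start + length] == version[pos + length]):
            length += 1
        if literal:
            commands.append(("add", bytes(literal)))
            literal.clear()
        commands.append(("copy", start, length))
        pos += length
    if literal:
        commands.append(("add", bytes(literal)))
    return commands


def apply_delta(reference: bytes, commands: List[Command]) -> bytes:
    """Rebuild the new version from the reference and the delta."""
    out = bytearray()
    for cmd in commands:
        if cmd[0] == "copy":
            _, offset, length = cmd
            out += reference[offset:offset + length]
        else:
            out += cmd[1]
    return bytes(out)
```

Only the command list has to cross the slow link. When the two versions are similar, most of it consists of short copy commands and few literal bytes.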