Compactly Encoding Unstructured Inputs with Differential Compression
Journal of the ACM, 2002
Cited by 35 (8 self)
The subject of this article is differential compression, the algorithmic task of finding common strings between versions of data and using them to encode one version compactly by describing it as a set of changes from its companion. A main goal of this work is to present new differencing algorithms that (i) operate at a fine granularity (the atomic unit of change), (ii) make no assumptions about the format or alignment of input data, and (iii) in practice use linear time, use constant space, and give good compression. We present new algorithms, which do not always compress optimally but use considerably less time or space than existing algorithms. One new algorithm runs in O(n) time and O(1) space in the worst case (where each unit of space contains ⌈log n⌉ bits), as compared to ...
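Encoding one version "as a set of changes from its companion" can be illustrated with a minimal sketch, assuming a delta built from just two hypothetical command types: `Copy` (reuse a run of bytes from the reference) and `Add` (insert literal bytes). The names `Copy`, `Add`, and `apply_delta` are illustrative only; the article's actual encoding format is not described in this abstract.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Copy:
    """Hypothetical command: reuse `length` bytes starting at `offset` in the reference."""
    offset: int
    length: int


@dataclass(frozen=True)
class Add:
    """Hypothetical command: insert literal bytes that have no match in the reference."""
    data: bytes


Command = Union[Copy, Add]


def apply_delta(reference: bytes, delta: List[Command]) -> bytes:
    """Rebuild the version by executing the delta commands in order."""
    out = bytearray()
    for cmd in delta:
        if isinstance(cmd, Copy):
            if cmd.offset < 0 or cmd.offset + cmd.length > len(reference):
                raise ValueError(f"copy out of range: {cmd}")
            out += reference[cmd.offset:cmd.offset + cmd.length]
        else:
            out += cmd.data
    return bytes(out)


reference = b"the quick brown fox jumps over the lazy dog"
delta = [Copy(0, 10), Add(b"red"), Copy(15, 28)]
print(apply_delta(reference, delta))  # b'the quick red fox jumps over the lazy dog'
```

Only three bytes travel as literals; the other 38 are described by two (offset, length) pairs.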
A Hypermedia Version Control Framework
1998
Cited by 23 (2 self)
The areas of application of hypermedia technology, combined with the capabilities that hypermedia provides for manipulating structure, create an environment in which version control is very important. A hypermedia version control framework has been designed to specifically address the version control problem in open hypermedia environments. One of the primary distinctions of the framework is the partitioning of hypermedia version control functionality into intrinsic and application-specific categories. The version control framework has been used as a model for the design of version control services for a hyperbase management system that provides complete version support for both data and structural entities. In addition to serving as a version control model for open hypermedia environments, the framework offers a clarifying and unifying context in which to examine the issues of version control in hypermedia.
GRAS, a Graph-Oriented (Software) Engineering Database System
Information Systems, 1995
Cited by 16 (4 self)
Modern software systems for application areas like software engineering, CAD, or office automation are usually highly interactive and deal with rather complex object structures. Realizing these systems requires a nonstandard database system that can efficiently handle different types of coarse- and fine-grained objects (like documents and paragraphs), hierarchical and non-hierarchical relations between objects (like composition links and cross-references), and attributes of widely varying size (like chapter numbers and bitmaps). Furthermore, this database system should support incremental computation of derived data, undo/redo of data modifications, error recovery from system crashes, and version control mechanisms. In this paper, we describe the underlying data model and the functionality of GRAS, a database system designed according to the requirements mentioned above. Furthermore, we motivate our central design decisions concerning its ...
Differential Compression: A Generalized Solution For Binary Files
Randal C. Burns, 1996
Cited by 14 (0 self)
This work presents the development and analysis of a family of algorithms for generating differentially compressed output from binary sources. The algorithms all perform the same fundamental task: given two versions of the same data as input streams, generate and output a compact encoding of one of the input streams by representing it as a set of changes with respect to the other input stream. Differential compression provides a computationally efficient compression technique for applications that generate versioned data, and differencing is often expected to produce a significantly more compact file than more traditional compression techniques. The greedy algorithm for file differencing is presented and proven to produce optimally compressed differential output. However, this algorithm requires execution time quadratic in the size of the input files. We next present an algorithm...
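The greedy strategy mentioned above can be sketched as follows, assuming a copy/add style of delta: at each version offset, take the longest reference match if it is at least `min_copy` bytes long, otherwise emit one literal byte. The function names and the `min_copy` threshold are assumptions for illustration, and the match search is brute force (slower than the quadratic bound the abstract cites), so this shows the greedy choice itself, not the thesis's implementation or its optimality proof.

```python
from typing import List, Tuple

# A delta command is ("copy", ref_offset, length) or ("add", literal_bytes).
Command = tuple


def longest_match(ref: bytes, ver: bytes, v: int) -> Tuple[int, int]:
    """Brute-force search for the longest prefix of ver[v:] anywhere in ref."""
    best_off, best_len = 0, 0
    for r in range(len(ref)):
        n = 0
        while r + n < len(ref) and v + n < len(ver) and ref[r + n] == ver[v + n]:
            n += 1
        if n > best_len:                 # strict: keep the earliest offset on ties
            best_off, best_len = r, n
    return best_off, best_len


def greedy_diff(ref: bytes, ver: bytes, min_copy: int = 4) -> List[Command]:
    """At each version offset, greedily take the longest match (if long enough)."""
    delta: List[Command] = []
    pending = bytearray()                # literal bytes waiting to be emitted
    v = 0
    while v < len(ver):
        off, length = longest_match(ref, ver, v)
        if length >= min_copy:
            if pending:
                delta.append(("add", bytes(pending)))
                pending.clear()
            delta.append(("copy", off, length))
            v += length
        else:
            pending.append(ver[v])
            v += 1
    if pending:
        delta.append(("add", bytes(pending)))
    return delta


ref = b"the quick brown fox jumps over the lazy dog"
ver = b"the quick red fox jumps over the lazy dog"
print(greedy_diff(ref, ver))
# [('copy', 0, 10), ('add', b'red'), ('copy', 15, 28)]
```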
A Linear Time, Constant Space Differencing Algorithm
In Performance, Computing, and Communication Conference (IPCCC), 1997
Cited by 13 (4 self)
An efficient differencing algorithm can be used to compress versions of files for both transmission over low-bandwidth channels and compact storage. This can greatly reduce network traffic and execution time for distributed applications, including software distribution, source code control, file system replication, and data backup and restore. An algorithm for such applications needs to be both general and efficient: able to compress binary inputs in linear time. We present such an algorithm for differencing files at the granularity of a byte. The algorithm uses constant memory and handles arbitrarily large input files. While the algorithm makes minor sacrifices in compression to attain linear runtime performance, it outperforms the byte-wise differencing algorithms that we have encountered in the literature on all inputs.
I. INTRODUCTION
Differencing algorithms compress data by taking advantage of statistical correlations between different versions of the same data sets. Strictly ...
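The constant-space idea can be sketched with a fixed-size hash table of seed footprints. This is a simplified two-phase variant (index the reference, then scan the version), not a reproduction of the paper's algorithm; the seed length, table size, CRC-32 footprint, first-offset-wins slot policy, and `max_add` cap are all assumptions chosen for illustration. Because slots can collide and only one offset is kept per slot, some matches are missed, which mirrors the "minor sacrifices in compression" the abstract mentions.

```python
import zlib
from typing import List, Optional


def footprint(seed: bytes, table_size: int) -> int:
    # CRC-32 is a stand-in footprint function (an assumption, not the paper's choice).
    return zlib.crc32(seed) % table_size


def fixed_table_diff(ref: bytes, ver: bytes, seed_len: int = 8,
                     table_size: int = 1 << 16, max_add: int = 256) -> List[tuple]:
    """Delta of ver against ref as ("copy", off, len) / ("add", bytes) tuples."""
    # The table has a fixed number of slots, so working memory does not grow
    # with the input size (the emitted delta itself is output, not counted).
    table: List[Optional[int]] = [None] * table_size
    for r in range(len(ref) - seed_len + 1):
        slot = footprint(ref[r:r + seed_len], table_size)
        if table[slot] is None:          # keep the first offset seen per slot
            table[slot] = r

    delta: List[tuple] = []
    pending = bytearray()                # current literal run, bounded by max_add

    def flush() -> None:
        if pending:
            delta.append(("add", bytes(pending)))
            pending.clear()

    v = 0
    while v < len(ver):
        r = None
        if v + seed_len <= len(ver):
            r = table[footprint(ver[v:v + seed_len], table_size)]
        # Different seeds can share a slot, so verify the bytes before copying.
        if r is not None and ref[r:r + seed_len] == ver[v:v + seed_len]:
            n = seed_len
            while r + n < len(ref) and v + n < len(ver) and ref[r + n] == ver[v + n]:
                n += 1
            flush()
            delta.append(("copy", r, n))
            v += n
        else:
            pending.append(ver[v])
            v += 1
            if len(pending) >= max_add:
                flush()
    flush()
    return delta


ref = b"".join(b"line %d\n" % i for i in range(200))
ver = ref.replace(b"line 100\n", b"changed!\n")
delta = fixed_table_diff(ref, ver)
print(delta[0])  # ('copy', 0, 790)
```

Each version byte is either consumed by a verified copy or emitted as a literal, so the scan does work proportional to the version length times the (constant) seed length.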
In-Place Reconstruction of Delta Compressed Files
In Proceedings of the Seventeenth ACM Symposium on Principles of Distributed Computing, 1998
Cited by 7 (2 self)
We present an algorithm for modifying delta compressed files so that the compressed versions may be reconstructed without requiring additional memory or storage space. This allows network clients with limited resources to efficiently update software by downloading delta compressed versions over a network. Delta compression for binary files, which compactly encodes a version of data using only the bytes changed from a previous version, may be used to efficiently distribute software over low-bandwidth channels such as the Internet. Traditional methods of rebuilding these delta files require memory or storage space on the target machine for both the old and new versions of the file being reconstructed. With the advent of network computing and Internet set-top boxes, many of these network-attached target machines have limited additional scratch space in memory or storage. We provide an algorithm for modifying a delta compressed version file so that it may rebuild the new file version in the spa...
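In-place reconstruction can be sketched as a scheduling problem, assuming each command records its write offset in the new version: a copy must run before any other command overwrites the bytes it reads. The sketch below builds that read/write conflict graph, orders copies topologically, and breaks cycles by turning a copy into a literal add, which is one way to trade a little compression for in-place reconstructibility. The command layout, the function names, and the shortest-copy-first cycle-breaking heuristic are assumptions, not the paper's published procedure.

```python
from collections import deque
from typing import List


def overlaps(a: int, a_len: int, b: int, b_len: int) -> bool:
    """True if the half-open byte ranges [a, a+a_len) and [b, b+b_len) intersect."""
    return a < b + b_len and b < a + a_len


def make_in_place(ref: bytes, delta: List[tuple]) -> List[tuple]:
    """Reorder ("copy", src, dst, len) / ("add", dst, data) commands for in-place use."""
    copies = [c for c in delta if c[0] == "copy"]
    adds = [c for c in delta if c[0] == "add"]
    n = len(copies)
    # Edge i -> j: copy i reads bytes that copy j overwrites, so i must run first.
    succ: List[List[int]] = [[] for _ in range(n)]
    indeg = [0] * n
    for i, (_, src_i, _, len_i) in enumerate(copies):
        for j, (_, _, dst_j, len_j) in enumerate(copies):
            if i != j and overlaps(src_i, len_i, dst_j, len_j):
                succ[i].append(j)
                indeg[j] += 1

    alive = set(range(n))
    ready = deque(i for i in range(n) if indeg[i] == 0)
    ordered: List[tuple] = []

    def retire(i: int) -> None:
        # Remove node i from the graph and release successors it was blocking.
        alive.discard(i)
        for j in succ[i]:
            indeg[j] -= 1
            if indeg[j] == 0 and j in alive:
                ready.append(j)

    while alive:
        if ready:
            i = ready.popleft()
            ordered.append(copies[i])
            retire(i)
        else:
            # No remaining copy is free to run, so the graph has a cycle: convert
            # the shortest remaining copy into a literal add (costs compression).
            victim = min(alive, key=lambda k: copies[k][3])
            _, src, dst, length = copies[victim]
            adds.append(("add", dst, ref[src:src + length]))
            retire(victim)
    # Adds read nothing from the buffer, so running them last is always safe.
    return ordered + adds


def apply_in_place(buf: bytearray, commands: List[tuple], new_len: int) -> None:
    """Rebuild the new version inside buf, which initially holds the reference."""
    if len(buf) < new_len:
        buf.extend(bytes(new_len - len(buf)))   # room for a longer version
    for cmd in commands:
        if cmd[0] == "copy":
            _, src, dst, length = cmd
            # The right-hand slice is copied first, so overlap behaves like memmove.
            buf[dst:dst + length] = buf[src:src + length]
        else:
            _, dst, data = cmd
            buf[dst:dst + len(data)] = data
    del buf[new_len:]                           # trim if the version is shorter


ref = b"ABCDEFGH"
delta = [("copy", 4, 0, 4), ("copy", 0, 4, 4)]  # swap the two halves: a cycle
buf = bytearray(ref)
apply_in_place(buf, make_in_place(ref, delta), new_len=8)
print(bytes(buf))  # b'EFGHABCD'
```

The swap example has no valid copy order, so one copy is sacrificed and shipped as four literal bytes.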
Impact of the Research Community On the Field of Software Configuration Management
2002
Cited by 6 (2 self)
Software Configuration Management (SCM) is an important discipline in professional software development and maintenance. The importance of SCM has increased as programs have become larger, more complex, and more mission- or life-critical. This paper discusses the evolution of SCM technology from the early days of software development to the present and the impact that university and industrial research have had along the way. It also includes a survey of the industrial state of the practice and of research directions.
An Approximation to the Greedy Algorithm for Differential Compression of Very Large Files
Technical Report, IBM Almaden, 2003
Cited by 4 (0 self)
We present a new differential compression algorithm that combines the hash value techniques and suffix array techniques of previous work. Differential compression refers to encoding a file (a version file) as a set of changes with respect to another file (a reference file). Previous differential compression algorithms can be shown empirically to run in linear time, but they have certain drawbacks; namely, they do not find the best matches for every offset of the version file. Our algorithm finds the best matches for every offset of the version file, with respect to a certain granularity (or block size) and above a certain length threshold. It has two variations depending on how we choose the block size. If we keep the block size fixed, we show that the compression performance of our algorithm is similar to that of the greedy algorithm, without the expensive space and time requirements. If we vary the block size linearly with the reference file size, we show that our algorithm can run in linear time and constant space to compress very large files. Our algorithm combines the techniques of hashing sections of the files to obtain footprints and the use of suffix arrays to find the longest match. We also show empirically that our algorithm performs better than xdelta [7], vcdiff [3], and the work of Ajtai et al. [1] in most cases in terms of compression, and performs better than vcdiff and the work of Ajtai et al. in terms of speed.
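The suffix-array half of this combination can be sketched as a longest-match lookup, assuming a naive suffix array over the whole reference (quadratic memory and time to build, so suitable only for small inputs, not the very large files the report targets). The block hashing, block-size choice, and length threshold described in the abstract are omitted, and `build_suffix_array`, `lcp`, and `longest_match` are hypothetical names. The lookup relies on a standard property of sorted suffixes: the suffix sharing the longest prefix with a query sits next to the query's insertion point.

```python
from typing import List, Tuple


def build_suffix_array(ref: bytes) -> List[int]:
    # Naive construction: sort suffix start offsets by the suffix bytes.
    return sorted(range(len(ref)), key=lambda i: ref[i:])


def lcp(a: bytes, i: int, b: bytes, j: int) -> int:
    """Length of the common prefix of a[i:] and b[j:]."""
    n = 0
    while i + n < len(a) and j + n < len(b) and a[i + n] == b[j + n]:
        n += 1
    return n


def longest_match(ref: bytes, sa: List[int], ver: bytes, v: int) -> Tuple[int, int]:
    """Return (ref_offset, length) of the longest prefix of ver[v:] found in ref."""
    query = ver[v:]
    lo, hi = 0, len(sa)
    while lo < hi:                       # binary search for the insertion point
        mid = (lo + hi) // 2
        if ref[sa[mid]:] < query:
            lo = mid + 1
        else:
            hi = mid
    best = (0, 0)
    for k in (lo - 1, lo):               # the best match is adjacent to that point
        if 0 <= k < len(sa):
            n = lcp(ref, sa[k], ver, v)
            if n > best[1]:
                best = (sa[k], n)
    return best


ref = b"the quick brown fox jumps over the lazy dog"
sa = build_suffix_array(ref)
print(longest_match(ref, sa, b"a lazy fox", 1))  # (34, 6)
```

Per the abstract, the full algorithm also hashes sections of the files into footprints; that filtering step is not shown here.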
Decoding Code on a Sensor Node
Cited by 3 (0 self)
Wireless sensor networks are coming of age and starting to move out of the laboratory into the field. As the number of deployments increases, the need for an efficient and reliable code update mechanism becomes pressing. Reasons for updates are manifold, ranging from fixing software bugs to retasking the whole sensor network. The scale of deployments and the potential physical inaccessibility of individual nodes call for a wireless software management scheme. In this paper we present an efficient code update strategy that uses knowledge of former program versions to distribute only incremental changes. Using a small set of instructions, a delta of minimal size is generated. This delta is then disseminated throughout the network, allowing nodes to rebuild the new application based on their currently running code. The asymmetry of computational power available during encoding (PC) and decoding (sensor node) necessitates careful balancing of decoder complexity to respect the limitations of today’s sensor network hardware. We provide a seamless integration of our work into Deluge, the standard TinyOS code dissemination protocol. The efficiency of our approach is evaluated by means of testbed experiments showing a significant reduction in message complexity and thus faster updates.
In-Place Reconstruction of Version Differences
2002
Cited by 2 (0 self)
In-place reconstruction of differenced data allows information on devices with limited storage capacity to be updated efficiently over low-bandwidth channels. Differencing encodes a version of data compactly as a set of changes from a previous version. Transmitting updates to data as a version difference saves both time and bandwidth. In-place reconstruction rebuilds the new version of the data in the storage or memory the current version occupies; no scratch space is needed for a second version. By combining these technologies, we support highly mobile applications on space-constrained hardware. We present an algorithm that modifies a differentially encoded version to be in-place reconstructible. The algorithm trades a small amount of compression to achieve this property. Our treatment includes experimental results that show our implementation to be efficient in space and time and verify that compression losses are small. We also give results on the computational complexity of performing this modification while minimizing lost compression.

