## Automatic Synthesis of Compression Techniques for Heterogeneous Files (1995)

Citations: 11 (5 self)

### BibTeX

```bibtex
@MISC{Hsu95automaticsynthesis,
  author = {William H. Hsu and Amy E. Zwarico},
  title  = {Automatic Synthesis of Compression Techniques for Heterogeneous Files},
  year   = {1995}
}
```


### Abstract

This paper uses a straightforward program synthesis technique: a compression plan, consisting of instructions for each block of input data, is generated, guided by the statistical properties of the input data. Because it uses algorithms specifically suited to the types of redundancy exhibited by the particular input file, the system achieves consistent average performance throughout the file, as shown by experimental evidence.

### Citations

6031 | A mathematical theory of communication - Shannon |
Citation Context: ...ndent. Each of the redundancy types is exploited by different compression algorithms. Frequency of characters is exploited by alphabetic encoding algorithms such as the Huffman [Huf52] and Shannon-Fano [SW63] algorithms for compressing text files. In these algorithms, more frequently occurring characters are replaced by shorter units than the less frequently occurring characters. Contiguous strings, long st... |
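The frequency-based coding idea in this context — frequent characters get shorter codes — can be sketched as follows. This is a minimal illustration of Huffman's construction, not the paper's implementation; the function name is invented:

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Assign shorter bit codes to more frequent characters."""
    freq = Counter(text)
    if len(freq) == 1:                      # degenerate one-symbol input
        return {sym: "0" for sym in freq}
    # Heap entries: (frequency, tiebreak id, {symbol: code-so-far}).
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)     # two least frequent subtrees
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, next_id, merged))
        next_id += 1
    return heap[0][2]
```

For an input like `"aaaabbc"`, the most frequent symbol `a` receives a one-bit code while `b` and `c` receive two-bit codes, and the resulting code set is prefix-free.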

1211 | Automatic text processing: The transformation, analysis, and retrieval of information by computer - Salton - 1989 |

1138 | A Universal Algorithm for Sequential Data Compression - Ziv, Lempel - 1977 |
Citation Context: ...onally, no selection of compression methods is attempted within a file. Compact Pro uses the same methodology as StuffIt and compress, but incorporates an improved Lempel-Ziv derived directly from LZ77. [ZL77] The public-domain version of StuffIt is derived from UNIX compress, as is evident from the similarity of their performance results. Compression systems such as StuffIt perform simple selection among alt... |

941 | A method for the construction of minimum-redundancy codes - Huffman - 1952 |

730 | Compression of individual sequences via variable-rate coding - Ziv, Lempel - 1978 |

664 | Arithmetic coding for data compression - Witten, Neal, et al. - 1987 |
Citation Context: ...d are indicated in the table by `-'. The selection procedure is given in the section describing heuristics. 3.2.1 The Algorithms There are four basic algorithms used by the system: arithmetic coding, [WNC87] Lempel-Ziv, [Bro91] run-length encoding (RLE), [Sed88] and JPEG for image/graphics compression. [Ind94] Arithmetic coding algorithms compress data by representing that data by an interval of real nu... |
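The interval idea mentioned in this context — a whole message represented by a shrinking subinterval of [0, 1) — can be shown with a short sketch. This is only the interval-narrowing half of arithmetic coding (no bit-level output), with invented names:

```python
def arithmetic_encode(message, probs):
    """Narrow [0, 1) once per symbol; any number inside the final
    interval identifies the entire message."""
    cum, ranges = 0.0, {}
    for sym, p in probs.items():            # cumulative range per symbol
        ranges[sym] = (cum, cum + p)
        cum += p
    low, high = 0.0, 1.0
    for sym in message:
        lo, hi = ranges[sym]
        width = high - low
        low, high = low + width * lo, low + width * hi
    return low, high
```

The final interval's width equals the product of the symbol probabilities, so probable messages occupy wide intervals that need few bits to pin down.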

617 | Text Compression - Bell, Cleary, et al. - 1990 |
Citation Context: ...the string, called a run, plus a count of the number of identical strings following. Both alphabetic distribution and average run length are sometimes characterized as statistical redundancy metrics. [BCW90] Recurrent strings, which occur repeatedly in the input stream with any number of interleaved symbols, are exploited by textual substitution algorithms such as Lempel-Ziv. [Wel84, ZL77, ZL78] In these... |
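The run-plus-count scheme described in this context is run-length encoding; a minimal sketch (function names invented):

```python
def rle_encode(data):
    """Replace each run with a (symbol, count) pair."""
    pairs = []
    for sym in data:
        if pairs and pairs[-1][0] == sym:
            pairs[-1][1] += 1               # extend the current run
        else:
            pairs.append([sym, 1])          # start a new run
    return [(s, n) for s, n in pairs]

def rle_decode(pairs):
    """Invert the encoding by repeating each symbol count times."""
    return "".join(sym * count for sym, count in pairs)
```

RLE pays off exactly when average run length is high, which is why the paper treats run length as a measurable redundancy metric.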

414 | A technique for high performance data compression - Welch - 1984 |
Citation Context: ...orms its compression in a single pass, with only one method selected per file. Thus, the possibility of heterogeneous files is ignored. UNIX compress uses an adaptive version of the Lempel-Ziv algorithm. [Wel84] It operates by substituting a fixed-length code for common substrings. compress, like other adaptive textual substitution algorithms, periodically tests its own performance and reinitializes its string... |
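The substitution of fixed-length codes for common substrings described here is the LZW variant [Wel84] of Lempel-Ziv. A minimal sketch of the encoder's dictionary growth (no code-width management or dictionary reset, which real compress adds):

```python
def lzw_encode(text):
    """Emit the code of the longest dictionary match, then add
    match + next character as a new dictionary entry."""
    dictionary = {chr(i): i for i in range(256)}   # seed with single bytes
    w, codes = "", []
    for ch in text:
        if w + ch in dictionary:
            w += ch                                # keep extending the match
        else:
            codes.append(dictionary[w])
            dictionary[w + ch] = len(dictionary)   # learn a longer string
            w = ch
    if w:
        codes.append(dictionary[w])
    return codes
```

For `"abababab"` this emits five codes for eight characters, since the learned entries for `"ab"` and `"aba"` are reused as the repetition continues.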

173 | A First Course in Probability - Ross - 1994 |
Citation Context: ...also considered, but these proved to be too specific for the application (since they are both specific parametric cases of the gamma distribution). The gamma distribution is defined as follows (cf. Ross [Ros88]):

G(x) = \int_0^x f(x)\,dx, \qquad f(x) = \frac{\lambda e^{-\lambda x} (\lambda x)^{t-1}}{\Gamma(t)}, \qquad \Gamma(t) = \int_0^\infty e^{-y} y^{t-1}\,dy

where f is the density function, \Gamma is the gamma function, x is the unnormalized measure, t ... |
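The gamma density in this context (rate λ, shape t, in the conventional parameterization — the extraction dropped the Greek symbols, so that reading is an assumption) should integrate to one; a quick standard-library check:

```python
import math

def gamma_density(x, lam, t):
    """f(x) = lam * e^(-lam*x) * (lam*x)^(t-1) / Gamma(t),
    assuming the conventional rate/shape parameterization."""
    return lam * math.exp(-lam * x) * (lam * x) ** (t - 1) / math.gamma(t)

# Midpoint Riemann sum over [0, 20]: the density integrates to ~1
# for rate 1.5 and shape 2 (the tail beyond 20 is negligible).
dx = 0.001
total = sum(gamma_density((i + 0.5) * dx, 1.5, 2.0) * dx for i in range(20000))
```

Setting t = 1 recovers the exponential density, consistent with the context's remark that the candidate distributions were special parametric cases of the gamma.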

157 | Data Compression: Methods and Theory - Storer - 1988 |
Citation Context: ...e, often inaccurately, that files are homogeneous throughout. Consequently, each exploits only a subset of the redundancy found in the file. Unfortunately, no algorithm is effective in compressing all files. [Sto88] For example, Huffman encoding works best on data files with a high variance in the frequency of individual characters (including some graphics and audio data), achieves mediocre performance on natura... |

148 | A locally adaptive data compression scheme - Bentley, Sleator, et al. - 1986 |
Citation Context: ...8 vary the length of strings used in compression. [Wel84, ZL78] Yet another adaptive (alphabetic distribution-based) compression scheme, the Move-To-Front (MTF) method, was developed by Bentley et al. [BSTW86] In MTF, the `word code' for a symbol is determined by the position of the word in a sequential list. The word list is ordered so that frequently accessed words are near the front, thus shortening the... |
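The position-in-a-list coding described in this context can be sketched directly. A minimal character-level MTF encoder (the cited scheme operates on words; names here are invented):

```python
def mtf_encode(text, alphabet):
    """A symbol's code is its current position in the list; after use
    it moves to the front, so repeated symbols encode as small integers."""
    table = list(alphabet)
    codes = []
    for ch in text:
        idx = table.index(ch)
        codes.append(idx)
        table.insert(0, table.pop(idx))     # move to front
    return codes
```

Locally frequent symbols sit near the front of the list and so produce runs of small codes, which a subsequent entropy coder can compress well.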

64 | Data compression with finite windows - Fiala, Greene - 1989 |
Citation Context: ...ut through a Huffman coder. An example of a truly composite technique is the compression achieved by using Shannon-Fano tries in conjunction with the Fiala-Greene algorithm (a variant of Lempel-Ziv) [FG89] in the PKZIP [Kat90] commercial package. Tries are used to optimally encode strings by character frequency. [HS87] PKZIP was selected as the representative test program from this group in our experim... |

49 | A method for the construction of minimum-redundancy codes - Huffman - 1952 |
Citation Context: ... of redundancy are independent. Each of the redundancy types is exploited by different compression algorithms. Frequency of characters is exploited by alphabetic encoding algorithms such as the Huffman [Huf52] and Shannon-Fano [SW63] algorithms for compressing text files. In these algorithms, more frequently occurring characters are replaced by shorter units than the less frequently occurring characters. Con... |

13 | Dynamic Huffman coding - Vitter - 1989 |

9 | Algorithms, 2nd edn - Sedgewick - 1988 |

7 | Data Compression : techniques and applications, hardware and software - Held - 1991 |

7 | Image and Text Compression - Storer - 1992 |

4 | Dynamic Huffman Coding - Vitter - 1989 |
Citation Context: ...ies within a file by modifying parameters used by the algorithm, such as the dictionary, during execution. For example, adaptive alphabetic distribution-based algorithms such as dynamic Huffman encoding [Vit89] maintain a tree structure to minimize the encoded length of the most frequently occurring characters. This property can be made to change continuously as a file is processed. An adaptive textual substi... |

2 | PKZIP: Commercial compression system, version 1.1 - Katz - 1990 |
Citation Context: ...ly tailored program for each file gives improved performance over using one program for all files. This system produces better compression results than four commonly available compression packages, PKZIP [Kat90], Unix compress [Mic87], StuffIt [Lau90], and Compact Pro [Lau90, Goo91] for arbitrary heterogeneous files. The major contributions of this work are twofold. The first is an improved compression system f... |

1 | Freeze: implementation of LZHuf algorithm. comp.sources.misc archives, Internet - Broukhis - 1991 |
Citation Context: ...he table by `-'. The selection procedure is given in the section describing heuristics. 3.2.1 The Algorithms There are four basic algorithms used by the system: arithmetic coding, [WNC87] Lempel-Ziv, [Bro91] run-length encoding (RLE), [Sed88] and JPEG for image/graphics compression. [Ind94] Arithmetic coding algorithms compress data by representing that data by an interval of real numbers between zero a... |

1 | file (program). Berkeley Unix operating system - Darwin - 1987 |
Citation Context: ... a computer system. Finally, object data has slightly shorter runs but is similarly redundant. To determine the block type we use a procedure new-file which is our extension of the Unix file command. [Dar87] file works by examining the first 512 bytes of a file and comparing the pattern of data contained in it to a collection of known data patterns from Unix and other operating systems. new-file works in a s... |
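The first-512-bytes strategy described for file (and the paper's new-file extension) can be sketched as follows. This is a toy illustration: the pattern table and function name are invented, and the real file(1) reads its patterns from a magic database rather than hard-coding them:

```python
# Hypothetical miniature magic-number table (illustrative, not file(1)'s).
MAGIC_PATTERNS = {
    b"\x1f\x8b": "gzip compressed data",
    b"\x89PNG": "PNG image",
    b"%PDF": "PDF document",
}

def identify(path):
    """Examine the first 512 bytes and match them against known patterns."""
    with open(path, "rb") as f:
        head = f.read(512)
    for magic, description in MAGIC_PATTERNS.items():
        if head.startswith(magic):
            return description
    return "data"                           # catch-all when nothing matches
```

Classifying each block this way is what lets the system pick a per-block algorithm instead of one method for the whole file.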

1 | Compact Pro. Commercial compression system - Goodman - 1991 |
Citation Context: ...perior to commercially available systems. The four systems we studied are PKZIP, developed for microcomputers running MS-DOS; [Kat90] UNIX compress; [Mic87] and StuffIt Classic [Lau90] and Compact Pro [Goo91], developed for the Apple Macintosh operating system. Each of these products performs its compression in a single pass, with only one method selected per file. Thus, the possibility of heterogeneous files... |

1 | StuffIt Classic and StuffIt Deluxe. Commercial compression system - Lau - 1990 |
Citation Context: ...roved performance over using one program for all files. This system produces better compression results than four commonly available compression packages, PKZIP [Kat90], Unix compress [Mic87], StuffIt [Lau90], and Compact Pro [Lau90, Goo91] for arbitrary heterogeneous files. The major contributions of this work are twofold. The first is an improved compression system for heterogeneous files. The second is the d... |

1 | compress. Commercial compression system, operating system version 5.3 - Microsystems - 1987 |
Citation Context: ...ach file gives improved performance over using one program for all files. This system produces better compression results than four commonly available compression packages, PKZIP [Kat90], Unix compress [Mic87], StuffIt [Lau90], and Compact Pro [Lau90, Goo91] for arbitrary heterogeneous files. The major contributions of this work are twofold. The first is an improved compression system for heterogeneous files. The... |

1 | C implementation of dynamic Huffman compressor by J.S. Vitter. comp.sources.misc archives, Internet - Toal - 1990 |

1 | LZHuf: Encoding/Decoding module for LHarc. Compression system, version 0.03 (Beta) - Tagawa, Okumura, et al. - 1989 |

1 | LHA: A high-performance file-compression program. Compression system, version - Yoshizaki - 1991 |


1 | comp.compression benchmark (Calgary test corpus) - Gailly - 1992 |


1 | JPEG image compression system. think.com FTP archives, Internet - Group - 1994 |
