This paper surveys a variety of data compression methods spanning almost forty years of research, from the work of Shannon, Fano, and Huffman in the late 1940s to a technique developed in 1986. The aim of data compression is to reduce redundancy in stored or communicated data, thus increasing effective data density. Data compression has important applications in file storage and distributed systems. Concepts from information theory, as they relate to the goals and evaluation of data compression methods, are discussed briefly. A framework for evaluating and comparing methods is constructed and applied to the algorithms presented. Both theoretical and empirical comparisons are reported, and possibilities for future research are suggested.

INTRODUCTION

Data compression is often referred to as coding, where coding is a very general term encompassing any special representation of data that satisfies a given need. Information theory is defined to be the study of eff...

Citations | Title | Authors | Year
3170 | The mathematical theory of communication | Shannon | 1962
846 | Information Theory and Reliable Communication | Gallager | 1968
799 | A universal algorithm for sequential data compression | Ziv, Lempel | 1977
613 | A method for the construction of minimum-redundancy codes | Huffman | 1952
517 | Arithmetic coding for data compression | Witten, Neal, et al. | 1987
332 | A technique for high performance data compression | Welch | 1984
324 | The quadtree and related hierarchical data structures | Samet | 1984
229 | Universal codeword sets and representation of the integers | Elias | 1975
112 | A universal data compression system | Rissanen | 1983
111 | A locally adaptive data compression scheme | Bentley, Sleator, et al. | 1986
108 | Information Theory and Coding | Abramson | 1963
83 | Variations on a theme by Huffman | Gallager | 1978
81 | Dynamic Huffman coding | Knuth | 1985
79 | Data compression via textual substitution | Storer, Szymanski | 1982
78 | Optimum binary search trees | Knuth | 1971
77 | The Transmission of Information | Fano | 1961
72 | Generalized Kraft Inequality and Arithmetic Coding | Rissanen | 1976
65 | The design and analysis of dynamic Huffman codes | Vitter | 1987
59 | Fast algorithms for manipulating formal power series | Brent, Kung | 1978
49 | Linear algorithm for data compression via string matching | Rodeh, Pratt, et al. | 1981
47 | Data Structure Techniques | Standish | 1980
41 | Source Coding Algorithm for Fast Data Compression | Pasco | 1976
34 | An Adaptive System for Data Compression | Faller | 1973
33 | Variable-length binary encodings | Gilbert, Moore | 1959
33 | Optimal Computer Search Trees and Variable-Length Alphabetical Codes | Hu, Tucker | 1971
29 | Interval and recency rank source coding: two on-line adaptive variable-length schemes | Elias | 1987
29 | Optimal alphabetic trees | Itai | 1976
29 | Minimum-redundancy coding for the discrete noiseless channel | Karp | 1961
28 | Generating a canonical prefix encoding | Schwartz, Kallick | 1964
24 | Bounds on the redundancy of Huffman codes | Capocelli, Giancarlo, et al. | 1986
24 | Arithmetic stream coding using fixed precision registers | Rubin | 1979
23 | Data compression on a database system | Cormack | 1985
23 | Optimal binary search trees with restricted maximal depth | Garey | 1974
22 | Robust transmission of unbounded strings using Fibonacci representations | Apostolico, Fraenkel | 1987
18 | Information Theory. Interscience | Ash | 1965
18 | Channels which transmit letters of unequal duration | Krause | 1962
18 | An Optimum Encoding with Minimum Longest Code and Total Number of Digits | Schwartz | 1964
17 | Codes based on inaccurate source probabilities | Gilbert | 1971
16 | Common Phrases and Minimum-Space Text Storage | Wagner | 1973
15 | Path length of binary search trees | Hu, Tan | 1972
15 | Efficient Generation of Optimal Prefix Code: Equiprobable Words Using Unequal Cost Letters | Perl, Garey, et al. | 1975
15 | Optimal Variable Length Codes (Arbitrary Symbol Costs and Equal Code Word Probabilities) | Varn | 1971
14 | for Adaptive Huffman Codes | Cormack, Horspool | 1984
14 | Experiments in text file compression | Rubin | 1976
13 | Conditions for optimality of the Huffman algorithm | Parker | 1980
13 | A practitioner's guide to data base compression | Severance | 1983
12 | Theory of Synchronous Communications | Stiffler | 1971
10 | A Double-adaptive File Compression Algorithm | Langdon, Rissanen | 1983
10 | Data structure of Huffman codes and its application to efficient encoding and decoding | Tanaka | 1987
9 | A Huffman-Shannon-Fano code | Connell | 1973