## The Design and Analysis of Efficient Lossless Data Compression Systems (1993)

Citations: 49 (0 self)

### BibTeX

@TECHREPORT{Howard93thedesign,
  author      = {Paul Glor Howard},
  title       = {The Design and Analysis of Efficient Lossless Data Compression Systems},
  institution = {},
  year        = {1993}
}

### Abstract

Our thesis is that high compression efficiency for text and images can be obtained by using sophisticated statistical compression techniques, and that greatly increased speed can be achieved at only a small cost in compression efficiency. Our emphasis is on elegant design and mathematical as well as empirical analysis. We analyze arithmetic coding as it is commonly implemented and show rigorously that almost no compression is lost in the implementation. We show that high-efficiency lossless compression of both text and grayscale images can be obtained by using appropriate models in conjunction with arithmetic coding. We introduce a four-component paradigm for lossless image compression and present two methods that give state of the art compression efficiency. In the text compression area, we give a small improvement on the preferred method in the literature. We show that we can often obtain significantly improved throughput at the cost of slightly reduced compression. The extra speed c...

### Citations

6050 |
A mathematical theory of communication
- Shannon
- 1948
Citation Context: ...Data can be compressed whenever some patterns of data symbols are more likely to occur than others. Shannon [67] showed that for the best possible compression code (in the sense of minimum average code length), the output length contains a contribution of −log₂ p bits from the encoding of each symbol who...
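Shannon's −log₂ p bound from the excerpt above can be checked with a short sketch; `ideal_code_length` is a hypothetical helper name, not code from the thesis.

```python
import math
from collections import Counter

def ideal_code_length(data: str) -> float:
    """Sum of -log2 p over all symbol occurrences: Shannon's lower bound
    in bits for any prefix-free code over this empirical distribution."""
    counts = Counter(data)
    n = len(data)
    return -sum(c * math.log2(c / n) for c in counts.values())

# A skewed source needs well under 1 bit per symbol; 90% 'a', 10% 'b'
# gives about 0.469 bits/symbol.
skewed = "a" * 90 + "b" * 10
print(ideal_code_length(skewed) / len(skewed))
```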

1138 | A Universal Algorithm for Sequential Data Compression
- Ziv, Lempel
- 1977
Citation Context: ...been done on lossless compression of text, we find that the usual techniques for text compression are not effective for lossless image compression. All good methods for text compression found to date [7,9,45,79,80] involve some form of moderately high-order exact string matching. Images, however, are two-dimensional, so the contexts are more complicated than for one-dimen...

730 | Compression of individual sequences via variable-rate coding
- Ziv, Lempel
- 1978
Citation Context: ...been done on lossless compression of text, we find that the usual techniques for text compression are not effective for lossless image compression. All good methods for text compression found to date [7,9,45,79,80] involve some form of moderately high-order exact string matching. Images, however, are two-dimensional, so the contexts are more complicated than for one-dimen...

398 |
A universal prior for integers and estimation by minimum description length
- Rissanen
- 1983
Citation Context: ...and an efficient way of representing (or learning) the model. (This is related to Rissanen's minimum description length principle; he has investigated it thoroughly from a theoretical point of view [56,57,58].) Most models for text compression involve estimating the probability p of a given symbol by p = (weight of symbol) / (total weight of all symbols), which we can then encode in −log₂ p bits using e...
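The weight-ratio estimate in the excerpt above can be sketched as a minimal order-0 adaptive model; the class and its names are illustrative, not the thesis's implementation, and the initial weight of 1 per symbol is just one common answer to the zero-frequency problem.

```python
import math

class AdaptiveModel:
    """Order-0 adaptive model: p(s) = weight(s) / total weight,
    updated after each symbol (a sketch, not Howard's code)."""

    def __init__(self, alphabet):
        # Start every weight at 1 so no symbol has probability zero.
        self.weights = {s: 1 for s in alphabet}
        self.total = len(self.weights)

    def code_length(self, s):
        # -log2 p bits charged for s under the current estimate.
        return -math.log2(self.weights[s] / self.total)

    def update(self, s):
        self.weights[s] += 1
        self.total += 1

m = AdaptiveModel("ab")
bits = 0.0
for s in "aab":
    bits += m.code_length(s)
    m.update(s)
# "aab" costs 1 + log2(3/2) + 2 ≈ 3.585 bits under this model.
print(bits)
```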

285 |
Universal coding, information, prediction, and estimation
- Rissanen
- 1984
Citation Context: ...and an efficient way of representing (or learning) the model. (This is related to Rissanen's minimum description length principle; he has investigated it thoroughly from a theoretical point of view [56,57,58].) Most models for text compression involve estimating the probability p of a given symbol by p = (weight of symbol) / (total weight of all symbols), which we can then encode in −log₂ p bits using e...

197 | An introduction to arithmetic coding
- Langdon
- 1984
Citation Context: ...always longer than 1/4. Mechanisms for incremental transmission and fixed-precision arithmetic have been developed through the years by Pasco [49], Rissanen [60], Rubin [65], Rissanen and Langdon [61], Guazzo [23], and Witten, Neal, and Cleary [77]. The bit-stuffing idea of Langdon and others at IBM that limits the propagation of carries in the additions is roughly equivalent to the follow-on proc...

108 |
Generalized Kraft inequality and arithmetic coding
- Rissanen
Citation Context: ...times, so the current interval size is always longer than 1/4. Mechanisms for incremental transmission and fixed-precision arithmetic have been developed through the years by Pasco [49], Rissanen [60], Rubin [65], Rissanen and Langdon [61], Guazzo [23], and Witten, Neal, and Cleary [77]. The bit-stuffing idea of Langdon and others at IBM that limits the propagation of carries in the additions is r...

104 |
Universal modeling and coding
- Rissanen, Langdon
- 1984
Citation Context: ...d model of the source of the data so that the statistical coder can work with accurate probabilities. The separation of the compression process into coding and modeling is due to Rissanen and Langdon [59]. In this chapter we give a brief overview of both modeling and coding. We introduce arithmetic codes and distinguish them from prefix codes; we describe and mathematically analyze arithmetic codes in...

87 | Design and analysis of dynamic Huffman codes
- Vitter
- 1987
Citation Context: ...number of events coded per input symbol occurs when the tree is a Huffman tree, since such trees have minimum average weighted path length; however, maintaining such trees dynamically is complicated [10,36,73,74]. 2.2 Quasi-arithmetic coding The primary disadvantage of arithmetic coding is its slowness. Since small errors in probability estimates cause very small increases in code length, we expect that by in...

84 |
An overview of the basic principles of the Q-coder adaptive binary arithmetic coder
- Pennebaker, Mitchell, et al.
- 1988
Citation Context: ...obtain good compression. Historically, much of the arithmetic coding research by Rissanen, Langdon, and others at IBM has focused on bilevel images [39]. The Q-Coder [1,37,41,50,51,52] is a binary arithmetic coder; work by Rissanen and Mohiuddin [62,63], Chevion et al. [6], and Feygin et al. [16] extends some of the Q-Coder ideas to multi-symbol alphabets. The quasi-arithmetic cod...

66 |
Some practical universal noiseless coding techniques
- Rice
- 1979
Citation Context: ...mpute n mod m and output this value using a binary code, adjusted as described above so that we sometimes use ⌊log₂ m⌋ bits and sometimes ⌈log₂ m⌉ bits. Rice coding, developed independently by Rice [53,54,55], is the same as Golomb coding except that only a subset of the parameter values may be used, namely the powers of 2. The Rice code with parameter k is exactly the same as the Golomb code with paramet...
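The Golomb/Rice construction the excerpt describes (unary quotient, then the remainder in an adjusted binary code of ⌊log₂ m⌋ or ⌈log₂ m⌉ bits) can be sketched as follows; this is the standard textbook construction, not necessarily the thesis's exact routine, and the function names are illustrative.

```python
def golomb_encode(n: int, m: int) -> str:
    """Golomb code for n >= 0 with parameter m >= 2: unary quotient,
    then the remainder in an adjusted binary code."""
    q, r = divmod(n, m)
    bits = "1" * q + "0"                  # unary quotient, 0-terminated
    b = (m - 1).bit_length()              # ceil(log2 m) for m >= 2
    cutoff = (1 << b) - m                 # small remainders get b-1 bits
    if r < cutoff:
        bits += format(r, f"0{b - 1}b") if b > 1 else ""
    else:
        bits += format(r + cutoff, f"0{b}b")
    return bits

def rice_encode(n: int, k: int) -> str:
    """Rice code with parameter k >= 1 == Golomb code with m = 2**k:
    quotient by shifting, remainder by masking the k low-order bits."""
    return "1" * (n >> k) + "0" + format(n & ((1 << k) - 1), f"0{k}b")
```

For m a power of 2 the cutoff is 0, so the remainder always takes exactly k bits and the two functions agree, matching the excerpt's claim that Rice coding restricts Golomb coding to m = 2^k.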

55 |
Picture coding: A review
- Netravali, Limb
- 1980
Citation Context: ...the decoder as well. 3.2 Error modeling and coding It has long been accepted that, for most images, prediction errors can be closely approximated by a Laplace (or symmetric exponential) distribution [24,34,47,48]. In particular, the distribution of prediction errors is sharply peaked at zero, which is characteristic of the Laplace distribution but not of the normal distribution (see Figure 3.1). For our predi...

52 |
Source coding algorithms for fast data compression
- Pasco
- 1976
Citation Context: ...d any number of times, so the current interval size is always longer than 1/4. Mechanisms for incremental transmission and fixed-precision arithmetic have been developed through the years by Pasco [49], Rissanen [60], Rubin [65], Rissanen and Langdon [61], Guazzo [23], and Witten, Neal, and Cleary [77]. The bit-stuffing idea of Langdon and others at IBM that limits the propagation of carries in the...

38 |
Progressive image transmission: A review and comparison
- Tzou
- 1987
Citation Context: ...g to decode much of the encoded data; if more detail is desired, the image can be successively refined as more of the encoded data is decoded. An excellent survey of progressive techniques appears in [71]. [Figure 3.3: MLP last level prediction ne...]

28 |
Probability estimation for the Q-coder
- Pennebaker, Mitchell
- 1988
Citation Context: ...obtain good compression. Historically, much of the arithmetic coding research by Rissanen, Langdon, and others at IBM has focused on bilevel images [39]. The Q-Coder [1,37,41,50,51,52] is a binary arithmetic coder; work by Rissanen and Mohiuddin [62,63], Chevion et al. [6], and Feygin et al. [16] extends some of the Q-Coder ideas to multi-symbol alphabets. The quasi-arithmetic cod...

24 |
Reversible Intraframe Compression of Medical Images
- Roos, Viergever, et al.
- 1988
Citation Context: ...idpoint polynomial interpolation. Precursors of MLP, which use much simpler predictors and less sophisticated variance estimation, are developed in [21,35,70]. A rotating coordinate system appears in [14,64]. 3.4.1 Description of the MLP algorithm In the MLP algorithm, the pixels in an image are divided into levels, each level having twice as many pixels as the preceding one. The pixels in a level are ar...

24 |
Arithmetic stream coding using fixed precision registers
- Rubin
- 1979
Citation Context: ...he current interval size is always longer than 1/4. Mechanisms for incremental transmission and fixed-precision arithmetic have been developed through the years by Pasco [49], Rissanen [60], Rubin [65], Rissanen and Langdon [61], Guazzo [23], and Witten, Neal, and Cleary [77]. The bit-stuffing idea of Langdon and others at IBM that limits the propagation of carries in the additions is roughly equiv...

24 |
The zero frequency problem: Estimating the probabilities of novel events in adaptive text compression
- Witten, Bell
- 1991
Citation Context: ...nonzero, but in an adaptive code we have no way of estimating the probability of a symbol before it has occurred for the first time. This is the zero-frequency problem, discussed in detail in [4] and [76]. For large files with small alphabets and simple models, all solutions to this problem give roughly the same compression. In this section we adopt...

20 |
Software implementation of the Q-coder
- Mitchell, Pennebaker
- 1988
Citation Context: ...obtain good compression. Historically, much of the arithmetic coding research by Rissanen, Langdon, and others at IBM has focused on bilevel images [39]. The Q-Coder [1,37,41,50,51,52] is a binary arithmetic coder; work by Rissanen and Mohiuddin [62,63], Chevion et al. [6], and Feygin et al. [16] extends some of the Q-Coder ideas to multi-symbol alphabets. The quasi-arithmetic cod...

13 |
Predictive quantizing differential pulse code modulation for the transmission of television signals
- O’Neal
- 1966
Citation Context: ...the decoder as well. 3.2 Error modeling and coding It has long been accepted that, for most images, prediction errors can be closely approximated by a Laplace (or symmetric exponential) distribution [24,34,47,48]. In particular, the distribution of prediction errors is sharply peaked at zero, which is characteristic of the Laplace distribution but not of the normal distribution (see Figure 3.1). For our predi...

13 | Dynamic Huffman coding
- Vitter
- 1989
Citation Context: ...number of events coded per input symbol occurs when the tree is a Huffman tree, since such trees have minimum average weighted path length; however, maintaining such trees dynamically is complicated [10,36,73,74]. 2.2 Quasi-arithmetic coding The primary disadvantage of arithmetic coding is its slowness. Since small errors in probability estimates cause very small increases in code length, we expect that by in...

13 |
Arithmetic Coding for Data Compression
- Witten, Neal, et al.
- 1987
Citation Context: ...d with quasi-arithmetic codes. 2.1 Arithmetic coding In this section we explain how arithmetic coding works and give implementation details; our treatment is based on that of Witten, Neal, and Cleary [77]. We point out the usefulness of binary arithmetic coding, that is, coding with a two-symbol alphabet. Our focus is on encoding, but the decoding process is similar. 2.1.1 Basic algorithm for arithmet...
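The core of the basic arithmetic coding algorithm the excerpt introduces is interval narrowing: each symbol shrinks a working interval in proportion to its probability. The idealized real-valued sketch below (illustrative names, no incremental transmission or fixed-precision renormalization, which are the refinements the cited works supply) shows just that step.

```python
def arithmetic_interval(symbols, model):
    """Narrow [low, high) once per symbol. The final interval's width is
    the product of the symbol probabilities, so identifying any point in
    it takes about -log2(width) bits."""
    low, high = 0.0, 1.0
    for s in symbols:
        cum_lo, cum_hi = model[s]        # cumulative-probability range of s
        width = high - low
        high = low + width * cum_hi      # shrink the interval to the
        low = low + width * cum_lo       # sub-range allotted to s
    return low, high

# Illustrative two-symbol model: 'a' owns [0, 0.75), 'b' owns [0.75, 1.0).
model = {"a": (0.0, 0.75), "b": (0.75, 1.0)}
lo, hi = arithmetic_interval("aab", model)
# Width is 0.75 * 0.75 * 0.25, i.e. the product of the probabilities.
print(lo, hi)
```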

11 |
Modeling by Shortest Data Description (Automatica 14)
- Rissanen
- 1978
Citation Context: ...and an efficient way of representing (or learning) the model. (This is related to Rissanen's minimum description length principle; he has investigated it thoroughly from a theoretical point of view [56,57,58].) Most models for text compression involve estimating the probability p of a given symbol by p = (weight of symbol) / (total weight of all symbols), which we can then encode in −log₂ p bits using e...

8 |
Data compression by means of a "book stack"
- Ryabko
- 1980
Citation Context: ...ool find that it gives good results when growing large dynamic Markov models [9]. • Using a sliding window on the text [36]. This requires excessive computational resources. • Recency rank coding [5,12,66]. This is simple but corresponds to a rather coarse model of recency. • Exponential aging (giving exponentially increasing weights to successive symbols) [10,46]. This is moderately difficult to imp...
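Recency rank ("book stack") coding from the excerpt above is usually realized as a move-to-front transform: emit each symbol's current rank, then move it to the front so recently seen symbols get small ranks. A minimal sketch (illustrative function name, not Ryabko's original formulation):

```python
def move_to_front(symbols, alphabet):
    """Recency rank coding: output each symbol's rank in the current
    stack, then move that symbol to the front."""
    stack = list(alphabet)
    ranks = []
    for s in symbols:
        r = stack.index(s)
        ranks.append(r)
        stack.insert(0, stack.pop(r))    # most recent symbol on top
    return ranks

print(move_to_front("abba", "abc"))  # repeated symbols drop to rank 0
```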

3 |
A Very High Speed Lossless Compression/Decompression Chip Set
- Venbrux, Liu, et al.
- 1991
Citation Context: ...ompute n mod 2^k and output this value using a k-bit binary code. The resulting codes give less compression efficiency than Golomb codes, but they are even easier to implement, especially in hardware [72], since we can compute ⌊n/2^k⌋ by shifting and n mod 2^k by masking out all but the k low-order bits. Our parameter estimation method for Golomb codes applies to Rice codes too. Table 2.8 shows the b...

3 |
On the Optimality of Code Options for a Universal Noiseless Coder
- Yeh, Rice, et al.
- 1991
Citation Context: ...the parameter m produces an optimal prefix code for the distribution [20]. Rice coding has been used as the basis for a lossless hardware compressor [72]. Its compression effectiveness is analyzed in [78]. 2.3.3 Selection of Golomb or Rice coding parameter We now describe an on-line algorithm for estimating the coding parameter for Golomb and Rice codes, and prove a bound on its effectiveness. For sim...

2 |
On Encoding of Commas Between Strings
- Stone
- 1979
Citation Context: ...the last byte. An alternative, transmitting the length of the original file before its encoding, reduces the cost to between log₂ t and 2 log₂ t bits by using an appropriate encoding of integers [13,69,75], but requires the file length to be known before encoding can begin. The end-of-file cost using either of these methods is negligible for a typical file, less than 0.01 bit per input symbol. 2.1.5 Bi...
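One well-known integer code in the log₂ t to 2 log₂ t range mentioned above is the Elias gamma code, which spends 2⌊log₂ n⌋ + 1 bits; it is offered here only as a representative example, since the excerpt does not say which encodings [13,69,75] actually use.

```python
def elias_gamma(n: int) -> str:
    """Elias gamma code for n >= 1: floor(log2 n) zeros, then n in
    binary. Total length is 2*floor(log2 n) + 1 bits."""
    b = n.bit_length() - 1       # floor(log2 n)
    return "0" * b + format(n, "b")

# The zero prefix tells the decoder how many binary digits follow,
# so the code is self-delimiting: 9 -> "000" + "1001".
print(elias_gamma(9))
```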

2 |
System for Lossless Digital Image Compression
- Torbey, Meadows
Citation Context: ...Table 3.2: Coefficients used in MLP for 16-point midpoint polynomial interpolation. Precursors of MLP, which use much simpler predictors and less sophisticated variance estimation, are developed in [21,35,70]. A rotating coordinate system appears in [14,64]. 3.4.1 Description of the MLP algorithm In the MLP algorithm, the pixels in an image are divided into levels, each level having twice as many pixels a...

2 |
Almost Asymptotically Optimal Flag Encoding of the Integers
- Wang
- 1988
Citation Context: ...the last byte. An alternative, transmitting the length of the original file before its encoding, reduces the cost to between log₂ t and 2 log₂ t bits by using an appropriate encoding of integers [13,69,75], but requires the file length to be known before encoding can begin. The end-of-file cost using either of these methods is negligible for a typical file, less than 0.01 bit per input symbol. 2.1.5 Bi...

1 |
A Multiplication-Free Multialphabet Arithmetic Code
- Rissanen, Mohiuddin
- 1989
Citation Context: ...y Rissanen, Langdon, and others at IBM has focused on bilevel images [39]. The Q-Coder [1,37,41,50,51,52] is a binary arithmetic coder; work by Rissanen and Mohiuddin [62,63], Chevion et al. [6], and Feygin et al. [16] extends some of the Q-Coder ideas to multi-symbol alphabets. The quasi-arithmetic coder discussed in Section 2.2 is formulated as a binary coder, though i...

1 |
Two-Dimensional Encoding by Finite State Encoders
- Sheinwald, Lempel, et al.
- 1990
Citation Context: ...arely found in the data. Lempel and Ziv [40] have presented a method for two-dimensional dictionary-based coding that uses a space-filling curve for its pixel sequence, and Sheinwald, Lempel, and Ziv [68] give another dictionary method that covers the image with repeated rectangles of various sizes. These methods can be proven asymptotically optimal for images generated by finite state sources, but th...