## Data Compression Using Adaptive Coding and Partial String Matching (1984)

Venue: | IEEE Transactions on Communications |

Citations: | 332 - 20 self |

### BibTeX

@ARTICLE{Cleary84datacompression,

author = {John G. Cleary and Ian H. Witten},

title = {Data Compression Using Adaptive Coding and Partial String Matching},

journal = {IEEE Transactions on Communications},

year = {1984},

volume = {32},

pages = {396--402}

}

### Abstract

The recently developed technique of arithmetic coding, in conjunction with a Markov model of the source, is a powerful method of data compression in situations where a linear treatment is inappropriate. Adaptive coding allows the model to be constructed dynamically by both encoder and decoder during the course of the transmission, and has been shown to incur a smaller coding overhead than explicit transmission of the model's statistics. But there is a basic conflict between the desire to use high-order Markov models and the need to have them formed quickly as the initial part of the message is sent. This paper describes how the conflict can be resolved with partial string matching, and reports experimental results which show that mixed-case English text can be coded in as little as 2.2 bits/character with no prior knowledge of the source.
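The trade-off described in the abstract can be made concrete with a small Python sketch. It is not the authors' implementation: the function name `adaptive_code_length`, the escape estimate of 1/(n+1), the uniform fallback over 256 symbols, and the lack of any exclusion mechanism are all assumptions made for illustration. Instead of running an arithmetic coder, it sums -log2(p) for each character, which is the code length an ideal arithmetic coder would approach for the same model.

```python
import math
from collections import defaultdict


def adaptive_code_length(text, max_order=2, alphabet_size=256):
    """Ideal code length (bits) of `text` under an adaptive context model.

    Illustrative assumptions: escape probability 1/(n+1) in a context seen
    n times, no exclusions, uniform fallback over `alphabet_size` symbols.
    """
    # counts[context][symbol] = times `symbol` has followed `context` so far
    counts = defaultdict(lambda: defaultdict(int))
    total_bits = 0.0
    for i, sym in enumerate(text):
        prob = 1.0
        coded = False
        # Try the longest available context first, then shorter ones.
        for order in range(min(max_order, i), -1, -1):
            seen = counts.get(text[i - order:i])
            if not seen:
                # Context never seen; the decoder knows this too,
                # so dropping to a shorter context costs nothing.
                continue
            n = sum(seen.values())
            if sym in seen:
                prob *= seen[sym] / (n + 1)
                coded = True
                break
            prob *= 1 / (n + 1)  # escape to the next shorter context
        if not coded:
            prob *= 1 / alphabet_size  # symbol is new in every context
        total_bits += -math.log2(prob)
        # Both encoder and decoder update the model only after coding,
        # so no statistics need to be transmitted.
        for order in range(min(max_order, i) + 1):
            counts[text[i - order:i]][sym] += 1
    return total_bits


if __name__ == "__main__":
    sample = "the cat sat on the mat. " * 40
    bits = adaptive_code_length(sample, max_order=3)
    print(f"{bits / len(sample):.3f} bits/character")
```

Because the model starts empty, early characters are charged the full uniform cost, and the falling back to shorter contexts is what lets higher-order statistics be used before they are fully formed.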

### Citations

739 | Compression of Individual Sequences via Variable-Rate Coding
- Ziv, Lempel
- 1978
Citation Context ...een used by Langdon and Rissanen [12] with fixed-order Markov models. Roberts [20] also discusses similar techniques applied to encoding and authorship identification of English text. Ziv and Lempel [24] have proposed a coding technique which involves the adaptive matching of variable length strings. Superficially, this technique appears to be very different from the "arithmetic coding plus adaptive... |

107 | Universal modeling and coding - Rissanen, Langdon - 1981 |

59 |
Enumerative source encoding
- Cover
- 1973
Citation Context ...can simply enumerate all messages which can be generated by the model and allocate a part of the code space to each whose size depends on the message probability. This procedure of enumerative coding [6] unfortunately becomes impractical for models of any complexity. However, the recent invention of arithmetic coding [15] has provided a method which is guaranteed to transmit a message in a number of ... |

27 |
International digital facsimile coding standards
- Hunter, Robinson
- 1980
Citation Context ...are a fixed model which governs the coding of all messages. While it may be appropriate in some tightly defined circumstances, such as special-purpose machines for facsimile transmission of documents [9], it will not work well for a variety of different types of message. For example, imagine an encoder embedded in a general-purpose modem or a computer disk channel. The most appropriate model to use f... |

10 |
A general minimum-redundancy source-coding algorithm
- Guazzo
- 1980
Citation Context ...al definition of how character probabilities are estimated using partial string matching. II. THE CODING METHOD Arithmetic Coding Arithmetic coding has been discussed recently by a number of authors [7], [10], [15], [19]. Imagine a sequence of symbols X_1 X_2 ... X_N is to be encoded as another sequence Y_1 Y_2 ... Y_M. After a sequ... |

6 | Arithmetic codings as number representations - Rissanen - 1979 |

5 |
Printed English compression by dictionary encoding
- White
- 1967
Citation Context ... source in a small number of bits [19]. For the first part, Markov modeling is generally employed, although the use of language-dependent word dictionaries in data compression has also been explored [21]. In either case the problem of transmitting the model must be faced. The usual procedure is to arrange that when the transmission system is set up, both encoder and decoder share a general model of t... |

3 |
Non-Deterministic Modelling of Behaviour Sequences
- Witten
- 1977
Citation Context ...enever more than one prediction is seen to emanate from it. Another possibility is to construct a nondeterministic automaton model of the message string, and store a reduced form as described by Witten [22]. However, we are not overly concerned about the amount of storage that the method consumes. After all, only unfilled storage is needed. With the continued improvement in integrated circuit technolog... |

2 |
An associative and impressible computer
- Cleary
- 1980
Citation Context ...e scheme uses no prestored statistics; the required memory is empty initially. There are complicated tradeoffs between space, time, and implementation complexity in partial string matching algorithms [2], [4]. Our experimental implementation stores the Markov model in a tree structure (as must any implementation whose execution time grows at most linearly with o and which occupies a reasonable space)... |

1 | Bahl et al., "Recognition of a continuously read natural corpus - R - 1978 |

1 |
Arithmetic, enumerative and adaptive coding," IEEE Trans
- Cleary, Witten
Citation Context ...is to arrange that both sender and receiver adapt the model dynamically to the message statistics as the transmission proceeds. This is called "adaptive coding" [12]. It has been shown theoretically [5] that for some models, adaptive coding is never significantly worse than a two-pass approach and can be significantly better. This paper verifies these results in practice for adaptive coding using a ... |

1 |
Experiments with linear prediction in television
- Harrison
- 1952
Citation Context ...ng the coding scheme perform much better: 1.923 bits/pixel, or 48 percent of the unencoded value. We suspect that this may be better than could be achieved using techniques such as linear prediction [8]. Selection of the Escape Probability We have investigated the use of two algorithms for calculating the escape probability, that is, the probability that a character will occur in a context in which ... |

1 |
An efficient coding system for long source sequences
- Jones
- 1981
Citation Context ...finition of how character probabilities are estimated using partial string matching. II. THE CODING METHOD Arithmetic Coding Arithmetic coding has been discussed recently by a number of authors [7], [10], [15], [19]. Imagine a sequence of symbols X_1 X_2 ... X_N is to be encoded as another sequence Y_1 Y_2 ... Y_M. After a sequence X... |

1 |
Compression of black-white images with arithmetic coding
- Rissanen
- 1981
Citation Context ...t has been seen. The obvious solution is to arrange that both sender and receiver adapt the model dynamically to the message statistics as the transmission proceeds. This is called "adaptive coding" [12]. It has been shown theoretically [5] that for some models, adaptive coding is never significantly worse than a two-pass approach and can be significantly better. This paper verifies these results in ... |

1 | A note on the Ziv-Lempel model for compressing individual sequences - Langdon - 1983 |

1 |
Lesk, "Refer
- E
- 1979
Citation Context ...cters. The fifth is a short extract from a bibliography file, which contains authors' names, titles, and reference details in a structured manner suitable for computer indexing and keyword retrieval [14]. These first five samples are all represented as ASCII text, requiring 7 bits/character. The next three samples each use 8 bits/character. Sample 6 is a data file containing geophysical information ... |

1 |
Source coding algorithms for fast data compression
- Pasco
- 1976
Citation Context ...se size depends on the message probability. This procedure of enumerative coding [6] unfortunately becomes impractical for models of any complexity. However, the recent invention of arithmetic coding [15] has provided a method which is guaranteed to transmit a message in a number of bits which can be made arbitrarily close to its entropy with respect to the model which is used. The method can be thoug... |

1 | The probability of induction," in The World of - Peirce - 1956 |

1 | Decision making in Markov chains applied to the problem of pattern recognition - Raviv - 1967 |

1 |
Principles of Computer Speech
- Witten
- 1982
Citation Context ... 1, the shortest, is an abstract of a technical paper. It includes some formatting controls as well as a <newline> character at the end of each line. Sample 2, the longest, is a complete 11-chapter book [23]. Notice that this sample contains over half a million characters. Prior to coding, we removed the formatting controls and mathematical expressions automatically, which left some rather anomalous gap... |