## Semantically Motivated Improvements for PPM Variants (1997)

Venue: | The Computer Journal |

Citations: | 25 - 3 self |

### BibTeX

@ARTICLE{Bunton97semanticallymotivated,
  author = {Suzanne Bunton},
  title = {Semantically Motivated Improvements for PPM Variants},
  journal = {The Computer Journal},
  year = {1997},
  volume = {40},
  pages = {76--93}
}

### Abstract

This paper explains how to significantly improve the compression performance of any PPM variant

### Citations

700 | Arithmetic coding for data compression
- Witten, Neal, et al.
- 1987
Citation Context ...of the various mechanisms on compression performance. All probability estimates were coded using a floating-point m-ary arithmetic coder that we based upon the popular integer implementation given in [19, 20], and no frequency scaling was required. All results were verified by decompressing the encoded file and comparing it with the original. In Tables 1-4 we give compression performance, summarize model... |

546 | Stochastic Complexity
- Rissanen
- 1989
Citation Context ... provide the best probability estimate?' 6.1. Stochastic complexity The stochastic complexity of a string is the length of its optimal off-line encoding, that is, its minimum description length (MDL) [10]. A string's MDL is the sum of the lengths of an encoding of a model plus the encoding of the string with respect to that model such that the total encoding length is minimal over all possible models ... |

362 | Data compression using adaptive coding and partial string matching
- Cleary, Witten
- 1984
Citation Context ...; revised March, 1997 1. INTRODUCTION The on-line modelling algorithm, 'prediction by partial matching' (PPM) has set the standard in lossless data compression research since its introduction in 1984 [1]. PPM constructs a bounded-order Markov model that estimates probabilities of each symbol in an input sequence using a technique called blending. Blending combines several distinct frequency distribut... |

348 | On-line construction of suffix trees
- Ukkonen
- 1995
Citation Context ...eveloped in the parent work overviewed in this paper [3]. However, credit for publishing the algorithm first belongs elsewhere: Ukkonen published an on-line suffix-tree construction algorithm in 1995 [21], and Larsson applied it to a sliding-window variant of PPM* in 1996 [22]. 9. CONCLUSIONS In this paper, we cumulatively applied three optimizations to PPM and PPM*: update exclusions, mixtures and... |

249 | The zero-frequency problem: Estimating the probabilities of novel events in adaptive text compression
- Witten, Bell
- 1991
Citation Context ...applying the combination of update-excluded mixtures and state selection to PPM variants, without adding distracting details. Blending has been defined for PPM in terms of several 'escape' mechanisms [9], of which the simplest and best-performing are commonly called 'C' [4] and 'D' [5]. Both are easily described in terms of the weighting function W(s), if we define it as W(s) = count(s) / (count(s) + #... |

213 | Arithmetic coding - Rissanen, Langdon - 1979 |

175 | A universal data compression system
- Rissanen
- 1983
Citation Context ... coding penalty to each model state. The penalty is a lower bound on the number of bits required to encode that state. Most other treatments of the state-selection approach to on-line modelling (e.g. [11, 12]) require that a refinement (e.g. the children or deeper descendants) of a state be selected in preference to that state if the refinement's improvement to the performance of the model frontier contai... |

129 | Implementing the PPM data compression scheme - Moffat - 1990 |

117 | Unbounded length contexts for PPM
- Cleary, Teahan
- 1997
Citation Context ...nto one probability measure. Recently, a new PPM variant, PPM*, was developed that eliminated PPM's order bound with a linear-space model that employed a buffer containing the entire input sequence [2]. PPM* is the only published on-line modelling technique that discards absolutely no information about the past input sequence. However, its original compression performance was not as good as that o... |

115 | Generalized Kraft inequality and arithmetic coding - Rissanen - 1976 |

83 | Data compression using dynamic Markov modelling
- Cormack, Horspool
- 1987
Citation Context ...ed that there exist useful finite-context Markov models that are not FSMX [3, Chapter 4]. Those models allow arbitrarily long extensions to state contexts and include dynamic Markov compression (DMC) [8] models as a special case. They require explicit destination pointers because their conditioning contexts cannot be described by a single string. 2.4. Model semantics I: conditioning context partition... |

57 | Complexity of strings in the class of Markov sources
- Rissanen
- 1986
Citation Context ...ssage. The fact that the improvements developed and tested later in this work apply to FSMX models is important because FSMX models are ubiquitous in the information-theoretic literature. FSMX models [7] are suffix-tree context models with single-symbol minimal extensions such that the next state given by the transition function on a given state and input symbol can have Markov order that differs arb... |

57 | Source Coding Algorithms for Fast Data Compression - Pasco - 1976 |

55 | The design and analysis of efficient lossless data compression systems
- Howard
- 1993
Citation Context ... the original implementation [2]. We reduce the memory requirements of PPM and improve its performance by 12% over the standard reference [4], and by 5% over the best of all previously known variants [5]. This paper is organized as follows. Section 2 introduces terminology, suffix-tree context models, and describes PPM and PPM* models. Section 3 transforms PPM* and PPM into a single suffix-tree dat... |

38 | Extended application of suffix trees to data compression
- Larsson
- 1996
Citation Context ...for publishing the algorithm first belongs elsewhere: Ukkonen published an on-line suffix-tree construction algorithm in 1995 [21], and Larsson applied it to a sliding-window variant of PPM* in 1996 [22]. 9. CONCLUSIONS In this paper, we cumulatively applied three optimizations to PPM and PPM*: update exclusions, mixtures and MDL-based state selection. This work introduced a new state-selection techn... |

33 | A sequential algorithm for the universal coding of finite memory sources
- Weinberger, Lempel, et al.
- 1992
Citation Context ... coding penalty to each model state. The penalty is a lower bound on the number of bits required to encode that state. Most other treatments of the state-selection approach to on-line modelling (e.g. [11, 12]) require that a refinement (e.g. the children or deeper descendants) of a state be selected in preference to that state if the refinement's improvement to the performance of the model frontier contai... |

25 | Arithmetic stream coding using fixed precision registers - Rubin - 1979 |

23 | A comparison of enumerative and adaptive coding
- Cleary, Witten
- 1984
Citation Context ...elected for coding without any side-information about the actual model. In on-line modelling, the coding penalty is incorporated into the inaccurate probability estimates from early in the sequence [13]. Thus, refinement coding penalties are incorporated into the records of past performance based on those estimates. Automated experiments with various parametrizations of our executable taxonomy, wh... |

19 | An empirical evaluation of coding methods for multi-symbol alphabets
- Moffat, Sharman, et al.
- 1993
Citation Context ...of the various mechanisms on compression performance. All probability estimates were coded using a floating-point m-ary arithmetic coder that we based upon the popular integer implementation given in [19, 20], and no frequency scaling was required. All results were verified by decompressing the encoded file and comparing it with the original. In Tables 1-4 we give compression performance, summarize model... |

15 | On-Line Stochastic Processes in Data Compression
- Bunton
- 1996
Citation Context ...herited probability P_e(a|suffix(s))? The performance and tradeoffs of mixtures that are defined using a variety of weighting functions and a spectrum of inheritance evaluation times are explored in [3]. Here we discuss mixtures in terms of a weighting function and an inheritance evaluation time which were selected for this discussion because they are simple and they perform well. Thus, they allow a... |

12 | Text Compression. Advanced Reference Series
- Bell, Witten, et al.
- 1990
Citation Context ...essential for correctly applying these techniques to any universal suffix-tree model. Finally, in Section 7, all improvements are measured empirically as compression performance on the Calgary Corpus [6], using our executable taxonomy of on-line sequence modelling algorithms [3], which completely controls all model features in each experiment. 2. SUFFIX-TREE MODELS 2.1. Notation and terminology Broad... |

8 | An enhancement to universal modeling algorithm 'context' for real-time applications to image compression
- Furlan
- 1991
Citation Context ... its siblings. To select, for example, the excited state with the lowest expected codelength, or the minimum-order excited state with expected codelength which is better than that of its excited child [14], is incorrect, not merely suboptimal. This is because the children of a state s (i.e. those nodes with contexts which correspond to minimal extensions of the state's context) may have better performa... |