## Ensemble Prediction by Partial Matching

### BibTeX

@MISC{Knoll_ensembleprediction,

author = {Byron Knoll},

title = {Ensemble Prediction by Partial Matching},

year = {}

}

### Abstract

Prediction by Partial Matching (PPM) is a lossless compression algorithm that consistently performs well on text compression benchmarks. This paper introduces a new PPM implementation called PPM-Ens, which uses unbounded context lengths and ensemble voting to combine multiple contexts. The algorithm is evaluated on the Calgary corpus. The results indicate that combining multiple contexts improves the compression performance of PPM-Ens, although it does not outperform state-of-the-art compression techniques.
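To make the idea of combining several context lengths concrete, here is a minimal Python sketch. It is not PPM-Ens: the abstract does not describe the paper's voting rule, escape handling, or parameters, so the equal-weight average, the `ContextMixer` class, the `max_order` bound, and the 256-symbol alphabet are all assumptions made purely for illustration. The sketch predicts each character from the contexts seen so far and reports the resulting cross entropy in bits per character, the measure used in the evaluation.

```python
import math
from collections import Counter, defaultdict


class ContextMixer:
    """Toy adaptive predictor (hypothetical, not PPM-Ens): averages the
    next-symbol estimates of several context orders with equal weight."""

    def __init__(self, max_order=3, alphabet_size=256):
        self.max_order = max_order
        self.alphabet_size = alphabet_size
        # counts[k][context] -> Counter of symbols seen after that length-k context
        self.counts = [defaultdict(Counter) for _ in range(max_order + 1)]

    def _contexts(self, history):
        # Yield (order, context) for every order the history is long enough for.
        for k in range(min(self.max_order, len(history)) + 1):
            yield k, history[len(history) - k:]

    def prob(self, history, symbol):
        """Probability of `symbol` after `history`, as an equal-weight vote."""
        # A uniform distribution always votes, so no symbol gets probability zero.
        votes = [1.0 / self.alphabet_size]
        for k, ctx in self._contexts(history):
            seen = self.counts[k].get(ctx)
            if seen:
                votes.append(seen[symbol] / sum(seen.values()))
        return sum(votes) / len(votes)

    def update(self, history, symbol):
        """Record that `symbol` followed each suffix of `history`."""
        for k, ctx in self._contexts(history):
            self.counts[k][ctx][symbol] += 1


def cross_entropy_bits(data, max_order=3):
    """Average bits per symbol when each symbol is predicted before it is seen."""
    model = ContextMixer(max_order)
    total = 0.0
    for i, symbol in enumerate(data):
        history = data[max(0, i - max_order):i]
        total -= math.log2(model.prob(history, symbol))
        model.update(history, symbol)
    return total / len(data)
```

Calling `cross_entropy_bits("abracadabra" * 20)` gives the average cost in bits per character for that string. An arithmetic coder driven by the same probabilities would produce output of roughly that size, which is why cross entropy serves as a proxy for compression performance.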

### Citations

1053 |
A method for construction of minimum-redundancy codes
- Huffman
- 1952
Citation Context ...robability distribution for the prediction of each character. The second is to encode these probability distributions into a file using a coding scheme such as arithmetic coding [2] or Huffman coding [3]. PPM is concerned with the first task of generating a probability distribution for the prediction of the next character in a sequence. Consider the alphabet of lower case English characters and the i...

694 |
Arithmetic Coding for Data Compression
- Witten, Neal, et al.
- 1987
Citation Context ... first is creating a probability distribution for the prediction of each character. The second is to encode these probability distributions into a file using a coding scheme such as arithmetic coding [2] or Huffman coding [3]. PPM is concerned with the first task of generating a probability distribution for the prediction of the next character in a sequence. Consider the alphabet of lower case Englis...

644 |
Text Compression
- Bell, Cleary, et al.
- 1990
Citation Context .... It uses unbounded length contexts and ensemble voting to mix context models. Much of the development of PPM-Ens was influenced by empirical performance evaluations on data from the Calgary corpus [9], a standard dataset used for comparing lossless compression algorithms. Table 1 gives a summary of the Calgary corpus files. PPM-Ens has the advantage of linear memory usage (in terms of context leng...

504 |
Individual comparisons by ranking methods
- Wilcoxon
- 1945
Citation Context ...ig with an average cross entropy rate of 2.23. However, another PPM variant called cPPMII-64 [12] outperforms all of these algorithms with a cross entropy of 2.04. Using the Wilcoxon signed-rank test [13], we can calculate whether there is a significant performance difference between PPM-Orig and PPM*C. Performing this test results in a p-value of 0.001618, indicating that there is a significant diff...

361 | Completely derandomized self-adaptation in evolution strategies
- Hansen, Ostermeier
- 2001
Citation Context ...ues used were 2, 0.0001, 0.2, 0.999, and 1 for the first through fifth parameters respectively. Automated parameter tuning was performed using Covariance Matrix Adaptation Evolution Strategy (CMA-ES) [11]. CMA-ES is known to be effective at optimizing a small number of continuous parameters. In addition, CMA-ES does not require the use of user-supplied meta-parameters. A Java implementation of CMA-ES ...

356 | Data compression using adaptive coding and partial string matching
- Cleary, Witten
- 1984
Citation Context ...ontexts leads to an improvement in the compression performance of PPM-Ens, although it does not outperform state of the art compression techniques. 1 Introduction: Prediction by Partial Matching (PPM) [1] is a lossless compression algorithm which consistently performs well on text compression benchmarks. There are a variety of PPM implementations with different performance properties. This paper intro...

116 | Unbounded length contexts for PPM
- Cleary, Teahan
- 1997
Citation Context ... memoizer [7]. 3 Algorithm Development: The maximum context size of PPM is usually bounded in order to improve prediction accuracy and avoid exponential memory usage. A PPM implementation called PPM*C [8] demonstrates how unbounded length contexts can be used to improve prediction accuracy. PPM-Ens was created based on this work. It uses unbounded length contexts and ensemble voting to mix context mod...

68 |
An estimate of an upper bound for the entropy of English
- Brown, Pietra
- 1992
Citation Context ...The trigram model was used on a large corpus of one million English words to achieve a perplexity score of 247 per word, corresponding to a cross entropy of 7.95 bits per word or 1.75 bits per letter [10]. On this corpus, ASCII coding has a cross entropy of 8 bits per character, Huffman coding has 4.46, and the UNIX command compress has 4.43. On more specialized corpora it is possible to achieve lower...

39 |
PPM: One step to practicality
- Shkarin
- 2002
Citation Context ... average cross entropy of 2.28. This is better than the 2.34 achieved by PPM*C. PPM-Ens outperformed PPM-Orig with an average cross entropy rate of 2.23. However, another PPM variant called cPPMII-64 [12] outperforms all of these algorithms with a cross entropy of 2.04. Using the Wilcoxon signed-rank test [13], we can calculate whether there is a significant performance difference between PPM-Orig an...

16 |
Adaptive Weighing of Context Models for Lossless Data Compression
- Mahoney
- 2005
Citation Context ...r state of the art algorithms which outperform it. One example of a compression benchmark is the Hutter Prize [5]. This is a contest to compress the first 100MiB of Wikipedia. An algorithm called PAQ [6] currently dominates the contest. PAQ is closely related to PPM, improving on it by combining contexts which are arbitrary functions of the input history. Another example of an algorithm which achieve...

12 | A stochastic memoizer for sequence data
- Wood, Archambeau, et al.
- 2009
Citation Context ...combining contexts which are arbitrary functions of the input history. Another example of an algorithm which achieves state of the art cross entropy rates on other datasets is the stochastic memoizer [7]. 3 Algorithm Development: The maximum context size of PPM is usually bounded in order to improve prediction accuracy and avoid exponential memory usage. A PPM implementation called PPM*C [8] demonstra...

11 |
Human knowledge compression prize
- Hutter
- 2006
Citation Context ...ld be noted that although PPM performs well on text compression benchmarks, there are other state of the art algorithms which outperform it. One example of a compression benchmark is the Hutter Prize [5]. This is a contest to compress the first 100MiB of Wikipedia. An algorithm called PAQ [6] currently dominates the contest. PAQ is closely related to PPM, improving on it by combining contexts which a...

2 |
Prediction by partial approximate matching for lossless image compression
- Zhang, Adjeroh
- 2008
Citation Context ...applications there may be a benefit to allowing a certain number of errors in a context match. This has led to the development of an algorithm called Prediction by Partial Approximate Matching (PPAM) [4]. PPAM was developed to perform lossless image compression. The pixels of an image tend to contain more noise than some other domains, such as the characters in a text document. PPAM was shown to have...
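
As a quick check of the figures quoted in the Brown and Pietra citation context above, cross entropy in bits is the base-2 logarithm of perplexity. The per-word figure follows directly from the quoted perplexity. The implied average word length is only an inference from the two quoted rates, not a number stated in the excerpt:

$$
H_{\text{word}} = \log_2 247 \approx 7.95 \text{ bits per word}, \qquad \frac{7.95 \text{ bits per word}}{1.75 \text{ bits per letter}} \approx 4.5 \text{ letters per word}.
$$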