## Lossless compression based on the Sequence Memoizer (2010)

Venue: Data Compression Conference (DCC) 2010

Citations: 11 (4 self)

### BibTeX

    @INPROCEEDINGS{Gasthaus10losslesscompression,
      author    = {Jan Gasthaus and Frank Wood and Yee Whye Teh},
      title     = {Lossless compression based on the Sequence Memoizer},
      booktitle = {Data Compression Conference 2010},
      year      = {2010},
      pages     = {337--345}
    }

### Abstract

In this work we describe a sequence compression method based on combining a Bayesian nonparametric sequence model with entropy encoding. The model, a hierarchy of Pitman-Yor processes of unbounded depth previously proposed by Wood et al. [2009] in the context of language modelling, captures long-range dependencies by allowing conditioning contexts of unbounded length. We show that incremental approximate inference can be performed in this model, thereby allowing it to be used in a text compression setting. The resulting compressor reliably outperforms several PPM variants on many types of data, but is particularly effective in compressing data that exhibits power-law properties.
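
The abstract describes the general recipe: an adaptive predictive model supplies P(next symbol | context), and an entropy coder charges roughly -log2 of that probability for each symbol actually observed. The sketch below illustrates this coupling under simplifying assumptions: a count-based stand-in model replaces the sequence memoizer, and the ideal code length is accumulated instead of running a real coder.

```python
# Minimal sketch (not the authors' implementation): an adaptive model predicts each
# symbol, the coder pays -log2 P(symbol) bits, and the model updates incrementally.
import math
from collections import defaultdict

def ideal_code_length(data: bytes) -> float:
    counts = defaultdict(lambda: 1)   # Laplace-smoothed byte counts (stand-in for the SM)
    total = 256                       # one pseudo-count per possible byte value
    bits = 0.0
    for symbol in data:
        bits += -math.log2(counts[symbol] / total)  # entropy coder's cost for this symbol
        counts[symbol] += 1                         # incremental model update after coding
        total += 1
    return bits

if __name__ == "__main__":
    text = b"abracadabra abracadabra"
    print(f"{ideal_code_length(text) / len(text):.3f} bits/symbol")
```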

### Citations

666 | Arithmetic coding for data compression - Witten, Neal, et al. - 1987

Citation Context: ...the sequence memoizer (SM) [16], a nonparametric Bayesian model for sequences of unbounded complexity. This model is combined with an entropy coder, for instance the arithmetic coder of Witten et al. [15], to yield a method for compressing sequence data. The main contribution of this paper is to develop an efficient approximate incremental inference algorithm for the SM that renders it suitable for us...
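
As a rough illustration of the coder this context refers to, the sketch below narrows a real-valued interval according to the model's cumulative probabilities. Production arithmetic coders such as Witten et al.'s work with integer ranges, renormalization, and an explicit bit stream; the `CountModel` class here is a hypothetical frequency counter standing in for the paper's predictive model.

```python
# Conceptual interval-narrowing view of arithmetic coding (floating point for illustration).
from collections import Counter

class CountModel:
    """Hypothetical adaptive model: Laplace-smoothed symbol frequencies."""
    def __init__(self, alphabet):
        self.order = list(alphabet)
        self.counts = Counter({a: 1 for a in self.order})

    def cdf_range(self, symbol):
        total = sum(self.counts.values())
        lo = 0
        for a in self.order:
            if a == symbol:
                return lo / total, (lo + self.counts[a]) / total
            lo += self.counts[a]
        raise KeyError(symbol)

    def update(self, symbol):
        self.counts[symbol] += 1

def encode_interval(symbols, model):
    low, high = 0.0, 1.0
    for s in symbols:
        c_lo, c_hi = model.cdf_range(s)              # cumulative bounds of s under the model
        width = high - low
        low, high = low + width * c_lo, low + width * c_hi
        model.update(s)                              # model adapts after coding each symbol
    return low, high                                 # any number in [low, high) identifies the input

print(encode_interval("abba", CountModel("ab")))
```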

220 | The two-parameter Poisson-Dirichlet distribution derived from a stable subordinator. Annals of Probability, 25(2):855–900 - Pitman, Yor - 1997

Citation Context: ...bution P(x) with which we can compress x. Section 4 describes how the algorithm in Section 2 is an approximation of this ideal Bayesian approach. 3.1 Pitman-Yor Process The Pitman-Yor process (PYP) [Pitman and Yor, 1997], denoted PY(d, H), is a distribution over probability vectors. It is parameterized by a discount parameter d ∈ (0, 1) and a probability vector H called the base vector. If G ∼ PY(d, H) is a Pitman...
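
The definition quoted here (a random probability vector with discount d and base vector H) can be made concrete with the standard stick-breaking construction. The sketch below is an illustrative sample assuming concentration parameter 0 (as used in the paper) and a finite truncation of the sticks; it is not code from the paper.

```python
# Stick-breaking sketch of G ~ PY(d, H) over a finite alphabet, concentration 0.
import random

def sample_pyp(d, base_probs, sticks=1000):
    """Draw an approximate probability vector G ~ PY(d, H)."""
    symbols = list(base_probs)
    G = {s: 0.0 for s in symbols}
    remaining = 1.0                                        # unbroken part of the stick
    for k in range(1, sticks + 1):
        v = random.betavariate(1.0 - d, k * d)             # stick proportion V_k ~ Beta(1-d, k*d)
        s = random.choices(symbols, weights=[base_probs[x] for x in symbols])[0]
        G[s] += remaining * v                              # atom's mass goes to a symbol drawn from H
        remaining *= 1.0 - v
    return G                                               # sums to 1 minus the truncated tail mass

print(sample_pyp(0.5, {"a": 0.5, "b": 0.3, "c": 0.2}))
```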

122 | Combinatorial stochastic processes - Pitman - 2006

Citation Context: ...them very successful in modelling data with such properties (such as natural languages) [Teh, 2006]. A better understanding of the PYP can be obtained by way of the Chinese restaurant process (CRP) [Pitman, 2002]. Initialize two sets of counts {c_s, t_s}_{s∈Σ} to 0, and consider the following generative process: Draw y_1 ∼ H and set c_{y_1} = t_{y_1} = 1; for n = 2, 3, . . . and each s ∈ Σ, with probability c_s − d·t_s set y_n ...
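
The generative process sketched in this excerpt can be written out as a small simulation. The sketch below is a hedged reading of it (concentration 0, tracking only the per-symbol counts c_s and t_s), not the paper's implementation.

```python
# Chinese restaurant process view of PY(d, H): repeat symbol s with weight c_s - d*t_s,
# or open a new table (a fresh draw from the base distribution H) with weight d * total_tables.
import random

def crp_sample(n, d, base_probs):
    """Draw n symbols from G ~ PY(d, H) via its Chinese restaurant representation."""
    symbols = list(base_probs)
    c = {s: 0 for s in symbols}   # customer counts c_s (times s has been emitted)
    t = {s: 0 for s in symbols}   # table counts t_s (times s was drawn afresh from H)
    draws = []
    for _ in range(n):
        total_c = sum(c.values())
        total_t = sum(t.values())
        existing = [c[s] - d * t[s] for s in symbols]      # weight of re-using symbol s
        new_table = d * total_t if total_c > 0 else 1.0    # weight of opening a new table
        if random.random() * (sum(existing) + new_table) < new_table:
            s = random.choices(symbols, weights=[base_probs[x] for x in symbols])[0]
            t[s] += 1                                      # new table serving a draw from H
        else:
            s = random.choices(symbols, weights=existing)[0]
        c[s] += 1
        draws.append(s)
    return draws

print(crp_sample(20, d=0.5, base_probs={"a": 0.5, "b": 0.3, "c": 0.2}))
```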

116 | Coalescents with multiple collisions - Pitman - 1999

Citation Context: ...ed when d = 0 and the concentration parameter is positive. Here the concentration parameter is set to 0 instead; this is sometimes referred to as the normalized stable process. ...to a theorem of Pitman [10], which in our situation simply states: if G_1 | G_0 ∼ PY(d_1, G_0) and G_2 | G_1 ∼ PY(d_2, G_1) then marginally G_2 | G_0 ∼ PY(d_1 d_2, G_0) is a Pitman-Yor process as well with modified discount parameters. Further det...
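
For readability, the coagulation property quoted above can be restated as a display (a paraphrase of the excerpt, together with the standard consequence that repeated application collapses any non-branching chain of such draws into a single one):

```latex
G_1 \mid G_0 \sim \mathrm{PY}(d_1, G_0), \quad
G_2 \mid G_1 \sim \mathrm{PY}(d_2, G_1)
\;\Longrightarrow\;
G_2 \mid G_0 \sim \mathrm{PY}(d_1 d_2, G_0),
% so a non-branching chain with discounts d_1, \dots, d_k marginalizes to a single
% Pitman-Yor draw with discount \prod_{i=1}^{k} d_i.
```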

113 | Autoencoders, minimum description length, and Helmholtz free energy. Advances in Neural Information Processing Systems - Hinton, Zemel - 1994

Citation Context: ...a long tradition, with PPM, CTW and PAQ being stellar examples. Latent variables that capture regularities in the data, as applied in this paper, have previously been used in the compression setting [Hinton and Zemel, 1994], but have seen less application due to the computational demand of approximating often intractable posterior distributions. Fortunately, in this paper we were able to derive efficient and effective ...

111 | Unbounded length contexts for PPM - Cleary, Teahan - 1997

Citation Context: ...itable for use as a compressor’s predictive model. At the algorithmic level, the proposed method is somewhat similar to the unbounded context length variants of the well known PPM and CTW algorithms [Cleary and Teahan, 1997; Willems, 1998]. At a conceptual level, however, our approach is quite different: We take a Bayesian approach, treating the distributions over next symbols as latent variables on which we place a hie...

84 | A hierarchical Bayesian language model based on Pitman-Yor processes - Teh - 2006

Citation Context: ...ametric model composed of Pitman-Yor processes originally conceived of as a model for languages (sequences of words). In this section we briefly describe the model and refer the interested reader to [Teh, 2006; Wood et al., 2009]. The model describes the conditional probability of each symbol s following each context u using a latent variable G_u(s). Collecting the variables into a vector, G_u = [G_u(s)]_{s∈Σ} i...
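
The latent vectors G_u mentioned here are tied together across contexts of increasing length, and predictions back off from a long context to its suffix-shortened parent. The sketch below is a hedged illustration of that back-off recursion using CRP-style counts c[u][s] and t[u][s] with concentration 0; the names, the uniform base distribution, and the discount schedule are assumptions for illustration, not taken from the paper.

```python
def predict(s, u, c, t, discounts, alphabet):
    """Back-off predictive probability P(s | context u) in a hierarchy of PYPs (concentration 0)."""
    # Parent prediction: drop the oldest symbol of the context; below the empty
    # context, fall back to a uniform base distribution over the alphabet.
    parent = (1.0 / len(alphabet)) if len(u) == 0 else predict(s, u[1:], c, t, discounts, alphabet)
    cu, tu = c.get(u, {}), t.get(u, {})
    total_c, total_t = sum(cu.values()), sum(tu.values())
    if total_c == 0:
        return parent                                   # nothing observed here: pure back-off
    d = discounts[min(len(u), len(discounts) - 1)]      # per-depth discount (assumed schedule)
    return (cu.get(s, 0) - d * tu.get(s, 0) + d * total_t * parent) / total_c

# Toy counts: in context "ab", "c" was seen 3 times at 1 table and "d" once at 1 table, etc.
c = {"ab": {"c": 3, "d": 1}, "b": {"c": 2}, "": {"c": 2, "d": 1}}
t = {"ab": {"c": 1, "d": 1}, "b": {"c": 1}, "": {"c": 2, "d": 1}}
print(predict("c", "ab", c, t, discounts=[0.62, 0.69, 0.74], alphabet="cd"))
```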

55 | A corpus for the evaluation of lossless compression algorithms - Arnold, Bell - 1997

Citation Context: ...paper), but no improvement on the other files. In addition to the experiments on the Calgary corpus, compression performance was also evaluated on two other benchmark corpora: the Canterbury corpus [Arnold and Bell, 1997] and the 100 MB excerpt of an XML text dump of the English version of Wikipedia used in the Large Text Compression Benchmark [Mahoney, 2009] and the Hutter Prize compression challenge [Hutter, 2006]. ...

37 | The context-tree weighting method: Extensions - Willems - 1998

25 | Semantically motivated improvements for PPM variants - Bunton - 1997

Citation Context: ...assumptions. SM enables DEPLUMP to use the information available in the unbounded length contexts effectively, whereas for PPM* the extension to unbounded depth did not yield consistent improvement [Bunton, 1997]. While we show that DEPLUMP surpasses the compression performance of PPM’s best variants, it should be noted that PPM has also recently been surpassed by the context-mixing PAQ family of compr...

15 | Adaptive Weighing of Context Models for Lossless Data Compression - Mahoney - 2005

Citation Context: ...we show that DEPLUMP surpasses the compression performance of PPM’s best variants, it should be noted that PPM has also recently been surpassed by the context-mixing PAQ family of compressors [Mahoney, 2005] and PAQ compression performance currently exceeds (in general) that of DEPLUMP as well. Context-mixing is a term used by the PAQ community; we would suggest that the phrase predictive model mixing m...

11 | A stochastic memoizer for sequence data - Wood, Archambeau, et al. - 2009

Citation Context: ...ces using a predictive model that incrementally estimates a distribution over what symbol comes next from the preceding sequence of symbols. As our predictive model we use the sequence memoizer (SM) [Wood et al., 2009], a nonparametric Bayesian model for sequences of unbounded complexity. This model is combined with an entropy coder, for instance the arithmetic coder of Witten et al. [1987], to yield a method for ...

3 | CTW website - Willems - 2009

Citation Context: ...rage log-losses obtained using a modified version of PPMZ 9.1 under Linux [Peltola and Tarhio, 2002] (which differ slightly from the published compression rates). The results for CTW were taken from [Willems, 2009]. ...on all files. While 1PF has a slight advantage over UKN on the larger text files book1 and book2, this advantage is mostly lost on other file types. This means that the most computationally efficie...

2 | Prize for compressing human knowledge. URL: http://prize.hutter1.net - Hutter - 2006

Citation Context: ...and Bell, 1997] and the 100 MB excerpt of an XML text dump of the English version of Wikipedia used in the Large Text Compression Benchmark [Mahoney, 2009] and the Hutter Prize compression challenge [Hutter, 2006]. On the Canterbury corpus, the results were consistently better than the best reported results, with the exception of two binary files. On the Wikipedia excerpt, the UKN algorithm (without context m...

2 | A new PPM variant for Chinese text compression - Wu, Teahan

Citation Context: ...representative text file, the Chinese Union version of the Bible, we achieved a log-loss of 4.35 bits per Chinese character, which is significantly better than the results reported by Wu and Teahan [17] (5.44 bits). 6. Discussion We presented a new compression algorithm based on the predictive probabilities of a hierarchical Bayesian nonparametric model called the sequence memoizer (SM). We showed t...

1 | Large text compression benchmark. URL: http://www.mattmahoney.net/text/text.html - Mahoney - 2009

Citation Context: ...two other benchmark corpora: the Canterbury corpus [Arnold and Bell, 1997] and the 100 MB excerpt of an XML text dump of the English version of Wikipedia used in the Large Text Compression Benchmark [Mahoney, 2009] and the Hutter Prize compression challenge [Hutter, 2006]. On the Canterbury corpus, the results were consistently better than the best reported results, with the exception of two binary files. On t...

1 | Prize for compressing human knowledge - Hutter - 2006

Citation Context: ...a: the Canterbury corpus [1] and the 100 MB excerpt of an XML text dump of the English version of Wikipedia used in the Large Text Compression Benchmark [6] and the Hutter Prize compression challenge [5]. On the Canterbury corpus, the results were consistently better than the best reported results, with the exception of two binary files. On the Wikipedia excerpt, the UKN algorithm (without context mi...