Symbol-driven compression of Burrows Wheeler transformed text (2000)
Citations: 3 (0 self)
BibTeX
@TECHREPORT{Wirth00symbol-drivencompression,
author = {Anthony Ian Wirth},
title = {Symbol-driven compression of {B}urrows {W}heeler transformed text},
institution = {},
year = {2000}
}
Abstract
Despite the enormous growth in storage capacity in recent years, the search for fast and efficient text compression algorithms continues. Because processor speed is increasing faster than disk access time is decreasing, there is now even more reason to store information in compressed form than there was previously. Prediction by Partial Matching (PPM), first published in 1984, was a significant step forward in the quest for efficient text compression. The Burrows Wheeler transform (BWT), introduced ten years later, was the next significant breakthrough; its best implementations rank alongside those of PPM. In most BWT implementations, the transformed text is converted to a string of ranks with a move-to-front (MTF) or similar mechanism before being compressed. The ranks are then encoded with an Order-0 model or a hierarchy of such models, with some substrings of repeated ranks encoded as run lengths. Although these rank-based methods perform very well, the transformation to MTF numbers blurs the distinction between individual symbols and is a possible cause of reduced compression effectiveness. Instead of relying on symbol ranking, we examine the problem of modelling the transformed text as a sequence of segments of independent and identically distributed (iid) symbols, using three different techniques.
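The rank-based pipeline summarised in the abstract (BWT, then MTF ranks, then an order-0 coder) can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the report: a naive quadratic BWT that uses a `\0` sentinel assumed absent from the input, MTF initialised with the sorted alphabet of its input, and idealised order-0 costs in bits (empirical entropy times length) in place of a real arithmetic coder. All function names (`bwt`, `inverse_bwt`, `move_to_front`, `order0_bits`, `segmented_order0_bits`) are hypothetical. The last function splits the sequence into fixed-length segments and fits a separate order-0 model to each. It only gestures at the "segments of iid symbols" view and is not one of the report's three techniques; it also ignores the cost of describing each segment's model and boundaries, so its figures are optimistic.

```python
import math
from collections import Counter


def bwt(text: str, sentinel: str = "\0") -> str:
    """Naive Burrows Wheeler transform: sort all rotations, keep the last column.

    Assumes `sentinel` does not occur in `text` and sorts before every other
    character. O(n^2 log n), for illustration only.
    """
    assert sentinel not in text
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(last: str, sentinel: str = "\0") -> str:
    """Naive inverse: rebuild the sorted rotation table one column at a time."""
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(last[i] + table[i] for i in range(len(last)))
    # Only the original string (plus sentinel) ends with the unique sentinel.
    row = next(r for r in table if r.endswith(sentinel))
    return row[:-1]


def move_to_front(seq: str) -> list[int]:
    """Emit each symbol's current rank in a recency list, then move it to the front.

    Assumption: the list starts as the sorted alphabet of `seq`.
    """
    recency = sorted(set(seq))
    ranks = []
    for ch in seq:
        r = recency.index(ch)
        ranks.append(r)
        recency.insert(0, recency.pop(r))
    return ranks


def order0_bits(seq) -> float:
    """Ideal cost in bits of `seq` under a static order-0 model fitted to it."""
    n = len(seq)
    counts = Counter(seq)
    return -sum(c * math.log2(c / n) for c in counts.values())


def segmented_order0_bits(seq, block: int) -> float:
    """Sum of per-segment order-0 costs over fixed-length segments.

    Treats each segment as iid with its own symbol distribution; model and
    boundary description costs are deliberately ignored.
    """
    return sum(order0_bits(seq[i:i + block]) for i in range(0, len(seq), block))
```

On `"banana"` this gives the familiar last column `"annb\0aa"`. Comparing `order0_bits(move_to_front(L))` with `order0_bits(L)` or `segmented_order0_bits(L, k)` for a BWT output `L` shows the two modelling views side by side. Which one wins depends on the input and on the costs that this sketch ignores.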