## The Burrows-Wheeler compression algorithm is even better than what you have thought (2005)

### BibTeX

@MISC{Landau05theburrows-wheeler,

author = {Shir Landau and Elad Verbin},

title = {The Burrows-Wheeler compression algorithm is even better than what you have thought},

year = {2005}

}

### Abstract

The best compression algorithm today for English text is based on the Burrows-Wheeler transform. This algorithm (whose common implementation is bzip2) consists of the following three essential steps: 1) obtain the Burrows-Wheeler transform of the text, 2) convert the transform into a sequence of integers using the move-to-front algorithm, 3) encode the integers using arithmetic coding or any order-0 encoding (possibly with run-length encoding). In this paper we prove a strong bound on the worst-case compression ratio of this algorithm; the bound is significantly better than the bounds known to date and is obtained via simple analytical techniques. Specifically, for any input string $s$ and any $\mu > 1$, the length of the compressed string is bounded by $\mu \cdot |s| H_k(s) + \log(\zeta(\mu)) \cdot |s| + g_k$, where $H_k$ is the $k$-th order empirical entropy, $g_k$ is a constant depending only on $k$ and on the size of the alphabet, and $\zeta(\mu) = \frac{1}{1^\mu} + \frac{1}{2^\mu} + \cdots$ is the standard zeta function. In fact we prove a stronger result: this bound, without the additive term $g_k$, holds when we replace $H_k(s)$ by the sum of the logarithms of the integers obtained by the move-to-front encoding of the transform. This refined bound is tight and close to the actual compression achieved in practice. To obtain this result we prove a tight result on the compressibility of integer sequences, which is of independent interest.
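The first two steps and the refined bound can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions that the abstract does not state: a `"\0"` sentinel marks the end of the text, the transform is computed by naively sorting rotations, move-to-front ranks are 1-based so that every logarithm is defined, logarithms are base 2, the sum of logarithms stands in for the whole $|s| H_k(s)$ term, and $|s|$ is taken as the length of the transformed sequence (sentinel included). The names `bwt`, `move_to_front`, `zeta`, and `refined_bound_bits` are hypothetical.

```python
import math

SENTINEL = "\0"  # assumed end-of-text marker; must not occur in the input


def bwt(text):
    """Burrows-Wheeler transform via naive rotation sorting (O(n^2 log n))."""
    s = text + SENTINEL
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last column of the sorted rotation matrix.
    return "".join(rotation[-1] for rotation in rotations)


def move_to_front(seq):
    """Move-to-front encoding with 1-based ranks (an assumption of this sketch)."""
    table = sorted(set(seq))  # initial list: the symbols of seq in sorted order
    ranks = []
    for ch in seq:
        i = table.index(ch)
        ranks.append(i + 1)
        table.insert(0, table.pop(i))  # move the symbol to the front
    return ranks


def zeta(mu, terms=1000):
    """Approximate zeta(mu) for mu > 1: partial sum plus an integral tail estimate."""
    if mu <= 1:
        raise ValueError("zeta(mu) diverges for mu <= 1")
    head = sum(n ** -mu for n in range(1, terms))
    tail = terms ** -mu / 2 + terms ** (1 - mu) / (mu - 1)
    return head + tail


def refined_bound_bits(text, mu):
    """mu * (sum of log2 of MTF ranks) + log2(zeta(mu)) * n, under the stated assumptions."""
    ranks = move_to_front(bwt(text))
    log_sum = sum(math.log2(r) for r in ranks)
    return mu * log_sum + math.log2(zeta(mu)) * len(ranks)
```

For example, `bwt("banana")` yields `"annb\0aa"`, and `refined_bound_bits("banana", 2)` evaluates the expression for $\mu = 2$. Because the first term grows with $\mu$ while $\log(\zeta(\mu))$ shrinks, scanning several values of $\mu$ shows the trade-off that the free parameter offers.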