Block Sorting Text Compression - Final Report (1996) [16 citations — 3 self]
http://www.cs.auckland.ac.nz/~peter-f/ftplink/Tech
ftp://ftp.cs.auckland.ac.nz/out/peter-f/TechRep130
DBLP
CACHED:
Abstract:
A recent development in text compression is a "block sorting" algorithm which permutes the input text according to a special sort procedure and then processes the permuted text with Move-to-Front and a final statistical compressor. The technique combines good speed with excellent compression performance. This report investigates the block sorting compression algorithm, in particular trying to understand its operation and limitations. Various approaches are investigated in an attempt to improve the compression with block sorting, most of which involve a hierarchy of coding models to allow fast adaptation to local contexts. The best technique involves a new "structured" coding model, especially designed for compressing data with skew symbol distributions. Block sorting compression is found to be related to work by Shannon in 1951 on the prediction of English text. The work confirms block-sorting as a good text compression technique, with a compression approaching that of the currently be...

