@TECHREPORT{Fenwick96symbolranking,
    author = {Peter Fenwick},
    title = {Symbol Ranking Text Compression},
    institution = {},
    year = {1996}
}
Abstract
In his work on the information content of English text in 1951, Shannon described a method of recoding the input text, a technique which has apparently lain dormant for the ensuing 45 years. Whereas traditional compressors exploit symbol frequencies and symbol contexts, Shannon's method adds the concept of "symbol ranking", as in "the next symbol is the one 3rd most likely in the present context". This report describes an implementation of his method and shows that it forms the basis of a good text compressor.[1] The recent "acb" compressor of Buynovsky is shown to belong to the general class of symbol ranking compressors.

Keywords: text compression, Shannon, symbol ranking

[1] This report has been submitted as a paper to the Journal of Universal Computer Science. It is available by anonymous ftp from ftp.cs.auckland.ac.nz /out/peter-f/TechRep132

1. Introduction

In 1951 C.E. Shannon published his classic paper on the information content of English text, establishing the well-known bo...