Results 1 – 10 of 10
A simpler analysis of Burrows-Wheeler based compression
In Proc. of the 17th Symposium on Combinatorial Pattern Matching (CPM ’06), Springer-Verlag LNCS, 2006
"... In this paper we present a new technique for worstcase analysis of compression algorithms which are based on the BurrowsWheeler Transform. We deal mainly with the algorithm proposed by Burrows and Wheeler in their first paper on the subject [6], called bw0. This algorithm consists of the following ..."
Abstract

Cited by 10 (0 self)
In this paper we present a new technique for worst-case analysis of compression algorithms which are based on the Burrows-Wheeler Transform. We deal mainly with the algorithm proposed by Burrows and Wheeler in their first paper on the subject [6], called bw0. This algorithm consists of the following three essential steps: 1) Obtain the Burrows-Wheeler Transform of the text, 2) Convert the transform into a sequence of integers using the move-to-front algorithm, 3) Encode the integers using arithmetic code or any order-0 encoding (possibly with run-length encoding). We achieve a strong upper bound on the worst-case compression ratio of this algorithm. This bound is significantly better than bounds known to date and is obtained via simple analytical techniques. Specifically, we show that for any input string s, and µ > 1, the length of the compressed string is bounded by µ·|s|H_k(s) + log(ζ(µ))·|s| + µg_k + O(log n), where H_k is the k-th order empirical entropy, g_k is a constant depending only on k and on the size of the alphabet, and ζ(µ) = 1/1^µ + 1/2^µ + ... is the standard zeta function. As part of the analysis we prove a result on the compressibility of integer sequences, which is of independent interest. Finally, we apply our techniques to prove a worst-case bound on the compression ratio of a compression algorithm based on the Burrows-Wheeler Transform followed by distance coding, for which worst-case guarantees have never been given. We prove that the length of the compressed string is bounded by 1.7286·|s|H_k(s) + g_k + O(log n). This bound is better than the bound we give for bw0.
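The three-step pipeline described in the abstract can be illustrated with a minimal sketch of step 2, the move-to-front conversion. This is only an illustration of the standard technique, not the authors' implementation; the function name and alphabet handling are choices made here:

```python
def move_to_front_encode(text, alphabet):
    """Replace each symbol by its current rank in a self-organizing list,
    then move it to the front. Runs of equal symbols become runs of zeros,
    which an order-0 coder (step 3) exploits."""
    table = list(alphabet)
    out = []
    for ch in text:
        rank = table.index(ch)      # position of the symbol right now
        out.append(rank)
        table.pop(rank)
        table.insert(0, ch)         # most recent symbol goes to the front
    return out
```

For example, on the string "bbbaab" over the alphabet "ab" this produces [1, 0, 0, 1, 0, 1]: repeated symbols map to 0, which is why move-to-front output of a Burrows-Wheeler transform (rich in runs) compresses well.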
The Responsa Storage and Retrieval System – Whither?
, 1996
"... p. 173). We did develop such a tool [CCDFS1971]. As each of these methods has certain advantages and disadvantages, we ended up by merging  2  them into a joint analysissynthesis method; a global analysis of all words in the database is done, but without prepositions (otiyot shimush), in order ..."
Abstract
p. 173). We did develop such a tool [CCDFS1971]. As each of these methods has certain advantages and disadvantages, we ended up by merging them into a joint analysis-synthesis method; a global analysis of all words in the database is done, but without prepositions (otiyot shimush), in order to end up with a database of manageable size; the prepositions are left to the synthesis phase. See [AFCS1972] for full details. I also set up a "Committee for the Mechanization in Jewish Law Research" whose first members were, I think, Dr. Choueka, Mr. Asa Kasher, later professor of Philosophy at Tel Aviv University, Mr. Joseph Dueck, a young lawyer and research assistant at the IRJL, who served as their representative, and assistants, to formulate procedures for pre-editing and post-editing texts to be inputted, and various algorithms needed for the work. (Many other persons, such as Mr. Reuven Mirkin of the Academy of the Hebrew Language, and research students, joined later.) I also felt ...
The Burrows-Wheeler compression algorithm is even better than what you have thought
, 2005
"... The best compression algorithm today for English text is based on the BurrowsWheeler transform. This algorithm (whose common implementation is bzip2) consists of the following three essential steps: 1) Obtain the BurrowsWheeler transform of the text, 2) Convert the transform into a sequence of int ..."
Abstract
The best compression algorithm today for English text is based on the Burrows-Wheeler transform. This algorithm (whose common implementation is bzip2) consists of the following three essential steps: 1) Obtain the Burrows-Wheeler transform of the text, 2) Convert the transform into a sequence of integers using the move-to-front algorithm, 3) Encode the integers using arithmetic code or any order-0 encoding (possibly with run-length encoding). In this paper we achieve a strong bound on the worst-case compression ratio of this algorithm, that is significantly better than bounds known to date and is obtained via simple analytical techniques. Specifically, for any input string s, and µ > 1, the length of the compressed string is bounded by µ·|s|H_k(s) + log(ζ(µ))·|s| + g_k, where H_k is the k-th order empirical entropy, g_k is a constant depending only on k and on the size of the alphabet, and ζ(µ) = 1/1^µ + 1/2^µ + ... is the standard zeta function. In fact we prove a stronger result: that this bound without the additive term g_k holds when we replace H_k(s) by the sum of the logarithms of the integers obtained by the move-to-front encoding of the transform. This refined bound is tight and close to the actual compression achieved in practice. To obtain this result we prove a tight result on the compressibility of integer sequences, which is of independent interest.
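Step 1 of the pipeline, the transform itself, can be computed naively by sorting all rotations of the terminated string and reading off the last column. This is a textbook sketch, not how bzip2 implements it (real implementations use suffix sorting for efficiency), and the '$' sentinel is an assumption of this sketch:

```python
def bwt(s):
    """Burrows-Wheeler transform via naive rotation sorting. The '$'
    sentinel (assumed smaller than every other character) marks the end
    of the string so the transform is invertible."""
    s = s + "$"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)
```

On "banana" this yields "annb$aa": equal characters that share a right context cluster together, producing the long runs that move-to-front and order-0 coding then squeeze.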
Working with Compressed Concordances
"... Abstract. A combination of new compression methods is suggested in order to compress the concordance of a large Information Retrieval system. The methods are aimed at allowing most of the processing directly on the compressed file, requesting decompression, if at all, only for small parts of the acc ..."
Abstract
Abstract. A combination of new compression methods is suggested in order to compress the concordance of a large Information Retrieval system. The methods are aimed at allowing most of the processing directly on the compressed file, requesting decompression, if at all, only for small parts of the accessed data, saving I/O operations and CPU time.
Huffman Coding with Non-Sorted Frequencies
"... Abstract. A standard way of implementing Huffman’s optimal code construction algorithm is by using a sorted sequence of frequencies. Several aspects of the algorithm are investigated as to the consequences of relaxing the requirement of keeping the frequencies in order. Using only partial order may ..."
Abstract
Abstract. A standard way of implementing Huffman’s optimal code construction algorithm is by using a sorted sequence of frequencies. Several aspects of the algorithm are investigated as to the consequences of relaxing the requirement of keeping the frequencies in order. Using only partial order may speed up the code construction, which is important in some applications, at the cost of increasing the size of the encoded file.
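The sorted case the abstract starts from admits a linear-time construction with no heap: since every newly merged weight is at least as large as all previously merged weights, two plain queues suffice. A minimal sketch of this classic two-queue method, computing only the total encoded length rather than the codewords (the name `huffman_cost` is chosen here):

```python
from collections import deque

def huffman_cost(freqs):
    """Total weighted codeword length of an optimal Huffman code.
    Assumes `freqs` is already sorted ascending, so both queues stay
    sorted and the overall construction is linear time."""
    leaves = deque(freqs)   # original frequencies, ascending
    merged = deque()        # internal-node weights, produced in ascending order
    total = 0

    def pop_min():
        if not merged or (leaves and leaves[0] <= merged[0]):
            return leaves.popleft()
        return merged.popleft()

    while len(leaves) + len(merged) > 1:
        a, b = pop_min(), pop_min()
        total += a + b      # merging deepens every leaf below by one level
        merged.append(a + b)
    return total
```

For frequencies [1, 1, 2, 3, 5] the cost is 25 (depths 4, 4, 3, 2, 1). With unsorted input one would need an initial sort or a priority queue, which is exactly the requirement the paper examines relaxing.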
On the Usefulness of Fibonacci Compression Codes
, 2004
"... Recent publications advocate the use of various variable length codes for which each codeword consists of an integral number of bytes in compression applications using large alphabets. This paper shows that another tradeoff with similar properties can be obtained by Fibonacci codes. These are fixed ..."
Abstract
Recent publications advocate the use of various variable-length codes for which each codeword consists of an integral number of bytes in compression applications using large alphabets. This paper shows that another trade-off with similar properties can be obtained by Fibonacci codes. These are fixed codeword sets, using binary representations of integers based on Fibonacci numbers of order m ≥ 2. Fibonacci codes have been used before, and this paper extends previous work presenting several novel features. In particular, the compression efficiency is analyzed and compared to that of dense codes, and various table-driven decoding routines are suggested.
EFFICIENT ALGORITHMS FOR ZECKENDORF ARITHMETIC
"... Abstract. We study the problem of addition and subtraction using the Zeckendorf representation of integers. We show that both operations can be performed in linear time; in fact they can be performed by combinational logic networks with linear size and logarithmic depth. The implications of these re ..."
Abstract
Abstract. We study the problem of addition and subtraction using the Zeckendorf representation of integers. We show that both operations can be performed in linear time; in fact they can be performed by combinational logic networks with linear size and logarithmic depth. The implications of these results for multiplication, division and square-root extraction are also discussed.
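The representation in question can be sketched as follows: every positive integer has a unique decomposition into non-consecutive Fibonacci numbers, found greedily. The addition shown here simply decodes and re-encodes, serving only as a correctness baseline against which a direct linear-time implementation (as in the abstract) could be checked; both function names are chosen here:

```python
def zeckendorf(n):
    """Greedy Zeckendorf decomposition: write n >= 0 as a sum of
    non-consecutive Fibonacci numbers (1, 2, 3, 5, 8, ...), largest first."""
    parts = []
    fibs = [1, 2]
    while fibs[-1] + fibs[-2] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    for f in reversed(fibs):
        if f <= n:              # taking the largest fit never strands a remainder
            parts.append(f)
            n -= f
    return parts

def zeck_add(a_parts, b_parts):
    """Reference addition: decode, add as integers, re-encode.
    (A baseline only, not the linear-time network of the paper.)"""
    return zeckendorf(sum(a_parts) + sum(b_parts))
```

For example, 100 = 89 + 8 + 3; the difficulty a direct algorithm must handle is that naively adding digit vectors can create consecutive or duplicated Fibonacci terms, which must be normalized away using identities like F(k) + F(k+1) = F(k+2).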