Results 1 - 4 of 4
The wavelet matrix: An efficient wavelet tree for large alphabets
 Information Systems
Cited by 1 (1 self)
The wavelet tree is a flexible data structure that permits representing sequences S[1, n] of symbols over an alphabet of size σ, within compressed space and supporting a wide range of operations on S. When σ is significant compared to n, current wavelet tree representations incur noticeable space or time overheads. In this article we introduce the wavelet matrix, an alternative representation for large alphabets that retains all the properties of wavelet trees but is significantly faster. We also show how the wavelet matrix can be compressed up to the zero-order entropy of the sequence without sacrificing, and actually improving, its time performance. Our experimental results show that the wavelet matrix outperforms all the wavelet tree variants along the space/time tradeoff map.
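For readers unfamiliar with the structure named in this abstract, the following is a minimal, uncompressed sketch of a wavelet matrix, assuming symbols are integers in [0, σ) and positions are 0-based. The class name `WaveletMatrix` and its methods are illustrative choices, not the authors' code, and plain prefix-count arrays stand in for the succinct bitvectors a real implementation would use.

```python
class WaveletMatrix:
    """Illustrative wavelet matrix: one bitvector per bit of the symbols.

    Assumption: prefix-count lists replace succinct rank structures, so this
    sketch shows the navigation logic only, not the space savings.
    """

    def __init__(self, seq, sigma):
        self.n = len(seq)
        self.levels = max(1, (sigma - 1).bit_length())
        self.ones_before = []  # per level: ones_before[i] = number of 1-bits in [0, i)
        self.zeros = []        # per level: total number of 0-bits
        cur = list(seq)
        for level in range(self.levels):
            shift = self.levels - 1 - level  # most significant bit first
            bits = [(x >> shift) & 1 for x in cur]
            prefix = [0]
            for b in bits:
                prefix.append(prefix[-1] + b)
            self.ones_before.append(prefix)
            self.zeros.append(self.n - prefix[-1])
            # Stable partition: all elements with bit 0 first, then those with bit 1.
            cur = ([x for x, b in zip(cur, bits) if b == 0]
                   + [x for x, b in zip(cur, bits) if b == 1])

    def access(self, i):
        """Return the symbol at position i of the original sequence."""
        sym = 0
        for prefix, z in zip(self.ones_before, self.zeros):
            bit = prefix[i + 1] - prefix[i]
            ones = prefix[i]
            sym = (sym << 1) | bit
            i = z + ones if bit else i - ones
        return sym

    def rank(self, c, i):
        """Return the number of occurrences of symbol c in positions [0, i)."""
        start = 0
        for level, (prefix, z) in enumerate(zip(self.ones_before, self.zeros)):
            bit = (c >> (self.levels - 1 - level)) & 1
            if bit:
                start, i = z + prefix[start], z + prefix[i]
            else:
                start, i = start - prefix[start], i - prefix[i]
        return i - start
```

Each query touches one bitvector per level, so `access` and `rank` perform a number of steps proportional to the number of bits per symbol, independent of the sequence length.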
Efficient and Compact Representations of Prefix Codes
Most of the attention in statistical compression is given to the space used by the compressed sequence, a problem completely solved with optimal prefix codes. However, in many applications, the storage space used to represent the prefix code itself can be an issue. In this paper we introduce and compare several techniques to store prefix codes. Let N be the sequence length and n be the alphabet size. Then a naive storage of an optimal prefix code uses O(n log n) bits. Our first technique shows how to use O(n log log(N/n)) bits to store the optimal prefix code. Then we introduce an approximate technique that, for any 0 < ε < 1/2, takes O(n log log(1/ε)) bits to store a prefix code with average codeword length within an additive ε of the minimum. Finally, a second approximation takes, for any constant c > 1, O
Efficient and Compact Representations of Prefix Codes
Most of the attention in statistical compression is given to the space used by the compressed sequence, a problem completely solved with optimal prefix codes. However, in many applications, the storage space used to represent the prefix code itself can be an issue. In this paper we introduce and compare several techniques to store prefix codes. Let N be the sequence length and n be the alphabet size. Then a naive storage of an optimal prefix code uses O(n log n) bits. Our first technique shows how to use O(n log log(N/n)) bits to store the optimal prefix code. Then we introduce an approximate technique that, for any 0 < ε < 1/2, takes O(n log log(1/ε)) bits to store a prefix code with average codeword length within an additive ε of the minimum. Finally, a second approximation takes, for any constant c > 1, O(n^(1/c) log n) bits to store a prefix code with average codeword length at most c times the minimum. In all cases, our data structures allow encoding and decoding of any symbol in O(1) time. We implement all these techniques and compare their space/time performance against classical alternatives, showing significant practical improvements.
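As background on what "storing a prefix code" can mean, the sketch below shows the well-known canonical-code idea: if only the codeword length of each symbol is kept, the codewords themselves can be regenerated deterministically. This is a general illustration, not one of the techniques introduced in the paper, and the function name `canonical_codes` and the sample lengths are hypothetical.

```python
def canonical_codes(lengths):
    """Rebuild canonical codewords from a {symbol: codeword length} mapping.

    Assumption: the lengths come from a valid prefix code (they satisfy the
    Kraft inequality) and every length is at least 1.
    """
    codes = {}
    code = 0
    prev_len = 0
    # Process symbols by increasing length, breaking ties by symbol order.
    for sym, length in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        code <<= length - prev_len  # pad with zeros when the length grows
        codes[sym] = format(code, "0{}b".format(length))
        code += 1
        prev_len = length
    return codes


# Hypothetical example: four symbols with lengths 1, 2, 3, 3.
example = canonical_codes({"a": 1, "b": 2, "c": 3, "d": 3})
```

Because the codewords follow from the lengths alone, the stored description reduces to one length per symbol rather than an explicit code tree.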
Efficient Compressed Wavelet Trees over Large Alphabets
The wavelet tree is a flexible data structure that permits representing sequences S[1, n] of symbols over an alphabet of size σ, within compressed space and supporting a wide range of operations on S. When σ is significant compared to n, current wavelet tree representations incur noticeable space or time overheads. In this article we introduce the wavelet matrix, an alternative representation for large alphabets that retains all the properties of wavelet trees but is significantly faster. We also show how the wavelet matrix can be compressed up to the zero-order entropy of the sequence without sacrificing, and actually improving, its time performance. Our experimental results show that the wavelet matrix outperforms all the wavelet tree variants along the space/time tradeoff map.