Results 1 
2 of
2
Alphabet Partitioning for Compressed Rank/Select and Applications
"... Abstract. We present a data structure that stores a string s[1..n] over the alphabet [1..σ] in nH0(s) + o(n)(H0(s)+1) bits, where H0(s) is the zeroorder entropy of s. This data structure supports the queries access and rank in time O (lg lg σ), and the select query in constant time. This result imp ..."
Abstract

Cited by 18 (13 self)
 Add to MetaCart
Abstract. We present a data structure that stores a string s[1..n] over the alphabet [1..σ] in nH0(s) + o(n)(H0(s)+1) bits, where H0(s) is the zeroorder entropy of s. This data structure supports the queries access and rank in time O (lg lg σ), and the select query in constant time. This result improves on previously known data structures using nH0(s) + o(n lg σ) bits, where on highly compressible instances the redundancy o(n lg σ) cease to be negligible compared to the nH0(s) bits that encode the data. The technique is based on combining previous results through an ingenious partitioning of the alphabet, and practical enough to be implementable. It applies not only to strings, but also to several other compact data structures. For example, we achieve (i) faster search times and lower redundancy for the smallest existing fulltext selfindex; (ii) compressed permutations π with times for π() and π −1 () improved to loglogarithmic; and (iii) the first compressed representation of dynamic collections of disjoint sets. 1
Efficient FullyCompressed Sequence Representations
, 2010
"... We present a data structure that stores a sequence s[1..n] over alphabet [1..σ] in nH0(s) + o(n)(H0(s)+1) bits, where H0(s) is the zeroorder entropy of s. This structure supports the queries access, rank and select, which are fundamental building blocks for many other compressed data structures, in ..."
Abstract

Cited by 2 (2 self)
 Add to MetaCart
We present a data structure that stores a sequence s[1..n] over alphabet [1..σ] in nH0(s) + o(n)(H0(s)+1) bits, where H0(s) is the zeroorder entropy of s. This structure supports the queries access, rank and select, which are fundamental building blocks for many other compressed data structures, in worstcase time O (lg lg σ) and average time O (lg H0(s)). The worstcase complexity matches the best previous results, yet these had been achieved with data structures using nH0(s) + o(n lg σ) bits. On highly compressible sequences the o(n lg σ) bits of the redundancy may be significant compared to the the nH0(s) bits that encode the data. Our representation, instead, compresses the redundancy as well. Moreover, our averagecase complexity is unprecedented. Our technique is based on partitioning the alphabet into characters of similar frequency. The subsequence corresponding to each group can then be encoded using fast uncompressed representations without harming the overall compression ratios, even in the redundancy. The result also improves upon the best current compressed representations of several other data structures. For example, we achieve (i) compressed redundancy, retaining the best time complexities, for the smallest existing fulltext selfindexes; (ii) compressed permutations π with times for π() and π −1 () improved to loglogarithmic; and (iii) the first compressed representation of dynamic collections of disjoint sets. We also point out various applications to inverted indexes, suffix arrays, binary relations, and data compressors. Our structure is practical on large alphabets. Our experiments show that, as predicted by theory, it dominates the space/time tradeoff map of all the sequence representations, both in synthetic and application scenarios.