## Alphabet Partitioning for Compressed Rank/Select and Applications

### BibTeX

@MISC{Barbay_alphabetpartitioning,

author = {Jérémy Barbay and Travis Gagie and Gonzalo Navarro and Yakov Nekrich},

title = {Alphabet Partitioning for Compressed Rank/Select and Applications},

year = {}

}

### Abstract

We present a data structure that stores a string s[1..n] over the alphabet [1..σ] in nH0(s) + o(n)(H0(s)+1) bits, where H0(s) is the zero-order entropy of s. This data structure supports the queries access and rank in time O(lg lg σ), and the select query in constant time. This result improves on previously known data structures using nH0(s) + o(n lg σ) bits, where on highly compressible instances the redundancy o(n lg σ) ceases to be negligible compared to the nH0(s) bits that encode the data. The technique is based on combining previous results through an ingenious partitioning of the alphabet, and is practical enough to be implementable. It applies not only to strings, but also to several other compact data structures. For example, we achieve (i) faster search times and lower redundancy for the smallest existing full-text self-index; (ii) compressed permutations π with times for π() and π⁻¹() improved to log-logarithmic; and (iii) the first compressed representation of dynamic collections of disjoint sets.
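To make the three queries and the entropy term in the abstract concrete, the following is a minimal Python sketch of their semantics only. It is not the paper's data structure: it is a naive, uncompressed reference that runs in linear time per query, and the function names (`zero_order_entropy`, `access`, `rank`, `select`) are chosen here for illustration. Positions are 1-based to match s[1..n], and `select` is assumed to follow the standard definition (position of the j-th occurrence of a character).

```python
import math
from collections import Counter


def zero_order_entropy(s):
    """Empirical zero-order entropy H0(s), in bits per character."""
    n = len(s)
    # Sum over distinct characters a of (occ(a)/n) * log2(n/occ(a)).
    return sum((c / n) * math.log2(n / c) for c in Counter(s).values())


def access(s, i):
    """Return s[i], the i-th character of s (1-based)."""
    return s[i - 1]


def rank(s, a, i):
    """Return the number of occurrences of a in s[1..i]."""
    return s[:i].count(a)


def select(s, a, j):
    """Return the 1-based position of the j-th occurrence of a, or None."""
    seen = 0
    for pos, ch in enumerate(s, start=1):
        if ch == a:
            seen += 1
            if seen == j:
                return pos
    return None  # fewer than j occurrences of a (illustrative convention)
```

For example, on s = "abracadabra" the character a occurs at positions 1, 4, 6, 8 and 11, so `rank(s, "a", 4)` counts two occurrences and `select(s, "a", 3)` returns position 6. A naive reference like this can be useful as a test oracle when checking a compressed implementation.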

### Citations

591 | A block sorting lossless data compression algorithm
- Burrows, Wheeler
- 1994
Citation Context ...hese also represent a sequence, but they support other operations related to text searching. A well-known self-index [8] achieves k-th order entropy space by partitioning the Burrows-Wheeler transform =-=[6]-=- of the sequence and encoding each partition to its zero-order entropy. Those partitions must support queries access and rank. By using Theorem 1(i) to represent such partitions, we achieve the followi... |

199 | High-order entropy-compressed text indexes - Grossi, Gupta, et al. |

199 | Succinct indexable dictionaries with applications to encoding k-ary trees, prefix sums and multisets
- Raman, Raman, et al.
Citation Context ...we can map characters from Σ′ to elements of [1..σ] by replacing each a ∈ Σ′ with its rank in Σ′. All elements of Σ′ are stored in the indexed dictionary data structure described by Raman et al. =-=[20]-=-, so that the following queries are supported in constant time: for any a ∈ Σ′ its rank in Σ′ can be found (for any a ∉ Σ′ the answer is −1); for any i ∈ [1..σ] the i-th smallest element in Σ′ ca... |

179 | Compressed full-text indexes
- Navarro, Mäkinen
Citation Context ...tring in order to support the queries in less time. The most important queries serve as primitives to implement many other operations, in particular pattern matching in full-text databases (see, e.g., =-=[18, 7, 14, 19]-=- for recent discussions): given a string s, s.access(i) returns the ith character of s, which we denote s[i]; s.rank_a(i) returns the number of occurrences of the character a up to position i; and s.se... |

133 | An analysis of the Burrows-Wheeler transform
- Manzini
Citation Context ...lg lg σ) O (lg lg σ) O (1) Thm 1 nH0(s) + o(n)(H0(s) + 1) O (1) O (lg lg σ lg lg lg σ) O (lg lg σ) of s (i.e., the minimum self-information of s with respect to a kth-order Markov source; see Manzini =-=[15]-=- for a definition and discussion). The challenge of compressing the string while still supporting the queries efficiently was also achieved, using as little as nH0(s) + o(n lg σ) [11, 8, 3] and even n... |

112 | Compressed representations of sequences and full-text indexes
- Ferragina, Manzini, et al.
Citation Context ...rce; see Manzini [15] for a definition and discussion). The challenge of compressing the string while still supporting the queries efficiently was also achieved, using as little as nH0(s) + o(n lg σ) =-=[11, 8, 3]-=- and even nHk(s) + o(n lg σ) bits [3] (for any k = o(logσ n)) while retaining the time complexities. One problem with such space is that, on highly compressible data, the o(n lg σ) bits of the index a... |

111 | Worst-case analysis of set union algorithms
- Tarjan, van Leeuwen
- 1984
Citation Context ...oof. We first use Theorem 1 to store the string s[1..n] in which each s[i] is the representative of the set containing i. We then store the representatives in a standard disjoint-set data structure D =-=[22]-=-. Together, our data structures take nH(sets(C)) + O (|C| lg n) + o(n)(H(sets(C)) + 1) bits. We can perform a query find(i) on C by performing D.find(s[i]), and perform a union(i, j) operation on C by... |

89 | New Text Indexing Functionalities of the Compressed Suffix Arrays
- Sadakane
Citation Context ...that it does not need to use π⁻¹()). □ As an example, given a constant ɛ > 0 and a value t ≤ n, we can combine Corollary 2 and Theorem 5 to obtain a data structure that stores Sadakane’s Ψ function =-=[21]-=- for s in nH0(s) + o(n)(H0(s) + 1) + O(σn^ɛ + (n/t) lg n) bits and supports Ψ^k() and Ψ^−k() queries in O(1/ɛ + t) time; these queries are useful when working on compressed suffix arrays and trees... |

62 | Rank/select operations on large alphabets: a tool for text indexing
- Golynski, Munro, et al.
Citation Context ...dexing space in o(n lg σ) is considered asymptotically “negligible” compared to the n lg σ bits required to hold the main data, while providing support for the queries in time O (lg σ). Later results =-=[10]-=- improved the times to O (lg lg σ). Regularities in the string permit further reductions in the space, from n lg σ bits down to nHk(s) bits, where Hk(s) denotes the kth-order empirical entropy ⋆ Funde... |

44 | Squeezing Succinct Data Structures into Entropy Bounds
- Sadakane, Grossi
- 2006
Citation Context ...r entropies If we are willing to give up support of rank and select queries, then we can compress s well in terms of nHk(s). We note this result is not new: it was first proven by Sadakane and Grossi =-=[14]-=-, simplified by González and Navarro [7] and then further simplified by Ferragina and Venturini [5]. The next theorem can be seen as yet a further simplification. Theorem 2. We can store s in nHk(s) +... |

43 | Index compression through document reordering
- Blandford, Blelloch
- 2002
Citation Context ...t traversal, writing a special symbol (e.g. $) at each change of row. This improves the space of previously known data structures [2], and improves the time complexity of previous compression results =-=[5]-=-. Naturally, the next challenge ahead is to obtain a data structure using space nHk(s)+o(n)(Hk(s)+1) bits rather than nHk(s)+o(n) lg σ, while still supporting the queries access, rank, and select, in ... |

41 | Succinct indexes for strings, binary relations and multi-labeled trees
- Barbay, He, et al.
Citation Context ...rce; see Manzini [15] for a definition and discussion). The challenge of compressing the string while still supporting the queries efficiently was also achieved, using as little as nH0(s) + o(n lg σ) =-=[11, 8, 3]-=- and even nHk(s) + o(n lg σ) bits [3] (for any k = o(logσ n)) while retaining the time complexities. One problem with such space is that, on highly compressible data, the o(n lg σ) bits of the index a... |

39 | Succinct representations of permutations
- Munro, Raman, et al.
- 2003
Citation Context ...ation, a compressed function and a compressed dynamic collection of disjoint sets, while supporting a rich set of operations on those. This improves or gives alternatives to the best previous results =-=[4, 17, 12]-=-. We have approached these applications in such a way that an improvement to our main result, however achieved, translates into improved bounds for them as well. 2 Alphabet partitioning Let s[1..n] be... |

33 | A simple storage scheme for strings achieving entropy bounds
- Ferragina, Venturini
- 2007
Citation Context ... = o(n), and it becomes nH0(s) + o(n) when σ = log O(1) n. In the times for our results, σ can be changed to min(σ,n/occ (a,s)), where a is the character involved. space (bits) access rank select [1]+=-=[5]-=- nHk(s) + o(n) log σ + n o(log σ) O (1) O ( log log σ(log log log σ) 2) O (log log σ log log log σ) [4] nH0(s) + o(n) log σ O ( ) 1 + O ( ) 1 + O ( ) 1 + log σ log log n log σ log log n [6] nlog σ + n... |

32 | Rank and select revisited and extended
- Mäkinen, Navarro
Citation Context ...tring in order to support the queries in less time. The most important queries serve as primitives to implement many other operations, in particular pattern matching in full-text databases (see, e.g., =-=[18, 7, 14, 19]-=- for recent discussions): given a string s, s.access(i) returns the ith character of s, which we denote s[i]; s.rank_a(i) returns the number of occurrences of the character a up to position i; and s.se... |

31 | Practical rank/select queries over arbitrary sequences
- Claude, Navarro
Citation Context ...tring in order to support the queries in less time. The most important queries serve as primitives to implement many other operations, in particular pattern matching in full-text databases (see, e.g., =-=[18, 7, 14, 19]-=- for recent discussions): given a string s, s.access(i) returns the ith character of s, which we denote s[i]; s.rank_a(i) returns the number of occurrences of the character a up to position i; and s.se... |

27 | Adaptive searching in succinctly encoded binary relations and tree-structured documents
- Barbay, Golynski, et al.
Citation Context ...chieved by encoding the string of labels encountered during a row-first traversal, writing a special symbol (e.g. $) at each change of row. This improves the space of previously known data structures =-=[2]-=-, and improves the time complexity of previous compression results [5]. Naturally, the next challenge ahead is to obtain a data structure using space nHk(s)+o(n)(Hk(s)+1) bits rather than nHk(s)+o(n) ... |

26 | Statistical encoding of succinct data structures
- González, Navarro
- 2006
Citation Context ...support of rank and select queries, then we can compress s well in terms of nHk(s). We note this result is not new: it was first proven by Sadakane and Grossi [14], simplified by González and Navarro =-=[7]-=- and then further simplified by Ferragina and Venturini [5]. The next theorem can be seen as yet a further simplification. Theorem 2. We can store s in nHk(s) + o(n)log σ bits for all k = o(log σ n) a... |

20 | Compressed representations of permutations, and applications
- Barbay, Navarro
Citation Context ...er sampling, which they could also use. 4 Compressing permutations We now show how to use access/rank/select data structures to store a compressed permutation. We follow Barbay and Navarro’s notation =-=[4]-=- and improve their space and, especially, their time performance. They measure the compressibility of a permutation π in terms of the entropy of the distribution of the lengths of runs of different ki... |

8 | Sorting presorted files
- Mehlhorn
- 1979
Citation Context ...the compressibility of a permutation π in terms of the entropy of the distribution of the lengths of runs of different kinds. Let π be covered by ρ runs (using any of the previous definitions of runs =-=[13, 4, 16]-=-) of lengths runs(π) = 〈n1, ..., nρ〉. Then H(runs(π)) = ∑i (ni/n) lg(n/ni) ≤ lg ρ is called the entropy of the runs (and, because ni ≥ 1, it also holds nH(runs(π)) ≥ (ρ − 1) lg n). We first consider permutatio... |

7 | Sorting and searching revisited
- Andersson
- 1996
Citation Context ...re a predecessor data structure containing the runs’ minima as keys with their positions in the array as auxiliary information. The predecessor data structure is based on Lemma 4 of Andersson’s paper =-=[1]-=-. It is an n^ɛ-ary trie where the keys are sought considering ɛ lg n bits per trie node, and hence found in O(1/ɛ) time. Each of the ρ elements may require O((1/ɛ) n^ɛ lg n) bit space for the n^ɛ-size c... |

7 | Sorting shuffled monotone sequences
- Levcopoulos, Petersson
- 1994
Citation Context ...the compressibility of a permutation π in terms of the entropy of the distribution of the lengths of runs of different kinds. Let π be covered by ρ runs (using any of the previous definitions of runs =-=[13, 4, 16]-=-) of lengths runs(π) = 〈n1, ..., nρ〉. Then H(runs(π)) = ∑i (ni/n) lg(n/ni) ≤ lg ρ is called the entropy of the runs (and, because ni ≥ 1, it also holds nH(runs(π)) ≥ (ρ − 1) lg n). We first consider permutatio... |

3 | Fast and compact prefix codes
- Gagie, Navarro, et al.
Citation Context ...l work on several improvements and further applications. First, we can reduce the dependence on the alphabet size from O (σ lg lg n) to O (σ) by storing a length-restricted Shannon code in O (σ) bits =-=[9]-=- instead of the data structure M. To avoid the O (1) extra redundancy per character associated with using a length-restricted prefix code, we replace each character in s whose codeword length is at mo... |

2 | Storing a compressed function with constant time access
- Hreinsson, Krøyer, et al.
- 2009
Citation Context ...ation, a compressed function and a compressed dynamic collection of disjoint sets, while supporting a rich set of operations on those. This improves or gives alternatives to the best previous results =-=[4, 17, 12]-=-. We have approached these applications in such a way that an improvement to our main result, however achieved, translates into improved bounds for them as well. 2 Alphabet partitioning Let s[1..n] be... |

1 | Rank and select operations on binary strings - Rahman, Raman - 2008 |

1 | Succinct representations of permutations
- Munro, Raman, et al.
- 2003
Citation Context ...ation, a compressed function and a compressed dynamic collection of disjoint sets, while supporting a rich set of operations on those. This improves or gives alternatives to the best previous results =-=[2,10,8]-=-. 2 Alphabet partitioning Suppose s is a sequence over effective alphabet [σ], that is, every character appears in s, thus σ ≤ n. The zero-order entropy of s is H0(s) = ∑a∈[σ] (occ(a,s)/n) log(n/occ(... |