## Statistical encoding of succinct data structures (2006)

### Download Links

- [www.dcc.uchile.cl]
- DBLP

Venue: In Proc. 17th CPM, LNCS 4009

Citations: 26 (11 self)

### BibTeX

@INPROCEEDINGS{González06statisticalencoding,

author = {Rodrigo González and Gonzalo Navarro},

title = {Statistical encoding of succinct data structures},

booktitle = {Proc. 17th CPM, LNCS 4009},

year = {2006},

pages = {295--306}

}

### Abstract

In recent work, Sadakane and Grossi [SODA 2006] introduced a scheme to represent any sequence $S = s_1 s_2 \ldots s_n$, over an alphabet of size $\sigma$, using $nH_k(S) + O\left(\frac{n}{\log_\sigma n}(k\log\sigma + \log\log n)\right)$ bits of space, where $H_k(S)$ is the $k$-th order empirical entropy of $S$. The representation permits extracting any substring of size $\Theta(\log_\sigma n)$ in constant time, and thus it completely replaces $S$ under the RAM model. This is extremely important because it permits converting any succinct data structure requiring $o(|S|) = o(n\log\sigma)$ bits in addition to $S$ into another requiring $nH_k(S) + o(n\log\sigma)$ bits overall, for any $k = o(\log_\sigma n)$. They achieve this result by using Ziv-Lempel compression, and conjecture that the result can in particular be useful to implement compressed full-text indexes. In this paper we extend their result by obtaining the same space and time complexities using a simpler scheme based on statistical encoding. We show that the scheme supports appending symbols in constant amortized time. In addition, we prove some results on the applicability of the scheme for full-text self-indexing.
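As a rough illustration of the kind of scheme the abstract describes, the Python sketch below cuts $S$ into blocks of `b` symbols, encodes every block with a statistical code, and records the bit offset where each block starts, so a substring can be recovered by decoding only the blocks it overlaps. It is not the authors' construction and does not meet the stated bounds. A single zero-order Huffman code stands in for the statistical encoder (the citation context for [1] below notes that arithmetic encoding gives the paper's best results). Every block offset is stored explicitly instead of in a space-efficient form, and blocks are decoded bit by bit rather than in constant time. The `empirical_entropy` helper follows the usual definition of $H_k$: each length-$k$ context $w$ contributes $|w_S|\,H_0(w_S)$, where $w_S$ collects the symbols that follow $w$ in $S$. All names here (`empirical_entropy`, `huffman_code`, `BlockEncodedSequence`, `extract`) are hypothetical.

```python
import heapq
import math
from collections import Counter, defaultdict


def empirical_entropy(s, k):
    """k-th order empirical entropy H_k(s) in bits per symbol.

    H_0 is the entropy of the symbol frequencies. For k >= 1, each length-k
    context w contributes |w_s| * H_0(w_s), where w_s lists the symbols that
    follow occurrences of w in s; the total is divided by n = len(s).
    """
    def h0(seq):
        m = len(seq)
        return -sum(c / m * math.log2(c / m) for c in Counter(seq).values()) if m else 0.0

    if k == 0:
        return h0(s)
    followers = defaultdict(list)
    for i in range(len(s) - k):
        followers[s[i:i + k]].append(s[i + k])
    return sum(len(f) * h0(f) for f in followers.values()) / len(s) if s else 0.0


def huffman_code(freqs):
    """Build a prefix-free code {symbol: bit string} from symbol frequencies."""
    if len(freqs) == 1:
        return {next(iter(freqs)): "0"}
    # Heap items are (weight, unique tiebreak, partial code table).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)
        w2, _, c2 = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in c1.items()}
        merged.update({sym: "1" + code for sym, code in c2.items()})
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]


class BlockEncodedSequence:
    """Illustrative sketch (assumption): S split into blocks of b symbols, each
    block encoded with one global zero-order Huffman code, plus the explicit
    bit offset of every block so blocks can be decoded independently."""

    def __init__(self, s, b):
        self.n, self.b = len(s), b
        self.code = huffman_code(Counter(s))
        self.decode_map = {bits: sym for sym, bits in self.code.items()}
        self.starts = []  # bit offset at which each block begins
        parts, offset = [], 0
        for i in range(0, self.n, b):
            self.starts.append(offset)
            enc = "".join(self.code[c] for c in s[i:i + b])
            parts.append(enc)
            offset += len(enc)
        self.bits = "".join(parts)  # '0'/'1' characters, for readability only

    def _decode_block(self, j):
        """Decode block j by walking its bits (not a constant-time table lookup)."""
        pos = self.starts[j]
        count = min(self.b, self.n - j * self.b)
        out, cur = [], ""
        while len(out) < count:
            cur += self.bits[pos]
            pos += 1
            if cur in self.decode_map:  # prefix-free, so the first match is a symbol
                out.append(self.decode_map[cur])
                cur = ""
        return out

    def extract(self, i, j):
        """Return S[i:j] (0-based, half-open) by decoding only the blocks it spans."""
        assert 0 <= i <= j <= self.n
        out = []
        for blk in range(i // self.b, (j - 1) // self.b + 1):
            out.extend(self._decode_block(blk))
        first = (i // self.b) * self.b
        return "".join(out[i - first:j - first])


if __name__ == "__main__":
    text = "abracadabra" * 20
    seq = BlockEncodedSequence(text, b=8)
    print(seq.extract(5, 17))  # adabraabraca
    print(len(seq.bits), "bits; n*H_0 =", len(text) * empirical_entropy(text, 0))
```

The block size `b` is a free parameter here; the bit string is a Python `str` only so the encoding is easy to inspect.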

### Citations

639 | Text Compression
- Bell, Cleary, et al.
- 1990
Citation Context ...any substring of S of $\Theta(\log_\sigma n)$ symbols in constant time. Although any statistical encoder works, we obtain the best results (matching exactly those of Sadakane and Grossi) using Arithmetic encoding [1]. Furthermore, we show that we can append symbols to S without changing the space complexity, in constant amortized time per symbol. In addition, we study the applicability of this technique to full-t...

199 | Succinct Indexable Dictionaries with Applications to Encoding k-ary Trees and Multisets
- Raman, Raman, et al.
- 2002
Citation Context ...riginal text. Their aim is to represent the data using as little space as possible, yet efficiently answering queries on the represented data. Several results exist on the representation of sequences [9, 3, 18, 19], trees [15, 6], graphs [15], permutations and functions [14, 16], and texts [8, 20, 4, 7, 17, 5], to name a few. Several of those succinct data structures are built over a sequence of symbols $S[1,n]$ =...

198 | High-Order Entropy-Compressed Text Indexes
- Grossi, Gupta, et al.
- 2003
Citation Context ...iciently answering queries on the represented data. Several results exist on the representation of sequences [9, 3, 18, 19], trees [15, 6], graphs [15], permutations and functions [14, 16], and texts [8, 20, 4, 7, 17, 5], to name a few. Several of those succinct data structures are built over a sequence of symbols $S[1,n] = s_1 s_2 \ldots s_n$, from an alphabet $A$ of size $\sigma$, and require only $o(|S|) = o(n\log\sigma)$ additional bits ...

192 | Compressed suffix arrays and suffix trees with applications to text indexing and string matching (extended abstract)
- Grossi, Vitter
- 2000

187 | Opportunistic data structures with applications
- Ferragina, Manzini
- 2000
Citation Context ...iciently answering queries on the represented data. Several results exist on the representation of sequences [9, 3, 18, 19], trees [15, 6], graphs [15], permutations and functions [14, 16], and texts [8, 20, 4, 7, 17, 5], to name a few. Several of those succinct data structures are built over a sequence of symbols $S[1,n] = s_1 s_2 \ldots s_n$, from an alphabet $A$ of size $\sigma$, and require only $o(|S|) = o(n\log\sigma)$ additional bits ...

143 | Succinct representation of balanced parentheses and static trees
- Munro, Raman
Citation Context ...m is to represent the data using as little space as possible, yet efficiently answering queries on the represented data. Several results exist on the representation of sequences [9, 3, 18, 19], trees [15, 6], graphs [15], permutations and functions [14, 16], and texts [8, 20, 4, 7, 17, 5], to name a few. Several of those succinct data structures are built over a sequence of symbols $S[1,n] = s_1 s_2 \ldots s_n$, fr...

133 | An analysis of the Burrows-Wheeler transform
- Manzini
Citation Context ...to a compressed data structure. More precisely, they show that S can be encoded using $nH_k(S) + O\left(\frac{n}{\log_\sigma n}(k\log\sigma + \log\log n)\right)$ bits of space, where $H_k(S)$ is the $k$-th order empirical entropy of S [12]. ($H_k(S)$ is a lower bound to the space achieved by any $k$-th order compressor applied to S.) Their structure permits retrieving any substring of S of $\Theta(\log_\sigma n)$ ⋆ Work supported by Mecesup Grant UCH 010...

96 | Compact Pat Trees
- Clark
- 1996
Citation Context ...riginal text. Their aim is to represent the data using as little space as possible, yet efficiently answering queries on the represented data. Several results exist on the representation of sequences [9, 3, 18, 19], trees [15, 6], graphs [15], permutations and functions [14, 16], and texts [8, 20, 4, 7, 17, 5], to name a few. Several of those succinct data structures are built over a sequence of symbols $S[1,n]$ =...

66 | Indexing text using the Ziv-Lempel trie
- Navarro
Citation Context ...iciently answering queries on the represented data. Several results exist on the representation of sequences [9, 3, 18, 19], trees [15, 6], graphs [15], permutations and functions [14, 16], and texts [8, 20, 4, 7, 17, 5], to name a few. Several of those succinct data structures are built over a sequence of symbols $S[1,n] = s_1 s_2 \ldots s_n$, from an alphabet $A$ of size $\sigma$, and require only $o(|S|) = o(n\log\sigma)$ additional bits ...

62 | Succinct Static Data Structure
- Jacobson
- 1988
Citation Context ...riginal text. Their aim is to represent the data using as little space as possible, yet efficiently answering queries on the represented data. Several results exist on the representation of sequences [9, 3, 18, 19], trees [15, 6], graphs [15], permutations and functions [14, 16], and texts [8, 20, 4, 7, 17, 5], to name a few. Several of those succinct data structures are built over a sequence of symbols $S[1,n]$ =...

62 | Compressed text databases with efficient query algorithms based on the compressed suffix array
- Sadakane
- 2000

54 | Structuring labeled trees for optimal succinctness, and beyond
- Ferragina, Luccio, et al.
- 2005
Citation Context ...heir aim is to represent the data using as little space as possible, yet efficiently answering queries on the represented data. Several results exist on the representation of sequences [11, 16], trees [13, 3, 4], graphs [13], permutations and functions [12, 14], texts [5, 7, 6, 9], etc. Several of those succinct data structures are built over a sequence of symbols $S[1,n] = s_1 s_2 \ldots s_n$, from an alphabet $A$ of si...

53 | Succinct suffix arrays based on run-length encoding
- Mäkinen, Navarro
Citation Context ...S) and $H_k(T)$ remains unknown. In this paper we move a step forward by proving a positive result: $H_1(S) \le H_k(T)\log\sigma + o(1)$ for small $k = o(\log_\sigma n)$. We note that, for example, the Run-Length FM-Index [11] achieves precisely $nH_k(T)\log\sigma + O(n)$ bits of space by rather involved means (albeit for larger $k$). This result shows that (essentially) the same can be achieved by applying the new structure over S. S...

45 | An alphabet-friendly FM-index
- Ferragina, Manzini, et al.
- 2004
Citation Context ...ith our technique. Then, the question is how does $H_k(X)$ relate to $H_k(T)$. 5.1 The Burrows-Wheeler Transform. The Burrows-Wheeler Transform, $S = \mathrm{bwt}(T)$, is used by many compressed full-text self-indexes [8, 20, 4, 7, 17, 5, 11]. We have covered it in Section 2.3. We show that there is a relationship between the $k$-th order entropy of a text T and the first order entropy of $S = \mathrm{bwt}(T)$. For this sake, we will compress S with a...

44 | Squeezing Succinct Data Structures into Entropy Bounds
- Sadakane, Grossi
- 2006
Citation Context ...ture, which takes overall space proportional to the compressed size of S and still is able to recover any substring of S and manipulate the data structure. A very recent result by Sadakane and Grossi [22] gives a tool to convert any succinct data structure on sequences into a compressed data structure. More precisely, they show that S can be encoded using $nH_k(S) + O\left(\frac{n}{\log_\sigma n}(k\log\sigma + \log\log n)\right)$ bi...

40 | A simple optimal representation for balanced parentheses
- Geary, Rahman, et al.
- 2004
Citation Context ...m is to represent the data using as little space as possible, yet efficiently answering queries on the represented data. Several results exist on the representation of sequences [9, 3, 18, 19], trees [15, 6], graphs [15], permutations and functions [14, 16], and texts [8, 20, 4, 7, 17, 5], to name a few. Several of those succinct data structures are built over a sequence of symbols $S[1,n] = s_1 s_2 \ldots s_n$, fr...

39 | Succinct representations of permutations
- Munro, Raman, et al.
- 2003
Citation Context ...space as possible, yet efficiently answering queries on the represented data. Several results exist on the representation of sequences [11, 16], trees [13, 3, 4], graphs [13], permutations and functions [12, 14], texts [5, 7, 6, 9], etc. Several of those succinct data structures are built over a sequence of symbols $S[1,n] = s_1 s_2 \ldots s_n$, from an alphabet $A$ of size $\sigma$, and require only $o(|S|) = o(n\log\sigma)$ additi...

38 | Compression of low entropy strings with Lempel-Ziv algorithms
- Kosaraju, Manzini
- 1999

19 | Succinct Representations of Functions
- Munro, Rao
- 2004
Citation Context ...as possible, yet efficiently answering queries on the represented data. Several results exist on the representation of sequences [9, 3, 18, 19], trees [15, 6], graphs [15], permutations and functions [14, 16], and texts [8, 20, 4, 7, 17, 5], to name a few. Several of those succinct data structures are built over a sequence of symbols $S[1,n] = s_1 s_2 \ldots s_n$, from an alphabet $A$ of size $\sigma$, and require only o(|S|...

19 | Compressing and searching XML data via two zips
- Ferragina, Luccio, et al.
- 2006
Citation Context ...heir aim is to represent the data using as little space as possible, yet efficiently answering queries on the represented data. Several results exist on the representation of sequences [11, 16], trees [13, 3, 4], graphs [13], permutations and functions [12, 14], texts [5, 7, 6, 9], etc. Several of those succinct data structures are built over a sequence of symbols $S[1,n] = s_1 s_2 \ldots s_n$, from an alphabet $A$ of si...

18 | Low redundancy in dictionaries with O(1) worst case lookup time
- Pagh
- 1999