## Structuring labeled trees for optimal succinctness, and beyond (2005)

Venue: In FOCS

Citations: 52 (8 self-citations)

### BibTeX

@INPROCEEDINGS{Ferragina05structuringlabeled,

author = {Paolo Ferragina and Fabrizio Luccio and Giovanni Manzini and S. Muthukrishnan},

title = {Structuring labeled trees for optimal succinctness, and beyond},

booktitle = {In FOCS},

year = {2005},

pages = {184--196}

}

### Abstract

Consider an ordered, static tree T on t nodes where each node has a label from an alphabet Σ. Tree T may be of arbitrary degree and arbitrary shape. Suppose we wish to support basic navigational operations such as finding the parent of node u, the i-th child of u, and any child of u with label α. In a seminal work over fifteen years ago, Jacobson [15] observed that pointer-based tree representations are wasteful in space and introduced the notion of succinct data structures. He studied the special case of unlabeled trees and presented a succinct data structure of 2t + o(t) bits supporting navigational operations in O(1) time. The space used is asymptotically optimal, matching the information-theoretic lower bound averaged over all trees. This led to a slew of results on succinct data structures for arrays, trees, strings ...
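Jacobson's 2t + o(t)-bit representation is usually presented as the level-order unary degree sequence (LOUDS). The sketch below is illustrative only: it encodes an ordered tree (given as child lists, an assumed input format) in 2t + 1 bits and navigates with rank/select, but it uses naive linear-time scans where Jacobson's structure adds o(t) bits of directories to answer rank and select quickly. All function names are hypothetical.

```python
from collections import deque


def louds_encode(children, root=0):
    """Level-order unary degree sequence: '10' for a virtual super-root, then,
    for each node in BFS order, one 1 per child followed by a 0."""
    bits, order = [1, 0], []
    queue = deque([root])
    while queue:
        v = queue.popleft()
        order.append(v)  # order[j - 1] is the caller's id of LOUDS node j
        bits += [1] * len(children[v]) + [0]
        queue.extend(children[v])
    return bits, order


def rank(bits, b, p):
    """Number of b-bits in bits[:p] (naive O(p) scan)."""
    return sum(1 for x in bits[:p] if x == b)


def select(bits, b, k):
    """Position of the k-th b-bit, 1-indexed (naive O(len) scan)."""
    seen = 0
    for pos, x in enumerate(bits):
        if x == b:
            seen += 1
            if seen == k:
                return pos
    raise ValueError("not enough bits")


def parent(bits, j):
    """LOUDS number of the parent of node j (0 means j is the root)."""
    return rank(bits, 0, select(bits, 1, j))


def children_of(bits, j):
    """LOUDS numbers of the children of node j, in left-to-right order:
    they are the 1-bits between the j-th and (j+1)-th 0-bits."""
    lo, hi = select(bits, 0, j), select(bits, 0, j + 1)
    return [rank(bits, 1, q + 1) for q in range(lo + 1, hi)]


if __name__ == "__main__":
    kids = [[1, 2, 3], [4, 5], [], [6], [], [], []]
    bits, order = louds_encode(kids)
    print(len(bits), [order[c - 1] for c in children_of(bits, 1)])
```

Node numbers here are 1-based level-order numbers; `order[j - 1]` maps them back to the caller's ids, so the root is LOUDS node 1 and its parent is reported as 0.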

### Citations

801 | Managing Gigabytes: Compressing and Indexing Documents and Images
- Witten, Moffat, et al.
- 1999
Citation Context: ...applications of trees in Computer Science, be they for representing data or computation, typically generate navigation problems on labeled trees. This includes applications of tries [9], dictionaries **[32]**, parse trees [25], suffix trees [31] and pixel trees [6] as well as trees used in compiler intermediate representations [3, 16, 29], execution traces [14], and mathematical proofs [24, 4]. In modern ...

566 | A Block-sorting Lossless Data Compression Algorithm
- Burrows, Wheeler
- 1994
Citation Context: ...beyond succinctness to entropy-bounded data structures. Instead, we take a different approach. In particular, we were inspired by the elegant and surprising Burrows-Wheeler transform (BWT) for strings **[2]**. BWT transforms the string into a permutation, nontrivially derived from sorting the suffixes. Similar substrings get grouped together in this transform. In the past few years, the BWT has been the u...
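A naive sketch of the string BWT, assuming a unique end marker `$` that sorts before every symbol of the input: with such a marker, sorting all rotations of s$ gives the same row order as sorting its suffixes, and the BWT is the last column. Practical implementations derive the same string from a suffix array instead of materializing rotations; `bwt` and `inverse_bwt` are illustrative names.

```python
def bwt(s, sentinel="$"):
    """Burrows-Wheeler transform of s via sorted rotations (O(n^2 log n), for illustration)."""
    assert sentinel not in s, "sentinel must not occur in the input"
    t = s + sentinel
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    return "".join(row[-1] for row in rotations)


def inverse_bwt(last):
    """Invert the BWT using the stable correspondence between last and first columns.

    order[k] is the index in `last` of the symbol occurrence that appears as the
    first symbol of sorted row k; following order[] walks the text left to right.
    """
    order = sorted(range(len(last)), key=lambda i: last[i])  # stable sort
    row, out = 0, []  # row 0 is the rotation starting with the sentinel
    for _ in range(len(last) - 1):
        row = order[row]
        out.append(last[order[row]])  # first-column symbol of the new row
    return "".join(out)


print(bwt("banana"))  # annb$aa
```

In the output for "banana", the two `n` symbols that precede the contexts "a$..." and "ana$..." end up adjacent, a small instance of the grouping effect described above.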

504 | DataGuides: Enabling query formulation and optimization in semistructured databases
- Goldman, Widom
- 1997
Citation Context: ...the origin of p could be internal to T. Subpath query is of central interest to the XPATH query language in XML [35] and is also used as a basic block for supporting more sophisticated path searches **[11]**. Still, no prior algorithmic result is known for supporting subpath queries on trees represented succinctly. We present a succinct data structure for labeled trees based on the xbw transform using optim...

427 | Linear pattern matching algorithms
- Weiner
- 1973
Citation Context: ...ence, be they for representing data or computation, typically generate navigation problems on labeled trees. This includes applications of tries [9], dictionaries [32], parse trees [25], suffix trees **[31]** and pixel trees [6] as well as trees used in compiler intermediate representations [3, 16, 29], execution traces [14], and mathematical proofs [24, 4]. In modern setting, XML is a tree representation...

268 | The design and implementation of a certifying compiler
- Necula, Lee
- 1998
Citation Context: ..., dictionaries [32], parse trees [25], suffix trees [31] and pixel trees [6] as well as trees used in compiler intermediate representations [3, 16, 29], execution traces [14], and mathematical proofs **[24, 4]**. In modern setting, XML is a tree representation of data where each node has string labels [34]; it has become the de facto format for data storage, integration, and is beginning to have native datab...

261 | Trie memory
- Fredkin
- 1960

193 | High-order entropy-compressed text indexes
- Grossi, Gupta, et al.
- 2003
Citation Context: ...oportional in number to the inherent entropy of the tree as well as the labels. For a string of symbols, the notion of entropy is well developed, understood, and exploited in indexing and compression **[7, 13]**: high-order entropy depends on frequency of occurrences of substrings of length k. For trees, there is information or entropy in the labels as well as in the subtree structure in the neighborhood o...
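To make "high-order entropy depends on frequency of occurrences of substrings of length k" concrete, here is a sketch of one standard definition of empirical entropy for strings (the one used, for example, in Manzini's analysis of the BWT): H_0 is the Shannon entropy of the symbol frequencies, and H_k averages H_0 over the symbols that follow each length-k context, weighted by how often the context occurs. The function names are our own.

```python
from collections import Counter, defaultdict
from math import log2


def h0(s):
    """Zeroth-order empirical entropy of s, in bits per symbol."""
    n = len(s)
    if n == 0:
        return 0.0
    return sum((c / n) * log2(n / c) for c in Counter(s).values())


def hk(s, k):
    """k-th order empirical entropy: (1/|s|) * sum over contexts w of |w_s| * H0(w_s),
    where w_s collects the symbols that follow each occurrence of w in s."""
    if k == 0:
        return h0(s)
    followers = defaultdict(list)
    for i in range(k, len(s)):
        followers[s[i - k:i]].append(s[i])
    return sum(len(f) * h0(f) for f in followers.values()) / len(s) if s else 0.0


print(h0("abababab"), hk("abababab", 1))  # 1.0 0.0
```

On "abababab" the order-0 entropy is 1 bit per symbol while the order-1 entropy is 0, since each symbol is fully determined by its predecessor.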

193 | Succinct indexable dictionaries with applications to encoding k-ary trees, prefix sums and multisets
- Raman, Raman, et al.

188 | Compressed Suffix Arrays and Suffix Trees with Applications to Text Indexing and String Matching
- Grossi, Vitter
Citation Context: ...Our xbw transform may be thought of as the compressed representation for the suffix tree of the tree. There have been a number of recent results on succinctly representing the suffix tree of a string **[7, 12, 22, 13]**, but the structural properties of the string are crucially used in building, inverting and searching the suffix tree. For managing arbitrary trees with labels (not just suffix tree of a string) or go...

188 | XMill: An efficient compressor for XML data
- Liefke, D
Citation Context: ...ver trees [28, 33]. From these approaches heuristic algorithms for tree compression have been derived and validated only experimentally. In XML compression, well-known softwares like XMILL and XMLPPM **[19, 5]**, group data according to the labeling path leading to them and then use specialized compressors ideal for each group. The length k of the predictive paths may be selected manually [19] or automatical...
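The path-based grouping attributed to XMill and XMLPPM above can be illustrated with the Python standard library. This is a toy sketch, not XMill's actual container format or its specialized compressors: it buckets element text by its root-to-element tag path and compresses each bucket separately with zlib. It ignores attributes and tail text, and `containers_by_path` / `compress_containers` are hypothetical names.

```python
import xml.etree.ElementTree as ET
import zlib
from collections import defaultdict


def containers_by_path(xml_text):
    """Group non-empty element text by the tag path leading to the element."""
    groups = defaultdict(list)

    def walk(elem, path):
        path = path + "/" + elem.tag
        if elem.text and elem.text.strip():
            groups[path].append(elem.text.strip())
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return dict(groups)


def compress_containers(groups):
    """Compress each path container independently (zlib stands in for per-group compressors)."""
    return {path: zlib.compress("\n".join(values).encode("utf-8"))
            for path, values in groups.items()}


doc = ("<lib><book><title>A</title><year>2005</year></book>"
       "<book><title>B</title><year>1994</year></book></lib>")
print(containers_by_path(doc))
```

Grouping by path puts values with similar statistics (here, all years) into the same stream, which is the intuition the citation context describes.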

180 | Opportunistic Data Structures with Applications
- Ferragina, Manzini
- 2000
Citation Context: ...oportional in number to the inherent entropy of the tree as well as the labels. For a string of symbols, the notion of entropy is well developed, understood, and exploited in indexing and compression **[7, 13]**: high-order entropy depends on frequency of occurrences of substrings of length k. For trees, there is information or entropy in the labels as well as in the subtree structure in the neighborhood o...

170 | Space-efficient static trees and graphs
- Jacobson
- 1989
Citation Context: ...y shape. Say, we wish to support basic navigational operations such as find the parent of node u, the i-th child of u, and any child of u with label α. In a seminal work over fifteen years ago, Jacobson **[15]** observed that pointer-based tree representations are wasteful in space and introduced the notion of succinct data structures. He studied the special case of unlabeled trees and presented a succinct d...

150 | Linear work suffix array construction
- Kärkkäinen, Sanders, et al.
- 2006
Citation Context: ...mal O(t) time and uses O(t log t) bits of space. The algorithm, summarized in Fig. 2, sorts the π-components using a straightforward generalization of the skew algorithm for suffix array construction **[17]**. The only non-trivial step of this algorithm is the recursion (Step 3) in which we sort the paths starting at nodes at levels ≢ j (mod 3). The parameter j is chosen in such a way that the number of ...
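The skew (DC3) algorithm of Kärkkäinen and Sanders, which the cited paper generalizes from string suffixes to tree paths, recursively sorts the suffixes starting at positions i mod 3 ≠ 0, then sorts the remaining suffixes with one extra symbol of lookahead and merges the two lists. A compact sketch for plain strings follows. It is not the tree version from the paper, and it uses Python's comparison sort where the original uses radix sort, so it runs in O(n log n) rather than linear time. The input is assumed to be a list of integers ≥ 1 so that 0 can serve as the sentinel; `skew_sa` is a hypothetical name.

```python
def skew_sa(s):
    """Suffix array of s (a list of ints >= 1) via the skew / DC3 recursion."""
    n = len(s)
    if n < 3:  # base case: sort the few suffixes directly
        return sorted(range(n), key=lambda i: s[i:])
    t = list(s) + [0, 0, 0]  # 0 is a sentinel smaller than every symbol
    n0, n1 = (n + 2) // 3, (n + 1) // 3
    # Sample positions i mod 3 != 0; a dummy position n is added when n % 3 == 1
    # so that the last mod-1 triple always contains a sentinel.
    big_n = n + (n0 - n1)
    sample = [i for i in range(big_n) if i % 3 != 0]
    # Step 1: sort sample suffixes by their first three symbols and name the triples.
    sample.sort(key=lambda i: (t[i], t[i + 1], t[i + 2]))
    names, name, prev = {}, 0, None
    for i in sample:
        triple = (t[i], t[i + 1], t[i + 2])
        if triple != prev:
            name, prev = name + 1, triple
        names[i] = name
    # Step 2: if names are not unique, recurse on the reduced string (mod-1 names, then mod-2 names).
    if name < len(sample):
        mod1, mod2 = range(1, big_n, 3), range(2, big_n, 3)
        reduced = [names[i] for i in mod1] + [names[i] for i in mod2]
        rank = {}
        for k, p in enumerate(skew_sa(reduced), start=1):
            rank[mod1[p] if p < len(mod1) else mod2[p - len(mod1)]] = k
    else:
        rank = names

    def rank_of(i):
        return rank.get(i, 0)  # 0 stands for the empty suffix

    # Step 3: sort mod-0 suffixes by (first symbol, rank of the sample suffix at i + 1).
    sa0 = sorted(range(0, n, 3), key=lambda i: (t[i], rank_of(i + 1)))
    sa12 = sorted((i for i in rank if i < n), key=rank.get)  # drop the dummy
    # Step 4: merge; each mod-0 vs sample comparison needs at most two symbols plus a rank.
    out, a, b = [], 0, 0
    while a < len(sa0) and b < len(sa12):
        i, j = sa0[a], sa12[b]
        if j % 3 == 1:
            i_first = (t[i], rank_of(i + 1)) < (t[j], rank_of(j + 1))
        else:
            i_first = (t[i], t[i + 1], rank_of(i + 2)) < (t[j], t[j + 1], rank_of(j + 2))
        if i_first:
            out.append(i)
            a += 1
        else:
            out.append(j)
            b += 1
    return out + sa0[a:] + sa12[b:]


print(skew_sa([ord(c) for c in "banana"]))  # [5, 3, 1, 0, 4, 2]
```

The mod-3 split mirrors the "levels ≢ j (mod 3)" recursion mentioned in the context: the sampled two-thirds are ranked recursively, and their ranks then make the remaining third cheap to sort and merge.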

140 | Succinct Representation of Balanced Parentheses and Static Trees
- Munro, Raman
Citation Context: ...ntation of trees, without compromising the performance for navigation operations; it is also asymptotically optimal (up to lower order terms) in storage space. Nearly ten years later, Munro and Raman **[21]** extended the results with more efficient as well as a richer set of operations, including subtree size queries. Since then, a slew of results have further... [Footnote 1: These operations return a NIL value when the ou...]
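The representation behind Munro and Raman's result is the balanced-parentheses (BP) encoding: a depth-first traversal writes "(" on entering a node and ")" on leaving it, for 2t bits in total, and a node is identified by the position of its "(". A subtree-size query then reduces to finding the matching ")". The sketch below uses naive linear scans in place of the o(t)-bit directories that make these operations constant-time; the child-list input and the function names are assumptions.

```python
def bp_encode(children, root=0):
    """Balanced parentheses: '(' on entering a node in DFS, ')' on leaving it."""
    out, preorder = [], []

    def dfs(v):
        preorder.append(v)  # the k-th '(' belongs to preorder[k]
        out.append("(")
        for c in children[v]:
            dfs(c)
        out.append(")")

    dfs(root)
    return "".join(out), preorder


def find_close(bp, p):
    """Position of the ')' matching the '(' at p (naive left-to-right scan)."""
    excess = 0
    for q in range(p, len(bp)):
        excess += 1 if bp[q] == "(" else -1
        if excess == 0:
            return q
    raise ValueError("unbalanced parentheses")


def enclose(bp, p):
    """Position of the '(' of the parent of the node opened at p, or None for the root."""
    depth = 0
    for q in range(p - 1, -1, -1):
        if bp[q] == ")":
            depth += 1
        elif depth == 0:
            return q
        else:
            depth -= 1
    return None


def subtree_size(bp, p):
    """Number of nodes in the subtree rooted at the node opened at p."""
    return (find_close(bp, p) - p + 1) // 2


bp, pre = bp_encode([[1, 2, 3], [4, 5], [], [6], [], [], []])
print(bp, subtree_size(bp, 0))  # ((()())()(())) 7
```

Because every node contributes exactly one "(" and one ")", the span between a node's open and close parentheses contains its whole subtree, which is why subtree size is just half that span's length.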

132 | An analysis of the Burrows-Wheeler transform
- Manzini
Citation Context: ...Similar substrings get grouped together in this transform. In the past few years, the BWT has been the unifying tool for string compression and indexing, producing many important breakthroughs, e.g., **[20, 8, 7, 13]**. The centerpiece of our technical contribution is the new xbw transform we design that, in the spirit of BWT, relies on path sorting and grouping to linearize the labeled tree T into two coordinated ...
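The context describes the xbw transform only as "path sorting and grouping" that linearizes T into two coordinated arrays. The following naive sketch reflects our reading of that idea and should be treated as an assumption rather than the paper's exact definition. Each node contributes a triple (last-child flag, label, π), where π is the string of labels on the upward path from the node's parent to the root. The triples are listed in preorder and then stably sorted by π, and the first two components form the arrays. Leaf marking, the linear-time construction, and the succinct indexes are omitted. Single-character labels are assumed, and `xbw_naive` is a hypothetical name.

```python
def xbw_naive(labels, children, root=0):
    """Return (S_last, S_alpha, pis) after stably sorting preorder triples by upward path."""
    triples = []  # (is_last_child, label, pi) in preorder

    def visit(v, pi, is_last):
        triples.append((is_last, labels[v], pi))
        kids = children[v]
        for idx, c in enumerate(kids):
            # pi of a child = label of v followed by pi of v (path from v up to the root)
            visit(c, labels[v] + pi, idx == len(kids) - 1)

    visit(root, "", True)
    triples.sort(key=lambda tr: tr[2])  # Python's sort is stable: ties keep preorder
    s_last = [int(flag) for flag, _, _ in triples]
    s_alpha = "".join(label for _, label, _ in triples)
    return s_last, s_alpha, [pi for _, _, pi in triples]


labels = ["A", "B", "C", "D", "E", "F"]
children = [[1, 2], [3, 4], [5], [], [], []]
print(xbw_naive(labels, children))
```

After sorting, the children of each node occupy a contiguous block, and the 1 entries in `S_last` mark where each block ends, which is what lets the two arrays stand in for the tree's structure.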

83 | Compressing XML with multiplexed hierarchical PPM models
- Cheney
Citation Context: ...ver trees [28, 33]. From these approaches heuristic algorithms for tree compression have been derived and validated only experimentally. In XML compression, well-known softwares like XMILL and XMLPPM **[19, 5]**, group data according to the labeling path leading to them and then use specialized compressors ideal for each group. The length k of the predictive paths may be selected manually [19] or automatical...

72 | Representing trees of higher degree
- Benoit, Demaine, et al.
Citation Context: ...including subtree size queries. Since then, a slew of results have further generalized these methods to trees with higher degrees **[1]** and ever richer sets of operations such as level-ancestor queries [10]. Succinct representations have been invented for other data structures including arrays, dictionaries, strings, graphs and multis... [Footnote 1: These operations return a NIL value when the output is not defined.]

55 | Space efficient suffix trees
- Munro, Raman, et al.
Citation Context: ...Our xbw transform may be thought of as the compressed representation for the suffix tree of the tree. There have been a number of recent results on succinctly representing the suffix tree of a string **[7, 12, 22, 13]**, but the structural properties of the string are crucially used in building, inverting and searching the suffix tree. For managing arbitrary trees with labels (not just suffix tree of a string) or go...

42 | Succinct ordinal trees with level-ancestor queries
- Geary, Raman, et al.
Citation Context: ...further generalized these methods to trees with higher degrees [1] and ever richer sets of operations such as level-ancestor queries **[10]**. Succinct representations have been invented for other data structures including arrays, dictionaries, strings, graphs and multisets. Despite this flurry of activity, the fundamental problem of struc... [Footnote 1: These operations return a NIL value when the output is not defined.]

40 | Efficient tree pattern matching
- Kosaraju
- 1989
Citation Context: [Footnote: ...plication further. There are many surveys on the web including www.cs.uiowa.edu/~rlawrenc/research/Students/SN 04 XMLCompress.pdf.] ...ending into the root. This concept has found uses in tree matching **[18]**. A pointer-based representation of the suffix tree of T is simple to build, but wasteful. Our xbw transform may be thought of as the compressed representation for the suffix tree of the tree. There h...

39 | Boosting textual compression in optimal linear time
- Ferragina, Giancarlo, et al.
- 2005
Citation Context: ...Similar substrings get grouped together in this transform. In the past few years, the BWT has been the unifying tool for string compression and indexing, producing many important breakthroughs, e.g., **[20, 8, 7, 13]**. The centerpiece of our technical contribution is the new xbw transform we design that, in the spirit of BWT, relies on path sorting and grouping to linearize the labeled tree T into two coordinated ...

37 | Source encoding using syntactic information source models
- Cameron
- 1988
Citation Context: ...ms on labeled trees. This includes applications of tries [9], dictionaries [32], parse trees [25], suffix trees [31] and pixel trees [6] as well as trees used in compiler intermediate representations **[3, 16, 29]**, execution traces [14], and mathematical proofs [24, 4]. In modern setting, XML is a tree representation of data where each node has string labels [34]; it has become the de facto format for data sto...

34 | Compression by induction of hierarchical grammars
- Nevill-Manning, Witten, et al.
- 1994
Citation Context: ...ees in Computer Science, be they for representing data or computation, typically generate navigation problems on labeled trees. This includes applications of tries [9], dictionaries [32], parse trees **[25]**, suffix trees [31] and pixel trees [6] as well as trees used in compiler intermediate representations [3, 16, 29], execution traces [14], and mathematical proofs [24, 4]. In modern setting, XML is a ...

31 | Markov random fields on an infinite tree
- Spitzer
- 1975
Citation Context: ...ding to the labels of their parents, or more generally, by a descendant or ancestor subtree of height k, for some k [3, 30, 27]. These statistical models are related to Markov random fields over trees **[28, 33]**. From these approaches heuristic algorithms for tree compression have been derived and validated only experimentally. In XML compression, well-known softwares like XMILL and XMLPPM [19, 5], group dat...

17 | Syntax-directed compression of program files
- Katajainen, Penttonen, et al.
- 1986

14 | An efficient algorithm for detecting patterns in traces of procedure calls
- Hamou-Lhadj, Lethbridge
- 2003
Citation Context: ...udes applications of tries [9], dictionaries [32], parse trees [25], suffix trees [31] and pixel trees [6] as well as trees used in compiler intermediate representations [3, 16, 29], execution traces **[14]**, and mathematical proofs [24, 4]. In modern setting, XML is a tree representation of data where each node has string labels [34]; it has become the de facto format for data storage, integration, and ...

13 | On the choice of grammar and parser for the compact analytical encoding of programs
- Stone
- 1986
Citation Context: ...ms on labeled trees. This includes applications of tries [9], dictionaries [32], parse trees [25], suffix trees [31] and pixel trees [6] as well as trees used in compiler intermediate representations **[3, 16, 29]**, execution traces [14], and mathematical proofs [24, 4]. In modern setting, XML is a tree representation of data where each node has string labels [34]; it has become the de facto format for data sto...

12 | Succinct representation of sequences
- 2004
Citation Context: ...$(s, q)$ (when $s[q] = 1$) and $\mathrm{select}_1$ queries in $O(1)$ time using $\log\binom{|s|}{m} + o(m) + O(\log\log|s|)$ bits, where $m$ is the number of 1's in $s$. 2. For $|\Sigma| = O(\mathrm{polylog}(t))$, the generalized wavelet tree in **[23]** supports $\mathrm{rank}_c$ and $\mathrm{select}_c$ queries in $O(1)$ time using $|s|H_0(s) + o(|s|)$ bits of space, where $H_0(s)$ denotes the 0th order empirical entropy of the sequence $s$. 3. For general $\Sigma$, the wavelet tree in [13] ...
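A minimal wavelet-tree sketch showing how rank_c on a sequence reduces to binary rank queries along one root-to-leaf path over the alphabet. It stores plain prefix counts at each node, so it does not achieve the |s|H_0(s) + o(|s|) bound quoted above, and it does not implement select_c. The class and method names are hypothetical.

```python
class WaveletTree:
    """Balanced wavelet tree over the sorted alphabet of s, supporting access and rank_c.

    Each internal node keeps a plain prefix-count array over its bitvector, so space is
    O(n log sigma) machine words, not a compressed bound.
    """

    def __init__(self, s, alphabet=None):
        self.alphabet = sorted(set(s)) if alphabet is None else alphabet
        self.leaf = len(self.alphabet) <= 1
        if self.leaf:
            return
        self.mid = len(self.alphabet) // 2
        pivot = self.alphabet[self.mid]  # symbols < pivot go left (bit 0), others right (bit 1)
        self.ones = [0]  # self.ones[i] = number of 1 bits among the first i positions
        for c in s:
            self.ones.append(self.ones[-1] + (0 if c < pivot else 1))
        self.left = WaveletTree([c for c in s if c < pivot], self.alphabet[:self.mid])
        self.right = WaveletTree([c for c in s if c >= pivot], self.alphabet[self.mid:])

    def access(self, i):
        """Return s[i] by following the bit at position i down to a leaf."""
        if self.leaf:
            return self.alphabet[0]
        if self.ones[i + 1] == self.ones[i]:  # bit 0: the symbol went left
            return self.left.access(i - self.ones[i])
        return self.right.access(self.ones[i])

    def rank(self, c, q):
        """Number of occurrences of c in s[:q]: one binary rank per level."""
        if self.leaf:
            return q if self.alphabet and c == self.alphabet[0] else 0
        if c < self.alphabet[self.mid]:
            return self.left.rank(c, q - self.ones[q])  # rank_0 = q - rank_1
        return self.right.rank(c, self.ones[q])


wt = WaveletTree("abracadabra")
print(wt.rank("a", 11), wt.access(2))  # 5 r
```

Each level halves the alphabet, so a query touches O(log σ) bitvectors; replacing the prefix-count arrays with compressed rank/select structures is what yields the space bounds cited above.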

12 | Context coding of parse trees
- Tarhio
- 1995
Citation Context: ...sume a parent-child model of tree generation in which node labels are generated according to the labels of their parents, or more generally, by a descendant or ancestor subtree of height k, for some k **[3, 30, 27]**. These statistical models are related to Markov random fields over trees [28, 33]. From these approaches heuristic algorithms for tree compression have been derived and validated only experimentally....
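Under the parent-child model described above, a simple proxy for the achievable code length is the empirical entropy of each label conditioned on its parent's label. This sketch assumes child-list input with one label per node; it covers only the order-1 ancestor context, not the general height-k models cited, and the function names are our own.

```python
from collections import Counter, defaultdict
from math import log2


def h0(symbols):
    """Zeroth-order empirical entropy of a sequence, in bits per symbol."""
    n = len(symbols)
    return sum((c / n) * log2(n / c) for c in Counter(symbols).values()) if n else 0.0


def parent_context_entropy(labels, children, root=0):
    """Average bits per non-root node when each label is coded given its parent's label."""
    by_parent = defaultdict(list)  # parent label -> labels of its children
    stack = [root]
    while stack:
        v = stack.pop()
        for c in children[v]:
            by_parent[labels[v]].append(labels[c])
            stack.append(c)
    total = sum(len(g) for g in by_parent.values())
    return sum(len(g) * h0(g) for g in by_parent.values()) / total if total else 0.0


labels = ["r", "a", "a", "b", "b"]
children = [[1, 2], [3], [4], [], []]
print(h0(labels[1:]), parent_context_entropy(labels, children))  # 1.0 0.0
```

In the example, child labels look random on their own (1 bit each) but are fully predictable from the parent's label, the kind of redundancy these tree models try to capture.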

12 | Information measures for discrete random fields
- Ye, Berger
- 1998
Citation Context: ...ding to the labels of their parents, or more generally, by a descendant or ancestor subtree of height k, for some k [3, 30, 27]. These statistical models are related to Markov random fields over trees **[28, 33]**. From these approaches heuristic algorithms for tree compression have been derived and validated only experimentally. In XML compression, well-known softwares like XMILL and XMLPPM [19, 5], group dat...

9 | Hierarchical coding of binary images
- Cohen, Landy, et al.
- 1985
Citation Context: ...resenting data or computation, typically generate navigation problems on labeled trees. This includes applications of tries [9], dictionaries [32], parse trees [25], suffix trees [31] and pixel trees **[6]** as well as trees used in compiler intermediate representations [3, 16, 29], execution traces [14], and mathematical proofs [24, 4]. In modern setting, XML is a tree representation of data where each ...

8 | Statistical models for term compression
- Cheney
- 2000
Citation Context: ..., dictionaries [32], parse trees [25], suffix trees [31] and pixel trees [6] as well as trees used in compiler intermediate representations [3, 16, 29], execution traces [14], and mathematical proofs **[24, 4]**. In modern setting, XML is a tree representation of data where each node has string labels [34]; it has become the de facto format for data storage, integration, and is beginning to have native datab...

6 | Smoothing and compression with stochastic k-testable tree languages, Pattern Recognition 38 (2002)
- Rico-Juan, Calera-Rubio, et al.
Citation Context: ...sume a parent-child model of tree generation in which node labels are generated according to the labels of their parents, or more generally, by a descendant or ancestor subtree of height k, for some k **[3, 30, 27]**. These statistical models are related to Markov random fields over trees [28, 33]. From these approaches heuristic algorithms for tree compression have been derived and validated only experimentally....