Results 1  10
of
54
Practical EntropyCompressed Rank/Select Dictionary
 PROCEEDINGS OF ALENEX’07, ACM
, 2007
"... Rank/Select dictionaries are data structures for an ordered set S ⊂ {0, 1,..., n − 1} to compute rank(x, S) (the number of elements in S which are no greater than x), and select(i, S) (the ith smallest element in S), which are the fundamental components of succinct data structures of strings, trees ..."
Abstract

Cited by 83 (2 self)
 Add to MetaCart
(Show Context)
Rank/Select dictionaries are data structures for an ordered set S ⊂ {0, 1,..., n − 1} to compute rank(x, S) (the number of elements in S which are no greater than x), and select(i, S) (the ith smallest element in S), which are the fundamental components of succinct data structures of strings, trees, graphs, etc. In those data structures, however, only asymptotic behavior has been considered and their performance for real data is not satisfactory. In this paper, we propose novel four Rank/Select dictionaries, esp, recrank, vcode and sdarray, each of which is small if the number of elements in S is small, and indeed close to nH0(S) (H0(S) ≤ 1 is the zeroth order empirical entropy of S) in practice, and its query time is superior to the previous ones. Experimental results reveal the characteristics of our data structures and also show that these data structures are superior to existing implementations in both size and query time.
Fullyfunctional succinct trees
 In Proc. 21st SODA
, 2010
"... We propose new succinct representations of ordinal trees, which have been studied extensively. It is known that any nnode static tree can be represented in 2n + o(n) bits and a large number of operations on the tree can be supported in constant time under the wordRAM model. However existing data s ..."
Abstract

Cited by 64 (22 self)
 Add to MetaCart
(Show Context)
We propose new succinct representations of ordinal trees, which have been studied extensively. It is known that any nnode static tree can be represented in 2n + o(n) bits and a large number of operations on the tree can be supported in constant time under the wordRAM model. However existing data structures are not satisfactory in both theory and practice because (1) the lowerorder term is Ω(nlog log n / log n), which cannot be neglected in practice, (2) the hidden constant is also large, (3) the data structures are complicated and difficult to implement, and (4) the techniques do not extend to dynamic trees supporting insertions and deletions of nodes. We propose a simple and flexible data structure, called the range minmax tree, that reduces the large number of relevant tree operations considered in the literature to a few primitives, which are carried out in constant time on sufficiently small trees. The result is then extended to trees of arbitrary size, achieving 2n + O(n/polylog(n)) bits of space. The redundancy is significantly lower than in any previous proposal, and the data structure is easily implemented. Furthermore, using the same framework, we derive the first fullyfunctional dynamic succinct trees. 1
Succinct indexes for strings, binary relations, and multilabeled trees
 IN: PROC. SODA
, 2007
"... We define and design succinct indexes for several abstract data types (ADTs). The concept is to design auxiliary data structures that ideally occupy asymptotically less space than the informationtheoretic lower bound on the space required to encode the given data, and support an extended set of ope ..."
Abstract

Cited by 62 (22 self)
 Add to MetaCart
We define and design succinct indexes for several abstract data types (ADTs). The concept is to design auxiliary data structures that ideally occupy asymptotically less space than the informationtheoretic lower bound on the space required to encode the given data, and support an extended set of operations using the basic operators defined in the ADT. The main advantage of succinct indexes as opposed to succinct (integrated data/index) encodings is that we make assumptions only on the ADT through which the main data is accessed, rather than the way in which the data is encoded. This allows more freedom in the encoding of the main data. In this paper, we present succinct indexes for various data types, namely strings, binary relations and multilabeled trees. Given the support for the interface of the ADTs of these data types, we can support various useful operations efficiently by constructing succinct indexes for them. When the operators in the ADTs are supported in constant time, our results are comparable to previous results, while allowing more flexibility in the encoding of the given data. Using our techniques, we design a succinct encoding that represents a string of length n over an alphabet of size σ using nHk(S)+lgσ·o(n)+O ( nlgσ lglglgσ) bits to support access/rank/select operations in o((lglgσ)1+ɛ) time, for any fixed constant ɛ> 0. We also design a succinct text index using nH0(S)+O ( nlgσ) bits that lglgσ
Structuring labeled trees for optimal succinctness, and beyond
 In FOCS
, 2005
"... Consider an ordered, static tree T on t nodes where each node has a label from alphabet set Σ. TreeTmaybeofar bitrary degree and of arbitrary shape. Say, we wish to support basic navigational operations such as find the parent of node u,theith child of u, and any child of u with label α. In a semina ..."
Abstract

Cited by 59 (8 self)
 Add to MetaCart
(Show Context)
Consider an ordered, static tree T on t nodes where each node has a label from alphabet set Σ. TreeTmaybeofar bitrary degree and of arbitrary shape. Say, we wish to support basic navigational operations such as find the parent of node u,theith child of u, and any child of u with label α. In a seminal work over fifteen years ago, Jacobson [15] observed that pointerbased tree representations are wasteful in space and introduced the notion of succinct data structures. He studied the special case of unlabeled trees and presented a succinct data structure of 2t+o(t) bits supporting navigational operations in O(1) time. The space used is asymptotically optimal with the informationtheoretic lower bound averaged over all trees. This led to a slew of results on succinct data structures for arrays, trees, strings
Efficient Memory Representation of XML Document Trees
, 2007
"... Implementations that load XML documents and give access to them via, e.g., the DOM, suffer from huge memory demands: the space needed to load an XML document is usually many times larger than the size of the document. A considerable amount of memory is needed to store the tree structure of the XML d ..."
Abstract

Cited by 48 (12 self)
 Add to MetaCart
(Show Context)
Implementations that load XML documents and give access to them via, e.g., the DOM, suffer from huge memory demands: the space needed to load an XML document is usually many times larger than the size of the document. A considerable amount of memory is needed to store the tree structure of the XML document. In this paper, a technique is presented that allows to represent the tree structure of an XML document in an efficient way. The representation exploits the high regularity in XML documents by compressing their tree structure; the latter means to detect and remove repetitions of tree patterns. Formally, contextfree tree grammars that generate only a single tree are used for tree compression. The functionality of basic tree operations, like traversal along edges, is preserved under this compressed representation. This allows to directly execute queries (and in particular, bulk operations) without prior decompression. The complexity of certain computational problems like validation against XML types or testing equality is investigated for compressed input trees.
A simple optimal representation for balanced parentheses
 In Proc. 15th Annual Symposium on Combinatorial Pattern Matching (CPM), LNCS v. 3109 (2004
, 2004
"... b Institute of Mathematical Sciences, Chennai 600 113, India. We consider succinct, or highly spaceefficient, representations of a (static) string consisting of n pairs of balanced parentheses, that support natural operations such as finding the matching parenthesis for a given parenthesis, or find ..."
Abstract

Cited by 46 (5 self)
 Add to MetaCart
(Show Context)
b Institute of Mathematical Sciences, Chennai 600 113, India. We consider succinct, or highly spaceefficient, representations of a (static) string consisting of n pairs of balanced parentheses, that support natural operations such as finding the matching parenthesis for a given parenthesis, or finding the pair of parentheses that most tightly enclose a given pair. This problem was considered by Jacobson, [Proc. 30th FOCS, 549–554, 1989] and Munro and Raman, [SIAM J. Comput. 31 (2001), 762–776], who gave O(n)bit and 2n + o(n)bit representations, respectively, that supported the above operations in O(1) time on the RAM model of computation. This data structure is a fundamental tool in succinct representations, and has applications in representing suffix trees, ordinal trees, planar graphs and permutations. We consider the practical performance of parenthesis representations. First, we give a new 2n + o(n)bit representation that supports all the above operations in O(1) time. This representation is conceptually simpler, its space bound has a smaller o(n) term and it also has a simple and uniform o(n) time and space construction algorithm. We implement our data structure and a variant of Jacobson’s, and evaluate their practical performance (speed and memory usage), when used in a succinct representation of trees derived from XML documents. As a baseline, we compare our representations against a widelyused implementation of the standard DOM (Document Object Model) representation of XML documents. Both succinct representations use orders of magnitude less space than DOM and tree traversal operations are usually only slightly slower than in DOM. Key words: Succinct data structures, parentheses representation of trees, compressed dictionaries, XML DOM. Preprint submitted to Theoretical Computer Science 29 November 2006 1
Ultrasuccinct representation of ordered trees
 In SODA ’07: Proceedings of the eighteenth annual ACMSIAM symposium on Discrete algorithms
, 2007
"... There exist two wellknown succinct representations of ordered trees: BP (balanced parenthesis) [Munro, Raman 2001] and DFUDS (depth first unary degree sequence) [Benoit et al. 2005]. Both have size 2n+o(n) bits for nnode trees, which asymptotically matches the informationtheoretic lower bound. Ma ..."
Abstract

Cited by 43 (4 self)
 Add to MetaCart
There exist two wellknown succinct representations of ordered trees: BP (balanced parenthesis) [Munro, Raman 2001] and DFUDS (depth first unary degree sequence) [Benoit et al. 2005]. Both have size 2n+o(n) bits for nnode trees, which asymptotically matches the informationtheoretic lower bound. Many fundamental operations on trees can be done in constant time on word RAM, for example finding the parent, the first child, the next sibling, the number of descendants, etc. However there has been no single representation supporting every existing operation in constant time; BP does not support ith child, while DFUDS does not support lca (lowest common ancestor). In this paper, we give the first succinct tree representation supporting every one of the fundamental operations previously proposed for BP or DFUDS along with some new operations in constant time. Moreover, its size surpasses the informationtheoretic lower bound and matches the entropy of the tree based on the distribution of node degrees. We call this an ultrasuccinct data structure. As a consequence, a tree in which every internal node has exactly two children can be represented in n + o(n) bits. We also show applications for ultrasuccinct compressed suffix trees and labeled trees. 1
Adaptive searching in succinctly encoded binary relations and treestructured documents (Extended Abstract)
 THEORETICAL COMPUTER SCIENCE
, 2005
"... This paper deals with succinct representations of data types motivated by applications in posting lists for search engines, in querying XML documents, and in the more general setting (which extends XML) of multilabeled trees, where several labels can be assigned to each node of a tree. To find th ..."
Abstract

Cited by 42 (14 self)
 Add to MetaCart
(Show Context)
This paper deals with succinct representations of data types motivated by applications in posting lists for search engines, in querying XML documents, and in the more general setting (which extends XML) of multilabeled trees, where several labels can be assigned to each node of a tree. To find the set of references corresponding to a set of keywords, one typically intersects the list of references associated with each keyword. We view this instead as having a single list of objects [n] = {1,..., n} (the references), each of which has a subset of the labels [σ] = {1,..., σ} (the keywords) associated with it. We are able to find the objects associated with an arbitrary set of keywords in time O(δk lg lg σ) using a data structure requiring only t(lg σ +o(lg σ)) bits, where δ is the number of steps required by a nondeterministic algorithm to check the answer, k is the number of keywords in the query, σ is the size of the set from which the keywords are chosen, and t is the number of associations between references and keywords. The data structure is succinct in that it differs from the space needed to write down all t occurrences of keywords by only a lower order term. An XML document is, for our purpose, a labeled rooted tree. We deal primarily with “nonrecursive labeled trees”, where no label occurs more than once on any root to leaf path. We find the set of nodes which path from the root include a set of keywords in the same time, O(δk lg lg σ), on a representation of the tree using essentially minimum space, 2n + n(lg σ + o(lg σ)) bits, where n is the number of nodes in the tree. If we permit nodes to have multiple
Succinct Trees in Practice
"... We implement and compare the major current techniques for representing general trees in succinct form. This is important because a general tree of n nodes is usually represented in pointer form, requiring O(n log n) bits, whereas the succinct representations we study require just 2n + o(n) bits and ..."
Abstract

Cited by 38 (19 self)
 Add to MetaCart
We implement and compare the major current techniques for representing general trees in succinct form. This is important because a general tree of n nodes is usually represented in pointer form, requiring O(n log n) bits, whereas the succinct representations we study require just 2n + o(n) bits and carry out many sophisticated operations in constant time. Yet, there is no exhaustive study in the literature comparing the practical magnitudes of the o(n)space and the O(1)time terms. The techniques can be classified into three broad trends: those based on BP (balanced parentheses in preorder), those based on DFUDS (depthfirst unary degree sequence), and those based on LOUDS (levelordered unary degree sequence). BP and DFUDS require a balanced parentheses representation that supports the core operations
Succinct Orthogonal Range Search Structures on a Grid with Applications to Text Indexing
"... We present a succinct representation of a set of n points on an n × n grid using n lg n + o(nlg n) bits 3 to support orthogonal range counting in O(lg n / lg lg n) time, and range reporting in O(k lg n/lg lg n) time, where k is the size of the output. This achieves an improvement on query time by ..."
Abstract

Cited by 32 (3 self)
 Add to MetaCart
(Show Context)
We present a succinct representation of a set of n points on an n × n grid using n lg n + o(nlg n) bits 3 to support orthogonal range counting in O(lg n / lg lg n) time, and range reporting in O(k lg n/lg lg n) time, where k is the size of the output. This achieves an improvement on query time by a factor of lg lg n upon the previous result of Mäkinen and Navarro [15], while using essentially the informationtheoretic minimum space. Our data structure not only can be used as a key component in solutions to the general orthogonal range search problem to save storage cost, but also has applications in text indexing. In particular, we apply it to improve two previous spaceefficient text indexes that support substring search [7] and positionrestricted substring search [15]. We also use it to extend previous results on succinct representations of sequences of small integers, and to design succinct data structures supporting certain types of orthogonal range query in the plane.