A new succinct representation of RMQ-information and improvements in the enhanced suffix array
- PROC. ESCAPE. LNCS
, 2007
Cited by 31 (13 self)
The Range-Minimum-Query-Problem is to preprocess an array of length n in O(n) time such that all subsequent queries asking for the position of a minimal element between two specified indices can be obtained in constant time. This problem was first solved by Berkman and Vishkin [1], and Sadakane [2] gave the first succinct data structure that uses 4n + o(n) bits of additional space. In practice, this method has several drawbacks: it needs O(n log n) bits of intermediate space when constructing the data structure, and it builds on previous results on succinct data structures. We overcome these problems by giving the first algorithm that never uses more than 2n + o(n) bits, and does not rely on rank- and select-queries or other succinct data structures. We stress the importance of this result by simplifying and reducing the space consumption of the Enhanced Suffix Array [3], while retaining its capability of simulating top-down-traversals of the suffix tree, used, e.g., to locate all occ positions of a pattern p in a text in optimal O(|p| + occ) time (assuming constant alphabet size). We further prove a lower bound of 2n − o(n) bits, which makes our algorithm asymptotically optimal.
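To make the query in this abstract concrete, here is a minimal sketch of a range minimum query answered in constant time. It uses the classical sparse table, which needs O(n log n) words and O(n log n) preprocessing time, so it is not the 2n + o(n)-bit scheme the paper describes. The function names `build_sparse_table` and `rmq` are illustrative only.

```python
def build_sparse_table(a):
    """table[k][i] is the position of the leftmost minimum of a[i : i + 2**k]."""
    n = len(a)
    table = [list(range(n))]  # windows of length 1
    k = 1
    while (1 << k) <= n:
        prev = table[k - 1]
        half = 1 << (k - 1)
        row = []
        for i in range(n - (1 << k) + 1):
            # Combine the two half-length windows that cover a[i : i + 2**k].
            left, right = prev[i], prev[i + half]
            row.append(left if a[left] <= a[right] else right)
        table.append(row)
        k += 1
    return table


def rmq(a, table, i, j):
    """Position of the leftmost minimum of a[i..j], inclusive, 0 <= i <= j < len(a)."""
    k = (j - i + 1).bit_length() - 1  # largest k with 2**k <= j - i + 1
    # Two overlapping windows of length 2**k cover the whole range.
    left, right = table[k][i], table[k][j - (1 << k) + 1]
    return left if a[left] <= a[right] else right
```

For example, with `a = [3, 1, 4, 1, 5, 9, 2, 6]` and `t = build_sparse_table(a)`, the call `rmq(a, t, 2, 5)` inspects the sub-array 4, 1, 5, 9 and returns position 3.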
Ultra-succinct representation of ordered trees
- In Proc. SODA
, 2007
Cited by 26 (4 self)
There exist two well-known succinct representations of ordered trees: BP (balanced parenthesis) [Munro, Raman 2001] and DFUDS (depth first unary degree sequence) [Benoit et al. 2005]. Both have size 2n + o(n) bits for n-node trees, which asymptotically matches the information-theoretic lower bound. Many fundamental operations on trees can be done in constant time on word RAM, for example finding the parent, the first child, the next sibling, the number of descendants, etc. However there has been no single representation supporting every existing operation in constant time; BP does not support i-th child, while DFUDS does not support lca (lowest common ancestor). In this paper, we give the first succinct tree representation supporting every one of the fundamental operations previously proposed for BP or DFUDS along with some new operations in constant time. Moreover, its size surpasses the information-theoretic lower bound and matches the entropy of the tree based on the distribution of node degrees. We call this an ultra-succinct data structure. As a consequence, a tree in which every internal node has exactly two children can be represented in n + o(n) bits. We also show applications for ultra-succinct compressed suffix trees and labeled trees.
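As an illustration of the BP representation mentioned in this abstract, the following sketch writes an ordered tree as a balanced-parentheses string (2n symbols for n nodes) and recovers the parent and subtree size of a node by scanning. The scans take linear time. The constant-time versions rely on auxiliary o(n)-bit structures that are not shown here. The child-list input format and all function names are assumptions made for this example.

```python
def to_bp(children, root=0):
    """Balanced-parentheses string of the ordered tree given by child lists."""
    out = []
    stack = [(root, False)]
    while stack:
        node, done = stack.pop()
        if done:
            out.append(")")  # leaving the node
            continue
        out.append("(")  # entering the node
        stack.append((node, True))
        for child in reversed(children[node]):
            stack.append((child, False))
    return "".join(out)


def find_close(bp, i):
    """Index of the ')' matching the '(' at index i (naive scan)."""
    depth = 0
    for j in range(i, len(bp)):
        depth += 1 if bp[j] == "(" else -1
        if depth == 0:
            return j
    raise ValueError("unbalanced parentheses")


def subtree_size(bp, i):
    """Number of nodes in the subtree of the node whose '(' is at index i."""
    return (find_close(bp, i) - i + 1) // 2


def parent(bp, i):
    """Index of the '(' of the parent of node i, or None for the root."""
    depth = 0
    for j in range(i - 1, -1, -1):
        depth += 1 if bp[j] == "(" else -1
        if depth == 1:  # first unmatched '(' to the left encloses node i
            return j
    return None
```

A node is identified by the index of its opening parenthesis. For the four-node tree `[[1, 2], [3], [], []]`, `to_bp` yields `((())())`.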
Fully-compressed suffix trees
- IN: PACS 2000. LNCS
, 2000
Cited by 17 (12 self)
Suffix trees are by far the most important data structure in stringology, with myriads of applications in fields like bioinformatics and information retrieval. Classical representations of suffix trees require O(n log n) bits of space, for a string of size n. This is considerably more than the n log_2 σ bits needed for the string itself, where σ is the alphabet size. The size of suffix trees has been a barrier to their wider adoption in practice. Recent compressed suffix tree representations require just the space of the compressed string plus Θ(n) extra bits. This is already spectacular, but still unsatisfactory when σ is small as in DNA sequences. In this paper we introduce the first compressed suffix tree representation that breaks this linear-space barrier. Our representation requires sublinear extra space and supports a large set of navigational operations in logarithmic time. An essential ingredient of our representation is the lowest common ancestor (LCA) query. We reveal important connections between LCA queries and suffix tree navigation.
Optimal Succinctness for Range Minimum Queries
Cited by 17 (2 self)
For an array A of n objects from a totally ordered universe, a range minimum query rmq_A(i, j) asks for the position of the minimum element in the sub-array A[i, j]. We focus on the setting where the array A is static and known in advance, and can hence be preprocessed into a scheme in order to answer future queries faster. We make the further assumption that the input array A cannot be used at query time. Under this assumption, a natural lower bound of 2n − Θ(log n) bits for RMQ-schemes exists. We give the first truly succinct preprocessing scheme for O(1)-RMQs. Its final space consumption is 2n + o(n) bits, thus being asymptotically optimal. We also give a simple linear-time construction algorithm for this scheme that needs only n + o(n) bits of space in addition to the 2n + o(n) bits needed for the final data structure, thereby lowering the peak space consumption of previous schemes from O(n log n) to O(n) bits. We also improve on LCA-computation in BPS- and DFUDS-encoded trees.
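The 2n − Θ(log n) lower bound can be made plausible with a small counting experiment. Two arrays are indistinguishable to an RMQ scheme exactly when every query returns the same position, and the number of distinguishable classes on n elements is the n-th Catalan number (one per Cartesian tree shape), whose base-2 logarithm is 2n − Θ(log n). The sketch below is a brute-force check for small n, assuming leftmost tie-breaking and distinct values. It is not taken from the paper, and all names are illustrative.

```python
from itertools import permutations
from math import comb, log2


def rmq_signature(a):
    """Answers to all O(n^2) range minimum queries on a, as one tuple."""
    n = len(a)
    return tuple(
        min(range(i, j + 1), key=a.__getitem__)
        for i in range(n)
        for j in range(i, n)
    )


def count_rmq_classes(n):
    """Number of arrays of length n that differ in at least one RMQ answer."""
    return len({rmq_signature(p) for p in permutations(range(n))})


def catalan(n):
    """n-th Catalan number, the count of binary tree shapes on n nodes."""
    return comb(2 * n, n) // (n + 1)


def lower_bound_bits(n):
    """Bits needed to tell apart catalan(n) classes: log2 of the class count."""
    return log2(catalan(n))
```

For n = 5 both counts are 42, and `lower_bound_bits(n)` stays below 2n while approaching it up to a logarithmic term as n grows.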
A compressed self-index using a Ziv-Lempel dictionary
- In: SPIRE. Volume 4209 of LNCS. (2006) 163–180
Cited by 17 (5 self)
A compressed full-text self-index for a text T, of size u, is a data structure used to search patterns P, of size m, in T that requires reduced space, i.e. that depends on the empirical entropy (Hk, H0) of T, and is, furthermore, able to reproduce any substring of T. In this paper we present a new compressed self-index able to locate the occurrences of P in O((m + occ) log n) time, where occ is the number of occurrences and σ the size of the alphabet of T. The fundamental improvement over previous LZ78 based indexes is the reduction of the search time dependency on m from O(m^2) to O(m). To achieve this result we point out the main obstacle to linear time algorithms based on LZ78 data compression and expose and explore the nature of a recurrent structure in LZ-indexes, the T78 suffix tree. We show that our method is very competitive in practice by comparing it against the LZ-Index, the FM-index and a compressed suffix array.
Fully-functional static and dynamic succinct trees
- CoRR abs/0905.0768. http://arxiv.org/abs/0905.0768. Version 4
, 2010
Cited by 14 (9 self)
We propose new succinct representations of ordinal trees, which have been studied extensively. It is known that any n-node static tree can be represented in 2n + o(n) bits and various operations on the tree can be supported in constant time under the word-RAM model. However the data structures are complicated and difficult to dynamize. We propose a simple and flexible data structure, called the range min-max tree, that reduces the large number of relevant tree operations considered in the literature to a few primitives that are carried out in constant time on sufficiently small trees. The result is extended to trees of arbitrary size, achieving 2n + O(n / polylog(n)) bits of space. The redundancy is significantly lower than in any previous proposal. For the dynamic case, where insertion/deletion of nodes is allowed, the existing data structures support very limited operations. Our data structure builds on the range min-max tree to achieve 2n + O(n / log n) bits of space and O(log n) time for all the operations. We also propose an improved data structure using 2n + O(n log log n / log n) bits and improving the time to O(log n / log log n) for most operations.
Fully-functional succinct trees
- In Proc. 21st SODA
, 2010
Cited by 13 (6 self)
We propose new succinct representations of ordinal trees, which have been studied extensively. It is known that any n-node static tree can be represented in 2n + o(n) bits and a large number of operations on the tree can be supported in constant time under the word-RAM model. However existing data structures are not satisfactory in both theory and practice because (1) the lower-order term is Ω(n log log n / log n), which cannot be neglected in practice, (2) the hidden constant is also large, (3) the data structures are complicated and difficult to implement, and (4) the techniques do not extend to dynamic trees supporting insertions and deletions of nodes. We propose a simple and flexible data structure, called the range min-max tree, that reduces the large number of relevant tree operations considered in the literature to a few primitives, which are carried out in constant time on sufficiently small trees. The result is then extended to trees of arbitrary size, achieving 2n + O(n / polylog(n)) bits of space. The redundancy is significantly lower than in any previous proposal, and the data structure is easily implemented. Furthermore, using the same framework, we derive the first fully-functional dynamic succinct trees.
A linear size index for approximate pattern matching
- In Proc. 17th Annual Symposium on Combinatorial Pattern Matching
, 2006
Cited by 12 (1 self)
This paper revisits the problem of indexing a text S[1..n] to support searching substrings in S that match a given pattern P[1..m] with at most k errors. A naive solution either has a worst-case matching time complexity of Ω(m^k) or requires Ω(n^k) space. Devising a solution with better performance has been a challenge until Cole et al. [5] showed an O(n log^k n)-space index that can support k-error matching in O(m + occ + log^k n log log n) time, where occ is the number of occurrences. Motivated by the indexing of DNA, we investigate in this paper the feasibility of devising a linear-size index that still has a time complexity linear in m. In particular, we give an O(n)-space index that supports k-error matching in O(m + occ + (log n)^{k(k+1)} log log n) worst-case time. Furthermore, the index can be compressed from O(n) words into O(n) bits with a slight increase in the time complexity.
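To pin down what "match with at most k errors" means here, the sketch below reports every end position in the text of a substring within edit distance k of the pattern. It is the classical online dynamic program that scans the whole text in O(nm) time, without any index, so it only defines the problem and is not the O(n)-space index of the paper. Positions are 0-based, and the function name is an assumption of this example.

```python
def k_error_end_positions(text, pattern, k):
    """End positions j such that some substring text[i..j] is within
    edit distance k of pattern (insertions, deletions, substitutions)."""
    m = len(pattern)
    # col[i] = minimum edit distance between pattern[:i] and any substring
    # of the text ending at the current position (possibly empty).
    col = list(range(m + 1))
    ends = []
    for j, c in enumerate(text):
        prev_diag = col[0]  # value for pattern[:0] at the previous position
        # col[0] stays 0: a match may start anywhere in the text.
        for i in range(1, m + 1):
            cur = min(
                col[i] + 1,                         # text character c is extra
                col[i - 1] + 1,                     # pattern[i-1] is missing
                prev_diag + (pattern[i - 1] != c),  # match or substitution
            )
            prev_diag = col[i]
            col[i] = cur
        if col[m] <= k:
            ends.append(j)
    return ends
```

With k = 0 the function degenerates to exact matching and returns the end positions of the exact occurrences.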
Run-length compressed indexes are superior for highly repetitive sequence collections
- In Proc. 15th SPIRE, LNCS 5280
, 2008
Cited by 12 (9 self)
A repetitive sequence collection is one where portions of a base sequence of length n are repeated many times with small variations, forming a collection of total length N. Examples of such collections are version control data and genome sequences of individuals, where the differences can be expressed by lists of basic edit operations. This paper is devoted to studying ways to store massive sets of highly repetitive sequence collections in a space-efficient manner so that retrieval of the content as well as queries on the content of the sequences can be provided time-efficiently. We show that the state-of-the-art entropy-bound full-text self-indexes do not yet provide satisfactory space bounds for this specific task. We engineer some new structures that use run-length encoding and give empirical evidence that these structures are superior to the current structures.
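As a rough illustration of why run-length encoding suits repetitive collections, the sketch below computes a Burrows-Wheeler transform by sorting rotations and then run-length encodes it. Applying the encoding to the BWT is an assumption of this example, since the abstract only says that the new structures use run-length encoding. The quadratic rotation sort is for small inputs only, and the function names and the `$` sentinel are illustrative.

```python
from itertools import groupby


def bwt(text, sentinel="$"):
    """Burrows-Wheeler transform via sorted rotations (small inputs only).
    The sentinel must not occur in text and must sort before all its characters."""
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def run_length_encode(s):
    """List of (character, run length) pairs for consecutive equal characters."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]


def run_length_decode(runs):
    """Inverse of run_length_encode."""
    return "".join(ch * length for ch, length in runs)
```

Comparing `len(run_length_encode(bwt(t)))` with `len(t)` for a text `t` built from near-identical copies of a base sequence gives a feel for the saving that such collections allow.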
An(other) entropy-bounded compressed suffix tree
- In Proceedings of the 19th Annual Symposium on Combinatorial Pattern Matching, volume 5029 of LNCS
, 2008
Cited by 11 (9 self)
Suffix trees are among the most important data structures in stringology, with myriads of applications. Their main problem is space usage, which has triggered much research striving for compressed representations that are still functional. We present a novel compressed suffix tree. Compared to the existing ones, ours is the first achieving at the same time sublogarithmic complexity for the operations, and space usage which goes to zero as the entropy of the text does. Our development contains several novel ideas, such as compressing the longest common prefix information, and totally getting rid of the suffix tree topology, expressing all the suffix tree operations using range minimum queries and a new primitive called next/previous smaller value in a sequence.
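The next/previous smaller value primitive named in this abstract can be defined concretely as follows: for each position i, the previous smaller value is the closest position to the left holding a strictly smaller element, and the next smaller value is the closest such position to the right. The sketch below computes both for every position with a stack in linear total time. It is an offline illustration of the definition, not the compressed query structure of the paper. Using -1 and n as "none" markers is a convention chosen for this example.

```python
def previous_smaller(a):
    """res[i] = largest j < i with a[j] < a[i], or -1 if there is none."""
    res = []
    stack = []  # positions whose values are strictly increasing
    for i, x in enumerate(a):
        while stack and a[stack[-1]] >= x:
            stack.pop()
        res.append(stack[-1] if stack else -1)
        stack.append(i)
    return res


def next_smaller(a):
    """res[i] = smallest j > i with a[j] < a[i], or len(a) if there is none."""
    n = len(a)
    res = [n] * n
    stack = []  # positions still waiting for a smaller value to their right
    for i, x in enumerate(a):
        while stack and a[stack[-1]] > x:
            res[stack.pop()] = i
        stack.append(i)
    return res
```

For `a = [2, 1, 3, 3, 0, 4]`, `previous_smaller(a)` is `[-1, -1, 1, 1, -1, 4]` and `next_smaller(a)` is `[1, 4, 4, 4, 6, 6]`.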

