Results 1  10
of
38
Highorder entropycompressed text indexes
, 2003
"... We present a novel implementation of compressed suffix arrays exhibiting new tradeoffs between search time and space occupancy for a given text (or sequence) of n symbols over an alphabet Σ, where each symbol is encoded by lg Σ  bits. We show that compressed suffix arrays use just nHh + O(n lg lg ..."
Abstract

Cited by 193 (22 self)
 Add to MetaCart
We present a novel implementation of compressed suffix arrays exhibiting new tradeoffs between search time and space occupancy for a given text (or sequence) of n symbols over an alphabet Σ, where each symbol is encoded by lg Σ  bits. We show that compressed suffix arrays use just nHh + O(n lg lg n / lg Σ  n) bits, while retaining full text indexing functionalities, such as searching any pattern sequence of length m in O(m lg Σ  + polylog(n)) time. The term Hh ≤ lg Σ  denotes the hthorder empirical entropy of the text, which means that our index is nearly optimal in space apart from lowerorder terms, achieving asymptotically the empirical entropy of the text (with a multiplicative constant 1). If the text is highly compressible so that Hh = o(1) and the alphabet size is small, we obtain a text index with o(m) search time that requires only o(n) bits. Further results and tradeoffs are reported in the paper. 1
Compressed representations of sequences and fulltext indexes
 ACM Transactions on Algorithms
, 2007
"... Abstract. Given a sequence S = s1s2... sn of integers smaller than r = O(polylog(n)), we show how S can be represented using nH0(S) + o(n) bits, so that we can know any sq, as well as answer rank and select queries on S, in constant time. H0(S) is the zeroorder empirical entropy of S and nH0(S) pro ..."
Abstract

Cited by 112 (63 self)
 Add to MetaCart
Abstract. Given a sequence S = s1s2... sn of integers smaller than r = O(polylog(n)), we show how S can be represented using nH0(S) + o(n) bits, so that we can know any sq, as well as answer rank and select queries on S, in constant time. H0(S) is the zeroorder empirical entropy of S and nH0(S) provides an Information Theoretic lower bound to the bit storage of any sequence S via a fixed encoding of its symbols. This extends previous results on binary sequences, and improves previous results on general sequences where those queries are answered in O(log r) time. For larger r, we can still represent S in nH0(S) + o(n log r) bits and answer queries in O(log r / log log n) time. Another contribution of this paper is to show how to combine our compressed representation of integer sequences with an existing compression boosting technique to design compressed fulltext indexes that scale well with the size of the input alphabet Σ. Namely, we design a variant of the FMindex that indexes a string T [1, n] within nHk(T) + o(n) bits of storage, where Hk(T) is the kth order empirical entropy of T. This space bound holds simultaneously for all k ≤ α log Σ  n, constant 0 < α < 1, and Σ  = O(polylog(n)). This index counts the occurrences of an arbitrary pattern P [1, p] as a substring of T in O(p) time; it locates each pattern occurrence in O(log 1+ε n) time, for any constant 0 < ε < 1; and it reports a text substring of length ℓ in O(ℓ + log 1+ε n) time.
Breaking a TimeandSpace Barrier in Constructing FullText Indices
"... Suffix trees and suffix arrays are the most prominent fulltext indices, and their construction algorithms are well studied. It has been open for a long time whether these indicescan be constructed in both o(n log n) time and o(n log n)bit working space, where n denotes the length of the text. Int ..."
Abstract

Cited by 50 (3 self)
 Add to MetaCart
Suffix trees and suffix arrays are the most prominent fulltext indices, and their construction algorithms are well studied. It has been open for a long time whether these indicescan be constructed in both o(n log n) time and o(n log n)bit working space, where n denotes the length of the text. Inthe literature, the fastest algorithm runs in O(n) time, whileit requires O(n log n)bit working space. On the other hand,the most spaceefficient algorithm requires O(n)bit working space while it runs in O(n log n) time. This paper breaks the longstanding timeandspace barrier under the unitcost word RAM. We give an algorithm for constructing the suffix array which takes O(n) time and O(n)bit working space, for texts with constantsize alphabets. Note that both the time and the space bounds are optimal. For constructing the suffix tree, our algorithm requires O(n logffl n) time and O(n)bit working space forany 0! ffl! 1. Apart from that, our algorithm can alsobe adopted to build other existing fulltext indices, such as
Practical EntropyCompressed Rank/Select Dictionary
 PROCEEDINGS OF ALENEX’07, ACM
, 2007
"... Rank/Select dictionaries are data structures for an ordered set S ⊂ {0, 1,..., n − 1} to compute rank(x, S) (the number of elements in S which are no greater than x), and select(i, S) (the ith smallest element in S), which are the fundamental components of succinct data structures of strings, trees ..."
Abstract

Cited by 50 (1 self)
 Add to MetaCart
Rank/Select dictionaries are data structures for an ordered set S ⊂ {0, 1,..., n − 1} to compute rank(x, S) (the number of elements in S which are no greater than x), and select(i, S) (the ith smallest element in S), which are the fundamental components of succinct data structures of strings, trees, graphs, etc. In those data structures, however, only asymptotic behavior has been considered and their performance for real data is not satisfactory. In this paper, we propose novel four Rank/Select dictionaries, esp, recrank, vcode and sdarray, each of which is small if the number of elements in S is small, and indeed close to nH0(S) (H0(S) ≤ 1 is the zeroth order empirical entropy of S) in practice, and its query time is superior to the previous ones. Experimental results reveal the characteristics of our data structures and also show that these data structures are superior to existing implementations in both size and query time.
Space Efficient Hash Tables With Worst Case Constant Access Time
 In STACS
, 2003
"... We generalize Cuckoo Hashing [23] to dary Cuckoo Hashing and show how this yields a simple hash table data structure that stores n elements in (1 + ffl) n memory cells, for any constant ffl ? 0. Assuming uniform hashing, accessing or deleting table entries takes at most d = O(ln ffl ) probes ..."
Abstract

Cited by 47 (4 self)
 Add to MetaCart
We generalize Cuckoo Hashing [23] to dary Cuckoo Hashing and show how this yields a simple hash table data structure that stores n elements in (1 + ffl) n memory cells, for any constant ffl ? 0. Assuming uniform hashing, accessing or deleting table entries takes at most d = O(ln ffl ) probes and the expected amortized insertion time is constant. This is the first dictionary that has worst case constant access time and expected constant update time, works with (1 + ffl) n space, and supports satellite information. Experiments indicate that d = 4 choices suffice for ffl 0:03. We also describe variants of the data structure that allow the use of hash functions that can be evaluted in constant time.
When indexing equals compression: Experiments with compressing suffix arrays and applications
, 2004
"... We report on a new and improved version of highorder entropycompressed suffix arrays, which has theoretical performance guarantees similar to those in our earlier work [16], yet represents an improvement in practice. Our experiments indicate that the resulting text index offers stateoftheart co ..."
Abstract

Cited by 43 (5 self)
 Add to MetaCart
We report on a new and improved version of highorder entropycompressed suffix arrays, which has theoretical performance guarantees similar to those in our earlier work [16], yet represents an improvement in practice. Our experiments indicate that the resulting text index offers stateoftheart compression. In particular, we require roughly 20 % of the original text size—without requiring a separate instance of the text—and support fast and powerful searches. To our knowledge, this is the best known method in terms of space for fast searching. 1
Ultrasuccinct representation of ordered trees
 In Proc. SODA
, 2007
"... fixed universe with cardinality L is log L bits There exist two wellknown succinct representations of ordered trees: BP (balanced parenthesis) [Munro, Raman 2001] and DFUDS (depth first unary degree sequence) [Benoit et al. 2005]. Both have size 2n +o(n) bits for nnode trees, which asymptotically ..."
Abstract

Cited by 36 (4 self)
 Add to MetaCart
fixed universe with cardinality L is log L bits There exist two wellknown succinct representations of ordered trees: BP (balanced parenthesis) [Munro, Raman 2001] and DFUDS (depth first unary degree sequence) [Benoit et al. 2005]. Both have size 2n +o(n) bits for nnode trees, which asymptotically matches the informationtheoretic lower bound. Many fundamental operations on trees can be done in constant time on word RAM, for example finding the parent, the first child, the next sibling, the number of descendants, etc. However there has been no single representation supporting every existing operation in constant time; BP does not support ith child, while DFUDS does not support lca (lowest common ancestor). In this paper, we give the first succinct tree representation supporting every one of the fundamental operations previously proposed for BP or DFUDS along with some new operations in constant time. Moreover, its size surpasses the informationtheoretic lower bound and matches the entropy of the tree based on the distribution of node degrees. We call this an ultrasuccinct data structure. As a consequence, a tree in which every internal node has exactly two children can be represented in n +o(n) bits. We also show applications for ultrasuccinct compressed suffix trees and labeled trees. 1
An InformationTheoretic Upper Bound on Planar Graphs Using WellOrderly Maps
, 2011
"... This chapter deals with compressed coding of graphs. We focus on planar graphs, a widely studied class of graphs. A planar graph is a graph that admits an embedding in the plane without edge crossings. Planar maps (class of embeddings of a planar graph) are easier to study than planar graphs, but a ..."
Abstract

Cited by 22 (4 self)
 Add to MetaCart
This chapter deals with compressed coding of graphs. We focus on planar graphs, a widely studied class of graphs. A planar graph is a graph that admits an embedding in the plane without edge crossings. Planar maps (class of embeddings of a planar graph) are easier to study than planar graphs, but as a planar graph may admit an exponential number of maps, they give little information on graphs. In order to give an informationtheoretic upper bound on planar graphs, we introduce a definition of a quasicanonical embedding for planar graphs: wellorderly maps. This appears to be an useful tool to study and encode planar graphs. We present upper bounds on the number of unlabeled planar graphs and on the number of edges in a random planar graph. We also present an algorithm to compute wellorderly maps and implying an efficient coding of planar graphs.
Optimal Succinctness for Range Minimum Queries
"... Abstract. For an array A of n objects from a totally ordered universe, a range minimum query rmq A(i, j) asks for the position of the minimum element in the subarray A[i, j]. We focus on the setting where the array A is static and known in advance, and can hence be preprocessed into a scheme in ord ..."
Abstract

Cited by 22 (2 self)
 Add to MetaCart
Abstract. For an array A of n objects from a totally ordered universe, a range minimum query rmq A(i, j) asks for the position of the minimum element in the subarray A[i, j]. We focus on the setting where the array A is static and known in advance, and can hence be preprocessed into a scheme in order to answer future queries faster. We make the further assumption that the input array A cannot be used at query time. Under this assumption, a natural lower bound of 2n − Θ(log n) bits for RMQschemes exists. We give the first truly succinct preprocessing scheme for O(1)RMQs. Its final space consumption is 2n + o(n) bits, thus being asymptotically optimal. We also give a simple lineartime construction algorithm for this scheme that needs only n + o(n) bits of space in addition to the 2n + o(n) bits needed for the final data structure, thereby lowering the peak space consumption of previous schemes from O(n log n) to O(n) bits. We also improve on LCAcomputation in BPS and DFUDSencoded trees. 1
SpaceEfficient Preprocessing Schemes for Range Minimum Queries on Static Arrays
, 2009
"... Given a static array of n totally ordered object, the range minimum query problem is to build an additional data structure that allows to answer subsequent online queries of the form “what is the position of a minimum element in the subarray ranging from i to j? ” efficiently. We focus on two sett ..."
Abstract

Cited by 21 (2 self)
 Add to MetaCart
Given a static array of n totally ordered object, the range minimum query problem is to build an additional data structure that allows to answer subsequent online queries of the form “what is the position of a minimum element in the subarray ranging from i to j? ” efficiently. We focus on two settings, where (1) the input array is available at query time, and (2) the input array is only available at construction time. In setting (1), we show new data structures (a) of n c(n) (2 + o(1)) bits and query time O(c(n)), or (b) with O(nHk) + o(n) bits and O(1) query size time, where Hk denotes the empirical entropy of k’th order of the input array. In setting (2), we give a data structure of optimal size 2n + o(n) bits and query time O(1). All data structures can be constructed in linear time and almost inplace.