Results 1  10
of
12
Practical EntropyCompressed Rank/Select Dictionary
 PROCEEDINGS OF ALENEX’07, ACM
, 2007
"... Rank/Select dictionaries are data structures for an ordered set S ⊂ {0, 1,..., n − 1} to compute rank(x, S) (the number of elements in S which are no greater than x), and select(i, S) (the ith smallest element in S), which are the fundamental components of succinct data structures of strings, trees ..."
Abstract

Cited by 49 (1 self)
 Add to MetaCart
Rank/Select dictionaries are data structures for an ordered set S ⊂ {0, 1,..., n − 1} to compute rank(x, S) (the number of elements in S which are no greater than x), and select(i, S) (the ith smallest element in S), which are the fundamental components of succinct data structures of strings, trees, graphs, etc. In those data structures, however, only asymptotic behavior has been considered and their performance for real data is not satisfactory. In this paper, we propose novel four Rank/Select dictionaries, esp, recrank, vcode and sdarray, each of which is small if the number of elements in S is small, and indeed close to nH0(S) (H0(S) ≤ 1 is the zeroth order empirical entropy of S) in practice, and its query time is superior to the previous ones. Experimental results reveal the characteristics of our data structures and also show that these data structures are superior to existing implementations in both size and query time.
A simple storage scheme for strings achieving entropy bounds
, 2007
"... We propose a storage scheme for a string S[1, n], drawn from an alphabet Σ, that requires space close to the kth order empirical entropy of S, and allows to retrieve any ℓlong substring of S in optimal O(1 + ..."
Abstract

Cited by 33 (6 self)
 Add to MetaCart
We propose a storage scheme for a string S[1, n], drawn from an alphabet Σ, that requires space close to the kth order empirical entropy of S, and allows to retrieve any ℓlong substring of S in optimal O(1 +
CellProbe Lower Bounds for Succinct Partial Sums
, 2009
"... The partial sums problem in succinct data structures asks to preprocess an array A[1.. n] of bits into a data structure using as close to n bits as possible, and answer queries of the form Rank(k) = ∑ k A[i]. The problem i=1 has been intensely studied, and features as a subroutine in a number of s ..."
Abstract

Cited by 8 (1 self)
 Add to MetaCart
The partial sums problem in succinct data structures asks to preprocess an array A[1.. n] of bits into a data structure using as close to n bits as possible, and answer queries of the form Rank(k) = ∑ k A[i]. The problem i=1 has been intensely studied, and features as a subroutine in a number of succinct data structures. We show that, if we answer Rank(k) queries by probing t cells of w bits, then the space of the data structure must be at least n+n/wO(t) bits. This redundancy/probe tradeoff is essentially optimal: Patrascu [FOCS’08] showed how to achieve n + n / (w/t) Ω(t) bits. We also extend our lower bound to the closely related Select queries, and to the case of sparse arrays.
A Compressed Text Index on Secondary Memory
"... Abstract. We introduce a practical diskbased compressed text index that, when the text is compressible, takes much less space than the suffix array. It provides very good I/O times for searching, which in particular improve when the text is compressible. In this aspect our index is unique, as compr ..."
Abstract

Cited by 8 (1 self)
 Add to MetaCart
Abstract. We introduce a practical diskbased compressed text index that, when the text is compressible, takes much less space than the suffix array. It provides very good I/O times for searching, which in particular improve when the text is compressible. In this aspect our index is unique, as compressed indexes have been slower than their classical counterparts on secondary memory. We analyze our index and show experimentally that it is extremely competitive on compressible texts. 1 Introduction and Related Work Compressed fulltext selfindexing [22] is a recent trend that builds on the discovery that traditional text indexes like suffix trees and suffix arrays can be compacted to take space proportional to the compressed text size, and moreover be able to reproduce any text context. Therefore selfindexes replace the text,
MORE HASTE, LESS WASTE: LOWERING THE REDUNDANCY IN FULLY INDEXABLE DICTIONARIES
, 2009
"... We consider the problem of representing, in a compressed format, a bitvector S of m bits with n 1s, supporting the following operations, where b ∈ {0,1}: • rankb(S, i) returns the number of occurrences of bit b in the prefix S [1..i]; • selectb(S, i) returns the position of the ith occurrence of bi ..."
Abstract

Cited by 5 (0 self)
 Add to MetaCart
We consider the problem of representing, in a compressed format, a bitvector S of m bits with n 1s, supporting the following operations, where b ∈ {0,1}: • rankb(S, i) returns the number of occurrences of bit b in the prefix S [1..i]; • selectb(S, i) returns the position of the ith occurrence of bit b in S. Such a data structure is called fully indexable dictionary (fid) [Raman, Raman, and Rao, 2007], and is at least as powerful as predecessor data structures. Viewing S as a set X = {x1, x2,..., xn} of n distinct integers drawn from a universe [m] = {1,..., m}, the predecessor of integer y ∈ [m] in X is given by select1(S,rank1(S, y − 1)). fids have many applications in succinct and compressed data structures, as they are often involved in the construction of succinct representation for a variety of abstract data types. Our focus is on spaceefficient fids on the ram model with word size Θ(lg m) and constant time for all operations, so that the time cost is independent of the input size. Given the bitstring S to be encoded, having length m and containing n ones, the minimal amount of information that needs to be stored is B(n, m) = ⌈log ` ´ m ⌉. The n state of the art in building a fid for S is given in [Pǎtra¸scu, 2008] using B(m, n) + O(m/((log m/t) t)) + O(m 3/4) bits, to support the operations in O(t) time. Here, we propose a parametric data structure exhibiting a time/space tradeoff such that, for any real constants 0 < δ ≤ 1/2, 0 < ε ≤ 1, and integer s> 0, it uses
Upper and Lower Bounds for Text Indexing Data Structures
"... c○Alexander Golynski 2007I hereby declare that I am the sole author of this thesis. This is a true copy of the thesis, including any required final revisions, as accepted by my examiners. I understand that my thesis may be made electronically available to the public. (Alexander Golynski) The main go ..."
Abstract

Cited by 2 (1 self)
 Add to MetaCart
c○Alexander Golynski 2007I hereby declare that I am the sole author of this thesis. This is a true copy of the thesis, including any required final revisions, as accepted by my examiners. I understand that my thesis may be made electronically available to the public. (Alexander Golynski) The main goal of this thesis is to investigate the complexity of a variety of problems related to text indexing and text searching. We present new data structures that can be used as building blocks for fulltext indices which occupies minute space (FMindexes) and wavelet trees. These data structures also can be used to represent labeled trees and posting lists. Labeled trees are applied in XML documents, and posting lists in search engines. The main emphasis of this thesis is on lower bounds for timespace tradeoffs for the following problems: the rank/select problem, the problem of representing a string of balanced parentheses, the text retrieval problem, the problem of computing a permutation and its inverse, and the problem of representing a binary relation. These results are divided in two groups: lower bounds in the cell probe model and lower bounds in the indexing model.
Succincter
"... We can represent an array of n values from {0, 1, 2} using ⌈n log 2 3 ⌉ bits (arithmetic coding), but then we cannot retrieve a single element efficiently. Instead, we can encode every block of t elements using ⌈t log 2 3 ⌉ bits, and bound the retrieval time by t. This gives a linear tradeoff betwe ..."
Abstract
 Add to MetaCart
We can represent an array of n values from {0, 1, 2} using ⌈n log 2 3 ⌉ bits (arithmetic coding), but then we cannot retrieve a single element efficiently. Instead, we can encode every block of t elements using ⌈t log 2 3 ⌉ bits, and bound the retrieval time by t. This gives a linear tradeoff between the redundancy of the representation and the query time. In fact, this type of linear tradeoff is ubiquitous in known succinct data structures, and in data compression. The folk wisdom is that if we want to waste one bit per block, the encoding is so constrained that it cannot help the query in any way. Thus, the only thing a query can do is to read the entire block and unpack it. We break this limitation and show how to use recursion to improve redundancy. It turns out that if a block is encoded with two (!) bits of redundancy, we can decode a single element, and answer many other interesting queries, in time logarithmic in the block size. Our technique allows us to revisit classic problems in succinct data structures, and give surprising new upper bounds. We also construct a locallydecodable version of arithmetic coding.