Colored Range Queries and Document Retrieval

Cited by 17 (9 self)
Colored range queries are a well-studied topic in computational geometry and database research that, in the past decade, has found exciting applications in information retrieval. In this paper we give improved time and space bounds for three important one-dimensional colored range queries (colored range listing, colored range top-k queries and colored range counting) and, thus, new bounds for various document retrieval problems on general collections of sequences. Specifically, we first describe a framework that includes almost all recent results on colored range listing and document listing and suggests new combinations of data structures for these problems. For example, we give the fastest compressed data structures for colored range listing and document listing, and an efficient data structure for document listing whose size is bounded in terms of the high-order entropies of the library of documents. We then show how (approximate) colored top-k queries can be reduced to (approximate) range-mode queries on subsequences, yielding the first efficient data structure for this problem. Finally, we show how a modified wavelet tree can support colored range counting in logarithmic time and space that is succinct whenever the number of colors is superpolylogarithmic in the length of the sequence.
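For readers new to colored range listing, the sketch below shows the classic reduction usually credited to Muthukrishnan, which is only a baseline and not this paper's framework. For each position i store prev[i], the previous position holding the same color; the distinct colors of A[l..r] are exactly the positions i in [l, r] with prev[i] < l, and they can be enumerated with recursive range-minimum queries on prev. The names `build_prev`, `SparseTableArgMin` and `list_colors` are hypothetical, and a plain sparse table stands in for a succinct RMQ structure.

```python
from typing import Dict, List


def build_prev(colors: List[int]) -> List[int]:
    """prev[i] = largest j < i with colors[j] == colors[i], or -1 if there is none."""
    last: Dict[int, int] = {}
    prev: List[int] = []
    for i, c in enumerate(colors):
        prev.append(last.get(c, -1))
        last[c] = i
    return prev


class SparseTableArgMin:
    """Static range-minimum structure: O(n log n) words, O(1)-time argmin queries."""

    def __init__(self, values: List[int]):
        self.values = values
        n = len(values)
        self.table: List[List[int]] = [list(range(n))]  # level 0: ranges of length 1
        k = 1
        while (1 << k) <= n:
            below, half = self.table[-1], 1 << (k - 1)
            self.table.append([self._better(below[i], below[i + half])
                               for i in range(n - (1 << k) + 1)])
            k += 1

    def _better(self, a: int, b: int) -> int:
        return a if self.values[a] <= self.values[b] else b

    def argmin(self, lo: int, hi: int) -> int:
        """Index of a minimum of values[lo..hi], inclusive."""
        k = (hi - lo + 1).bit_length() - 1
        return self._better(self.table[k][lo], self.table[k][hi - (1 << k) + 1])


def list_colors(colors: List[int], prev: List[int],
                rmq: SparseTableArgMin, l: int, r: int) -> List[int]:
    """Report each distinct color of colors[l..r] once: exactly the positions
    i in [l, r] with prev[i] < l, found by recursive range-minimum queries."""
    out: List[int] = []
    stack = [(l, r)]
    while stack:
        lo, hi = stack.pop()
        if lo > hi:
            continue
        m = rmq.argmin(lo, hi)
        if prev[m] >= l:  # every color in [lo, hi] already occurs earlier in [l, r]
            continue
        out.append(colors[m])
        stack.extend([(lo, m - 1), (m + 1, hi)])
    return out
```

For example, with colors = [1, 2, 1, 3, 2, 1], listing positions 1..4 reports the colors 1, 2 and 3 once each. Every failed RMQ call is charged to a reported color, so the work is proportional to the output size plus one.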
New lower and upper bounds for representing sequences
 CoRR

Cited by 9 (8 self)
Abstract. Sequence representations supporting the queries access, select and rank are at the core of many data structures. There is a considerable gap between the various known upper bounds and the few known lower bounds for such representations, and in how they interact with the space used. In this article we prove a strong lower bound for rank, which holds under rather permissive assumptions on the space used, and give matching upper bounds that require only a compressed representation of the sequence. Within this compressed space, the operations access and select can be solved in almost-constant time.
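To make the three queries concrete, here is a minimal, uncompressed baseline (an assumption for illustration, not the representation from this article): cumulative symbol counts are sampled every `block` positions, so rank scans at most one block and select binary-searches the samples before scanning one block. The class name `SampledSequence` and its parameters are hypothetical; positions are 0-based and select takes a 1-based occurrence number.

```python
from typing import Dict, List, Sequence


class SampledSequence:
    """Uncompressed baseline for access/rank/select on a sequence: cumulative
    symbol counts are sampled every `block` positions."""

    def __init__(self, seq: Sequence[int], block: int = 4):
        self.seq = list(seq)
        self.block = block
        self.samples: List[Dict[int, int]] = []  # samples[j]: counts in seq[0 : j*block]
        counts: Dict[int, int] = {}
        for i in range(len(self.seq) + 1):
            if i % block == 0:
                self.samples.append(dict(counts))
            if i < len(self.seq):
                counts[self.seq[i]] = counts.get(self.seq[i], 0) + 1
        self.total = counts

    def access(self, i: int) -> int:
        return self.seq[i]

    def rank(self, c: int, i: int) -> int:
        """Occurrences of c in seq[0:i], for 0 <= i <= len(seq)."""
        j = i // self.block
        return self.samples[j].get(c, 0) + self.seq[j * self.block:i].count(c)

    def select(self, c: int, k: int) -> int:
        """0-based position of the k-th (1-based) occurrence of c, or -1 if none."""
        if k < 1 or self.total.get(c, 0) < k:
            return -1
        lo, hi = 0, len(self.samples) - 1  # find the last sample with fewer than k c's
        while lo < hi:
            mid = (lo + hi + 1) // 2
            if self.samples[mid].get(c, 0) < k:
                lo = mid
            else:
                hi = mid - 1
        seen = self.samples[lo].get(c, 0)
        for p in range(lo * self.block, len(self.seq)):
            if self.seq[p] == c:
                seen += 1
                if seen == k:
                    return p
        return -1
```

The block size trades the space of the samples against the length of the scan, which is the kind of space/time interaction the lower bound above is about.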
Cell-Probe Lower Bounds for Succinct Partial Sums
, 2009

Cited by 8 (1 self)
The partial sums problem in succinct data structures asks to preprocess an array $A[1..n]$ of bits into a data structure using as close to $n$ bits as possible, and answer queries of the form $\mathrm{Rank}(k) = \sum_{i=1}^{k} A[i]$. The problem has been intensely studied, and features as a subroutine in a number of succinct data structures. We show that, if we answer $\mathrm{Rank}(k)$ queries by probing $t$ cells of $w$ bits, then the space of the data structure must be at least $n + n/w^{O(t)}$ bits. This redundancy/probe tradeoff is essentially optimal: Patrascu [FOCS’08] showed how to achieve $n + n/(w/t)^{\Omega(t)}$ bits. We also extend our lower bound to the closely related Select queries, and to the case of sparse arrays.
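As a reference point for the redundancy being bounded, the sketch below is the textbook one-level rank structure (an illustrative assumption, not the optimal construction cited above): bits are packed into w-bit words and one cumulative count is stored per word, so $\mathrm{Rank}(k)$ costs one stored count plus one popcount, at a price of roughly $(n/w)\lg n$ extra bits. The word size `WORD = 64` and the class name `RankBitVector` are hypothetical.

```python
from typing import List

WORD = 64  # assumed cell size w, in bits


class RankBitVector:
    """Bits packed into w-bit words plus one cumulative 1-count per word."""

    def __init__(self, bits: List[int]):
        self.n = len(bits)
        self.words: List[int] = []
        self.before: List[int] = []  # before[j] = number of 1s in words 0..j-1
        total = 0
        for start in range(0, self.n, WORD):
            word = 0
            for offset, b in enumerate(bits[start:start + WORD]):
                if b:
                    word |= 1 << offset  # bit A[start + offset + 1] (1-based)
            self.before.append(total)
            self.words.append(word)
            total += bin(word).count("1")
        self.ones = total

    def rank(self, k: int) -> int:
        """Rank(k) = A[1] + ... + A[k], for 0 <= k <= n."""
        if k == self.n:
            return self.ones
        j, r = divmod(k, WORD)
        mask = (1 << r) - 1  # keep the first r bits of word j
        return self.before[j] + bin(self.words[j] & mask).count("1")
```

Each query here touches two cells (one count, one word). The lower bound says that driving the redundancy much below $n/w^{O(t)}$ bits forces more probes.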
Improved grammar-based compressed indexes
 In Proc. 19th SPIRE, LNCS 7608
, 2012

Cited by 3 (2 self)
Abstract. We introduce the first grammar-compressed representation of a sequence that supports searches in time that depends only logarithmically on the size of the grammar. Given a text $T[1..u]$ that is represented by a (context-free) grammar of $n$ (terminal and nonterminal) symbols and size $N$ (measured as the sum of the lengths of the right-hand sides of the rules), a basic grammar-based representation of $T$ takes $N\lg n$ bits of space. Our representation requires $2N\lg n + N\lg u + \epsilon\, n\lg n + o(N\lg n)$ bits of space, for any $0 < \epsilon \le 1$. It can find the positions of the $occ$ occurrences of a pattern of length $m$ in $T$ in $O\big((m^2/\epsilon)\lg(\lg u/\lg n) + (m + occ)\lg n\big)$ time, and extract any substring of length $\ell$ of $T$ in time $O(\ell + h\lg(N/h))$, where $h$ is the height of the grammar tree.
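The extraction operation can be illustrated with a naive sketch, assuming an acyclic grammar stored as a dictionary from each nonterminal to its right-hand side (a hypothetical format; any symbol that is not a key is a terminal character). Storing the expansion length of every nonterminal lets extraction descend only into children that overlap the requested range. This simple descent does not reach the $O(\ell + h\lg(N/h))$ bound, which needs the paper's additional machinery; the function names are illustrative.

```python
from typing import Dict, List

Grammar = Dict[str, List[str]]  # nonterminal -> right-hand side; other symbols are terminals


def expansion_lengths(rules: Grammar) -> Dict[str, int]:
    """Length of the text each nonterminal expands to (the grammar must be acyclic)."""
    lengths: Dict[str, int] = {}

    def length(sym: str) -> int:
        if sym not in rules:
            return 1  # a terminal expands to itself
        if sym not in lengths:
            lengths[sym] = sum(length(child) for child in rules[sym])
        return lengths[sym]

    for nonterminal in rules:
        length(nonterminal)
    return lengths


def extract(rules: Grammar, lengths: Dict[str, int], sym: str,
            start: int, stop: int, out: List[str]) -> None:
    """Append characters start..stop-1 (0-based) of sym's expansion to out,
    descending only into children whose expansions overlap that range."""
    if sym not in rules:
        out.append(sym)
        return
    offset = 0
    for child in rules[sym]:
        child_len = lengths.get(child, 1)
        lo, hi = max(start - offset, 0), min(stop - offset, child_len)
        if lo < hi:
            extract(rules, lengths, child, lo, hi, out)
        offset += child_len
        if offset >= stop:
            break  # later children lie entirely after the range


def substring(rules: Grammar, lengths: Dict[str, int], root: str, start: int, stop: int) -> str:
    out: List[str] = []
    if start < stop:
        extract(rules, lengths, root, start, stop, out)
    return "".join(out)
```

With the toy grammar S → X Y, X → a b, Y → X X c, the text is "abababc" and extracting positions 2 to 4 yields "aba" without expanding the first X.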
Efficient Fully-Compressed Sequence Representations
, 2010

Cited by 2 (2 self)
We present a data structure that stores a sequence $s[1..n]$ over alphabet $[1..\sigma]$ in $nH_0(s) + o(n)(H_0(s)+1)$ bits, where $H_0(s)$ is the zero-order entropy of $s$. This structure supports the queries access, rank and select, which are fundamental building blocks for many other compressed data structures, in worst-case time $O(\lg\lg\sigma)$ and average time $O(\lg H_0(s))$. The worst-case complexity matches the best previous results, yet these had been achieved with data structures using $nH_0(s) + o(n\lg\sigma)$ bits. On highly compressible sequences the $o(n\lg\sigma)$ bits of redundancy may be significant compared to the $nH_0(s)$ bits that encode the data. Our representation, instead, compresses the redundancy as well. Moreover, our average-case complexity is unprecedented. Our technique is based on partitioning the alphabet into groups of characters of similar frequency. The subsequence corresponding to each group can then be encoded using fast uncompressed representations without harming the overall compression ratios, even in the redundancy. The result also improves upon the best current compressed representations of several other data structures. For example, we achieve (i) compressed redundancy, retaining the best time complexities, for the smallest existing full-text self-indexes; (ii) compressed permutations $\pi$ with times for $\pi()$ and $\pi^{-1}()$ improved to log-logarithmic; and (iii) the first compressed representation of dynamic collections of disjoint sets. We also point out various applications to inverted indexes, suffix arrays, binary relations, and data compressors. Our structure is practical on large alphabets. Our experiments show that, as predicted by theory, it dominates the space/time tradeoff map of all the sequence representations, both in synthetic and application scenarios.
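The alphabet-partitioning idea can be sketched as a query decomposition: a class sequence K records, for each position, the group of its symbol, and each group keeps the subsequence of s restricted to its symbols. Rank, access and select on s then reduce to one query on K and one on a subsequence. The grouping rule below (class $\lfloor\lg(n/n_c)\rfloor$, so frequencies within a group differ by less than a factor of 2) is an assumption chosen for illustration and may not match the paper's exact rule. The real structure also renumbers symbols inside each group and uses fast representations instead of the naive scans used here; the names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def rank_in(seq: Sequence[int], c: int, i: int) -> int:
    """Occurrences of c in seq[0:i] (naive scan standing in for a fast rank)."""
    return sum(1 for x in seq[:i] if x == c)


def select_in(seq: Sequence[int], c: int, j: int) -> int:
    """0-based position of the j-th (1-based) occurrence of c (naive scan)."""
    seen = 0
    for p, x in enumerate(seq):
        if x == c:
            seen += 1
            if seen == j:
                return p
    raise ValueError("fewer than j occurrences")


class AlphabetPartitioned:
    def __init__(self, s: Sequence[int]):
        n = len(s)
        freq = Counter(s)
        # Assumed grouping rule: symbol c goes to class floor(lg(n / n_c)).
        self.cls: Dict[int, int] = {c: math.floor(math.log2(n / f)) for c, f in freq.items()}
        self.classes: List[int] = [self.cls[x] for x in s]  # class sequence K
        self.subs: Dict[int, List[int]] = {}  # subsequence of s for each class
        for x in s:
            self.subs.setdefault(self.cls[x], []).append(x)

    def access(self, i: int) -> int:
        k = self.classes[i]
        return self.subs[k][rank_in(self.classes, k, i)]

    def rank(self, c: int, i: int) -> int:
        if c not in self.cls:
            return 0
        k = self.cls[c]
        return rank_in(self.subs[k], c, rank_in(self.classes, k, i))

    def select(self, c: int, j: int) -> int:
        k = self.cls[c]
        return select_in(self.classes, k, select_in(self.subs[k], c, j) + 1)
```

Because symbols in one group have similar frequencies, storing each subsequence with a plain fixed-width code loses little space, which is the intuition the abstract gives for keeping even the redundancy compressed.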
Succincter
We can represent an array of $n$ values from $\{0, 1, 2\}$ using $\lceil n\log_2 3\rceil$ bits (arithmetic coding), but then we cannot retrieve a single element efficiently. Instead, we can encode every block of $t$ elements using $\lceil t\log_2 3\rceil$ bits, and bound the retrieval time by $t$. This gives a linear tradeoff between the redundancy of the representation and the query time. In fact, this type of linear tradeoff is ubiquitous in known succinct data structures and in data compression. The folk wisdom is that if we want to waste one bit per block, the encoding is so constrained that it cannot help the query in any way. Thus, the only thing a query can do is read the entire block and unpack it. We break this limitation and show how to use recursion to improve redundancy. It turns out that if a block is encoded with two (!) bits of redundancy, we can decode a single element, and answer many other interesting queries, in time logarithmic in the block size. Our technique allows us to revisit classic problems in succinct data structures and give surprising new upper bounds. We also construct a locally decodable version of arithmetic coding.
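The block encoding described above (the baseline, not the recursive scheme of this paper) can be sketched with $t = 5$: since $3^5 = 243 \le 256$, each block fits in $\lceil 5\log_2 3\rceil = 8$ bits, one byte, wasting about $8 - 7.925 \approx 0.075$ bits per block. Retrieving an element unpacks only its block, with up to $t$ divisions. The names `T`, `encode` and `get` are hypothetical.

```python
from typing import List

T = 5  # trits per block: 3**5 = 243 values fit in ceil(5 * log2(3)) = 8 bits


def encode(trits: List[int]) -> bytes:
    """Pack each block of T base-3 digits into one byte, digit 0 least significant."""
    out = bytearray()
    for start in range(0, len(trits), T):
        value = 0
        for d in reversed(trits[start:start + T]):  # Horner's rule in base 3
            value = value * 3 + d
        out.append(value)
    return bytes(out)


def get(encoded: bytes, i: int) -> int:
    """Element i, obtained by unpacking only its block (at most T - 1 divisions)."""
    value = encoded[i // T]
    for _ in range(i % T):
        value //= 3
    return value % 3
```

Making blocks longer lowers the amortized waste but lengthens retrieval, which is the linear tradeoff the abstract says this paper breaks.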
Succinct Sampling from Discrete Distributions
We revisit the classic problem of sampling from a discrete distribution: given $n$ nonnegative $w$-bit integers $x_1, \ldots, x_n$, the task is to build a data structure that allows sampling $i$ with probability proportional to $x_i$. The classic solution is Walker's alias method, which, when implemented on a Word RAM, takes $O(n)$ preprocessing time, $O(1)$ expected query time for one sample, and $n(w + 2\lg n + o(1))$ bits of space. Using the terminology of succinct data structures, this solution has redundancy $2n\lg n + o(n)$ bits, i.e., it uses $2n\lg n + o(n)$ bits in addition to the information-theoretic minimum required for storing the input. In this paper, we study whether this space usage can be improved. In the systematic case, in which the input is read-only, we present a novel data structure using $r + O(w)$ redundant ...
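The classic baseline named above can be sketched as follows, using Vose's formulation of Walker's alias method with exact integer arithmetic (a presentation choice of this sketch, not something the abstract specifies). Column i keeps i with probability prob[i]/total and otherwise returns alias[i]; these two extra arrays of n entries are the redundancy the paper tries to reduce. The function names are hypothetical.

```python
import random
from typing import List, Tuple


def build_alias(x: List[int]) -> Tuple[List[int], List[int], int]:
    """Alias tables for weights x (nonnegative ints, positive sum)."""
    n, total = len(x), sum(x)
    scaled = [xi * n for xi in x]  # compare against total instead of 1.0
    prob, alias = [total] * n, list(range(n))
    small = [i for i in range(n) if scaled[i] < total]
    large = [i for i in range(n) if scaled[i] >= total]
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s], alias[s] = scaled[s], l  # column s: keep s w.p. scaled[s]/total, else l
        scaled[l] -= total - scaled[s]    # l donated the rest of column s
        (small if scaled[l] < total else large).append(l)
    return prob, alias, total  # leftover columns keep prob == total


def sample(prob: List[int], alias: List[int], total: int, rng=random) -> int:
    """Draw i with probability x[i] / sum(x): one uniform column, one biased coin."""
    i = rng.randrange(len(prob))
    return i if rng.randrange(total) < prob[i] else alias[i]
```

Integer weights avoid floating-point rounding, so the probability mass assigned to each index matches $x_i/\sum_j x_j$ exactly.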
IBM Almaden
, 2009
The rank problem in succinct data structures asks to preprocess an array $A[1..n]$ of bits into a data structure using as close to $n$ bits as possible, and answer queries of the form $\mathrm{Rank}(k) = \sum_{i=1}^{k} A[i]$. The problem has been intensely studied, and features as a subroutine in a majority of succinct data structures. We show that in the cell-probe model with $w$-bit cells, if rank takes $t$ time, the space of the data structure must be at least $n + n/w^{O(t)}$ bits. This redundancy/query tradeoff is essentially optimal, matching our upper bound from [FOCS’08].
Entropy-Bounded Representation of Point Grids
We give the first fully compressed representation of a set of $m$ points on an $n \times n$ grid, taking $H + o(H)$ bits of space, where $H = \lg\binom{n^2}{m}$ is the entropy of the set. This representation supports range counting, range reporting, and point selection queries, with complexities that go from $O(1)$ to $O(\lg^2 n/\lg\lg n)$ per answer as the entropy of the grid decreases. Operating within entropy-bounded space, as well as relating time complexity to entropy, opens a new line of research on an otherwise well-studied area. Keywords: Compressed data structures, geometric grids, range queries.
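The space target $H = \lg\binom{n^2}{m}$ can be evaluated directly, which helps show how far it can fall below a plain list of coordinates. The comparison baseline here (two $\lceil\lg n\rceil$-bit coordinates per point) is an assumption chosen for illustration, and the function names are hypothetical.

```python
import math


def grid_entropy_bits(n: int, m: int) -> float:
    """H = lg C(n^2, m): bits needed to tell apart all m-point subsets of an n x n grid."""
    return math.log2(math.comb(n * n, m))


def coordinate_list_bits(n: int, m: int) -> int:
    """Plain alternative: store each point as two ceil(lg n)-bit coordinates."""
    return 2 * m * math.ceil(math.log2(n))
```

For a completely full grid ($m = n^2$) the entropy is 0 bits, while the coordinate list still needs $2n^2\lceil\lg n\rceil$ bits. That gap is why the abstract measures space, and ties query time, to $H$.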
Compact Binary Relation Representations with Rich Functionality
Binary relations are an important abstraction arising in many data representation problems. The data structures proposed so far to represent them support just a few basic operations required to fit one particular application. We identify many of those operations arising in applications and generalize them into a wide set of desirable queries for a binary relation representation. We also identify reductions among those operations. We then introduce several novel binary relation representations, some simple and some quite sophisticated, that not only are space-efficient but also efficiently support a large subset of the desired queries. Keywords: Structures.
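One well-known way to reduce binary-relation queries to sequence queries is to list the labels of each object, object by object, in a string S and to record the object boundaries (which a succinct version would keep as a bitmap). This is shown here as an illustrative assumption, not necessarily one of this paper's representations. The sketch uses plain lists and binary search where a compact version would use rank/select on S and on the boundary bitmap; the class and method names are hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, Iterable, List, Tuple


class BinaryRelation:
    """Pairs (object, label) stored as a label string S plus object boundaries."""

    def __init__(self, num_objects: int, pairs: Iterable[Tuple[int, int]]):
        by_object: List[List[int]] = [[] for _ in range(num_objects)]
        for obj, label in pairs:
            by_object[obj].append(label)
        self.S: List[int] = []         # labels, listed object by object
        self.offsets: List[int] = [0]  # S[offsets[o]:offsets[o+1]] are object o's labels
        for labels in by_object:
            self.S.extend(sorted(labels))
            self.offsets.append(len(self.S))
        self.positions: Dict[int, List[int]] = {}  # label -> its positions in S (select on S)
        for p, label in enumerate(self.S):
            self.positions.setdefault(label, []).append(p)

    def labels_of(self, obj: int) -> List[int]:
        return self.S[self.offsets[obj]:self.offsets[obj + 1]]

    def objects_of(self, label: int) -> List[int]:
        # The object owning position p is the last boundary <= p.
        return [bisect_right(self.offsets, p) - 1 for p in self.positions.get(label, [])]

    def count_in_range(self, label: int, o1: int, o2: int) -> int:
        """Number of pairs (o, label) with o1 <= o <= o2 (a rank difference on S)."""
        ps = self.positions.get(label, [])
        return bisect_left(ps, self.offsets[o2 + 1]) - bisect_left(ps, self.offsets[o1])
```

Object-side queries become slices of S, while label-side queries become rank and select on S, which is how a representation can serve both directions from a single sequence.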