Fully-functional static and dynamic succinct trees. CoRR abs/0905.0768. http://arxiv.org/abs/0905.0768. Version 4, 2010
Cited by 14 (9 self)
Abstract
We propose new succinct representations of ordinal trees, which have been studied extensively. It is known that any n-node static tree can be represented in 2n + o(n) bits and that various operations on the tree can be supported in constant time under the word-RAM model. However, the data structures are complicated and difficult to dynamize. We propose a simple and flexible data structure, called the range min-max tree, that reduces the large number of relevant tree operations considered in the literature to a few primitives, which are carried out in constant time on sufficiently small trees. The result is extended to trees of arbitrary size, achieving 2n + O(n/polylog(n)) bits of space. The redundancy is significantly lower than in any previous proposal. For the dynamic case, where insertion and deletion of nodes are allowed, the existing data structures support very limited operations. Our data structure builds on the range min-max tree to achieve 2n + O(n/log n) bits of space and O(log n) time for all the operations. We also propose an improved data structure that uses 2n + O(n log log n / log n) bits and improves the time to O(log n / log log n) for most operations.
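To make the reduction to "a few primitives" concrete, here is a minimal Python sketch of one such primitive, a forward search for a target excess on a balanced-parentheses (BP) sequence, with findclose as a special case. It assumes a simplified one-level layout: the sequence is cut into blocks of a hypothetical size `block`, and each block stores its net excess and its minimum prefix excess. The range min-max tree described above arranges such summaries in a tree (with lookup tables for small blocks) so that skipping blocks is fast; here they are scanned linearly, and the names `FlatMinMax`, `fwd_search` and `find_close` are illustrative only.

```python
from itertools import accumulate
from typing import List


class FlatMinMax:
    """One-level simplification of a range min-max tree over a BP string.

    Hypothetical layout: block summaries live in a flat list and are scanned
    left to right; a real range min-max tree stores them in a balanced tree.
    """

    def __init__(self, bp: str, block: int = 4):
        # '(' adds +1 to the excess, ')' adds -1.
        self.vals: List[int] = [1 if c == "(" else -1 for c in bp]
        self.b = block
        self.tot: List[int] = []   # net excess change of each block
        self.mins: List[int] = []  # minimum prefix excess inside each block
        for s in range(0, len(self.vals), block):
            prefix = list(accumulate(self.vals[s : s + block]))
            self.tot.append(prefix[-1])
            self.mins.append(min(prefix))

    def excess(self, i: int) -> int:
        # Naive prefix sum for clarity: (#'(' - #')') in bp[0..i].
        return sum(self.vals[: i + 1])

    def fwd_search(self, i: int, d: int) -> int:
        """Smallest j > i with excess(j) == excess(i) + d, for d < 0; -1 if none."""
        assert d < 0, "this sketch only handles downward searches"
        cur = self.excess(i)
        target = cur + d
        # 1) Finish the block that contains i.
        k = i // self.b + 1
        for j in range(i + 1, min(k * self.b, len(self.vals))):
            cur += self.vals[j]
            if cur == target:
                return j
        # 2) Skip whole blocks whose minimum stays above the target.
        while k < len(self.tot) and cur + self.mins[k] > target:
            cur += self.tot[k]
            k += 1
        if k == len(self.tot):
            return -1
        # 3) The answer lies in block k; the excess moves by +-1, so it hits target exactly.
        j = k * self.b
        while True:
            cur += self.vals[j]
            if cur == target:
                return j
            j += 1

    def find_close(self, i: int) -> int:
        # The ')' matching the '(' at i is where the excess first drops by one.
        return self.fwd_search(i, -1)


tree = FlatMinMax("(()(()))()", block=4)
print(tree.find_close(0), tree.find_close(3))  # 7 6
```

Other navigation operations (enclose, parent, subtree size, and so on) are expressed in terms of searches like this one; the point of the sketch is only that block summaries let the search skip whole blocks without reading their bits.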
Fully-functional succinct trees
In Proc. 21st SODA, 2010
Cited by 13 (6 self)
Abstract
We propose new succinct representations of ordinal trees, which have been studied extensively. It is known that any n-node static tree can be represented in 2n + o(n) bits and that a large number of operations on the tree can be supported in constant time under the word-RAM model. However, existing data structures are not satisfactory in either theory or practice because (1) the lower-order term is Ω(n log log n / log n), which cannot be neglected in practice, (2) the hidden constant is also large, (3) the data structures are complicated and difficult to implement, and (4) the techniques do not extend to dynamic trees supporting insertions and deletions of nodes. We propose a simple and flexible data structure, called the range min-max tree, that reduces the large number of relevant tree operations considered in the literature to a few primitives, which are carried out in constant time on sufficiently small trees. The result is then extended to trees of arbitrary size, achieving 2n + O(n/polylog(n)) bits of space. The redundancy is significantly lower than in any previous proposal, and the data structure is easily implemented. Furthermore, using the same framework, we derive the first fully-functional dynamic succinct trees.
Vitter: On Searching Compressed String Collections Cache-Obliviously
PODS
Cited by 7 (1 self)
Abstract
Current data structures for searching large string collections either fail to achieve minimum space or cause too many cache misses. In this paper we discuss some edge linearizations of the classic trie data structure that are simultaneously cache-friendly and compressed. We provide new insights on front coding [24], introduce other novel linearizations, and study how close their space occupancy is to the information-theoretic minimum. The moral is that they are not just heuristics. Our second contribution is a novel dictionary encoding scheme that builds upon such linearizations and achieves nearly optimal space, offers competitive I/O-search time, and is also conscious of the query distribution. Finally, we combine those data structures with cache-oblivious tries [2, 5] and obtain a succinct variant whose space is close to the information-theoretic minimum.
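Front coding, the starting point of the abstract above, stores each string of a sorted list as the length of the prefix it shares with its predecessor plus the remaining suffix. The Python sketch below is the textbook version with a hypothetical `bucket` parameter: every `bucket`-th string is kept in full, so decoding string i only replays its own bucket. It is not the paper's improved linearization or its cache-oblivious layout, and the function names are illustrative.

```python
from typing import List, Tuple


def lcp(a: str, b: str) -> int:
    """Length of the longest common prefix of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def front_encode(words: List[str], bucket: int = 4) -> List[Tuple[int, str]]:
    """Encode each word as (length shared with the previous word, rest of the word).

    Every `bucket`-th word is stored in full (shared length 0), so decoding
    can restart at a bucket boundary instead of at the first word.
    """
    out: List[Tuple[int, str]] = []
    prev = ""
    for i, w in enumerate(words):
        k = 0 if i % bucket == 0 else lcp(prev, w)
        out.append((k, w[k:]))
        prev = w
    return out


def front_decode_at(enc: List[Tuple[int, str]], i: int, bucket: int = 4) -> str:
    """Recover word i by replaying the entries of its bucket up to i."""
    word = ""
    for k, suffix in enc[(i // bucket) * bucket : i + 1]:
        word = word[:k] + suffix
    return word


words = ["alpha", "alphabet", "alphanumeric", "alpine", "beta", "betamax"]
enc = front_encode(words, bucket=4)
print(enc[:4])                  # [(0, 'alpha'), (5, 'bet'), (5, 'numeric'), (3, 'ine')]
print(front_decode_at(enc, 3))  # alpine
```

A larger bucket saves space (fewer strings stored in full) at the cost of longer replays per lookup, which is the space/time tension the paper analyzes against the information-theoretic minimum.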
Succinct Trees in Practice
Cited by 6 (6 self)
Abstract
We implement and compare the major current techniques for representing general trees in succinct form. This is important because a general tree of n nodes is usually represented in pointer form, requiring O(n log n) bits, whereas the succinct representations we study require just 2n + o(n) bits and carry out many sophisticated operations in constant time. Yet, there is no exhaustive study in the literature comparing the practical magnitudes of the o(n)-space and the O(1)-time terms. The techniques can be classified into three broad trends: those based on BP (balanced parentheses in preorder), those based on DFUDS (depth-first unary degree sequence), and those based on LOUDS (level-ordered unary degree sequence). BP and DFUDS require a balanced parentheses representation that supports the core operations
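The three families of encodings named above can be compared on a small tree. The Python sketch below assumes a hypothetical input format (a dictionary from node id to its ordered list of children, with node 0 as the root) and produces the BP, DFUDS and LOUDS sequences. BP and DFUDS each use 2n parentheses; for LOUDS the sketch writes bits with the common "10" super-root prefix, giving 2n + 1 bits (conventions for this prefix vary).

```python
from collections import deque
from typing import Dict, List

Tree = Dict[int, List[int]]  # node id -> ordered children (hypothetical input format)


def bp(t: Tree, v: int = 0) -> str:
    # BP: '(' when the preorder traversal enters v, ')' when it leaves v.
    return "(" + "".join(bp(t, c) for c in t.get(v, [])) + ")"


def dfuds(t: Tree, root: int = 0) -> str:
    # DFUDS: in preorder, each node writes its degree d in unary as '(' * d + ')'.
    # The extra leading '(' makes the whole sequence balanced.
    out = ["("]
    stack = [root]
    while stack:
        v = stack.pop()
        kids = t.get(v, [])
        out.append("(" * len(kids) + ")")
        stack.extend(reversed(kids))  # leftmost child is popped first
    return "".join(out)


def louds(t: Tree, root: int = 0) -> str:
    # LOUDS: in level order, each node writes '1' * d + '0';
    # the "10" prefix encodes a virtual super-root.
    out = ["10"]
    queue = deque([root])
    while queue:
        v = queue.popleft()
        kids = t.get(v, [])
        out.append("1" * len(kids) + "0")
        queue.extend(kids)
    return "".join(out)


tree = {0: [1, 2, 3], 1: [4, 5], 3: [6]}  # 7 nodes, root 0
print(bp(tree))     # ((()())()(()))
print(dfuds(tree))  # (((()(())))())
print(louds(tree))  # 101110110010000
```

The sequences differ in which operations map to simple primitives: in BP a subtree is a contiguous parenthesized range, in DFUDS the children of a node are described contiguously, and in LOUDS nodes of the same level are adjacent.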
Cell-Probe Lower Bounds for Succinct Partial Sums, 2009
Cited by 4 (1 self)
Abstract
The partial sums problem in succinct data structures asks to preprocess an array A[1..n] of bits into a data structure using as close to n bits as possible, and to answer queries of the form $\mathrm{Rank}(k) = \sum_{i=1}^{k} A[i]$. The problem has been intensely studied, and features as a subroutine in a number of succinct data structures. We show that, if we answer Rank(k) queries by probing t cells of w bits, then the space of the data structure must be at least $n + n/w^{O(t)}$ bits. This redundancy/probe trade-off is essentially optimal: Patrascu [FOCS’08] showed how to achieve $n + n/(w/t)^{\Omega(t)}$ bits. We also extend our lower bound to the closely related Select queries, and to the case of sparse arrays.
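For intuition about what "redundancy" means here, the Python sketch below shows the simplest upper-bound layout: the n bits are stored as-is, plus a cumulative count every `block` bits, so Rank(k) reads one sample and at most `block` - 1 bits. With samples of about log n bits each, this costs roughly (n / block) log n extra bits, far from the optimal trade-off discussed above. The class name and the `block` parameter are hypothetical, and Select is answered by binary search over Rank rather than by a dedicated index.

```python
from typing import List


class SampledRank:
    """Plain bit array plus a prefix count every `block` bits (1-indexed queries)."""

    def __init__(self, bits: List[int], block: int = 8):
        self.bits = bits
        self.block = block
        self.samples = [0]  # samples[q] = number of ones in A[1..q*block]
        count = 0
        for pos, bit in enumerate(bits, start=1):
            count += bit
            if pos % block == 0:
                self.samples.append(count)

    def rank(self, k: int) -> int:
        """Rank(k) = A[1] + ... + A[k], for 0 <= k <= n."""
        q = k // self.block
        # One sample, then scan the (< block) bits A[q*block+1 .. k].
        return self.samples[q] + sum(self.bits[q * self.block : k])

    def select(self, j: int) -> int:
        """Position of the j-th one (1 <= j <= number of ones), via binary search."""
        lo, hi = 1, len(self.bits)
        while lo < hi:
            mid = (lo + hi) // 2
            if self.rank(mid) >= j:
                hi = mid
            else:
                lo = mid + 1
        return lo


A = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0]
r = SampledRank(A, block=4)
print(r.rank(8), r.select(5))  # 4 9
```

Shrinking `block` lowers the number of bits scanned per query but raises the space spent on samples; the lower bound above quantifies how far any scheme can push this trade-off.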
Upper and Lower Bounds for Text Indexing Data Structures
Cited by 1 (1 self)
Abstract
© Alexander Golynski 2007. The main goal of this thesis is to investigate the complexity of a variety of problems related to text indexing and text searching. We present new data structures that can be used as building blocks for full-text indexes in minute space (FM-indexes) and wavelet trees. These data structures can also be used to represent labeled trees and posting lists; labeled trees arise in XML documents, and posting lists in search engines. The main emphasis of this thesis is on lower bounds for time-space trade-offs for the following problems: the rank/select problem, the problem of representing a string of balanced parentheses, the text retrieval problem, the problem of computing a permutation and its inverse, and the problem of representing a binary relation. These results are divided into two groups: lower bounds in the cell probe model and lower bounds in the indexing model.
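One of the problems listed, representing a permutation π so that both π(i) and π⁻¹(i) can be answered, has a well-known simple upper bound that helps frame the time/space lower bounds: store π explicitly, plus a back pointer at every t-th element of each cycle that jumps t steps backwards. Then π⁻¹(i) needs O(t) evaluations of π, for roughly (n/t) log n extra bits. The Python sketch below illustrates this shortcut idea under a hypothetical parameter `t`; it is not taken from the thesis, and the class name is illustrative.

```python
from typing import Dict, List


class PermWithInverse:
    """pi stored explicitly, plus a back pointer on every t-th element of each
    cycle; the pointer jumps t steps backwards along the cycle."""

    def __init__(self, pi: List[int], t: int = 3):
        self.pi = pi
        self.back: Dict[int, int] = {}
        seen = [False] * len(pi)
        for start in range(len(pi)):
            if seen[start]:
                continue
            cycle = []
            x = start
            while not seen[x]:
                seen[x] = True
                cycle.append(x)
                x = pi[x]
            if len(cycle) > t:  # short cycles are simply walked in full
                for idx in range(0, len(cycle), t):
                    self.back[cycle[idx]] = cycle[(idx - t) % len(cycle)]

    def inverse(self, i: int) -> int:
        """Return x with pi[x] == i using O(t) steps along i's cycle."""
        x, jumped = i, False
        while True:
            if self.pi[x] == i:
                return x
            if not jumped and x in self.back:
                x, jumped = self.back[x], True  # leap t steps backwards once
            else:
                x = self.pi[x]


p = PermWithInverse([2, 0, 4, 5, 1, 3, 7, 8, 9, 6], t=2)
print(p.inverse(0))  # 1
```

Walking forward from i reaches a marked element within t steps; its back pointer lands at most t steps before the predecessor of i, so the total walk stays O(t).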
MORE HASTE, LESS WASTE: LOWERING THE REDUNDANCY IN FULLY INDEXABLE DICTIONARIES, 2009
Cited by 1 (0 self)
Abstract
We consider the problem of representing, in a compressed format, a bitvector S of m bits with n 1s, supporting the following operations, where b ∈ {0,1}: • rank_b(S, i) returns the number of occurrences of bit b in the prefix S[1..i]; • select_b(S, i) returns the position of the ith occurrence of bit b in S. Such a data structure is called a fully indexable dictionary (fid) [Raman, Raman, and Rao, 2007], and is at least as powerful as predecessor data structures. Viewing S as a set X = {x_1, x_2, ..., x_n} of n distinct integers drawn from a universe [m] = {1, ..., m}, the predecessor of integer y ∈ [m] in X is given by select_1(S, rank_1(S, y − 1)). fids have many applications in succinct and compressed data structures, as they are often involved in the construction of succinct representations for a variety of abstract data types. Our focus is on space-efficient fids on the ram model with word size Θ(lg m) and constant time for all operations, so that the time cost is independent of the input size. Given the bitstring S to be encoded, having length m and containing n ones, the minimal amount of information that needs to be stored is $B(n, m) = \lceil \log \binom{m}{n} \rceil$. The state of the art in building a fid for S is given in [Pătraşcu, 2008], using $B(n, m) + O(m/((\log m)/t)^t) + O(m^{3/4})$ bits to support the operations in O(t) time. Here, we propose a parametric data structure exhibiting a time/space trade-off such that, for any real constants 0 < δ ≤ 1/2, 0 < ε ≤ 1, and integer s > 0, it uses
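The predecessor identity and the information-theoretic bound quoted above can be checked on a toy instance. The Python sketch below uses a deliberately naive, uncompressed bit array (hypothetical class `NaiveFID`, with no o(m)-bit index), so rank and select take linear time; it only demonstrates the semantics. Note that select_1(S, rank_1(S, y − 1)) yields the largest element strictly smaller than y; when no such element exists, the sketch returns None, which is an assumed convention. `info_bound` computes B(n, m) with log taken base 2, exactly, using integer arithmetic.

```python
from math import comb
from typing import Iterable, Optional


def info_bound(n: int, m: int) -> int:
    """B(n, m) = ceil(log2 C(m, n)); (x - 1).bit_length() is ceil(log2 x) for x >= 1."""
    return (comb(m, n) - 1).bit_length()


class NaiveFID:
    """Bitvector S[1..m] with the ones at the positions in `xs` (1-indexed)."""

    def __init__(self, xs: Iterable[int], m: int):
        self.S = [0] * (m + 1)  # S[0] is unused padding
        for x in xs:
            self.S[x] = 1

    def rank1(self, i: int) -> int:
        # Number of ones in S[1..i].
        return sum(self.S[1 : i + 1])

    def select1(self, j: int) -> int:
        # Position of the j-th one (j >= 1).
        if j < 1:
            raise ValueError("j must be at least 1")
        seen = 0
        for pos in range(1, len(self.S)):
            seen += self.S[pos]
            if seen == j:
                return pos
        raise ValueError("fewer than j ones")

    def predecessor(self, y: int) -> Optional[int]:
        # Largest x in X with x < y, via select1(rank1(y - 1)).
        r = self.rank1(y - 1)
        return self.select1(r) if r > 0 else None


fid = NaiveFID([3, 7, 8, 15], m=16)
print(fid.predecessor(8), fid.predecessor(16), info_bound(4, 16))  # 7 15 11
```

Here the set {3, 7, 8, 15} from a universe of 16 needs at least 11 bits, while the plain bitvector spends 16; a fid aims to stay close to the former while keeping rank and select fast.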
Wavelet Trees for All
Cited by 1 (0 self)
Abstract
The wavelet tree is a versatile data structure that serves a number of purposes, from string processing to geometry. It can be regarded as a device that represents a sequence, a reordering, or a grid of points. In addition, its space adapts to various entropy measures of the data it encodes, enabling compressed representations. New competitive solutions to a number of problems, based on wavelet trees, are appearing every year. In this survey we give an overview of wavelet trees and the surprising number of applications in which we have found them useful: basic and weighted point grids, sets of rectangles, strings, permutations, binary relations, graphs, inverted indexes, document retrieval indexes, full-text indexes, XML indexes, and general numeric sequences.
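To make the "sequence" view concrete, the Python sketch below builds a plain, pointer-based wavelet tree over integer symbols in a range [lo, hi] and answers access and rank by descending one level per halving of the alphabet, using prefix counts of each node's bitmap. Bitmaps are uncompressed lists, so none of the entropy-adaptive space bounds mentioned above apply; the class and method names are hypothetical.

```python
from typing import List


class WaveletTree:
    """Pointer-based wavelet tree over integer symbols in [lo, hi].

    Each internal node keeps one bit per element of its subsequence:
    0 if the symbol falls in the lower half of the node's range, 1 otherwise.
    Bitmaps are plain lists with prefix counts (no compression).
    """

    def __init__(self, seq: List[int], lo: int, hi: int):
        self.lo, self.hi = lo, hi
        if lo == hi:
            return  # leaf: every element here is the symbol lo
        self.mid = (lo + hi) // 2
        self.bits = [1 if c > self.mid else 0 for c in seq]
        self.ones = [0]  # ones[i] = number of 1s in bits[:i]
        for b in self.bits:
            self.ones.append(self.ones[-1] + b)
        self.left = WaveletTree([c for c in seq if c <= self.mid], lo, self.mid)
        self.right = WaveletTree([c for c in seq if c > self.mid], self.mid + 1, hi)

    def access(self, i: int) -> int:
        """Return seq[i] (0-indexed)."""
        node = self
        while node.lo != node.hi:
            if node.bits[i]:
                i, node = node.ones[i], node.right  # 1s before i
            else:
                i, node = i - node.ones[i], node.left  # 0s before i
        return node.lo

    def rank(self, c: int, i: int) -> int:
        """Number of occurrences of symbol c (lo <= c <= hi) in seq[:i]."""
        node = self
        while node.lo != node.hi:
            if c > node.mid:
                i, node = node.ones[i], node.right
            else:
                i, node = i - node.ones[i], node.left
        return i


seq = [3, 1, 4, 1, 5, 2, 6, 5, 3, 5]
wt = WaveletTree(seq, 1, 6)
print(wt.access(4), wt.rank(5, len(seq)))  # 5 3
```

Both queries touch one node per level, so they take time proportional to the logarithm of the alphabet size; the grid and permutation views in the survey reuse the same descent.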
An Alternative to Arithmetic Coding with Local Decodability, 2010
Abstract
We describe a simple but powerful local encoding technique, implying two surprising results: 1. We show how to represent a vector of n values from Σ using $\lceil n \log_2 \Sigma \rceil$ bits, such that reading or writing any entry takes O(1) time. This demonstrates, for instance, an “equivalence” between decimal and binary computers, and has been a central toy problem in the field of succinct data structures. Previous solutions required space of $n \log_2 \Sigma + n/\lg^{O(1)} n$ bits for constant access. 2. Given a stream of n bits arriving online (for any n, not known in advance), we can output a prefix-free encoding that uses $n + \log_2 n + O(\lg \lg n)$ bits. The encoding and decoding algorithms only require O(lg n) bits of memory, and run in constant time per word. This result is interesting in cryptographic applications, as prefix-free codes are the simplest counter-measure to extension attacks on hash functions, message authentication codes and pseudorandom functions. Our result refutes a conjecture of [Maurer, Sjödin 2005] on the hardness of online prefix-free encodings.
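For contrast with result 2, here is the textbook offline way to make a bit string prefix-free: prepend the Elias gamma code of its length. It needs the length n up front and spends about n + 2 log₂ n bits, which is the kind of overhead the online encoding above improves on. The Python sketch is that standard baseline, not the paper's construction, and the function names are hypothetical.

```python
from typing import Tuple


def elias_gamma(x: int) -> str:
    """Elias gamma code of x >= 1: (len - 1) zeros, then x in binary."""
    b = bin(x)[2:]
    return "0" * (len(b) - 1) + b


def prefix_free_encode(bits: str) -> str:
    """Length-prefixed encoding: gamma(len + 1) followed by the payload."""
    return elias_gamma(len(bits) + 1) + bits


def prefix_free_decode(code: str, pos: int = 0) -> Tuple[str, int]:
    """Decode one message starting at code[pos]; return (payload, next position)."""
    z = 0
    while code[pos + z] == "0":
        z += 1
    n = int(code[pos + z : pos + 2 * z + 1], 2) - 1
    start = pos + 2 * z + 1
    return code[start : start + n], start + n


stream = "".join(prefix_free_encode(m) for m in ["101", "", "0110"])
pos, out = 0, []
while pos < len(stream):
    msg, pos = prefix_free_decode(stream, pos)
    out.append(msg)
print(out)  # ['101', '', '0110']
```

Because no codeword is a prefix of another, concatenated messages decode unambiguously; the price is the roughly 2 log₂ n header bits, and the need to know n before writing anything.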
Succincter
Abstract
We can represent an array of n values from {0, 1, 2} using $\lceil n \log_2 3 \rceil$ bits (arithmetic coding), but then we cannot retrieve a single element efficiently. Instead, we can encode every block of t elements using $\lceil t \log_2 3 \rceil$ bits, and bound the retrieval time by t. This gives a linear trade-off between the redundancy of the representation and the query time. In fact, this type of linear trade-off is ubiquitous in known succinct data structures, and in data compression. The folk wisdom is that if we want to waste one bit per block, the encoding is so constrained that it cannot help the query in any way. Thus, the only thing a query can do is to read the entire block and unpack it. We break this limitation and show how to use recursion to improve redundancy. It turns out that if a block is encoded with two (!) bits of redundancy, we can decode a single element, and answer many other interesting queries, in time logarithmic in the block size. Our technique allows us to revisit classic problems in succinct data structures, and give surprising new upper bounds. We also construct a locally-decodable version of arithmetic coding.
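The block-based baseline described at the start of the abstract (not the paper's recursive scheme) can be sketched in Python as follows: each block of t trits is stored as one base-3 integer below 3^t, which fits in ⌈t log₂ 3⌉ bits, and reading one element means unpacking its block. The helper names, and the choice to charge a full block for the final, possibly shorter, block, are assumptions of this sketch.

```python
from typing import List


def pack_trits(vals: List[int], t: int) -> List[int]:
    """Store each block of t values from {0, 1, 2} as one integer in [0, 3**t)."""
    blocks = []
    for s in range(0, len(vals), t):
        code = 0
        for v in reversed(vals[s : s + t]):  # Horner's rule: digit j has weight 3**j
            code = code * 3 + v
        blocks.append(code)
    return blocks


def get_trit(blocks: List[int], t: int, i: int) -> int:
    """Read vals[i] by extracting base-3 digit (i mod t) of its block."""
    return (blocks[i // t] // 3 ** (i % t)) % 3


def block_bits(n: int, t: int) -> int:
    """Total space: ceil(n/t) blocks of ceil(t * log2 3) bits each (computed exactly)."""
    return -(-n // t) * (3**t - 1).bit_length()


vals = [2, 0, 1, 1, 2, 2, 0, 1, 0, 2]
print([get_trit(pack_trits(vals, 3), 3, i) for i in range(len(vals))] == vals)  # True
print(block_bits(1000, 1), block_bits(1000, 5), (3**1000 - 1).bit_length())  # 2000 1600 1585
```

For n = 1000 trits, one trit per block costs 2000 bits, blocks of five cost 1600, and arithmetic coding of the whole array needs 1585: the redundancy shrinks as t grows, but so does the cost of unpacking a block, which is the linear trade-off the paper breaks.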

