Results 1 - 10
of
20
Succinct indexes for strings, binary relations and multi-labeled trees
- In Procs ACM-SIAM SODA, 2007 (this volume
, 2007
"... We define and design succinct indexes for several abstract data types (ADTs). The concept is to design auxiliary data structures called succinct indexes that occupy asymptotically less space than the information-theoretic lower bound on the space required to encode the given data, and support an ext ..."
Abstract
-
Cited by 30 (6 self)
- Add to MetaCart
We define and design succinct indexes for several abstract data types (ADTs). The concept is to design auxiliary data structures called succinct indexes that occupy asymptotically less space than the information-theoretic lower bound on the space required to encode the given data, and support an extended set of operations using the basic operators defined in the ADT. As opposed to succinct encodings, The main advantage of succinct indexes is that we make assumptions only on the ADT through which the main data is accessed, rather than the way in which the data is encoded. This allows more freedom in the encoding of the main data. In this paper, we present succinct indexes for various data types, namely strings, binary relations and multi-labeled trees. Given the support for the interface of the ADTs of these data types, we can support various useful operations efficiently by constructing succinct indexes for them. When the operators in the ADTs are supported in constant time, our results are comparable to previous results, while allowing more flexibility in the encoding of the given data. Using our techniques, we design the first succinct encoding that represents a string of length n over alphabet [σ] using nHk + o(n lg σ) bits 1 that support access / rank / select operations in o((lg lg σ) 3) time. We also design the first succinct text index using nHk + o(n lg σ) bits that supports pattern matching queries in O(m lg lg σ + occ lg 1+ɛ n lg lg σ) time, for a given pattern of length m. Previous results on these two problems either have a lg σ factor instead of lg lg σ in terms of running time, or are not compressible, but our results do not have such problems. More results are reported in the paper. 1 We use lg n to denote ⌈log2 n⌉. 1
Compressed representations of permutations, and applications
- SYMPOSIUM ON THEORETICAL ASPECTS OF COMPUTER SCIENCE
"... We explore various techniques to compress a permutation π over n integers, taking advantage of ordered subsequences in π, while supporting its application π(i) and the application of its inverse π −1 (i) in small time. Our compression schemes yield several interesting byproducts, in many cases mat ..."
Abstract
-
Cited by 12 (8 self)
- Add to MetaCart
We explore various techniques to compress a permutation π over n integers, taking advantage of ordered subsequences in π, while supporting its application π(i) and the application of its inverse π −1 (i) in small time. Our compression schemes yield several interesting byproducts, in many cases matching, improving or extending the best existing results on applications such as the encoding of a permutation in order to support iterated applications π k (i) of it, of integer functions, and of inverted lists and suffix arrays.
A Fast and Compact Web Graph Representation
"... Compressed graphs representation has become an attractive research topic because of its applications in the manipulation of huge Web graphs in main memory. By far the best current result is the technique by Boldi and Vigna, which takes advantage of several particular properties of Web graphs. In t ..."
Abstract
-
Cited by 11 (10 self)
- Add to MetaCart
Compressed graphs representation has become an attractive research topic because of its applications in the manipulation of huge Web graphs in main memory. By far the best current result is the technique by Boldi and Vigna, which takes advantage of several particular properties of Web graphs. In this paper we show that the same properties can be exploited with a different and elegant technique, built on Re-Pair compression, which achieves about the same space but much faster navigation of the graph. Moreover, the technique has the potential of adapting well to secondary memory. In addition, we introduce an approximate Re-Pair version that works efficiently with limited main memory.
Succinct Orthogonal Range Search Structures on a Grid with Applications to Text Indexing ⋆
"... Abstract. We present a succinct representation of a set of n points on an n × n grid using n lg n + o(nlg n) bits 3 to support orthogonal range counting in O(lg n / lg lg n) time, and range reporting in O(k lg n/lg lg n) time, where k is the size of the output. This achieves an improvement on query ..."
Abstract
-
Cited by 10 (0 self)
- Add to MetaCart
Abstract. We present a succinct representation of a set of n points on an n × n grid using n lg n + o(nlg n) bits 3 to support orthogonal range counting in O(lg n / lg lg n) time, and range reporting in O(k lg n/lg lg n) time, where k is the size of the output. This achieves an improvement on query time by a factor of lg lg n upon the previous result of Mäkinen and Navarro [15], while using essentially the information-theoretic minimum space. Our data structure not only can be used as a key component in solutions to the general orthogonal range search problem to save storage cost, but also has applications in text indexing. In particular, we apply it to improve two previous space-efficient text indexes that support substring search [7] and position-restricted substring search [15]. We also use it to extend previous results on succinct representations of sequences of small integers, and to design succinct data structures supporting certain types of orthogonal range query in the plane. 1
Improved dynamic rank-select entropy-bound structures
- in Proc. of the Latin American Theoretical Informatics (LATIN
"... Abstract. Operations rank and select over a sequence of symbols have many applications to the design of succinct and compressed data structures to manage text collections, structured text, binary relations, trees, graphs, and so on. We are interested in the case where the collections can be updated ..."
Abstract
-
Cited by 9 (2 self)
- Add to MetaCart
Abstract. Operations rank and select over a sequence of symbols have many applications to the design of succinct and compressed data structures to manage text collections, structured text, binary relations, trees, graphs, and so on. We are interested in the case where the collections can be updated via insertions and deletions of symbols. Two current solutions stand out as the best in the tradeoff of space versus time (considering all the operations). One by Mäkinen and Navarro achieves compressed space (i.e., nH0 + o(n log σ) bits) and O(log nlog σ) worst-case time for all the operations, where n is the sequence length, σ is the alphabet size, and H0 is the zero-order entropy of the sequence. The other log σ log log n solution, by Lee and Park, achieves O(log n(1 +)) amortized time and uncompressed space, i.e. nlog σ +O(n)+o(nlog σ) bits. In this paper we show that the best of both worlds can be achieved. We log σ combine the solutions to obtain nH0+o(nlog σ) bits of space and O(log n(1+)) worst-case time log log n for all the operations. Apart from the best current solution, we obtain some byproducts that might be
Compact Rich-Functional Binary Relation Representations
"... Abstract. Binary relations are an important abstraction arising in a number of data representation problems. Each existing data structure specializes in the few basic operations required by one single application, and takes only limited advantage of the inherent redundancy of binary relations. We sh ..."
Abstract
-
Cited by 8 (7 self)
- Add to MetaCart
Abstract. Binary relations are an important abstraction arising in a number of data representation problems. Each existing data structure specializes in the few basic operations required by one single application, and takes only limited advantage of the inherent redundancy of binary relations. We show how to support more general operations efficiently, while taking better advantage of some forms of redundancy in practical instances. As a basis for a more general discussion on binary relation data structures, we list the operations of potential interest for practical applications, and give reductions between operations. We identify a set of operations that yield the support of all others. As a first contribution to the discussion, we present two data structures for binary relations, each of which achieves a distinct tradeoff between the space used to store and index the relation, the set of operations supported in sublinear time, and the time in which those operations are supported. The experimental performance of our data structures shows that they not only offer good time complexities to carry out many operations, but also take advantage of regularities that arise in practical instances in order to reduce space usage. 1
Self-indexed text compression using straight-line programs
- In Proc. 34th MFCS
, 2009
"... Abstract. Straight-line programs (SLPs) offer powerful text compression by representing a text T [1, u] in terms of a restricted context-free grammar of n rules, so that T can be recovered in O(u) time. However, the problem of operating the grammar in compressed form has not been studied much. We pr ..."
Abstract
-
Cited by 6 (4 self)
- Add to MetaCart
Abstract. Straight-line programs (SLPs) offer powerful text compression by representing a text T [1, u] in terms of a restricted context-free grammar of n rules, so that T can be recovered in O(u) time. However, the problem of operating the grammar in compressed form has not been studied much. We present a grammar representation whose size is of the same order of that of a plain SLP representation, and can answer other queries apart from expanding nonterminals. This can be of independent interest. We then extend it to achieve the first grammar representation able of extracting text substrings, and of searching the text for patterns, in time o(n). We also give byproducts on representing binary relations. 1 Introduction and Related Work Grammar-based compression is a well-known technique since at least the seventies, and still a very active area of research. From the different variants of the idea, we focus on the case where a given text T [1, u] is replaced by a context-free grammar (CFG) G that generates just the string T. Then one can store G instead
Extended Compact Web Graph Representations
"... Abstract. Many relevant Web mining tasks translate into classical algorithms on the Web graph. Compact Web graph representations allow running these tasks on larger graphs within main memory. These representations at least provide fast navigation (to the neighbors of a node), yet more sophisticated ..."
Abstract
-
Cited by 6 (6 self)
- Add to MetaCart
Abstract. Many relevant Web mining tasks translate into classical algorithms on the Web graph. Compact Web graph representations allow running these tasks on larger graphs within main memory. These representations at least provide fast navigation (to the neighbors of a node), yet more sophisticated operations are desirable for several Web analyses. We present a compact Web graph representation that, in addition, supports reverse navigation (to the nodes pointing to the given one). The standard approach to achieve this is to represent the graph and its transpose, which basically doubles the space requirement. Our structure, instead, represents the adjacency list using a compact sequence representation that allows finding the positions where a given node v is mentioned, and answers reverse navigation using that primitive. This is combined with a previous proposal based on grammar compression of the adjacency list. The combination yields interesting algorithmic problems. As a result, we achieve the smallest graph representation reported in the
Alphabet Partitioning for Compressed Rank/Select and Applications
"... Abstract. We present a data structure that stores a string s[1..n] over the alphabet [1..σ] in nH0(s) + o(n)(H0(s)+1) bits, where H0(s) is the zero-order entropy of s. This data structure supports the queries access and rank in time O (lg lg σ), and the select query in constant time. This result imp ..."
Abstract
-
Cited by 6 (5 self)
- Add to MetaCart
Abstract. We present a data structure that stores a string s[1..n] over the alphabet [1..σ] in nH0(s) + o(n)(H0(s)+1) bits, where H0(s) is the zero-order entropy of s. This data structure supports the queries access and rank in time O (lg lg σ), and the select query in constant time. This result improves on previously known data structures using nH0(s) + o(n lg σ) bits, where on highly compressible instances the redundancy o(n lg σ) cease to be negligible compared to the nH0(s) bits that encode the data. The technique is based on combining previous results through an ingenious partitioning of the alphabet, and practical enough to be implementable. It applies not only to strings, but also to several other compact data structures. For example, we achieve (i) faster search times and lower redundancy for the smallest existing full-text self-index; (ii) compressed permutations π with times for π() and π −1 () improved to log-logarithmic; and (iii) the first compressed representation of dynamic collections of disjoint sets. 1
Rank/Select on Dynamic Compressed Sequences and Applications ⋆
"... Operations rank and select over a sequence of symbols have many applications to the design of succinct and compressed data structures managing text collections, structured text, binary relations, trees, graphs, and so on. We are interested in the case where the collections can be updated via inserti ..."
Abstract
-
Cited by 5 (2 self)
- Add to MetaCart
Operations rank and select over a sequence of symbols have many applications to the design of succinct and compressed data structures managing text collections, structured text, binary relations, trees, graphs, and so on. We are interested in the case where the collections can be updated via insertions and deletions of symbols. Two current solutions stand out as the best in the tradeoff of space versus time (when considering all the operations). One solution, by Mäkinen and Navarro, achieves compressed space (i.e., nH0 +o(n log σ) bits) and O(log n log σ) worst-case time for all the operations, where n is the sequence length, σ is the alphabet size, and H0 is the zero-order entropy of the sequence. The other solution, by Lee and log σ Park, achieves O(log n(1 + log log n)) amortized time and uncompressed space, i.e. n log2 σ +O(n)+o(n log σ) bits. In this paper we show that the best of both worlds can be achieved. We combine the solutions to obtain nH0 + o(n log σ) bits of space log σ log log n and O(log n(1 +)) worst-case time for all the operations. Apart from the best current solution to the problem, we obtain several byproducts of independent interest applicable to partial sums, text indexes, suffix arrays, the Burrows-Wheeler transform, and others.

