Results 1 - 10
of
23
Compressed representations of permutations, and applications
- SYMPOSIUM ON THEORETICAL ASPECTS OF COMPUTER SCIENCE
"... We explore various techniques to compress a permutation π over n integers, taking advantage of ordered subsequences in π, while supporting its application π(i) and the application of its inverse π −1 (i) in small time. Our compression schemes yield several interesting byproducts, in many cases mat ..."
Abstract
-
Cited by 12 (8 self)
- Add to MetaCart
We explore various techniques to compress a permutation π over n integers, taking advantage of ordered subsequences in π, while supporting its application π(i) and the application of its inverse π −1 (i) in small time. Our compression schemes yield several interesting byproducts, in many cases matching, improving or extending the best existing results on applications such as the encoding of a permutation in order to support iterated applications π k (i) of it, of integer functions, and of inverted lists and suffix arrays.
A Fast and Compact Web Graph Representation
"... Compressed graphs representation has become an attractive research topic because of its applications in the manipulation of huge Web graphs in main memory. By far the best current result is the technique by Boldi and Vigna, which takes advantage of several particular properties of Web graphs. In t ..."
Abstract
-
Cited by 11 (10 self)
- Add to MetaCart
Compressed graphs representation has become an attractive research topic because of its applications in the manipulation of huge Web graphs in main memory. By far the best current result is the technique by Boldi and Vigna, which takes advantage of several particular properties of Web graphs. In this paper we show that the same properties can be exploited with a different and elegant technique, built on Re-Pair compression, which achieves about the same space but much faster navigation of the graph. Moreover, the technique has the potential of adapting well to secondary memory. In addition, we introduce an approximate Re-Pair version that works efficiently with limited main memory.
Faster Entropy-Bounded Compressed Suffix Trees
, 2009
"... Suffix trees are among the most important data structures in stringology, with a number of applications in flourishing areas like bioinformatics. Their main problem is space usage, which has triggered much research striving for compressed representations that are still functional. A smaller suffix t ..."
Abstract
-
Cited by 10 (7 self)
- Add to MetaCart
Suffix trees are among the most important data structures in stringology, with a number of applications in flourishing areas like bioinformatics. Their main problem is space usage, which has triggered much research striving for compressed representations that are still functional. A smaller suffix tree representation could fit in a faster memory, outweighing by far the theoretical slowdown brought by the space reduction. We present a novel compressed suffix tree, which is the first achieving at the same time sublogarithmic complexity for the operations, and space usage that asymptotically goes to zero as the entropy of the text does. The main ideas in our development are compressing the longest common prefix information, totally getting rid of the suffix tree topology, and expressing all the suffix tree operations using range minimum queries and a novel primitive called next/previous smaller value in a sequence. Our solutions to those operations are of independent interest.
Directly Addressable Variable-Length Codes ⋆
"... Abstract. We introduce a symbol reordering technique that implicitly synchronizes variable-length codes, such that it is possible to directly access the i-th codeword without need of any sampling method. The technique is practical and has many applications to the representation of ordered sets, spar ..."
Abstract
-
Cited by 6 (5 self)
- Add to MetaCart
Abstract. We introduce a symbol reordering technique that implicitly synchronizes variable-length codes, such that it is possible to directly access the i-th codeword without need of any sampling method. The technique is practical and has many applications to the representation of ordered sets, sparse bitmaps, partial sums, and compressed data structures for suffix trees, arrays, and inverted indexes, to name just a few. We show experimentally that the technique offers a competitive alternative to other data structures that handle this problem. 1
Succinct Trees in Practice
"... We implement and compare the major current techniques for representing general trees in succinct form. This is important because a general tree of n nodes is usually represented in pointer form, requiring O(n log n) bits, whereas the succinct representations we study require just 2n + o(n) bits and ..."
Abstract
-
Cited by 6 (6 self)
- Add to MetaCart
We implement and compare the major current techniques for representing general trees in succinct form. This is important because a general tree of n nodes is usually represented in pointer form, requiring O(n log n) bits, whereas the succinct representations we study require just 2n + o(n) bits and carry out many sophisticated operations in constant time. Yet, there is no exhaustive study in the literature comparing the practical magnitudes of the o(n)-space and the O(1)-time terms. The techniques can be classified into three broad trends: those based on BP (balanced parentheses in preorder), those based on DFUDS (depth-first unary degree sequence), and those based on LOUDS (level-ordered unary degree sequence). BP and DFUDS require a balanced parentheses representation that supports the core operations
Extended Compact Web Graph Representations
"... Abstract. Many relevant Web mining tasks translate into classical algorithms on the Web graph. Compact Web graph representations allow running these tasks on larger graphs within main memory. These representations at least provide fast navigation (to the neighbors of a node), yet more sophisticated ..."
Abstract
-
Cited by 6 (6 self)
- Add to MetaCart
Abstract. Many relevant Web mining tasks translate into classical algorithms on the Web graph. Compact Web graph representations allow running these tasks on larger graphs within main memory. These representations at least provide fast navigation (to the neighbors of a node), yet more sophisticated operations are desirable for several Web analyses. We present a compact Web graph representation that, in addition, supports reverse navigation (to the nodes pointing to the given one). The standard approach to achieve this is to represent the graph and its transpose, which basically doubles the space requirement. Our structure, instead, represents the adjacency list using a compact sequence representation that allows finding the positions where a given node v is mentioned, and answers reverse navigation using that primitive. This is combined with a previous proposal based on grammar compression of the adjacency list. The combination yields interesting algorithmic problems. As a result, we achieve the smallest graph representation reported in the
Alphabet Partitioning for Compressed Rank/Select and Applications
"... Abstract. We present a data structure that stores a string s[1..n] over the alphabet [1..σ] in nH0(s) + o(n)(H0(s)+1) bits, where H0(s) is the zero-order entropy of s. This data structure supports the queries access and rank in time O (lg lg σ), and the select query in constant time. This result imp ..."
Abstract
-
Cited by 6 (5 self)
- Add to MetaCart
Abstract. We present a data structure that stores a string s[1..n] over the alphabet [1..σ] in nH0(s) + o(n)(H0(s)+1) bits, where H0(s) is the zero-order entropy of s. This data structure supports the queries access and rank in time O (lg lg σ), and the select query in constant time. This result improves on previously known data structures using nH0(s) + o(n lg σ) bits, where on highly compressible instances the redundancy o(n lg σ) cease to be negligible compared to the nH0(s) bits that encode the data. The technique is based on combining previous results through an ingenious partitioning of the alphabet, and practical enough to be implementable. It applies not only to strings, but also to several other compact data structures. For example, we achieve (i) faster search times and lower redundancy for the smallest existing full-text self-index; (ii) compressed permutations π with times for π() and π −1 () improved to log-logarithmic; and (iii) the first compressed representation of dynamic collections of disjoint sets. 1
Practical Compressed Document Retrieval ⋆
"... Abstract. Recent research on document retrieval for general texts has established the virtues of explicitly representing the so-called document array, which stores the document each pointer of the suffix array belongs to. While it makes document retrieval faster, this array occupies a significative ..."
Abstract
-
Cited by 5 (4 self)
- Add to MetaCart
Abstract. Recent research on document retrieval for general texts has established the virtues of explicitly representing the so-called document array, which stores the document each pointer of the suffix array belongs to. While it makes document retrieval faster, this array occupies a significative amount of redundant space and is not easily compressible. In this paper we present the first practical proposal to compress the document array. We show that the resulting structureis significatively smaller than the uncompressed counterpart, and than alternatives to the document array proposed in the literature. We also compare various known algorithms for document listing and top-k retrieval, and find that the most useful combinations of algorithms run over our new compressed document arrays. 1
A New Point Access Method based on Wavelet Trees
, 2009
"... The development of index structures that allow efficient retrieval of spatial objects has been a topic of interest in the last decades. Most of these structures have been designed for secondary memory. However, in the last years the price of memory has decreased drastically. Nowadays it is feasible ..."
Abstract
-
Cited by 3 (3 self)
- Add to MetaCart
The development of index structures that allow efficient retrieval of spatial objects has been a topic of interest in the last decades. Most of these structures have been designed for secondary memory. However, in the last years the price of memory has decreased drastically. Nowadays it is feasible to place complete spatial indexes in main memory. In this paper we focus in a subcategory of spatial indexes named Point Access Methods. These indexes are designed to solve the problem of indexing points. We present a new index structure designed for two dimensions and main memory that keeps a good trade-off between the space needed to store the index and its search efficiency. Our structure is based on a wavelet tree, which was originally designed to represent sequences, but has been successfully used as an index in areas like information retrieval or image compression.
Practical Compressed Suffix Trees
"... Abstract. The suffix tree is an extremely important data structure for stringology, with a wealth of applications in bioinformatics. Classical implementations require much space, which renders them useless for large problems. Recent research has yielded two implementations offering widely different ..."
Abstract
-
Cited by 3 (0 self)
- Add to MetaCart
Abstract. The suffix tree is an extremely important data structure for stringology, with a wealth of applications in bioinformatics. Classical implementations require much space, which renders them useless for large problems. Recent research has yielded two implementations offering widely different space-time tradeoffs. However, each of them has practicality problems regarding either space or time requirements. In this paper we implement a recent theoretical proposal and show it yields an extremely interesting structure that lies in between, offering both practical times and affordable space. The implementation of the theoretical proposal is by no means trivial and involves significant algorithm engineering. 1

