Wavelet Trees for All
Cited by 1 (0 self)
The wavelet tree is a versatile data structure that serves a number of purposes, from string processing to geometry. It can be regarded as a device that represents a sequence, a reordering, or a grid of points. In addition, its space adapts to various entropy measures of the data it encodes, enabling compressed representations. New competitive solutions to a number of problems, based on wavelet trees, are appearing every year. In this survey we give an overview of wavelet trees and the surprising number of applications in which we have found them useful: basic and weighted point grids, sets of rectangles, strings, permutations, binary relations, graphs, inverted indexes, document retrieval indexes, full-text indexes, XML indexes, and general numeric sequences.
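As a rough illustration of the sequence and grid views mentioned in this abstract, here is a minimal pointer-based wavelet tree sketch. It is an assumption-laden toy, not the survey's construction: it stores plain prefix-count lists instead of succinct bitvectors, and the names `WaveletTree`, `access`, `rank`, and `range_count` are chosen here for illustration only.

```python
class WaveletTree:
    """Toy wavelet tree over a non-empty sequence of integers (not succinct)."""

    def __init__(self, seq, lo=None, hi=None):
        seq = list(seq)
        if lo is None and not seq:
            raise ValueError("top-level sequence must be non-empty")
        self.lo = min(seq) if lo is None else lo
        self.hi = max(seq) if hi is None else hi
        self.left = self.right = None
        # prefix[i] = how many of the first i symbols go to the left child
        self.prefix = [0]
        if self.lo == self.hi or not seq:
            return  # leaf: a single symbol value, or an empty node
        mid = (self.lo + self.hi) // 2
        for x in seq:
            self.prefix.append(self.prefix[-1] + (x <= mid))
        self.left = WaveletTree([x for x in seq if x <= mid], self.lo, mid)
        self.right = WaveletTree([x for x in seq if x > mid], mid + 1, self.hi)

    def access(self, i):
        """Return the symbol at position i (0-based)."""
        node = self
        while node.left is not None:
            before = node.prefix[i]
            if node.prefix[i + 1] - before:  # symbol i was routed left
                i, node = before, node.left
            else:
                i, node = i - before, node.right
        return node.lo

    def rank(self, c, i):
        """Count occurrences of symbol c among the first i positions."""
        if c < self.lo or c > self.hi:
            return 0
        node = self
        while node.left is not None:
            mid = (node.lo + node.hi) // 2
            before = node.prefix[i]
            if c <= mid:
                i, node = before, node.left
            else:
                i, node = i - before, node.right
        return i

    def range_count(self, i, j, a, b):
        """Count positions p in [i, j) whose symbol lies in [a, b] (grid view)."""
        if i >= j or b < self.lo or a > self.hi:
            return 0
        if a <= self.lo and self.hi <= b:
            return j - i
        li, lj = self.prefix[i], self.prefix[j]
        return (self.left.range_count(li, lj, a, b)
                + self.right.range_count(i - li, j - lj, a, b))
```

Reading position as the x-coordinate and symbol value as the y-coordinate, `range_count` answers a rectangle-counting query on the grid of points (position, value), which is one way to see the "grid of points" interpretation.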
Improved grammar-based compressed indexes
In Proc. 19th SPIRE, LNCS 7608, 2012
Cited by 1 (1 self)
Abstract. We introduce the first grammar-compressed representation of a sequence that supports searches in time that depends only logarithmically on the size of the grammar. Given a text T[1..u] that is represented by a (context-free) grammar of n (terminal and nonterminal) symbols and size N (measured as the sum of the lengths of the right-hand sides of the rules), a basic grammar-based representation of T takes N lg n bits of space. Our representation requires 2N lg n + N lg u + ɛ n lg n + o(N lg n) bits of space, for any 0 < ɛ ≤ 1. It can find the positions of the occ occurrences of a pattern of length m in T in O((m^2/ɛ) lg(lg u / lg n) + (m + occ) lg n) time, and extract any substring of length ℓ of T in time O(ℓ + h lg(N/h)), where h is the height of the grammar tree.
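To make the extraction operation concrete, the following sketch extracts a substring from a text represented by a context-free grammar without expanding the whole text. It is a plain top-down descent guided by expansion lengths, not the paper's index, and it does not achieve the stated bounds. The dictionary format for `rules` and the helper name `make_extractor` are assumptions made for this example.

```python
from functools import lru_cache


def make_extractor(rules):
    """rules maps each nonterminal to its right-hand side (a list of symbols).

    Any symbol that is not a key of `rules` is treated as a one-character
    terminal. The grammar is assumed to be acyclic (it generates one text).
    """

    @lru_cache(maxsize=None)
    def length(sym):
        # Length of the string that `sym` expands to.
        if sym not in rules:
            return 1
        return sum(length(s) for s in rules[sym])

    def extract(sym, start, count):
        """Return up to `count` characters of sym's expansion from offset `start`."""
        out = []

        def walk(s, skip, need):
            # Emit at most `need` characters of s's expansion after skipping `skip`.
            if need <= 0:
                return 0
            if s not in rules:
                out.append(s)
                return 1
            emitted = 0
            for child in rules[s]:
                n = length(child)
                if skip >= n:  # the whole child lies before the wanted range
                    skip -= n
                    continue
                emitted += walk(child, skip, need - emitted)
                skip = 0
                if emitted >= need:
                    break
            return emitted

        walk(sym, start, count)
        return "".join(out)

    return length, extract
```

For example, with `rules = {"A": ["a", "b"], "B": ["A", "A"], "S": ["B", "c", "B"]}`, the start symbol `S` generates `ababcabab`, and `extract("S", 3, 4)` visits only the rules that overlap the requested range.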
Sorted Range Reporting
Abstract. In this paper we consider a variant of the orthogonal range reporting problem when all points should be reported in the sorted order of their x-coordinates. We show that reporting two-dimensional points with this additional condition can be organized (almost) as efficiently as the standard range reporting. Moreover, our results generalize and improve the previously known results for the orthogonal range successor problem and can be used to obtain better solutions for some stringology problems.
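The problem statement can be illustrated with a simple baseline. This is not the paper's data structure: it keeps the points sorted by x, locates the x-range by binary search, and filters by y, so a query costs time proportional to the number of points in the vertical slab rather than to the output size. The class name `SortedRangeReporter` is hypothetical.

```python
from bisect import bisect_left, bisect_right


class SortedRangeReporter:
    """Baseline: report points in [x1, x2] x [y1, y2] in increasing x order."""

    def __init__(self, points):
        self.points = sorted(points)  # sorted by x, ties broken by y
        self.xs = [x for x, _ in self.points]

    def report(self, x1, x2, y1, y2):
        lo = bisect_left(self.xs, x1)   # first point with x >= x1
        hi = bisect_right(self.xs, x2)  # one past the last point with x <= x2
        for x, y in self.points[lo:hi]:
            if y1 <= y <= y2:
                yield (x, y)
```

Because `report` is a generator that scans in x order, a caller can stop after the first result, which corresponds to the range successor query that the abstract says is generalized.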
Two-Dimensional Range Diameter Queries
Abstract. Given a set of n points in the plane, range diameter queries ask for the furthest pair of points in a given axis-parallel rectangular range. We provide evidence for the hardness of designing space-efficient data structures that support range diameter queries by giving a reduction from the set intersection problem. The difficulty of the latter problem is widely acknowledged and is conjectured to require nearly quadratic space in order to obtain constant query time, which is matched by known data structures for both problems, up to polylogarithmic factors. We strengthen the evidence by giving a lower bound for an important subproblem arising in solutions to the range diameter problem: computing the diameter of two convex polygons that are separated by a vertical line and are preprocessed independently requires almost linear time in the number of vertices of the smaller polygon, no matter how much space is used. We also show that range diameter queries can be answered much more efficiently for the case of points in convex position by describing a data structure of size O(n log n) that supports queries in O(log n) time.
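For reference, the query itself can be stated as a brute-force procedure. The sketch below only defines what a range diameter query returns; it uses no preprocessing and takes quadratic time in the number of points inside the rectangle, so it says nothing about the bounds discussed in the abstract. The function name `range_diameter` is chosen here for illustration.

```python
from itertools import combinations
from math import dist


def range_diameter(points, x1, x2, y1, y2):
    """Return the furthest pair among points inside [x1, x2] x [y1, y2].

    Returns None when fewer than two points fall inside the rectangle.
    """
    inside = [p for p in points if x1 <= p[0] <= x2 and y1 <= p[1] <= y2]
    if len(inside) < 2:
        return None
    # Try every pair and keep the one at maximum Euclidean distance.
    return max(combinations(inside, 2), key=lambda pair: dist(*pair))
```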
Space-Efficient Data-Analysis Queries on Grids
We consider various data-analysis queries on two-dimensional points. We give new space/time tradeoffs over previous work on geometric queries such as dominance and rectangle visibility, and on semigroup and group queries such as sum, average, variance, minimum and maximum. We also introduce new solutions to queries less frequently considered in the literature such as two-dimensional quantiles, majorities, successor/predecessor, mode, and various top-k queries, considering static and dynamic scenarios.
Indexing Highly Repetitive Collections
Abstract. The need to index and search huge highly repetitive sequence collections is rapidly arising in various fields, including computational biology, software repositories, versioned collections, and others. In this short survey we briefly describe the progress made along three research lines to address the problem: compressed suffix arrays, grammar compressed indexes, and Lempel-Ziv compressed indexes.

