Wavelet Trees for All
Cited by 10 (5 self)
The wavelet tree is a versatile data structure that serves a number of purposes, from string processing to geometry. It can be regarded as a device that represents a sequence, a reordering, or a grid of points. In addition, its space adapts to various entropy measures of the data it encodes, enabling compressed representations. New competitive solutions to a number of problems, based on wavelet trees, are appearing every year. In this survey we give an overview of wavelet trees and the surprising number of applications in which we have found them useful: basic and weighted point grids, sets of rectangles, strings, permutations, binary relations, graphs, inverted indexes, document retrieval indexes, full-text indexes, XML indexes, and general numeric sequences.
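To make the "sequence" view concrete, here is a minimal sketch of a pointer-based wavelet tree over integer symbols, written in Python as an illustration only. Each node splits its symbol range in half and stores one bit per position saying which child the symbol goes to; `access` and `rank` follow those bits down to a leaf. The class and method names are hypothetical, and the per-node `list.count` stands in for the constant-time rank of the succinct bitvectors that real wavelet trees use.

```python
class WaveletTree:
    """Pointer-based wavelet tree over integer symbols in [lo, hi] (illustrative only)."""

    def __init__(self, seq, lo=None, hi=None):
        if lo is None:
            lo, hi = (min(seq), max(seq)) if seq else (0, 0)
        self.lo, self.hi = lo, hi
        self.bits = []
        self.left = self.right = None
        if lo == hi or not seq:
            return  # leaf (single symbol) or empty internal node
        mid = (lo + hi) // 2
        # Bit 0 routes a symbol to the left child [lo, mid], bit 1 to the right child [mid + 1, hi].
        self.bits = [0 if c <= mid else 1 for c in seq]
        self.left = WaveletTree([c for c in seq if c <= mid], lo, mid)
        self.right = WaveletTree([c for c in seq if c > mid], mid + 1, hi)

    def _rank_bit(self, b, i):
        # Occurrences of bit b in bits[0:i]; a succinct bitvector answers this in O(1).
        return self.bits[:i].count(b)

    def access(self, i):
        """Symbol at position i (0-based): follow the bits down to a leaf."""
        node = self
        while node.lo != node.hi:
            b = node.bits[i]
            i = node._rank_bit(b, i)
            node = node.right if b else node.left
        return node.lo

    def rank(self, c, i):
        """Occurrences of symbol c in positions [0, i)."""
        node = self
        if not node.lo <= c <= node.hi:
            return 0
        while node.lo != node.hi:
            if not node.bits:
                return 0  # empty internal node: c never occurs below it
            mid = (node.lo + node.hi) // 2
            b = 0 if c <= mid else 1
            i = node._rank_bit(b, i)
            node = node.right if b else node.left
        return i
```

For example, `WaveletTree([3, 1, 4, 1, 5]).rank(1, 4)` returns 2, the number of 1s among the first four symbols. The tree has depth about lg σ for an alphabet of size σ, which is where the logarithmic query costs of wavelet-tree solutions typically come from.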
Improved grammar-based compressed indexes
In Proc. 19th SPIRE, LNCS 7608, 2012
Cited by 3 (2 self)
Abstract. We introduce the first grammar-compressed representation of a sequence that supports searches in time that depends only logarithmically on the size of the grammar. Given a text $T[1..u]$ that is represented by a (context-free) grammar of $n$ (terminal and nonterminal) symbols and size $N$ (measured as the sum of the lengths of the right-hand sides of the rules), a basic grammar-based representation of $T$ takes $N \lg n$ bits of space. Our representation requires $2N \lg n + N \lg u + \epsilon n \lg n + o(N \lg n)$ bits of space, for any $0 < \epsilon \le 1$. It can find the positions of the $occ$ occurrences of a pattern of length $m$ in $T$ in $O((m^2/\epsilon) \lg(\lg u / \lg n) + (m + occ) \lg n)$ time, and extract any substring of length $\ell$ of $T$ in time $O(\ell + h \lg(N/h))$, where $h$ is the height of the grammar tree.
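As a rough illustration of substring extraction from a grammar, the following Python sketch assumes a straight-line program: every rule is either a single terminal character or a binary rule X → Y Z. It precomputes the expansion length of each symbol and then descends the grammar tree, skipping subtrees that lie outside the requested range. The rule encoding and the names `rules`, `expansion_lengths`, and `extract` are hypothetical. This pointer-based version visits O(ℓ + h) nodes but stores a length per rule; it is not the representation from the paper, which achieves the stated space bound.

```python
# Hypothetical straight-line program: a rule is a terminal (str of length 1)
# or a pair of symbols (binary rule X -> Y Z).
rules = {
    "A": "a",
    "B": "b",
    "X": ("A", "B"),  # ab
    "Y": ("X", "X"),  # abab
    "Z": ("Y", "A"),  # ababa
    "S": ("Z", "Y"),  # ababaabab
}


def expansion_lengths(rules):
    """Length of the string each symbol expands to (memoized recursion)."""
    lengths = {}

    def length(sym):
        if sym not in lengths:
            rhs = rules[sym]
            lengths[sym] = 1 if isinstance(rhs, str) else length(rhs[0]) + length(rhs[1])
        return lengths[sym]

    for sym in rules:
        length(sym)
    return lengths


def extract(rules, lengths, start, i, j):
    """Return T[i:j] (0-based, half-open), where T is the expansion of `start`."""
    out = []

    def visit(sym, offset):
        # offset = position in T where the expansion of sym begins
        if offset + lengths[sym] <= i or offset >= j:
            return  # this subtree lies entirely outside [i, j): skip it
        rhs = rules[sym]
        if isinstance(rhs, str):
            out.append(rhs)
            return
        left, right = rhs
        visit(left, offset)
        visit(right, offset + lengths[left])

    visit(start, 0)
    return "".join(out)
```

With the toy grammar above, `extract(rules, expansion_lengths(rules), "S", 3, 7)` yields `"baab"` without expanding the whole text. The recursion depth equals the grammar height h.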
Higher-dimensional orthogonal range reporting and rectangle stabbing in the pointer machine model
In Proc. 28th ACM Symposium on Computational Geometry, 2012
Cited by 2 (2 self)
In this paper, we consider two fundamental problems in the pointer machine model of computation, namely orthogonal range reporting and rectangle stabbing. Orthogonal range reporting is the problem of storing a set of $n$ points in $d$-dimensional space in a data structure, such that the $t$ points in an axis-aligned query rectangle can be reported efficiently. Rectangle stabbing is the "dual" problem where a set of $n$ axis-aligned rectangles should be stored in a data structure, such that the $t$ rectangles that contain a query point can be reported efficiently. Very recently, an optimal $O(\log n + t)$ query time pointer machine data structure was developed for the three-dimensional version of the orthogonal range reporting problem. However, in four dimensions the best known query bound of $O(\log^2 n / \log\log n + t)$ has not been improved for decades. We describe an orthogonal range reporting data structure that is the first structure to achieve significantly less than $O(\log^2 n + t)$ query time in four dimensions. More precisely, we develop a structure that uses $O(n(\log n / \log\log n)^d)$ space and can answer $d$-dimensional orthogonal range reporting queries (for $d \ge 4$) in $O(\log n (\log n / \log\log n)^{d-4+1/(d-2)} + t)$ time. Ignoring $\log\log n$ factors, this speeds up the best previous query time by a $\log^{1-1/(d-2)} n$ factor. For the rectangle stabbing problem, we show that any data structure that uses $nh$ space must use $\Omega(\log n (\log n / \log h)^{d-2} + t)$ time.
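The duality between the two problems is easiest to see when both are written as brute-force scans. The Python sketch below is only a definitional baseline with O(n·d) time per query, not the pointer machine structures of the paper; the names and the box encoding (one closed interval per dimension) are assumptions for illustration. Both queries use the same point-in-box test; only which side is stored and which side is the query changes.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Box = Tuple[Tuple[float, float], ...]  # one closed interval (lo, hi) per dimension


def in_box(p: Point, box: Box) -> bool:
    """True if point p lies in the closed axis-aligned box."""
    return all(lo <= x <= hi for x, (lo, hi) in zip(p, box))


def range_report(points: Sequence[Point], query: Box) -> List[Point]:
    """Orthogonal range reporting: the t stored points inside the query box."""
    return [p for p in points if in_box(p, query)]


def stab(rects: Sequence[Box], q: Point) -> List[Box]:
    """Rectangle stabbing: the t stored boxes that contain the query point."""
    return [r for r in rects if in_box(q, r)]
```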
Indexing Highly Repetitive Collections
Cited by 2 (0 self)
Abstract. The need to index and search huge, highly repetitive sequence collections is rapidly arising in various fields, including computational biology, software repositories, versioned collections, and others. In this short survey we briefly describe the progress made along three research lines to address the problem: compressed suffix arrays, grammar-compressed indexes, and Lempel-Ziv compressed indexes.
Near-optimal range reporting structures for categorical data
In Proc. 24th ACM/SIAM Symposium on Discrete Algorithms, 2013
Cited by 2 (1 self)
Range reporting on categorical (or colored) data is a well-studied generalization of the classical range reporting problem in which each of the $N$ input points has an associated color (category). A query then asks to report the set of colors of the points in a given rectangular query range, which may be far smaller than the set of all points in the query range. We study two-dimensional categorical range reporting in both the word-RAM and the I/O-model. For the I/O-model, we present two alternative data structures for three-sided queries. The first answers queries in optimal $O(\lg_B N + K/B)$ I/Os using $O(N \lg^* N)$ space, where $K$ is the number of distinct colors in the output, $B$ is the disk block size, and $\lg^* N$ is the iterated logarithm of $N$. Our second data structure uses linear space and answers queries in $O(\lg_B N + \lg^{(h)} N + K/B)$ I/Os for any constant integer $h \ge 1$. Here $\lg^{(1)} N = \lg N$ and $\lg^{(h)} N = \lg(\lg^{(h-1)} N)$ when $h > 1$. Both solutions use only comparisons on the coordinates. We also show that the $\lg_B N$ terms in the query costs can be reduced to optimal $\lg\lg_B U$ when the input points lie on a $U \times U$ grid and we allow word-level manipulations of the coordinates. We further reduce the query time to just $O(1)$ if the points are given on an $N \times N$ grid. Both solutions also lead to improved data structures for four-sided queries. For the word-RAM, we obtain optimal data structures for three-sided range reporting, as well as improved upper bounds for four-sided range reporting. Finally, we show a tight lower bound on one-dimensional categorical range counting using an elegant reduction from (standard) two-dimensional range counting.
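The iterated logarithms in these bounds, and the difference between reporting colors and reporting points, can be pinned down with a few lines of Python. The sketch below assumes base-2 logarithms and one common orientation of a three-sided query, $[x_1, x_2] \times [y, \infty)$; the function names are hypothetical, and the color query is a linear scan, not one of the paper's structures.

```python
import math
from typing import Iterable, Set, Tuple


def lg_iter(n: float, h: int) -> float:
    """lg^(h) n: lg applied h times, with lg^(1) n = lg n."""
    for _ in range(h):
        n = math.log2(n)
    return n


def lg_star(n: float) -> int:
    """lg* n: how many times lg must be applied before the value drops to <= 1."""
    count = 0
    while n > 1:
        n = math.log2(n)
        count += 1
    return count


def colors_three_sided(points: Iterable[Tuple[int, int, str]],
                       x1: int, x2: int, y: int) -> Set[str]:
    """Distinct colors among points with x1 <= x <= x2 and y-coordinate >= y."""
    return {c for (px, py, c) in points if x1 <= px <= x2 and py >= y}
```

For instance, `lg_star(65536)` is 4 (65536 → 16 → 4 → 2 → 1), which illustrates why an $O(N \lg^* N)$ space bound is very close to linear. The output of `colors_three_sided` has size $K$, which can be much smaller than the number of points in the range.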
Sorted Range Reporting
Cited by 2 (0 self)
Abstract. In this paper we consider a variant of the orthogonal range reporting problem in which all points should be reported in the sorted order of their x-coordinates. We show that reporting two-dimensional points with this additional condition can be organized (almost) as efficiently as standard range reporting. Moreover, our results generalize and improve the previously known results for the orthogonal range successor problem and can be used to obtain better solutions for some stringology problems.
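The query itself can be stated with a simple baseline: keep the points sorted by x-coordinate, binary-search the x-range, and emit the points whose y-coordinate falls in range in scan order, so the output comes out already sorted by x. The Python sketch below (hypothetical class and method names) costs time proportional to the number of points in the x-slab, so it is not output-sensitive; it only illustrates the semantics. Taking the first reported point gives the point with the smallest x-coordinate in the rectangle, which is one common formulation of an orthogonal range successor query.

```python
import bisect
from typing import Iterator, List, Optional, Tuple

Point = Tuple[int, int]


class SortedRangeBaseline:
    """Sorted range reporting by scanning the x-slab (baseline, not output-sensitive)."""

    def __init__(self, points: List[Point]):
        self.pts = sorted(points)            # sorted by x, ties broken by y
        self.xs = [p[0] for p in self.pts]

    def report(self, x1: int, x2: int, y1: int, y2: int) -> Iterator[Point]:
        """Yield the points of [x1, x2] x [y1, y2] in increasing x order."""
        lo = bisect.bisect_left(self.xs, x1)
        hi = bisect.bisect_right(self.xs, x2)
        for p in self.pts[lo:hi]:
            if y1 <= p[1] <= y2:
                yield p

    def successor(self, x1: int, x2: int, y1: int, y2: int) -> Optional[Point]:
        """Point in the rectangle with the smallest x-coordinate, or None."""
        return next(self.report(x1, x2, y1, y2), None)
```

Because `report` is a generator, a caller that needs only the first few points in x order can stop early, which is the usage pattern that makes sorted reporting useful in stringology applications.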
Two-Dimensional Range Diameter Queries
Cited by 1 (0 self)
Abstract. Given a set of $n$ points in the plane, range diameter queries ask for the furthest pair of points in a given axis-parallel rectangular range. We provide evidence for the hardness of designing space-efficient data structures that support range diameter queries by giving a reduction from the set intersection problem. The difficulty of the latter problem is widely acknowledged, and it is conjectured to require nearly quadratic space in order to obtain constant query time, which is matched by known data structures for both problems, up to polylogarithmic factors. We strengthen the evidence by giving a lower bound for an important subproblem arising in solutions to the range diameter problem: computing the diameter of two convex polygons that are separated by a vertical line and are preprocessed independently requires almost linear time in the number of vertices of the smaller polygon, no matter how much space is used. We also show that range diameter queries can be answered much more efficiently for the case of points in convex position by describing a data structure of size $O(n \log n)$ that supports queries in $O(\log n)$ time.
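For reference, a range diameter query can be answered naively by filtering the points inside the rectangle and comparing all pairs, which takes $O(n + k^2)$ time for $k$ points in range. The Python baseline below (hypothetical names) only pins down what the query returns; it is not the $O(n \log n)$-size structure for points in convex position described above.

```python
import math
from itertools import combinations
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def range_diameter(points: List[Point], x1: float, x2: float,
                   y1: float, y2: float) -> Optional[Tuple[Point, Point]]:
    """Furthest pair among the points in [x1, x2] x [y1, y2], or None if fewer than two."""
    inside = [p for p in points if x1 <= p[0] <= x2 and y1 <= p[1] <= y2]
    if len(inside) < 2:
        return None
    # Compare every pair by Euclidean distance (quadratic in the number of points in range).
    return max(combinations(inside, 2), key=lambda pq: math.dist(pq[0], pq[1]))
```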
Space-Efficient Data-Analysis Queries on Grids
We consider various data-analysis queries on two-dimensional points. We give new space/time trade-offs over previous work on geometric queries such as dominance and rectangle visibility, and on semigroup and group queries such as sum, average, variance, minimum, and maximum. We also introduce new solutions to queries less frequently considered in the literature, such as two-dimensional quantiles, majorities, successor/predecessor, mode, and various top-k queries, considering static and dynamic scenarios.
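The distinction between group and semigroup queries can be illustrated with the classic, deliberately not space-efficient, two-dimensional prefix-sum table: because addition has an inverse, any rectangle sum follows from four table lookups by inclusion-exclusion, and average and variance follow from count, sum, and sum of squares. Minimum and maximum have no inverse, so they cannot be recovered this way. The Python sketch below assumes weighted points on a `width` × `height` integer grid; all names are hypothetical, and it is not the paper's method.

```python
from typing import List, Optional, Tuple

WPoint = Tuple[int, int, float]  # (x, y, weight) on an integer grid


class PrefixSums2D:
    """S[i][j] = total weight of points with x < i and y < j."""

    def __init__(self, points: List[WPoint], width: int, height: int):
        self.S = [[0.0] * (height + 1) for _ in range(width + 1)]
        for x, y, w in points:
            self.S[x + 1][y + 1] += w
        for i in range(1, width + 1):
            for j in range(1, height + 1):
                self.S[i][j] += self.S[i - 1][j] + self.S[i][j - 1] - self.S[i - 1][j - 1]

    def query(self, x1: int, x2: int, y1: int, y2: int) -> float:
        """Total weight in [x1, x2] x [y1, y2] via inclusion-exclusion (needs subtraction)."""
        S = self.S
        return S[x2 + 1][y2 + 1] - S[x1][y2 + 1] - S[x2 + 1][y1] + S[x1][y1]


class GridStats:
    """Average and (population) variance of the weights in a query rectangle."""

    def __init__(self, points: List[WPoint], width: int, height: int):
        self.cnt = PrefixSums2D([(x, y, 1.0) for x, y, _ in points], width, height)
        self.sum = PrefixSums2D(points, width, height)
        self.sq = PrefixSums2D([(x, y, w * w) for x, y, w in points], width, height)

    def average_and_variance(self, x1: int, x2: int, y1: int, y2: int) -> Optional[Tuple[float, float]]:
        n = self.cnt.query(x1, x2, y1, y2)
        if n == 0:
            return None
        mean = self.sum.query(x1, x2, y1, y2) / n
        # E[w^2] - mean^2; this simple formula can lose precision for large weights.
        return mean, self.sq.query(x1, x2, y1, y2) / n - mean * mean
```

Each table uses $\Theta(\text{width} \cdot \text{height})$ words, which is exactly the kind of space cost that space-efficient grid representations aim to avoid.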
Models and Techniques for Proving Data Structure Lower Bounds
2013
In this dissertation, we present a number of new techniques and tools for proving lower bounds on the operational time of data structures. These techniques provide new lines of attack for proving lower bounds in the cell probe model, the group model, the pointer machine model, and the I/O-model. In all cases, we push the frontiers further by proving lower bounds higher than what could possibly be proved using previously known techniques. For the cell probe model, our results have the following consequences:
• The first $\Omega(\lg n)$ query time lower bound for linear-space static data structures. The highest previous lower bound for any static data structure problem peaked at $\Omega(\lg n / \lg\lg n)$.
• An $\Omega((\lg n / \lg\lg n)^2)$ lower bound on the maximum of the update time and the query time of dynamic data structures. This is almost a quadratic improvement over the highest previous lower bound of $\Omega(\lg n)$.
In the group model, we establish a number of intimate connections to the fields of combinatorial discrepancy and range reporting in the pointer machine
Space-Efficient Representations of Rectangle Datasets Supporting Orthogonal Range Querying
The increasing use of geographic search engines manifests the interest of Internet users in geolocated resources and, in general, in geographic information. This has emphasized the importance of developing efficient indexes over large geographic databases. The most common simplification of geographic objects used for indexing purposes is a two-dimensional rectangle. Furthermore, one of the primitive operations that every geographic index structure must support is the orthogonal range query, which retrieves all the geographic objects that have at least one point in common with a rectangular query region. In this work, we study several space-efficient representations of rectangle datasets that can be used in the development of geographic indexes supporting orthogonal range queries.
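The orthogonal range query described here, retrieving every rectangle that shares at least one point with the query region, reduces to an interval-overlap test on each axis. The Python sketch below states that predicate with a linear scan; it assumes closed rectangles encoded as `(x1, y1, x2, y2)` with `x1 <= x2` and `y1 <= y2`, uses hypothetical names, and is not one of the space-efficient representations studied in this work.

```python
from typing import List, NamedTuple


class Rect(NamedTuple):
    """Closed rectangle [x1, x2] x [y1, y2]; assumes x1 <= x2 and y1 <= y2."""
    x1: float
    y1: float
    x2: float
    y2: float


def intersects(a: Rect, b: Rect) -> bool:
    """True if a and b share at least one point (their intervals overlap on both axes)."""
    return a.x1 <= b.x2 and b.x1 <= a.x2 and a.y1 <= b.y2 and b.y1 <= a.y2


def range_query(dataset: List[Rect], q: Rect) -> List[Rect]:
    """Every rectangle in the dataset with at least one point in common with q."""
    return [r for r in dataset if intersects(r, q)]
```

Because the rectangles are closed, a rectangle that merely touches the query region at an edge or corner is reported, which matches the "at least one point in common" wording.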