Results 1  10
of
50
Range quantile queries: Another virtue of wavelet trees
 In Proc. 16th SPIRE, LNCS 5721
, 2009
"... Abstract. We show how to use a balanced wavelet tree as a data structure that stores a list of numbers and supports efficient range quantile queries. A range quantile query takes a rank and the endpoints of a sublist and returns the number with that rank in that sublist. For example, if the rank is ..."
Abstract

Cited by 37 (6 self)
 Add to MetaCart
(Show Context)
Abstract. We show how to use a balanced wavelet tree as a data structure that stores a list of numbers and supports efficient range quantile queries. A range quantile query takes a rank and the endpoints of a sublist and returns the number with that rank in that sublist. For example, if the rank is half the sublist’s length, then the query returns the sublist’s median. We also show how these queries can be used to support spaceefficient coloured range reporting and document listing. 1
Wavelet Trees for All
"... The wavelet tree is a versatile data structure that serves a number of purposes, from string processing to geometry. It can be regarded as a device that represents a sequence, a reordering, or a grid of points. In addition, its space adapts to various entropy measures of the data it encodes, enabli ..."
Abstract

Cited by 32 (12 self)
 Add to MetaCart
(Show Context)
The wavelet tree is a versatile data structure that serves a number of purposes, from string processing to geometry. It can be regarded as a device that represents a sequence, a reordering, or a grid of points. In addition, its space adapts to various entropy measures of the data it encodes, enabling compressed representations. New competitive solutions to a number of problems, based on wavelet trees, are appearing every year. In this survey we give an overview of wavelet trees and the surprising number of applications in which we have found them useful: basic and weighted point grids, sets of rectangles, strings, permutations, binary relations, graphs, inverted indexes, document retrieval indexes, fulltext indexes, XML indexes, and general numeric sequences.
Faster EntropyBounded Compressed Suffix Trees
, 2009
"... Suffix trees are among the most important data structures in stringology, with a number of applications in flourishing areas like bioinformatics. Their main problem is space usage, which has triggered much research striving for compressed representations that are still functional. A smaller suffix t ..."
Abstract

Cited by 31 (15 self)
 Add to MetaCart
Suffix trees are among the most important data structures in stringology, with a number of applications in flourishing areas like bioinformatics. Their main problem is space usage, which has triggered much research striving for compressed representations that are still functional. A smaller suffix tree representation could fit in a faster memory, outweighing by far the theoretical slowdown brought by the space reduction. We present a novel compressed suffix tree, which is the first achieving at the same time sublogarithmic complexity for the operations, and space usage that asymptotically goes to zero as the entropy of the text does. The main ideas in our development are compressing the longest common prefix information, totally getting rid of the suffix tree topology, and expressing all the suffix tree operations using range minimum queries and a novel primitive called next/previous smaller value in a sequence. Our solutions to those operations are of independent interest.
Alphabet Partitioning for Compressed Rank/Select and Applications
, 2010
"... We present a data structure that stores a string s[1..n] over the alphabet [1..σ] in nH0(s) + o(n)(H0(s)+1) bits, where H0(s) is the zeroorder entropy of s. This data structure supports the queries access and rank in time O (lg lg σ), and the select query in constant time. This result improves on ..."
Abstract

Cited by 31 (20 self)
 Add to MetaCart
We present a data structure that stores a string s[1..n] over the alphabet [1..σ] in nH0(s) + o(n)(H0(s)+1) bits, where H0(s) is the zeroorder entropy of s. This data structure supports the queries access and rank in time O (lg lg σ), and the select query in constant time. This result improves on previously known data structures using nH0(s) + o(n lg σ) bits, where on highly compressible instances the redundancy o(n lg σ) cease to be negligible compared to the nH0(s) bits that encode the data. The technique is based on combining previous results through an ingenious partitioning of the alphabet, and practical enough to be implementable. It applies not only to strings, but also to several other compact data structures. For example, we achieve (i) faster search times and lower redundancy for the smallest existing fulltext selfindex; (ii) compressed permutations π with times for π() and π −1 () improved to loglogarithmic; and (iii) the first compressed representation of dynamic collections of disjoint sets.
Colored Range Queries and Document Retrieval
"... Colored range queries are a wellstudied topic in computational geometry and database research that, in the past decade, have found exciting applications in information retrieval. In this paper we give improved time and space bounds for three important onedimensional colored range queries — colore ..."
Abstract

Cited by 31 (18 self)
 Add to MetaCart
(Show Context)
Colored range queries are a wellstudied topic in computational geometry and database research that, in the past decade, have found exciting applications in information retrieval. In this paper we give improved time and space bounds for three important onedimensional colored range queries — colored range listing, colored range topk queries and colored range counting — and, thus, new bounds for various document retrieval problems on general collections of sequences. Specifically, we first describe a framework including almost all recent results on colored range listing and document listing, which suggests new combinations of data structures for these problems. For example, we give the fastest compressed data structures for colored range listing and document listing, and an efficient data structure for document listing whose size is bounded in terms of the highorder entropies of the library of documents. We then show how (approximate) colored topk queries can be reduced to (approximate) rangemode queries on subsequences, yielding the first efficient data structure for this problem. Finally, we show how a modified wavelet tree can support colored range counting in logarithmic time and space that is succinct whenever the number of colors is superpolylogarithmic in the length of the sequence.
Compressed representations of permutations, and applications
 SYMPOSIUM ON THEORETICAL ASPECTS OF COMPUTER SCIENCE
"... We explore various techniques to compress a permutation π over n integers, taking advantage of ordered subsequences in π, while supporting its application π(i) and the application of its inverse π −1 (i) in small time. Our compression schemes yield several interesting byproducts, in many cases mat ..."
Abstract

Cited by 30 (17 self)
 Add to MetaCart
We explore various techniques to compress a permutation π over n integers, taking advantage of ordered subsequences in π, while supporting its application π(i) and the application of its inverse π −1 (i) in small time. Our compression schemes yield several interesting byproducts, in many cases matching, improving or extending the best existing results on applications such as the encoding of a permutation in order to support iterated applications π k (i) of it, of integer functions, and of inverted lists and suffix arrays.
Succinct Orthogonal Range Search Structures on a Grid with Applications to Text Indexing
"... We present a succinct representation of a set of n points on an n × n grid using n lg n + o(nlg n) bits 3 to support orthogonal range counting in O(lg n / lg lg n) time, and range reporting in O(k lg n/lg lg n) time, where k is the size of the output. This achieves an improvement on query time by ..."
Abstract

Cited by 30 (3 self)
 Add to MetaCart
(Show Context)
We present a succinct representation of a set of n points on an n × n grid using n lg n + o(nlg n) bits 3 to support orthogonal range counting in O(lg n / lg lg n) time, and range reporting in O(k lg n/lg lg n) time, where k is the size of the output. This achieves an improvement on query time by a factor of lg lg n upon the previous result of Mäkinen and Navarro [15], while using essentially the informationtheoretic minimum space. Our data structure not only can be used as a key component in solutions to the general orthogonal range search problem to save storage cost, but also has applications in text indexing. In particular, we apply it to improve two previous spaceefficient text indexes that support substring search [7] and positionrestricted substring search [15]. We also use it to extend previous results on succinct representations of sequences of small integers, and to design succinct data structures supporting certain types of orthogonal range query in the plane.
SelfIndexing Based on LZ77
"... We introduce the first selfindex based on the LempelZiv 1977 compression format (LZ77). It is particularly competitive for highly repetitive text collections such as sequence databases of genomes of related species, software repositories, versioned document collections, and temporal text databases ..."
Abstract

Cited by 26 (6 self)
 Add to MetaCart
(Show Context)
We introduce the first selfindex based on the LempelZiv 1977 compression format (LZ77). It is particularly competitive for highly repetitive text collections such as sequence databases of genomes of related species, software repositories, versioned document collections, and temporal text databases. Such collections are extremely compressible but classical selfindexes fail to capture that source of compressibility. Our selfindex takes in practice a few times the space of the text compressed with LZ77 (as little as 2.5 times), extracts 1–2 million characters of the text per second, and finds patterns at a rate of 10–50 microseconds per occurrence. It is smaller (up to one half) than the best current selfindex for repetitive collections, and faster in many cases.
Selfindexed text compression using straightline programs
 In Proc. 34th MFCS
, 2009
"... Abstract. Straightline programs (SLPs) offer powerful text compression by representing a text T [1, u] in terms of a restricted contextfree grammar of n rules, so that T can be recovered in O(u) time. However, the problem of operating the grammar in compressed form has not been studied much. We pr ..."
Abstract

Cited by 19 (8 self)
 Add to MetaCart
(Show Context)
Abstract. Straightline programs (SLPs) offer powerful text compression by representing a text T [1, u] in terms of a restricted contextfree grammar of n rules, so that T can be recovered in O(u) time. However, the problem of operating the grammar in compressed form has not been studied much. We present a grammar representation whose size is of the same order of that of a plain SLP representation, and can answer other queries apart from expanding nonterminals. This can be of independent interest. We then extend it to achieve the first grammar representation able of extracting text substrings, and of searching the text for patterns, in time o(n). We also give byproducts on representing binary relations. 1 Introduction and Related Work Grammarbased compression is a wellknown technique since at least the seventies, and still a very active area of research. From the different variants of the idea, we focus on the case where a given text T [1, u] is replaced by a contextfree grammar (CFG) G that generates just the string T. Then one can store G instead