Results 1  10
of
47
SpaceEfficient Preprocessing Schemes for Range Minimum Queries on Static Arrays
, 2009
"... Given a static array of n totally ordered object, the range minimum query problem is to build an additional data structure that allows to answer subsequent online queries of the form “what is the position of a minimum element in the subarray ranging from i to j? ” efficiently. We focus on two sett ..."
Abstract

Cited by 47 (3 self)
 Add to MetaCart
(Show Context)
Given a static array of n totally ordered object, the range minimum query problem is to build an additional data structure that allows to answer subsequent online queries of the form “what is the position of a minimum element in the subarray ranging from i to j? ” efficiently. We focus on two settings, where (1) the input array is available at query time, and (2) the input array is only available at construction time. In setting (1), we show new data structures (a) of n c(n) (2 + o(1)) bits and query time O(c(n)), or (b) with O(nHk) + o(n) bits and O(1) query size time, where Hk denotes the empirical entropy of k’th order of the input array. In setting (2), we give a data structure of optimal size 2n + o(n) bits and query time O(1). All data structures can be constructed in linear time and almost inplace.
SpaceEfficient Framework for Topk String Retrieval Problems
"... Given a set D = {d1, d2,..., dD} of D strings of total length n, our task is to report the “most relevant” strings for a given query pattern P. This involves somewhat more advanced query functionality than the usual pattern matching, as some notion of “most relevant” is involved. In information retr ..."
Abstract

Cited by 41 (8 self)
 Add to MetaCart
(Show Context)
Given a set D = {d1, d2,..., dD} of D strings of total length n, our task is to report the “most relevant” strings for a given query pattern P. This involves somewhat more advanced query functionality than the usual pattern matching, as some notion of “most relevant” is involved. In information retrieval literature, this task is best achieved by using inverted indexes. However, inverted indexes work only for some predefined set of patterns. In the pattern matching community, the most popular patternmatching data structures are suffix trees and suffix arrays. However, a typical suffix tree search involves going through all the occurrences of the pattern over the entire string collection, which might be a lot more than the required relevant documents. The first formal framework to study such kind of retrieval problems was given by Muthukrishnan [25]. He considered two metrics for relevance: frequency and proximity. He took a thresholdbased approach on these metrics and gave data structures taking O(n log n) words of space. We study this problem in a slightly different framework of reporting the top k most relevant documents (in sorted order) under similar and more general relevance metrics. Our framework gives linear space data structure with optimal query times for arbitrary score functions. As a corollary, it improves the space utilization for the problems in [25] while maintaining optimal query performance. We also develop compressed variants of these data structures for several specific relevance metrics.
Range quantile queries: Another virtue of wavelet trees
 In Proc. 16th SPIRE, LNCS 5721
, 2009
"... Abstract. We show how to use a balanced wavelet tree as a data structure that stores a list of numbers and supports efficient range quantile queries. A range quantile query takes a rank and the endpoints of a sublist and returns the number with that rank in that sublist. For example, if the rank is ..."
Abstract

Cited by 39 (6 self)
 Add to MetaCart
(Show Context)
Abstract. We show how to use a balanced wavelet tree as a data structure that stores a list of numbers and supports efficient range quantile queries. A range quantile query takes a rank and the endpoints of a sublist and returns the number with that rank in that sublist. For example, if the rank is half the sublist’s length, then the query returns the sublist’s median. We also show how these queries can be used to support spaceefficient coloured range reporting and document listing. 1
Optimal Succinctness for Range Minimum Queries
"... Abstract. For an array A of n objects from a totally ordered universe, a range minimum query rmq A(i, j) asks for the position of the minimum element in the subarray A[i, j]. We focus on the setting where the array A is static and known in advance, and can hence be preprocessed into a scheme in ord ..."
Abstract

Cited by 36 (4 self)
 Add to MetaCart
(Show Context)
Abstract. For an array A of n objects from a totally ordered universe, a range minimum query rmq A(i, j) asks for the position of the minimum element in the subarray A[i, j]. We focus on the setting where the array A is static and known in advance, and can hence be preprocessed into a scheme in order to answer future queries faster. We make the further assumption that the input array A cannot be used at query time. Under this assumption, a natural lower bound of 2n − Θ(log n) bits for RMQschemes exists. We give the first truly succinct preprocessing scheme for O(1)RMQs. Its final space consumption is 2n + o(n) bits, thus being asymptotically optimal. We also give a simple lineartime construction algorithm for this scheme that needs only n + o(n) bits of space in addition to the 2n + o(n) bits needed for the final data structure, thereby lowering the peak space consumption of previous schemes from O(n log n) to O(n) bits. We also improve on LCAcomputation in BPS and DFUDSencoded trees. 1
Topk Ranked Document Search in General Text Databases
"... Abstract. Text search engines return a set of k documents ranked by similarity to a query. Typically, documents and queries are drawn from natural language text, which can readily be partitioned into words, allowing optimizations of data structures and algorithms for ranking. However, in many new se ..."
Abstract

Cited by 35 (17 self)
 Add to MetaCart
(Show Context)
Abstract. Text search engines return a set of k documents ranked by similarity to a query. Typically, documents and queries are drawn from natural language text, which can readily be partitioned into words, allowing optimizations of data structures and algorithms for ranking. However, in many new search domains (DNA, multimedia, OCR texts, Far East languages) there is often no obvious definition of words and traditional indexing approaches are not so easily adapted, or break down entirely. We present two new algorithms for ranking documents against a query without making any assumptions on the structure of the underlying text. We build on existing theoretical techniques, which we have implemented and compared empirically with new approaches introduced in this paper. Our best approach is significantly faster than existing methods in RAM, and is even three times faster than a stateoftheart inverted file implementation for English text when word queries are issued. 1
Wavelet Trees for All
"... The wavelet tree is a versatile data structure that serves a number of purposes, from string processing to geometry. It can be regarded as a device that represents a sequence, a reordering, or a grid of points. In addition, its space adapts to various entropy measures of the data it encodes, enabli ..."
Abstract

Cited by 33 (13 self)
 Add to MetaCart
(Show Context)
The wavelet tree is a versatile data structure that serves a number of purposes, from string processing to geometry. It can be regarded as a device that represents a sequence, a reordering, or a grid of points. In addition, its space adapts to various entropy measures of the data it encodes, enabling compressed representations. New competitive solutions to a number of problems, based on wavelet trees, are appearing every year. In this survey we give an overview of wavelet trees and the surprising number of applications in which we have found them useful: basic and weighted point grids, sets of rectangles, strings, permutations, binary relations, graphs, inverted indexes, document retrieval indexes, fulltext indexes, XML indexes, and general numeric sequences.
Colored Range Queries and Document Retrieval
"... Colored range queries are a wellstudied topic in computational geometry and database research that, in the past decade, have found exciting applications in information retrieval. In this paper we give improved time and space bounds for three important onedimensional colored range queries — colore ..."
Abstract

Cited by 32 (18 self)
 Add to MetaCart
(Show Context)
Colored range queries are a wellstudied topic in computational geometry and database research that, in the past decade, have found exciting applications in information retrieval. In this paper we give improved time and space bounds for three important onedimensional colored range queries — colored range listing, colored range topk queries and colored range counting — and, thus, new bounds for various document retrieval problems on general collections of sequences. Specifically, we first describe a framework including almost all recent results on colored range listing and document listing, which suggests new combinations of data structures for these problems. For example, we give the fastest compressed data structures for colored range listing and document listing, and an efficient data structure for document listing whose size is bounded in terms of the highorder entropies of the library of documents. We then show how (approximate) colored topk queries can be reduced to (approximate) rangemode queries on subsequences, yielding the first efficient data structure for this problem. Finally, we show how a modified wavelet tree can support colored range counting in logarithmic time and space that is succinct whenever the number of colors is superpolylogarithmic in the length of the sequence.
Topk document retrieval in optimal time and linear space
 In Proc. 22nd Annual ACMSIAM Symposium on Discrete Algorithms (SODA 2012
, 2012
"... We describe a data structure that uses O(n)word space and reports k most relevant documents that contain a query pattern P in optimal O(P  + k) time. Our construction supports an ample set of important relevance measures, such as the frequency of P in a document and the minimal distance between t ..."
Abstract

Cited by 29 (17 self)
 Add to MetaCart
(Show Context)
We describe a data structure that uses O(n)word space and reports k most relevant documents that contain a query pattern P in optimal O(P  + k) time. Our construction supports an ample set of important relevance measures, such as the frequency of P in a document and the minimal distance between two occurrences of P in a document. We show how to reduce the space of the data structure from O(n log n) to O(n(log σ+log D+log log n)) bits, where σ is the alphabet size and D is the total number of documents. 1
New algorithms on wavelet trees and applications to information retrieval
 Theoretical Computer Science
, 2012
"... ar ..."
(Show Context)
New lower and upper bounds for representing sequences
 CoRR
"... Abstract. Sequence representations supporting queries access, select and rank are at the core of many data structures. There is a considerable gap between different upper bounds, and the few lower bounds, known for such representations, and how they interact with the space used. In this article we p ..."
Abstract

Cited by 22 (14 self)
 Add to MetaCart
(Show Context)
Abstract. Sequence representations supporting queries access, select and rank are at the core of many data structures. There is a considerable gap between different upper bounds, and the few lower bounds, known for such representations, and how they interact with the space used. In this article we prove a strong lower bound for rank, which holds for rather permissive assumptions on the space used, and give matching upper bounds that require only a compressed representation of the sequence. Within this compressed space, operations access and select can be solved within almostconstant time. 1