Colored Range Queries and Document Retrieval
Cited by 31 (18 self)
Colored range queries are a well-studied topic in computational geometry and database research that, in the past decade, have found exciting applications in information retrieval. In this paper we give improved time and space bounds for three important one-dimensional colored range queries — colored range listing, colored range top-k queries and colored range counting — and, thus, new bounds for various document retrieval problems on general collections of sequences. Specifically, we first describe a framework including almost all recent results on colored range listing and document listing, which suggests new combinations of data structures for these problems. For example, we give the fastest compressed data structures for colored range listing and document listing, and an efficient data structure for document listing whose size is bounded in terms of the high-order entropies of the library of documents. We then show how (approximate) colored top-k queries can be reduced to (approximate) range-mode queries on subsequences, yielding the first efficient data structure for this problem. Finally, we show how a modified wavelet tree can support colored range counting in logarithmic time and space that is succinct whenever the number of colors is superpolylogarithmic in the length of the sequence.
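The listing and counting queries named in this abstract can be illustrated with the classic previous-occurrence reduction: a color at position i is the first occurrence of that color inside [lo, hi] exactly when its previous occurrence lies before lo. The sketch below is only an illustration under that observation; it scans the range linearly and does not reproduce the compressed or wavelet-tree structures of the paper. The function names are hypothetical.

```python
def build_prev(colors):
    """prev[i] = index of the previous occurrence of colors[i], or -1 if none."""
    last = {}
    prev = []
    for i, c in enumerate(colors):
        prev.append(last.get(c, -1))
        last[c] = i
    return prev


def colored_range_list(colors, prev, lo, hi):
    """List each distinct color in colors[lo..hi] (0-based, inclusive) once."""
    # A position is reported only if it is the first occurrence in the range.
    return [colors[i] for i in range(lo, hi + 1) if prev[i] < lo]


def colored_range_count(prev, lo, hi):
    """Count the distinct colors in positions lo..hi (0-based, inclusive)."""
    return sum(1 for i in range(lo, hi + 1) if prev[i] < lo)


colors = [3, 1, 3, 2, 1, 1, 4]
prev = build_prev(colors)
listed = colored_range_list(colors, prev, 2, 5)
counted = colored_range_count(prev, 2, 5)
```

Both queries here take time linear in the range length; the point is only that counting distinct colors reduces to counting positions whose prev value falls below lo.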
Spaces, trees and colors: The algorithmic landscape of document retrieval on sequences
- CoRR
Cited by 14 (9 self)
Document retrieval is one of the best established information retrieval activities since the sixties, pervading all search engines. Its aim is to obtain, from a collection of text documents, those most relevant to a pattern query. Current technology is mostly oriented to “natural language” text collections, where inverted indexes are the preferred solution. As successful as this paradigm has been, it fails to properly handle various East Asian languages and other scenarios where the “natural language” assumptions do not hold. In this survey we cover the recent research in extending the document retrieval techniques to a broader class of sequence collections, which has applications in bioinformatics, data and Web mining, chemoinformatics, software engineering, multimedia information retrieval, and many other fields. We focus on the algorithmic aspects of the techniques, uncovering a rich world of relations between document retrieval challenges and fundamental problems on trees, strings, range queries, discrete geometry, and other areas.
Space-Efficient Data-Analysis Queries on Grids
Cited by 13 (9 self)
We consider various data-analysis queries on two-dimensional points. We give new space/time tradeoffs over previous work on semigroup and group queries such as sum, average, variance, minimum and maximum. We also introduce new solutions to queries rarely considered in the literature such as two-dimensional quantiles, majorities, successor/predecessor and mode queries. We address both static and dynamic scenarios.
Range Majority in Constant Time and Linear Space
, 2011
Cited by 13 (6 self)
Given an array A of size n, we consider the problem of answering range majority queries: given a query range [i..j] where 1 ≤ i ≤ j ≤ n, return the majority element of the subarray A[i..j] if it exists. We describe a linear space data structure that answers range majority queries in constant time. We further generalize this problem by defining range α-majority queries: given a query range [i..j], return all the elements in the subarray A[i..j] with frequency greater than α(j − i + 1). We prove an upper bound on the number of α-majorities that can exist in a subarray, assuming that query ranges are restricted to be larger than a given threshold. Using this upper bound, we generalize our range majority data structure to answer range α-majority queries in O(1/α) time using O(n lg(1/α + 1)) space, for any fixed α ∈ (0, 1). This result is interesting since other similar range query problems based on frequency have nearly logarithmic lower bounds on query time when restricted to linear space.
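To make the query definition concrete, here is a brute-force sketch of a range α-majority query. It counts every element of the subarray, so it runs in time linear in the range length and is not the constant-time structure the abstract describes. The function name and the 0-based indexing are assumptions made for the illustration (the abstract indexes from 1).

```python
from collections import Counter


def range_alpha_majorities(A, i, j, alpha):
    """Return, sorted, the elements of A[i..j] (0-based, inclusive)
    whose frequency is greater than alpha * (j - i + 1)."""
    window = A[i:j + 1]
    threshold = alpha * len(window)
    counts = Counter(window)
    return sorted(x for x, c in counts.items() if c > threshold)


A = [1, 2, 1, 3, 1, 2, 2]
whole_04 = range_alpha_majorities(A, 0, 6, 0.4)
whole_05 = range_alpha_majorities(A, 0, 6, 0.5)
prefix_05 = range_alpha_majorities(A, 0, 4, 0.5)
```

Because each reported element occupies more than an α fraction of the range, fewer than 1/α elements can ever be returned, which is why a query time of O(1/α) is the natural target.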
Linear-space data structures for range minority query in arrays
- In Proceedings of the 13th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT)
, 2012
Cited by 11 (5 self)
We consider range queries in arrays that search for low-frequency elements: least frequent elements and α-minorities. An α-minority of a query range has multiplicity no greater than an α fraction of the elements in the range. Our data structure for the least frequent element range query problem requires O(n) space, O(n^{3/2}) preprocessing time, and O(√n) query time. A reduction from boolean matrix multiplication to this problem shows the hardness of simultaneous improvements in both preprocessing time and query time. Our data structure for the α-minority range query problem requires O(n) space, supports queries in O(1/α) time, and allows α to be specified at query time.
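The α-minority definition can be illustrated with a small brute-force sketch. It relies on a pigeonhole observation: among any ⌊1/α⌋ + 1 distinct elements of a range, at least one must occur no more than an α fraction of the time, since otherwise their counts would sum to more than the range length. The code below therefore inspects only the first ⌊1/α⌋ + 1 distinct elements it meets. It still counts frequencies by scanning the range, so it is not the O(1/α)-time structure of the abstract; the function name and 0-based indexing are assumptions.

```python
from collections import Counter
from math import floor


def range_alpha_minority(A, i, j, alpha):
    """Return one element of A[i..j] (0-based, inclusive) whose count is
    at most alpha * (j - i + 1), or None if no such element exists."""
    window = A[i:j + 1]
    limit = alpha * len(window)
    counts = Counter(window)
    k = floor(1 / alpha) + 1  # enough distinct candidates by pigeonhole
    candidates = []
    for x in window:
        if x not in candidates:
            candidates.append(x)
            if len(candidates) == k:
                break
    for x in candidates:
        if counts[x] <= limit:
            return x
    return None


A = [1, 1, 1, 2, 1, 1, 3, 1]
found = range_alpha_minority(A, 0, 7, 0.25)
none_in_prefix = range_alpha_minority(A, 0, 2, 0.5)
none_uniform = range_alpha_minority([5, 5, 5, 5], 0, 3, 0.5)
```

If the range has fewer than ⌊1/α⌋ + 1 distinct elements, all of them are checked, so the answer is exact in both cases.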
Better Space Bounds for Parameterized Range Majority and Minority
Cited by 5 (3 self)
Karpinski and Nekrich (2008) introduced the problem of parameterized range majority, which asks to preprocess a string of length n such that, given the endpoints of a range, one can quickly find all the distinct elements whose relative frequencies in that range are more than a threshold τ. Subsequent authors have reduced their time and space bounds such that, when τ is given at preprocessing time, we need either O(n lg(1/τ)) space and optimal O(1/τ) query time or linear space and O((1/τ) lg lg σ) query time, where σ is the alphabet size. In this paper we give the first linear-space solution with optimal O(1/τ) query time. For the case when τ is given at query time, we significantly improve previous bounds, achieving either O(n lg lg σ) space and optimal O(1/τ) query time or compressed space and O((1/τ) lg(lg(1/τ) / lg lg n)) query time. Along the way, we consider the complementary problem of parameterized range minority that was recently introduced by Chan et al. (2012), who achieved linear space and O(1/τ) query time even for variable τ. We improve their solution to use either nearly optimally compressed space with no slowdown, or optimally compressed space with nearly no slowdown. Some of our intermediate results, such as density-sensitive query time for one-dimensional range counting, may be of independent interest.
Linear-space data structures for range frequency queries on arrays and trees
- In Proc. MFCS, volume 8087 of LNCS
, 2013
Cited by 4 (3 self)
We present O(n)-space data structures to support various range frequency queries on a given array A[0 : n − 1] or tree T with n nodes. Given a query consisting of an arbitrary pair of pre-order rank indices (i, j), our data structures return a least frequent element, mode, or α-minority of the multiset of elements in the unique path with endpoints at indices i and j in A or T. We describe a data structure that supports range least frequent element queries on arrays in O(√(n/w)) time, improving the Θ(√n) worst-case time required by the data structure of Chan et al. (SWAT 2012), where w ∈ Ω(log n) is the word size in bits. We describe a data structure that supports range mode queries on trees in O(log log n · √(n/w)) time, improving the Θ(√n log n) worst-case time required by the data structure of Krizanc et al. (ISAAC 2003). Finally, we describe a data structure that supports range α-minority queries on trees in O(α^{−1} log log n) time, where α ∈ [0, 1] can be specified at query time.
A simple linear-space data structure for constant-time range minimum query
- In Proc. Conference on Space Efficient Data Structures, Streams and Algorithms, LNCS
, 2013
Cited by 4 (3 self)
We revisit the range minimum query problem and present a new O(n)-space data structure that supports queries in O(1) time. Although previous data structures exist whose asymptotic bounds match ours, our goal is to introduce a new solution that is simple, intuitive, and practical without increasing asymptotic costs for query time or space.
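For readers unfamiliar with the problem, the sketch below shows a standard sparse table, a different and simpler technique than the one in this paper: it also answers range minimum queries in O(1) time, but it uses O(n log n) space rather than O(n). The class name and the choice to return the index of the leftmost minimum are assumptions made for the illustration.

```python
class SparseTableRMQ:
    """Sparse table: table[k][i] is the index of the leftmost minimum
    of values[i .. i + 2**k - 1]."""

    def __init__(self, values):
        self.values = list(values)
        n = len(self.values)
        self.table = [list(range(n))]  # windows of length 1
        k = 1
        while (1 << k) <= n:
            prev = self.table[k - 1]
            half = 1 << (k - 1)
            row = []
            for i in range(n - (1 << k) + 1):
                a, b = prev[i], prev[i + half]
                # On ties keep the left index, so the result stays leftmost.
                row.append(a if self.values[a] <= self.values[b] else b)
            self.table.append(row)
            k += 1

    def query(self, lo, hi):
        """Index of the leftmost minimum in values[lo..hi] (0-based, inclusive)."""
        k = (hi - lo + 1).bit_length() - 1
        # Two overlapping windows of length 2**k cover the whole range.
        a = self.table[k][lo]
        b = self.table[k][hi - (1 << k) + 1]
        return a if self.values[a] <= self.values[b] else b


rmq = SparseTableRMQ([5, 2, 4, 2, 7, 1, 3])
answers = [rmq.query(0, 3), rmq.query(2, 4), rmq.query(0, 6), rmq.query(4, 4)]
```

The query works because the minimum is idempotent: covering the range with two overlapping power-of-two windows cannot change the answer.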
Encodings for Range Majority Queries
, 2014
Cited by 2 (1 self)
We face the problem of designing a data structure that can report the majority within any range of an array A[1, n], without storing A. We show that Ω(n) bits are necessary for such a data structure, and design a structure using O(n log* n) bits that answers majority queries in O(log n) time. We extend our results to τ-majorities.
On optimal top-k string retrieval
, 2012
Cited by 1 (1 self)
Let D = {d1, d2, d3, ..., dD} be a given set of D (string) documents of total length n. The top-k document retrieval problem is to index D such that when a pattern P of length p and a parameter k come as a query, the index returns the k most relevant documents to the pattern P. Hon et al. [13] gave the first linear space framework to solve this problem in O(p + k log k) time. This was improved by Navarro and Nekrich [23] to O(p + k). These results are powerful enough to support arbitrary relevance functions like frequency, proximity, PageRank, etc. In many applications like desktop or email search, the data resides on disk and hence disk-bound indexes are needed. Despite continued progress on this problem in terms of theoretical, practical and compression aspects, any non-trivial bounds in the external memory model have so far been elusive. Internal memory (or RAM) solutions to this problem decompose the problem into O(p) subproblems and thus incur the additive factor of O(p). In external memory, these approaches will lead to O(p) I/Os instead of the optimal O(p/B) I/O term, where B is the block size. We re-interpret the problem independent of p, as interval stabbing with priority over a tree-shaped structure. This leads us to a linear space index in external memory supporting top-k queries (with unsorted outputs) in near optimal O(p/B + log_B n + log^(h) n + k/B) I/Os for any constant h. Then we get an O(n log* n) space index with optimal O(p/B + log_B n + k/B) I/Os. As a corollary, we also show the result in RAM which allows sorted order retrieval in O(k) time, if the locus of the pattern match is provided in advance. This gives optimal performance in many applications where finding the locus of the pattern in a suffix tree can be done much faster than the usual O(p).