Colored Range Queries and Document Retrieval
Cited by 31 (18 self)
Colored range queries are a well-studied topic in computational geometry and database research that, in the past decade, have found exciting applications in information retrieval. In this paper we give improved time and space bounds for three important one-dimensional colored range queries (colored range listing, colored range top-k queries, and colored range counting) and, thus, new bounds for various document retrieval problems on general collections of sequences. Specifically, we first describe a framework including almost all recent results on colored range listing and document listing, which suggests new combinations of data structures for these problems. For example, we give the fastest compressed data structures for colored range listing and document listing, and an efficient data structure for document listing whose size is bounded in terms of the high-order entropies of the library of documents. We then show how (approximate) colored top-k queries can be reduced to (approximate) range-mode queries on subsequences, yielding the first efficient data structure for this problem. Finally, we show how a modified wavelet tree can support colored range counting in logarithmic time and space that is succinct whenever the number of colors is superpolylogarithmic in the length of the sequence.
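To make the two simplest query types concrete, here is a minimal Python sketch of colored range listing and colored range counting. It relies on a well-known observation, not on the compressed structures or the modified wavelet tree the abstract refers to: a position in the range holds the leftmost occurrence of its color exactly when the previous occurrence of that color lies before the range. The function names are hypothetical, and each query scans the range linearly, so this is a baseline for checking answers rather than an efficient index.

```python
from typing import Hashable, List, Sequence


def build_prev(colors: Sequence[Hashable]) -> List[int]:
    """prev[i] is the index of the previous occurrence of colors[i], or -1."""
    last = {}
    prev = []
    for i, c in enumerate(colors):
        prev.append(last.get(c, -1))
        last[c] = i
    return prev


def colored_range_list(colors, prev, lo, hi):
    """Distinct colors in colors[lo..hi] (inclusive), in order of first occurrence."""
    # Position i is the leftmost occurrence of its color in the range
    # exactly when the previous occurrence falls before lo.
    return [colors[i] for i in range(lo, hi + 1) if prev[i] < lo]


def colored_range_count(prev, lo, hi):
    """Number of distinct colors in the range [lo..hi] (inclusive)."""
    return sum(1 for i in range(lo, hi + 1) if prev[i] < lo)
```

In this form, counting becomes a two-sided range query on the `prev` array (positions in `[lo, hi]` with value below `lo`), which is the kind of query a wavelet tree can answer in logarithmic time.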
Spaces, trees and colors: The algorithmic landscape of document retrieval on sequences
CoRR
Cited by 14 (9 self)
Document retrieval is one of the best-established information retrieval activities since the sixties, pervading all search engines. Its aim is to obtain, from a collection of text documents, those most relevant to a pattern query. Current technology is mostly oriented to “natural language” text collections, where inverted indexes are the preferred solution. As successful as this paradigm has been, it fails to properly handle various East Asian languages and other scenarios where the “natural language” assumptions do not hold. In this survey we cover the recent research in extending the document retrieval techniques to a broader class of sequence collections, which has applications in bioinformatics, data and Web mining, chemoinformatics, software engineering, multimedia information retrieval, and many other fields. We focus on the algorithmic aspects of the techniques, uncovering a rich world of relations between document retrieval challenges and fundamental problems on trees, strings, range queries, discrete geometry, and other areas.
Space-Efficient Data-Analysis Queries on Grids
Cited by 13 (9 self)
We consider various data-analysis queries on two-dimensional points. We give new space/time tradeoffs over previous work on semigroup and group queries such as sum, average, variance, minimum and maximum. We also introduce new solutions to queries rarely considered in the literature such as two-dimensional quantiles, majorities, successor/predecessor and mode queries. We face static and dynamic scenarios.
Range Majority in Constant Time and Linear Space
2011
Cited by 13 (6 self)
Given an array A of size n, we consider the problem of answering range majority queries: given a query range [i..j] where 1 ≤ i ≤ j ≤ n, return the majority element of the subarray A[i..j] if it exists. We describe a linear space data structure that answers range majority queries in constant time. We further generalize this problem by defining range α-majority queries: given a query range [i..j], return all the elements in the subarray A[i..j] with frequency greater than α(j − i + 1). We prove an upper bound on the number of α-majorities that can exist in a subarray, assuming that query ranges are restricted to be larger than a given threshold. Using this upper bound, we generalize our range majority data structure to answer range α-majority queries in O(1/α) time using O(n lg(1/α + 1)) space, for any fixed α ∈ (0, 1). This result is interesting since other similar range query problems based on frequency have nearly logarithmic lower bounds on query time when restricted to linear space.
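The definition of a range α-majority can be illustrated with a small Python baseline. This is not the constant-time structure the abstract describes. It is a sketch, with hypothetical function names, that finds candidates with a Misra-Gries style counter pass over the range and then verifies each candidate exactly using sorted position lists and binary search. It uses 0-based inclusive indices, takes time linear in the range length, and assumes the array elements can be sorted for output.

```python
import math
from bisect import bisect_left, bisect_right
from collections import defaultdict


def build_positions(a):
    """Map each value to the sorted list of indices where it occurs."""
    pos = defaultdict(list)
    for i, x in enumerate(a):
        pos[x].append(i)
    return pos


def range_count(pos, x, lo, hi):
    """Occurrences of x in a[lo..hi] (inclusive), via two binary searches."""
    p = pos.get(x, [])
    return bisect_right(p, hi) - bisect_left(p, lo)


def range_alpha_majorities(a, pos, lo, hi, alpha):
    """All values whose frequency in a[lo..hi] exceeds alpha * (hi - lo + 1)."""
    m = math.ceil(1 / alpha)  # counter budget; enough to keep every alpha-majority
    counters = {}
    for i in range(lo, hi + 1):
        x = a[i]
        if x in counters:
            counters[x] += 1
        elif len(counters) < m:
            counters[x] = 1
        else:
            # Misra-Gries step: decrement every counter, drop those reaching zero.
            for y in list(counters):
                counters[y] -= 1
                if counters[y] == 0:
                    del counters[y]
    threshold = alpha * (hi - lo + 1)
    # Surviving counters are only candidates, so verify each one exactly.
    return sorted(x for x in counters if range_count(pos, x, lo, hi) > threshold)
```

Since each reported value occurs more than α(j − i + 1) times and the counts sum to at most j − i + 1, fewer than 1/α values can ever be returned, which is why O(1/α) is the natural target for query time.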
Linear-space data structures for range minority query in arrays
In Proceedings of the 13th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT), 2012
Cited by 11 (5 self)
We consider range queries in arrays that search for low-frequency elements: least frequent elements and α-minorities. An α-minority of a query range has multiplicity no greater than an α fraction of the elements in the range. Our data structure for the least frequent element range query problem requires O(n) space, O(n^{3/2}) preprocessing time, and O(√n) query time. A reduction from boolean matrix multiplication to this problem shows the hardness of simultaneous improvements in both preprocessing time and query time. Our data structure for the α-minority range query problem requires O(n) space, supports queries in O(1/α) time, and allows α to be specified at query time.
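A short Python sketch can show why O(1/α) candidates suffice for an α-minority query. It rests on a pigeonhole argument: among any ⌊1/α⌋ + 1 distinct elements of the range, the least frequent one occurs at most an α fraction of the time. The code below is an illustrative baseline under that argument, not the paper's data structure. The function names are hypothetical, indices are 0-based and inclusive, and the candidates are collected by a plain left-to-right scan rather than by an index.

```python
import math
from bisect import bisect_left, bisect_right
from collections import defaultdict


def build_positions(a):
    """Map each value to the sorted list of indices where it occurs."""
    pos = defaultdict(list)
    for i, x in enumerate(a):
        pos[x].append(i)
    return pos


def range_alpha_minority(a, pos, lo, hi, alpha):
    """Return some value occurring at most alpha * (hi - lo + 1) times
    in a[lo..hi], or None if no such value exists."""
    k = math.floor(1 / alpha) + 1  # number of distinct candidates that suffices
    candidates = []
    seen = set()
    for i in range(lo, hi + 1):
        if a[i] not in seen:
            seen.add(a[i])
            candidates.append(a[i])
            if len(candidates) == k:
                break
    threshold = alpha * (hi - lo + 1)
    for x in candidates:
        p = pos[x]
        # Exact frequency of x in the range, via two binary searches.
        if bisect_right(p, hi) - bisect_left(p, lo) <= threshold:
            return x
    # Fewer than k distinct values exist and none of them qualifies.
    return None
```

If the scan stops early with k candidates, one of them must pass the check, so `None` is returned only when every distinct value in the range has been examined.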
Better Space Bounds for Parameterized Range Majority and Minority
Cited by 5 (3 self)
Karpinski and Nekrich (2008) introduced the problem of parameterized range majority, which asks to preprocess a string of length n such that, given the endpoints of a range, one can quickly find all the distinct elements whose relative frequencies in that range are more than a threshold τ. Subsequent authors have reduced their time and space bounds such that, when τ is given at preprocessing time, we need either O(n lg(1/τ)) space and optimal O(1/τ) query time or linear space and O((1/τ) lg lg σ) query time, where σ is the alphabet size. In this paper we give the first linear-space solution with optimal O(1/τ) query time. For the case when τ is given at query time, we significantly improve previous bounds, achieving either O(n lg lg σ) space and optimal O(1/τ) query time or compressed space and O((1/τ) lg(lg(1/τ) / lg lg n)) query time. Along the way, we consider the complementary problem of parameterized range minority that was recently introduced by Chan et al. (2012), who achieved linear space and O(1/τ) query time even for variable τ. We improve their solution to use either nearly optimally compressed space with no slowdown, or optimally compressed space with nearly no slowdown. Some of our intermediate results, such as density-sensitive query time for one-dimensional range counting, may be of independent interest.
Linear-space data structures for range frequency queries on arrays and trees
In Proc. MFCS, volume 8087 of LNCS, 2013
Cited by 4 (3 self)
We present O(n)-space data structures to support various range frequency queries on a given array A[0 : n − 1] or tree T with n nodes. Given a query consisting of an arbitrary pair of preorder rank indices (i, j), our data structures return a least frequent element, mode, or α-minority of the multiset of elements in the unique path with endpoints at indices i and j in A or T. We describe a data structure that supports range least frequent element queries on arrays in O(√(n/w)) time, improving the Θ(√n) worst-case time required by the data structure of Chan et al. (SWAT 2012), where w ∈ Ω(log n) is the word size in bits. We describe a data structure that supports range mode queries on trees in O(log log n √(n/w)) time, improving the Θ(√n log n) worst-case time required by the data structure of Krizanc et al. (ISAAC 2003). Finally, we describe a data structure that supports range α-minority queries on trees in O(α^{−1} log log n) time, where α ∈ [0, 1] can be specified at query time.
A simple linear-space data structure for constant-time range minimum query
In Proc. Conference on Space Efficient Data Structures, Streams and Algorithms, LNCS, 2013
Cited by 4 (3 self)
We revisit the range minimum query problem and present a new O(n)-space data structure that supports queries in O(1) time. Although previous data structures exist whose asymptotic bounds match ours, our goal is to introduce a new solution that is simple, intuitive, and practical without increasing asymptotic costs for query time or space.
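For readers new to the problem, the following Python sketch shows the classic sparse-table approach to range minimum queries. It is a different and simpler technique than the one the abstract describes: it answers queries in O(1) time but uses O(n log n) words of space, not O(n). The function names are hypothetical, and indices are 0-based and inclusive.

```python
def build_sparse_table(a):
    """table[k][i] is the index of the minimum of a[i : i + 2**k]."""
    n = len(a)
    table = [list(range(n))]  # level 0: every element is its own minimum
    k = 1
    while (1 << k) <= n:
        half = 1 << (k - 1)
        prev = table[-1]
        row = []
        for i in range(n - (1 << k) + 1):
            # Combine two adjacent blocks of length 2**(k-1).
            left, right = prev[i], prev[i + half]
            row.append(left if a[left] <= a[right] else right)
        table.append(row)
        k += 1
    return table


def range_min_index(a, table, lo, hi):
    """Index of the leftmost minimum of a[lo..hi] (inclusive)."""
    k = (hi - lo + 1).bit_length() - 1  # largest power of two fitting in the range
    # Two overlapping blocks of length 2**k cover the whole range.
    left = table[k][lo]
    right = table[k][hi - (1 << k) + 1]
    return left if a[left] <= a[right] else right
```

The overlap of the two blocks is harmless because taking a minimum is idempotent. Linear-space solutions such as the one above typically keep a table like this only over block summaries and handle short ranges separately.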
Encodings for Range Majority Queries
2014
Cited by 2 (1 self)
We face the problem of designing a data structure that can report the majority within any range of an array A[1, n], without storing A. We show that Ω(n) bits are necessary for such a data structure, and design a structure using O(n log* n) bits that answers majority queries in O(log n) time. We extend our results to τ-majorities.
On optimal top-k string retrieval
2012
Cited by 1 (1 self)
Let D = {d_1, d_2, d_3, ..., d_D} be a given set of D (string) documents of total length n. The top-k document retrieval problem is to index D such that when a pattern P of length p and a parameter k come as a query, the index returns the k documents most relevant to the pattern P. Hon et al. [13] gave the first linear-space framework to solve this problem in O(p + k log k) time. This was improved by Navarro and Nekrich [23] to O(p + k). These results are powerful enough to support arbitrary relevance functions like frequency, proximity, PageRank, etc. In many applications like desktop or email search, the data resides on disk and hence disk-bound indexes are needed. Despite continued progress on this problem in terms of theoretical, practical and compression aspects, any nontrivial bounds in the external memory model have so far been elusive. The internal memory (or RAM) solution to this problem decomposes the problem into O(p) subproblems and thus incurs an additive term of O(p). In external memory, these approaches lead to O(p) I/Os instead of the optimal O(p/B) I/O term, where B is the block size. We reinterpret the problem independently of p, as interval stabbing with priority over a tree-shaped structure. This leads us to a linear-space index in external memory supporting top-k queries (with unsorted outputs) in near-optimal O(p/B + log_B n + log^{(h)} n + k/B) I/Os for any constant h. Then we get an O(n log* n)-space index with optimal O(p/B + log_B n + k/B) I/Os. As a corollary, we also show the result in RAM which allows sorted-order retrieval in O(k) time, if the locus of the pattern match is provided in advance. This gives optimal performance in many applications where finding the locus of the pattern in a suffix tree can be done much faster than the usual O(p).
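The problem statement itself can be pinned down with a brute-force Python reference, assuming term frequency as the relevance function (one of the options the abstract lists) and a non-empty pattern. This is not an index at all: it rescans every document for each query, so it is only useful for checking the output of a real top-k structure. The function names are hypothetical.

```python
import heapq


def count_occurrences(text, pattern):
    """Number of (possibly overlapping) occurrences of pattern in text."""
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def top_k_documents(docs, pattern, k):
    """Return up to k (doc_index, frequency) pairs, highest frequency first.

    Documents that do not contain the pattern are never reported.
    Ties are broken in favor of the smaller document index.
    """
    scored = [(count_occurrences(d, pattern), -i) for i, d in enumerate(docs)]
    best = heapq.nlargest(k, (s for s in scored if s[0] > 0))
    return [(-neg_i, freq) for freq, neg_i in best]
```

The output here is sorted by relevance, which corresponds to the sorted-order variant mentioned at the end of the abstract. The external-memory bound quoted above is for unsorted output.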