Range quantile queries: Another virtue of wavelet trees
In Proc. 16th SPIRE, LNCS 5721, 2009
Cited by 27 (5 self)
Abstract. We show how to use a balanced wavelet tree as a data structure that stores a list of numbers and supports efficient range quantile queries. A range quantile query takes a rank and the endpoints of a sublist and returns the number with that rank in that sublist. For example, if the rank is half the sublist’s length, then the query returns the sublist’s median. We also show how these queries can be used to support space-efficient coloured range reporting and document listing.
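The range quantile query described in this abstract can be illustrated with a minimal sketch. This is an assumption-laden illustration, not the authors' implementation: the class name `WaveletTree`, the method `quantile`, and the half-open, 0-based range convention are all hypothetical, and the tree below stores plain Python prefix-count lists instead of succinct bitvectors, so it shows the query logic only, not the space bounds. It also splits on the value range, so its depth is logarithmic in the range of values and not necessarily in the list length.

```python
class WaveletTree:
    """Pointer-based wavelet tree over a non-empty list of integers (sketch)."""

    def __init__(self, values, lo=None, hi=None):
        # lo..hi is the inclusive range of values this node is responsible for.
        self.lo = min(values) if lo is None else lo
        self.hi = max(values) if hi is None else hi
        self.left = self.right = None
        if self.lo == self.hi or not values:
            return  # leaf, or an empty node that no valid query ever visits
        mid = (self.lo + self.hi) // 2
        # prefix[p] = how many of the first p values are routed to the left child.
        self.prefix = [0]
        for v in values:
            self.prefix.append(self.prefix[-1] + (v <= mid))
        self.left = WaveletTree([v for v in values if v <= mid], self.lo, mid)
        self.right = WaveletTree([v for v in values if v > mid], mid + 1, self.hi)

    def quantile(self, i, j, k):
        """Return the k-th smallest value (k >= 1) among positions i..j-1."""
        node = self
        while node.lo != node.hi:
            left_i, left_j = node.prefix[i], node.prefix[j]
            in_left = left_j - left_i  # elements of the range that go left
            if k <= in_left:
                i, j, node = left_i, left_j, node.left
            else:
                k -= in_left
                i, j, node = i - left_i, j - left_j, node.right
        return node.lo


data = [3, 1, 4, 1, 5, 9, 2, 6]
tree = WaveletTree(data)
# Median-style query: 3rd smallest of data[2:7] == [4, 1, 5, 9, 2]
print(tree.quantile(2, 7, 3))
```

Each step maps the query range into one child using two prefix counts, which is the rank operation a real wavelet tree would answer on a bitvector.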
Approximate Range Mode and Range Median Queries
In Proceedings of the 22nd Symposium on Theoretical Aspects of Computer Science (STACS), 2005
Cited by 13 (1 self)
ABSTRACT. Mode and median are two of the most important statistics we use when we analyze data. In this paper, we consider data structures and algorithms for preprocessing a labelled list of length n so that, for any given i and j, we can answer queries of the form: What is the mode or median label in the sequence of labels between indices i and j? Our results are on an approximate version of this problem. Using O(n/(1 − α)) space, our data structure can find in O(log log_{1/α} n) time an element whose number of occurrences is at least α times that of the mode, for some user-specified parameter 0 < α < 1. Data structures are proposed to achieve constant query time for α = 1/2, 1/3 and 1/4, using storage space of n log n, n log log n and n, respectively. We also show that if the elements are comparable, an O(n/(1 − α)) space, O(1) query time data structure can answer range median queries with a guaranteed accuracy of α × ⌊(j − i + 1)/2⌋.
Range Selection and Median: Tight Cell Probe Lower Bounds and Adaptive Data Structures
Cited by 12 (5 self)
Range selection is the problem of preprocessing an input array A of n unique integers, such that given a query (i, j, k), one can report the k’th smallest integer in the subarray A[i], A[i + 1], ..., A[j]. In this paper we consider static data structures in the word-RAM for range selection and several natural special cases thereof. The first special case is known as range median, which arises when k is fixed to ⌊(j − i + 1)/2⌋. The second case, denoted prefix selection, arises when i is fixed to 0. Finally, we also consider the bounded rank prefix selection problem and the fixed rank range selection problem. In the former, data structures must support prefix selection queries under the assumption that k ≤ κ for some value κ ≤ n given at construction time, while in the latter, data structures must support range selection queries where k is fixed beforehand for all queries. We prove cell probe lower bounds for range selection, prefix selection and range median, stating that any data structure that uses S words of space needs Ω(log n / log(Sw/n)) time to answer a query. In particular, any data structure that uses n log^{O(1)} n space needs Ω(log n / log log n) time to answer a query, and any data structure that supports queries in constant time needs n^{1+Ω(1)} space. For data structures that use n log^{O(1)} n space this matches the best known upper bound. Additionally, we present a linear space data structure that supports range selection queries in O(log k / log log n + log log n) time. Finally, we prove that any data structure that uses S space needs Ω(log κ / log(Sw/n)) time to answer a bounded rank prefix selection query and Ω(log k / log(Sw/n)) time to answer a fixed rank range selection query. This shows that our data structure is optimal except for small values of k.
Range Medians
Cited by 9 (0 self)
Abstract. We study a generalization of the classical median finding problem to the batched query case: given an unsorted array of n items and k (not necessarily disjoint) intervals in the array, the goal is to determine the median in each of the intervals. We give an algorithm that uses O(n log k + k log k log n) comparisons and show a lower bound of Ω(n log k) comparisons for this problem. This is optimal for k = O(n / log n).
Counting Inversions, Offline Orthogonal Range Counting, and Related Problems
Cited by 9 (3 self)
We give an O(n √(lg n))-time algorithm for counting the number of inversions in a permutation on n elements. This improves a long-standing previous bound of O(n lg n / lg lg n) that followed from Dietz’s data structure [WADS’89], and answers a question of Andersson and Petersson [SODA’95]. As Dietz’s result is known to be optimal for the related dynamic rank problem, our result demonstrates a significant improvement in the offline setting. Our new technique is quite simple: we perform a “vertical partitioning” of a trie (akin to van Emde Boas trees), and use ideas from external memory. However, the technique finds numerous applications: for example, we obtain
• in d dimensions, an algorithm to answer n offline orthogonal range counting queries in time O(n lg^{d−2+1/d} n);
• an improved construction time for online data structures for orthogonal range counting;
• an improved update time for the partial sums problem;
• faster Word RAM algorithms for finding the maximum depth in an arrangement of axis-aligned rectangles, and for the slope selection problem.
As a bonus, we also give a simple (1 + ε)-approximation algorithm for counting inversions that runs in linear time, improving the previous O(n lg lg n) bound by Andersson and Petersson.
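For readers unfamiliar with the inversion-counting problem itself, the following sketch shows the textbook merge-sort baseline, which runs in O(n lg n) time. It is explicitly not the O(n √(lg n)) algorithm of this abstract, and the function name `count_inversions` is a hypothetical choice made for illustration.

```python
def count_inversions(perm):
    """Count pairs (i, j) with i < j and perm[i] > perm[j] using merge sort."""

    def sort_count(a):
        if len(a) <= 1:
            return a, 0
        mid = len(a) // 2
        left, inv_left = sort_count(a[:mid])
        right, inv_right = sort_count(a[mid:])
        merged, inv = [], inv_left + inv_right
        i = j = 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i])
                i += 1
            else:
                # right[j] precedes every element still waiting in left,
                # so it forms an inversion with each of them.
                inv += len(left) - i
                merged.append(right[j])
                j += 1
        merged.extend(left[i:])
        merged.extend(right[j:])
        return merged, inv

    return sort_count(list(perm))[1]


# The inversions of [2, 0, 3, 1] are the pairs (2, 0), (2, 1) and (3, 1).
print(count_inversions([2, 0, 3, 1]))
```

A baseline like this is mainly useful as a correctness oracle when experimenting with faster word-RAM approaches.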
Linear-Space Data Structures for Range Mode Query in Arrays
Cited by 7 (2 self)
A mode of a multiset S is an element a ∈ S of maximum multiplicity; that is, a occurs at least as frequently as any other element in S. Given an array A[1 : n] of n elements, we consider a basic problem: constructing a static data structure that efficiently answers range mode queries on A. Each query consists of an input pair of indices (i, j) for which a mode of A[i : j] must be returned. The best previous data structure with linear space, by Krizanc, Morin, and Smid (ISAAC 2003), requires O(√n log log n) query time. We improve their result and present an O(n)-space data structure that supports range mode queries in O(√(n / log n)) worst-case time. Furthermore, we present strong evidence that a query time significantly below √n cannot be achieved by purely combinatorial techniques; we show that boolean matrix multiplication of two √n × √n matrices reduces to n range mode queries in an array of size O(n). Additionally, we give linear-space data structures for orthogonal range mode in higher dimensions (queries in near O(n^{1−1/2d}) time) and for halfspace range mode in higher dimensions (queries in O(n^{1−1/d^2}) time).
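To make the query definition concrete, here is a brute-force reference that answers a range mode query by direct counting in time linear in the range length. It is only a sketch of the problem statement, not the data structure of this abstract; the function name `range_mode` is hypothetical, and the 1-based inclusive indexing is chosen to mirror the A[i : j] notation above.

```python
from collections import Counter


def range_mode(A, i, j):
    """Return (mode, multiplicity) for A[i : j], 1-based and inclusive.

    Counts every element in the range, so each query costs O(j - i + 1).
    """
    counts = Counter(A[i - 1:j])
    value, freq = counts.most_common(1)[0]
    return value, freq


A = [1, 2, 2, 3, 3, 3, 2]
print(range_mode(A, 1, 3))  # range is [1, 2, 2]
print(range_mode(A, 4, 6))  # range is [3, 3, 3]
```

When several elements tie for maximum multiplicity, any of them is a valid mode, so callers should rely only on the returned multiplicity in that case.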
Data structures for range-aggregate extent queries
In Proc. 20th CCCG, 2008
Cited by 5 (2 self)
A fundamental and well-studied problem in computational geometry is range searching, where the goal is to preprocess a set, S, of geometric objects (e.g., points in the plane) so that the subset S′ ⊆ S that is contained in a query range (e.g., an axes-parallel rectangle) can be reported efficiently. However, in many situations, what is of interest is to generate a more informative “summary” of the output, obtained by applying a suitable aggregation function on S′. Examples of such aggregation functions include count, sum, min, max, mean, median, mode, and top-k, which are usually computed on a set of weights defined suitably on the objects. Such range-aggregate query problems have been the subject of much recent research in both the database and the computational geometry communities. In this paper, we further generalize this line of work by considering aggregation functions on point-sets that measure the extent or “spread” of the objects in the retrieved set S′. The functions considered here include closest pair, diameter, and width. The challenge here is that these aggregation functions (unlike, say, count) are not efficiently decomposable in the sense that the answer to S′ cannot be inferred easily from answers to subsets that induce a partition
Significant-Presence Range Queries in Categorical Data
In WADS’03, LNCS 2748, 2004
Cited by 1 (0 self)
In traditional colored range-searching problems, one wants to store a set of n objects with m distinct colors for the following queries: report all colors such that there is at least one object of that color intersecting the query range. Such an object, however, could be an ‘outlier’ in its color class. Therefore we consider a variant of this problem where one has to report only those colors such that at least a fraction τ of the objects of that color intersects the query range, for some parameter τ. Our main results are on an approximate version of this problem, where we are also allowed to report those colors for which a fraction (1 − ε)τ intersects the query range, for some fixed ε > 0. We present efficient data structures for such queries with orthogonal query ranges in sets of colored points, and for point stabbing queries in sets of colored rectangles.
New Algorithms on Wavelet Trees and Applications to Information Retrieval
Wavelet trees are widely used in the representation of sequences, permutations, text collections, binary relations, discrete points, and other succinct data structures. We show, however, that this still falls short of exploiting all of the virtues of this versatile data structure. In particular we show how to use wavelet trees to solve fundamental algorithmic problems such as range quantile queries, range next value queries, and range intersection queries. We explore several applications of these queries in Information Retrieval, in particular document retrieval in hierarchical and temporal documents, and in the representation of inverted lists.
Another Virtue of Wavelet Trees
, 903
Abstract. We show how to use a balanced wavelet tree as a data structure that stores a list of numbers and supports efficient range selection queries. A range selection query takes a rank and the endpoints of a sublist and returns the number with that rank in that sublist. For example, if the rank is half the sublist’s length, then the query returns the sublist’s median.