## I/O-Efficient Data Structures for Colored Range and Prefix Reporting

Citations: 3 (2 self)

### BibTeX

@MISC{Larsen_i/o-efficientdata,
    author = {Kasper Green Larsen and Rasmus Pagh},
    title = {I/O-Efficient Data Structures for Colored Range and Prefix Reporting},
    year = {}
}

### Abstract

Motivated by information retrieval applications, we consider the one-dimensional colored range reporting problem in rank space. The goal is to build a static data structure for sets C_1, ..., C_m ⊆ {1, ..., σ} that supports queries of the kind: given indices a, b, report the set ⋃_{a≤i≤b} C_i. We study the problem in the I/O model, and show that there exists an optimal linear-space data structure that answers queries in O(1 + k/B) I/Os, where k denotes the output size and B the disk block size in words. In fact, we obtain the same bound for the harder problem of three-sided orthogonal range reporting. In this problem, we are to preprocess a set of n two-dimensional points in rank space, such that all points inside a query rectangle of the form [x_1, x_2] × (−∞, y] can be reported. The best previous bounds for this problem are either O(n lg_B^2 n) space and O(1 + k/B) query I/Os, or O(n) space and O(lg_B^(h) n + k/B) query I/Os, where lg^(h)
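To make the two query types in the abstract concrete, the following is a minimal in-memory sketch in Python. It is not the I/O-efficient structure the abstract describes: both queries are answered by brute force, and all function names (`colored_range_report`, `three_sided_report`, `build_points`, `colored_via_three_sided`) are hypothetical. The last function illustrates one standard way to reduce colored range reporting to three-sided reporting, namely mapping each color occurrence to a point whose y-coordinate is the position of the previous occurrence of the same color; whether this matches the exact reduction used by the authors is an assumption.

```python
from typing import Dict, List, Set, Tuple

Point = Tuple[int, int]


def colored_range_report(sets: List[Set[int]], a: int, b: int) -> Set[int]:
    """Brute-force union of C_a, ..., C_b (1-indexed, inclusive)."""
    result: Set[int] = set()
    for i in range(a, b + 1):
        result |= sets[i - 1]
    return result


def three_sided_report(points: List[Point], x1: int, x2: int, y: int) -> List[Point]:
    """Brute-force report of all points inside [x1, x2] x (-inf, y]."""
    return [(px, py) for (px, py) in points if x1 <= px <= x2 and py <= y]


def build_points(sets: List[Set[int]]) -> Tuple[List[Point], List[int], List[int]]:
    """Flatten C_1..C_m into one sequence of colors (an assumed encoding).

    The occurrence at flattened position t becomes the point (t, prev),
    where prev is the position of the previous occurrence of the same
    color, or 0 if there is none.
    """
    points: List[Point] = []
    colors: List[int] = []   # colors[t - 1] is the color at position t
    starts: List[int] = []   # starts[i - 1] is the first position of C_i
    last_seen: Dict[int, int] = {}
    t = 0
    for color_set in sets:
        starts.append(t + 1)
        for color in sorted(color_set):
            t += 1
            points.append((t, last_seen.get(color, 0)))
            colors.append(color)
            last_seen[color] = t
    starts.append(t + 1)     # sentinel: one past the last position
    return points, colors, starts


def colored_via_three_sided(sets: List[Set[int]], a: int, b: int) -> List[int]:
    """Answer a colored range query with a single three-sided query.

    An occurrence in [lo, hi] is the first occurrence of its color in the
    range exactly when its previous occurrence lies before lo, so each
    color in the answer is reported once.
    """
    points, colors, starts = build_points(sets)
    lo, hi = starts[a - 1], starts[b] - 1
    hits = three_sided_report(points, lo, hi, lo - 1)
    return [colors[px - 1] for (px, _) in hits]


if __name__ == "__main__":
    example = [{1, 2}, {2, 3}, set(), {1, 4}]
    print(colored_range_report(example, 2, 4))
    print(colored_via_three_sided(example, 2, 4))
```

In this sketch the number of points reported by the three-sided query equals the number of distinct colors k in the answer, which is why a three-sided structure with O(1 + k/B) query cost would carry over to the colored problem under this encoding.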

### Citations

8523 | Introduction to Algorithms
- Cormen, Leiserson, et al.
- 2001
Citation Context ...re we use the extra power of the model; all previous steps involve standard I/Os. 5 Open problems An obvious question is if our results for the I/O model can be extended to the cache-oblivious model [18]. Also, it would be interesting to investigate whether our results can be obtained with the indivisibility assumption, or if the problem separates the I/O model with and without the indivisibility ass...

537 | The input/output complexity of sorting and related problems
- Aggarwal, Vitter
- 1988
Citation Context ...ed in the documents containing a term in some subtree. Again, this reduces to a colored 1D range reporting problem. 1.1 Model of Computation In this paper we study the above problems in the I/O-model [6] of computation. In this model, the input to a data structure problem is assumed too large to fit in main memory of the machine, and thus the data structure must reside on disk. The disk is assumed in...

234 | Algorithms for parallel memory I: Two level memories
- Vitter, Shriver
- 1994
Citation Context ...optimal bounds. We will use the notation top_k(Σ) to denote the largest k elements of a set Σ (where top_k(Σ) = Σ if |Σ| < k). Scatter-I/O model. We consider a special case of the parallel disk model [28] where there are B disks, and each block contains a single word. (Notice that we use B differently than one would for the parallel I/O model.) A single I/O operation thus consists of retrieving or wri...

194 | Inverted Files for Text Search Engines
- Zobel, Moffat
Citation Context ...ate are often resolved by computing a list of all documents satisfying it, and merging this list with similar lists for other predicates (e.g. inverted indexes). Recent overviews can be found in e.g. [29, 12]. To the best of our knowledge, existing solutions either require super-linear space (e.g. storing all answers) or report a multi-set, meaning that the same document may be reported many times if it has many...

176 | Priority Search Trees
- McCreight
- 1985
Citation Context ... of B words, where a word is Θ(lg n) bits. 1.3 Related work The importance of three-sided range reporting is mirrored in the number of publications on the problem, see e.g. [10, 24] for the I/O model, [22, 17] for the pointer machine, [5, 9, 11, 3, 4] for the cache-oblivious and [8, 21, 14] for the word-RAM model. One of the main reasons why the problem has seen so much attention stems from the fact that ra...

145 | Surpassing the information theoretic bound with fusion trees
- Fredman, Willard
- 1993
Citation Context ...e our data structure in Section 2.2. 2.1 Preliminaries In this section, we briefly discuss two fundamental data structures that we make use of in our solutions, the Fusion Tree of Fredman and Willard [16] and a simple data structure that we refer to as the External Memory Priority Search Tree (EM-PST). We note that the EM-PST has been used numerous times before as a basic building block, for instance i...

94 | Cell broadband engine architecture and its first implementation – a performance view
- Chen, Raghavan, et al.
- 2007
Citation Context ...ily in storage. To distinguish this from a normal I/O operation, we propose the notation sI/O (for scatter I/Os). This model abstracts (and idealizes) the memory model used by IBM’s Cell architecture [15], which has been shown to alleviate memory bottlenecks for problems such as BFS [26] that are notoriously hard in the I/O model [23]. 4.1 Our data structure We construct a collection S_k consisting of ...

79 | On Two-Dimensional Indexability and Optimal Range Search Indexing
- Arge, Samoladas, et al.
- 1999
Citation Context ...., each disk block consists of B words, where a word is Θ(lg n) bits. 1.3 Related work The importance of three-sided range reporting is mirrored in the number of publications on the problem, see e.g. [10, 24] for the I/O model, [22, 17] for the pointer machine, [5, 9, 11, 3, 4] for the cache-oblivious and [8, 21, 14] for the word-RAM model. One of the main reasons why the problem has seen so much attention ...

69 | New data structures for orthogonal range searching
- Alstrup, Brodal, et al.
- 2000
Citation Context ...hree-sided range reporting is mirrored in the number of publications on the problem, see e.g. [10, 24] for the I/O model, [22, 17] for the pointer machine, [5, 9, 11, 3, 4] for the cache-oblivious and [8, 21, 14] for the word-RAM model. One of the main reasons why the problem has seen so much attention stems from the fact that range searching with more than three sides no longer admits linear space data struct...

47 | External-memory breadth-first search with sublinear I/O
- Mehlhorn, Meyer
- 2002
Citation Context ...ts (and idealizes) the memory model used by IBM’s Cell architecture [15], which has been shown to alleviate memory bottlenecks for problems such as BFS [26] that are notoriously hard in the I/O model [23]. 4.1 Our data structure We construct a collection S_k consisting of prefixes of strings in S. For each p ∈ S_k we explicitly store the color set c_k(p) = top_k(⋃_{x∈S∩p∗} c(x)). Figure 1 shows part of a t...

42 | Type Less, Find More: Fast Autocompletion Search with a Succinct Index
- Bast, Weber
- 2006
Citation Context ...ate are often resolved by computing a list of all documents satisfying it, and merging this list with similar lists for other predicates (e.g. inverted indexes). Recent overviews can be found in e.g. [29, 12]. To the best of our knowledge, existing solutions either require super-linear space (e.g. storing all answers) or report a multi-set, meaning that the same document may be reported many times if it has many...

31 | Efficient 3-D range searching in external memory
- Vengroff, Vitter
- 1996
Citation Context ...imal O(1 + k/B) I/Os. All these data structures use only comparisons and indirect addressing. Higher-dimensional orthogonal range reporting has also received much attention in the I/O model, see e.g. [27, 1, 2, 25]. The best current data structures for orthogonal range reporting in d-dimensional space (d ≥ 3), where coordinates can only be compared, either answer queries in O(lg_B n (lg n / lg lg_B n)^(d−2) + k/B) I...

22 | Cache-oblivious data structures for orthogonal range searching
- Agarwal, Arge, et al.
- 2003
Citation Context ...n) bits. 1.3 Related work The importance of three-sided range reporting is mirrored in the number of publications on the problem, see e.g. [10, 24] for the I/O model, [22, 17] for the pointer machine, [5, 9, 11, 3, 4] for the cache-oblivious and [8, 21, 14] for the word-RAM model. One of the main reasons why the problem has seen so much attention stems from the fact that range searching with more than three sides n...

17 | Optimal static range reporting in one dimension
- Alstrup, Brodal, et al.
- 2001
Citation Context ... and a function c : S → 2^{1,...,σ}, support queries of the kind: Given a string p, report the set ⋃_{x∈S∩p∗} c(x), where p∗ denotes the set of strings with prefix p. Building on work of Alstrup et al. [7], Belazzougui et al. [13] have shown the following: Theorem 3.2. Given a collection S of n strings, there is a linear space data structure that, given a string p of length O(B), returns in O(1) I/Os: ...

16 | On dominance reporting in 3D
- Afshani
- 2008
Citation Context ...imal O(1 + k/B) I/Os. All these data structures use only comparisons and indirect addressing. Higher-dimensional orthogonal range reporting has also received much attention in the I/O model, see e.g. [27, 1, 2, 25]. The best current data structures for orthogonal range reporting in d-dimensional space (d ≥ 3), where coordinates can only be compared, either answer queries in O(lg_B n (lg n / lg lg_B n)^(d−2) + k/B) I...

15 | Cache-oblivious planar orthogonal range searching and counting
- Arge, Danner, et al.
- 2005
Citation Context ...n) bits. 1.3 Related work The importance of three-sided range reporting is mirrored in the number of publications on the problem, see e.g. [10, 24] for the I/O model, [22, 17] for the pointer machine, [5, 9, 11, 3, 4] for the cache-oblivious and [8, 21, 14] for the word-RAM model. One of the main reasons why the problem has seen so much attention stems from the fact that range searching with more than three sides n...

11 | Efficient breadth-first search on the cell/be processor
- Scarpazza, Villa, et al.
Citation Context ...tion sI/O (for scatter I/Os). This model abstracts (and idealizes) the memory model used by IBM’s Cell architecture [15], which has been shown to alleviate memory bottlenecks for problems such as BFS [26] that are notoriously hard in the I/O model [23]. 4.1 Our data structure We construct a collection S_k consisting of prefixes of strings in S. For each p ∈ S_k we explicitly store the color set c_k(p) = ...

9 | Orthogonal range reporting in three and higher dimensions
- Afshani, Arge, et al.
- 2009
Citation Context ...imal O(1 + k/B) I/Os. All these data structures use only comparisons and indirect addressing. Higher-dimensional orthogonal range reporting has also received much attention in the I/O model, see e.g. [27, 1, 2, 25]. The best current data structures for orthogonal range reporting in d-dimensional space (d ≥ 3), where coordinates can only be compared, either answer queries in O(lg_B n (lg n / lg lg_B n)^(d−2) + k/B) I...

9 | Computational geometry: generalized intersection searching
- Gupta, Janardan, et al.
Citation Context ...a. In fact, in Section 2 we present an optimal solution to the harder and very well-studied three-sided orthogonal range reporting problem in two-dimensional rank space, and then use a known reduction [19] to get the above result (Section 3). Given a set S of n points from the grid [n] × [n] = {1, . . . , n} × {1, . . . , n}, this problem asks to construct a data structure that is able to report all po...

8 | A log log n data structure for three-sided range queries
- Fries, Mehlhorn, et al.
- 1987
Citation Context ... of B words, where a word is Θ(lg n) bits. 1.3 Related work The importance of three-sided range reporting is mirrored in the number of publications on the problem, see e.g. [10, 24] for the I/O model, [22, 17] for the pointer machine, [5, 9, 11, 3, 4] for the cache-oblivious and [8, 21, 14] for the word-RAM model. One of the main reasons why the problem has seen so much attention stems from the fact that ra...

4 | Cache-Oblivious Range Reporting with Optimal Queries Requires Superlinear Space
- Afshani, Hamilton, et al.
Citation Context ...n) bits. 1.3 Related work The importance of three-sided range reporting is mirrored in the number of publications on the problem, see e.g. [10, 24] for the I/O model, [22, 17] for the pointer machine, [5, 9, 11, 3, 4] for the cache-oblivious and [8, 21, 14] for the word-RAM model. One of the main reasons why the problem has seen so much attention stems from the fact that range searching with more than three sides n...

4 | Simple and semi-dynamic structures for cache-oblivious planar orthogonal range searching
- Arge, Zeh
- 2006

2 | Dynamic 3-sided planar range queries with expected doubly logarithmic time
- Brodal, Kaporis, et al.
- 2009
Citation Context ...hree-sided range reporting is mirrored in the number of publications on the problem, see e.g. [10, 24] for the I/O model, [22, 17] for the pointer machine, [5, 9, 11, 3, 4] for the cache-oblivious and [8, 21, 14] for the word-RAM model. One of the main reasons why the problem has seen so much attention stems from the fact that range searching with more than three sides no longer admits linear space data struct...

2 | Using hashing to solve the dictionary problem (in external memory)
- Iacono, Pǎtraşcu
- 2012
Citation Context ...is provides for easier lower bounds, it should be clear when comparing to our results, that this approach might come at a cost of efficiency. Finally, we note that recent work by Iacono and Pǎtraşcu [20] also focuses on obtaining stronger upper bounds (for dynamic dictionaries) in the I/O model by abandoning the indivisibility assumption. 2 Three-Sided Orthogonal Range Reporting In this section we des...

2 | External memory range reporting on a grid
- Nekrich
- 2007
Citation Context ...., each disk block consists of B words, where a word is Θ(lg n) bits. 1.3 Related work The importance of three-sided range reporting is mirrored in the number of publications on the problem, see e.g. [10, 24] for the I/O model, [22, 17] for the pointer machine, [5, 9, 11, 3, 4] for the cache-oblivious and [8, 21, 14] for the word-RAM model. One of the main reasons why the problem has seen so much attention ...

2 | I/O-efficient point location in a set of rectangles
- Nekrich
- 2008

1 | Improved space bounds for cache-oblivious range reporting
- Afshani, Zeh
- 2011

1 | Fast prefix search in little space, with applications
- Belazzougui, Boldi, et al.
- 2010
Citation Context ...2^{1,...,σ}, support queries of the kind: Given a string p, report the set ⋃_{x∈S∩p∗} c(x), where p∗ denotes the set of strings with prefix p. Building on work of Alstrup et al. [7], Belazzougui et al. [13] have shown the following: Theorem 3.2. Given a collection S of n strings, there is a linear space data structure that, given a string p of length O(B), returns in O(1) I/Os: 1) The interval of ranks ...

1 | Efficient processing of 3-sided range queries with probabilistic guarantees
- Kaporis, Papadopoulos, et al.
Citation Context ...hree-sided range reporting is mirrored in the number of publications on the problem, see e.g. [10, 24] for the I/O model, [22, 17] for the pointer machine, [5, 9, 11, 3, 4] for the cache-oblivious and [8, 21, 14] for the word-RAM model. One of the main reasons why the problem has seen so much attention stems from the fact that range searching with more than three sides no longer admits linear space data struct...