## NEFOS: Rapid Cache-Aware Range Query Processing with Probabilistic Guarantees

### BibTeX

@MISC{Sioutas_nefos:rapid,

author = {Spyros Sioutas and Ioannis Karydis and Yannis Manolopoulos and Yannis Theodoridis},

title = {NEFOS: Rapid Cache-Aware Range Query Processing with Probabilistic Guarantees},

year = {}

}

### Abstract

We present NEFOS (NEsted FOrest of balanced treeS), a new cache-aware indexing scheme that supports insertions and deletions in O(1) worst-case block transfers for rebalancing operations (given an update position) and searching in O(log_B log n) expected block transfers, where B is the disk block size and n is the number of stored elements. The expected search bound holds with high probability for any (unknown) realistic input distribution. It improves on the O(log_B log n) expected search bound of the ISB-tree (Interpolation Search B-tree), which holds with high probability only for the class of smooth input distributions. We call an unknown distribution realistic if smoothness does not hold over the whole data set but may still appear locally, in small spatial neighborhoods. This is the case for a variety of real-life non-smooth distributions such as skewed, Zipfian, power-law, and beta distributions, as verified by an accompanying experimental study. Moreover, NEFOS is a B-parametrized concrete structure that works in both the I/O and RAM models without any kind of transformation or adaptation. It is also the first time an expected sub-logarithmic search bound has been achieved for a broad family of non-smooth input distributions. Keywords: Data Structures, Data Management Algorithms.
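As a rough illustration of what "sub-logarithmic" means here, the sketch below compares the growth term of a classic B-tree search, log_B n, with the growth term of the expected bound quoted in the abstract, log_B log n. It is not part of the paper: the function names, the choice B = 16, and the use of base 2 for the inner logarithm are assumptions made for the example, and the constants hidden by the O-notation are ignored.

```python
import math


def btree_levels(n: int, B: int) -> float:
    """Growth term log_B(n) of a classic B-tree search (constants ignored)."""
    return math.log(n) / math.log(B)


def loglog_levels(n: int, B: int) -> float:
    """Growth term log_B(log_2 n) of the expected search bound.

    The base of the inner logarithm is an assumption; it only changes
    the value by an additive constant.
    """
    return math.log(math.log2(n)) / math.log(B)


if __name__ == "__main__":
    B = 16  # hypothetical block size, in elements
    for exp in (20, 32, 64):
        n = 2 ** exp
        print(
            f"n = 2^{exp}: log_B n = {btree_levels(n, B):.2f}, "
            f"log_B log n = {loglog_levels(n, B):.2f}"
        )
```

The numbers are growth terms only, not measured I/O counts; they show how slowly the second term grows as n increases.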

### Citations

1703 | MapReduce: Simplified data processing on large clusters - Dean, Ghemawat - 2004 |

981 | The R*-tree: an efficient and robust access method for points and rectangles
- Beckmann, Kriegel, et al.
- 1990
Citation Context ...2] (right) We use a relatively small page size so that the number of nodes in an index simulates realistic situations, where the data set cardinality is higher. A similar methodology was also used in [4]. Fig. 9 and Fig. 10 depict the efficiency of NEFOS structure on searching for real spatial one-dimensional data. In particular, in Fig. 9 we measured the number of I/Os required for search operations... |

320 | External Memory Algorithms and Data Structures: Dealing with Massive Data
- Vitter
- 2001
Citation Context ...a time). A large number of variants of the B-tree have been proposed since its appearance in order to improve its performance in practice for various applications (see the excellent survey by Vitter [24] for an extended accounting of these and other variants and their applications) to make it parallel for use in multi-disk environments [21], to tune it for concurrency and recovery purposes [14,22], ... |

234 | Algorithms for parallel memory I: Two-level memories
- Vitter, Shriver
- 1994
Citation Context ...0, pp. 62–77, 2011. © Springer-Verlag Berlin Heidelberg 2011 ... and widely used such models, namely the two-level memory hierarchy model introduced in [2,25]. In this model, the memory hierarchy consists of an internal (main) memory and an arbitrarily large external memory (disk) partitioned into blocks of size B. The data from the external to the main me... |

153 | Efficient locking for concurrent operations on B-trees
- Lehman, Yao
- 1981
Citation Context ...tter [24] for an extended accounting of these and other variants and their applications, to make it parallel for use in multi-disk environments [21], to tune it for concurrency and recovery purposes [14,22], to extend it to cover other than the original field [9], etc. Regarding the update operation, it should be noted that an update operation consists of three consecutive phases: a search phase (to loc... |

120 | The string B-tree: a new data structure for string search in external memory and its applications
- Ferragina, Grossi
- 1999
Citation Context ...nts and their applications, to make it parallel for use in multi-disk environments [21], to tune it for concurrency and recovery purposes [14,22], to extend it to cover other than the original field [9], etc. Regarding the update operation, it should be noted that an update operation consists of three consecutive phases: a search phase (to locate the place of the update), an element-updating phase (... |

86 | Linear hashing: A new tool for file and table addressing
- Litwin
- 1980
Citation Context ...tions. External data structures related to our approach are those based on hashing [18,24]. The main representatives of external memory hashing methods include: extendible hashing [8], linear hashing [16], and external perfect hashing [10]. These hashing schemes and their variants need O(1) expected block transfers for answering search queries, but they share various disadvantages when compared to our... |

56 | The Priority R-tree: a practically efficient and worst-case optimal R-tree
- Arge, Berg, et al.
Citation Context ...f Informatics, University of Piraeus, Greece ytheod@unipi.gr Abstract. We present NEFOS (NEsted FOrest of balanced treeS), a new cache-aware indexing scheme that supports insertions and deletions in O(1) worst-case block transfers for rebalancing operations (given an update position) and searching in O(log_B log n) expected block transfers (B = disk block size and n = number of stored elements). The ... |

51 | Extendible hashing-A fast access method for dynamic files
- Fagin, Nievergelt, et al.
- 1979
Citation Context ...mooth input distributions. External data structures related to our approach are those based on hashing [18,24]. The main representatives of external memory hashing methods include: extendible hashing [8], linear hashing [16], and external perfect hashing [10]. These hashing schemes and their variants need O(1) expected block transfers for answering search queries, but they share various disadvantages... |

40 | Practical minimal perfect hash functions for large databases
- Fox, Chen, et al.
- 1992
Citation Context ...ated to our approach are those based on hashing [18,24]. The main representatives of external memory hashing methods include: extendible hashing [8], linear hashing [16], and external perfect hashing [10]. These hashing schemes and their variants need O(1) expected block transfers for answering search queries, but they share various disadvantages when compared to our structure: (i) they do not support... |

35 | Organization of large ordered indexes
- Bayer, McCreight
- 1972
Citation Context ... operation was achieved for a broad family of non-smooth input distributions. Keywords: Data Structures, Data Management Algorithms. 1 Introduction More than three decades after its invention, B-tree [5] and its variants remain the ubiquitous external memory data structure for indexing and organizing large data sets with numerous applications, especially in database systems. Its popularity is mainly ... |

29 | Examining computational geometry, van Emde Boas trees, and hashing from the perspective of the fusion tree
- Willard
- 2000
Citation Context ...If we parametrize B and choose such a small value (e.g., B=2) so that the whole structure can fit in main memory, as well as replace each Lazy B-tree and the B-tree of the LSI structure with q*-heap machinery [27], then NEFOS becomes a data structure in the RAM model with the same expected complexities w.h.p. for all operations, without any kind of transformation or adaptation. 3.2 Complexity Analysis We will focu... |

24 | Advanced Database Indexing
- Manolopoulos, Theodoridis, et al.
- 2000
Citation Context ...e an expected sub-logarithmic bound for search operation was achieved for a broad family of non-smooth input distributions. External data structures related to our approach are those based on hashing [18,24]. The main representatives of external memory hashing methods include: extendible hashing [8], linear hashing [16], and external perfect hashing [10]. These hashing schemes and their variants need O(1... |

24 | Multi-disk B-trees
- Seeger, Larson
- 1991
Citation Context ...r various applications (see the excellent survey by Vitter [24] for an extended accounting of these and other variants and their applications) to make it parallel for use in multi-disk environments [21], to tune it for concurrency and recovery purposes [14,22], to extend it to cover other than the original field [9], etc. Regarding the update operation, it should be noted that an update operation co... |

17 | A constant update time finger search tree - Dietz, Raman - 1994 |

15 | Eliminating Amortization: On Data Structures with Guaranteed Response Time
- Raman
- 1992
Citation Context ...t. The deletion there may cause a fusion and the latter may propagate up the tree. The Lazy B-tree. The Lazy B-tree of [12] is a simple but non-trivial externalization of the techniques introduced in [20]. The first level consists of an ordinary B-tree, whereas the second one consists of buckets of size O(log^2 n), where n is approximately equal to the number of elements stored in the access method. T... |

13 | The input/output complexity of sorting and related problems
- Aggarwal, Vitter
- 1988
Citation Context ...stribution, each blue cluster node satisfies the smoothness property. For this reason, we organize each blue cluster node as an ISB-tree. In this case, T_2(n) becomes as follows: T_2(n) = O(log_B log Θ(n)) (2). As a result, the total processing time requires T(n) = T_1(n) + T_2(n) I/Os and the theorem follows: Theorem 5. Exact-match queries in the NEFOS structure require O(log_B log n) I/Os for any realistic inpu... |

13 | Dynamic interpolation search
- Mehlhorn, Tsakalidis
- 1993
Citation Context ...peration are O(log_B n). The expected search bound was achieved by considering a rather general scenario of μ-random insertions and random deletions, where μ is a so-called smooth probability density [3,19]. An insertion is μ-random if the key to be inserted is drawn randomly with density function μ; a deletion is random if every key present in the data structure is equally likely to be deleted [13]. In... |

12 | Searching unindexed and nonuniformly generated files in log log N time
- Willard
- 1985
Citation Context ...tribution does not contain sharp peaks). Smooth distributions are a superset of uniform, bounded, and several non-uniform distributions (e.g., the class of regular distributions introduced by Willard [26]). In this paper, we present NEFOS (NEsted FOrest of balanced treeS), a new cache-aware indexing scheme that supports insertions and deletions in O(1) worst-case block transfers for rebalancing operat... |

11 | A new method for fast data searches with keys - Litwin, Lomet - 1987 |

10 | Dynamic interpolation search in o(log log n) time
- Andersson, Mattsson
- 1993
Citation Context ...peration are O(log_B n). The expected search bound was achieved by considering a rather general scenario of μ-random insertions and random deletions, where μ is a so-called smooth probability density [3,19]. An insertion is μ-random if the key to be inserted is drawn randomly with density function μ; a deletion is random if every key present in the data structure is equally likely to be deleted [13]. In... |

10 | Deletions that preserve randomness - Knuth - 1977 |

6 | Improved Bounds for Finger Search on a
- Kaporis, Makris, et al.
- 2003
Citation Context ...iven. Proof. See [12]. The ISB-tree. The ISB-tree is a two-level data structure. The upper level is a non-straightforward externalization of the Static Interpolation Search Tree (SIST) presented in [11]. In the definition of the (f1,f2)-smooth densities [26,19], intuitively, function f1 partitions an arbitrary subinterval [c1,c3] ⊆ [a,b] into f1 equal parts, each of length (c3 − c1)/f1 = O(1/f1); that... |

6 | ISB-tree: A new indexing scheme with efficient expected behaviour
- Kaporis, Makris, et al.
- 2005
Citation Context ...te operation takes Θ(log_B n) block transfers, even in the case where the update position (block within which the update will take place) is given. ISB-tree (Interpolation Search B-tree) presented in [12], supports search operations in O(log_B log n) expected block transfers with high probability (w.h.p.) for a large class of input distributions (including both uniform and non-uniform classes) describ... |

3 | Balanced Search Tree with O(1) Worst-case Update Time - Levcopoulos, Overmars |

3 | Performance of B+ tree concurrency algorithms
- Srinivasan, Carey
- 1993
Citation Context ...tter [24] for an extended accounting of these and other variants and their applications, to make it parallel for use in multi-disk environments [21], to tune it for concurrency and recovery purposes [14,22], to extend it to cover other than the original field [9], etc. Regarding the update operation, it should be noted that an update operation consists of three consecutive phases: a search phase (to loc... |

1 | The R-tree Portal (2003), http://www.rtreeportal.org, [Tiger1] and [Tiger2] data sets
- Theodoridis
Citation Context ...imensional data taken from a real-world spatial dataset “LA rivers and railways” [Tiger1] and “LA streets” [Tiger2], containing 128971 and 131461 Minimum Bounded Rectangles (MBRs), respectively; see [23]. The one-dimensional data are taken by the x- and y-projections of MBRs and the values in each axis are normalized in [0,10000]. For all experiments, the disk page size is set to 512 bytes, the length... |