External Memory Data Structures
, 2001
Cited by 78 (34 self)
In many massive dataset applications the data must be stored in space- and query-efficient data structures on external storage devices. Often the data needs to be changed dynamically. In this chapter we discuss recent advances in the development of provably worst-case efficient external memory dynamic data structures. We also briefly discuss some of the most popular external data structures used in practice.
The Priority R-Tree: A Practically Efficient and Worst-Case Optimal R-Tree
- SIGMOD 2004 JUNE 13-18, 2004, PARIS, FRANCE
, 2004
Cited by 46 (5 self)
We present the Priority R-tree, or PR-tree, which is the first R-tree variant that always answers a window query using O((N/B)^{1-1/d} + T/B) I/Os, where N is the number of d-dimensional (hyper-)rectangles stored in the R-tree, B is the disk block size, and T is the output size. This is provably asymptotically optimal and significantly better than other R-tree variants, where a query may visit all N/B leaves in the tree even when T = 0. We also present an extensive experimental study of the practical performance of the PR-tree using both real-life and synthetic data. This study shows that the PR-tree performs similarly to the best known R-tree variants on real-life and relatively nicely distributed data, but outperforms them significantly on more extreme data.
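To get a feel for the gap between the stated window-query bound, (N/B)^(1-1/d) + T/B, and the N/B leaves that other variants may visit, the two expressions can be evaluated directly. The sketch below is only arithmetic on the formulas from the abstract: it ignores the constant hidden by the O-notation, it is not an R-tree implementation, and the function names and sample values are hypothetical.

```python
def window_query_bound(n: int, b: int, d: int, t: int) -> float:
    """Evaluate (N/B)^(1 - 1/d) + T/B, ignoring the constant hidden by O()."""
    return (n / b) ** (1 - 1 / d) + t / b


def leaf_count(n: int, b: int) -> float:
    """Number of leaves N/B, which a query may visit in full in other variants."""
    return n / b


if __name__ == "__main__":
    # Hypothetical sample values: one million 2-D rectangles, 100 per block,
    # and a query with empty output (T = 0).
    n, b, d, t = 1_000_000, 100, 2, 0
    print("bound expression:", window_query_bound(n, b, d, t))
    print("leaves N/B:      ", leaf_count(n, b))
```

Raising d moves the exponent 1 - 1/d toward 1, so the two quantities grow closer in higher dimensions.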
Implementing I/O-Efficient Data Structures Using TPIE
- In Proc. European Symposium on Algorithms
, 2002
Cited by 28 (6 self)
In recent years, many theoretically I/O-efficient algorithms and data structures have been developed. The TPIE project at Duke University was started to investigate the practical importance of these theoretical results. The goal of this ongoing project is to provide a portable, extensible, flexible, and easy to use C++ programming environment for efficiently implementing I/O-algorithms and data structures. The TPIE library has been developed in two phases. The first phase focused on supporting algorithms with a sequential I/O pattern, while the recently developed second phase has focused on supporting on-line I/O-efficient data structures, which exhibit a more random I/O pattern. This paper describes the design and implementation of the second phase of TPIE.
Optimal External Memory Interval Management
, 2002
Cited by 26 (5 self)
In this paper we present the external interval tree, an optimal external memory data structure for answering stabbing queries on a set of dynamically maintained intervals. The external interval tree can be used in an optimal solution to the dynamic interval management problem, which is a central problem for object-oriented and temporal databases and for constraint logic programming. Part of the structure uses a novel weight-balancing technique for efficient worst-case manipulation of balanced trees, which is of independent interest. The external interval tree, as well as our new balancing technique, has recently been used to develop several efficient external data structures.
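For readers unfamiliar with the term, a stabbing query asks for every stored interval that contains a given query point. The sketch below shows only the query semantics with a linear scan over an in-memory list; it is not the external interval tree from the paper, and the closed-interval convention and the name `stabbing_query` are assumptions made for illustration.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # assumed closed interval [lo, hi]


def stabbing_query(intervals: List[Interval], q: float) -> List[Interval]:
    """Return every interval that contains the point q (linear-scan baseline)."""
    return [(lo, hi) for lo, hi in intervals if lo <= q <= hi]


if __name__ == "__main__":
    stored = [(1, 5), (3, 8), (6, 9)]  # hypothetical data
    print(stabbing_query(stored, 4))
```

The scan reads every interval on every query, which is the cost an index structure for this problem is meant to avoid.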
A Framework for Index Bulk Loading and Dynamization
, 2001
Cited by 22 (14 self)
In this paper we investigate automated methods for externalizing internal memory data structures. We consider a class of balanced trees that we call weight-balanced partitioning trees (or wp-trees) for indexing a set of points in R^d. Well-known examples of wp-trees include kd-trees, BBD-trees, pseudo-quad-trees, and BAR-trees. Given an efficient external wp-tree construction algorithm, we present a general framework for automatically obtaining a dynamic external data structure. Using this framework together with a new general construction (bulk loading) technique of independent interest, we obtain data structures with guaranteed good update performance in terms of I/O transfers. Our approach gives considerably improved construction and update I/O bounds for e.g. external kd-trees and BBD-trees.
Range Aggregate Processing in Spatial Databases
- TKDE
, 2004
Cited by 21 (2 self)
A range aggregate query returns summarized information about the points falling in a hyper-rectangle (e.g., the total number of these points instead of their concrete ids). This paper studies spatial indexes that solve such queries efficiently and proposes the aggregate Point-tree (aP-tree), which achieves a cost logarithmic in the data set cardinality (independently of the query size) for two-dimensional data. The aP-tree requires only small modifications to the popular multiversion structural framework and, thus, can be implemented and applied easily in practice. We also present models that accurately predict the space consumption and query cost of the aP-tree and are therefore suitable for query optimization. Extensive experiments confirm that the proposed methods are efficient and practical. Index Terms: database, spatial database, range queries, aggregation.
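The definition of a range aggregate query can be made concrete with a brute-force COUNT over an in-memory point list. This is only a baseline that illustrates what the query returns; it is not the aP-tree, its cost is linear rather than logarithmic in the number of points, and the function name, the closed-rectangle convention, and the sample data are assumptions.

```python
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def range_count(points: Sequence[Point], lo: Point, hi: Point) -> int:
    """Count points p with lo[i] <= p[i] <= hi[i] in every dimension i."""
    return sum(
        1
        for p in points
        if all(l <= x <= h for x, l, h in zip(p, lo, hi))
    )


if __name__ == "__main__":
    pts = [(1, 1), (2, 3), (5, 5)]  # hypothetical 2-D data
    # The answer is a single number, not the ids of the matching points.
    print(range_count(pts, lo=(0, 0), hi=(2, 3)))
```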
I/O-efficient point location using persistent B-trees
- In Proc. Workshop on Algorithm Engineering and Experimentation
, 2003
Cited by 13 (8 self)
We present an external planar point location data structure that is I/O-efficient both in theory and practice. The developed structure uses linear space and answers a query in optimal O(log_B N) I/Os, where B is the disk block size. It is based on a persistent B-tree, and all previously developed such structures assume a total order on the elements in the structure. As a theoretical result of independent interest, we show how to remove this assumption. Most previous theoretical I/O-efficient planar point location structures are relatively complicated and have not been implemented. Based on a bucket approach, Vahrenhold and Hinrichs therefore developed a simple and practical, but theoretically non-optimal, heuristic structure. We present an extensive experimental evaluation that shows that on a range of real-world Geographic Information Systems (GIS) data, our structure uses fewer I/Os than the structure of Vahrenhold and Hinrichs to answer a query. On a synthetically generated worst-case dataset, our structure uses significantly fewer I/Os. 1 Introduction The planar point location problem is the problem of storing a planar subdivision defined by N segments such that the region containing a query point...
Cache-Oblivious R-Trees
, 2005
Cited by 10 (3 self)
We develop a cache-oblivious data structure for storing a set S of N axis-aligned rectangles in the plane, such that all rectangles in S intersecting a query rectangle or point can be found efficiently. Our structure is an axis-aligned bounding-box hierarchy and as such it is the first cache-oblivious R-tree with provable performance guarantees. If no point in the plane is contained in B or more rectangles in S, the structure answers a rectangle query using O(\sqrt{N/B} + T/B) memory transfers and a point query using O((N/B)^ε) memory transfers for any ε>0, where B is the block size of memory transfers between any two levels of a multilevel memory hierarchy. We also develop a variant of our structure that achieves the same performance on input sets with arbitrary overlap among the rectangles. The rectangle query bound matches the bound of the best known linear-space cache-aware structure.
CRB-Tree: An Efficient Indexing Scheme for Range Aggregate Queries
- IN PROC. INTERNATIONAL CONFERENCE ON DATABASE THEORY
, 2003
Cited by 10 (1 self)
We propose a new indexing scheme, called the CRB-tree, for efficiently answering range-aggregate queries. The range-aggregate problem is defined as follows: Given a set of weighted points in R^d, compute the aggregate of weights of points that lie inside a d-dimensional query rectangle. In this paper we focus on COUNT, SUM, and AVG aggregates. First, we develop an indexing scheme for answering two-dimensional range-COUNT queries that uses O(N/B) disk blocks and answers a query in O(log_B N) I/Os, where N is the number of input points and B is the disk block size. This is the first optimal index structure for the 2D range-COUNT problem. The index can be extended to obtain a near-linear-size indexing structure for answering range-SUM queries using O(log_B N) I/Os. We also obtain similar bounds for rectangle-intersection aggregate queries, in which the input is a set of weighted rectangles and a query asks to compute the aggregate of the weights of those input rectangles that overlap with the query rectangle. This result immediately improves a recent result on temporal-aggregate queries. Our indexing scheme can be dynamized and extended to higher dimensions. Finally, we demonstrate the practical efficiency of our index by comparing its performance against the kdB-tree. For a dataset of around 100 million points, the CRB-tree query time is 8-10 times faster than the kdB-tree query time. Furthermore, unlike other indexing schemes, the query performance of the CRB-tree is oblivious to the distribution of the input points and to the placement, shape, and size of the query rectangle.
Planar Point Location For Large Data Sets: To Seek Or Not To Seek
, 2000
Cited by 7 (1 self)
We present an algorithm for external memory planar point location that is both effective and easy to implement. The base algorithm is an external memory variant of the bucket method by Edahiro, Kokubo and Asano that is combined with Lee and Yang's batched internal memory algorithm for planar point location. Although our algorithm is not optimal in terms of its worst-case behavior, we show its efficiency for both batched and single-shot queries by experiments with real-world data. The experiments show that the algorithm benefits from its mainly sequential disk access pattern and significantly outperforms the fastest algorithm for internal memory. 1 Introduction The well-known problem of planar point location consists of determining the region of a planar subdivision that contains a given query point. We assume that a planar subdivision is given by N line segments, and that each segment is labeled with the names of the two regions it separates. In this setting, a point locati...

