Results 1 - 10
of
29
External Memory Algorithms and Data Structures
, 1998
"... Data sets in large applications are often too massive to fit completely inside the computer's internal memory. The resulting input/output communication (or I/O) between fast internal memory and slower external memory (such as disks) can be a major performance bottleneck. In this paper, we surve ..."
Abstract
-
Cited by 360 (23 self)
- Add to MetaCart
(Show Context)
Data sets in large applications are often too massive to fit completely inside the computer's internal memory. The resulting input/output communication (or I/O) between fast internal memory and slower external memory (such as disks) can be a major performance bottleneck. In this paper, we survey the state of the art in the design and analysis of external memory algorithms and data structures (which are sometimes referred to as "EM" or "I/O" or "out-of-core" algorithms and data structures). EM algorithms and data structures are often designed and analyzed using the parallel disk model (PDM). The three machine-independent measures of performance in PDM are the number of I/O operations, the CPU time, and the amount of disk space. PDM allows for multiple disks (or disk arrays) and parallel CPUs, and it can be generalized to handle tertiary storage and hierarchical memory. We discuss several important paradigms for how to solve batched and online problems efficiently in external memory. Programming tools and environments are available for simplifying the programming task. The TPIE system (Transparent Parallel I/O programming Environment) is both easy to use and efficient in terms of execution speed. We report on some experiments using TPIE in the domain of spatial databases. The newly developed EM algorithms and data structures that incorporate the paradigms we discuss are significantly faster than methods currently used in practice.
The Priority R-Tree: A Practically Efficient and Worst-Case Optimal R-Tree
- SIGMOD 2004 JUNE 13-18, 2004, PARIS, FRANCE
, 2004
"... We present the Priority R-tree, or PR-tree, which is the first R-tree variant that always answers a window query using O((N/B) 1−1/d + T/B) I/Os, where N is the number of ddimensional (hyper-) rectangles stored in the R-tree, B is the disk block size, and T is the output size. This is provably asymp ..."
Abstract
-
Cited by 79 (9 self)
- Add to MetaCart
We present the Priority R-tree, or PR-tree, which is the first R-tree variant that always answers a window query using O((N/B) 1−1/d + T/B) I/Os, where N is the number of ddimensional (hyper-) rectangles stored in the R-tree, B is the disk block size, and T is the output size. This is provably asymptotically optimal and significantly better than other Rtree variants, where a query may visit all N/B leaves in the tree even when T = 0. We also present an extensive experimental study of the practical performance of the PR-tree using both real-life and synthetic data. This study shows that the PR-tree performs similar to the best known R-tree variants on real-life and relatively nicely distributed data, but outperforms them significantly on more extreme data.
External Memory Data Structures
, 2001
"... In many massive dataset applications the data must be stored in space and query efficient data structures on external storage devices. Often the data needs to be changed dynamically. In this chapter we discuss recent advances in the development of provably worst-case efficient external memory dynami ..."
Abstract
-
Cited by 76 (32 self)
- Add to MetaCart
In many massive dataset applications the data must be stored in space and query efficient data structures on external storage devices. Often the data needs to be changed dynamically. In this chapter we discuss recent advances in the development of provably worst-case efficient external memory dynamic data structures. We also briefly discuss some of the most popular external data structures used in practice.
The Buffer Tree: A Technique for Designing Batched External Data Structures
, 2003
"... We present a technique for designing external memory data structures that support batched operations I/O efficiently. We show how the technique can be used to develop external versions of a search tree, a priority queue, and a segment tree, and give examples of how these structures can be used to d ..."
Abstract
-
Cited by 75 (14 self)
- Add to MetaCart
We present a technique for designing external memory data structures that support batched operations I/O efficiently. We show how the technique can be used to develop external versions of a search tree, a priority queue, and a segment tree, and give examples of how these structures can be used to develop I/Oefficient algorithms. The developed algorithms are either extremely simple or straightforward generalizations of known internal memory algorithms—given the developed external data structures.
Optimal external memory interval management
- SIAM Journal on Computing
"... This work has been made available by the University of Kansas ..."
Abstract
-
Cited by 50 (7 self)
- Add to MetaCart
(Show Context)
This work has been made available by the University of Kansas
Range Aggregate Processing in Spatial Databases
- TKDE
, 2004
"... Abstract—A range aggregate query returns summarized information about the points falling in a hyper-rectangle (e.g., the total number of these points instead of their concrete ids). This paper studies spatial indexes that solve such queries efficiently and proposes the aggregate Point-tree (aP-tree) ..."
Abstract
-
Cited by 40 (3 self)
- Add to MetaCart
(Show Context)
Abstract—A range aggregate query returns summarized information about the points falling in a hyper-rectangle (e.g., the total number of these points instead of their concrete ids). This paper studies spatial indexes that solve such queries efficiently and proposes the aggregate Point-tree (aP-tree), which achieves logarithmic cost to the data set cardinality (independently of the query size) for two-dimensional data. The aP-tree requires only small modifications to the popular multiversion structural framework and, thus, can be implemented and applied easily in practice. We also present models that accurately predict the space consumption and query cost of the aP-tree and are therefore suitable for query optimization. Extensive experiments confirm that the proposed methods are efficient and practical. Index Terms—Database, spatial database, range queries, aggregation. 1
Implementing I/O-Efficient Data Structures Using TPIE
- In Proc. European Symposium on Algorithms
, 2002
"... In recent years, many theoretically I/O-efficient algorithms and data structures have been developed. The TPIE project at Duke University was started to investigate the practical importance of these theoretical results. The goal of this ongoing project is to provide a portable, extensible, flexib ..."
Abstract
-
Cited by 32 (7 self)
- Add to MetaCart
(Show Context)
In recent years, many theoretically I/O-efficient algorithms and data structures have been developed. The TPIE project at Duke University was started to investigate the practical importance of these theoretical results. The goal of this ongoing project is to provide a portable, extensible, flexible, and easy to use C++ programming environment for efficiently implementing I/O-algorithms and data structures. The TPIE library has been developed in two phases. The first phase focused on supporting algorithms with a sequential I/O pattern, while the recently developed second phase has focused on supporting on-line I/O-efficient data structures, which exhibit a more random I/O pattern. This paper describes the design and implementation of the second phase of TPIE.
A Framework for Index Bulk Loading and Dynamization
, 2001
"... In this paper we investigate automated methods for externalizing internal memory data structures. We consider a class of balanced trees that we call weight-balanced partitioning trees (or wp-trees) for indexing a set of points in R d . Well-known examples of wp-trees include kd- trees, BBD-tre ..."
Abstract
-
Cited by 20 (9 self)
- Add to MetaCart
In this paper we investigate automated methods for externalizing internal memory data structures. We consider a class of balanced trees that we call weight-balanced partitioning trees (or wp-trees) for indexing a set of points in R d . Well-known examples of wp-trees include kd- trees, BBD-trees, pseudo-quad-trees, and BAR-trees. Given an efficient external wp-tree construction algorithm, we present a general framework for automatically obtaining a dynamic external data structure. Using this framework together with a new general construction (bulk loading) technique of independent interest, we obtain data structures with guaranteed good update performance in terms of I/O transfers. Our approach gives considerably improved construction and update I/O bounds for e.g. external kd-trees and BBD-trees.
I/O-efficient Point Location using Persistent B-Trees
"... We present an external planar point location data structure that is I/O-efficient both in theory and practice. The developed structure uses linear space and answers a query in optimal O(log B N) I/Os, where B is the disk block size. It is based on a persistent B-tree, and all previously developed su ..."
Abstract
-
Cited by 14 (5 self)
- Add to MetaCart
(Show Context)
We present an external planar point location data structure that is I/O-efficient both in theory and practice. The developed structure uses linear space and answers a query in optimal O(log B N) I/Os, where B is the disk block size. It is based on a persistent B-tree, and all previously developed such structures assume a total order on the elements in the structure. As a theoretical result of independent interest, we show how to remove this assumption. Most previous theoretical I/O-efficient planer point location structures are relatively complicated and have not been implemented. Based on a bucket approach, Vahrenhold and Hinrichs therefore developed a simple and practical, but theoretically non-optimal, heuristic structure. We present an extensive experimental evaluation that shows that on a range of real-world Geographic Information Systems (GIS) data, our structure uses fewer I/Os than the structure of Vahrenhold and Hinrichs to answer a query. On a synthetically generated worst-case dataset, our structure uses significantly fewer I/Os.
Optimal dynamic vertical ray shooting in rectilinear planar subdivisions
"... Optimal dynamic vertical ray shooting in rectilinear planar subdivisions. In this paper we consider the dynamic vertical ray shooting problem, that is the task of maintaining a dynamic set S of n non intersecting horizontal line segments in the plane subject to a query that reports the first segment ..."
Abstract
-
Cited by 14 (0 self)
- Add to MetaCart
Optimal dynamic vertical ray shooting in rectilinear planar subdivisions. In this paper we consider the dynamic vertical ray shooting problem, that is the task of maintaining a dynamic set S of n non intersecting horizontal line segments in the plane subject to a query that reports the first segment in S intersecting a vertical ray from a query point. We develop a linear-size structure that supports queries, insertions and deletions in O(log n) worst-case time. Our structure works in the comparison model and uses a RAM.