External Memory Algorithms and Data Structures
1998
Cited by 360 (23 self)
Data sets in large applications are often too massive to fit completely inside the computer's internal memory. The resulting input/output communication (or I/O) between fast internal memory and slower external memory (such as disks) can be a major performance bottleneck. In this paper, we survey the state of the art in the design and analysis of external memory algorithms and data structures (which are sometimes referred to as "EM" or "I/O" or "out-of-core" algorithms and data structures). EM algorithms and data structures are often designed and analyzed using the parallel disk model (PDM). The three machine-independent measures of performance in PDM are the number of I/O operations, the CPU time, and the amount of disk space. PDM allows for multiple disks (or disk arrays) and parallel CPUs, and it can be generalized to handle tertiary storage and hierarchical memory. We discuss several important paradigms for how to solve batched and online problems efficiently in external memory. Programming tools and environments are available for simplifying the programming task. The TPIE system (Transparent Parallel I/O programming Environment) is both easy to use and efficient in terms of execution speed. We report on some experiments using TPIE in the domain of spatial databases. The newly developed EM algorithms and data structures that incorporate the paradigms we discuss are significantly faster than methods currently used in practice.
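The sketch below makes the PDM's first performance measure, the number of I/O operations, concrete. It counts block transfers for a linear scan and for a textbook multiway merge sort. The parameter names `n`, `m`, `b` and `d` stand for N, M, B and D. Both helper functions are illustrative assumptions, not TPIE code. The sort count assumes a single disk, with one block buffer per merged run plus one output buffer.

```python
def scan_ios(n: int, b: int, d: int = 1) -> int:
    """Parallel I/Os to read n items once: each I/O moves one block of b items per disk."""
    return -(-n // (d * b))  # ceiling of n / (d * b)


def merge_sort_ios(n: int, m: int, b: int) -> int:
    """Block transfers for a textbook external multiway merge sort on one disk (D = 1)."""
    fan_in = m // b - 1  # one input buffer per run being merged, one output buffer
    if fan_in < 2:
        raise ValueError("memory must hold at least three blocks")
    blocks = -(-n // b)
    runs = -(-n // m)  # run formation: sort memory-sized chunks in internal memory
    passes = 0
    while runs > 1:  # each merge pass combines up to fan_in runs into one
        runs = -(-runs // fan_in)
        passes += 1
    # Run formation and every merge pass each read and write all blocks once.
    return 2 * blocks * (1 + passes)
```

Take N = 2^30, M = 2^20 and B = 2^10. The sort needs run formation plus two merge passes over the 2^20 blocks, or 6 * 2^20 transfers. That roughly tracks the $\log_{M/B}(N/B)$ factor in the PDM sorting bound.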
The Priority R-Tree: A Practically Efficient and Worst-Case Optimal R-Tree
SIGMOD 2004, June 13-18, 2004, Paris, France
Cited by 79 (9 self)
We present the Priority R-tree, or PR-tree, which is the first R-tree variant that always answers a window query using $O((N/B)^{1-1/d} + T/B)$ I/Os, where $N$ is the number of $d$-dimensional (hyper)rectangles stored in the R-tree, $B$ is the disk block size, and $T$ is the output size. This is provably asymptotically optimal and significantly better than other R-tree variants, where a query may visit all $N/B$ leaves in the tree even when $T = 0$. We also present an extensive experimental study of the practical performance of the PR-tree using both real-life and synthetic data. This study shows that the PR-tree performs similarly to the best known R-tree variants on real-life and relatively nicely distributed data, but outperforms them significantly on more extreme data.
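The failure mode mentioned above, where a query visits all $N/B$ leaves even when $T = 0$, can be reproduced with a plain packed R-tree. This is not the PR-tree. The minimal sketch below assumes each visited leaf costs one I/O. It bulk-loads leaves of up to `b` rectangles sorted by x-center and counts the leaves a window query touches. All names are hypothetical.

```python
from dataclasses import dataclass

# Axis-aligned rectangle: (xmin, ymin, xmax, ymax); a point is a degenerate rectangle.
Rect = tuple[float, float, float, float]


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def mbr(rects: list[Rect]) -> Rect:
    """Minimum bounding rectangle of a non-empty list of rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


@dataclass
class Node:
    box: Rect
    children: list  # data rectangles at a leaf, child Nodes otherwise
    leaf: bool


def bulk_load(rects: list[Rect], b: int) -> Node:
    """Pack rectangles (sorted by x-center) into leaves of at most b entries, then build upward."""
    if not rects:
        raise ValueError("need at least one rectangle")
    items = sorted(rects, key=lambda r: (r[0] + r[2]) / 2)
    level = [Node(mbr(items[i:i + b]), items[i:i + b], True)
             for i in range(0, len(items), b)]
    while len(level) > 1:
        level = [Node(mbr([n.box for n in level[i:i + b]]), level[i:i + b], False)
                 for i in range(0, len(level), b)]
    return level[0]


def window_query(root: Node, q: Rect) -> tuple[list[Rect], int]:
    """Return (matching rectangles, number of leaves visited)."""
    out: list[Rect] = []
    leaves = 0
    stack = [root]
    while stack:
        node = stack.pop()
        if not intersects(node.box, q):
            continue  # pruned by its bounding box
        if node.leaf:
            leaves += 1
            out.extend(r for r in node.children if intersects(r, q))
        else:
            stack.extend(node.children)
    return out, leaves


# Points on two horizontal lines, y = 0 and y = 10.
pts = [(i, 0, i, 0) for i in range(1000)] + [(i, 10, i, 10) for i in range(1000)]
root = bulk_load(pts, b=10)
hits, visited = window_query(root, (-1, 4, 1001, 6))  # strip between the lines
```

Here every leaf's bounding box spans both lines. The strip query therefore touches all 200 leaves (N/B = 2000/10) while reporting nothing. A narrow query such as `(0, -1, 4, 1)` touches a single leaf.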
External Memory Data Structures
2001
Cited by 76 (32 self)
In many massive dataset applications, the data must be stored in space- and query-efficient data structures on external storage devices. Often the data needs to be changed dynamically. In this chapter we discuss recent advances in the development of provably worst-case efficient external memory dynamic data structures. We also briefly discuss some of the most popular external data structures used in practice.
The Buffer Tree: A Technique for Designing Batched External Data Structures
2003
Cited by 75 (14 self)
We present a technique for designing external memory data structures that support batched operations I/O-efficiently. We show how the technique can be used to develop external versions of a search tree, a priority queue, and a segment tree, and give examples of how these structures can be used to develop I/O-efficient algorithms. Given the developed external data structures, the resulting algorithms are either extremely simple or straightforward generalizations of known internal memory algorithms.
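The core idea of batching can be illustrated with lazy insertion. Operations are appended to a node's buffer and pushed one level down in bulk only when the buffer fills, instead of walking each key down the tree. The sketch below is a deliberate simplification, not the paper's full structure. It assumes a static tree over a known integer key range, with no rebalancing, no deletions and no queries. It also assumes a simple I/O model that charges one transfer per `block` keys moved. The names `BufferNode`, `build`, `BufferTree` and `leaf_keys` are hypothetical.

```python
from bisect import bisect_right
from typing import Optional


class BufferNode:
    def __init__(self, splitters: list[int], children: list["BufferNode"]):
        self.splitters = splitters   # boundaries between consecutive children's key ranges
        self.children = children     # empty list at a leaf
        self.buffer: list[int] = []  # pending keys (at a leaf: the stored keys)

    @property
    def is_leaf(self) -> bool:
        return not self.children


def build(lo: int, hi: int, fanout: int, leaf_span: int) -> BufferNode:
    """Hypothetical static tree over the integer key range [lo, hi)."""
    if hi - lo <= leaf_span:
        return BufferNode([], [])
    step = -(-(hi - lo) // fanout)  # ceiling division
    bounds = list(range(lo, hi, step)) + [hi]
    kids = [build(a, b, fanout, leaf_span) for a, b in zip(bounds, bounds[1:])]
    return BufferNode(bounds[1:-1], kids)


class BufferTree:
    def __init__(self, root: BufferNode, buffer_cap: int, block: int):
        self.root, self.cap, self.block = root, buffer_cap, block
        self.ios = 0  # simulated block transfers

    def _blocks(self, k: int) -> int:
        return -(-k // self.block)

    def insert(self, key: int) -> None:
        # Insertions are only appended to the root buffer; no per-key descent.
        self.root.buffer.append(key)
        if len(self.root.buffer) >= self.cap:
            self._empty(self.root)

    def _empty(self, node: BufferNode) -> None:
        """Buffer-emptying step: push all of node's pending keys one level down."""
        if node.is_leaf:
            return  # in this sketch a leaf's buffer is its storage
        keys, node.buffer = node.buffer, []
        self.ios += self._blocks(len(keys))  # read the buffer
        groups: dict[int, list[int]] = {}
        for k in keys:
            groups.setdefault(bisect_right(node.splitters, k), []).append(k)
        for idx, ks in groups.items():
            child = node.children[idx]
            child.buffer.extend(ks)
            self.ios += self._blocks(len(ks))  # write to the child's buffer
            if not child.is_leaf and len(child.buffer) >= self.cap:
                self._empty(child)

    def flush(self, node: Optional[BufferNode] = None) -> None:
        """Empty every buffer top-down so that all keys reach the leaves."""
        if node is None:
            node = self.root
        self._empty(node)
        for child in node.children:
            self.flush(child)


def leaf_keys(node: BufferNode) -> list[int]:
    if node.is_leaf:
        return list(node.buffer)
    return [k for c in node.children for k in leaf_keys(c)]


tree = BufferTree(build(0, 10_000, fanout=4, leaf_span=100), buffer_cap=1000, block=100)
for i in range(10_000):
    tree.insert((i * 7919) % 10_000)  # a permutation of 0..9999
tree.flush()
```

Each key moves down inside a batch, so the simulated cost stays far below one I/O per inserted key. A per-key root-to-leaf descent in this five-level tree would cost about five.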
Optimal external memory interval management
SIAM Journal on Computing
Cited by 50 (7 self)
This work has been made available by the University of Kansas
Range Aggregate Processing in Spatial Databases
TKDE, 2004
Cited by 40 (3 self)
A range aggregate query returns summarized information about the points falling in a hyperrectangle (e.g., the total number of these points instead of their concrete ids). This paper studies spatial indexes that solve such queries efficiently and proposes the aggregate Point-tree (aP-tree), which achieves query cost logarithmic in the data set cardinality (independently of the query size) for two-dimensional data. The aP-tree requires only small modifications to the popular multiversion structural framework and, thus, can be implemented and applied easily in practice. We also present models that accurately predict the space consumption and query cost of the aP-tree and are therefore suitable for query optimization. Extensive experiments confirm that the proposed methods are efficient and practical.
Index Terms: Database, spatial database, range queries, aggregation.
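What a range aggregate (count) query returns, and why its cost need not depend on the query size, can be illustrated with a static 2D prefix-count table over an integer grid. This is a different and much simpler technique than the aP-tree. It is in-memory, supports no updates, and uses space proportional to the grid size. It does show the key point: any rectangle's count comes from four table lookups, however many points fall inside. The grid assumption and function names are hypothetical.

```python
def build_prefix_counts(points, width: int, height: int) -> list[list[int]]:
    """P[i][j] = number of points with x < i and y < j; requires 0 <= x < width, 0 <= y < height."""
    grid = [[0] * (height + 1) for _ in range(width + 1)]
    for x, y in points:
        grid[x + 1][y + 1] += 1
    for i in range(1, width + 1):
        for j in range(1, height + 1):
            # Inclusion-exclusion over the already-finished prefix cells.
            grid[i][j] += grid[i - 1][j] + grid[i][j - 1] - grid[i - 1][j - 1]
    return grid


def range_count(P: list[list[int]], x1: int, y1: int, x2: int, y2: int) -> int:
    """Number of points with x1 <= x <= x2 and y1 <= y <= y2, in O(1) lookups."""
    return P[x2 + 1][y2 + 1] - P[x1][y2 + 1] - P[x2 + 1][y1] + P[x1][y1]


pts = [(1, 1), (2, 3), (4, 4), (0, 5), (3, 2)]
P = build_prefix_counts(pts, width=6, height=6)
```

For example, `range_count(P, 1, 1, 3, 3)` counts (1, 1), (2, 3) and (3, 2) without enumerating them.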
Implementing I/OEfficient Data Structures Using TPIE
In Proc. European Symposium on Algorithms, 2002
Cited by 32 (7 self)
In recent years, many theoretically I/O-efficient algorithms and data structures have been developed. The TPIE project at Duke University was started to investigate the practical importance of these theoretical results. The goal of this ongoing project is to provide a portable, extensible, flexible, and easy-to-use C++ programming environment for efficiently implementing I/O-efficient algorithms and data structures. The TPIE library has been developed in two phases. The first phase focused on supporting algorithms with a sequential I/O pattern, while the recently developed second phase has focused on supporting online I/O-efficient data structures, which exhibit a more random I/O pattern. This paper describes the design and implementation of the second phase of TPIE.
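The "sequential I/O pattern" targeted by TPIE's first phase amounts to reading data in whole blocks and handing items to the algorithm one at a time. The minimal sketch below shows such a block-buffered stream, assuming a file of raw signed 64-bit integers. `BlockStream` and `write_ints` are hypothetical illustrations; they are not TPIE's API, which is C++.

```python
from array import array


def write_ints(path: str, values) -> None:
    """Store values as raw signed 64-bit integers (native byte order)."""
    with open(path, "wb") as f:
        array("q", values).tofile(f)


class BlockStream:
    """Sequential reader that fetches `block` integers per read (one simulated I/O)."""

    def __init__(self, path: str, block: int):
        self.path, self.block, self.ios = path, block, 0

    def __iter__(self):
        with open(self.path, "rb") as f:
            while True:
                buf = array("q")
                try:
                    buf.fromfile(f, self.block)
                except EOFError:
                    pass  # fromfile keeps the items it read before hitting end of file
                if not buf:
                    return
                self.ios += 1
                yield from buf
```

Scanning 1000 integers with `block=128` should take ceil(1000/128) = 8 reads. This is the N/B scan cost that sequential-pattern algorithms rely on.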
A Framework for Index Bulk Loading and Dynamization
2001
Cited by 20 (9 self)
In this paper we investigate automated methods for externalizing internal memory data structures. We consider a class of balanced trees that we call weight-balanced partitioning trees (or wp-trees) for indexing a set of points in $\mathbb{R}^d$. Well-known examples of wp-trees include kd-trees, BBD-trees, pseudo-quadtrees, and BAR-trees. Given an efficient external wp-tree construction algorithm, we present a general framework for automatically obtaining a dynamic external data structure. Using this framework together with a new general construction (bulk loading) technique of independent interest, we obtain data structures with guaranteed good update performance in terms of I/O transfers. Our approach gives considerably improved construction and update I/O bounds for, e.g., external kd-trees and BBD-trees.
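kd-trees are one of the wp-tree examples named above. The minimal in-memory sketch below bulk-loads a kd-tree by splitting at the median along alternating axes. Each split sends floor(n/2) points to one side and ceil(n/2) to the other, which is the weight-balance property. It assumes a leaf capacity `leaf_size` that plays the role of the block size B. This is not the paper's external bulk-loading algorithm or its dynamization framework, and the function names are hypothetical.

```python
def build_kdtree(points: list[tuple], leaf_size: int, depth: int = 0) -> dict:
    """Split at the median along alternating axes until at most leaf_size points remain."""
    if len(points) <= leaf_size:
        return {"leaf": points}
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return {"axis": axis,
            "split": pts[mid][axis],  # left side has p[axis] <= split, right side >= split
            "left": build_kdtree(pts[:mid], leaf_size, depth + 1),
            "right": build_kdtree(pts[mid:], leaf_size, depth + 1)}


def leaves(node: dict) -> list[list[tuple]]:
    if "leaf" in node:
        return [node["leaf"]]
    return leaves(node["left"]) + leaves(node["right"])


def range_search(node: dict, lo: tuple, hi: tuple) -> list[tuple]:
    """Report points p with lo[k] <= p[k] <= hi[k] on every axis k."""
    if "leaf" in node:
        return [p for p in node["leaf"] if all(l <= c <= h for l, c, h in zip(lo, p, hi))]
    a, s = node["axis"], node["split"]
    out = []
    if lo[a] <= s:  # the query box may reach points on the left side
        out += range_search(node["left"], lo, hi)
    if hi[a] >= s:  # the query box may reach points on the right side
        out += range_search(node["right"], lo, hi)
    return out


pts = [((i * 37) % 101, (i * 53) % 97) for i in range(1000)]
tree = build_kdtree(pts, leaf_size=32)
```

With 1000 points and `leaf_size=32`, the halving stops after five levels. That gives 32 leaves of 31 or 32 points each.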
I/O-efficient Point Location using Persistent B-Trees
Cited by 14 (5 self)
We present an external planar point location data structure that is I/O-efficient both in theory and practice. The developed structure uses linear space and answers a query in optimal $O(\log_B N)$ I/Os, where $B$ is the disk block size. It is based on a persistent B-tree, and all previously developed such structures assume a total order on the elements in the structure. As a theoretical result of independent interest, we show how to remove this assumption. Most previous theoretical I/O-efficient planar point location structures are relatively complicated and have not been implemented. Based on a bucket approach, Vahrenhold and Hinrichs therefore developed a simple and practical, but theoretically non-optimal, heuristic structure. We present an extensive experimental evaluation that shows that on a range of real-world Geographic Information Systems (GIS) data, our structure uses fewer I/Os than the structure of Vahrenhold and Hinrichs to answer a query. On a synthetically generated worst-case dataset, our structure uses significantly fewer I/Os.
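The arithmetic below shows why an $O(\log_B N)$ query bound matters. It counts the levels a root-to-leaf search touches in a balanced tree of a given fanout, assuming one I/O per level and a B-tree fanout of roughly B. It illustrates only the bound; it is not the persistent B-tree or the point location structure itself, and `levels_needed` is a hypothetical helper.

```python
def levels_needed(n: int, fanout: int) -> int:
    """Smallest h with fanout**h >= n: the root-to-leaf levels, and so I/Os, of one search."""
    h, reach = 0, 1
    while reach < n:
        reach *= fanout
        h += 1
    return h
```

For N = 10^9, a fanout of 1000 gives 3 levels, while a binary tree (fanout 2) needs 30.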
Optimal dynamic vertical ray shooting in rectilinear planar subdivisions
Cited by 14 (0 self)
In this paper we consider the dynamic vertical ray shooting problem, that is, the task of maintaining a dynamic set $S$ of $n$ non-intersecting horizontal line segments in the plane subject to a query that reports the first segment in $S$ intersecting a vertical ray from a query point. We develop a linear-size structure that supports queries, insertions, and deletions in $O(\log n)$ worst-case time. Our structure works in the comparison model and uses a RAM.
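The query being maintained can be pinned down with a brute-force reference. The O(n) scan below is not the O(log n) dynamic structure, but it can serve as a correctness oracle when testing one. It assumes the ray points upward (+y) and that "first" means the lowest segment strictly above the query point; the abstract does not fix either convention. `shoot_up` is a hypothetical name.

```python
from typing import Optional

Segment = tuple[float, float, float]  # horizontal segment: (x_left, x_right, y)


def shoot_up(segments: list[Segment], qx: float, qy: float) -> Optional[Segment]:
    """Lowest segment strictly above (qx, qy) whose x-extent contains qx, or None."""
    best: Optional[Segment] = None
    for seg in segments:
        x1, x2, y = seg
        if x1 <= qx <= x2 and y > qy and (best is None or y < best[2]):
            best = seg
    return best


segs = [(0, 10, 5), (2, 4, 3), (6, 8, 7), (0, 1, 1)]
```

From (3, 0) the ray first hits (2, 4, 3) rather than the longer segment at y = 5. Insertions and deletions here are just list updates, which is exactly what the paper's structure makes efficient.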