External Memory Algorithms and Data Structures
, 1998
Abstract

Cited by 333 (25 self)
Data sets in large applications are often too massive to fit completely inside the computer's internal memory. The resulting input/output communication (or I/O) between fast internal memory and slower external memory (such as disks) can be a major performance bottleneck. In this paper, we survey the state of the art in the design and analysis of external memory algorithms and data structures (which are sometimes referred to as "EM", "I/O", or "out-of-core" algorithms and data structures). EM algorithms and data structures are often designed and analyzed using the parallel disk model (PDM). The three machine-independent measures of performance in PDM are the number of I/O operations, the CPU time, and the amount of disk space. PDM allows for multiple disks (or disk arrays) and parallel CPUs, and it can be generalized to handle tertiary storage and hierarchical memory. We discuss several important paradigms for solving batched and online problems efficiently in external memory. Programming tools and environments are available for simplifying the programming task. The TPIE system (Transparent Parallel I/O programming Environment) is both easy to use and efficient in terms of execution speed. We report on some experiments using TPIE in the domain of spatial databases. The newly developed EM algorithms and data structures that incorporate the paradigms we discuss are significantly faster than methods currently used in practice.
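The PDM cost measure can be made concrete with two standard bounds from this literature: Scan(N) = Θ(N/(DB)) and Sort(N) = Θ((N/(DB)) log_{M/B}(N/B)). Here M is the internal memory size in items, B is the block size and D is the number of disks. The sketch below only evaluates these expressions with the constant factors dropped. The function names are illustrative and do not belong to any library.

```python
import math

def pdm_scan_ios(N, B, D=1):
    """Scan(N) = N / (D * B): each parallel I/O moves one block on each of D disks."""
    return N / (D * B)

def pdm_sort_ios(N, M, B, D=1):
    """Sort(N) = (N / (D * B)) * log_{M/B}(N / B), constant factors dropped."""
    n, m = N / B, M / B          # input size and memory size, measured in blocks
    return (n / D) * math.log(n) / math.log(m)

# Example parameters (hypothetical): 10^6 items, memory of 10^5 items, blocks of 10^3 items.
print(pdm_scan_ios(10**6, 10**3))          # 1000.0
print(pdm_sort_ios(10**6, 10**5, 10**3))   # about 1500
```

With these parameters, sorting costs only about 1.5 times as much as one scan, because log_{M/B}(N/B) is small once M/B is large.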
External Memory Data Structures
, 2001
Abstract

Cited by 83 (37 self)
In many massive dataset applications, the data must be stored in space- and query-efficient data structures on external storage devices. Often the data needs to be changed dynamically. In this chapter we discuss recent advances in the development of provably worst-case efficient external memory dynamic data structures. We also briefly discuss some of the most popular external data structures used in practice.
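The B-tree is the canonical example of the external structures discussed here. The minimal sketch below assumes a static, bulk-loaded tree held in memory. It lays out sorted keys in nodes of at most B entries and counts one "I/O" per node visited, so a search reads about log_B N nodes. It does not support inserts or deletes, and all names are hypothetical.

```python
from bisect import bisect_right

def build_static_btree(keys, B):
    """Bulk-load sorted, non-empty `keys` into levels of nodes with at most B entries.

    levels[0] holds leaf blocks of keys. Each higher level stores, for every
    child, the smallest key in that child's subtree. Child j of node p on one
    level is node p*B + j on the level below, so no pointers are stored.
    """
    level = [keys[i:i + B] for i in range(0, len(keys), B)]
    levels = [level]
    while len(level) > 1:
        mins = [node[0] for node in level]
        level = [mins[i:i + B] for i in range(0, len(mins), B)]
        levels.append(level)
    return levels

def search(levels, q, B):
    """Return (found, node_reads); each node read models one block I/O."""
    idx, reads = 0, 0
    for depth in range(len(levels) - 1, -1, -1):
        node = levels[depth][idx]
        reads += 1
        j = bisect_right(node, q) - 1        # position of the last entry <= q
        if depth == 0:
            return j >= 0 and node[j] == q, reads
        idx = idx * B + max(j, 0)            # descend into that child

keys = list(range(0, 1000, 2))               # 500 even keys
levels = build_static_btree(keys, 10)
print(search(levels, 8, 10))                 # (True, 3)
```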
The Priority R-Tree: A Practically Efficient and Worst-Case Optimal R-Tree
 In Proc. SIGMOD 2004, June 13-18, 2004, Paris, France
, 2004
Abstract

Cited by 60 (7 self)
We present the Priority R-tree, or PR-tree, which is the first R-tree variant that always answers a window query using O((N/B)^{1-1/d} + T/B) I/Os, where N is the number of d-dimensional (hyper)rectangles stored in the R-tree, B is the disk block size, and T is the output size. This is provably asymptotically optimal and significantly better than other R-tree variants, where a query may visit all N/B leaves in the tree even when T = 0. We also present an extensive experimental study of the practical performance of the PR-tree using both real-life and synthetic data. This study shows that the PR-tree performs similarly to the best known R-tree variants on real-life and relatively nicely distributed data, but outperforms them significantly on more extreme data.
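The gap between the PR-tree bound and other R-tree variants comes from how rectangles are grouped into leaves. The toy example below does not build a PR-tree. Using hypothetical data and helper names, it only shows the failure mode mentioned above. When every leaf's bounding box is long and thin, a window query with T = 0 still reads all N/B leaves.

```python
def intersects(a, b):
    """Closed axis-aligned rectangles given as (xmin, ymin, xmax, ymax)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def mbr(rects):
    """Minimum bounding rectangle of a non-empty list of rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def window_query(leaves, window):
    """Return (answers, leaves_read); reading a leaf models one block I/O."""
    answers, reads = [], 0
    for leaf in leaves:
        if intersects(mbr(leaf), window):     # the leaf's box says answers may be inside
            reads += 1
            answers.extend(r for r in leaf if intersects(r, window))
    return answers, reads

# Hypothetical data: 16 degenerate rectangles (short horizontal segments), B = 8.
left = [(0, y, 1, y) for y in range(8)]
right = [(9, y, 10, y) for y in range(8)]
window = (4, 0, 6, 7)                                       # empty vertical strip: T = 0
bad_leaves = [left[:4] + right[:4], left[4:] + right[4:]]   # each box spans x in [0, 10]
good_leaves = [left, right]                                 # boxes stay on their own side
print(window_query(bad_leaves, window))                     # ([], 2): every leaf read
print(window_query(good_leaves, window))                    # ([], 0)
```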
Implementing I/O-Efficient Data Structures Using TPIE
 In Proc. European Symposium on Algorithms
, 2002
Abstract

Cited by 33 (7 self)
In recent years, many theoretically I/O-efficient algorithms and data structures have been developed. The TPIE project at Duke University was started to investigate the practical importance of these theoretical results. The goal of this ongoing project is to provide a portable, extensible, flexible, and easy-to-use C++ programming environment for efficiently implementing I/O algorithms and data structures. The TPIE library has been developed in two phases. The first phase focused on supporting algorithms with a sequential I/O pattern, while the recently developed second phase has focused on supporting online I/O-efficient data structures, which exhibit a more random I/O pattern. This paper describes the design and implementation of the second phase of TPIE.
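The sketch below shows why sequential and random I/O patterns differ so much in cost. It models a buffer of a few blocks with LRU replacement and counts block reads. BlockCache and count_ios are hypothetical helpers, not TPIE's C++ API. The sketch shows only that a scan reads each block once, while scattered accesses miss almost every time.

```python
import random
from collections import OrderedDict

class BlockCache:
    """Hypothetical LRU buffer of `num_blocks` blocks of `B` items; counts I/Os."""

    def __init__(self, B, num_blocks):
        self.B = B
        self.capacity = num_blocks
        self.blocks = OrderedDict()   # block id -> None, least recently used first
        self.ios = 0

    def access(self, item):
        block = item // self.B
        if block in self.blocks:
            self.blocks.move_to_end(block)           # hit: no transfer
        else:
            self.ios += 1                            # miss: one block read
            self.blocks[block] = None
            if len(self.blocks) > self.capacity:
                self.blocks.popitem(last=False)      # evict least recently used

def count_ios(items, B, num_blocks):
    cache = BlockCache(B, num_blocks)
    for i in items:
        cache.access(i)
    return cache.ios

N, B, BLOCKS = 100_000, 100, 10
sequential = count_ios(range(N), B, BLOCKS)            # N/B = 1000 reads
shuffled = list(range(N))
random.Random(0).shuffle(shuffled)
scattered = count_ios(shuffled, B, BLOCKS)             # close to one read per access
```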
Optimal External Memory Interval Management
, 2002
Abstract

Cited by 31 (6 self)
In this paper we present the external interval tree, an optimal external memory data structure for answering stabbing queries on a set of dynamically maintained intervals. The external interval tree can be used in an optimal solution to the dynamic interval management problem, which is a central problem for object-oriented and temporal databases and for constraint logic programming. Part of the structure uses a novel weight-balancing technique, of independent interest, for efficient worst-case manipulation of balanced trees. The external interval tree, as well as our new balancing technique, has recently been used to develop several efficient external data structures.
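A stabbing query reports all intervals that contain a query point. For reference, here is a sketch of the classical internal-memory centered interval tree answering that query. The external version roughly generalizes this structure to a higher fan-out tree stored in blocks and adds weight balancing for dynamic updates; neither of those parts is shown here. The names are illustrative.

```python
class Node:
    def __init__(self, center, here, left, right):
        self.center = center
        self.by_left = sorted(here, key=lambda iv: iv[0])                 # ascending left ends
        self.by_right = sorted(here, key=lambda iv: iv[1], reverse=True)  # descending right ends
        self.left, self.right = left, right

def build(intervals):
    """Centered interval tree over closed intervals (lo, hi)."""
    if not intervals:
        return None
    ends = sorted(p for iv in intervals for p in iv)
    c = ends[len(ends) // 2]      # an endpoint, so at least one interval contains it
    here = [iv for iv in intervals if iv[0] <= c <= iv[1]]
    left = [iv for iv in intervals if iv[1] < c]
    right = [iv for iv in intervals if iv[0] > c]
    return Node(c, here, build(left), build(right))

def stab(node, q):
    """Return all stored intervals containing q."""
    out = []
    while node is not None:
        if q < node.center:
            # every interval here ends at or after center > q, so it contains q iff lo <= q
            for iv in node.by_left:
                if iv[0] > q:
                    break
                out.append(iv)
            node = node.left
        elif q > node.center:
            for iv in node.by_right:
                if iv[1] < q:
                    break
                out.append(iv)
            node = node.right
        else:
            out.extend(node.by_left)     # every interval here contains the center
            break
    return out

tree = build([(1, 5), (2, 3), (4, 8), (6, 7), (9, 10)])
print(sorted(stab(tree, 4.5)))           # [(1, 5), (4, 8)]
```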
Range Aggregate Processing in Spatial Databases
 TKDE
, 2004
Abstract

Cited by 30 (2 self)
A range aggregate query returns summarized information about the points falling in a hyperrectangle (e.g., the total number of these points instead of their concrete ids). This paper studies spatial indexes that solve such queries efficiently and proposes the aggregate Point-tree (aP-tree), which achieves query cost logarithmic in the data set cardinality (independently of the query size) for two-dimensional data. The aP-tree requires only small modifications to the popular multiversion structural framework and, thus, can be implemented and applied easily in practice. We also present models that accurately predict the space consumption and query cost of the aP-tree and are therefore suitable for query optimization. Extensive experiments confirm that the proposed methods are efficient and practical.
Index Terms: database, spatial database, range queries, aggregation.
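A range aggregate returns a count or sum without reporting the points themselves. The sketch below is a simple in-memory stand-in and not the aP-tree. It builds a summed-area table over an integer grid, which answers a range-COUNT query with four lookups no matter how large the query rectangle is. The integer grid coordinates and the function names are assumptions of this sketch.

```python
def build_prefix(points, width, height):
    """P[y][x] = number of points with px < x and py < y (integer grid coordinates)."""
    P = [[0] * (width + 1) for _ in range(height + 1)]
    for x, y in points:
        P[y + 1][x + 1] += 1
    for y in range(1, height + 1):
        for x in range(1, width + 1):
            P[y][x] += P[y - 1][x] + P[y][x - 1] - P[y - 1][x - 1]
    return P

def range_count(P, x1, y1, x2, y2):
    """COUNT of points in the closed rectangle [x1, x2] x [y1, y2]: four lookups."""
    return P[y2 + 1][x2 + 1] - P[y1][x2 + 1] - P[y2 + 1][x1] + P[y1][x1]

P = build_prefix([(0, 0), (1, 1), (2, 2), (3, 3), (1, 3)], width=4, height=4)
print(range_count(P, 1, 1, 2, 3))   # 3
```

The table takes space proportional to the grid, not to the number of points. Avoiding that cost is one reason disk-based structures such as the aP-tree are used instead.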
A Framework for Index Bulk Loading and Dynamization
, 2001
Abstract

Cited by 23 (15 self)
In this paper we investigate automated methods for externalizing internal memory data structures. We consider a class of balanced trees that we call weight-balanced partitioning trees (or wp-trees) for indexing a set of points in R^d. Well-known examples of wp-trees include kd-trees, BBD-trees, pseudo-quadtrees, and BAR-trees. Given an efficient external wp-tree construction algorithm, we present a general framework for automatically obtaining a dynamic external data structure. Using this framework together with a new general construction (bulk loading) technique of independent interest, we obtain data structures with guaranteed good update performance in terms of I/O transfers. Our approach gives considerably improved construction and update I/O bounds for, e.g., external kd-trees and BBD-trees.
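The kd-tree is the simplest of the wp-trees listed above. Below is a minimal sketch of a median-split bulk load, assuming each leaf is one disk block of at most B points. Splitting at the median keeps sibling subtrees within one point of each other in size, which is roughly the weight balance these trees rely on. This is a textbook construction, not the bulk-loading technique from the paper, and the names are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Optional

@dataclass
class KDNode:
    points: Optional[list] = None      # set only in leaves (one disk block)
    axis: int = 0
    split: float = 0.0
    left: "Optional[KDNode]" = None
    right: "Optional[KDNode]" = None

def bulk_load(points, B, depth=0):
    """Median-split kd-tree; every leaf holds at most B points."""
    if len(points) <= B:
        return KDNode(points=points)
    axis = depth % len(points[0])                 # cycle through the coordinates
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2                           # subtree sizes differ by at most 1
    return KDNode(axis=axis, split=pts[mid][axis],
                  left=bulk_load(pts[:mid], B, depth + 1),
                  right=bulk_load(pts[mid:], B, depth + 1))

def leaves(node):
    """Leaf blocks in left-to-right order."""
    if node.points is not None:
        return [node.points]
    return leaves(node.left) + leaves(node.right)

rng = random.Random(1)
pts = [(rng.random(), rng.random()) for _ in range(1000)]
blocks = leaves(bulk_load(pts, 16))
print(len(blocks))                                # 64 blocks of 15 or 16 points
```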
I/O-efficient point location using persistent B-trees
 In Proc. Workshop on Algorithm Engineering and Experimentation
, 2003
Abstract

Cited by 15 (7 self)
We present an external planar point location data structure that is I/O-efficient both in theory and in practice. The developed structure uses linear space and answers a query in optimal O(\log_B N) I/Os, where B is the disk block size. It is based on a persistent B-tree, and all previously developed such structures assume a total order on the elements in the structure. As a theoretical result of independent interest, we show how to remove this assumption. Most previous theoretical I/O-efficient planar point location structures are relatively complicated and have not been implemented. Based on a bucket approach, Vahrenhold and Hinrichs therefore developed a simple and practical, but theoretically non-optimal, heuristic structure. We present an extensive experimental evaluation showing that, on a range of real-world Geographic Information Systems (GIS) data, our structure uses fewer I/Os than the structure of Vahrenhold and Hinrichs to answer a query. On a synthetically generated worst-case dataset, our structure uses significantly fewer I/Os.
1 Introduction
The planar point location problem is the problem of storing a planar subdivision defined by N segments such that the region containing a query point ...
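Point location with a persistent structure is usually built on a sweep-line (slab) decomposition. Between consecutive segment endpoints, the segments crossing a vertical slab have a fixed bottom-to-top order. A query is therefore a binary search over the slabs followed by a binary search within one slab. The sketch below stores every slab's list explicitly, which can take quadratic space. A persistent B-tree avoids this by sharing the parts that do not change from one slab to the next. The sketch assumes non-crossing, non-vertical segments given left endpoint first, and its function names are hypothetical.

```python
from bisect import bisect_right

def y_at(seg, x):
    """y-coordinate of the non-vertical segment ((x1, y1), (x2, y2)) at abscissa x."""
    (x1, y1), (x2, y2) = seg
    return y1 + (y2 - y1) * (x - x1) / (x2 - x1)

def build_slabs(segments):
    """Split the plane at every endpoint x; store each slab's segments bottom to top."""
    xs = sorted({p[0] for seg in segments for p in seg})
    slabs = []
    for a, b in zip(xs, xs[1:]):
        crossing = [s for s in segments if s[0][0] <= a and s[1][0] >= b]
        mid = (a + b) / 2
        crossing.sort(key=lambda s: y_at(s, mid))   # order is fixed inside the slab
        slabs.append(crossing)
    return xs, slabs

def segment_above(xs, slabs, q):
    """Lowest segment strictly above point q, or None."""
    qx, qy = q
    i = bisect_right(xs, qx) - 1                    # slab [xs[i], xs[i+1])
    if i < 0 or i >= len(slabs):
        return None
    segs = slabs[i]
    lo, hi = 0, len(segs)
    while lo < hi:                                  # first segment with y > qy at qx
        m = (lo + hi) // 2
        if y_at(segs[m], qx) > qy:
            hi = m
        else:
            lo = m + 1
    return segs[lo] if lo < len(segs) else None

floor_seg, roof_seg, mid_seg = ((0, 0), (10, 0)), ((0, 5), (10, 5)), ((2, 2), (6, 3))
xs, slabs = build_slabs([floor_seg, roof_seg, mid_seg])
print(segment_above(xs, slabs, (4, 1)))             # ((2, 2), (6, 3))
```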
CRB-Tree: An Efficient Indexing Scheme for Range-Aggregate Queries
 In Proc. International Conference on Database Theory
, 2003
Abstract

Cited by 12 (1 self)
We propose a new indexing scheme, called the CRB-tree, for efficiently answering range-aggregate queries. The range-aggregate problem is defined as follows: given a set of weighted points in R^d, compute the aggregate of the weights of the points that lie inside a d-dimensional query rectangle. In this paper we focus on COUNT, SUM, and AVG aggregates. First, we develop an indexing scheme for answering two-dimensional range-COUNT queries that uses O(N/B) disk blocks and answers a query in O(\log_B N) I/Os, where N is the number of input points and B is the disk block size. This is the first optimal index structure for the 2D range-COUNT problem. The index can be extended to obtain a near-linear-size indexing structure for answering range-SUM queries using O(\log_B N) I/Os. We also obtain similar bounds for rectangle-intersection aggregate queries, in which the input is a set of weighted rectangles and a query asks to compute the aggregate of the weights of those input rectangles that overlap with the query rectangle. This result immediately improves a recent result on temporal-aggregate queries. Our indexing scheme can be dynamized and extended to higher dimensions. Finally, we demonstrate the practical efficiency of our index by comparing its performance against the kdB-tree. For a dataset of around 100 million points, queries on the CRB-tree run 8-10 times faster than on the kdB-tree. Furthermore, unlike other indexing schemes, the query performance of the CRB-tree is oblivious to the distribution of the input points and to the placement, shape, and size of the query rectangle.
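The reduction behind 2D range-aggregate queries can be sketched offline in memory. A rectangle aggregate is a signed combination of four dominance (lower-left quadrant) aggregates, and AVG follows from SUM and COUNT. The code below answers a batch of queries with a sweep over x and a Fenwick tree over y. This technique stands in for the CRB-tree and is not it. It assumes integer coordinates, so that x1 - 1 is the largest coordinate strictly below x1. All names are hypothetical.

```python
from bisect import bisect_right

class Fenwick:
    """Binary indexed tree over positions 1..n: point add and prefix sum."""
    def __init__(self, n):
        self.t = [0] * (n + 1)

    def add(self, i, v):
        while i < len(self.t):
            self.t[i] += v
            i += i & -i

    def prefix(self, i):
        s = 0
        while i > 0:
            s += self.t[i]
            i -= i & -i
        return s

def dominance(points, corners):
    """For each corner (X, Y): (weight sum, count) of points (x, y, w) with x <= X, y <= Y."""
    ys = sorted({y for _, y, _ in points})
    wsum, cnt = Fenwick(len(ys)), Fenwick(len(ys))
    pts = sorted(points)                                   # sweep in x order
    res = [None] * len(corners)
    j = 0
    for k in sorted(range(len(corners)), key=lambda idx: corners[idx][0]):
        X, Y = corners[k]
        while j < len(pts) and pts[j][0] <= X:             # insert points left of the sweep line
            r = bisect_right(ys, pts[j][1])                # 1-based rank of this y
            wsum.add(r, pts[j][2])
            cnt.add(r, 1)
            j += 1
        r = bisect_right(ys, Y)                            # number of distinct ys <= Y
        res[k] = (wsum.prefix(r), cnt.prefix(r))
    return res

def range_aggregates(points, rects):
    """(SUM, COUNT, AVG) for each closed rectangle (x1, y1, x2, y2)."""
    corners = []
    for x1, y1, x2, y2 in rects:
        corners += [(x2, y2), (x1 - 1, y2), (x2, y1 - 1), (x1 - 1, y1 - 1)]
    d = dominance(points, corners)
    out = []
    for q in range(len(rects)):
        a, b, c, e = d[4 * q: 4 * q + 4]
        s = a[0] - b[0] - c[0] + e[0]                      # inclusion-exclusion
        n = a[1] - b[1] - c[1] + e[1]
        out.append((s, n, s / n if n else None))           # AVG = SUM / COUNT
    return out

pts = [(1, 1, 5), (2, 3, 2), (4, 2, 7), (5, 5, 1)]
res = range_aggregates(pts, [(1, 1, 4, 3), (2, 2, 5, 5), (10, 10, 12, 12)])
```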
Cache-Oblivious R-Trees
, 2005
Abstract

Cited by 12 (3 self)
We develop a cache-oblivious data structure for storing a set S of N axis-aligned rectangles in the plane, such that all rectangles in S intersecting a query rectangle or point can be found efficiently. Our structure is an axis-aligned bounding-box hierarchy and as such it is the first cache-oblivious R-tree with provable performance guarantees. If no point in the plane is contained in B or more rectangles in S, the structure answers a rectangle query using O(\sqrt{N/B} + T/B) memory transfers and a point query using O((N/B)^ε) memory transfers for any ε > 0, where B is the block size of memory transfers between any two levels of a multilevel memory hierarchy and T is the number of reported rectangles. We also develop a variant of our structure that achieves the same performance on input sets with arbitrary overlap among the rectangles. The rectangle query bound matches the bound of the best known linear-space cache-aware structure.
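Cache-oblivious trees usually get their independence from the block size B through the memory layout, since B never appears in the code. One standard ingredient is the van Emde Boas layout, sketched below for a complete binary tree in BFS numbering (root 1, children 2i and 2i+1). The abstract does not say which layout the paper uses, so treat this only as an illustration of the idea.

```python
def veb_order(h, root=1):
    """BFS indices of a complete binary tree of height h, listed in van Emde Boas order.

    The tree is cut at half its height. The top subtree is laid out first,
    followed by each bottom subtree, all recursively. In the standard analysis
    a root-to-leaf path then touches O(log_B N) blocks for every block size B.
    """
    if h == 1:
        return [root]
    top_h = h // 2
    bottom_h = h - top_h
    order = veb_order(top_h, root)
    # descendants exactly top_h levels below `root` have BFS indices root * 2**top_h + i
    for i in range(2 ** top_h):
        order += veb_order(bottom_h, root * 2 ** top_h + i)
    return order

print(veb_order(4))   # [1, 2, 3, 4, 8, 9, 5, 10, 11, 6, 12, 13, 7, 14, 15]
```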