Results 1-10 of 21
External Memory Algorithms and Data Structures
, 1998
Cited by 350 (25 self)
Data sets in large applications are often too massive to fit completely inside the computer's internal memory. The resulting input/output communication (or I/O) between fast internal memory and slower external memory (such as disks) can be a major performance bottleneck. In this paper, we survey the state of the art in the design and analysis of external memory algorithms and data structures (which are sometimes referred to as "EM" or "I/O" or "out-of-core" algorithms and data structures). EM algorithms and data structures are often designed and analyzed using the parallel disk model (PDM). The three machine-independent measures of performance in PDM are the number of I/O operations, the CPU time, and the amount of disk space. PDM allows for multiple disks (or disk arrays) and parallel CPUs, and it can be generalized to handle tertiary storage and hierarchical memory. We discuss several important paradigms for how to solve batched and online problems efficiently in external memory. Programming tools and environments are available for simplifying the programming task. The TPIE system (Transparent Parallel I/O programming Environment) is both easy to use and efficient in terms of execution speed. We report on some experiments using TPIE in the domain of spatial databases. The newly developed EM algorithms and data structures that incorporate the paradigms we discuss are significantly faster than methods currently used in practice.
External Memory Data Structures
, 2001
Cited by 83 (37 self)
In many massive dataset applications the data must be stored in space- and query-efficient data structures on external storage devices. Often the data needs to be changed dynamically. In this chapter we discuss recent advances in the development of provably worst-case efficient external memory dynamic data structures. We also briefly discuss some of the most popular external data structures used in practice.
Range Searching
, 1996
Cited by 73 (1 self)
Range searching is one of the central problems in computational geometry, because it arises in many applications and a wide variety of geometric problems can be formulated as a range-searching problem. A typical range-searching problem has the following form. Let S be a set of n points in R^d, and let R be a family of subsets; elements of R are called ranges. We wish to preprocess S into a data structure so that for a query range R, the points in S ∩ R can be reported or counted efficiently. Typical examples of ranges include rectangles, halfspaces, simplices, and balls. If we are only interested in answering a single query, it can be done in linear time, using linear space, by simply checking for each point p ∈ S whether p lies in the query range.
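The single-query baseline described at the end of this abstract can be sketched in a few lines. This is a minimal illustration, assuming axis-aligned rectangular ranges given by their lower and upper corners; the function names `in_box`, `range_report`, and `range_count` are hypothetical and do not come from the paper.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def in_box(p: Point, lo: Sequence[float], hi: Sequence[float]) -> bool:
    """Return True if p lies inside the closed box [lo, hi] in every coordinate."""
    return all(l <= x <= h for x, l, h in zip(p, lo, hi))


def range_report(points: Sequence[Point],
                 lo: Sequence[float],
                 hi: Sequence[float]) -> List[Point]:
    """Report every point in the query box with one linear scan (no preprocessing)."""
    return [p for p in points if in_box(p, lo, hi)]


def range_count(points: Sequence[Point],
                lo: Sequence[float],
                hi: Sequence[float]) -> int:
    """Count the points in the query box with one linear scan."""
    return sum(1 for p in points if in_box(p, lo, hi))


if __name__ == "__main__":
    S = [(0.0, 0.0), (1.0, 2.0), (3.0, 1.0), (5.0, 5.0)]
    print(range_report(S, (0.0, 0.0), (3.0, 2.0)))
    print(range_count(S, (0.0, 0.0), (3.0, 2.0)))
```

Each query costs time proportional to the number of points, which is why preprocessing S into a data structure pays off once many queries are expected.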
The Priority R-Tree: A Practically Efficient and Worst-Case Optimal R-Tree
 SIGMOD 2004, June 13-18, 2004, Paris, France
, 2004
Cited by 72 (8 self)
We present the Priority R-tree, or PR-tree, which is the first R-tree variant that always answers a window query using O((N/B)^{1-1/d} + T/B) I/Os, where N is the number of d-dimensional (hyper-) rectangles stored in the R-tree, B is the disk block size, and T is the output size. This is provably asymptotically optimal and significantly better than other R-tree variants, where a query may visit all N/B leaves in the tree even when T = 0. We also present an extensive experimental study of the practical performance of the PR-tree using both real-life and synthetic data. This study shows that the PR-tree performs similarly to the best known R-tree variants on real-life and relatively nicely distributed data, but outperforms them significantly on more extreme data.
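To get a feel for the gap between the window-query bound and a full scan of all N/B leaves, the two expressions from this abstract can be evaluated numerically. This is only an order-of-magnitude sketch: the hidden constant factors of the O-notation are dropped, and the function names and sample parameters are hypothetical.

```python
def window_query_bound(n: int, b: int, t: int, d: int) -> float:
    """Evaluate (N/B)^(1 - 1/d) + T/B, ignoring the constant hidden by O(.)."""
    if n <= 0 or b <= 0 or t < 0 or d < 1:
        raise ValueError("need n > 0, b > 0, t >= 0, d >= 1")
    return (n / b) ** (1.0 - 1.0 / d) + t / b


def full_scan_cost(n: int, b: int) -> float:
    """Number of leaves, N/B, that a query may visit in the worst case otherwise."""
    if n <= 0 or b <= 0:
        raise ValueError("need n > 0 and b > 0")
    return n / b


if __name__ == "__main__":
    # Assumed sample values: one million rectangles, 100 per block, empty output.
    n, b, t, d = 1_000_000, 100, 0, 2
    print(window_query_bound(n, b, t, d), full_scan_cost(n, b))
```

With these assumed values the bound evaluates to about 100 block reads, against 10,000 leaves for a full scan, for a query whose output is empty.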
Implementing I/O-Efficient Data Structures Using TPIE
 In Proc. European Symposium on Algorithms
, 2002
Cited by 33 (7 self)
In recent years, many theoretically I/O-efficient algorithms and data structures have been developed. The TPIE project at Duke University was started to investigate the practical importance of these theoretical results. The goal of this ongoing project is to provide a portable, extensible, flexible, and easy-to-use C++ programming environment for efficiently implementing I/O algorithms and data structures. The TPIE library has been developed in two phases. The first phase focused on supporting algorithms with a sequential I/O pattern, while the recently developed second phase has focused on supporting online I/O-efficient data structures, which exhibit a more random I/O pattern. This paper describes the design and implementation of the second phase of TPIE.
Bkd-tree: A dynamic scalable kd-tree
 In Proc. International Symposium on Spatial and Temporal Databases
, 2003
Cache-oblivious data structures for orthogonal range searching
 In Proc. ACM Symposium on Computational Geometry
, 2003
Cited by 21 (5 self)
We develop cache-oblivious data structures for orthogonal range searching, the problem of finding all T points in a set of N points in R^d lying in a query hyper-rectangle. Cache-oblivious data structures are designed to be efficient in arbitrary memory hierarchies. We describe a dynamic linear-size data structure that answers d-dimensional queries in O((N/B)^{1-1/d} + T/B) memory transfers, where B is the block size of any two levels of a multilevel memory hierarchy. A point can be inserted into or deleted from this data structure in O(\log_B^2 N) memory transfers. We also develop a static structure for the two-dimensional case that answers queries in O(\log_B N + T/B) memory transfers using O(N \log_2^2 N) space. The analysis of the latter structure requires that B = 2^{2^c} for some nonnegative integer constant c.
iWalk: Interactive out-of-core rendering of large models
, 2002
Cited by 20 (8 self)
We present iWalk, a system for interactive out-of-core rendering of large models on an inexpensive PC. The system uses a new out-of-core preprocessing algorithm and a new multithreaded out-of-core rendering approach. The out-of-core preprocessing algorithm is incremental and fast, and it builds an on-disk hierarchical representation for a model larger than main memory. The out-of-core rendering approach uses multiple threads to overlap rendering, visibility computation, and disk operations. A rendering thread uses a from-point visibility algorithm to find the nodes of the model hierarchy that the user sees, and sends fetch requests to a geometry cache, which reads nodes from disk into memory. To avoid bursts of disk operations, a look-ahead thread guesses the nodes that the user may see next, and sends prefetch requests to the geometry cache. The system can run in approximate mode for interactive rendering, or in conservative mode for rendering with guaranteed accuracy. On a commodity PC, iWalk can preprocess a 13-million-polygon model in 17 minutes, and then render it in approximate mode with 98% accuracy at 9 frames per second. Thus, iWalk allows us to use an inexpensive PC to visualize models that would typically require expensive high-end graphics workstations or parallel machines.
From point cloud to grid DEM: A scalable approach
 In Proc. 12th International Symposium on Spatial Data Handling
, 2006
Cited by 12 (8 self)
Summary. Given a set S of points in R^3 sampled from an elevation function H: R^2 → R, we present a scalable algorithm for constructing a grid digital elevation model (DEM). Our algorithm consists of three stages: First, we construct a quad tree on S to partition the point set into a set of non-overlapping segments. Next, for each segment q, we compute the set of points in q and all segments neighboring q. Finally, we interpolate each segment independently using points within the segment and its neighboring segments. Data sets acquired by LIDAR and other modern mapping technologies consist of hundreds of millions of points and are too large to fit in main memory. When processing such massive data sets, the transfer of data between disk and main memory (also called I/O), rather than the CPU time, becomes the performance bottleneck. We therefore present an I/O-efficient algorithm for constructing a grid DEM. Our experiments show that the algorithm scales to data sets much larger than the size of main memory, while existing algorithms do not scale. For example, using a machine with 1GB RAM, we were able to construct a grid DEM containing 1.3 billion cells (occupying 1.2GB) from a LIDAR data set of over 390 million points (occupying 20GB) in about 53 hours. Neither ArcGIS nor GRASS, two popular GIS products, was able to process this data set.
Cache-Oblivious R-Trees
, 2005
Cited by 9 (2 self)
We develop a cache-oblivious data structure for storing a set S of N axis-aligned rectangles in the plane, such that all rectangles in S intersecting a query rectangle or point can be found efficiently. Our structure is an axis-aligned bounding-box hierarchy and as such it is the first cache-oblivious R-tree with provable performance guarantees. If no point in the plane is contained in B or more rectangles in S, the structure answers a rectangle query using O(\sqrt{N/B} + T/B) memory transfers and a point query using O((N/B)^ε) memory transfers for any ε > 0, where B is the block size of memory transfers between any two levels of a multilevel memory hierarchy. We also develop a variant of our structure that achieves the same performance on input sets with arbitrary overlap among the rectangles. The rectangle query bound matches the bound of the best known linear-space cache-aware structure.