External Memory Data Structures
, 2001
Cited by 78 (34 self)
In many massive dataset applications the data must be stored in space and query efficient data structures on external storage devices. Often the data needs to be changed dynamically. In this chapter we discuss recent advances in the development of provably worst-case efficient external memory dynamic data structures. We also briefly discuss some of the most popular external data structures used in practice.
The Priority R-Tree: A Practically Efficient and Worst-Case Optimal R-Tree
- SIGMOD 2004 JUNE 13-18, 2004, PARIS, FRANCE
, 2004
Cited by 46 (5 self)
We present the Priority R-tree, or PR-tree, which is the first R-tree variant that always answers a window query using O((N/B)^{1-1/d} + T/B) I/Os, where N is the number of d-dimensional (hyper-)rectangles stored in the R-tree, B is the disk block size, and T is the output size. This is provably asymptotically optimal and significantly better than other R-tree variants, where a query may visit all N/B leaves in the tree even when T = 0. We also present an extensive experimental study of the practical performance of the PR-tree using both real-life and synthetic data. This study shows that the PR-tree performs similarly to the best known R-tree variants on real-life and relatively nicely distributed data, but outperforms them significantly on more extreme data.
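To give a feel for the bound quoted in this abstract, the following minimal Python sketch evaluates (N/B)^{1-1/d} + T/B next to the N/B leaves that, according to the abstract, a query may visit in other R-tree variants. It ignores the constants hidden by the O-notation; the function names and the sample values of N, B, and T are illustrative assumptions, not taken from the paper.

```python
import math


def pr_tree_query_bound(n, b, t, d=2):
    """Evaluate (N/B)^(1-1/d) + T/B, ignoring constant factors."""
    return (n / b) ** (1.0 - 1.0 / d) + t / b


def full_scan_leaves(n, b):
    """Number of leaves, N/B rounded up, that a query may visit in the worst case."""
    return math.ceil(n / b)


if __name__ == "__main__":
    # Hypothetical workload: 10^8 rectangles, 10^3 rectangles per block, empty output.
    n, b, t = 10**8, 10**3, 0
    print("PR-tree bound :", pr_tree_query_bound(n, b, t))
    print("all leaves    :", full_scan_leaves(n, b))
```

For d = 2 the first term is the square root of the number of leaves, which is why the gap to a full scan widens as N grows.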
Implementing I/O-Efficient Data Structures Using TPIE
- In Proc. European Symposium on Algorithms
, 2002
Cited by 28 (6 self)
In recent years, many theoretically I/O-efficient algorithms and data structures have been developed. The TPIE project at Duke University was started to investigate the practical importance of these theoretical results. The goal of this ongoing project is to provide a portable, extensible, flexible, and easy to use C++ programming environment for efficiently implementing I/O-algorithms and data structures. The TPIE library has been developed in two phases. The first phase focused on supporting algorithms with a sequential I/O pattern, while the recently developed second phase has focused on supporting on-line I/O-efficient data structures, which exhibit a more random I/O pattern. This paper describes the design and implementation of the second phase of TPIE.
Bkd-tree: A dynamic scalable kd-tree
- In Proc. International Symposium on Spatial and Temporal Databases
, 2003
iWalk: Interactive out-of-core rendering of large models
, 2002
Cited by 19 (8 self)
We present iWalk, a system for interactive out-of-core rendering of large models on an inexpensive PC. The system uses a new out-of-core preprocessing algorithm and a new multi-threaded out-of-core rendering approach. The out-of-core preprocessing algorithm is incremental and fast, and it builds an on-disk hierarchical representation for a model larger than main memory. The out-of-core rendering approach uses multiple threads to overlap rendering, visibility computation, and disk operations. A rendering thread uses a from-point visibility algorithm to find the nodes of the model hierarchy that the user sees, and sends fetch requests to a geometry cache, which reads nodes from disk into memory. To avoid bursts of disk operations, a look-ahead thread guesses the nodes that the user may see next, and sends prefetch requests to the geometry cache. The system can run in approximate mode for interactive rendering, or in conservative mode for rendering with guaranteed accuracy. On a commodity PC, iWalk can preprocess a 13-million-polygon model in 17 minutes, and then render it in approximate mode with 98% accuracy at 9 frames per second. Thus, iWalk allows us to use an inexpensive PC to visualize models that would typically require expensive high-end graphics workstations or parallel machines.
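The fetch and prefetch requests described in this abstract can be pictured with a small single-threaded Python sketch of a geometry cache. Everything here is an assumption made for illustration: the class and method names are hypothetical, `load_node` stands in for a disk read, and the least-recently-used eviction policy is a choice of this sketch, since the abstract does not state which replacement policy iWalk uses.

```python
from collections import OrderedDict


class GeometryCache:
    """Bounded cache of model nodes with LRU eviction (an assumed policy)."""

    def __init__(self, load_node, capacity):
        self.load_node = load_node    # hypothetical callable: node id -> geometry
        self.capacity = capacity      # maximum number of nodes kept in memory
        self.nodes = OrderedDict()    # node id -> geometry, oldest first
        self.disk_reads = 0           # counts simulated disk operations

    def _load(self, node_id):
        if node_id in self.nodes:
            # Cache hit: mark the node as most recently used.
            self.nodes.move_to_end(node_id)
            return self.nodes[node_id]
        # Cache miss: read the node from "disk" and evict the oldest if full.
        self.disk_reads += 1
        data = self.load_node(node_id)
        self.nodes[node_id] = data
        if len(self.nodes) > self.capacity:
            self.nodes.popitem(last=False)
        return data

    def fetch(self, node_id):
        """Request from the rendering side: the node is visible now."""
        return self._load(node_id)

    def prefetch(self, node_id):
        """Request from the look-ahead side: the node may be visible soon."""
        self._load(node_id)


if __name__ == "__main__":
    cache = GeometryCache(lambda node_id: f"geometry-{node_id}", capacity=2)
    cache.prefetch(7)                     # look-ahead guesses node 7
    print(cache.fetch(7), cache.disk_reads)
```

In this sketch a prefetched node is served by the later fetch without a second disk read, which is the effect the look-ahead thread is meant to achieve.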
Cache-oblivious data structures for orthogonal range searching
- In Proc. ACM Symposium on Computational Geometry
, 2003
Cited by 18 (6 self)
We develop cache-oblivious data structures for orthogonal range searching, the problem of finding all T points in a set of N points in R^d lying in a query hyper-rectangle. Cache-oblivious data structures are designed to be efficient in arbitrary memory hierarchies. We describe a dynamic linear-size data structure that answers d-dimensional queries in O((N/B)^{1-1/d} + T/B) memory transfers, where B is the block size of any two levels of a multilevel memory hierarchy. A point can be inserted into or deleted from this data structure in O(log_B^2 N) memory transfers. We also develop a static structure for the two-dimensional case that answers queries in O(log_B N + T/B) memory transfers using O(N log_2^2 N) space. The analysis of the latter structure requires that B = 2^{2^c} for some nonnegative integer constant c. Categories and Subject Descriptors: F.2 [Analysis of Algorithms and Problem Complexity]: Nonnumerical Algorithms and Problems.
Cache-Oblivious R-Trees
, 2005
Cited by 10 (3 self)
We develop a cache-oblivious data structure for storing a set S of N axis-aligned rectangles in the plane, such that all rectangles in S intersecting a query rectangle or point can be found efficiently. Our structure is an axis-aligned bounding-box hierarchy and as such it is the first cache-oblivious R-tree with provable performance guarantees. If no point in the plane is contained in B or more rectangles in S, the structure answers a rectangle query using O(\sqrt{N/B} + T/B) memory transfers and a point query using O((N/B)^ε) memory transfers for any ε>0, where B is the block size of memory transfers between any two levels of a multilevel memory hierarchy. We also develop a variant of our structure that achieves the same performance on input sets with arbitrary overlap among the rectangles. The rectangle query bound matches the bound of the best known linear-space cache-aware structure.
From point cloud to grid DEM: A scalable approach
- In Proc. 12th International Symposium on Spatial Data Handling
, 2006
Cited by 5 (5 self)
Given a set S of points in R^3 sampled from an elevation function H: R^2 → R, we present a scalable algorithm for constructing a grid digital elevation model (DEM). Our algorithm consists of three stages: First, we construct a quad tree on S to partition the point set into a set of non-overlapping segments. Next, for each segment q, we compute the set of points in q and all segments neighboring q. Finally, we interpolate each segment independently using points within the segment and its neighboring segments. Data sets acquired by LIDAR and other modern mapping technologies consist of hundreds of millions of points and are too large to fit in main memory. When processing such massive data sets, the transfer of data between disk and main memory (also called I/O), rather than the CPU time, becomes the performance bottleneck. We therefore present an I/O-efficient algorithm for constructing a grid DEM. Our experiments show that the algorithm scales to data sets much larger than the size of main memory, while existing algorithms do not scale. For example, using a machine with 1GB RAM, we were able to construct a grid DEM containing 1.3 billion cells (occupying 1.2GB) from a LIDAR data set of over 390 million points (occupying 20GB) in about 53 hours. Neither ArcGIS nor GRASS, two popular GIS products, was able to process this data set.
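The three stages named in this abstract can be illustrated with a small in-memory Python sketch. It is not the paper's I/O-efficient algorithm: it holds all points in memory, the leaf capacity `max_points` and the grid layout are arbitrary choices, and inverse-distance weighting is used only as a stand-in because the abstract does not name the interpolation method. All function and class names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float, float]  # (x, y, elevation)


@dataclass
class Segment:
    x0: float
    y0: float
    x1: float
    y1: float
    points: List[Point] = field(default_factory=list)


def build_quadtree(points, x0, y0, x1, y1, max_points=4, depth=0, max_depth=16):
    """Stage 1: split the square until each leaf holds at most max_points points."""
    if len(points) <= max_points or depth >= max_depth:
        return [Segment(x0, y0, x1, y1, list(points))]
    mx, my = (x0 + x1) / 2, (y0 + y1) / 2
    buckets = {(False, False): [], (True, False): [], (False, True): [], (True, True): []}
    for p in points:
        buckets[(p[0] >= mx, p[1] >= my)].append(p)
    leaves = []
    for (right, top), pts in buckets.items():
        leaves += build_quadtree(pts,
                                 mx if right else x0, my if top else y0,
                                 x1 if right else mx, y1 if top else my,
                                 max_points, depth + 1, max_depth)
    return leaves


def touches(a, b):
    """True if the closed boxes of two segments share at least a corner."""
    return a.x0 <= b.x1 and b.x0 <= a.x1 and a.y0 <= b.y1 and b.y0 <= a.y1


def gather_points(seg, leaves):
    """Stage 2: points of seg plus those of every segment touching it."""
    return [p for other in leaves if touches(seg, other) for p in other.points]


def idw(x, y, pts, power=2.0):
    """Inverse-distance weighting (an assumed interpolation method)."""
    num = den = 0.0
    for px, py, pz in pts:
        d2 = (px - x) ** 2 + (py - y) ** 2
        if d2 == 0.0:
            return pz
        w = d2 ** (-power / 2)
        num += w * pz
        den += w
    return num / den if den else float("nan")


def grid_dem(points, size, cells):
    """Stage 3: interpolate each segment independently over [0, size)^2."""
    leaves = build_quadtree(points, 0.0, 0.0, size, size)
    step = size / cells
    grid = [[float("nan")] * cells for _ in range(cells)]
    for seg in leaves:
        pts = gather_points(seg, leaves)
        for r in range(cells):
            for c in range(cells):
                x, y = (c + 0.5) * step, (r + 0.5) * step
                if seg.x0 <= x < seg.x1 and seg.y0 <= y < seg.y1:
                    grid[r][c] = idw(x, y, pts)
    return grid
```

Because each segment only needs its own points and those of its neighbors, the final loop over segments has no dependencies between iterations, which mirrors the "interpolate each segment independently" step of the abstract.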
Hierarchical Graph Indexing
- Proc. of Int’l Conf. on Information and Knowledge Management (CIKM)
, 2003
Cited by 3 (1 self)
Traffic analysis, in the context of Telecommunications or Internet and Web data, is crucial for large network operations. Data in such networks is often provided as large graphs with hundreds of millions of vertices and edges. We propose efficient techniques for managing such graphs at the storage level in order to facilitate their processing at the interface level (visualization). The methods are based on a hierarchical decomposition of the graph edge set that is inherited from a hierarchical decomposition of the vertex set. Real-time navigation is provided by an efficient two-level indexing schema called the gkd*-tree. The first level is a variation of a kd-tree index that partitions the edge set in a way that conforms to the hierarchical decomposition and the data distribution (the gkd-tree). The second level is a redundant R*-tree that indexes the leaf pages of the gkd-tree. We provide computational results that illustrate the superiority of the gkd*-tree over conventional indexes like the kd-tree and the R*-tree in both creation and query response times.
LR-tree: a Logarithmic Decomposable Spatial Index Method
, 2003
Cited by 2 (1 self)
In this paper, we present LR-trees, a new efficient scheme for dynamic manipulation of large datasets, which combines the search performance of bulk-loaded R-trees with the update performance of R*-trees. Experimental results provide evidence for the latter statement and illustrate the superiority of the proposed method.

