Results 1 - 10
of
15
External Memory Algorithms and Data Structures
, 1998
"... Data sets in large applications are often too massive to fit completely inside the computer's internal memory. The resulting input/output communication (or I/O) between fast internal memory and slower external memory (such as disks) can be a major performance bottleneck. In this paper, we survey the ..."
Abstract
-
Cited by 286 (24 self)
- Add to MetaCart
Data sets in large applications are often too massive to fit completely inside the computer's internal memory. The resulting input/output communication (or I/O) between fast internal memory and slower external memory (such as disks) can be a major performance bottleneck. In this paper, we survey the state of the art in the design and analysis of external memory algorithms and data structures (which are sometimes referred to as "EM" or "I/O" or "out-of-core" algorithms and data structures). EM algorithms and data structures are often designed and analyzed using the parallel disk model (PDM). The three machine-independent measures of performance in PDM are the number of I/O operations, the CPU time, and the amount of disk space. PDM allows for multiple disks (or disk arrays) and parallel CPUs, and it can be generalized to handle tertiary storage and hierarchical memory. We discuss several important paradigms for how to solve batched and online problems efficiently in external memory. Programming tools and environments are available for simplifying the programming task. The TPIE system (Transparent Parallel I/O programming Environment) is both easy to use and efficient in terms of execution speed. We report on some experiments using TPIE in the domain of spatial databases. The newly developed EM algorithms and data structures that incorporate the paradigms we discuss are significantly faster than methods currently used in practice.
Terrain Simplification Simplified: A General Framework for View-Dependent Out-of-Core Visualization
, 2002
"... This paper describes a general framework for out-of-core rendering and management of massive terrain surfaces. The two key components of this framework are: view-dependent refinement of the terrain mesh; and a simple scheme for organizing the terrain data to improve coherence and reduce the number o ..."
Abstract
-
Cited by 67 (1 self)
- Add to MetaCart
This paper describes a general framework for out-of-core rendering and management of massive terrain surfaces. The two key components of this framework are: view-dependent refinement of the terrain mesh; and a simple scheme for organizing the terrain data to improve coherence and reduce the number of paging events from external storage to main memory. Similar to several previously proposed methods for viewdependent refinement, we recursively subdivide a triangle mesh defined over regularly gridded data using longest-edge bisection. As part of this single, per-frame refinement pass, we perform triangle stripping, view frustum culling, and smooth blending of geometry using geomorphing. Meanwhile, our refinement framework supports a large class of error metrics, is highly competitive in terms of rendering performance, and is surprisingly simple to implement. Independent
I/O-Complexity of Graph Algorithms
, 1999
"... We show lower bounds of \Omega\Gamma E V Sort(V )) for the I/Ocomplexity of graph theoretic problems like connected components, biconnected components, and minimum spanning trees, where E and V are the number of edges and vertices in the input graph, respectively. We also present a deterministic O ..."
Abstract
-
Cited by 59 (0 self)
- Add to MetaCart
We show lower bounds of \Omega\Gamma E V Sort(V )) for the I/Ocomplexity of graph theoretic problems like connected components, biconnected components, and minimum spanning trees, where E and V are the number of edges and vertices in the input graph, respectively. We also present a deterministic O( E V Sort(V ) \Delta max(1; log log V BD E )) algorithm for the problem of graph connectivity, where B and D denote respectively the block size and number of disks. Our algorithm includes a breadth first search; this maybe of independent interest. 1 Introduction Data sets of many modern applications are too large to fit into main memory, and must reside on disk. To run such applications efficiently, it is often necessary to explicitly manage disk accesses as a part of the algorithm. In other words, the algorithm must be designed for a model that includes disk, rather than the customary RAM model. Recently, this area has received a lot of attention, and algorithms have been developed for...
Visualization of Large Terrains Made Easy
, 2001
"... We present an elegant and simple to implement framework for performing out-of-core visualization and view-dependent refinement of large terrain surfaces. Contrary to the recent trend of increasingly elaborate algorithms for large-scale terrain visualization, our algorithms and data structures have b ..."
Abstract
-
Cited by 58 (4 self)
- Add to MetaCart
We present an elegant and simple to implement framework for performing out-of-core visualization and view-dependent refinement of large terrain surfaces. Contrary to the recent trend of increasingly elaborate algorithms for large-scale terrain visualization, our algorithms and data structures have been designed with the primary goal of simplicity and efficiency of implementation. Our approach to managing large terrain data also departs from more conventional strategies based on data tiling. Rather than emphasizing how to segment and efficiently bring data in and out of memory, we focus on the manner in which the data is laid out to achieve good memory coherency for data accesses made in a top-down (coarse-to-fine) refinement of the terrain. We present and compare the results of using several different data indexing schemes, and propose a simple to compute index that yields substantial improvements in locality and speed over more commonly used data layouts. Our second contribution is a new and simple, yet easy to generalize method for view-dependent refinement. Similar to several published methods in this area, we use longest edge bisection in a top-down traversal of the mesh hierarchy to produce a continuous surface with subdivision connectivity. In tandem with the refinement, we perform view frustum culling and triangle stripping. These three components are done together in a single pass over the mesh. We show how this framework supports virtually any error metric, while still being highly memory and compute efficient. 1
Scalable sweeping-based spatial join
- IN PROC. 24TH INT. CONF. VERY LARGE DATA BASES, VLDB
, 1998
"... In this paper, we consider the filter step of the spatial join problem, for the case where neither of the inputs are indexed. We present a new algorithm, Scalable Sweeping-Based Spatial Join (SSSJ), that achieves both efficiency on real-life data and robustness against highly skewed and worst-case d ..."
Abstract
-
Cited by 56 (7 self)
- Add to MetaCart
In this paper, we consider the filter step of the spatial join problem, for the case where neither of the inputs are indexed. We present a new algorithm, Scalable Sweeping-Based Spatial Join (SSSJ), that achieves both efficiency on real-life data and robustness against highly skewed and worst-case data sets. The algorithm combines a method with theoretically optimal bounds on I/O transfers based on the recently proposed distribution-sweeping technique with a highly optimized implementation of internal-memory plane-sweeping. We present experimental results based on an efficient implementation of the SSSJ algorithm, and compare it to the state-ofthe-art Partition-Based Spatial-Merge (PBSM) algorithm of Pate1 and DeWitt.
Global Static Indexing for Real-time Exploration of Very Large Regular Grids
, 2001
"... In this paper we introduce a new indexing scheme for progressive traversal and visualization of large regular grids. We demonstrate the potential of our approach by providing a tool that displays at interactive rates planar slices of scalar field data with very modest computing resources. We obtain ..."
Abstract
-
Cited by 33 (6 self)
- Add to MetaCart
In this paper we introduce a new indexing scheme for progressive traversal and visualization of large regular grids. We demonstrate the potential of our approach by providing a tool that displays at interactive rates planar slices of scalar field data with very modest computing resources. We obtain unprecedented results both in terms of absolute performance and, more importantly, in terms of scalability. On a laptop computer we provide real time interaction with a 2048 3 grid (8 Giga-nodes) using only 20MB of memory. On an SGI Onyx we slice interactively an 8192 3 grid ( tera-nodes) using only 60MB of memory. The scheme relies simply on the determination of an appropriate reordering of the rectilinear grid data and a progressive construction of the output slice. The reordering minimizes the amount of I/O performed during the out-of-core computation. The progressive and asynchronous computation of the output provides flexible quality/speed tradeoffs and a timecritical and interruptible user interface. 1.
On External Memory MST, SSSP and Multi-way Planar Graph Separation (Extended Abstract)
, 2000
"... Recently external memory graph algorithms have received considerable attention because massive graphs arise naturally in many applications involving massive data sets. Even though a large number of I/O-efficient graph algorithms have been developed, a number of fundamental problems still remain ..."
Abstract
-
Cited by 30 (9 self)
- Add to MetaCart
Recently external memory graph algorithms have received considerable attention because massive graphs arise naturally in many applications involving massive data sets. Even though a large number of I/O-efficient graph algorithms have been developed, a number of fundamental problems still remain open. In this paper we develop improved algorithms for the problem of computing a minimum spanning tree of a general graph G = (V; E), as well as new algorithms for the single source shortest paths and the multi-way graph separation problems on planar graphs.
External-Memory Algorithms with Applications in Geographic Information Systems
- Algorithmic Foundations of GIS
, 1997
"... In the design of algorithms for large-scale applications it is essential to consider the problem of minimizing Input/Output (I/O) communication. Geographical information systems (GIS) are good examples of such large-scale applications as they frequently handle huge amounts of spatial data. In this n ..."
Abstract
-
Cited by 24 (9 self)
- Add to MetaCart
In the design of algorithms for large-scale applications it is essential to consider the problem of minimizing Input/Output (I/O) communication. Geographical information systems (GIS) are good examples of such large-scale applications as they frequently handle huge amounts of spatial data. In this note we survey the recent developments in external-memory algorithms with applications in GIS. First we discuss the Aggarwal-Vitter I/O-model and illustrate why normal internal-memory algorithms for even very simple problems can perform terribly in an I/O-environment. Then we describe the fundamental paradigms for designing I/O-efficient algorithms by using them to design efficient sorting algorithms. We then go on and survey external-memory algorithms for computational geometry problems -- with special emphasis on problems with applications in GIS -- and techniques for designing such algorithms: Using the orthogonal line segment intersection problem we illustrate the distribution-sweeping and ...
Lower bounds for external memory dictionaries
- IN PROCEEDINGS OF THE 14TH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS (SODA
, 2003
"... We study trade-offs between the update time and the query time for comparisonbased external memory dictionaries. The main contributions of this paper are two lower bound trade-offs between the I/O complexity of member queries and insertions: If N> M insertions perform at most ffi * N/B I/Os, then ( ..."
Abstract
-
Cited by 10 (0 self)
- Add to MetaCart
We study trade-offs between the update time and the query time for comparisonbased external memory dictionaries. The main contributions of this paper are two lower bound trade-offs between the I/O complexity of member queries and insertions: If N> M insertions perform at most ffi * N/B I/Os, then (1) there exists a query requiring N/(M * ( MB)O(ffi)) I/Os, and (2) there exists a query requiring \Omega (logffi log2 N NM) I/Os when ffi is O(B / log3 N) and N is at least M 2. For both lower bounds we describe data structureswhich give matching upper bounds for a wide range of parameters, thereby showing the lower bounds to be tight within these ranges.

