Results 1  10
of
35
External Memory Algorithms and Data Structures
, 1998
"... Data sets in large applications are often too massive to fit completely inside the computer's internal memory. The resulting input/output communication (or I/O) between fast internal memory and slower external memory (such as disks) can be a major performance bottleneck. In this paper, we surve ..."
Abstract

Cited by 360 (23 self)
 Add to MetaCart
(Show Context)
Data sets in large applications are often too massive to fit completely inside the computer's internal memory. The resulting input/output communication (or I/O) between fast internal memory and slower external memory (such as disks) can be a major performance bottleneck. In this paper, we survey the state of the art in the design and analysis of external memory algorithms and data structures (which are sometimes referred to as "EM" or "I/O" or "outofcore" algorithms and data structures). EM algorithms and data structures are often designed and analyzed using the parallel disk model (PDM). The three machineindependent measures of performance in PDM are the number of I/O operations, the CPU time, and the amount of disk space. PDM allows for multiple disks (or disk arrays) and parallel CPUs, and it can be generalized to handle tertiary storage and hierarchical memory. We discuss several important paradigms for how to solve batched and online problems efficiently in external memory. Programming tools and environments are available for simplifying the programming task. The TPIE system (Transparent Parallel I/O programming Environment) is both easy to use and efficient in terms of execution speed. We report on some experiments using TPIE in the domain of spatial databases. The newly developed EM algorithms and data structures that incorporate the paradigms we discuss are significantly faster than methods currently used in practice.
A Functional Approach to External Graph Algorithms
 Algorithmica
, 1998
"... . We present a new approach for designing external graph algorithms and use it to design simple external algorithms for computing connected components, minimum spanning trees, bottleneck minimum spanning trees, and maximal matchings in undirected graphs and multigraphs. Our I/O bounds compete w ..."
Abstract

Cited by 109 (2 self)
 Add to MetaCart
. We present a new approach for designing external graph algorithms and use it to design simple external algorithms for computing connected components, minimum spanning trees, bottleneck minimum spanning trees, and maximal matchings in undirected graphs and multigraphs. Our I/O bounds compete with those of previous approaches. Unlike previous approaches, ours is purely functionalwithout side effectsand is thus amenable to standard checkpointing and programming language optimization techniques. This is an important practical consideration for applications that may take hours to run. 1 Introduction We present a divideandconquer approach for designing external graph algorithms, i.e., algorithms on graphs that are too large to fit in main memory. Our approach is simple to describe and implement: it builds a succession of graph transformations that reduce to sorting, selection, and a recursive bucketing technique. No sophisticated data structures are needed. We apply our t...
External Memory Data Structures
, 2001
"... In many massive dataset applications the data must be stored in space and query efficient data structures on external storage devices. Often the data needs to be changed dynamically. In this chapter we discuss recent advances in the development of provably worstcase efficient external memory dynami ..."
Abstract

Cited by 76 (32 self)
 Add to MetaCart
In many massive dataset applications the data must be stored in space and query efficient data structures on external storage devices. Often the data needs to be changed dynamically. In this chapter we discuss recent advances in the development of provably worstcase efficient external memory dynamic data structures. We also briefly discuss some of the most popular external data structures used in practice.
The Buffer Tree: A Technique for Designing Batched External Data Structures
, 2003
"... We present a technique for designing external memory data structures that support batched operations I/O efficiently. We show how the technique can be used to develop external versions of a search tree, a priority queue, and a segment tree, and give examples of how these structures can be used to d ..."
Abstract

Cited by 75 (14 self)
 Add to MetaCart
We present a technique for designing external memory data structures that support batched operations I/O efficiently. We show how the technique can be used to develop external versions of a search tree, a priority queue, and a segment tree, and give examples of how these structures can be used to develop I/Oefficient algorithms. The developed algorithms are either extremely simple or straightforward generalizations of known internal memory algorithms—given the developed external data structures.
Cacheoblivious priority queue and graph algorithm applications
 In Proc. 34th Annual ACM Symposium on Theory of Computing
, 2002
"... In this paper we develop an optimal cacheoblivious priority queue data structure, supporting insertion, deletion, and deletemin operations in O ( 1 B logM/B N) amortized memory B transfers, where M and B are the memory and block transfer sizes of any two consecutive levels of a multilevel memory hi ..."
Abstract

Cited by 68 (9 self)
 Add to MetaCart
(Show Context)
In this paper we develop an optimal cacheoblivious priority queue data structure, supporting insertion, deletion, and deletemin operations in O ( 1 B logM/B N) amortized memory B transfers, where M and B are the memory and block transfer sizes of any two consecutive levels of a multilevel memory hierarchy. In a cacheoblivious data structure, M and B are not used in the description of the structure. The bounds match the bounds of several previously developed externalmemory (cacheaware) priority queue data structures, which all rely crucially on knowledge about M and B. Priority queues are a critical component in many of the best known externalmemory graph algorithms, and using our cacheoblivious priority queue we develop several cacheoblivious graph algorithms.
Fast Priority Queues for Cached Memory
 ACM Journal of Experimental Algorithmics
, 1999
"... This paper advocates the adaption of external memory algorithms to this purpose. This idea and the practical issues involved are exemplified by engineering a fast priority queue suited to external memory and cached memory that is based on kway merging. It improves previous external memory algorithm ..."
Abstract

Cited by 58 (8 self)
 Add to MetaCart
(Show Context)
This paper advocates the adaption of external memory algorithms to this purpose. This idea and the practical issues involved are exemplified by engineering a fast priority queue suited to external memory and cached memory that is based on kway merging. It improves previous external memory algorithms by constant factors crucial for transferring it to cached memory. Running in the cache hierarchy of a workstation the algorithm is at least two times faster than an optimized implementation of binary heaps and 4ary heaps for large inputs
On External Memory MST, SSSP and Multiway Planar Graph Separation (Extended Abstract)
, 2000
"... Recently external memory graph algorithms have received considerable attention because massive graphs arise naturally in many applications involving massive data sets. Even though a large number of I/Oefficient graph algorithms have been developed, a number of fundamental problems still remain ..."
Abstract

Cited by 32 (11 self)
 Add to MetaCart
Recently external memory graph algorithms have received considerable attention because massive graphs arise naturally in many applications involving massive data sets. Even though a large number of I/Oefficient graph algorithms have been developed, a number of fundamental problems still remain open. In this paper we develop improved algorithms for the problem of computing a minimum spanning tree of a general graph G = (V; E), as well as new algorithms for the single source shortest paths and the multiway graph separation problems on planar graphs.
On externalmemory MST, SSSP and multiway planar graph separation
 In Proc. 8th Scandinavian Workshop on Algorithmic Theory, volume 1851 of LNCS
, 2000
"... Recently external memory graph algorithms have received considerable attention because massive graphs arise naturally in many applications involving massive data sets. Even though a large number of I/Oefficient graph algorithms have been developed, a number of fundamental problems still remain open ..."
Abstract

Cited by 32 (2 self)
 Add to MetaCart
(Show Context)
Recently external memory graph algorithms have received considerable attention because massive graphs arise naturally in many applications involving massive data sets. Even though a large number of I/Oefficient graph algorithms have been developed, a number of fundamental problems still remain open. In this paper we develop an improved algorithm for the problem of computing a minimum spanning tree of a general graph, as well as new algorithms for the single source shortest paths and the multiway graph separation problems on planar graphs.
I/OEfficient Algorithms for Problems on Gridbased Terrains (Extended Abstract)
 In Proc. Workshop on Algorithm Engineering and Experimentation
, 2000
"... Lars Arge Laura Toma Jeffrey Scott Vitter Center for Geometric Computing Department of Computer Science Duke University Durham, NC 277080129 Abstract The potential and use of Geographic Information Systems (GIS) is rapidly increasing due to the increasing availability of massive amoun ..."
Abstract

Cited by 29 (13 self)
 Add to MetaCart
(Show Context)
Lars Arge Laura Toma Jeffrey Scott Vitter Center for Geometric Computing Department of Computer Science Duke University Durham, NC 277080129 Abstract The potential and use of Geographic Information Systems (GIS) is rapidly increasing due to the increasing availability of massive amounts of geospatial data from projects like NASA's Mission to Planet Earth. However, the use of these massive datasets also exposes scalability problems with existing GIS algorithms. These scalability problems are mainly due to the fact that most GIS algorithms have been designed to minimize internal computation time, while I/O communication often is the bottleneck when processing massive amounts of data.
A Unified Approach For Indexed and NonIndexed Spatial Joins
, 2000
"... . Most spatial join algorithms either assume the existence of a spatial index structure that is traversed during the join process, or solve the problem by sorting, partitioning, or onthefly index construction. In this paper, we develop a simple planesweeping algorithm that unifies the indexba ..."
Abstract

Cited by 22 (6 self)
 Add to MetaCart
(Show Context)
. Most spatial join algorithms either assume the existence of a spatial index structure that is traversed during the join process, or solve the problem by sorting, partitioning, or onthefly index construction. In this paper, we develop a simple planesweeping algorithm that unifies the indexbased and nonindex based approaches. This algorithm processes indexed as well as nonindexed inputs, extends naturally to multiway joins, and can be built easily from a few standard operations. We present the results of a comparative study of the new algorithm with several indexbased and nonindex based spatial join algorithms. We consider a number of factors, including the relative performance of CPU and disk, the quality of the spatial indexes, and the sizes of the input relations. An important conclusion from our work is that using an indexbased approach whenever indexes are available does not always lead to the best execution time, and hence we propose the use of a simple cost...