External Memory Algorithms and Data Structures
, 1998
Cited by 320 (24 self)
Data sets in large applications are often too massive to fit completely inside the computer's internal memory. The resulting input/output communication (or I/O) between fast internal memory and slower external memory (such as disks) can be a major performance bottleneck. In this paper, we survey the state of the art in the design and analysis of external memory algorithms and data structures (which are sometimes referred to as "EM" or "I/O" or "out-of-core" algorithms and data structures). EM algorithms and data structures are often designed and analyzed using the parallel disk model (PDM). The three machine-independent measures of performance in PDM are the number of I/O operations, the CPU time, and the amount of disk space. PDM allows for multiple disks (or disk arrays) and parallel CPUs, and it can be generalized to handle tertiary storage and hierarchical memory. We discuss several important paradigms for how to solve batched and online problems efficiently in external memory. Programming tools and environments are available for simplifying the programming task. The TPIE system (Transparent Parallel I/O programming Environment) is both easy to use and efficient in terms of execution speed. We report on some experiments using TPIE in the domain of spatial databases. The newly developed EM algorithms and data structures that incorporate the paradigms we discuss are significantly faster than methods currently used in practice.
Geometric Range Searching and Its Relatives
 CONTEMPORARY MATHEMATICS
Cited by 253 (41 self)
... process a set S of points in R^d so that the points of S lying inside a query region can be reported or counted quickly. We survey the known techniques and data structures for range searching and describe their application to other related searching problems.
Cache-oblivious B-trees
, 2000
Cited by 133 (22 self)
This paper presents two dynamic search trees attaining near-optimal performance on any hierarchical memory. The data structures are independent of the parameters of the memory hierarchy, e.g., the number of memory levels, the block-transfer size at each level, and the relative speeds of memory levels. The performance is analyzed in terms of the number of memory transfers between two memory levels with an arbitrary block-transfer size of $B$; this analysis can then be applied to every adjacent pair of levels in a multilevel memory hierarchy. Both search trees match the optimal search bound of $\Theta(1 + \log_{B+1} N)$ memory transfers. This bound is also achieved by the classic B-tree data structure on a two-level memory hierarchy with a known block-transfer size $B$. The first search tree supports insertions and deletions in $\Theta(1 + \log_{B+1} N)$ amortized memory transfers, which matches the B-tree’s worst-case bounds. The second search tree supports scanning $S$ consecutive elements optimally in $\Theta(1 + S/B)$ memory transfers and supports insertions and deletions in $\Theta(1 + \log_{B+1} N + \frac{\log^2 N}{B})$ amortized memory transfers, matching the performance of the B-tree for $B = \Omega(\log N \log\log N)$.
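To get a feel for the search and scan bounds quoted in this abstract, the following minimal Python sketch evaluates the expressions 1 + log_{B+1} N and 1 + S/B with constant factors dropped. The function names and the sample values of N and B are illustrative assumptions, not taken from the paper, and the code does not implement either search tree.

```python
import math


def search_transfers(n: int, block: int) -> float:
    """Evaluate 1 + log_{B+1} N (the search bound, constants dropped)."""
    return 1 + math.log(n) / math.log(block + 1)


def scan_transfers(s: int, block: int) -> float:
    """Evaluate 1 + S/B (the scan bound, constants dropped)."""
    return 1 + s / block


if __name__ == "__main__":
    n = 10**9  # hypothetical number of stored keys
    # block = 1 degenerates to a binary-search-like 1 + log_2 N.
    for block in (1, 64, 1024):
        print(f"B={block:5d}  search ~ {search_transfers(n, block):.2f} transfers")
    print(f"scanning 10_000 keys with B=1024 ~ {scan_transfers(10_000, 1024):.2f} transfers")
```

Under these assumptions, growing B shrinks the logarithm's base-dependent factor, which is why a larger block-transfer size lowers the number of transfers per search.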
A Functional Approach to External Graph Algorithms
 Algorithmica
, 1998
Cited by 89 (2 self)
We present a new approach for designing external graph algorithms and use it to design simple external algorithms for computing connected components, minimum spanning trees, bottleneck minimum spanning trees, and maximal matchings in undirected graphs and multigraphs. Our I/O bounds compete with those of previous approaches. Unlike previous approaches, ours is purely functional (without side effects) and is thus amenable to standard checkpointing and programming language optimization techniques. This is an important practical consideration for applications that may take hours to run. 1 Introduction We present a divide-and-conquer approach for designing external graph algorithms, i.e., algorithms on graphs that are too large to fit in main memory. Our approach is simple to describe and implement: it builds a succession of graph transformations that reduce to sorting, selection, and a recursive bucketing technique. No sophisticated data structures are needed. We apply our t...
Optimal Dynamic Interval Management in External Memory (Extended Abstract)
 In Proc. IEEE Symp. on Foundations of Comp. Sci.
, 1996
Cited by 84 (23 self)
We present a space- and I/O-optimal external-memory data structure for answering stabbing queries on a set of dynamically maintained intervals. Our data structure settles an open problem in databases and I/O algorithms by providing the first optimal external-memory solution to the dynamic interval management problem, which is a special case of 2-dimensional range searching and a central problem for object-oriented and temporal databases and for constraint logic programming. Our data structure simultaneously uses optimal linear space (that is, $O(N/B)$ blocks of disk space) and achieves the optimal $O(\log_B N + T/B)$ I/O query bound and $O(\log_B N)$ I/O update bound, where $B$ is the I/O block size and $T$ the number of elements in the answer to a query. Our structure is also the first optimal external data structure for a 2-dimensional range searching problem that has worst-case as opposed to amortized update bounds. Part of the data structure uses a novel balancing technique for efficient worst-case manipulation of balanced trees, which is of independent interest.
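For readers unfamiliar with the term, a stabbing query asks for every stored interval that contains a given query point. The Python sketch below is only a brute-force in-memory reference that pins down the query semantics; it is not the external-memory structure from the paper and makes no attempt at its I/O bounds. The names `Interval` and `stabbing_query`, and the choice of closed intervals, are assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical representation: a closed interval [lo, hi].
Interval = Tuple[float, float]


def stabbing_query(intervals: List[Interval], q: float) -> List[Interval]:
    """Return all intervals containing q, by a linear scan.

    This touches every interval, so it is a specification of the
    expected answer, not an efficient index.
    """
    return [(lo, hi) for lo, hi in intervals if lo <= q <= hi]


if __name__ == "__main__":
    data = [(1, 5), (3, 8), (6, 9)]
    print(stabbing_query(data, 4))   # intervals that contain the point 4
    print(stabbing_query(data, 10))  # no interval contains 10
```

In the abstract's notation, T is the length of the returned list; an index is output-sensitive when its cost grows with T rather than with the total number of intervals.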
External Memory Data Structures
, 2001
Cited by 79 (36 self)
In many massive dataset applications the data must be stored in space- and query-efficient data structures on external storage devices. Often the data needs to be changed dynamically. In this chapter we discuss recent advances in the development of provably worst-case efficient external memory dynamic data structures. We also briefly discuss some of the most popular external data structures used in practice.
Improved Algorithms and Data Structures for Solving Graph Problems in External Memory
 In Proc. IEEE Symp. on Parallel and Distributed Processing
, 1996
Cited by 75 (0 self)
Recently, the study of I/O-efficient algorithms has moved beyond fundamental problems of sorting and permuting and into wider areas such as computational geometry and graph algorithms. With this expansion has come a need for new algorithmic techniques and data structures. In this paper, we present I/O-efficient analogues of well-known data structures that we show to be useful for obtaining simpler and improved algorithms for several graph problems. Our results include improved algorithms for minimum spanning trees, breadth-first search, and single-source shortest paths. The descriptions of these algorithms are greatly simplified by their use of well-defined I/O-efficient data structures with good amortized performance bounds. We expect that I/O-efficient data structures such as these will be a useful tool for the design of I/O-efficient algorithms. 1. Introduction 1.1. Background and model The study of I/O-efficient algorithms has been receiving increased attention as increases in pro...
External-Memory Algorithms for Processing Line Segments in Geographic Information Systems
, 2007
Cited by 75 (30 self)
In the design of algorithms for large-scale applications it is essential to consider the problem of minimizing I/O communication. Geographical information systems (GIS) are good examples of such large-scale applications as they frequently handle huge amounts of spatial data. In this paper we develop efficient external-memory algorithms for a number of important problems involving line segments in the plane, including trapezoid decomposition, batched planar point location, triangulation, red–blue line segment intersection reporting, and general line segment intersection reporting. In GIS systems the first three problems are useful for rendering and modeling, and the latter two are frequently used for overlaying maps and extracting information from them.
I/O Optimal Isosurface Extraction
, 1997
Cited by 73 (17 self)
In this paper we give I/O-optimal techniques for the extraction of isosurfaces from volumetric data, by a novel application of the I/O-optimal interval tree of Arge and Vitter. The main idea is to preprocess the dataset once and for all to build an efficient search structure in disk, and then each time we want to extract an isosurface, we perform an output-sensitive query on the search structure to retrieve only those active cells that are intersected by the isosurface. During the query operation, only two blocks of main memory space are needed, and only those active cells are brought into the main memory, plus some negligible overhead of disk accesses. This implies that we can efficiently visualize very large datasets on workstations with just enough main memory to hold the isosurfaces themselves. The implementation is delicate but not complicated. We give the first implementation of the I/O-optimal interval tree, and also implement our methods as an I/O filter for Vtk's isosurface ext...
Cache-oblivious priority queue and graph algorithm applications
 In Proc. 34th Annual ACM Symposium on Theory of Computing
, 2002
Cited by 64 (10 self)
In this paper we develop an optimal cache-oblivious priority queue data structure, supporting insertion, deletion, and deletemin operations in $O(\frac{1}{B} \log_{M/B} \frac{N}{B})$ amortized memory transfers, where $M$ and $B$ are the memory and block transfer sizes of any two consecutive levels of a multilevel memory hierarchy. In a cache-oblivious data structure, $M$ and $B$ are not used in the description of the structure. The bounds match the bounds of several previously developed external-memory (cache-aware) priority queue data structures, which all rely crucially on knowledge about $M$ and $B$. Priority queues are a critical component in many of the best known external-memory graph algorithms, and using our cache-oblivious priority queue we develop several cache-oblivious graph algorithms.
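As a rough illustration of what an amortized bound of the form (1/B) log_{M/B} (N/B) per operation means, the Python sketch below evaluates that expression with constant factors dropped and multiplies it by an operation count. The function names and the sample values of N, M and B are hypothetical, and nothing here implements the priority queue itself.

```python
import math


def pq_amortized_transfers(n: int, m: int, b: int) -> float:
    """Evaluate (1/B) * log_{M/B}(N/B), constants dropped.

    Assumes m > b so that the logarithm base M/B exceeds 1.
    """
    return (1 / b) * math.log(n / b) / math.log(m / b)


def pq_total_transfers(num_ops: int, n: int, m: int, b: int) -> float:
    """Total transfers for num_ops operations under the amortized bound."""
    return num_ops * pq_amortized_transfers(n, m, b)


if __name__ == "__main__":
    n, m, b = 2**30, 2**20, 2**10  # hypothetical sizes, in elements
    per_op = pq_amortized_transfers(n, m, b)
    print(f"per operation: ~{per_op:.5f} transfers")
    # N insertions followed by N deletemins sort the input.
    print(f"2N operations: ~{pq_total_transfers(2 * n, n, m, b):.0f} transfers")
```

Because each operation costs well under one transfer on average, running N insertions and N deletemins totals on the order of (N/B) log_{M/B} (N/B) transfers, which is the familiar external-memory sorting bound.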