I/O-Efficient Algorithms for Problems on Grid-based Terrains (Extended Abstract)
 In Proc. Workshop on Algorithm Engineering and Experimentation
, 2000
Cited by 31 (14 self)
Lars Arge, Laura Toma, Jeffrey Scott Vitter. Center for Geometric Computing, Department of Computer Science, Duke University, Durham, NC 27708-0129.
Abstract: The potential and use of Geographic Information Systems (GIS) is rapidly increasing due to the increasing availability of massive amounts of geospatial data from projects like NASA's Mission to Planet Earth. However, the use of these massive datasets also exposes scalability problems with existing GIS algorithms. These scalability problems are mainly due to the fact that most GIS algorithms have been designed to minimize internal computation time, while I/O communication often is the bottleneck when processing massive amounts of data.
The I/O-Complexity of Ordered Binary-Decision Diagram Manipulation
 UNIVERSITY OF AARHUS
, 1995
Cited by 28 (17 self)
Ordered Binary-Decision Diagrams (OBDD) are the state-of-the-art data structure for boolean function manipulation and there exist several software packages for OBDD manipulation. OBDDs have been successfully used to solve problems in e.g. digital-systems design, verification and testing, in mathematical logic, concurrent system design and in artificial intelligence. The OBDDs used in many of these applications quickly get larger than the available main memory and it becomes essential to consider the problem of minimizing the Input/Output (I/O) communication. In this paper we analyze why existing OBDD manipulation algorithms perform poorly in an I/O environment and develop new I/O-efficient algorithms.
External-Memory Algorithms with Applications in Geographic Information Systems
 Algorithmic Foundations of GIS
, 1997
Cited by 27 (9 self)
In the design of algorithms for large-scale applications it is essential to consider the problem of minimizing Input/Output (I/O) communication. Geographical information systems (GIS) are good examples of such large-scale applications as they frequently handle huge amounts of spatial data. In this note we survey the recent developments in external-memory algorithms with applications in GIS. First we discuss the Aggarwal-Vitter I/O-model and illustrate why normal internal-memory algorithms for even very simple problems can perform terribly in an I/O-environment. Then we describe the fundamental paradigms for designing I/O-efficient algorithms by using them to design efficient sorting algorithms. We then go on and survey external-memory algorithms for computational geometry problems (with special emphasis on problems with applications in GIS) and techniques for designing such algorithms: Using the orthogonal line segment intersection problem we illustrate the distribution-sweeping and ...
Experiments on the Practical I/O Efficiency of Geometric Algorithms: Distribution Sweep vs. Plane Sweep
, 1995
Cited by 26 (8 self)
We present an extensive experimental study comparing the performance of four algorithms for the following orthogonal segment intersection problem: given a set of horizontal and vertical line segments in the plane, report all intersecting horizontal-vertical pairs. The problem has important applications in VLSI layout and graphics, which are large-scale in nature. The algorithms under evaluation are distribution sweep and three variations of plane sweep. Distribution sweep is specifically designed for the situations in which the problem is too large to be solved in internal memory, and theoretically has optimal I/O cost. Plane sweep is a well-known and powerful technique in computational geometry, and is optimal for this particular problem in terms of internal computation. The three variations of plane sweep differ by the sorting methods (external vs. internal sorting) used in the preprocessing phase and the dynamic data structures (B-tree vs. 2-3-4 tree) used in the sweeping ...
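To make the problem statement in the abstract above concrete, here is a minimal in-memory plane-sweep sketch in Python. It is an illustration only, not any of the four implementations evaluated in the paper: the function name, the tuple layouts, and the use of a sorted Python list (instead of a B-tree or 2-3-4 tree) as the sweep structure are all assumptions made for this example. Segments that merely touch are counted as intersecting.

```python
from bisect import bisect_left, bisect_right, insort

def orthogonal_intersections(horizontals, verticals):
    """Report intersecting (horizontal, vertical) index pairs.

    horizontals: list of (x1, x2, y) with x1 <= x2   (hypothetical layout)
    verticals:   list of (x, y1, y2) with y1 <= y2   (hypothetical layout)
    """
    INSERT, QUERY, REMOVE = 0, 1, 2  # tie-break order at equal x
    events = []
    for i, (x1, x2, _y) in enumerate(horizontals):
        events.append((x1, INSERT, i))
        events.append((x2, REMOVE, i))
    for j, (x, _y1, _y2) in enumerate(verticals):
        events.append((x, QUERY, j))
    events.sort()

    active = []   # sorted list of (y, h_index) for horizontals cut by the sweep line
    result = []
    for _x, kind, idx in events:
        if kind == INSERT:
            insort(active, (horizontals[idx][2], idx))
        elif kind == REMOVE:
            # A plain list makes removal linear time; a balanced tree would not.
            active.pop(bisect_left(active, (horizontals[idx][2], idx)))
        else:
            _, y1, y2 = verticals[idx]
            lo = bisect_left(active, (y1, -1))
            hi = bisect_right(active, (y2, len(horizontals)))
            result.extend((h, idx) for _y, h in active[lo:hi])
    return result

pairs = orthogonal_intersections(
    [(0, 10, 5), (0, 3, 1)],
    [(2, 0, 6), (5, 0, 6), (12, 0, 6)],
)
print(sorted(pairs))
```

This sketch keeps everything in main memory, which is exactly the setting where, according to the abstract, plane sweep is optimal in internal computation; it says nothing about I/O cost.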
Theory and Practice of I/O-Efficient Algorithms for Multidimensional Batched Searching Problems (Extended Abstract)
Cited by 22 (15 self)
We describe a powerful framework for designing efficient batch algorithms for certain large-scale dynamic problems that must be solved using external memory. The class of problems we consider, which we call colorable external-decomposable problems, include rectangle intersection, orthogonal line segment intersection, range searching, and point location. We are particularly interested in these problems in two and higher dimensions. They have numerous applications in geographic information systems (GIS), spatial databases, and VLSI and CAD design. We present simplified algorithms for problems previously solved by more complicated approaches (such as rectangle intersection), and we present efficient algorithms for problems not previously solved in an efficient way (such as point location and higher-dimensional versions of range searching and rectangle intersection). We give experimen...
Early experiences in evaluating the Parallel Disk Model with the ViC* implementation
, 1996
Cited by 19 (6 self)
Although several algorithms have been developed for the Parallel Disk Model (PDM), few have been implemented. Consequently, little has been known about the accuracy of the PDM in measuring I/O time and total running time to perform an out-of-core computation. This paper analyzes timing results on multiple-disk platforms for two PDM algorithms, out-of-core radix sort and BMMC permutations, to determine the strengths and weaknesses of the PDM. The results indicate the following. First, good PDM algorithms are usually not I/O bound. Second, of the four PDM parameters, one (problem size) is a good indicator of I/O time and running time, one (memory size) is a good indicator of I/O time but not necessarily running time, and the other two (block size and number of disks) do not necessarily indicate either I/O or running time. Third, because PDM algorithms tend not to be I/O bound, using asynchronous I/O can reduce I/O wait times significantly. The software interface to the PDM is part of the ViC* runtime library. The interface is a set of wrappers that are designed to be both efficient and portable across several underlying file systems and target machines.
CRB-Tree: An Efficient Indexing Scheme for Range Aggregate Queries
 IN PROC. INTERNATIONAL CONFERENCE ON DATABASE THEORY
, 2003
Cited by 12 (1 self)
We propose a new indexing scheme, called the CRB-tree, for efficiently answering range-aggregate queries. The range-aggregate problem is defined as follows: Given a set of weighted points in R^d, compute the aggregate of weights of points that lie inside a d-dimensional query rectangle. In this paper we focus on COUNT, SUM, AVG aggregates. First, we develop an indexing scheme for answering two-dimensional range-COUNT queries that uses O(N/B) disk blocks and answers a query in O(log_B N) I/Os, where N is the number of input points and B is the disk block size. This is the first optimal index structure for the 2D range-COUNT problem. The index can be extended to obtain a near-linear-size indexing structure for answering range-SUM queries using O(log_B N) I/Os. We also obtain similar bounds for rectangle-intersection aggregate queries, in which the input is a set of weighted rectangles and a query asks to compute the aggregate of the weights of those input rectangles that overlap with the query rectangle. This result immediately improves a recent result on temporal-aggregate queries. Our indexing scheme can be dynamized and extended to higher dimensions. Finally, we demonstrate the practical efficiency of our index by comparing its performance against kdB-tree. For a dataset of around 100 million points, the CRB-tree query time is 8-10 times faster than the kdB-tree query time. Furthermore, unlike other indexing schemes, the query performance of CRB-tree is oblivious to the distribution of the input points and placement, shape and size of the query rectangle.
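The range-aggregate problem defined in the abstract above can be pinned down with a brute-force reference in Python. This is only a linear scan that spells out what COUNT, SUM and AVG mean for a query rectangle; it is not the CRB-tree and has none of its I/O bounds. The function name, the `(coords, weight)` layout, and the choice of a closed rectangle (boundary points count as inside) are assumptions made for this example.

```python
def range_aggregate(points, lo, hi):
    """Brute-force COUNT, SUM and AVG over a closed d-dimensional rectangle.

    points: iterable of (coords, weight), coords being a d-tuple (hypothetical layout)
    lo, hi: d-tuples giving the lower and upper corner of the query rectangle
    """
    count = 0
    total = 0.0
    for coords, weight in points:
        # A point is inside if every coordinate lies within the rectangle's extent.
        if all(l <= c <= h for l, c, h in zip(lo, coords, hi)):
            count += 1
            total += weight
    avg = total / count if count else None  # AVG is undefined for an empty range
    return count, total, avg

points = [((1, 1), 2.0), ((2, 3), 4.0), ((5, 5), 10.0)]
print(range_aggregate(points, (0, 0), (3, 3)))
```

A scan like this touches every point on every query, which is the cost an index structure of the kind described in the abstract is meant to avoid.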
A Simple and Efficient Parallel Disk Mergesort
, 2002
Cited by 8 (0 self)
External sorting—the process of sorting a file that is too large to fit into the computer’s internal memory and must be stored externally on disks—is a fundamental subroutine in database systems [G], [IBM]. Of prime importance are techniques that use multiple disks in parallel in order to speed up the performance of external sorting. The simple randomized merging (SRM) mergesort algorithm proposed by Barve et al. [BGV] is the first parallel disk sorting algorithm that requires a provably optimal number of passes and that is fast in practice. Knuth [K, Section 5.4.9] recently identified SRM (which he calls “randomized striping”) as the method of choice for sorting with parallel disks. In this paper we present an efficient implementation of SRM, based upon novel and elegant data structures. We give a new implementation for SRM’s lookahead forecasting technique for parallel prefetching and its forecast and flush technique for buffer management. Our techniques amount to a significant improvement in the way SRM carries out the parallel, independent disk accesses necessary to read blocks of input runs efficiently during external merging. Our implementation is
Early Experiences in Implementing the Buffer Tree
, 1997
Cited by 7 (2 self)
Computer processing speeds are increasing rapidly due to the evolution of faster chips, parallel processing of data, and more efficient software. Users today have access to an unprecedented amount of high quality, high resolution data through various technologies. This is resulting in a growing demand for higher performance input and output mechanisms in order to pass huge data sets from the external memory (EM), or disk system, through the relatively small main memory of the computer and back again. In recent years, research into external memory algorithms has been growing to keep pace with the demand for innovation in this area. EM algorithms for individual problems have been developed but few general purpose EM tools have been designed. A fundamental tool is the buffer tree, an external version of the (a,b)-tree. It can be used to satisfy a number of EM requirements such as sorting, priority queues, range searching, etc. in a straightforward and I/O-optimal manner. In this paper we...
An API for Choreographing Data Accesses
 DARTMOUTH COLLEGE DEPARTMENT OF COMPUTER SCIENCE
, 1995
Cited by 6 (1 self)
Current APIs for multiprocessor multi-disk file systems are not easy to use in developing out-of-core algorithms that choreograph parallel data accesses. Consequently, the efficiency of these algorithms is hard to achieve in practice. We address this deficiency by specifying an API that includes data-access primitives for data choreography. With our API, the programmer can easily access specific blocks from each disk in a single operation, thereby fully utilizing the parallelism of the underlying storage system. Our API supports the development of libraries of commonly-used higher-level routines such as matrix-matrix addition, matrix-matrix multiplication, and BMMC (bit-matrix-multiply/complement) permutations. We illustrate our API in implementations of these three high-level routines to demonstrate how easy it is to use.
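For readers unfamiliar with the BMMC permutations mentioned in the abstract above, the following Python sketch shows the address mapping in its usual formulation: an n-bit source address x is sent to y = Ax XOR c, with arithmetic over GF(2), where A is a nonsingular n x n bit matrix and c is an n-bit complement vector. This is an in-memory illustration only and does not use or reproduce the API described in the paper; the function names and the row-as-bitmask encoding are assumptions made for this example.

```python
def bmmc_target(x, rows, c):
    """Map source address x to its target address y = A x XOR c over GF(2).

    rows[i] encodes row i of A as an int bitmask (bit j set means A[i][j] = 1);
    c is the complement vector as an int. Both encodings are hypothetical.
    """
    y = 0
    for i, row in enumerate(rows):
        bit = bin(row & x).count("1") & 1  # parity of the GF(2) dot product
        y |= bit << i
    return y ^ c

def bmmc_permute(data, rows, c):
    """Apply the permutation to a list whose length is 2**len(rows).

    A must be nonsingular over GF(2); otherwise the map is not a permutation.
    """
    out = [None] * len(data)
    for x, item in enumerate(data):
        out[bmmc_target(x, rows, c)] = item
    return out

# Bit reversal on 3-bit addresses: row i of A selects bit (2 - i) of x.
print(bmmc_permute(list(range(8)), [0b100, 0b010, 0b001], 0))
# Identity matrix with c = 1 flips the lowest address bit.
print(bmmc_permute(list(range(8)), [0b001, 0b010, 0b100], 0b001))
```

Bit reversal and the low-bit flip are two simple members of the class obtained from particular choices of A and c.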