Results 1 - 10
of
28
Algorithms for Parallel Memory I: Two-Level Memories
, 1992
"... We provide the first optimal algorithms in terms of the number of input/outputs (I/Os) required between internal memory and multiple secondary storage devices for the problems of sorting, FFT, matrix transposition, standard matrix multiplication, and related problems. Our two-level memory model is n ..."
Abstract
-
Cited by 226 (32 self)
- Add to MetaCart
We provide the first optimal algorithms in terms of the number of input/outputs (I/Os) required between internal memory and multiple secondary storage devices for the problems of sorting, FFT, matrix transposition, standard matrix multiplication, and related problems. Our two-level memory model is new and gives a realistic treatment of parallel block transfer, in which during a single I/O each of the P secondary storage devices can simultaneously transfer a contiguous block of B records. The model pertains to a large-scale uniprocessor system or parallel multiprocessor system with P disks. In addition, the sorting, FFT, permutation network, and standard matrix multiplication algorithms are typically optimal in terms of the amount of internal processing time. The difficulty in developing optimal algorithms is to cope with the partitioning of memory into P separate physical devices. Our algorithms' performance can be significantly better than those obtained by the well-known but nonopti...
External-Memory Algorithms for Processing Line Segments in Geographic Information Systems
, 2007
"... In the design of algorithms for large-scale applications it is essential to consider the problem of minimizing I/O communication. Geographical information systems (GIS) are good examples of such large-scale applications as they frequently handle huge amounts of spatial data. In this paper we develop ..."
Abstract
-
Cited by 74 (29 self)
- Add to MetaCart
In the design of algorithms for large-scale applications it is essential to consider the problem of minimizing I/O communication. Geographical information systems (GIS) are good examples of such large-scale applications as they frequently handle huge amounts of spatial data. In this paper we develop efficient external-memory algorithms for a number of important problems involving line segments in the plane, including trapezoid decomposition, batched planar point location, triangulation, red–blue line segment intersection reporting, and general line segment intersection reporting. In GIS systems the first three problems are useful for rendering and modeling, and the latter two are frequently used for overlaying maps and extracting information from them.
Improved Algorithms and Data Structures for Solving Graph Problems in External Memory
- In Proc. IEEE Symp. on Parallel and Distributed Processing
, 1996
"... Recently, the study of I/O-efficient algorithms has moved beyond fundamental problems of sorting and permuting and into wider areas such as computational geometry and graph algorithms. With this expansion has come a need for new algorithmic techniques and data structures. In this paper, we present I ..."
Abstract
-
Cited by 71 (0 self)
- Add to MetaCart
Recently, the study of I/O-efficient algorithms has moved beyond fundamental problems of sorting and permuting and into wider areas such as computational geometry and graph algorithms. With this expansion has come a need for new algorithmic techniques and data structures. In this paper, we present I/O-efficient analogues of well-known data structures that we show to be useful for obtaining simpler and improved algorithms for several graph problems. Our results include improved algorithms for minimum spanning trees, breadth-first search, and single-source shortest paths. The descriptions of these algorithms are greatly simplified by their use of well-defined I/O-efficient data structures with good amortized performance bounds. We expect that I/Oefficient data structures such as these will be a useful tool for the design of I/O-efficient algorithms. 1. Introduction 1.1. Background and model The study of I/O-efficient algorithms has been receiving increased attention as increases in pro...
Algorithms for Parallel Memory II: Hierarchical Multilevel Memories
- ALGORITHMICA
, 1993
"... In this paper we introduce parallel versions of two hierarchical memory models and give optimal algorithms in these models for sorting, FFT, and matrix multiplication. In our parallel models, there are P memory hierarchies operating simultaneously; communication among the hierarchies takes place ..."
Abstract
-
Cited by 62 (5 self)
- Add to MetaCart
In this paper we introduce parallel versions of two hierarchical memory models and give optimal algorithms in these models for sorting, FFT, and matrix multiplication. In our parallel models, there are P memory hierarchies operating simultaneously; communication among the hierarchies takes place at a base memory level. Our optimal sorting algorithm is randomized and is based upon the probabilistic partitioning technique developed in the companion paper for optimal disk sorting in a two-level memory with parallel block transfer. The probability of using l times the optimal running time is exponentially small in l(log l) log P.
Simple Randomized Mergesort on Parallel Disks
- PARALLEL COMPUTING
, 1996
"... We consider the problem of sorting a file of N records on the D-disk model of parallel I/O [VS94] in which there are two sources of parallelism. Records are transferred to and from disk concurrently in blocks of B contiguous records. In each I/O operation, up to one block can be transferred to or fr ..."
Abstract
-
Cited by 59 (11 self)
- Add to MetaCart
We consider the problem of sorting a file of N records on the D-disk model of parallel I/O [VS94] in which there are two sources of parallelism. Records are transferred to and from disk concurrently in blocks of B contiguous records. In each I/O operation, up to one block can be transferred to or from each of the D disks in parallel. We propose a simple, efficient, randomized mergesort algorithm called SRM that uses a forecast-and-flush approach to overcome the inherent difficulties of simple merging on parallel disks. SRM exhibits a limited use of randomization and also has a useful deterministic version. Generalizing the technique of forecasting [Knu73], our algorithm is able to read in, at any time, the "right" block from any disk, and using the technique of flushing, our algorithm evicts, without any I/O overhead, just the "right" blocks from memory to make space for new ones to be read in. The disk layout of SRM is such that it enjoys perfect write parallelism, avoiding fundamenta...
Asymptotically Tight Bounds for Performing BMMC Permutations on Parallel Disk Systems
, 1994
"... This paper presents asymptotically equal lower and upper bounds for the number of parallel I/O operations required to perform bit-matrix-multiply/complement (BMMC) permutations on the Parallel Disk Model proposed by Vitter and Shriver. A BMMC permutation maps a source index to a target index by an a ..."
Abstract
-
Cited by 59 (19 self)
- Add to MetaCart
This paper presents asymptotically equal lower and upper bounds for the number of parallel I/O operations required to perform bit-matrix-multiply/complement (BMMC) permutations on the Parallel Disk Model proposed by Vitter and Shriver. A BMMC permutation maps a source index to a target index by an affine transformation over GF (2), where the source and target indices are treated as bit vectors. The class of BMMC permutations includes many common permutations, such as matrix transposition (when dimensions are powers of 2), bit-reversal permutations, vector-reversal permutations, hypercube permutations, matrix reblocking, Graycode permutations, and inverse Gray-code permutations. The upper bound improves upon the asymptotic bound in the previous best known BMMC algorithm and upon the constant factor in the previous best known bit-permute/complement (BPC) permutation algorithm. The algorithm achieving the upper bound uses basic linear-algebra techniques to factor the characteristic matrix...
Integrating Theory and Practice in Parallel File Systems
- PROCEEDINGS OF THE 1993 DAGS/PC SYMPOSIUM (THE DARTMOUTH INSTITUTE FOR ADVANCED GRADUATE STUDIES
, 1993
"... Several algorithms for parallel disk systems have appeared in the literature recently, and they are asymptotically optimal in terms of the number of disk accesses. Scalable systems with parallel disks must be able to run these algorithms. We present for the first time a list of capabilities that mus ..."
Abstract
-
Cited by 48 (11 self)
- Add to MetaCart
Several algorithms for parallel disk systems have appeared in the literature recently, and they are asymptotically optimal in terms of the number of disk accesses. Scalable systems with parallel disks must be able to run these algorithms. We present for the first time a list of capabilities that must be provided by the system to support these optimal algorithms: control over declustering, querying about the configuration, independent I/O, and turning off parity, file caching, and prefetching. We summarize recent theoretical and empirical work that justifies the need for these capabilities. In addition, we sketch an organization for a parallel file interface with low-level primitives and higher-level operations.
Can a Shared-Memory Model Serve as a Bridging Model for Parallel Computation?
, 1999
"... There has been a great deal of interest recently in the development of general-purpose bridging models for parallel computation. Models such as the BSP and LogP have been proposed as more realistic alternatives to the widely used PRAM model. The BSP and LogP models imply a rather different style fo ..."
Abstract
-
Cited by 41 (11 self)
- Add to MetaCart
There has been a great deal of interest recently in the development of general-purpose bridging models for parallel computation. Models such as the BSP and LogP have been proposed as more realistic alternatives to the widely used PRAM model. The BSP and LogP models imply a rather different style for designing algorithms when compared with the PRAM model. Indeed, while many consider data parallelism as a convenient style, and the shared-memory abstraction as an easyto-use platform, the bandwidth limitations of current machines have diverted much attention to message-passing and distributed-memory models (such as the BSP and LogP) that account more properly for these limitations. In this paper we consider the question of whether a shared-memory model can serve as an effective bridging model for parallel computation. In particular, can a shared-memory model be as effective as, say, the BSP? As a candidate for a bridging model, we introduce the Queuing Shared-Memory (QSM) model, which accounts for limited communication bandwidth while still providing a simple shared-memory abstraction. We substantiate the ability of the QSM to serve as a bridging model by providing a simple work-preserving emulation of the QSM on both the BSP, and on a related model, the (d, x)-BSP. We present evidence that the features of the QSM are essential to its effectiveness as a bridging model. In addition, we describe scenarios

