I/O-Efficient Scientific Computation Using TPIE
 In Proceedings of the Goddard Conference on Mass Storage Systems and Technologies, NASA Conference Publication 3340, Volume II
, 1995
In recent years, I/O-efficient algorithms for a wide variety of problems have appeared in the literature. Thus far, however, systems specifically designed to assist programmers in implementing such algorithms have remained scarce. TPIE is a system designed to fill this void. It supports I/O-efficient paradigms for problems from a variety of domains, including computational geometry, graph algorithms, and scientific computation. The TPIE interface frees programmers from having to deal not only with explicit read and write calls, but also with the complex memory management that must be performed for I/O-efficient computation.
The I/O-Complexity of Ordered Binary-Decision Diagram Manipulation
 UNIVERSITY OF AARHUS
, 1995
Ordered Binary-Decision Diagrams (OBDDs) are the state-of-the-art data structure for boolean function manipulation, and there exist several software packages for OBDD manipulation. OBDDs have been successfully used to solve problems in, e.g., digital-systems design, verification and testing, mathematical logic, concurrent system design, and artificial intelligence. The OBDDs used in many of these applications quickly get larger than the available main memory, and it becomes essential to consider the problem of minimizing the Input/Output (I/O) communication. In this paper we analyze why existing OBDD manipulation algorithms perform poorly in an I/O environment and develop new I/O-efficient algorithms.
External-Memory Algorithms with Applications in Geographic Information Systems
 Algorithmic Foundations of GIS
, 1997
In the design of algorithms for large-scale applications it is essential to consider the problem of minimizing Input/Output (I/O) communication. Geographical information systems (GIS) are good examples of such large-scale applications as they frequently handle huge amounts of spatial data. In this note we survey the recent developments in external-memory algorithms with applications in GIS. First we discuss the Aggarwal-Vitter I/O-model and illustrate why normal internal-memory algorithms for even very simple problems can perform terribly in an I/O-environment. Then we describe the fundamental paradigms for designing I/O-efficient algorithms by using them to design efficient sorting algorithms. We then go on to survey external-memory algorithms for computational geometry problems, with special emphasis on problems with applications in GIS, and techniques for designing such algorithms: using the orthogonal line segment intersection problem we illustrate the distribution-sweeping and ...
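The gap between internal-memory and I/O-efficient sorting mentioned in this abstract can be made concrete with a small calculation. The sketch below is only an illustration, not material from the survey: it assumes the standard I/O-model parameters N (items), M (items that fit in memory), and B (items per block), ignores constant factors, and pessimistically charges one I/O for each of the N log2 N comparisons of an internal-memory sort. The function names are hypothetical.

```python
import math


def scan_ios(n, b):
    """I/Os needed to read n items sequentially in blocks of b items."""
    return math.ceil(n / b)


def sort_ios(n, m, b):
    """External sorting bound (N/B) * log_{M/B}(N/B), constants ignored.

    Assumes m > b so that the logarithm base M/B exceeds 1.
    """
    if m <= b:
        raise ValueError("memory must hold more than one block")
    blocks = n / b
    fan_out = m / b
    # At least one pass over the data is always required.
    return blocks * max(1.0, math.log(blocks, fan_out))


def naive_ios(n):
    """Pessimistic count: one I/O per comparison of an N log2 N sort."""
    return n * math.log2(n)


if __name__ == "__main__":
    n, m, b = 2**30, 2**20, 2**10
    print("scan :", scan_ios(n, b))
    print("sort :", round(sort_ios(n, m, b)))
    print("naive:", round(naive_ios(n)))
```

Under these assumptions the blocked bound is smaller than the pessimistic count by several orders of magnitude, which is the kind of difference the abstract alludes to.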
On Sorting Strings in External Memory
, 1997
Lars Arge, Paolo Ferragina, Roberto Grossi, Jeffrey Scott Vitter. Abstract. In this paper we address for the first time the I/O complexity of the problem of sorting strings in external memory, which is a fundamental component of many large-scale text applications. In the standard unit-cost RAM comparison model, the complexity of sorting K strings of total length N is $\Theta(K \log_2 K + N)$. By analogy, in the external memory (or I/O) model, where the internal memory has size M and the block transfer size is B, it would be natural to guess that the I/O complexity of sorting strings is $\Theta(\frac{K}{B} \log_{M/B} \frac{K}{B} + \frac{N}{B})$, but the known algorithms do not come even close to achieving this bound. Our results show, somewhat counterintuitively, that the I/O complexity of string sorting depends upon the length of the strings relative to the block size. We first consider a simple comparison I/O model, where one is not allowed to break the strings into their characters, and we sho...
Experiments on the Practical I/O Efficiency of Geometric Algorithms: Distribution Sweep vs. Plane Sweep
, 1995
We present an extensive experimental study comparing the performance of four algorithms for the following orthogonal segment intersection problem: given a set of horizontal and vertical line segments in the plane, report all intersecting horizontal-vertical pairs. The problem has important applications in VLSI layout and graphics, which are large-scale in nature. The algorithms under evaluation are distribution sweep and three variations of plane sweep. Distribution sweep is specifically designed for the situations in which the problem is too large to be solved in internal memory, and theoretically has optimal I/O cost. Plane sweep is a well-known and powerful technique in computational geometry, and is optimal for this particular problem in terms of internal computation. The three variations of plane sweep differ by the sorting methods (external vs. internal sorting) used in the preprocessing phase and the dynamic data structures (B-tree vs. 2-3-4 tree) used in the sweeping ...
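The plane-sweep approach described in this abstract can be sketched in a few lines. This is a minimal in-memory illustration, not the implementation evaluated in the paper: it assumes closed segments given as tuples, and it keeps the active horizontal segments in a plain sorted Python list (via `bisect`) where the paper uses a B-tree or a 2-3-4 tree. The function name and input layout are hypothetical.

```python
import bisect


def orthogonal_intersections(horizontals, verticals):
    """Report intersecting (horizontal index, vertical index) pairs.

    horizontals: list of (x1, x2, y) with x1 <= x2
    verticals:   list of (x, y1, y2) with y1 <= y2
    Segments are closed, so touching endpoints count as intersections.
    """
    INSERT, QUERY, DELETE = 0, 1, 2  # tie-break order at equal x
    events = []
    for i, (x1, x2, _) in enumerate(horizontals):
        events.append((x1, INSERT, i))
        events.append((x2, DELETE, i))
    for j, (x, _, _) in enumerate(verticals):
        events.append((x, QUERY, j))
    events.sort()

    active = []  # sorted list of (y, horizontal index) crossing the sweep line
    pairs = []
    for _, kind, idx in events:
        if kind == INSERT:
            bisect.insort(active, (horizontals[idx][2], idx))
        elif kind == DELETE:
            pos = bisect.bisect_left(active, (horizontals[idx][2], idx))
            active.pop(pos)
        else:
            _, y1, y2 = verticals[idx]
            lo = bisect.bisect_left(active, (y1, -1))
            hi = bisect.bisect_right(active, (y2, len(horizontals)))
            pairs.extend((h, idx) for _, h in active[lo:hi])
    return pairs


if __name__ == "__main__":
    hs = [(0, 10, 5), (0, 3, 1)]
    vs = [(4, 0, 6), (2, 0, 2)]
    print(sorted(orthogonal_intersections(hs, vs)))
```

The sorted list makes insertions and deletions linear in the worst case; a balanced search tree, as in the variants the abstract compares, would keep them logarithmic.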
Massively parallel algorithms for private-cache chip multiprocessors. Under submission
, 2008
In this paper, we study massively parallel algorithms for private-cache chip multiprocessors (CMPs), focusing on methods for foundational problems that can scale to hundreds or even thousands of cores. By focusing on private-cache CMPs, we show that we can design efficient algorithms that need no additional assumptions about the way that cores are interconnected, for we assume that all interprocessor communication occurs through the memory hierarchy. We study several fundamental problems, including prefix sums, selection, and sorting, which often form the building blocks of other parallel algorithms. Indeed, we present two sorting algorithms, a distribution sort and a mergesort. All algorithms in the paper are asymptotically optimal in terms of the parallel cache accesses and space complexity under reasonable assumptions about the relationships between the number of processors, the size of memory, and the size of cache blocks. In addition, we study sorting lower bounds in a computational model, which we call the parallel external-memory (PEM) model, that formalizes the essential properties of our algorithms for private-cache chip multiprocessors. [Regular paper submission to SPAA 2008, which may be considered for a normal track or the special track on multicore systems.] Center for Massive Data Algorithmics, a Center of the Danish National Research Foundation.
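Prefix sums, named in this abstract as a building block, are commonly parallelized with a blocked three-phase scheme. The sketch below illustrates that general scheme only and is not the paper's PEM algorithm: it simulates the p processors sequentially, and the function name and chunking policy are assumptions made for the example.

```python
from itertools import accumulate


def blocked_prefix_sums(values, p):
    """Inclusive prefix sums computed chunk by chunk, as p workers would.

    Phase 1: each worker sums its own contiguous chunk.
    Phase 2: an exclusive scan over the p chunk totals gives each offset.
    Phase 3: each worker scans its chunk locally and adds its offset.
    """
    if p < 1:
        raise ValueError("need at least one processor")
    n = len(values)
    if n == 0:
        return []
    size = -(-n // p)  # ceiling division: items per worker
    chunks = [values[i:i + size] for i in range(0, n, size)]

    totals = [sum(chunk) for chunk in chunks]           # phase 1
    offsets = [0] + list(accumulate(totals))[:-1]       # phase 2

    result = []
    for chunk, offset in zip(chunks, offsets):          # phase 3
        result.extend(offset + s for s in accumulate(chunk))
    return result


if __name__ == "__main__":
    print(blocked_prefix_sums([1, 2, 3, 4, 5], 2))
```

Phases 1 and 3 touch each chunk with sequential scans only, which is why this pattern suits models that count block transfers.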
Asynchronous Parallel Disk Sorting
 IN 15TH ACM SYMPOSIUM ON PARALLELISM IN ALGORITHMS AND ARCHITECTURES
, 2003
We develop an algorithm for parallel disk sorting whose I/O cost approaches the lower bound and that guarantees almost perfect overlap between I/O and computation. Previous algorithms either have suboptimal I/O volume or cannot guarantee that I/O and computation can always be overlapped. We give an efficient implementation that can (at least) compete with the best practical implementations but gives additional performance guarantees. For the experiments we have configured a state-of-the-art machine that can sustain full-bandwidth I/O with eight disks and is very cost effective.
Early experiences in evaluating the Parallel Disk Model with the ViC* implementation
, 1996
Although several algorithms have been developed for the Parallel Disk Model (PDM), few have been implemented. Consequently, little has been known about the accuracy of the PDM in measuring I/O time and total running time to perform an out-of-core computation. This paper analyzes timing results on multiple-disk platforms for two PDM algorithms, out-of-core radix sort and BMMC permutations, to determine the strengths and weaknesses of the PDM. The results indicate the following. First, good PDM algorithms are usually not I/O bound. Second, of the four PDM parameters, one (problem size) is a good indicator of I/O time and running time, one (memory size) is a good indicator of I/O time but not necessarily running time, and the other two (block size and number of disks) do not necessarily indicate either I/O or running time. Third, because PDM algorithms tend not to be I/O bound, using asynchronous I/O can reduce I/O wait times significantly. The software interface to the PDM is part of the ViC* runtime library. The interface is a set of wrappers that are designed to be both efficient and portable across several underlying file systems and target machines.
Portable High-Performance Programs
, 1999
right notice and this permission notice are preserved on all copies.
Duality between prefetching and queued writing with applications to external sorting
 IN EUROPEAN SYMPOSIUM ON ALGORITHMS, VOLUME 2161 OF LECTURE NOTES IN COMPUTER SCIENCE
, 1998
Parallel disks promise to be a cost-effective means for achieving high bandwidth in applications involving massive data sets, but algorithms for parallel disks can be difficult to devise. To combat this problem, we define a useful and natural duality between writing to parallel disks and the seemingly more difficult problem of prefetching. We first explore this duality for applications involving read-once accesses using parallel disks. We get a simple linear-time algorithm for computing optimal prefetch schedules and analyze the efficiency of the resulting schedules for randomly placed data and for arbitrary interleaved accesses to striped sequences. Duality also provides an optimal schedule for the integrated caching and prefetching problem, in which blocks can be accessed multiple times. Another application of this duality gives us the first parallel disk sorting algorithms that are provably optimal up to lower-order terms. One of these algorithms is a simple and practical variant of multiway merge sort, addressing a question that has been open for some time.
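The multiway merge at the heart of the sorting variant mentioned in this abstract can be sketched as follows. This is a generic in-memory illustration under simplifying assumptions: it shows only the merge step over already sorted runs, with a binary heap holding one candidate per run, and it does not model the paper's prefetch scheduling or parallel disks. The function name is hypothetical.

```python
import heapq

_DONE = object()  # sentinel marking an exhausted run


def multiway_merge(runs):
    """Merge any number of individually sorted runs into one sorted list."""
    iterators = [iter(run) for run in runs]

    # Seed the heap with the first item of every non-empty run.
    heap = []
    for index, it in enumerate(iterators):
        first = next(it, _DONE)
        if first is not _DONE:
            heap.append((first, index))
    heapq.heapify(heap)

    merged = []
    while heap:
        item, index = heapq.heappop(heap)
        merged.append(item)
        # Refill from the run that just supplied the smallest item.
        following = next(iterators[index], _DONE)
        if following is not _DONE:
            heapq.heappush(heap, (following, index))
    return merged


if __name__ == "__main__":
    print(multiway_merge([[1, 4, 9], [2, 3], [], [0, 10]]))
```

In an external-memory setting each run would be read block by block from disk, and the order in which runs need their next block is what a prefetch schedule has to anticipate.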