Duality between prefetching and queued writing with applications to external sorting
In European Symposium on Algorithms, volume 2161 of Lecture Notes in Computer Science, 1998
Cited by 21 (6 self)

Abstract
Parallel disks promise to be a cost-effective means for achieving high bandwidth in applications involving massive data sets, but algorithms for parallel disks can be difficult to devise. To combat this problem, we define a useful and natural duality between writing to parallel disks and the seemingly more difficult problem of prefetching. We first explore this duality for applications involving read-once accesses using parallel disks. We get a simple linear-time algorithm for computing optimal prefetch schedules and analyze the efficiency of the resulting schedules for randomly placed data and for arbitrary interleaved accesses to striped sequences. Duality also provides an optimal schedule for the integrated caching and prefetching problem, in which blocks can be accessed multiple times. Another application of this duality gives us the first parallel disk sorting algorithms that are provably optimal up to lower-order terms. One of these algorithms is a simple and practical variant of multiway merge sort, addressing a question that has been open for some time.
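The duality can be illustrated with a small simulation. The sketch below is hypothetical (the names `greedy_write` and `prefetch_schedule` are ours, not the paper's) and assumes a simplified write rule: whenever the m-block buffer is full, one parallel output step writes the oldest queued block of every disk that has one. Running that rule on the reversed read sequence and then reading the timeline backwards turns output steps into parallel fetch steps and block arrivals into consumptions, so the buffer bound carries over to the read schedule. The paper's exact algorithm and its optimality argument may differ in details.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple, Union

# ("arrive" | "consume", block) or ("output" | "fetch", [blocks])
Event = Tuple[str, Union[str, List[str]]]


def greedy_write(blocks: List[str], disk_of: Dict[str, int], m: int) -> List[Event]:
    """Simulate queued writing to parallel disks with a buffer of m blocks.

    Assumed rule (a simplification for illustration): whenever the buffer is
    full and another block must be admitted, one parallel output step writes
    the oldest queued block of every disk that has one. Leftover blocks are
    flushed at the end. Returns the timeline of arrivals and output steps.
    """
    if m < 1:
        raise ValueError("buffer must hold at least one block")
    queues: Dict[int, Deque[str]] = {}
    occupancy = 0
    events: List[Event] = []

    def output_step() -> None:
        nonlocal occupancy
        # At most one block per disk per parallel I/O step.
        written = [queues[d].popleft() for d in sorted(queues) if queues[d]]
        occupancy -= len(written)
        events.append(("output", written))

    for b in blocks:
        if occupancy == m:
            output_step()  # make room: at least one block leaves the buffer
        queues.setdefault(disk_of[b], deque()).append(b)
        occupancy += 1
        events.append(("arrive", b))
    while occupancy > 0:
        output_step()
    return events


def prefetch_schedule(reads: List[str], disk_of: Dict[str, int], m: int) -> List[Event]:
    """Read-once prefetch schedule obtained by time-reversing greedy writing.

    Output steps of the reversed write become parallel fetch steps and
    arrivals become consumptions, so buffer occupancy never exceeds m.
    """
    timeline = greedy_write(list(reversed(reads)), disk_of, m)
    return [("fetch", p) if kind == "output" else ("consume", p)
            for kind, p in reversed(timeline)]
```

For reads `a1, b1, a2, b2, a3` with the `a` blocks on disk 0, the `b` blocks on disk 1 and m = 2, the sketch fetches `a1` alone, then `a2` and `b1` together, then `a3` and `b2` together: three parallel I/O steps, matching the three blocks that must come off disk 0.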
Slabpose columnsort: A new oblivious algorithm for out-of-core sorting on distributed-memory clusters
2004
Cited by 4 (1 self)

Abstract
Our goal is to develop a robust out-of-core sorting program for a distributed-memory cluster. The literature contains two dominant paradigms for out-of-core sorting algorithms: merging-based and partitioning-based. We explore a third paradigm, that of oblivious algorithms. Unlike the two dominant paradigms, oblivious algorithms do not depend on the input keys and therefore lead to predetermined I/O and communication patterns in an out-of-core setting. We have developed several out-of-core sorting programs using this paradigm. Our baseline implementation, 3-pass columnsort, was based on Leighton's columnsort algorithm. Though efficient in terms of I/O and communication, 3-pass columnsort has a restriction on the maximum problem size. As our first effort toward relaxing this restriction, we developed two implementations: subblock columnsort and M-columnsort. Both of these implementations incur substantial performance costs: subblock columnsort performs additional disk I/O, and M-columnsort needs substantial amounts of extra communication and computation. In this paper, we present slabpose columnsort, a new oblivious algorithm that we have designed explicitly for the out-of-core setting. Slabpose columnsort relaxes the problem-size restriction at no extra I/O or communication cost. Experimental evidence on a Beowulf cluster shows that, unlike subblock columnsort and M-columnsort, slabpose columnsort runs almost as fast as 3-pass columnsort. To the best of our knowledge, our implementations are the first out-of-core multiprocessor sorting algorithms that make no assumptions about the keys and produce output that is perfectly load balanced and in the striped order assumed by the Parallel Disk Model.
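Leighton's columnsort, on which these programs build, can be sketched in memory. The function below is a minimal illustration, not the authors' out-of-core implementation; the name `columnsort`, the in-memory setting, numeric keys (so that plus and minus infinity can serve as padding) and an even column height r are assumptions of this sketch. It runs the eight classic steps on an r-by-s matrix stored column by column. Only the column sorts inspect keys; the transpose, its inverse, the shift and the unshift are fixed permutations, which is the key-independence the abstract describes. In Leighton's formulation the height must satisfy r >= 2(s - 1)^2 with s dividing r; when a column has to fit in one processor's memory, this caps N = rs, presumably the kind of problem-size restriction mentioned above.

```python
import math
from typing import List, Sequence


def columnsort(values: Sequence[float], r: int, s: int) -> List[float]:
    """In-memory sketch of Leighton's eight-step columnsort on r*s numeric keys.

    The r-by-s matrix is kept as a list of s columns and read in column-major
    order. Only the column sorts (steps 1, 3, 5, 7) look at keys; steps 2, 4,
    6 and 8 are fixed permutations, so the data movement is key-independent.
    """
    if len(values) != r * s:
        raise ValueError("need exactly r * s values")
    # Assumptions for this sketch: r even (so the shift is exactly r/2),
    # s divides r, and Leighton's height condition r >= 2(s-1)^2.
    if r % 2 or r % s or r < 2 * (s - 1) ** 2:
        raise ValueError("requires even r, s | r, and r >= 2(s-1)^2")

    def sort_columns(cols: List[List[float]]) -> List[List[float]]:
        return [sorted(c) for c in cols]

    def column_major(cols: List[List[float]]) -> List[float]:
        return [x for c in cols for x in c]

    def split(flat: List[float], ncols: int) -> List[List[float]]:
        return [flat[k * r:(k + 1) * r] for k in range(ncols)]

    cols = sort_columns(split(list(values), s))                 # step 1
    flat = column_major(cols)                                   # step 2: read column-major,
    cols = [[flat[i * s + j] for i in range(r)] for j in range(s)]  # write row-major
    cols = sort_columns(cols)                                   # step 3
    flat = [cols[j][i] for i in range(r) for j in range(s)]     # step 4: read row-major,
    cols = split(flat, s)                                       # write column-major
    cols = sort_columns(cols)                                   # step 5
    half = r // 2                                               # step 6: shift by r/2
    padded = [-math.inf] * half + column_major(cols) + [math.inf] * half
    cols = sort_columns(split(padded, s + 1))                   # step 7
    return column_major(cols)[half:half + r * s]                # step 8: unshift
```

For example, `columnsort(data, 18, 3)` should return `sorted(data)` for any list `data` of 54 numbers, since r = 18 is even, divisible by 3, and at least 2(3 - 1)^2 = 8.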
Examining Committee:
2004
Abstract
Sorting very large datasets is a key subroutine in almost any application that is built on top of a large database. Two ways to sort out-of-core data dominate the literature: merging-based algorithms and partitioning-based algorithms. Within these two paradigms, all the programs that sort out-of-core data on a cluster rely on assumptions about the input distribution. We propose a third way of out-of-core sorting: oblivious algorithms. In all, we have developed six programs that sort out-of-core data on a cluster. The first three programs, based completely on Leighton's columnsort algorithm, have a restriction on the maximum problem size that they can sort. The other three programs relax this restriction; two are based on our original algorithmic extensions to columnsort. We present experimental results to show that our algorithms perform well. To the best of our knowledge, the programs presented in this thesis are the first to sort out-of-core data on a cluster without making any simplifying assumptions about the distribution of the data to be sorted.