Radix Sort For Vector Multiprocessors
In Proceedings of Supercomputing '91, 1991
Abstract

We have designed a radix sort algorithm for vector multiprocessors and have implemented the algorithm on the CRAY Y-MP. On one processor of the Y-MP, our sort is over 5 times faster on large sorting problems than the optimized library sort provided by CRAY Research. On eight processors we achieve an additional speedup of almost 5, yielding a routine over 25 times faster than the library sort. Using this multiprocessor version, we can sort at a rate of 15 million 64-bit keys per second. Our sorting algorithm is adapted from a data-parallel algorithm previously designed for a highly parallel Single Instruction Multiple Data (SIMD) computer, the Connection Machine CM-2. To develop our version we introduce three general techniques for mapping data-parallel algorithms onto vector multiprocessors. These techniques allow us to fully vectorize and parallelize the algorithm. The paper also derives equations that model the performance of our algorithm on the Y-MP. These equations are then used t...
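The abstract does not describe the algorithm's internals. As a hedged reference point only, here is the textbook serial least-significant-digit (LSD) radix sort that such sorts are commonly built on: for each digit, build a histogram, take an exclusive prefix sum to get bucket offsets, then scatter keys stably. The function name `radix_sort_u64` and the 8-bit default digit width are illustrative assumptions; this is not the paper's vectorized, multiprocessor CRAY Y-MP implementation.

```python
def radix_sort_u64(keys, digit_bits=8):
    """Sort non-negative integers below 2**64 with a serial LSD radix sort.

    Hypothetical helper for illustration; digit_bits is an assumed tuning knob.
    """
    radix = 1 << digit_bits
    mask = radix - 1
    out = list(keys)
    # Process digits from least to most significant; stability of each pass
    # preserves the order established by earlier (lower) digits.
    for shift in range(0, 64, digit_bits):
        # 1. Histogram: count how many keys fall into each bucket for this digit.
        counts = [0] * radix
        for k in out:
            counts[(k >> shift) & mask] += 1
        # 2. Exclusive prefix sum: starting output position of each bucket.
        offsets = [0] * radix
        total = 0
        for d in range(radix):
            offsets[d] = total
            total += counts[d]
        # 3. Stable scatter: place each key at its bucket's next free slot.
        buf = [0] * len(out)
        for k in out:
            d = (k >> shift) & mask
            buf[offsets[d]] = k
            offsets[d] += 1
        out = buf
    return out
```

With 8-bit digits a 64-bit key takes 8 passes; a wider digit means fewer passes but larger histograms, which is the kind of trade-off a performance model for a specific machine would be used to tune.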
Periodic Merging Networks
Theory of Computing Systems, 1998
Abstract

We consider the problem of merging two sorted sequences on constant-degree networks using comparators only. The classical solutions to the problem are the networks based on Batcher's Odd-Even Merge and Bitonic Merge, running in log(2n) time. Due to the obvious log n lower bound for the runtime, this is time-optimal. We present new merging networks that have the novel property of being periodic: for some (small) constant k, each processing unit of the network performs the same operations at steps t and t+k (as long as t+k does not exceed the runtime). The only operations executed are compare-exchange operations, just as in Batcher's networks. The architecture of the networks is very simple and easy to lay out. The runtimes achieved are c · log n, for a small constant c.
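The abstract uses Batcher's Odd-Even Merge as the classical baseline. As a hedged illustration of that baseline only (the abstract does not give the construction of the new periodic networks), the sketch below generates Batcher's compare-exchange operations for merging two sorted halves, applies them, and measures the network depth by scheduling each comparator as early as its wires allow. The function names are our own, and the total input length is assumed to be a power of two.

```python
from typing import Iterator, List, Sequence, Tuple

Comparator = Tuple[int, int]


def odd_even_merge(lo: int, hi: int, r: int = 1) -> Iterator[Comparator]:
    """Yield Batcher odd-even merge comparators for positions lo..hi (inclusive).

    Assumes positions lo..hi hold two already-sorted halves and that
    hi - lo + 1 is a power of two. Illustrative helper, not from the paper.
    """
    step = r * 2
    if step < hi - lo:
        yield from odd_even_merge(lo, hi, step)      # merge the even-indexed subsequence
        yield from odd_even_merge(lo + r, hi, step)  # merge the odd-indexed subsequence
        for i in range(lo + r, hi - r, step):        # final layer fixes adjacent pairs
            yield (i, i + r)
    else:
        yield (lo, lo + r)


def apply_network(values: Sequence[int], comparators: List[Comparator]) -> List[int]:
    """Run the comparators in order; each one is a compare-exchange."""
    out = list(values)
    for i, j in comparators:
        if out[i] > out[j]:
            out[i], out[j] = out[j], out[i]
    return out


def depth(comparators: List[Comparator], width: int) -> int:
    """Number of parallel steps when comparators on disjoint wires run together."""
    ready = [0] * width
    for i, j in comparators:
        t = max(ready[i], ready[j]) + 1
        ready[i] = ready[j] = t
    return max(ready, default=0)


net = list(odd_even_merge(0, 7))  # merge two sorted runs of length 4
print(apply_network([1, 4, 6, 7, 2, 3, 5, 8], net))  # [1, 2, 3, 4, 5, 6, 7, 8]
print(depth(net, 8))  # 3, i.e. log2(2n) for n = 4
```

For 2n inputs the recursion adds one layer on top of two independent half-size merges, so the depth is log2(2n), which matches the log(2n) running time quoted for the classical networks.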