Radix Sort For Vector Multiprocessors
In Proceedings Supercomputing '91, 1991
Cited by 44 (6 self)
We have designed a radix sort algorithm for vector multiprocessors and have implemented the algorithm on the CRAY Y-MP. On one processor of the Y-MP, our sort is over 5 times faster on large sorting problems than the optimized library sort provided by CRAY Research. On eight processors we achieve an additional speedup of almost 5, yielding a routine over 25 times faster than the library sort. Using this multiprocessor version, we can sort at a rate of 15 million 64-bit keys per second. Our sorting algorithm is adapted from a data-parallel algorithm previously designed for a highly parallel Single Instruction Multiple Data (SIMD) computer, the Connection Machine CM-2. To develop our version we introduce three general techniques for mapping data-parallel algorithms onto vector multiprocessors. These techniques allow us to fully vectorize and parallelize the algorithm. The paper also derives equations that model the performance of our algorithm on the Y-MP. These equations are then used t...
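For orientation, the basic least-significant-digit radix sort that the abstract builds on can be sketched as below. This is a plain serial illustration, not the authors' vectorized or multiprocessor algorithm; the function name `radix_sort` and the parameters `key_bits` and `radix_bits` are assumptions made for this sketch.

```python
def radix_sort(keys, key_bits=64, radix_bits=8):
    """Serial LSD radix sort of non-negative integers below 2**key_bits.

    Illustrative only: the paper's vectorized, parallel version is not
    reproduced here.
    """
    num_buckets = 1 << radix_bits
    mask = num_buckets - 1
    for shift in range(0, key_bits, radix_bits):
        # Histogram: count the keys that fall in each bucket for this digit.
        counts = [0] * num_buckets
        for k in keys:
            counts[(k >> shift) & mask] += 1
        # Exclusive prefix sum: turn counts into each bucket's start offset.
        offsets = [0] * num_buckets
        total = 0
        for d in range(num_buckets):
            offsets[d] = total
            total += counts[d]
        # Stable permutation: place each key at its bucket's next free slot.
        out = [0] * len(keys)
        for k in keys:
            d = (k >> shift) & mask
            out[offsets[d]] = k
            offsets[d] += 1
        keys = out
    return keys


print(radix_sort([170, 45, 75]))  # prints [45, 75, 170]
```

Each pass is a histogram, a prefix sum, and a permutation. The sketch returns a new list and leaves the input untouched.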
A Vectorized Hash-Join
1996
A vector instruction set is a well-known method for exposing bandwidth to applications. Although vectorization has been studied extensively in the scientific programming community, less work exists on vectorizing other kinds of applications. This work examines vectorizing a traditional database operation, a Grace hash-join. We show how to vectorize both the hash and join phases of the algorithm, and present performance results on a Cray C90 as well as on traditional microprocessors. We conclude that vector scatter-gather and compress are essential both to this algorithm and to other non-scientific codes.
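As a rough illustration of the two phases named in the abstract, a scalar Grace hash-join can be sketched as follows. This is not the paper's vectorized implementation; the function name `grace_hash_join`, the `(key, payload)` tuple layout, and the `num_partitions` parameter are assumptions made for this sketch.

```python
from collections import defaultdict


def grace_hash_join(r, s, num_partitions=4):
    """Equi-join two relations given as lists of (key, payload) tuples.

    Scalar sketch only; returns a list of (key, r_payload, s_payload).
    """
    def partition(relation):
        # Hash phase: scatter tuples into partitions by hash of the join key,
        # so matching keys from both relations land in the same partition.
        parts = [[] for _ in range(num_partitions)]
        for key, payload in relation:
            parts[hash(key) % num_partitions].append((key, payload))
        return parts

    r_parts = partition(r)
    s_parts = partition(s)

    result = []
    # Join phase: join each pair of corresponding partitions independently.
    for r_part, s_part in zip(r_parts, s_parts):
        # Build an in-memory hash table on the partition of r.
        table = defaultdict(list)
        for key, payload in r_part:
            table[key].append(payload)
        # Probe the table with each tuple from the partition of s.
        for key, s_payload in s_part:
            for r_payload in table.get(key, ()):
                result.append((key, r_payload, s_payload))
    return result
```

Because each partition pair is joined independently, the order of the result depends on the partitioning. Sort the output if a deterministic order is needed.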