Abstract:
We have developed a methodology for predicting the performance of parallel algorithms on real parallel machines. The methodology consists of two steps. First, we characterize a machine by enumerating the primitive operations that it is capable of performing along with the cost of each operation. Next, we analyze an algorithm by making a precise count of the number of times the algorithm performs each type of operation. We have used this methodology to evaluate many of the parallel sorting algorithms proposed in the literature. Of these, we selected the three most promising, Batcher's bitonic sort, a parallel radix sort, and a sample sort similar to Reif and Valiant's flashsort, and implemented them on the Connection Machine model CM-2. This paper analyzes the three algorithms in detail and discusses the issues that led us to our particular implementations. On the CM-2 the predicted performance of the algorithms closely matches the observed performance, and hence our methodology can be ...
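The two-step methodology described above can be sketched as a simple cost model: the predicted running time is the sum, over all primitive operations, of the operation count times its unit cost. This is only a minimal illustration. The operation names, cost values, and counts below are hypothetical placeholders, not measurements of the CM-2 or counts from the paper's algorithms, and `predict_time` is a name chosen for this example.

```python
from typing import Mapping


def predict_time(costs: Mapping[str, float], counts: Mapping[str, int]) -> float:
    """Sum count * unit cost over every primitive operation the algorithm uses."""
    missing = set(counts) - set(costs)
    if missing:
        # The machine model must price every operation the algorithm performs.
        raise KeyError(f"machine model has no cost for: {sorted(missing)}")
    return sum(counts[op] * costs[op] for op in counts)


# Step 1 (hypothetical): characterize the machine by its primitive
# operations and the cost of each. Values are placeholders.
machine_costs = {"arithmetic": 1.0, "scan": 4.0, "send": 10.0}

# Step 2 (hypothetical): count how often one algorithm performs each operation.
algorithm_counts = {"arithmetic": 50, "scan": 6, "send": 3}

predicted = predict_time(machine_costs, algorithm_counts)
print(predicted)
```

Comparing several algorithms under this model amounts to calling `predict_time` with the same `machine_costs` and a different count table per algorithm.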
Citations
6121 | Introduction to Algorithms – Cormen, Leiserson, et al. - 2001
367 | Sorting networks and their applications – Batcher - 1968
269 | Parallel merge sort – Cole - 1988
214 | Vector Models for Data-Parallel Computing – Blelloch - 1990
173 | A guided tour of Chernoff bounds – Hagerup, Rüb - 1990
155 | Tight bounds on the complexity of parallel sorting – Leighton - 1985
114 | A logarithmic time sort for linear size networks – Reif, Valiant - 1987
94 | Sorting in c log n parallel steps – Ajtai, Komlós, et al. - 1983
72 | Parallel sorting algorithms – Akl - 1985
69 | Deterministic sorting in nearly logarithmic time on the hypercube and related computers – Cypher, Plaxton - 1990
68 | Parallel permutation and sorting algorithms and a new generalized connection network – Nassimi, Sahni - 1982
40 | Samplesort: A sampling approach to minimal storage tree sorting – Frazer, McKellar
38 | Parallel sorting and data partitioning by sampling – Huang, Chow - 1983
35 | Radix sort for vector multiprocessors – Zagha, Blelloch - 1991
34 | Optimal sorting algorithms for parallel computers – Baudet, Stevenson - 1978
28 | Implementations of randomized sorting on large parallel machines – Hightower, Prins, et al.
22 | Efficient Computation on Sparse Interconnection Networks – Plaxton - 1989
20 | An improved supercomputer sorting benchmark – Thearling, Smith - 1992
17 | A (fairly) simple circuit that (usually) sorts – Leighton, Plaxton - 1990
17 | Supporting the hypercube programming model on mesh architectures (A fast sorter for iWarp tori) – Stricker - 1992
14 | Hyperquicksort: A fast sorting algorithm for hypercubes – Wagar - 1986
12 | Cubesort: A parallel algorithm for sorting N data items with S-sorters – Cypher, Sanz - 1992
12 | A balanced bin sort for hypercube multicomputers – Won, Sahni - 1988
11 | Combining parallel and sequential sorting on a boolean n-cube – Johnsson - 1984
8 | Binsorting on hypercubes with d-port communication – Seidel, George - 1988
2 | Implementations of randomized sorting on large parallel machines – Reif - 1992