## Deterministic Sorting in Nearly Logarithmic Time on the Hypercube and Related Computers (1996)

Venue: Journal of Computer and System Sciences

Citations: 67 (10 self)

### BibTeX

@ARTICLE{Cypher96deterministicsorting,
    author = {Robert Cypher and C. Greg Plaxton},
    title = {Deterministic Sorting in Nearly Logarithmic Time on the Hypercube and Related Computers},
    journal = {Journal of Computer and System Sciences},
    year = {1996},
    pages = {193--203}
}

### Abstract

This paper presents a deterministic sorting algorithm, called Sharesort, that sorts $n$ records on an $n$-processor hypercube, shuffle-exchange, or cube-connected cycles in $O(\log n (\log\log n)^2)$ time in the worst case. The algorithm requires only a constant amount of storage at each processor. The fastest previous deterministic algorithm for this problem was Batcher's bitonic sort, which runs in $O(\log^2 n)$ time.

Supported by an NSERC postdoctoral fellowship, and DARPA contracts N00014-87-K-825 and N00014-89-J-1988.

1 Introduction

Given $n$ records distributed uniformly over the $n$ processors of some fixed interconnection network, the sorting problem is to route the record with the $i$th largest associated key to processor $i$, $0 \le i < n$. One of the earliest parallel sorting algorithms is Batcher's bitonic sort [3], which runs in $O(\log^2 n)$ time on the hypercube [10], shuffle-exchange [17], and cube-connected cycles [14]. More recently, Leighton [9] exhibited a bounded-degree,...
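As a point of reference for the $O(\log^2 n)$ baseline mentioned in the abstract, the following is a minimal sequential sketch of Batcher's bitonic sort, not of Sharesort itself. The function name `bitonic_sort` and the returned stage counter are illustrative assumptions, and the input length is assumed to be a power of two. Each stage pairs index `i` with `i ^ j`, so partners differ in exactly one address bit, which corresponds to one hypercube dimension.

```python
def bitonic_sort(keys):
    """Sequential simulation of Batcher's bitonic sorting network.

    Returns the sorted list and the number of compare-exchange stages.
    Assumption: len(keys) is a positive power of two.
    """
    a = list(keys)
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a positive power of two")
    stages = 0
    k = 2
    while k <= n:              # k: length of the sorted runs being produced
        j = k // 2
        while j >= 1:          # j: the single address bit separating partners
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    out_of_order = a[i] > a[partner] if ascending else a[i] < a[partner]
                    if out_of_order:
                        a[i], a[partner] = a[partner], a[i]
            stages += 1        # all pairs in one stage could run in parallel
            j //= 2
        k *= 2
    return a, stages


if __name__ == "__main__":
    result, stages = bitonic_sort([5, 3, 7, 1, 8, 2, 6, 4])
    print(result, stages)
```

For $n = 2^d$ the network has $d(d+1)/2$ stages (6 stages for $n = 8$), which is where the $O(\log^2 n)$ bound comes from when every stage takes constant time.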

### Citations

501 | Sorting networks and their applications
- Batcher
- 1968
Citation Context ...erconnection network, the sorting problem is to route the record with the $i$th largest associated key to processor $i$, $0 \le i < n$. One of the earliest parallel sorting algorithms is Batcher's bitonic sort [3], which runs in $O(\log^2 n)$ time on the hypercube [10], shuffle-exchange [17], and cube-connected cycles [14]. More recently, Leighton [9] exhibited a bounded-degree, $O(\log n)$-time sorting network base...

209 | An O(n log n) sorting network
- Ajtai, Komlós, et al.
- 1983
Citation Context ...17], and cube-connected cycles [14]. More recently, Leighton [9] exhibited a bounded-degree, $O(\log n)$-time sorting network based on the $O(\log n)$-depth sorting circuit of Ajtai, Komlós and Szemerédi [1]. However, no efficient emulation of Leighton's sorting network is known for the hypercube, and it has been shown that such an emulation requires $\Omega(\log^2 n)$ time on the shuffle-exchange or cu...

207 | The cube-connected cycles: A versatile network for parallel computation
- Preparata, Vuillemin
- 1981
Citation Context ...cessor $i$, $0 \le i < n$. One of the earliest parallel sorting algorithms is Batcher's bitonic sort [3], which runs in $O(\log^2 n)$ time on the hypercube [10], shuffle-exchange [17], and cube-connected cycles [14]. More recently, Leighton [9] exhibited a bounded-degree, $O(\log n)$-time sorting network based on the $O(\log n)$-depth sorting circuit of Ajtai, Komlós and Szemerédi [1]. However, no efficient emulatio...

167 | Tight bounds on the complexity of parallel sorting
- Leighton
- 1985
Citation Context ...earliest parallel sorting algorithms is Batcher's bitonic sort [3], which runs in $O(\log^2 n)$ time on the hypercube [10], shuffle-exchange [17], and cube-connected cycles [14]. More recently, Leighton [9] exhibited a bounded-degree, $O(\log n)$-time sorting network based on the $O(\log n)$-depth sorting circuit of Ajtai, Komlós and Szemerédi [1]. However, no efficient emulation of Leighton's sorting netwo...

165 | Parallel processing with the perfect shuffle
- Stone
- 1971
Citation Context ...th largest associated key to processor $i$, $0 \le i < n$. One of the earliest parallel sorting algorithms is Batcher's bitonic sort [3], which runs in $O(\log^2 n)$ time on the hypercube [10], shuffle-exchange [17], and cube-connected cycles [14]. More recently, Leighton [9] exhibited a bounded-degree, $O(\log n)$-time sorting network based on the $O(\log n)$-depth sorting circuit of Ajtai, Komlós and Szemerédi [1]...

120 | A logarithmic time sort for linear size networks
- Reif, Valiant
- 1987
Citation Context ...he gap between the trivial $\Omega(\log n)$ lower bound and the $O(\log^2 n)$ upper bound remained open. A noteworthy breakthrough was provided by the randomized Flashsort algorithm of Reif and Valiant [15], which sorts every possible input permutation with high probability in $O(\log n)$ time on a cube-connected cycles. In contrast, this work is the first to narrow the gap in terms of worst case, determin...

112 | Sorting in c log n parallel steps
- Ajtai, Komlós, et al.
- 1983
Citation Context ...17], and cube-connected cycles [14]. More recently, Leighton [9] exhibited a bounded-degree, $O(\log n)$-time sorting network based on the $O(\log n)$-depth sorting circuit of Ajtai, Komlós and Szemerédi [1]. However, no efficient emulation of Leighton's sorting network is known for the hypercube, and it has been shown that such an emulation requires $\Omega(\log^2 n)$ time on the shuffle-exchange or cu...

70 | Parallel Permutation and Sorting Algorithms and a New Generalized Connection Network
- Nassimi, Sahni
- 1982
Citation Context ... cost of these monotone routes is $O(d')$. Sparse enumeration sort is a useful sorting technique when the number of records to be sorted, $n$, is much smaller than the number of processors available, $p$ [12]. Sparse enumeration sort runs in $O(\log n \log p / \log(p/n))$ time. Finally, it is possible to efficiently simulate a large parallel computer with a small one. Specifically, for any constant $c$, an $n$ proc...

66 | Data broadcasting in SIMD computers
- Nassimi, Sahni
- 1981
Citation Context ...e the record with the $i$th largest associated key to processor $i$, $0 \le i < n$. One of the earliest parallel sorting algorithms is Batcher's bitonic sort [3], which runs in $O(\log^2 n)$ time on the hypercube [10], shuffle-exchange [17], and cube-connected cycles [14]. More recently, Leighton [9] exhibited a bounded-degree, $O(\log n)$-time sorting network based on the $O(\log n)$-depth sorting circuit of Ajtai, Kom...

60 | An O(log N) deterministic packet routing scheme. STOC
- Upfal
- 1989

31 | Parallel algorithms to set up the Benes permutation network
- Nassimi, Sahni
- 1982
Citation Context ...tained by applying known methods to this problem. Bitonic sort gives a solution with $T_p = 0$ and $T_r = O(\log^2 n)$. Using Nassimi and Sahni's parallel algorithm to set up the Benes permutation network [9] yields a solution with $T_p = O(\log^3 n)$ and $T_r = O(\log n)$. It is natural to ask whether any intermediate point exists between these two extremes with $T_p + T_r = o(\log^2 n)$, and in fact, Section 4.2 ...

30 | A Self-Routing Benes Network and Parallel Permutation Algorithms
- Nassimi, Sahni
- 1981
Citation Context ...array location. Bit-Permute-Complement (BPC) routing performs a permutation of $n$ records where the destination addresses are calculated by permuting and complementing the bits of the source addresses [11]. Broadcasting copies a record from one processor to all $n$ processors [10]. Bitonic merging is the basic operation underlying Batcher's bitonic sort. Given two sorted lists, each of length at most $n$, ...

25 | Theoretical aspects of VLSI pin limitations
- Cypher
- 1993
Citation Context ...ent emulation of Leighton's sorting network is known for the hypercube, and it has been shown that such an emulation requires $\Omega(\log^2 n)$ time on the shuffle-exchange or cube-connected cycles [6]. Hence, for these networks the problem of closing the gap between the trivial $\Omega(\log n)$ lower bound and the $O(\log^2 n)$ upper bound remained open. A noteworthy breakthrough was provided by the...

22 | Efficient Computation on Sparse Interconnection Networks
- Plaxton
- 1989
Citation Context ...ection algorithm known for cube-type computers is based on the $O((\log\log n)^2)$ selection algorithm devised by Cole and Yap for the parallel comparison model [4], and runs in $O(\log n \log\log n)$ time [13]. If the input consists of $n^{1-\epsilon}$ sorted sublists of length $n^{\epsilon}$ for some constant $\epsilon > 0$, algorithm FindSplitters() can be used to perform $O(n^{\epsilon'})$ evenly-spaced selections in $O(\log n)$ tim...

18 | Quotient Networks
- Fishburn, Finkel
Citation Context ...e. Specifically, for any constant $c$, an $n$ processor hypercube, shuffle-exchange, or cube-connected cycles can simulate a $cn$ processor machine of the same topology with only a constant factor slowdown [8]. 3 The Top-Level Routine This section defines the top-level routine of the Sharesort algorithm. Sharesort belongs to the class of "bottom-up" sorting algorithms based on the principle of recursive me...

12 | A parallel median algorithm
- Cole, Yap
- 1985
Citation Context ...Currently, the asymptotically fastest selection algorithm known for cube-type computers is based on the $O((\log\log n)^2)$ selection algorithm devised by Cole and Yap for the parallel comparison model [4], and runs in $O(\log n \log\log n)$ time [13]. If the input consists of $n^{1-\epsilon}$ sorted sublists of length $n^{\epsilon}$ for some constant $\epsilon > 0$, algorithm FindSplitters() can be used to perform $O(n^{\epsilon'}$ ...

4 | Efficient Communication in Massively Parallel Computers
- Cypher
- 1989
Citation Context ...ire more time for communication than do others. However, it has been shown that this complication can always be managed in time proportional to the running time of the shuffle-exchange implementation [5]. Our bounds on the running time of Sharesort also apply to the butterfly topology (with or without "wrap-around" connections), which is closely related to the cube-connected cycles. The following tec...
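
One of the citation contexts above describes Bit-Permute-Complement (BPC) routing, in which each destination address is obtained by permuting and complementing the bits of the source address. The sketch below illustrates only that address calculation on an ordinary list. It is an assumption-laden illustration, not the routing algorithm of the cited papers: the names `bpc_destination` and `bpc_route`, the `perm` convention (source bit `b` moves to position `perm[b]`), and the `complement_mask` parameter are all hypothetical.

```python
def bpc_destination(src, perm, complement_mask=0):
    """Destination address of `src` under a BPC permutation.

    perm[b] is the destination bit position of source bit b (assumed
    convention); bits set in complement_mask are flipped afterwards.
    """
    dest = 0
    for b, target in enumerate(perm):
        bit = (src >> b) & 1
        dest |= bit << target
    return dest ^ complement_mask


def bpc_route(records, perm, complement_mask=0):
    """Move records[src] to position bpc_destination(src, ...)."""
    n = len(records)
    d = len(perm)
    if n != 1 << d:
        raise ValueError("len(records) must equal 2 ** len(perm)")
    if sorted(perm) != list(range(d)):
        raise ValueError("perm must be a permutation of the bit positions")
    if not 0 <= complement_mask < n:
        raise ValueError("complement_mask must fit in len(perm) bits")
    out = [None] * n
    for src, rec in enumerate(records):
        out[bpc_destination(src, perm, complement_mask)] = rec
    return out


if __name__ == "__main__":
    # Bit reversal on 3 address bits: bit 0 <-> bit 2, bit 1 fixed.
    print(bpc_route(list("abcdefgh"), perm=[2, 1, 0]))
    # Identity bit order with every bit complemented reverses the list.
    print(bpc_route(list("abcdefgh"), perm=[0, 1, 2], complement_mask=0b111))
```

Bit reversal and full complementation are two familiar members of the BPC class. Because a bit permutation followed by an XOR is a bijection on addresses, every output slot is written exactly once.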