Results 1–10 of 37
Communication-Efficient Parallel Sorting
, 1996
Cited by 74 (5 self)
We study the problem of sorting n numbers on a p-processor bulk-synchronous parallel (BSP) computer, which is a parallel multicomputer that allows for general processor-to-processor communication rounds provided each processor sends and receives at most h items in any round. We provide parallel sorting methods that use internal computation time that is O((n log n)/p) and a number of communication rounds that is O(log n / log(h+1)) for h = Θ(n/p). The internal computation bound is optimal for any comparison-based sorting algorithm. Moreover, the number of communication rounds is bounded by a constant for the (practical) situations when p ≤ n^(1−1/c) for a constant c ≥ 1. In fact, we show that our bound on the number of communication rounds is asymptotically optimal for the full range of values for p, for we show that just computing the "or" of n bits distributed evenly to the first O(n/h) of an arbitrary number of processors in a BSP computer requires Ω(log n / log(h…
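The flavor of BSP sorting described in this abstract can be illustrated with a toy sample-sort round. This is an editorial sketch, not the paper's algorithm: the function name, the regular-sampling splitter choice, and the single simulated all-to-all exchange are all illustrative assumptions.

```python
import bisect

def bsp_sample_sort(data, p):
    """Sort `data` as if on p BSP processors: local sort, p-1 splitters
    by regular sampling, one simulated h-relation exchange, local sort.
    Illustrative sketch only -- not the paper's method."""
    n = len(data)
    # Step 1: block-distribute the input and sort locally (internal computation).
    blocks = [sorted(data[i * n // p:(i + 1) * n // p]) for i in range(p)]
    # Step 2: each "processor" contributes up to p regularly spaced samples.
    samples = sorted(s for b in blocks for s in b[::max(1, len(b) // p)][:p])
    splitters = samples[p - 1::p][:p - 1]
    # Step 3: one communication round -- route every element to the
    # destination bucket determined by the splitters.
    buckets = [[] for _ in range(p)]
    for b in blocks:
        for x in b:
            buckets[bisect.bisect_left(splitters, x)].append(x)
    # Step 4: each processor sorts its bucket; the concatenation is sorted.
    return [x for bucket in buckets for x in sorted(bucket)]
```

Because bucket i only receives keys no larger than splitter i, concatenating the sorted buckets yields a globally sorted sequence.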
Packet Routing In Fixed-Connection Networks: A Survey
, 1998
Cited by 36 (3 self)
We survey routing problems on fixed-connection networks. We consider many aspects of the routing problem and provide known theoretical results for various communication models. We focus on (partial) permutation, k-relation routing, routing to random destinations, dynamic routing, isotonic routing, fault-tolerant routing, and related sorting results. We also provide a list of unsolved problems and numerous references.
Implementations of Randomized Sorting on Large Parallel Machines
, 1992
Cited by 29 (1 self)
Flashsort [RV83,86] and Samplesort [HC83] are related parallel sorting algorithms proposed in the literature. Both utilize a sophisticated randomized sampling technique to form a splitter set, but Samplesort distributes the splitter set to each processor while Flashsort uses splitter-directed routing. In this paper we present B-Flashsort, a new batched-routing variant of Flashsort designed to sort N > P values using P processors connected in a d-dimensional mesh and using constant space in addition to the input and output. The key advantage of the Flashsort approach over Samplesort is a decrease in memory requirements, by avoiding the broadcast of the splitter set to all processors. The practical advantage of B-Flashsort over Flashsort is that it replaces pipelined splitter-directed routing with a set of synchronous local communications and bounds recursion, while still being demonstrably efficient. The performance of B-Flashsort and Samplesort is compared using a parameterized analytic model in the style of [BLM+91] to show that on a d-dimensional toroidal mesh B-Flashsort improves on Samplesort when N/P < P/(c1 log P + c2 d P^(1/d) + c3), for machine-dependent parameters c1, c2, and c3. Empirical confirmation of the analytical model is obtained through implementations on a MasPar MP-1 of Samplesort and two B-Flashsort variants.
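The memory trade-off this abstract highlights, broadcasting the whole splitter set versus routing each key past one splitter at a time, can be sketched sequentially. The tree layout and names below are editorial assumptions, not the B-Flashsort code; splitter-directed routing is modeled as a walk down a binary tree that stores one splitter per node.

```python
import bisect

def build_tree(splitters, lo, hi):
    """Node responsible for destination buckets [lo, hi); interior nodes
    hold exactly one splitter, leaves are final bucket indices."""
    if hi - lo == 1:
        return lo
    mid = (lo + hi) // 2
    return (splitters[mid - 1],
            build_tree(splitters, lo, mid),
            build_tree(splitters, mid, hi))

def route(node, key):
    """Walk the key down the tree; each hop consults a single splitter,
    so no node ever needs the full splitter set."""
    while not isinstance(node, int):
        splitter, left, right = node
        node = left if key < splitter else right
    return node
```

Routing a key through the tree reaches the same bucket a binary search over the full splitter array would, but each hop only needs the one splitter stored at that node.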
Randomized Shellsort: A simple oblivious sorting algorithm
 In Proceedings 21st ACM-SIAM Symposium on Discrete Algorithms (SODA)
, 2010
Cited by 28 (8 self)
In this paper, we describe a randomized Shellsort algorithm. This algorithm is a simple, randomized, data-oblivious version of the Shellsort algorithm that always runs in O(n log n) time and succeeds in sorting any given input permutation with very high probability. Taken together, these properties imply applications in the design of new efficient privacy-preserving computations based on the secure multi-party computation (SMC) paradigm. In addition, by a trivial conversion of this Monte Carlo algorithm to its Las Vegas equivalent, one gets the first version of Shellsort with a running time that is provably O(n log n) with very high probability.
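The data-oblivious building block behind this algorithm can be sketched as a region compare-exchange: two regions are matched by a random permutation and each matched pair is compare-exchanged. This shows only the primitive, under assumed names; the paper's full schedule of region passes over decreasing offsets is what makes the whole array sorted with very high probability.

```python
import random

def region_compare_exchange(a, lo1, lo2, size, rng):
    """Compare-exchange the regions a[lo1:lo1+size] and a[lo2:lo2+size]
    under a random matching. Data-oblivious: which pairs are compared
    depends only on the random matching, never on the data values."""
    matching = list(range(size))
    rng.shuffle(matching)
    for i in range(size):
        j = matching[i]
        if a[lo1 + i] > a[lo2 + j]:
            a[lo1 + i], a[lo2 + j] = a[lo2 + j], a[lo1 + i]
    return matching  # returned only so callers can inspect the pairing
```

After one call, every matched pair is ordered: element i of the first region is no larger than its partner in the second region.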
Sorting on a parallel pointer machine with applications to set expression evaluation
 In 30th Annual Symposium on Foundations of Computer Science
, 1989
Supporting the hypercube programming model on mesh architectures (A fast sorter for iWarp tori)
, 1992
Hypercubic Sorting Networks
 SIAM J. Comput
, 1998
Cited by 18 (2 self)
This paper provides an analysis of a natural d-round tournament over n = 2^d players, and demonstrates that the tournament possesses a surprisingly strong ranking property. The ranking property of this tournament is used to design efficient sorting algorithms for a variety of different models of parallel computation: (i) a comparator network of depth c · lg n, c ≈ 7.44, that sorts the vast majority of the n! possible input permutations, (ii) an O(lg n)-depth hypercubic comparator network that sorts the vast majority of permutations, (iii) a hypercubic sorting network with nearly logarithmic depth, (iv) an O(lg n)-time randomized sorting algorithm for any hypercubic machine (other such algorithms have been previously discovered, but this algorithm has a significantly smaller failure probability than any previously known algorithm), and (v) a randomized algorithm for sorting n O(m)-bit records on an (n lg n)-node omega machine in O(m + lg n) bit steps. Key words: parallel sort…
Approximate and Exact Deterministic Parallel Selection
, 1993
Cited by 17 (3 self)
The selection problem of size n is, given a set of n elements drawn from an ordered universe and an integer k with 1 ≤ k ≤ n, to identify the kth smallest element in the set. We study approximate and exact selection on deterministic concurrent-read concurrent-write parallel RAMs, where approximate selection with relative accuracy ε > 0 asks for any element whose true rank differs from k by at most εn. Our main results are: (1) Exact selection problems of size n can be solved in O(log n / log log n) time with O(n log log n / log n) processors. This running time is the best possible (using only a polynomial number of processors), and the number of processors is optimal for the given running time (optimal speedup); the best previous algorithm achieves optimal speedup with a running time of O(log n log* n / log log n). (2) For all t ≥ (log log n)^4 log n, approximate selection problems of size n can be solved in O(t) time with optimal speedup with relative accuracy 2^(−t log log log n/(log log n)…
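The notion of approximate selection defined in this abstract can be made concrete with a small sequential checker. The function name, the 1-based rank convention, and the tie handling for duplicates are simplifying editorial assumptions.

```python
def is_approx_kth(a, candidate, k, eps):
    """Return True if `candidate` is an acceptable answer to approximate
    selection with relative accuracy eps: it must be an element of `a`
    whose true rank differs from k by at most eps*len(a). Ranks are
    1-based (matching 1 <= k <= n); with duplicates, any rank in the
    candidate's tie range counts (a simplifying assumption)."""
    n = len(a)
    below = sum(x < candidate for x in a)
    equal = sum(x == candidate for x in a)
    lo_rank, hi_rank = below + 1, below + equal   # tie range of ranks
    return equal > 0 and lo_rank - eps * n <= k <= hi_rank + eps * n
```

With eps = 0 this reduces to exact selection; with eps = 0.05 and n = 100 any element whose rank is within 5 of k is accepted.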
Logarithmic time cost optimal parallel sorting is not yet fast in practice
 August), Dept. of Computer Science, Brown University
, 1990
Cited by 16 (3 self)
When looking for new and faster parallel sorting algorithms for use in massively parallel systems it is tempting to investigate promising alternatives from the large body of research done on parallel sorting in the field of theoretical computer science. Such "theoretical" algorithms are mainly described for the PRAM (Parallel Random Access Machine) model of computation [13, 26]. This paper shows how this kind of investigation can be done on a simple but versatile environment for programming and measuring of PRAM algorithms [18, 19]. The practical value of Cole's Parallel Merge Sort algorithm [10, 11] has been investigated by comparing it with Batcher's bitonic sorting [5]. The O(log n) time consumption of Cole's algorithm implies that it must be faster than bitonic sorting, which is O(log² n) time, if n is large enough. However, we have found that bitonic sorting is faster as long as n is less than 1.2 · 10^21, i.e. more than 1 Giga Tera items! Consequently, Cole's logarithmic time algorithm is not fast in practice. 1 Introduction and Motivation The work reported in this paper is an attempt to lessen the gap between theory and practice within the field of parallel computing. Within theoretical computer science, parallel algorithms are mainly compared by using asymptotic analysis (O-notation). This paper gives an example of how the analysis of implemented algorithms on finite problems provides new and more practically oriented results than those traditionally obtained by asymptotic analysis. Parallel Complexity Theory - A Rich Source for…
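The bitonic baseline in this comparison is compact enough to state in full. The following is a standard iterative formulation of Batcher's bitonic sorting network (not the authors' PRAM implementation), whose compare-exchange schedule is data-oblivious and has lg(n)·(lg(n)+1)/2 stages, i.e. O(log² n) parallel time.

```python
def bitonic_sort(a):
    """Batcher's bitonic sorting network for len(a) a power of two.
    Each inner pass over i is one parallel stage of compare-exchanges;
    the schedule never depends on the data values."""
    n = len(a)
    assert n & (n - 1) == 0, "length must be a power of two"
    k = 2
    while k <= n:                 # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:             # one network stage per value of j
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a
```

Counting stages rather than wall-clock time makes the constant-factor point above visible: the schedule is short and regular, which is exactly why bitonic sorting wins for realistic n despite its worse asymptotics.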
Lecture notes on the new AKS sorting network
, 1992
Cited by 13 (0 self)
Ajtai, Komlós, and Szemerédi constructed sorting networks with N wires of depth O(log N). They were not concerned with the value of the proportionality constant implicit in the O-notation; subsequently Paterson replaced the O(log N) by c log₂ N with c under 6100. We describe an implementation of a more recent, and as yet unpublished, proposal of Ajtai, Komlós, and Szemerédi, that yields a smaller value of c: for every integer N such that N ≥ 2^78 there is a sorting network on N wires whose depth is at most 1830 log₂ N − 58657. The basic units in this new construction are sorting networks on M wires such that M is relatively small; these may be thought of as indivisible hardware elements (rather than networks made from comparators); following Knuth, we call them M-sorters. For every choice of positive integers M and N such that N ≥ M, the construction yields a sorting network on N wires, made from M-sorters, whose depth is at most (48 + o(1)) log_M N + 115 as M → ∞. (It is worth emphasizing that the asymptotic o(1) here is relative to M rather than N.)
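As a back-of-the-envelope illustration (not a comparison made in the lecture notes), one can ask when the 1830 log₂ N − 58657 depth bound first beats the lg(N)·(lg(N)+1)/2 comparator depth of Batcher's bitonic network:

```python
def depth_aks(x):
    """Depth bound quoted above, 1830*lg(N) - 58657, as a function of
    x = lg(N); the bound is stated only for N >= 2**78, i.e. x >= 78."""
    return 1830 * x - 58657

def depth_bitonic(x):
    """Comparator depth of Batcher's bitonic network: lg(N)*(lg(N)+1)/2."""
    return x * (x + 1) // 2

# Smallest x = lg(N), within the bound's range of validity, at which the
# 1830*lg(N) - 58657 bound dips below the bitonic depth.
crossover = next(x for x in range(78, 10**5) if depth_aks(x) < depth_bitonic(x))
```

Under these two formulas the crossover lands at lg N = 3627, echoing the constant-factor theme of the listing: asymptotically optimal networks only pay off at astronomically large N.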