Results 1–10 of 27
Communication-Efficient Parallel Sorting
, 1996
Cited by 64 (2 self)
We study the problem of sorting n numbers on a p-processor bulk-synchronous parallel (BSP) computer, which is a parallel multicomputer that allows for general processor-to-processor communication rounds provided each processor sends and receives at most h items in any round. We provide parallel sorting methods that use internal computation time that is O((n log n)/p) and a number of communication rounds that is O(log n/log(h+1)) for h = Θ(n/p). The internal computation bound is optimal for any comparison-based sorting algorithm. Moreover, the number of communication rounds is bounded by a constant for the (practical) situations when p ≤ n^(1−1/c) for a constant c ≥ 1. In fact, we show that our bound on the number of communication rounds is asymptotically optimal for the full range of values for p, for we show that just computing the "or" of n bits distributed evenly to the first O(n/h) of an arbitrary number of processors in a BSP computer requires Ω(log n/log(h...
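The round structure described above can be made concrete with a toy, single-process simulation of a BSP-style sample sort: p simulated processors sort locally, agree on splitters from a regular sample, and exchange items so that each processor sends and receives O(h) = O(n/p) items in one routing round. The function name `bsp_sample_sort` and the sampling scheme are illustrative assumptions, not the paper's actual algorithm.

```python
# A minimal sketch of one BSP sorting round, simulated sequentially.
# Assumes p divides the data evenly enough for the regular sample to work.
import random

def bsp_sample_sort(items, p):
    """Sort `items` by simulating p BSP processors exchanging h-relations."""
    n = len(items)
    # Superstep 1: each processor sorts its local block.
    blocks = [sorted(items[i * n // p:(i + 1) * n // p]) for i in range(p)]
    # Superstep 2: choose p-1 global splitters from a regular sample.
    sample = sorted(s for b in blocks for s in b[::max(1, len(b) // p)])
    step = max(1, len(sample) // p)
    splitters = sample[step::step][:p - 1]
    # Superstep 3: route every item to the processor owning its key range;
    # each processor sends/receives O(h) items, h = n/p, in this round.
    buckets = [[] for _ in range(p)]
    for b in blocks:
        for x in b:
            dest = sum(1 for s in splitters if s < x)
            buckets[dest].append(x)
    # Superstep 4: each processor sorts its received bucket locally.
    return [x for bucket in buckets for x in sorted(bucket)]

data = [random.randrange(1000) for _ in range(256)]
assert bsp_sample_sort(data, p=8) == sorted(data)
```

Because bucket i only holds keys no larger than splitter i, concatenating the locally sorted buckets yields the global order.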
Packet Routing in Fixed-Connection Networks: A Survey
, 1998
Cited by 29 (3 self)
We survey routing problems on fixed-connection networks. We consider many aspects of the routing problem and provide known theoretical results for various communication models. We focus on (partial) permutation routing, k-relation routing, routing to random destinations, dynamic routing, isotonic routing, fault-tolerant routing, and related sorting results. We also provide a list of unsolved problems and numerous references.
Implementations of Randomized Sorting on Large Parallel Machines
Cited by 28 (1 self)
Flashsort [RV83,86] and Samplesort [HC83] are related parallel sorting algorithms proposed in the literature. Both utilize a sophisticated randomized sampling technique to form a splitter set, but Samplesort distributes the splitter set to each processor while Flashsort uses splitter-directed routing. In this ...
Supporting the hypercube programming model on mesh architectures (A fast sorter for iWarp tori)
, 1992
Approximate and Exact Deterministic Parallel Selection
, 1993
Cited by 15 (3 self)
The selection problem of size n is, given a set of n elements drawn from an ordered universe and an integer k with 1 ≤ k ≤ n, to identify the k-th smallest element in the set. We study approximate and exact selection on deterministic concurrent-read concurrent-write parallel RAMs, where approximate selection with relative accuracy ε > 0 asks for any element whose true rank differs from k by at most εn. Our main results are: (1) Exact selection problems of size n can be solved in O(log n/log log n) time with O(n log log n/log n) processors. This running time is the best possible (using only a polynomial number of processors), and the number of processors is optimal for the given running time (optimal speedup); the best previous algorithm achieves optimal speedup with a running time of O(log n log* n/log log n). (2) For all t with (log log n)^4 ≤ t ≤ log n, approximate selection problems of size n can be solved in O(t) time with optimal speedup with relative accuracy 2^(−t log log log n/(log log n) ...
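The sequential primitive that such PRAM algorithms parallelize can be sketched as deterministic median-of-medians selection, which finds the k-th smallest element in guaranteed O(n) time. This is a standard textbook technique shown for context, not the paper's parallel construction; the helper structure is illustrative.

```python
# Deterministic selection via median of medians (groups of 5).
def select(xs, k):
    """Return the k-th smallest element of xs, 1 <= k <= len(xs)."""
    if len(xs) <= 25:
        return sorted(xs)[k - 1]
    # Median of medians of groups of 5 gives a guaranteed-good pivot:
    # at least ~30% of elements fall on each side of it.
    medians = [sorted(xs[i:i + 5])[len(xs[i:i + 5]) // 2]
               for i in range(0, len(xs), 5)]
    pivot = select(medians, (len(medians) + 1) // 2)
    lo = [x for x in xs if x < pivot]
    hi = [x for x in xs if x > pivot]
    if k <= len(lo):
        return select(lo, k)
    if k > len(xs) - len(hi):
        return select(hi, k - (len(xs) - len(hi)))
    return pivot  # k lands among elements equal to the pivot

assert select(list(range(100, 0, -1)), 17) == 17
```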
Hypercubic Sorting Networks
 SIAM J. Comput
, 1998
Cited by 14 (2 self)
This paper provides an analysis of a natural d-round tournament over n = 2^d players, and demonstrates that the tournament possesses a surprisingly strong ranking property. The ranking property of this tournament is used to design efficient sorting algorithms for a variety of different models of parallel computation: (i) a comparator network of depth c·lg n, c ≈ 7.44, that sorts the vast majority of the n! possible input permutations, (ii) an O(lg n)-depth hypercubic comparator network that sorts the vast majority of permutations, (iii) a hypercubic sorting network with nearly logarithmic depth, (iv) an O(lg n)-time randomized sorting algorithm for any hypercubic machine (other such algorithms have been previously discovered, but this algorithm has a significantly smaller failure probability than any previously known algorithm), and (v) a randomized algorithm for sorting n O(m)-bit records on an (n lg n)-node omega machine in O(m + lg n) bit steps. Key words: parallel sort...
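One plausible reading of the "natural d-round tournament" is the hypercube schedule in which, in round i, position x is compared with position x XOR 2^i; the sketch below runs that schedule as compare-exchanges. This is an assumption about the construction, shown only to make a d-round tournament concrete; d rounds do not fully sort, but already pin down the extremes.

```python
# Run d compare-exchange rounds along the dimensions of a hypercube.
def hypercube_tournament(values):
    """One compare-exchange per hypercube dimension; len(values) == 2**d."""
    vals = list(values)
    d = len(vals).bit_length() - 1
    for i in range(d):
        for x in range(len(vals)):
            y = x ^ (1 << i)  # partner across dimension i
            if x < y and vals[x] > vals[y]:
                vals[x], vals[y] = vals[y], vals[x]
    return vals

# After d rounds the minimum has every bit cleared (index 0) and the
# maximum every bit set (last index), though the middle is only partially ranked.
ranked = hypercube_tournament(list(range(16))[::-1])
assert ranked[0] == 0 and ranked[-1] == 15
```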
Sorting on a Parallel Pointer Machine with Applications to Set Expression Evaluation
 J. ACM
, 1989
Cited by 14 (5 self)
We present optimal algorithms for sorting on parallel CREW and EREW versions of the pointer machine model. Intuitively, one can view our methods as being based on a parallel mergesort using linked lists rather than arrays (the usual parallel data structure). We also show how to exploit the "locality" of our approach to solve the set expression evaluation problem, a problem with applications to database querying and logic programming, in O(log n) time using O(n) processors. Interestingly, this is an asymptotic improvement over what seems possible using previous techniques. Categories and Subject Descriptors: E.1 [Data Structures]: arrays, lists; F.2.2 [Analysis of Algorithms and Problem Complexity]: Nonnumerical Algorithms and Problems: sorting and searching. General Terms: Algorithms, Theory, Verification. Additional Key Words and Phrases: parallel algorithms, PRAM, pointer machine, linking automaton, expression evaluation, mergesort, cascade merging. 1 Introduction. One of the primar...
Logarithmic time cost optimal parallel sorting is not yet fast in practice
 August, Dept. of Computer Science, Brown University
, 1990
Cited by 14 (3 self)
When looking for new and faster parallel sorting algorithms for use in massively parallel systems it is tempting to investigate promising alternatives from the large body of research done on parallel sorting in the field of theoretical computer science. Such "theoretical" algorithms are mainly described for the PRAM (Parallel Random Access Machine) model of computation [13, 26]. This paper shows how this kind of investigation can be done in a simple but versatile environment for programming and measuring PRAM algorithms [18, 19]. The practical value of Cole's Parallel Merge Sort algorithm [10, 11] has been investigated by comparing it with Batcher's bitonic sorting [5]. The O(log n) time consumption of Cole's algorithm implies that it must be faster than bitonic sorting, which is O(log² n) time, if n is large enough. However, we have found that bitonic sorting is faster as long as n is less than 1.2·10^21, i.e. more than 1 Giga Tera items! Consequently, Cole's logarithmic time algorithm is not fast in practice. 1 Introduction and Motivation. The work reported in this paper is an attempt to lessen the gap between theory and practice within the field of parallel computing. Within theoretical computer science, parallel algorithms are mainly compared by using asymptotical analysis (O-notation). This paper gives an example of how the analysis of implemented algorithms on finite problems provides new and more practically oriented results than those traditionally obtained by asymptotical analysis. Parallel Complexity Theory: A Rich Source for ...
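The reported crossover can be sanity-checked with a toy cost model: if Cole's sort costs a·log₂(n) steps and bitonic costs b·(log₂ n)² steps, they break even at log₂(n) = a/b. The ratio a/b ≈ 70 used here is an assumption chosen only because it reproduces a crossover near the reported 1.2·10^21; it is not a constant measured in the paper.

```python
# Toy break-even calculation for c_cole*log2(n) vs c_bitonic*log2(n)**2.
a_over_b = 70  # hypothetical constant-factor ratio (Cole / bitonic)

# Break-even where a*log2(n) == b*log2(n)**2, i.e. log2(n) == a/b.
crossover_n = 2 ** a_over_b  # bitonic is faster for all smaller n
print(f"crossover at n = 2^{a_over_b} = {crossover_n:.2e}")
```

A ratio of roughly 70 in the leading constants is enough to push the asymptotic winner's advantage beyond any feasible input size, which is the abstract's point.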
Lecture notes on the new AKS sorting network
, 1992
Cited by 13 (0 self)
Ajtai, Komlós, and Szemerédi constructed sorting networks with N wires of depth O(log N). They were not concerned with the value of the proportionality constant implicit in the O-notation; subsequently Paterson replaced the O(log N) by c log₂ N with c under 6100. We describe an implementation of a more recent, and as yet unpublished, proposal of Ajtai, Komlós, and Szemerédi that yields a smaller value of c: for every integer N such that N ≥ 2^78 there is a sorting network on N wires whose depth is at most 1830 log₂ N − 58657. The basic units in this new construction are sorting networks on M wires such that M is relatively small; these may be thought of as indivisible hardware elements (rather than networks made from comparators); following Knuth, we call them M-sorters. For every choice of positive integers M and N such that N ≥ M, the construction yields a sorting network on N wires, made from M-sorters, whose depth is at most (48 + o(1)) log_M N + 115 as M → ∞. (It is worth emphasizing that the asymptotic o(1) here is relative to M rather than N.)
Feasible Time-Optimal Algorithms for Boolean Functions on Exclusive-Write PRAMs
, 1994
Cited by 13 (3 self)
It was shown some years ago that the computation time for many important Boolean functions of n arguments on concurrent-read exclusive-write parallel random-access machines (CREW PRAMs) of unlimited size is at least φ(n) ≈ 0.72 log₂ n. On the other hand, it is known that every Boolean function of n arguments can be computed in φ(n) + 1 steps on a CREW PRAM with n·2^(n−1) processors and memory cells. In the case of the OR of n bits, n processors and cells are sufficient. In this paper it is shown that for many important functions there are CREW PRAM algorithms that almost meet the lower bound in that they take φ(n) + o(log n) steps, but use only a small number of processors and memory cells (in most cases, n). In addition, the cells only have to store binary words of bounded length (in most cases, length 1). We call such algorithms "feasible". The functions concerned include: the PARITY function and, more generally, all symmetric functions; a large class of Boolean formulas...
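For context, the baseline the paper improves on constant-wise is the standard ⌈log₂ n⌉-round pairwise-combining pattern for OR. The sketch below simulates that pattern round by round; it is purely illustrative and does not model the PRAM's exclusive-write constant-factor subtleties.

```python
# Round-by-round simulation of computing OR by pairwise combining.
def crew_or_rounds(bits):
    """Return (or_value, rounds) using one pairwise-combine step per round."""
    cells = list(bits)
    rounds = 0
    while len(cells) > 1:
        # One parallel step: virtual processor i combines cells 2i and 2i+1.
        cells = [cells[i] | (cells[i + 1] if i + 1 < len(cells) else 0)
                 for i in range(0, len(cells), 2)]
        rounds += 1
    return cells[0], rounds

assert crew_or_rounds([0] * 63 + [1]) == (1, 6)  # 64 bits -> 6 rounds
```

This uses ⌈log₂ n⌉ rounds with n/2 active processors per round; the paper's point is that the leading constant can be pushed down toward the ≈0.72 log₂ n lower bound while keeping the processor count small.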