A Comparison of Sorting Algorithms for the Connection Machine CM-2
Cited by 173 (6 self)
Abstract
We have implemented three parallel sorting algorithms on the Connection Machine Supercomputer model CM-2: Batcher's bitonic sort, a parallel radix sort, and a sample sort similar to Reif and Valiant's flashsort. We have also evaluated the implementation of many other sorting algorithms proposed in the literature. Our computational experiments show that the sample sort algorithm, which is a theoretically efficient "randomized" algorithm, is the fastest of the three algorithms on large data sets. On a 64K-processor CM-2, our sample sort implementation can sort $32 \times 10^6$ 64-bit keys in 5.1 seconds, which is over 10 times faster than the CM-2 library sort. Our implementation of radix sort, although not as fast on large data sets, is deterministic, much simpler to code, stable, faster with small keys, and faster on small data sets (few elements per processor). Our implementation of bitonic sort, which is pipelined to use all the hypercube wires simultaneously, is the least efficient of the three on large data sets, but is the most efficient on small data sets, and is considerably more space efficient. This paper analyzes the three algorithms in detail and discusses many practical issues that led us to the particular implementations.
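The abstract credits radix sort with being deterministic, stable, and simple to code. The following is a minimal sequential sketch of least-significant-digit radix sort on 64-bit keys, assuming 8-bit digits; the digit width and the name `radix_sort_u64` are our own choices for illustration, and the CM-2 implementation, which spreads keys across processors, is not reproduced. Each pass is a counting sort that scatters keys in input order, which is where stability comes from.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Sketch: LSD radix sort of 64-bit keys with 8-bit digits (illustrative
   parameters, not taken from the paper). Every pass is a stable counting
   sort on one digit, so after eight passes the keys are fully sorted.
   Returns 0 on success, -1 if the scratch buffer cannot be allocated. */
int radix_sort_u64(uint64_t *keys, size_t n)
{
    enum { DIGIT_BITS = 8, BUCKETS = 1 << DIGIT_BITS };

    if (n < 2)
        return 0;                       /* nothing to sort */
    uint64_t *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        return -1;

    for (unsigned shift = 0; shift < 64; shift += DIGIT_BITS) {
        size_t count[BUCKETS + 1] = {0};

        /* Histogram of the current digit, offset by one slot. */
        for (size_t i = 0; i < n; i++)
            count[((keys[i] >> shift) & (BUCKETS - 1)) + 1]++;

        /* Prefix sums: count[d] becomes the first output slot for digit d. */
        for (size_t d = 0; d < BUCKETS; d++)
            count[d + 1] += count[d];

        /* Scatter in input order; equal digits keep their relative order. */
        for (size_t i = 0; i < n; i++)
            tmp[count[(keys[i] >> shift) & (BUCKETS - 1)]++] = keys[i];

        memcpy(keys, tmp, n * sizeof *keys);
    }
    free(tmp);
    return 0;
}
```

With 8-bit digits the sketch always makes eight passes over the data; keys known to fit in fewer bits would need proportionally fewer passes (the loop bound would shrink accordingly), in line with the abstract's observation that radix sort is faster with small keys.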
Online algorithms for path selection in a nonblocking network
SIAM Journal on Computing, 1996
Cited by 63 (14 self)
Abstract
This paper presents the first optimal-time algorithms for path selection in an optimal-size nonblocking network. In particular, we describe an N-input, N-output nonblocking network with O(N log N) bounded-degree nodes, and an algorithm that can satisfy any request for a connection or disconnection between an input and an output in O(log N) bit steps, even if many requests are made at once. Viewed in a telephone switching context, the algorithm can put through any set of calls among N parties in O(log N) bit steps, even if many calls are placed simultaneously. Parties can hang up and call again whenever they like; every call is still put through O(log N) bit steps after being placed. Viewed in a distributed memory machine context, our algorithm allows any processor to access any idle block of memory within O(log N) bit steps, no matter what other connections have been made previously or are being made simultaneously.
Small-Depth Counting Networks
1992
Cited by 41 (2 self)
Abstract
Generalizing the notion of a sorting network, Aspnes, Herlihy, and Shavit recently introduced a class of so-called "counting" networks and established an $O(\lg^2 n)$ upper bound on the depth complexity of such networks. Their work was motivated by a number of practical applications arising in the domain of asynchronous shared memory machines. This paper continues the analysis of counting networks, providing a number of new upper bounds. In particular, we present an explicit construction of an $O(c^{\lg^* n} \lg n)$-depth counting network, a randomized construction of an $O(\lg n)$-depth network (that works with extremely high probability), and, using the random construction, an existential proof of a deterministic $O(\lg n)$-depth network. The latter result matches the trivial $\Omega(\lg n)$-depth lower bound to within a constant factor.
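To make the "counting" property concrete: a counting network is built from balancers, toggles that send arriving tokens alternately to their top and bottom output wires, and it must leave its output counts in the step property (for $i < j$, $0 \le y_i - y_j \le 1$). The sketch below hard-codes the depth-3, width-4 bitonic counting network of Aspnes, Herlihy, and Shavit and checks the step property; it illustrates the definition only, not the smaller-depth constructions of this paper. The names `route_token` and `has_step_property` are hypothetical, and tokens are sent one at a time (a sequential execution).

```c
#define WIDTH 4
#define DEPTH 3

/* A balancer connects two wires. A token leaves on 'top' if the toggle is 0,
   on 'bottom' if it is 1, and the toggle flips after each token. */
typedef struct { int top, bottom, toggle; } Balancer;

/* Width-4 bitonic counting network: two width-2 networks, then a merger. */
static Balancer net[DEPTH][WIDTH / 2] = {
    {{0, 1, 0}, {2, 3, 0}},
    {{0, 3, 0}, {1, 2, 0}},
    {{0, 1, 0}, {2, 3, 0}},
};

/* Number of tokens that have exited on each output wire. */
static int exits[WIDTH];

/* Push one token in on wire w (0..WIDTH-1); return its output wire. */
int route_token(int w)
{
    for (int layer = 0; layer < DEPTH; layer++) {
        for (int k = 0; k < WIDTH / 2; k++) {
            Balancer *b = &net[layer][k];
            if (w == b->top || w == b->bottom) {  /* each wire meets one balancer per layer */
                w = b->toggle ? b->bottom : b->top;
                b->toggle ^= 1;
                break;
            }
        }
    }
    exits[w]++;
    return w;
}

/* Step property: each wire has the same count as the next one, or one more. */
int has_step_property(void)
{
    for (int i = 0; i + 1 < WIDTH; i++) {
        int d = exits[i] - exits[i + 1];
        if (d < 0 || d > 1)
            return 0;
    }
    return exits[0] - exits[WIDTH - 1] <= 1;
}
```

Because the tokens here pass through one at a time, the step property forces the k-th token to leave on wire (k - 1) mod 4, which is what lets a counting network serve as a shared counter; the interesting case, which the sketch does not model, is concurrent traversal.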
Fast Algorithms for Bit-Serial Routing on a Hypercube
1991
Cited by 36 (9 self)
Abstract
In this paper, we describe an O(log N)-bit-step randomized algorithm for bit-serial message routing on a hypercube. The result is asymptotically optimal, and improves upon the best previously known algorithms by a logarithmic factor. The result also solves the problem of online circuit switching in an O(1)-dilated hypercube (i.e., the problem of establishing edge-disjoint paths between the nodes of the dilated hypercube for any one-to-one mapping). Our algorithm is adaptive, and we show that this is necessary to achieve the logarithmic speedup. We generalize the Borodin-Hopcroft lower bound on oblivious routing by proving that any randomized oblivious algorithm on a polylogarithmic-degree network requires at least $\Omega(\log^2 N / \log\log N)$ bit steps with high probability for almost all permutations.
1 Introduction
Substantial effort has been devoted to the study of store-and-forward packet routing algorithms for hypercubic networks. The fastest algorithms are randomized, and c...
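For contrast with the adaptive algorithm described here, the simplest oblivious scheme on a hypercube is dimension-order ("e-cube") routing: a message corrects the bits in which its current node and its destination differ, one dimension at a time, lowest first, so its path depends only on its source and destination. The sketch below computes such a path; it is not the authors' algorithm, only an example of the class of oblivious schemes that the Borodin-Hopcroft bound and its generalization here concern. The function names `ecube_next_hop` and `ecube_path_length` are hypothetical.

```c
#include <stdint.h>

/* Dimension-order (e-cube) routing on a d-dimensional hypercube whose nodes
   are labeled 0 .. 2^d - 1; neighbors differ in exactly one bit.
   Returns the next node on the path from cur toward dst, or cur itself
   if the message has arrived. */
uint32_t ecube_next_hop(uint32_t cur, uint32_t dst)
{
    uint32_t diff = cur ^ dst;              /* dimensions still to be corrected */
    if (diff == 0)
        return cur;
    uint32_t lowest = diff & (~diff + 1u);  /* lowest set bit of diff */
    return cur ^ lowest;                    /* cross that dimension */
}

/* Number of hops on the e-cube path: the Hamming distance between labels. */
int ecube_path_length(uint32_t src, uint32_t dst)
{
    int hops = 0;
    while (src != dst) {
        src = ecube_next_hop(src, dst);
        hops++;
    }
    return hops;
}
```

For example, a message from node 0 to node 13 (binary 1101) visits 1, 5, and then 13, crossing dimensions 0, 2, and 3 in that order.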
Analysis of Shellsort and related algorithms
ESA ’96: Fourth Annual European Symposium on Algorithms, 1996
Cited by 26 (0 self)
Abstract
This is an abstract of a survey talk on the theoretical and empirical studies that have been done over the past four decades on the Shellsort algorithm and its variants. The discussion includes: upper bounds, including linkages to number-theoretic properties of the algorithm; lower bounds on Shellsort and Shellsort-based networks; average-case results; proposed probabilistic sorting networks based on the algorithm; and a list of open problems.
1 Shellsort
The basic Shellsort algorithm is among the earliest sorting methods to be discovered (by D. L. Shell in 1959 [36]) and is among the easiest to implement, as exhibited by the following C code for sorting an array a[l], ..., a[r]:
    shellsort(itemType a[], int l, int r)
    { int i, j, h; itemType v;
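The C fragment quoted in this abstract stops after its declarations. As a hedged stand-in, and not the code from the talk, the sketch below completes the same signature using Knuth's increments 1, 4, 13, 40, ... (h = 3h + 1); that increment sequence is our assumption, and the choice of increments is precisely what the studies surveyed here analyze. `itemType` is assumed to be `int`.

```c
/* itemType is assumed to be int for this sketch. */
typedef int itemType;

/* Shellsort of a[l..r] (inclusive bounds, as in the quoted signature),
   using Knuth's increments 1, 4, 13, 40, ... (an assumption, not the
   increments from the talk). Each pass is an insertion sort on elements
   h apart; the final pass with h = 1 is a plain insertion sort. */
void shellsort(itemType a[], int l, int r)
{
    int i, j, h;
    itemType v;

    if (r <= l)
        return;

    /* Start from the largest increment not exceeding about a ninth of the range. */
    for (h = 1; h <= (r - l) / 9; h = 3 * h + 1)
        ;

    for (; h > 0; h /= 3) {
        for (i = l + h; i <= r; i++) {
            v = a[i];
            j = i;
            /* Shift larger elements h positions to the right. */
            while (j >= l + h && a[j - h] > v) {
                a[j] = a[j - h];
                j -= h;
            }
            a[j] = v;
        }
    }
}
```

As with the quoted signature, `shellsort(a, 0, n - 1)` sorts an n-element array; the bounds are inclusive, so a subrange can be sorted in place.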
An Experimental Analysis of Parallel Sorting Algorithms
Theory of Computing Systems, 1998
Cited by 22 (2 self)
Abstract
We have developed a methodology for predicting the performance of parallel algorithms on real parallel machines. The methodology consists of two steps. First, we characterize a machine by enumerating the primitive operations that it is capable of performing, along with the cost of each operation. Next, we analyze an algorithm by making a precise count of the number of times the algorithm performs each type of operation. We have used this methodology to evaluate many of the parallel sorting algorithms proposed in the literature. Of these, we selected the three most promising (Batcher’s bitonic sort, a parallel radix sort, and a sample sort similar to Reif and Valiant’s flashsort) and implemented them on the Connection Machine model CM-2. This paper analyzes the three algorithms in detail and discusses the issues that led us to our particular implementations. On the CM-2, the predicted performance of the algorithms closely matches the observed performance, and hence our methodology can be used to tune the algorithms for optimal performance. Although our programs were designed for the CM-2, our conclusions about the merits of the three algorithms apply to other parallel machines as well.
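The operation-counting step can be illustrated on bitonic sort, whose sequence of comparisons is fixed in advance. The sketch below is a sequential simulation of Batcher's bitonic sorting network on $n = 2^k$ keys (the power-of-two restriction and the name `bitonic_sort` are our assumptions; this is not the CM-2 implementation). It returns the number of compare-exchange operations, which is $\frac{n}{2} \cdot \frac{\lg n\,(\lg n + 1)}{2}$ for every input. Each stage pairs index i with i XOR j, so partners differ in exactly one bit, which is why every stage maps onto one dimension of hypercube wires. Multiplying such a count by a per-operation cost measured on a given machine is the kind of prediction the abstract describes.

```c
#include <stddef.h>

/* Sequential simulation of Batcher's bitonic sorting network on n keys,
   where n must be a power of two (an assumption of this sketch).
   Returns the number of compare-exchange operations performed. */
size_t bitonic_sort(int *a, size_t n)
{
    size_t ops = 0;
    for (size_t k = 2; k <= n; k <<= 1) {            /* size of sequences being merged */
        for (size_t j = k >> 1; j > 0; j >>= 1) {    /* partner distance: one hypercube dimension */
            for (size_t i = 0; i < n; i++) {
                size_t partner = i ^ j;
                if (partner <= i)
                    continue;                        /* handle each pair once */
                int ascending = (i & k) == 0;        /* sort direction of this block */
                if (ascending ? a[i] > a[partner] : a[i] < a[partner]) {
                    int t = a[i];
                    a[i] = a[partner];
                    a[partner] = t;
                }
                ops++;                               /* one compare-exchange */
            }
        }
    }
    return ops;
}
```

For n = 8 the count is 24: six stages of four compare-exchanges each, regardless of the input order.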
Optimal Routing of Parentheses on the Hypercube
In Proceedings of the Symposium on Parallel Architectures and Algorithms, 1994
Cited by 14 (6 self)
Abstract
We consider a new class of routing requests, or partial permutations, for which we give optimal online routing algorithms on the hypercube and the shuffle-exchange network. For well-formed words of parentheses, our algorithm establishes communication between all matching pairs in logarithmic time. It can be applied to the membership problem for Dyck languages and to a number of problems for algebraic expressions.
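The routing requests in question are easy to state sequentially: each opening parenthesis must communicate with its matching closing one. The sketch below computes that pairing with a stack and rejects words that are not well formed (that is, not in the Dyck language); it only defines the request set and does not reproduce the hypercube routing algorithm. The function name `match_parens` is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* For a word over '(' and ')', fill match[i] with the index of the
   parenthesis paired with position i. Returns 1 if the word is well formed
   (a member of the Dyck language) and 0 otherwise; any other character,
   or an allocation failure, also yields 0. */
int match_parens(const char *s, int *match)
{
    size_t n = strlen(s);
    int *stack = malloc((n + 1) * sizeof *stack);
    size_t top = 0;
    int ok = 1;

    if (stack == NULL)
        return 0;
    for (size_t i = 0; i < n && ok; i++) {
        if (s[i] == '(') {
            stack[top++] = (int)i;          /* remember the open position */
        } else if (s[i] == ')' && top > 0) {
            int open = stack[--top];
            match[open] = (int)i;           /* the pair (open, i) must communicate */
            match[i] = open;
        } else {
            ok = 0;                         /* unmatched ')' or other symbol */
        }
    }
    free(stack);
    return ok && top == 0;                  /* no '(' left unmatched */
}
```

For "(()())" the pairs are (1, 2), (3, 4), and (0, 5); in the parallel setting every such pair is a routing request to be served simultaneously.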
Hypercubic Sorting Networks
SIAM J. Comput., 1998
Cited by 14 (2 self)
Abstract
This paper provides an analysis of a natural d-round tournament over $n = 2^d$ players, and demonstrates that the tournament possesses a surprisingly strong ranking property. The ranking property of this tournament is used to design efficient sorting algorithms for a variety of different models of parallel computation: (i) a comparator network of depth $c \cdot \lg n$, $c \approx 7.44$, that sorts the vast majority of the n! possible input permutations, (ii) an O(lg n)-depth hypercubic comparator network that sorts the vast majority of permutations, (iii) a hypercubic sorting network with nearly logarithmic depth, (iv) an O(lg n)-time randomized sorting algorithm for any hypercubic machine (other such algorithms have been previously discovered, but this algorithm has a significantly smaller failure probability than any previously known algorithm), and (v) a randomized algorithm for sorting n O(m)-bit records on an (n lg n)-node omega machine in O(m + lg n) bit steps.
Key words. parallel sort...
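One natural reading of the d-round tournament, and the one assumed in this sketch, is that in each round players with identical win-loss records so far play each other. Placing the winner of each game at the lower index and the loser at the upper index along hypercube dimension r makes bit r of a player's final position record the outcome of round r, so the whole tournament becomes a depth-d hypercubic comparator network. The sketch plays it on $n = 2^d$ keys, with the larger key winning; the name `tournament` is hypothetical, and the paper's ranking analysis is not reproduced.

```c
#include <stddef.h>

/* Play a d-round tournament on n = 2^d keys stored in a[0..n-1].
   In round r, positions i and i | (1 << r) (with bit r of i clear) hold
   players with identical records so far; the larger key wins and is placed
   at the lower position. Afterwards bit r of each position is 1 exactly
   when that player lost round r. */
void tournament(int *a, unsigned d)
{
    size_t n = (size_t)1 << d;
    for (unsigned r = 0; r < d; r++) {
        size_t bit = (size_t)1 << r;
        for (size_t i = 0; i < n; i++) {
            if (i & bit)
                continue;               /* visit each pair once, from its lower end */
            if (a[i] < a[i | bit]) {    /* winner moves to the lower position */
                int t = a[i];
                a[i] = a[i | bit];
                a[i | bit] = t;
            }
        }
    }
}
```

Only the two extreme positions are guaranteed in general (the maximum wins every game and ends at position 0, the minimum ends at position n - 1); the ranking property studied in the paper concerns how much a player's record reveals about the rank of everyone else.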
Small-Depth Counting Networks and Related Topics
1994
Cited by 11 (0 self)
Abstract
In [5], Aspnes, Herlihy, and Shavit generalized the notion of a sorting network by introducing a class of so-called "counting" networks and establishing an $O(\lg^2 n)$ upper bound on the depth complexity of such networks. Their work was motivated by a number of practical applications arising in the domain of asynchronous shared memory machines. In this thesis, we continue the analysis of counting networks and produce a number of new upper bounds on their depths. Our results are predicated on the rich combinatorial structure that counting networks possess. In particular, we present a simple explicit construction of an $O(\lg n \lg\lg n)$-depth counting network, a randomized construction of an $O(\lg n)$-depth network (which works with extremely high probability), and an existential proof of a deterministic $O(\lg n)$-depth network. The latter result matches the trivial $\Omega(\lg n)$-depth lower bound to within a constant factor. Our main result is a uniform polynomial-time construction of an $O(\lg n)$-depth counting network, which depends heavily on the existential result but makes use of extractor functions introduced in [25]. Using the extractor, we construct regular high-degree bipartite graphs with extremely strong expansion properties. We believe this result is of independent interest.
Shared-Memory Simulations on a Faulty-Memory DMM
1996
Cited by 9 (1 self)
Abstract
... this paper are synchronous, and time performance is our major efficiency criterion. We consider a DMM with faulty memory words; otherwise, everything is assumed to be operational. In particular, the communication between the processors and the MUs is reliable, and a processor may always attempt to obtain access to any MU and, having been granted it, may access any memory word in it, even if all of them are faulty. The only restriction on the distribution of faults among memory words is that their total number is bounded from above by a fraction of the total number of memory words in all the MUs. In particular, some MUs may contain only operational cells, some only faulty cells, and some mixed cells. This report presents fast simulations of the PRAM on a DMM with faulty memory.