Results 1  10
of
14
Optimal and Sublogarithmic Time Randomized Parallel Sorting Algorithms
 SIAM Journal on Computing
, 1989
"... .We assume a parallel RAM model which allows both concurrent reads and concurrent writes of a global memory. Our main result is an optimal randomized parallel algorithm for INTEGER SORT (i.e., for sorting n integers in the range [1; n]). Our algorithm costs only logarithmic time and is the first kno ..."
Abstract

Cited by 64 (12 self)
 Add to MetaCart
.We assume a parallel RAM model which allows both concurrent reads and concurrent writes of a global memory. Our main result is an optimal randomized parallel algorithm for INTEGER SORT (i.e., for sorting n integers in the range [1; n]). Our algorithm costs only logarithmic time and is the first known that is optimal: the product of its time and processor bounds is upper bounded by a linear function of the input size. We also give a deterministic sublogarithmic time algorithm for prefix sum. In addition we present a sublogarithmic time algorithm for obtaining a random permutation of n elements in parallel. And finally, we present sublogarithmic time algorithms for GENERAL SORT and INTEGER SORT. Our sublogarithmic GENERAL SORT algorithm is also optimal. Key words. Randomized algorithms, parallel sorting, parallel random access machines, random permutations, radix sort, prefix sum, optimal algorithms. AMS(MOS) subject classifications. 68Q25. 1 A preliminary version of this paper ...
Efficient Parallel Evaluation of Straightline Code and Arithmetic Circuits
 SIAM J. Comput
, 1988
"... A new parallel algorithm is given to evaluate a straight line program. The algorithm evaluates a program over a commutative semiring R of degree d and size n in time O(log n(log nd)) using M(n) processors, where M(n) is the number of processors required for multiplying n \Theta n matrices over the ..."
Abstract

Cited by 31 (5 self)
 Add to MetaCart
A new parallel algorithm is given to evaluate a straight line program. The algorithm evaluates a program over a commutative semiring R of degree d and size n in time O(log n(log nd)) using M(n) processors, where M(n) is the number of processors required for multiplying n \Theta n matrices over the semiring R in O(log n) time. Appears in SIAM J. Comput., 17/4, pp. 687695 (1988). Preliminary version of this paper appeared in [6]. y Research supported in part by National Science Foundation Grant MCS800756 A01. z Research supported by NSF under ECS8404866, the Semiconductor Research Corporation under RSCH 84060496, and by an IBM Faculty Development Award. x Research Supported in part by NSF Grant DCR8504391 and by an IBM Faculty Development Award. 1 INTRODUCTION 1 1 Introduction In this paper we consider the problem of dynamic evaluation of a straight line program in parallel. This is a generalization of the result of Valiant et al [10]. They consider the problem of ta...
Explicit MultiThreading (XMT) Bridging Models for Instruction Parallelism
 Proc. 10th ACM Symposium on Parallel Algorithms and Architectures (SPAA
, 1998
"... The paper envisions an extension to a standard instruction set which efficiently implements PRAM algorithms using explicit multithreaded instructionlevel parallelism (ILP); that is, Explicit MultiThreading (XMT), a finegrained computational paradigm covering the spectrum from algorithms throu ..."
Abstract

Cited by 29 (12 self)
 Add to MetaCart
The paper envisions an extension to a standard instruction set which efficiently implements PRAM algorithms using explicit multithreaded instructionlevel parallelism (ILP); that is, Explicit MultiThreading (XMT), a finegrained computational paradigm covering the spectrum from algorithms through architecture to implementation is introduced; new elements are added where needed. The more detailed presentation is by way of a bridging model. Among other things, a bridging model provides a design space for algorithm designers and programmers, as well as a design space for computer architects. It is convenient to describe our wider vision regarding "parallelcomputingonachip" as a twostage development and therefore two bridging models are presented: Spawnbased multithreading (SpawnMT) and Elastic multithreading (EMT). The case for SpawnMT (or, alternatively, EMT) as a bridging model relies on the following evidence. (1) SpawnMT comprises an "instruction set level", wh...
Parallel Algorithmic Techniques for Combinatorial Computation
 Ann. Rev. Comput. Sci
, 1988
"... this paper and supplied many helpful comments. This research was supported in part by NSF grants DCR8511713, CCR8605353, and CCR8814977, and by DARPA contract N0003984C0165. ..."
Abstract

Cited by 29 (3 self)
 Add to MetaCart
this paper and supplied many helpful comments. This research was supported in part by NSF grants DCR8511713, CCR8605353, and CCR8814977, and by DARPA contract N0003984C0165.
On Parallel Hashing and Integer Sorting
, 1991
"... The problem of sorting n integers from a restricted range [1::m], where m is superpolynomial in n, is considered. An o(n log n) randomized algorithm is given. Our algorithm takes O(n log log m) expected time and O(n) space. (Thus, for m = n polylog(n) we have an O(n log log n) algorithm.) The al ..."
Abstract

Cited by 25 (9 self)
 Add to MetaCart
The problem of sorting n integers from a restricted range [1::m], where m is superpolynomial in n, is considered. An o(n log n) randomized algorithm is given. Our algorithm takes O(n log log m) expected time and O(n) space. (Thus, for m = n polylog(n) we have an O(n log log n) algorithm.) The algorithm is parallelizable. The resulting parallel algorithm achieves optimal speed up. Some features of the algorithm make us believe that it is relevant for practical applications. A result of independent interest is a parallel hashing technique. The expected construction time is logarithmic using an optimal number of processors, and searching for a value takes O(1) time in the worst case. This technique enables drastic reduction of space requirements for the price of using randomness. Applicability of the technique is demonstrated for the parallel sorting algorithm, and for some parallel string matching algorithms. The parallel sorting algorithm is designed for a strong and non standard mo...
Designing Practical Efficient Algorithms for Symmetric Multiprocessors (Extended Abstract)
 IN ALGORITHM ENGINEERING AND EXPERIMENTATION (ALENEXâ€™99
, 1999
"... Symmetric multiprocessors (SMPs) dominate the highend server market and are currently the primary candidate for constructing large scale multiprocessor systems. Yet, the design of efficient parallel algorithms for this platform currently poses several challenges. In this paper, we present a comput ..."
Abstract

Cited by 20 (0 self)
 Add to MetaCart
Symmetric multiprocessors (SMPs) dominate the highend server market and are currently the primary candidate for constructing large scale multiprocessor systems. Yet, the design of efficient parallel algorithms for this platform currently poses several challenges. In this paper, we present a computational model for designing efficient algorithms for symmetric multiprocessors. We then use this model to create efficient solutions to two widely different types of problems  linked list prefix computations and generalized sorting. Our novel algorithm for prefix computations builds upon the sparse ruling set approach of ReidMiller and Blelloch. Besides being somewhat simpler and requiring nearly half the number of memory accesses, we can bound our complexity with high probabi...
Prefix computations on symmetric multiprocessors
 Journal of Parallel and Distributed Computing
, 1998
"... We introduce a new prefix computation algorithm on linked lists which builds upon the sparse ruling set approach of ReidMiller and Blelloch. Besides being somewhat simpler and requiring nearly half the number of memory accesses, we can bound our complexity with high probability instead of merely on ..."
Abstract

Cited by 17 (1 self)
 Add to MetaCart
We introduce a new prefix computation algorithm on linked lists which builds upon the sparse ruling set approach of ReidMiller and Blelloch. Besides being somewhat simpler and requiring nearly half the number of memory accesses, we can bound our complexity with high probability instead of merely on average. Moreover, whereas ReidMiller and Blelloch targeted their algorithm for implementation on a vector multiprocessor architecture, we develop our algorithm for implementation on the symmetric multiprocessor architecture (SMP). These symmetric multiprocessors dominate the highend server market and are currently the primary candidate for constructing large scale multiprocessor systems. Our prefix computation algorithm was implemented in C using POSIX threads and run on four symmetric multiprocessors: the HPConvex Exemplar (SClass), the IBM SP2 (High Node), the SGI Power Challenge, and the DEC AlphaServer. We ran our code using a variety of benchmarks which we identified to examine the dependence of our algorithm on memory access patterns. For some problems,
Can Parallel Algorithms Enhance Serial Implementation? (Extended Abstract)
, 1996
"... The broad thesis presented in this paper suggests that the serial emulation of a parallel algorithm has the potential advantage of running on a serial machine faster than a standard serial algorithm for the same problem. It is too early to reach definite conclusions regarding the significance of th ..."
Abstract

Cited by 14 (4 self)
 Add to MetaCart
The broad thesis presented in this paper suggests that the serial emulation of a parallel algorithm has the potential advantage of running on a serial machine faster than a standard serial algorithm for the same problem. It is too early to reach definite conclusions regarding the significance of this thesis. However, using some imagination, validity of the thesis and some arguments supporting it may lead to several farreaching outcomes: (1) Reliance on "predictability of reference" in the design of computer systems will increase. (2) Parallel algorithms will be taught as part of the standard computer science and engineering undergraduate curriculum irrespective of whether (or when) parallel processing will become ubiquitous in the generalpurpose computing world. (3) A strategic agenda for highperformance parallel computing: A multistage agenda, which in no stage compromises userfriendliness of the programmer 's...
Structural Parallel Algorithmics
, 1991
"... The first half of the paper is a general introduction which emphasizes the central role that the PRAM model of parallel computation plays in algorithmic studies for parallel computers. Some of the collective knowledgebase on nonnumerical parallel algorithms can be characterized in a structural way ..."
Abstract

Cited by 11 (4 self)
 Add to MetaCart
The first half of the paper is a general introduction which emphasizes the central role that the PRAM model of parallel computation plays in algorithmic studies for parallel computers. Some of the collective knowledgebase on nonnumerical parallel algorithms can be characterized in a structural way. Each structure relates a few problems and technique to one another from the basic to the more involved. The second half of the paper provides a bird'seye view of such structures for: (1) list, tree and graph parallel algorithms; (2) very fast deterministic parallel algorithms; and (3) very fast randomized parallel algorithms. 1 Introduction Parallelism is a concern that is missing from "traditional" algorithmic design. Unfortunately, it turns out that most efficient serial algorithms become rather inefficient parallel algorithms. The experience is that the design of parallel algorithms requires new paradigms and techniques, offering an exciting intellectual challenge. We note that it had...