Results 1-10 of 63
Applying parallel computation algorithms in the design of serial algorithms
J. ACM, 1983
Cited by 232 (7 self)
The goal of this paper is to point out that analyses of parallelism in computational problems have practical implications even when multiprocessor machines are not available. This is true because, in many cases, a good parallel algorithm for one problem may turn out to be useful for designing an efficient serial algorithm for another problem. A unified framework for cases like this is presented. Particular cases, which are discussed in this paper, provide motivation for examining parallelism in sorting, selection, minimum-spanning-tree, shortest route, max-flow, and matrix multiplication problems, as well as in scheduling and locational problems.
Programming Parallel Algorithms
1996
Cited by 191 (9 self)
In the past 20 years there has been tremendous progress in developing and analyzing parallel algorithms. Researchers have developed efficient parallel algorithms to solve most problems for which efficient sequential solutions are known. Although some of these algorithms are efficient only in a theoretical framework, many are quite efficient in practice or have key ideas that have been used in efficient implementations. This research on parallel algorithms has not only improved our general understanding of parallelism but in several cases has led to improvements in sequential algorithms. Unfortunately there has been less success in developing good languages for programming parallel algorithms, particularly languages that are well suited for teaching and prototyping algorithms. There has been a large gap between languages
The NP-completeness column: an ongoing guide
Journal of Algorithms, 1985
Cited by 188 (0 self)
This is the nineteenth edition of a (usually) quarterly column that covers new developments in the theory of NP-completeness. The presentation is modeled on that used by M. R. Garey and myself in our book "Computers and Intractability: A Guide to the Theory of NP-Completeness," W. H. Freeman & Co., New York, 1979 (hereinafter referred to as "[G&J]"; previous columns will be referred to by their dates). A background equivalent to that provided by [G&J] is assumed, and, when appropriate, cross-references will be given to that book and the list of problems (NP-complete and harder) presented there. Readers who have results they would like mentioned (NP-hardness, PSPACE-hardness, polynomial-time-solvability, etc.) or open problems they would like publicized, should
Scans as Primitive Parallel Operations
IEEE Transactions on Computers, 1987
Cited by 157 (12 self)
In most parallel random-access machine (PRAM) models, memory references are assumed to take unit time. In practice, and in theory, certain scan operations, also known as prefix computations, can be executed in no more time than these parallel memory references. This paper outlines an extensive study of the effect of including such scan operations as unit-time primitives in the PRAM models. The study concludes that the primitives improve the asymptotic running time of many algorithms by an O(lg n) factor, greatly simplify the description of many algorithms, and are significantly easier to implement than memory references. We therefore argue that the algorithm designer should feel free to use these operations as if they were as cheap as a memory reference. This paper describes five algorithms that clearly illustrate how the scan primitives can be used in algorithm design: a radix-sort algorithm, a quicksort algorithm, a minimum-spanning-tree algorithm, a line-drawing algorithm and a mergi...
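To make the idea of a scan primitive concrete, here is a minimal sequential sketch in Python. It is an illustration only and not the paper's code: the names `exclusive_scan`, `split`, and `radix_sort` are hypothetical, and the loops run serially, whereas the abstract treats each scan as a single unit-time parallel step. The sketch shows how a radix sort can be assembled from prefix sums.

```python
from itertools import accumulate


def exclusive_scan(values):
    """Exclusive +-scan: out[i] is the sum of values[0..i-1]."""
    if not values:
        return []
    return [0] + list(accumulate(values))[:-1]


def split(keys, flags):
    """Stable partition built from two scans: 0-flag keys first, then 1-flag keys."""
    zeros = [1 - f for f in flags]
    idx_zero = exclusive_scan(zeros)                 # slot of each 0-flag key
    total_zeros = sum(zeros)
    idx_one = [total_zeros + s for s in exclusive_scan(flags)]  # slot of each 1-flag key
    out = [None] * len(keys)
    for i, key in enumerate(keys):                   # a parallel permute step on a PRAM
        out[idx_one[i] if flags[i] else idx_zero[i]] = key
    return out


def radix_sort(keys, bits):
    """Least-significant-bit-first radix sort of non-negative integers below 2**bits."""
    for b in range(bits):
        keys = split(keys, [(k >> b) & 1 for k in keys])
    return keys
```

For example, `radix_sort([5, 7, 3, 1, 4, 2, 7, 2], 3)` performs three `split` passes, one per bit, and each pass needs only two scans plus one permutation.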
merging, and sorting in parallel models of computation
in Proc. 14th Annual ACM Sympos. on Theory of Comput., 1982
Cited by 105 (3 self)
A variety of models have been proposed for the study of synchronous parallel computation. These models are reviewed and some prototype problems are studied further. Two classes of models are recognized: fixed connection networks and models based on a shared memory. Routing and sorting are prototype problems for the networks; in particular, they provide the basis for simulating the more powerful shared memory models. It is shown that a simple but important class of deterministic strategies (oblivious routing) is necessarily inefficient with respect to worst-case analysis. Routing can be viewed as a special case of sorting, and the existence of an O(log n) sorting algorithm for some n-processor fixed connection network has only recently been established by Ajtai, Komlos, and Szemeredi ("15th ACM Sympos. on Theory of Comput.," Boston, Mass., 1983, pp. 1-9). If the more powerful class of shared memory models is considered, then it is possible to simply achieve an O(log n log log n) sort via Valiant’s parallel merging algorithm, which, it is shown, can be implemented on certain models. Within a spectrum of shared memory models, it is shown that log log n is asymptotically optimal for n processors to merge two sorted lists containing n elements. © 1985 Academic Press, Inc.
Optimal and Sublogarithmic Time Randomized Parallel Sorting Algorithms
SIAM Journal on Computing, 1989
Cited by 62 (12 self)
We assume a parallel RAM model which allows both concurrent reads and concurrent writes of a global memory. Our main result is an optimal randomized parallel algorithm for INTEGER SORT (i.e., for sorting n integers in the range [1, n]). Our algorithm costs only logarithmic time and is the first known that is optimal: the product of its time and processor bounds is upper bounded by a linear function of the input size. We also give a deterministic sublogarithmic time algorithm for prefix sum. In addition we present a sublogarithmic time algorithm for obtaining a random permutation of n elements in parallel. And finally, we present sublogarithmic time algorithms for GENERAL SORT and INTEGER SORT. Our sublogarithmic GENERAL SORT algorithm is also optimal. Key words. Randomized algorithms, parallel sorting, parallel random access machines, random permutations, radix sort, prefix sum, optimal algorithms. AMS(MOS) subject classifications. 68Q25. 1 A preliminary version of this paper ...
Optimal Doubly Logarithmic Parallel Algorithms Based On Finding All Nearest Smaller Values
1993
Cited by 37 (7 self)
The all nearest smaller values problem is defined as follows. Let A = (a_1, a_2, ..., a_n) be n elements drawn from a totally ordered domain. For each a_i, 1 ≤ i ≤ n, find the two nearest elements in A that are smaller than a_i (if such exist): the left nearest smaller element a_j (with j < i) and the right nearest smaller element a_k (with k > i). We give an O(log log n) time optimal parallel algorithm for the problem on a CRCW PRAM. We apply this algorithm to achieve optimal O(log log n) time parallel algorithms for four problems: (i) Triangulating a monotone polygon, (ii) Preprocessing for answering range minimum queries in constant time, (iii) Reconstructing a binary tree from its inorder and either preorder or postorder numberings, (iv) Matching a legal sequence of parentheses. We also show that any optimal CRCW PRAM algorithm for the triangulation problem requires Ω(log log n) time. Dept. of Computing, King's College London, The Strand, London WC2R 2LS, England. ...
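The problem statement above can be pinned down with a short sequential reference in Python. This is not the O(log log n) CRCW PRAM algorithm from the abstract. It is the standard linear-time stack-based method, included here only to show what is being computed. The function name and the convention of returning 0-based indices (with `None` when no smaller element exists) are assumptions made for this sketch.

```python
def all_nearest_smaller_values(a):
    """Return (left, right): indices of the nearest strictly smaller element on each side.

    left[i] is the largest j < i with a[j] < a[i], or None if there is none.
    right[i] is the smallest k > i with a[k] < a[i], or None if there is none.
    """
    n = len(a)
    left = [None] * n
    right = [None] * n

    stack = []  # indices whose values are strictly increasing, bottom to top
    for i in range(n):
        while stack and a[stack[-1]] >= a[i]:
            stack.pop()
        left[i] = stack[-1] if stack else None
        stack.append(i)

    stack = []  # same idea, scanning from the right end
    for i in range(n - 1, -1, -1):
        while stack and a[stack[-1]] >= a[i]:
            stack.pop()
        right[i] = stack[-1] if stack else None
        stack.append(i)

    return left, right
```

Each index is pushed and popped at most once per pass, so the sequential cost is O(n). That is the work bound an optimal parallel algorithm has to match.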
The Complexity of Computation on the Parallel Random Access Machine
1993
Cited by 32 (4 self)
PRAMs also approximate the situation where communication to and from shared memory is much more expensive than local operations, for example, where each processor is located on a separate chip and access to shared memory is through a combining network. Not surprisingly, abstract PRAMs can be much more powerful than restricted instruction set PRAMs. THEOREM 21.16 Any function of n variables can be computed by an abstract EROW PRAM in O(log n) steps using n / log_2 n processors and n / (2 log_2 n) shared memory cells. PROOF Each processor begins by reading log_2 n input values and combining them into one large value. The information known by the processors is combined in a binary-tree-like fashion. In each round, the remaining processors are grouped into pairs. In each pair, one processor communicates the information it knows about the input to the other processor and then leaves the computation. After ⌈log_2 n⌉ rounds, one processor knows all n input values. Then this processor computes th...
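The combining scheme in the proof can be sketched as a small sequential simulation in Python. This is a hedged illustration, not a PRAM implementation. The function name `combine_rounds` is hypothetical, each "processor" is modeled as a tuple of the inputs it knows, the block size is taken as the floor of log_2 n, and the exclusive-read and owner-write restrictions are not modeled.

```python
import math


def combine_rounds(inputs):
    """Simulate the pairwise combining from the proof; return (all_inputs, rounds)."""
    n = len(inputs)
    if n == 0:
        raise ValueError("need at least one input value")
    # Each simulated processor first reads a block of about log2(n) inputs.
    block = max(1, int(math.log2(n)))
    known = [tuple(inputs[i:i + block]) for i in range(0, n, block)]

    rounds = 0
    while len(known) > 1:
        merged = []
        # Pair up the remaining processors; one member of each pair hands over
        # what it knows and leaves the computation.
        for j in range(0, len(known) - 1, 2):
            merged.append(known[j] + known[j + 1])
        if len(known) % 2:
            merged.append(known[-1])  # an unpaired processor waits for the next round
        known = merged
        rounds += 1
    return known[0], rounds
```

With 16 inputs, four simulated processors each start with four values, and two rounds of pairing leave a single processor holding all 16, within the ⌈log_2 n⌉ rounds stated in the proof.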
Explicit Multi-Threading (XMT) Bridging Models for Instruction Parallelism
Proc. 10th ACM Symposium on Parallel Algorithms and Architectures (SPAA), 1998
Cited by 29 (12 self)
The paper envisions an extension to a standard instruction set which efficiently implements PRAM algorithms using explicit multi-threaded instruction-level parallelism (ILP); that is, Explicit Multi-Threading (XMT), a fine-grained computational paradigm covering the spectrum from algorithms through architecture to implementation, is introduced; new elements are added where needed. The more detailed presentation is by way of a bridging model. Among other things, a bridging model provides a design space for algorithm designers and programmers, as well as a design space for computer architects. It is convenient to describe our wider vision regarding "parallel-computing-on-a-chip" as a two-stage development and therefore two bridging models are presented: Spawn-based multi-threading (Spawn-MT) and Elastic multi-threading (EMT). The case for Spawn-MT (or, alternatively, EMT) as a bridging model relies on the following evidence. (1) Spawn-MT comprises an "instruction set level", wh...
Parallel Algorithmic Techniques for Combinatorial Computation
Ann. Rev. Comput. Sci., 1988
Cited by 29 (3 self)
... this paper and supplied many helpful comments. This research was supported in part by NSF grants DCR-8511713, CCR-8605353, and CCR-8814977, and by DARPA contract N00039-84-C-0165.