Results 1 - 9 of 9
The Complexity of Computation on the Parallel Random Access Machine
, 1993
Cited by 32 (4 self)
PRAMs also approximate the situation where communication to and from shared memory is much more expensive than local operations, for example, where each processor is located on a separate chip and access to shared memory is through a combining network. Not surprisingly, abstract PRAMs can be much more powerful than restricted instruction set PRAMs. THEOREM 21.16 Any function of n variables can be computed by an abstract EROW PRAM in $O(\log n)$ steps using $n/\log_2 n$ processors and $n/(2\log_2 n)$ shared memory cells. PROOF Each processor begins by reading $\log_2 n$ input values and combining them into one large value. The information known by the processors is combined in a binary-tree-like fashion. In each round, the remaining processors are grouped into pairs. In each pair, one processor communicates the information it knows about the input to the other processor and then leaves the computation. After $\lceil \log_2 n \rceil$ rounds, one processor knows all n input values. Then this processor computes th...
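The pairwise combining in the proof sketch above can be illustrated with a small sequential simulation. This is only a sketch under stated assumptions: the function name `tree_gather`, the block size of ceil(log2 n), and the list-based bookkeeping are illustrative choices, and the code tracks which processor knows which inputs rather than modelling shared memory cells.

```python
import math


def tree_gather(inputs):
    """Simulate the pairwise combining rounds on a non-empty list of inputs.

    Returns (values known to the surviving processor, number of rounds).
    """
    if not inputs:
        raise ValueError("inputs must be non-empty")
    n = len(inputs)
    # Assumption: each processor reads ceil(log2 n) inputs (at least one).
    block = max(1, math.ceil(math.log2(n))) if n > 1 else 1
    # Each simulated processor starts out knowing one block of inputs.
    known = [list(inputs[i:i + block]) for i in range(0, n, block)]
    rounds = 0
    while len(known) > 1:
        merged = []
        # Pair up the remaining processors; the second one hands over
        # everything it knows to the first and leaves the computation.
        for i in range(0, len(known) - 1, 2):
            merged.append(known[i] + known[i + 1])
        if len(known) % 2 == 1:
            merged.append(known[-1])  # unpaired processor waits a round
        known = merged
        rounds += 1
    return known[0], rounds


values, rounds = tree_gather(list(range(16)))
```

The number of rounds is logarithmic in the number of processors, so it never exceeds the ceil(log2 n) bound quoted in the proof.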
Parallel Algorithmic Techniques for Combinatorial Computation
 Ann. Rev. Comput. Sci
, 1988
Cited by 29 (3 self)
this paper and supplied many helpful comments. This research was supported in part by NSF grants DCR8511713, CCR8605353, and CCR8814977, and by DARPA contract N0003984C0165.
Simulation of PRAM Models on Meshes
 Nordic Journal on Computing, 2(1):51
, 1994
Cited by 14 (9 self)
We analyze the complexity of simulating a PRAM (parallel random access machine) on a mesh structured distributed memory machine. By utilizing suitable algorithms for randomized hashing, routing in a mesh, and sorting in a mesh, we prove that simulation of a PRAM on a $\sqrt{N} \times \sqrt{N}$ (or $\sqrt[3]{N} \times \sqrt[3]{N} \times \sqrt[3]{N}$) mesh is possible with $O(\sqrt{N})$ (respectively $O(\sqrt[3]{N})$) delay with high probability and a relatively small constant. Furthermore, with more sophisticated simulations further speedups are achieved; experiments show delays as low as $\sqrt{N} + o(\sqrt{N})$ (respectively $\sqrt[3]{N} + o(\sqrt[3]{N})$) per N PRAM processors. These simulations compare quite favorably with PRAM simulations on butterfly and hypercube. 1 Introduction PRAM (Parallel Random Access Machine) is an abstract model of computation. It consists of N processors, each of which may have some local memory and registers, and a global shared memory of size m. A step of a PRAM is often seen to consist of...
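To see why a delay on the order of the mesh side length is natural, the following sketch hashes shared-memory addresses onto the nodes of a two-dimensional mesh and measures the hop distance a request must travel. Everything here is an assumption made for illustration: the linear hash family, the modulus, the row-major node numbering, and the helper names are not taken from the paper, and no routing or sorting algorithm is modelled.

```python
import random

PRIME = 2_147_483_647  # assumed modulus, larger than any address used below


def make_hash(num_nodes, seed=0):
    """Return a random linear hash from addresses to node indices."""
    rng = random.Random(seed)
    a = rng.randrange(1, PRIME)
    b = rng.randrange(PRIME)
    return lambda addr: ((a * addr + b) % PRIME) % num_nodes


def hop_distance(src, dst, side):
    """Manhattan distance between two nodes numbered in row-major order."""
    r1, c1 = divmod(src, side)
    r2, c2 = divmod(dst, side)
    return abs(r1 - r2) + abs(c1 - c2)


def worst_case_hops(side, requests, seed=0):
    """requests: list of (processor node, shared-memory address) pairs."""
    h = make_hash(side * side, seed=seed)
    return max(hop_distance(p, h(addr), side) for p, addr in requests)


side = 4
requests = [(p, 1000 + p) for p in range(side * side)]
longest = worst_case_hops(side, requests)
```

On a mesh with side length s the longest possible path is 2(s - 1) hops, so for N = s * s nodes a single request can already cost a number of steps proportional to the square root of N.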
Limits on the Power of Parallel Random Access Machines with Weak Forms of Write Conflict Resolution
, 1993
Cited by 5 (1 self)
this paper, we use $P_1, \ldots, P_p$ to denote the p processors of a PRAM and $M_1, \ldots, M_m$ to denote its m memory cells. If the input consists of n variables,
The Random Adversary: A Lower-Bound Technique for Randomized Parallel Algorithms
 in Proc. of the 3rd SODA (ACM
, 1997
Cited by 5 (1 self)
The random-adversary technique is a general method for proving lower bounds on randomized parallel algorithms. The bounds apply to the number of communication steps, and they apply regardless of the processors' instruction sets, the lengths of messages, etc. This paper introduces the random-adversary technique and shows how it can be used to obtain lower bounds on randomized parallel algorithms for load balancing, compaction, padded sorting, and finding Hamiltonian cycles in random graphs. Using the random-adversary technique, we obtain the first lower bounds for randomized parallel algorithms which are provably faster than their deterministic counterparts (specifically, for load balancing and related problems). Key words. parallel algorithms, parallel computation, PRAM model, randomized parallel algorithms, expected time, lower bounds, load balancing. AMS subject classifications. 68Q10, 68Q22, 68Q25. PII. ...
Very Fast Optimal Parallel Algorithms for Heap Construction
, 1994
Cited by 5 (0 self)
We give two algorithms for permuting n items in an array into heap order on a CRCW PRAM. The first is deterministic and runs in O(log log n) time and performs O(n) operations. This runtime is the best possible for any comparison-based algorithm using n processors. The second is randomized and runs in O(log log log n) time with high probability, performing O(n) operations. No PRAM algorithm with o(log n) runtime was previously known for this problem. In order to obtain the deterministic result we study the parallel complexity of selecting the kth smallest of n elements on the CRCW PRAM, a problem that is of independent interest. We give an algorithm that is superior to existing ones when k is small compared to n. Consequently, we show that this problem can be solved in $O(\log\log n + \log k/\log\log n)$ time and O(n) operations for all $1 \le k \le n/2$. A matching time lower bound is shown for all algorithms that use n or fewer processors to solve this problem. 1 Introduction A heap is a co...
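For readers unfamiliar with the two problems named in this abstract, the sketch below shows what "heap order" and "kth smallest" mean, using Python's sequential `heapq` module as a baseline. It is not the parallel algorithm from the paper. The min-heap convention and the helper names `is_min_heap` and `kth_smallest` are assumptions made for illustration.

```python
import heapq


def is_min_heap(a):
    """True if every element is >= its parent in the implicit binary tree."""
    return all(a[(i - 1) // 2] <= a[i] for i in range(1, len(a)))


def kth_smallest(items, k):
    """Return the kth smallest element (1-based) of a non-empty sequence."""
    if not 1 <= k <= len(items):
        raise ValueError("k out of range")
    return heapq.nsmallest(k, items)[-1]


items = [9, 4, 7, 1, 8, 2]
heap = list(items)
heapq.heapify(heap)  # sequential O(n) heap construction
```

`heapq.heapify` performs O(n) operations, which matches the operation count quoted in the abstract; the contribution described there is reaching the same total work in doubly logarithmic parallel time.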
ERCW PRAMs and Optical Communication
 in Proceedings of the European Conference on Parallel Processing, EUROPAR ’96
, 1996
Cited by 4 (2 self)
This paper presents algorithms and lower bounds for several fundamental problems on the Exclusive Read, Concurrent Write Parallel Random Access Machine (ERCW PRAM) and some results for unbounded fan-in, bounded fan-out (or 'BFO') circuits. Our results for these two models are of importance because of the close relationship of the ERCW model to the OCPC model, a model of parallel computing based on dynamically reconfigurable optical networks, and of BFO circuits to the OCPC model with limited dynamic reconfiguration ability. Topics: Parallel Algorithms, Theory of Parallel and Distributed Computing. This research was supported by Texas Advanced Research Projects Grant 003658480. (philmac@cs.utexas.edu) This research was supported in part by Texas Advanced Research Projects Grants 003658480 and 003658386, and NSF Grant CCR 9023059. (vlr@cs.utexas.edu) 1 Introduction In this paper we develop algorithms and lower bounds for fundamental problems on the Exclusive Read Concurrent Wri...
Simple Fast Parallel Hashing by Oblivious Execution
 AT&T Bell Laboratories
, 1994
Cited by 4 (2 self)
A hash table is a representation of a set in a linear-size data structure that supports constant-time membership queries. We show how to construct a hash table for any given set of n keys in O(lg lg n) parallel time with high probability, using n processors on a weak version of a CRCW PRAM. Our algorithm uses a novel approach of hashing by "oblivious execution" based on probabilistic analysis to circumvent the parity lower bound barrier at the near-logarithmic time level. The algorithm is simple and is sketched by the following:
1. Partition the input set into buckets by a random polynomial of constant degree.
2. For t := 1 to O(lg lg n) do
   (a) Allocate $M_t$ memory blocks, each of size $K_t$.
   (b) Let each bucket select a block at random, and try to injectively map its keys into the block using a random linear function. Buckets that fail carry on to the next iteration.
The crux of the algorithm is a careful a priori selection of the parameters $M_t$ and $K_t$. The algorithm uses only O(lg lg...
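The two-step outline in this abstract can be mimicked by a sequential simulation. The sketch below is a loose illustration only: the values chosen for M_t and K_t are placeholders and not the paper's a priori selection, the modulus and the degree-3 polynomial are assumptions, and the code checks success explicitly instead of executing obliviously. All function names are hypothetical.

```python
import random

PRIME = 2_147_483_647  # assumed modulus; keys must be integers in [0, PRIME)


def poly_hash(coeffs, x, m):
    """Evaluate the polynomial with these coefficients at x, mod PRIME, mod m."""
    acc = 0
    for c in coeffs:
        acc = (acc * x + c) % PRIME
    return acc % m


def build_table(keys, max_rounds=64, seed=0):
    """Return (placed, leftover); placed holds (t, block, k_t, (a, b), bucket)."""
    rng = random.Random(seed)
    keys = list(set(keys))
    num_buckets = max(1, len(keys))
    # Step 1: partition by a random polynomial of constant degree (3 here).
    coeffs = [rng.randrange(1, PRIME) for _ in range(4)]
    buckets = {}
    for k in keys:
        buckets.setdefault(poly_hash(coeffs, k, num_buckets), []).append(k)
    active = list(buckets.values())
    placed = []
    # Step 2: repeated rounds; the paper uses O(lg lg n) rounds.
    for t in range(1, max_rounds + 1):
        if not active:
            break
        # Placeholder parameters, NOT the paper's choice of M_t and K_t.
        m_t = 2 * len(active)
        k_t = 2 * max(len(b) for b in active) ** 2
        choice = [rng.randrange(m_t) for _ in active]
        failed = []
        for bucket, blk in zip(active, choice):
            a, b = rng.randrange(1, PRIME), rng.randrange(PRIME)
            slots = {poly_hash((a, b), k, k_t) for k in bucket}
            alone = choice.count(blk) == 1  # no other bucket picked this block
            if alone and len(slots) == len(bucket):
                placed.append((t, blk, k_t, (a, b), bucket))
            else:
                failed.append(bucket)  # carry on to the next iteration
        active = failed
    return placed, active


placed, leftover = build_table(range(1, 201))
```

A bucket succeeds in a round only if it is alone in its chosen block and its random linear function is injective on its keys, which mirrors step (b) of the outline.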
An Ω(√(log log n)) Lower Bound for Routing in Optical Networks
 In Proc. 6th ACM Symp. on
, 1998
Optical communication is likely to significantly speed up parallel computation because the vast bandwidth of the optical medium can be divided to produce communication networks of very high degree. However, the problem of contention in high-degree networks makes the routing problem in these networks theoretically (and practically) difficult. In this paper we examine Valiant's h-relation routing problem, which is a fundamental problem in the theory of parallel computing. The h-relation routing problem arises both in the direct implementation of specific parallel algorithms on distributed-memory machines and in the general simulation of shared-memory models such as the PRAM on distributed-memory machines. In an h-relation routing problem each processor has up to h messages that it wishes to send to other processors and each processor is the destination of at most h messages. We present a lower bound for routing an h-relation (for any h > 1) on a complete optical network of size n. Our lo...
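The definition of an h-relation given in this abstract translates directly into a check on a set of messages. The sketch below assumes messages are represented as (source, destination) pairs; the function names are illustrative and no routing is performed.

```python
from collections import Counter


def relation_degree(messages):
    """Smallest h for which the given (source, destination) pairs form an h-relation."""
    messages = list(messages)
    sent = Counter(src for src, _ in messages)
    received = Counter(dst for _, dst in messages)
    return max(max(sent.values(), default=0),
               max(received.values(), default=0))


def is_h_relation(messages, h):
    """True if no processor sends or receives more than h messages."""
    return relation_degree(messages) <= h


msgs = [(0, 1), (0, 2), (1, 2), (3, 2)]
```

In `msgs`, processor 0 sends two messages and processor 2 receives three, so the set is a 3-relation but not a 2-relation.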