Results 1–10 of 11
The Complexity of Computation on the Parallel Random Access Machine
, 1993
"... PRAMs also approximate the situation where communication to and from shared memory is much more expensive than local operations, for example, where each processor is located on a separate chip and access to shared memory is through a combining network. Not surprisingly, abstract PRAMs can be much m ..."
Abstract

Cited by 31 (3 self)
PRAMs also approximate the situation where communication to and from shared memory is much more expensive than local operations, for example, where each processor is located on a separate chip and access to shared memory is through a combining network. Not surprisingly, abstract PRAMs can be much more powerful than restricted instruction set PRAMs. THEOREM 21.16 Any function of n variables can be computed by an abstract EROW PRAM in O(log n) steps using n/log₂ n processors and n/(2 log₂ n) shared memory cells. PROOF Each processor begins by reading log₂ n input values and combining them into one large value. The information known by the processors is combined in a binary-tree-like fashion. In each round, the remaining processors are grouped into pairs. In each pair, one processor communicates the information it knows about the input to the other processor and then leaves the computation. After ⌈log₂ n⌉ rounds, one processor knows all n input values. Then this processor computes th...
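The binary-tree combining schedule in the proof sketch above can be illustrated with a small sequential simulation (the function and variable names here are ours, not the paper's):

```python
import math

def tree_combine(inputs, group_size):
    """Simulate the combining schedule from the proof sketch: each
    'processor' first packs group_size inputs into one tuple, then
    pairs of processors merge their knowledge each round until a
    single processor knows every input value."""
    # Step 1: each processor reads and packs its block of inputs.
    known = [tuple(inputs[i:i + group_size])
             for i in range(0, len(inputs), group_size)]
    rounds = 0
    # Step 2: binary-tree combining; half the processors drop out per round.
    while len(known) > 1:
        merged = []
        for j in range(0, len(known), 2):
            pair = known[j] + (known[j + 1] if j + 1 < len(known) else ())
            merged.append(pair)
        known = merged
        rounds += 1
    return known[0], rounds

n = 16
group = int(math.log2(n))            # log2(n) inputs per processor
final, rounds = tree_combine(list(range(n)), group)
print(len(final), rounds)            # all 16 values gathered in 2 rounds
```

With n = 16 there are n/log₂ n = 4 simulated processors, so the loop runs ⌈log₂ 4⌉ = 2 rounds before one processor holds all n values.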
Ultrafast expected time parallel algorithms
 Proc. of the 2nd SODA
, 1991
"... It has been shown previously that sorting n items into n locations with a polynomial number of processors requires Ω(log n/log log n) time. We sidestep this lower bound with the idea of Padded Sorting, or sorting n items into n + o(n) locations. Since many problems do not rely on the exact rank of s ..."
Abstract

Cited by 20 (3 self)
It has been shown previously that sorting n items into n locations with a polynomial number of processors requires Ω(log n/log log n) time. We sidestep this lower bound with the idea of Padded Sorting, or sorting n items into n + o(n) locations. Since many problems do not rely on the exact rank of sorted items, a Padded Sort is often just as useful as an unpadded sort. Our algorithm for Padded Sort runs on the Tolerant CRCW PRAM and takes Θ(log log n/log log log n) expected time using n log log log n/log log n processors, assuming the items are taken from a uniform distribution. Using similar techniques we solve some computational geometry problems, including Voronoi Diagram, with the same processor and time bounds, assuming points are taken from a uniform distribution in the unit square. Further, we present an Arbitrary CRCW PRAM algorithm to solve the Closest Pair problem in constant expected time with n processors regardless of the distribution of points. All of these algorithms achieve linear speedup in expected time over their optimal serial counterparts.
¹ Research done while at the University of Michigan and supported by an AT&T Fellowship.
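The output convention of a Padded Sort can be made concrete with a short sequential illustration (the function name and placement rule are ours; the paper's parallel algorithm is of course quite different):

```python
def padded_sort(items, pad):
    """Sequential illustration of Padded Sorting: place n items in
    sorted order into an array of n + pad slots, leaving None gaps.
    Consumers that tolerate approximate rank can use this directly,
    sidestepping the exact-rank sorting lower bound."""
    n = len(items)
    size = n + pad
    out = [None] * size
    lo, hi = min(items), max(items)
    span = (hi - lo) or 1
    # Scatter each item near its expected slot (the uniform-distribution
    # intuition from the abstract), probing forward on collisions.
    for x in sorted(items):  # sorted() stands in for the parallel phase
        pos = min(size - 1, (x - lo) * (size - 1) // span)
        while out[pos] is not None:
            pos += 1  # assumes pad leaves enough slack for the probes
        out[pos] = x
    return out
```

For example, `padded_sort([5, 1, 9, 3, 7, 2, 8, 4], 3)` returns an 11-slot array whose non-empty cells read 1, 2, 3, 4, 5, 7, 8, 9 in order, with three gaps interspersed.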
Limits on the Power of Parallel Random Access Machines with Weak Forms of Write Conflict Resolution
, 1993
"... this paper, we use P 1 ; : : : ; P p to denote the p processors of a PRAM and M 1 ; : : : ; Mm to denote its m memory cells. If the input consists of n variables, ..."
Abstract

Cited by 5 (1 self)
In this paper, we use P_1, ..., P_p to denote the p processors of a PRAM and M_1, ..., M_m to denote its m memory cells. If the input consists of n variables, ...
Retrieval of scattered information by EREW, CREW and CRCW PRAMs
, 1992
"... The kcompaction problem arises when k out of n cells in an array are nonempty and the contents of these cells must be moved to the first k locations in the array. Parallel algorithms for kcompaction have obvious applications in processor allocation and load balancing; kcompaction is also an im ..."
Abstract

Cited by 5 (1 self)
The k-compaction problem arises when k out of n cells in an array are nonempty and the contents of these cells must be moved to the first k locations in the array. Parallel algorithms for k-compaction have obvious applications in processor allocation and load balancing; k-compaction is also an important subroutine in many recently developed parallel algorithms. We show that any EREW PRAM that solves the k-compaction problem requires Ω(√(log n)) time, even if the number of processors is arbitrarily large and k = 2. On the CREW PRAM, we show that every n-processor algorithm for the k-compaction problem requires Ω(log log n) time, even if k = 2. Finally, we show that O(log k) time can be achieved on the ROBUST PRAM, a very weak CRCW PRAM model.
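As a point of reference, the k-compaction problem itself can be stated in a few lines of sequential code (the helper name is ours); the interest of the paper is in how fast the same transformation can be done in parallel:

```python
def k_compact(array, empty=None):
    """Move the k non-empty cells of an n-cell array to its first k
    locations, preserving their relative order -- the k-compaction
    problem as defined in the abstract above."""
    filled = [x for x in array if x is not empty]
    return filled + [empty] * (len(array) - len(filled))

print(k_compact([None, 7, None, None, 3, None, 9, None]))
# → [7, 3, 9, None, None, None, None, None]
```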
ERCW PRAMs and Optical Communication
 in Proceedings of the European Conference on Parallel Processing, EUROPAR ’96
, 1996
"... This paper presents algorithms and lower bounds for several fundamental problems on the Exclusive Read, Concurrent Write Parallel Random Access Machine (ERCW PRAM) and some results for unbounded fanin, bounded fanout (or `BFO') circuits. Our results for these two models are of importance beca ..."
Abstract

Cited by 4 (2 self)
This paper presents algorithms and lower bounds for several fundamental problems on the Exclusive Read, Concurrent Write Parallel Random Access Machine (ERCW PRAM) and some results for unbounded fan-in, bounded fan-out (or 'BFO') circuits. Our results for these two models are of importance because of the close relationship of the ERCW model to the OCPC model, a model of parallel computing based on dynamically reconfigurable optical networks, and of BFO circuits to the OCPC model with limited dynamic reconfiguration ability. Topics: Parallel Algorithms, Theory of Parallel and Distributed Computing.
This research was supported by Texas Advanced Research Projects Grant 003658480. (philmac@cs.utexas.edu)
† This research was supported in part by Texas Advanced Research Projects Grants 003658480 and 003658386, and NSF Grant CCR 9023059. (vlr@cs.utexas.edu)
1 Introduction
In this paper we develop algorithms and lower bounds for fundamental problems on the Exclusive Read Concurrent Wri...
Simple Fast Parallel Hashing by Oblivious Execution
 AT&T Bell Laboratories
, 1994
"... A hash table is a representation of a set in a linear size data structure that supports constanttime membership queries. We show how to construct a hash table for any given set of n keys in O(lg lg n) parallel time with high probability, using n processors on a weak version of a crcw pram. Our algo ..."
Abstract

Cited by 4 (2 self)
A hash table is a representation of a set in a linear-size data structure that supports constant-time membership queries. We show how to construct a hash table for any given set of n keys in O(lg lg n) parallel time with high probability, using n processors on a weak version of a CRCW PRAM. Our algorithm uses a novel approach of hashing by "oblivious execution" based on probabilistic analysis to circumvent the parity lower bound barrier at the near-logarithmic time level. The algorithm is simple and is sketched by the following:
1. Partition the input set into buckets by a random polynomial of constant degree.
2. For t := 1 to O(lg lg n) do
(a) Allocate M_t memory blocks, each of size K_t.
(b) Let each bucket select a block at random, and try to injectively map its keys into the block using a random linear function. Buckets that fail carry on to the next iteration.
The crux of the algorithm is a careful a priori selection of the parameters M_t and K_t. The algorithm uses only O(lg lg...
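The two-step sketch above can be mimicked with a sequential toy version (the function name and the ad hoc choices of M_t and K_t are ours; the paper's contribution is precisely the careful a priori choice of these parameters):

```python
import random

def build_hash_table(keys, prime=2**31 - 1, max_rounds=8):
    """Toy sequential rendering of hashing by oblivious execution.
    Step 1: split keys into buckets with a random degree-2 polynomial.
    Step 2: each round, every unplaced bucket picks a random block and
    tries to map its keys injectively with a random linear function."""
    n = len(keys)
    a, b, c = (random.randrange(1, prime) for _ in range(3))
    nbuckets = max(1, n)
    buckets = {}
    for x in keys:
        h = (a * x * x + b * x + c) % prime % nbuckets
        buckets.setdefault(h, []).append(x)
    table = {}
    pending = list(buckets.values())
    for t in range(max_rounds):
        if not pending:
            break
        # Ad hoc block counts/sizes standing in for the paper's M_t, K_t.
        M_t = 2 * len(pending)
        K_t = 4 * max(len(bk) for bk in pending)
        retry = []
        for bk in pending:
            blk = random.randrange(M_t)
            u, v = random.randrange(1, prime), random.randrange(prime)
            slots = {(u * x + v) % prime % K_t for x in bk}
            if len(slots) == len(bk):          # injective: bucket succeeds
                for x in bk:
                    table[x] = (blk, (u * x + v) % prime % K_t)
            else:
                retry.append(bk)               # collision: try next round
        pending = retry
    return table if not pending else None

table = build_hash_table(list(range(0, 200, 7)))  # key -> (block, slot), or None
```

Each key ends up with a (block, slot) address, so a membership query recomputes the bucket's linear function and probes one cell, giving the constant-time lookups the abstract describes.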
An Effective Load Balancing Policy for Geometric Decaying Algorithms
"... Parallel algorithms are often first designed as a sequence of rounds, where each round includes any number of independent constant time operations. This socalled worktime presentation is then followed by a processor scheduling implementation ona more concrete computational model. Many parallel alg ..."
Abstract

Cited by 3 (3 self)
Parallel algorithms are often first designed as a sequence of rounds, where each round includes any number of independent constant-time operations. This so-called work-time presentation is then followed by a processor scheduling implementation on a more concrete computational model. Many parallel algorithms are geometric-decaying in the sense that the sequence of work loads is upper bounded by a decreasing geometric series. A standard scheduling implementation of such algorithms consists of a repeated application of load balancing. We present a more effective, yet as simple, policy for the utilization of load balancing in geometric-decaying algorithms. By making a more careful choice of when and how often load balancing should be employed, and by using a simple amortization argument, we show that the number of required applications of load balancing should be nearly constant. The policy is not restricted to any particular model of parallel computation, and, up to a constant factor, it is the best possible.
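A toy illustration of the property such an amortization argument can exploit (this is not the paper's policy, just the geometric-series bound it rests on): when per-round work decays geometrically, the total work over all rounds stays within a constant factor of the first round's work, so only a bounded amount of imbalance can ever accumulate.

```python
def total_work(W1, q, rounds):
    """Total work of a geometric-decaying algorithm whose round i
    does at most W1 * q**i work, for a decay ratio 0 < q < 1."""
    return sum(W1 * q ** i for i in range(rounds))

W1, q = 1024.0, 0.5
print(total_work(W1, q, 60))  # approaches the bound W1 / (1 - q) = 2048
```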
Complexity of Boolean Functions on PRAMs – Lower Bound Techniques ∗
"... Determining time necessary for computing important functions on parallel machines is one of the most important problems in complexity theory for parallel algorithms. Recently, a substantial progress has been made in this area. In this survey paper, we discuss the results that have been obtained for ..."
Abstract
Determining the time necessary for computing important functions on parallel machines is one of the most important problems in complexity theory for parallel algorithms. Recently, substantial progress has been made in this area. In this survey paper, we discuss the results that have been obtained for three types of parallel random access machines (PRAMs): CREW, ROBUST and EREW.