The Complexity of Computation on the Parallel Random Access Machine, 1993
Cited by 32 (4 self)
PRAMs also approximate the situation where communication to and from shared memory is much more expensive than local operations, for example, where each processor is located on a separate chip and access to shared memory is through a combining network. Not surprisingly, abstract PRAMs can be much more powerful than restricted instruction set PRAMs.
THEOREM 21.16 Any function of n variables can be computed by an abstract EROW PRAM in O(log n) steps using n/log₂ n processors and n/(2 log₂ n) shared memory cells.
PROOF Each processor begins by reading log₂ n input values and combining them into one large value. The information known by the processors is then combined in a binary-tree-like fashion. In each round, the remaining processors are grouped into pairs. In each pair, one processor communicates the information it knows about the input to the other processor and then leaves the computation. After ⌈log₂ n⌉ rounds, one processor knows all n input values. Then this processor computes th...
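The combining phase of the proof above can be simulated sequentially to check the round count. The Python sketch below is an illustration under stated assumptions, not code from the source: each processor's "large value" is modeled as the list of inputs it knows, the function name `tree_combine` is hypothetical, and shared-memory cells are not modeled, only the pairing rounds.

```python
import math
from typing import List, Sequence, Tuple


def tree_combine(values: Sequence[int]) -> Tuple[List[int], int]:
    """Simulate the combining phase of the proof sketch (hypothetical helper).

    Returns the inputs known by the last surviving processor and the
    number of pairing rounds that were needed.
    """
    n = len(values)
    if n == 0:
        return [], 0
    # Step 1: each processor reads a block of about log2 n inputs
    # (at least one input per processor when n is tiny).
    group = max(1, math.ceil(math.log2(n)))
    known = [list(values[i:i + group]) for i in range(0, n, group)]
    rounds = 0
    # Step 2: pair up the remaining processors; in each pair the second
    # processor hands everything it knows to the first and then drops out.
    while len(known) > 1:
        paired = []
        for i in range(0, len(known), 2):
            if i + 1 < len(known):
                paired.append(known[i] + known[i + 1])
            else:
                paired.append(known[i])  # unpaired processor waits a round
        known = paired
        rounds += 1
    return known[0], rounds


# The survivor now knows every input, so it can evaluate any function f.
known, rounds = tree_combine(range(16))
print(sum(known), rounds)  # 120 2
```

Because the simulation starts from about n/log₂ n blocks, the number of pairing rounds is ⌈log₂ p⌉ for p blocks, which never exceeds the ⌈log₂ n⌉ bound stated in the proof.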

Ultrafast expected time parallel algorithms
Proc. of the 2nd SODA, 1991
Cited by 20 (3 self)
It has been shown previously that sorting n items into n locations with a polynomial number of processors requires Ω(log n/log log n) time. We sidestep this lower bound with the idea of Padded Sorting, or sorting n items into n + o(n) locations. Since many problems do not rely on the exact rank of sorted items, a Padded Sort is often just as useful as an unpadded sort. Our algorithm for Padded Sort runs on the Tolerant CRCW PRAM and takes Θ(log log n/log log log n) expected time using n log log log n/log log n processors, assuming the items are taken from a uniform distribution. Using similar techniques we solve some computational geometry problems, including the Voronoi diagram, with the same processor and time bounds, assuming points are taken from a uniform distribution in the unit square. Further, we present an Arbitrary CRCW PRAM algorithm that solves the Closest Pair problem in constant expected time with n processors regardless of the distribution of points. All of these algorithms achieve linear speedup in expected time over their optimal serial counterparts.
Research done while at the University of Michigan and supported by an AT&T Fellowship.

Retrieval of scattered information by EREW, CREW, and CRCW PRAMs
In Proc. 3rd Scand. Workshop on Alg. Theory, 1992
Cited by 5 (1 self)
Abstract. The k-compaction problem arises when k out of n cells in an array are nonempty and the contents of these cells must be moved to the first k locations in the array. Parallel algorithms for k-compaction have obvious applications in processor allocation and load balancing; k-compaction is also an important subroutine in many recently developed parallel algorithms. We show that any EREW PRAM that solves the k-compaction problem requires Ω(√log n) time, even if the number of processors is arbitrarily large and k = 2. On the CREW PRAM, we show that every n-processor algorithm for the k-compaction problem requires Ω(log log n) time, even if k = 2. Finally, we show that O(log k) time can be achieved on the ROBUST PRAM, a very weak CRCW PRAM model.
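For contrast with these lower bounds, the textbook way to perform k-compaction (not the paper's O(log k) ROBUST PRAM algorithm) is a prefix sum over 0/1 flags. The sketch below runs the prefix sum sequentially with `itertools.accumulate`; on an EREW PRAM the same prefix sums take O(log n) time, after which each nonempty cell writes to its own distinct target. Modeling empty cells as `None` and the name `k_compact` are assumptions of this sketch.

```python
from itertools import accumulate
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def k_compact(cells: Sequence[Optional[T]]) -> List[T]:
    """Move the nonempty (non-None) cells to positions 0..k-1, keeping order."""
    # Flag each cell: 1 if nonempty, 0 if empty.
    flags = [0 if c is None else 1 for c in cells]
    # Inclusive prefix sums: a nonempty cell's sum is its 1-based target slot.
    targets = list(accumulate(flags))
    k = targets[-1] if targets else 0
    out: List[T] = [None] * k  # type: ignore[list-item]
    # Targets of nonempty cells are distinct, so every write is exclusive.
    for c, t in zip(cells, targets):
        if c is not None:
            out[t - 1] = c
    return out


print(k_compact([None, "a", None, None, "b", None]))  # ['a', 'b']
```

In this scheme the flagging and scattering steps take constant parallel time with one processor per cell; only the prefix sum needs more.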

Limits on the Power of Parallel Random Access Machines with Weak Forms of Write Conflict Resolution, 1993
Cited by 5 (1 self)
... this paper, we use P₁, ..., Pₚ to denote the p processors of a PRAM and M₁, ..., Mₘ to denote its m memory cells. If the input consists of n variables, ...

ERCW PRAMs and Optical Communication
In Proceedings of the European Conference on Parallel Processing, EUROPAR ’96, 1996
Cited by 4 (2 self)
This paper presents algorithms and lower bounds for several fundamental problems on the Exclusive Read, Concurrent Write Parallel Random Access Machine (ERCW PRAM) and some results for unbounded fan-in, bounded fan-out (or ‘BFO’) circuits. Our results for these two models are important because of the close relationship of the ERCW model to the OCPC model, a model of parallel computing based on dynamically reconfigurable optical networks, and of BFO circuits to the OCPC model with limited dynamic reconfiguration ability.
Topics: Parallel Algorithms, Theory of Parallel and Distributed Computing.
This research was supported by Texas Advanced Research Projects Grant 003658480. (philmac@cs.utexas.edu)
This research was supported in part by Texas Advanced Research Projects Grants 003658480 and 003658386, and NSF Grant CCR 9023059. (vlr@cs.utexas.edu)
1 Introduction
In this paper we develop algorithms and lower bounds for fundamental problems on the Exclusive Read Concurrent Wri...

Simple Fast Parallel Hashing by Oblivious Execution
AT&T Bell Laboratories, 1994
Cited by 4 (2 self)
A hash table is a representation of a set in a linear-size data structure that supports constant-time membership queries. We show how to construct a hash table for any given set of n keys in O(lg lg n) parallel time with high probability, using n processors on a weak version of a CRCW PRAM. Our algorithm uses a novel approach of hashing by "oblivious execution" based on probabilistic analysis to circumvent the parity lower bound barrier at the near-logarithmic time level. The algorithm is simple and is sketched as follows:
1. Partition the input set into buckets by a random polynomial of constant degree.
2. For t := 1 to O(lg lg n) do
   (a) Allocate Mₜ memory blocks, each of size Kₜ.
   (b) Let each bucket select a block at random, and try to injectively map its keys into the block using a random linear function. Buckets that fail carry on to the next iteration.
The crux of the algorithm is a careful a priori selection of the parameters Mₜ and Kₜ. The algorithm uses only O(lg lg...
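A sequential Python sketch of the two-step outline above, under assumptions that are not from the paper: keys are integers below the prime P = 2^31 - 1; step 1 uses a random degree-3 polynomial into n buckets; a block chosen by two or more buckets counts as a failure for all of them; and the block count and block size per round follow a simple hypothetical rule (twice the number of active buckets, and twice the square of the largest active bucket) in place of the paper's careful a priori choice. Because of that last simplification the table is not linear-size and the round count is not tuned to O(lg lg n); the sketch only shows the retry structure, in which every active bucket runs the same attempt each round. The class name `ObliviousHashTable` is hypothetical.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

P = 2_147_483_647  # prime 2**31 - 1; keys are assumed to lie in [0, P)


def poly_hash(coeffs: Sequence[int], x: int, m: int) -> int:
    """Evaluate the polynomial at x modulo P (Horner's rule), then reduce mod m."""
    acc = 0
    for c in coeffs:
        acc = (acc * x + c) % P
    return acc % m


class ObliviousHashTable:
    """Sequential sketch of bucket-then-block hashing with retry rounds."""

    def __init__(self, keys: Sequence[int], seed: int = 0, max_rounds: int = 64):
        rng = random.Random(seed)
        self.nb = max(1, len(keys))  # number of buckets (hypothetical choice)
        # Step 1: a random degree-3 polynomial assigns keys to buckets.
        self.g = [rng.randrange(P) for _ in range(4)]
        buckets: Dict[int, List[int]] = {}
        for x in set(keys):
            buckets.setdefault(poly_hash(self.g, x, self.nb), []).append(x)
        self.table: List[Optional[int]] = []
        # Per placed bucket: (block offset, slope a, intercept c, block size).
        self.where: Dict[int, Tuple[int, int, int, int]] = {}
        active = list(buckets)
        # Step 2: every still-active bucket makes one attempt per round.
        for _ in range(max_rounds):
            if not active:
                break
            m_t = 2 * len(active)  # hypothetical M_t
            k_t = 2 * max(len(buckets[b]) for b in active) ** 2  # hypothetical K_t
            base = len(self.table)
            self.table.extend([None] * (m_t * k_t))
            pick = {b: rng.randrange(m_t) for b in active}
            load: Dict[int, int] = {}
            for blk in pick.values():
                load[blk] = load.get(blk, 0) + 1
            failed = []
            for b in active:
                a, c = rng.randrange(1, P), rng.randrange(P)
                slots = [(a * x + c) % P % k_t for x in buckets[b]]
                # Fail if another bucket chose the same block or the map is not injective.
                if load[pick[b]] > 1 or len(set(slots)) < len(slots):
                    failed.append(b)
                    continue
                off = base + pick[b] * k_t
                for x, s in zip(buckets[b], slots):
                    self.table[off + s] = x
                self.where[b] = (off, a, c, k_t)
            active = failed
        if active:
            raise RuntimeError("some buckets never succeeded; try another seed")

    def __contains__(self, x: int) -> bool:
        b = poly_hash(self.g, x, self.nb)
        if b not in self.where:
            return False
        off, a, c, k = self.where[b]
        return self.table[off + (a * x + c) % P % k] == x


keys = random.Random(1).sample(range(10**6), 1000)
table = ObliviousHashTable(keys, seed=42)
print(all(k in table for k in keys), 10**6 + 5 in table)  # True False
```

A membership query costs two hash evaluations and one table probe regardless of n, which is the constant-time query property described above; only construction is randomized and iterative.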

Pointers versus Arithmetic in PRAMs
Journal of Computer and System Sciences, 1996
Cited by 3 (1 self)
Manipulation of pointers in shared data structures is an important communication mechanism used in many parallel algorithms. Indeed, many fundamental algorithms do essentially nothing else. A Parallel Pointer Machine (or PPM) is a parallel model having pointers as its principal data type. PPMs have been characterized as PRAMs obeying two restrictions: first, restricted arithmetic capabilities, and second, the CROW memory access restriction (Concurrent Read, Owner Write, a commonly occurring special case of CREW). We present results concerning the relative power of PPMs (and other arithmetically restricted PRAMs) versus CROW PRAMs having ordinary arithmetic capabilities. First, we prove lower bounds separating PPMs from CROW PRAMs. For example, any step-by-step simulation of an n-processor CROW PRAM by a PPM requires Ω(log log n) time per step. Second, we show that this lower bound is tight: we give such a step-by-step simulation using O(log log n) time per step. As a coro...
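List ranking by pointer jumping is a standard example of an algorithm that does little besides follow and update pointers. The sequential simulation below is an illustration, not an algorithm from this paper, and `list_rank` is a hypothetical name: the `nxt` update is pure pointer manipulation, each node writes only its own entries (the owner-write pattern of CROW), and the `rank` update uses ordinary integer addition, the kind of arithmetic capability the abstract describes as restricted on PPMs.

```python
from typing import List, Tuple


def list_rank(succ: List[int]) -> Tuple[List[int], int]:
    """Distance from every node to the tail, by synchronous pointer jumping.

    succ[i] is the node after i; the tail is the one node with succ[i] == i.
    Returns the ranks and the number of synchronous rounds used.
    """
    n = len(succ)
    nxt = list(succ)
    rank = [0 if succ[i] == i else 1 for i in range(n)]
    rounds = 0
    while any(nxt[i] != nxt[nxt[i]] for i in range(n)):
        # One PRAM step: all nodes read the old arrays (concurrent reads are
        # allowed), then each node writes only its own entries (owner write).
        rank = [rank[i] + rank[nxt[i]] for i in range(n)]
        nxt = [nxt[nxt[i]] for i in range(n)]
        rounds += 1
    return rank, rounds


# List 0 -> 3 -> 2 -> 1 -> 4, with node 4 as the tail.
print(list_rank([3, 4, 1, 2, 4]))  # ([4, 1, 2, 3, 0], 2)
```

Each round doubles how far every pointer reaches, so the loop runs ⌈log₂ d⌉ times, where d is the distance from the head to the tail.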

An Effective Load Balancing Policy for Geometric Decaying Algorithms
Cited by 3 (3 self)
Parallel algorithms are often first designed as a sequence of rounds, where each round includes any number of independent constant-time operations. This so-called work-time presentation is then followed by a processor scheduling implementation on a more concrete computational model. Many parallel algorithms are geometric-decaying in the sense that the sequence of work loads is upper bounded by a decreasing geometric series. A standard scheduling implementation of such algorithms consists of a repeated application of load balancing. We present a more effective, yet equally simple, policy for the utilization of load balancing in geometric decaying algorithms. By making a more careful choice of when and how often load balancing should be employed, and by using a simple amortization argument, we show that the number of required applications of load balancing should be nearly constant. The policy is not restricted to any particular model of parallel computation, and, up to a constant factor, it is the best possible.
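To make the work-time terminology concrete, the sketch below takes a hypothetical geometric-decaying work profile (each round does a fixed fraction of the previous round's work) and counts the steps needed on p processors, charging ⌈wᵢ/p⌉ steps to a round with wᵢ operations, as in Brent-style scheduling. It assumes the active operations are already packed so that every processor can find one, i.e. load balancing is free; the paper's contribution is deciding when to pay for load balancing, which this sketch does not model. Both function names are hypothetical.

```python
import math
from typing import List, Sequence


def geometric_work(n: int, ratio: float = 0.5) -> List[int]:
    """Hypothetical geometric-decaying work profile: n, n*ratio, ... while >= 1."""
    if not 0 < ratio < 1:
        raise ValueError("ratio must lie strictly between 0 and 1")
    loads = []
    w = float(n)
    while w >= 1:
        loads.append(int(w))
        w *= ratio
    return loads


def scheduled_steps(loads: Sequence[int], p: int) -> int:
    """Steps to run every round on p processors, assuming free load balancing."""
    return sum(math.ceil(w / p) for w in loads)


loads = geometric_work(1024)
print(sum(loads), len(loads), scheduled_steps(loads, 64))  # 2047 11 37
```

Total work is at most n/(1 - ratio), so it stays linear, and the step count obeys the bound W/p + T (here 2047/64 + 11, about 43) for total work W and T rounds.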

Complexity of Boolean Functions on PRAMs: Lower Bound Techniques
Determining the time necessary for computing important functions on parallel machines is one of the most important problems in complexity theory for parallel algorithms. Recently, substantial progress has been made in this area. In this survey paper, we discuss the results that have been obtained for three types of parallel random access machines (PRAMs): CREW, ROBUST and EREW.
1 Introduction
The parallel random access machine (PRAM) is a highly abstract model of parallel computers, in which interprocessor communication is realized using a shared memory. Each processor of a PRAM can access any cell of the shared memory in one computation step. This is certainly an unrealistic assumption, but it makes the analysis of parallel algorithms much easier, and we can concentrate on the inherent complexity of a given problem. This is why most parallel algorithms have been described in terms of PRAMs. Each PRAM consists of a collection of processors and common memory cells. Each comp...
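The access restrictions that separate the PRAM variants discussed here concern what happens within a single step. The hypothetical helper below labels one synchronous step by the read and write discipline it needs: exclusive (E) if every cell is touched by at most one processor, concurrent (C) otherwise. It is a sketch only; ownership restrictions (as in CROW or EROW) and the rule used to resolve concurrent writes are deliberately not modeled.

```python
from collections import Counter
from typing import Dict


def access_mode(reads: Dict[int, int], writes: Dict[int, int]) -> str:
    """Name the read/write discipline one synchronous PRAM step requires.

    reads and writes map a processor id to the shared-memory cell it
    reads or writes in this step.
    """
    def shared(access: Dict[int, int]) -> bool:
        # True if some cell is accessed by two or more processors.
        return any(k > 1 for k in Counter(access.values()).values())

    r = "C" if shared(reads) else "E"
    w = "C" if shared(writes) else "E"
    return f"{r}R{w}W"


# Processors 0 and 1 both read cell 7 but write different cells.
print(access_mode({0: 7, 1: 7}, {0: 1, 1: 2}))  # CREW
```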

Relations Between Several Parallel Computational Models
© 2009 SCPE
Abstract. We investigate the relative computational power of parallel models with shared memory. Based on feasibility considerations present in the literature, we split these models into “lightweight” and “heavyweight,” and then find that the heavyweight class is strictly more powerful than the lightweight class, as expected. On the other hand, we contradict the long-held belief that the heavyweight models (namely, the Combining CRCW PRAM and the BSR) form a hierarchy, showing that they are identical in computational power. We thus introduce the BSR into the family of practically meaningful massively parallel models. We also investigate the power of concurrent-write on models with reconfigurable buses, finding that it does not add computational power over exclusive-write under certain reasonable assumptions. Overall, the Combining CRCW PRAM and the CREW models with directed reconfigurable buses are found to be the simplest of the heavyweight models, which now also include the BSR and all the models with directed reconfigurable buses. These results also have significant implications in the area of real-time computations.
Key words: parallel computation, shared memory parallel models, reconfigurable buses, parallel random access machine, broadcast with selective reduction, reconfigurable multiple bus machine, reconfigurable network, concurrent-read concurrent-write conflict resolution rules, real-time
1. Introduction. The concurrent-read concurrent-write parallel random access machine (CRCW PRAM) is the most convenient model of parallel computation and so it is used extensively in analyzing parallel solutions...
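CRCW variants differ in how concurrent writes to one cell are resolved. A minimal sketch of three such rules, assuming a dictionary as shared memory and a hypothetical function name `crcw_step`: Common (all writers to a cell must agree), Priority (the lowest processor id wins, a usual convention), and Combining (all values are merged with an associative operator, addition by default here). The Combining rule corresponds to the Combining CRCW PRAM that the abstract places among the heavyweight models; the BSR and reconfigurable buses are not modeled.

```python
from functools import reduce
from operator import add
from typing import Callable, Dict, List, Tuple

Write = Tuple[int, int, int]  # (processor id, cell, value)


def crcw_step(memory: Dict[int, int], writes: List[Write], rule: str,
              combine: Callable[[int, int], int] = add) -> None:
    """Apply one concurrent-write step to memory under the given rule."""
    by_cell: Dict[int, List[Tuple[int, int]]] = {}
    for pid, cell, value in writes:
        by_cell.setdefault(cell, []).append((pid, value))
    for cell, ws in by_cell.items():
        if rule == "common":
            # All writers to a cell must agree on the value.
            values = {v for _, v in ws}
            if len(values) != 1:
                raise ValueError(f"Common rule violated at cell {cell}")
            memory[cell] = values.pop()
        elif rule == "priority":
            # The lowest-numbered processor wins.
            memory[cell] = min(ws)[1]
        elif rule == "combining":
            # All values are merged with an associative operator.
            memory[cell] = reduce(combine, (v for _, v in ws))
        else:
            raise ValueError(f"unknown rule: {rule}")


mem: Dict[int, int] = {}
crcw_step(mem, [(2, 0, 5), (1, 0, 7), (3, 1, 4)], "combining")
print(mem)  # {0: 12, 1: 4}
```

Passing `combine=max` turns the same step into a maximum-combining write, which shows why Combining is the most permissive of the three rules: any associative operator can be folded into a single write step.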