Optimal bounds for decision problems on the CRCW PRAM
- In Proceedings of the 19th ACM Symposium on Theory of Computing (New
Cited by 46 (3 self)
Abstract. Optimal Ω(log n / log log n) lower bounds on the time for CRCW PRAMs with polynomially bounded numbers of processors or memory cells to compute parity and a number of related problems are proven. A strict time hierarchy of explicit Boolean functions of n bits on such machines that holds up to O(log n / log log n) time is also exhibited. That is, for every time bound T within this range a function is exhibited that can be easily computed using polynomial resources in time T but requires more than polynomial resources to be computed in time T - 1. Finally, it is shown that almost all Boolean functions of n bits require log n - log log n + Ω(1) time when the number of processors is at most polynomial in n. The bounds do not place restrictions on the uniformity of the algorithms nor on the instruction sets of the machines.
The Complexity of Computation on the Parallel Random Access Machine
, 1993
Cited by 31 (4 self)
PRAMs also approximate the situation where communication to and from shared memory is much more expensive than local operations, for example, where each processor is located on a separate chip and access to shared memory is through a combining network. Not surprisingly, abstract PRAMs can be much more powerful than restricted instruction set PRAMs. THEOREM 21.16. Any function of n variables can be computed by an abstract EROW PRAM in O(log n) steps using n / log_2 n processors and n / (2 log_2 n) shared memory cells. PROOF. Each processor begins by reading log_2 n input values and combining them into one large value. The information known by the processors is then combined in a binary-tree-like fashion. In each round, the remaining processors are grouped into pairs. In each pair, one processor communicates the information it knows about the input to the other processor and then leaves the computation. After ⌈log_2 n⌉ rounds, one processor knows all n input values. Then this processor computes th...
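The combining scheme in the proof above can be illustrated with a small sequential simulation. This is only a sketch under stated assumptions: the function name gather_inputs is hypothetical, each "processor" is modeled as a Python list of the input values it knows, blocks of floor(log_2 n) inputs stand in for the log_2 n values each processor reads, and the shared-memory cells and read/write restrictions of the actual model are not represented.

```python
import math


def gather_inputs(inputs):
    """Simulate the pairwise combining rounds sequentially.

    Returns (values known to the last remaining processor, rounds used).
    Hypothetical helper for illustration; it models only the flow of
    information, not the PRAM's memory cells or access rules.
    """
    n = len(inputs)
    if n == 0:
        raise ValueError("need at least one input value")
    # Each processor starts by reading a block of about log2(n) inputs.
    block = max(1, math.floor(math.log2(n))) if n > 1 else 1
    known = [list(inputs[i:i + block]) for i in range(0, n, block)]
    rounds = 0
    while len(known) > 1:
        merged = []
        # Pair up adjacent processors; the second member of each pair hands
        # everything it knows to the first and leaves the computation.
        for i in range(0, len(known) - 1, 2):
            merged.append(known[i] + known[i + 1])
        # With an odd count, the unpaired processor waits for the next round.
        if len(known) % 2 == 1:
            merged.append(known[-1])
        known = merged
        rounds += 1
    return known[0], rounds
```

For example, gather_inputs(list(range(16))) starts with 4 processors holding 4 values each and finishes after 2 rounds, within the ⌈log_2 n⌉ bound quoted in the proof.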
Parallel Algorithmic Techniques for Combinatorial Computation
- Ann. Rev. Comput. Sci
, 1988
Cited by 29 (3 self)
this paper and supplied many helpful comments. This research was supported in part by NSF grants DCR-85-11713, CCR-86-05353, and CCR-88-14977, and by DARPA contract N00039-84-C-0165.
Simulation of PRAM Models on Meshes
- Nordic Journal on Computing, 2(1):51
, 1994
Cited by 14 (9 self)
We analyze the complexity of simulating a PRAM (parallel random access machine) on a mesh structured distributed memory machine. By utilizing suitable algorithms for randomized hashing, routing in a mesh, and sorting in a mesh, we prove that simulation of a PRAM on a √N × √N (or ∛N × ∛N × ∛N) mesh is possible with O(√N) (respectively O(∛N)) delay with high probability and a relatively small constant. Furthermore, with more sophisticated simulations further speed-ups are achieved; experiments show delays as low as √N + o(√N) (respectively ∛N + o(∛N)) per N PRAM processors. These simulations compare quite favorably with PRAM simulations on butterfly and hypercube.
1 Introduction. PRAM (Parallel Random Access Machine) is an abstract model of computation. It consists of N processors, each of which may have some local memory and registers, and a global shared memory of size m. A step of a PRAM is often seen to consist of...
New lower bounds for parallel computation
- In Proceedings of the 18th Annual ACM Symposium on Theory of Computing
, 1986
Cited by 7 (0 self)
Abstract. Lower bounds are proven on the parallel-time complexity of several basic functions on the most powerful concurrent-read concurrent-write PRAM with unlimited shared memory and unlimited power of individual processors (denoted by PRIORITY(m)): (1) It is proved that with a number of processors polynomial in n, Ω(log n) time is needed for addition, multiplication or bitwise OR of n numbers, when each number has n^ε bits. Hence even the bit complexity (i.e., the time complexity as a function of the total number of bits in the input) is logarithmic in this case. This improves a beautiful result of Meyer auf der Heide and Wigderson [22]. They proved a log n lower bound using Ramsey-type techniques. Using Ramsey theory, it is possible to get an upper bound on the number of bits in the inputs used. However, for the case of polynomially many processors, this upper bound is more than a polynomial in n. (2) An Ω(log n) lower bound is given for PRIORITY(m) with n^{O(1)} processors on a function with inputs from {0, 1}, namely for the function f(x_1, ..., x_n) = Σ_{i=1}^{n} x_i a^i, where a is fixed and x_i ∈ {0, 1}. (3) Finally, by a new efficient simulation of PRIORITY(m) by unbounded fan-in circuits, it is proven that, with a less than exponential number of processors, a PRIORITY(m) cannot compute PARITY in constant time, and that with n^{O(1)} processors Ω(√(log n)) time is needed. The simulation technique is of
The parallel complexity of element distinctness is Ω(√(log n))
- SIAM J. Disc. Math
, 1988
Cited by 6 (0 self)
Abstract. We consider the problem of element distinctness. Here n synchronized processors, each given an integer input, must decide whether these integers are pairwise distinct, while communicating via an infinitely large shared memory. If simultaneous write access to a memory cell is forbidden, then a lower bound of Ω(log n) on the number of steps easily follows (from S. Cook, C. Dwork, and R. Reischuk, SIAM J. Comput., 15 (1986), pp. 87-97). When several (different) values can be written simultaneously to any cell, then there is a simple algorithm requiring O(1) steps. We consider the intermediate model, in which simultaneous writes to a single cell are allowed only if all values written are equal. We prove a lower bound of Ω((log n)^{1/2}) steps, improving the previous lower bound of Ω(log log log n) steps (F.E. Fich, F. Meyer auf der Heide, and A. Wigderson, Adv. in Comput., 4 (1987), pp. 1-15). The proof uses Ramsey-theoretic and combinatorial arguments. The result implies a separation between the powers of some variants of the PRAM model of parallel computation.
Limits on the Power of Parallel Random Access Machines with Weak Forms of Write Conflict Resolution
, 1993
Cited by 5 (1 self)
this paper, we use P_1, ..., P_p to denote the p processors of a PRAM and M_1, ..., M_m to denote its m memory cells. If the input consists of n variables,
Solving an Algebraic Path Problem and Some Related Graph Problems on a Hyper-Bus Broadcast Network
- IEEE Trans. Parallel Distrib. Syst
, 1997
Cited by 2 (0 self)
The parallel computation model upon which the proposed algorithms are based is the hyper-bus broadcast network. The hyper-bus broadcast network consists of processors which are connected by global buses only. Based on such an improved architecture, we first design two O(1) time basic operations: finding the maximum and minimum of N numbers, each O(log N) bits in size, and computing the product of two N × N matrices. Then, based on these two basic operations, three of the most important instances in the algebraic path problem, the connectivity problem, and several related problems are all solved in O(log N) time. These include the all-pair shortest paths, the minimum-weight spanning tree, the transitive closure, the connected component, the biconnected component, the articulation point, and the bridge problems, in either an undirected or a directed graph.
Reconfigurable architectures and algorithms: A research survey
- IJCSA
, 2009
Cited by 2 (0 self)
Ever since the introduction of dynamically reconfigurable buses, the architecture has gained considerable popularity among researchers for its high-performance computing with general-purpose processors. It is a powerful model of computation in which the communication pattern between the processors can be changed during execution. In the following years, several new architectures and efficient algorithms for them were proposed, and their implementations using FPGAs have been shown. This paper presents a survey of the different architectures proposed and a few important algorithms presented for these specialized architectures over the last two decades. Keywords: PARBS, R-MESH, RN, LARPBS, Polymorphic Torus Network, AROB.
Some Topics in Parallel Computation and Branching Programs
, 1995
Cited by 1 (0 self)
Some Topics in Parallel Computation and Branching Programs, by Rakesh Kumar Sinha. Chairperson of the Supervisory Committee: Professor Paul Beame, Department of Computer Science and Engineering. There are two parts of this thesis: the first part gives two constructions of branching programs; the second part contains three results on models of parallel machines. The branching program model has turned out to be very useful for understanding the computational behavior of problems. In addition, several restrictions of branching programs, for example ordered binary decision diagrams, have proven to be successful data structures in several VLSI design and verification applications. We construct a branching program of o(n log^3 n) nodes for computing any threshold function on n variables and a branching program of o(n log^4 n) nodes for determining the sum of n variables modulo a fixed divisor. These are improvements over constructions of size Θ(n^{3/2}) due to Lupanov [Lup65]. The second p...

