Provably efficient scheduling for languages with fine-grained parallelism
- In Proc. Symposium on Parallel Algorithms and Architectures, 1995
Cited by 68 (22 self)
Many high-level parallel programming languages allow for fine-grained parallelism. As in the popular work-time framework for parallel algorithm design, programs written in such languages can express the full parallelism in the program without specifying the mapping of program tasks to processors. A common concern in executing such programs is to schedule tasks to processors dynamically so as to minimize not only the execution time, but also the amount of space (memory) needed. Without careful scheduling, the parallel execution on p processors can use a factor of p or larger more space than a sequential implementation of the same program. This paper first identifies a class of parallel schedules that are provably efficient in both time and space. For any
Shared Memory Simulations with Triple-Logarithmic Delay (Extended Abstract)
, 1995
Cited by 20 (4 self)
Artur Czumaj (1), Friedhelm Meyer auf der Heide (2), and Volker Stemann (1). (1) Heinz Nixdorf Institute, University of Paderborn, D-33095 Paderborn, Germany. (2) Heinz Nixdorf Institute and Department of Computer Science, University of Paderborn, D-33095 Paderborn, Germany. Abstract. We consider the problem of simulating a PRAM on a distributed memory machine (DMM). Our main result is a randomized algorithm that simulates each step of an n-processor CRCW PRAM on an n-processor DMM with O(log log log n log* n) delay, with high probability. This is an exponential improvement on all previously known simulations. It can be extended to a simulation of an (n log log log n log* n)-processor EREW PRAM on an n-processor DMM with optimal delay O(log log log n log* n), with high probability. Finally, a lower bound of Ω(log log log n / log log log log n) expected time is proved for a large class of randomized simulations that includes all known simulations...
Modeling parallel bandwidth: Local vs. global restrictions
Cited by 15 (4 self)
Recently there has been an increasing interest in models of parallel computation that account for the bandwidth limitations in communication networks. Some models (e.g., BSP and LogP) account for bandwidth limitations using a per-processor parameter g > 1, such that each processor can send/receive at most h messages in g·h time. Other models (e.g., PRAM(m)) account for bandwidth limitations as an aggregate parameter m < p, such that the p processors can send at most m messages in total at each step. This paper provides the first detailed study of the algorithmic implications of modeling parallel bandwidth as a per-processor (local) limitation versus an aggregate (global) limitation. We consider a number of basic problems
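To make the local versus global distinction concrete, the following is a minimal sketch, not taken from the paper. It assumes a single communication round in which processor i sends `sends[i]` messages, ignores receive load and latency, and assumes that under the global restriction each processor can inject at most one message per step. The function names `local_time` and `global_time` are hypothetical.

```python
from math import ceil

def local_time(sends, g):
    """Per-processor (local) limit: a processor sending h messages needs g*h time,
    so the round lasts as long as the most heavily loaded sender."""
    return g * max(sends)

def global_time(sends, m):
    """Aggregate (global) limit: at most m messages in total per step.
    Assumption: each processor injects at most one message per step,
    so this is a lower bound on the number of steps."""
    return max(max(sends), ceil(sum(sends) / m))

p, g = 8, 4
m = p // g                      # matching aggregate bandwidth, m = p/g

balanced = [4] * p              # every processor sends 4 messages
skewed = [32] + [0] * (p - 1)   # one processor sends all 32 messages

print(local_time(balanced, g), global_time(balanced, m))
print(local_time(skewed, g), global_time(skewed, m))
```

With the same total bandwidth (m = p/g), the two cost measures coincide on the balanced pattern and diverge on the skewed one, where the per-processor limit penalizes the single heavy sender.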
Contention Resolution in Hashing Based Shared Memory Simulations
Cited by 9 (3 self)
In this paper we study the problem of simulating shared memory on the Distributed Memory Machine (DMM). Our approach uses multiple copies of shared memory cells, distributed among the memory modules of the DMM via universal hashing. Thus the main problem is to design strategies that resolve contention at the memory modules. Developing ideas from random graphs and very fast randomized algorithms, we present new simulation techniques that enable us to improve the previously best results exponentially. In particular, we show that an n-processor CRCW PRAM can be simulated by an n-processor DMM with delay O(log log log n log* n), with high probability. Next we show a general technique that can be used to turn these simulations into time-processor optimal ones when the simulated machine is an EREW PRAM. We obtain a time-processor optimal simulation of an (n log log log n log* n)-processor EREW PRAM on an n-processor DMM with O(log log log n log* n) delay. When a CRCW PRAM with (n...
Simple Fast Parallel Hashing by Oblivious Execution
- AT&T Bell Laboratories, 1994
Cited by 3 (2 self)
A hash table is a representation of a set in a linear-size data structure that supports constant-time membership queries. We show how to construct a hash table for any given set of n keys in O(lg lg n) parallel time with high probability, using n processors on a weak version of a CRCW PRAM. Our algorithm uses a novel approach of hashing by "oblivious execution" based on probabilistic analysis to circumvent the parity lower bound barrier at the near-logarithmic time level. The algorithm is simple and is sketched by the following:
1. Partition the input set into buckets by a random polynomial of constant degree.
2. For t := 1 to O(lg lg n) do
   (a) Allocate M_t memory blocks, each of size K_t.
   (b) Let each bucket select a block at random, and try to injectively map its keys into the block using a random linear function. Buckets that fail carry on to the next iteration.
The crux of the algorithm is a careful a priori selection of the parameters M_t and K_t. The algorithm uses only O(lg lg...
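The two steps in the abstract can be illustrated with a sequential sketch. This is not the authors' parallel algorithm. The abstract does not give the per-round block count and block size (M_t and K_t in its notation), so the choices below (twice the number of pending buckets, and a block size that doubles each round) are placeholder assumptions, as are the rule that a block picked by more than one bucket goes to none of them, the prime modulus, and the names `build` and `contains`.

```python
import random

PRIME = 2_147_483_647  # assumption: keys are integers in [0, PRIME)

def random_poly(degree, rng):
    """Return a random polynomial of the given degree over Z_PRIME."""
    coeffs = [rng.randrange(PRIME) for _ in range(degree + 1)]
    def poly(x):
        acc = 0
        for c in coeffs:                      # Horner evaluation
            acc = (acc * x + c) % PRIME
        return acc
    return poly

def build(keys, seed=0, max_rounds=64):
    rng = random.Random(seed)
    keys = sorted(set(keys))
    n = max(1, len(keys))
    top = random_poly(3, rng)                 # step 1: constant-degree polynomial
    buckets = {}
    for k in keys:
        buckets.setdefault(top(k) % n, []).append(k)

    table, placement = {}, {}                 # slot -> key, bucket -> hash parameters
    pending = sorted(buckets)
    k_t = 2                                   # hypothetical initial block size
    for t in range(max_rounds):               # step 2
        if not pending:
            break
        m_t = 2 * len(pending)                # hypothetical number of blocks
        choice = {b: rng.randrange(m_t) for b in pending}
        claims = {}
        for blk in choice.values():
            claims[blk] = claims.get(blk, 0) + 1
        failed = []
        for b in pending:
            blk = choice[b]
            a, c = rng.randrange(1, PRIME), rng.randrange(PRIME)
            offsets = [((a * k + c) % PRIME) % k_t for k in buckets[b]]
            # Succeed only if the block is uncontended and the map is injective.
            if claims[blk] == 1 and len(set(offsets)) == len(offsets):
                placement[b] = (t, blk, a, c, k_t)
                for k, off in zip(buckets[b], offsets):
                    table[(t, blk, off)] = k
            else:
                failed.append(b)              # carry on to the next iteration
        pending = failed
        k_t *= 2
    if pending:
        raise RuntimeError("some buckets were not placed; retry with another seed")
    return top, n, placement, table

def contains(structure, key):
    """Membership query: two hash evaluations and one table probe."""
    top, n, placement, table = structure
    spot = placement.get(top(key) % n)
    if spot is None:
        return False
    t, blk, a, c, k_t = spot
    return table.get((t, blk, ((a * key + c) % PRIME) % k_t)) == key

keys = list(range(0, 2000, 7))
structure = build(keys, seed=1)
print(all(contains(structure, k) for k in keys), contains(structure, 1))
```

The sketch stores slots in a dictionary rather than allocating the blocks, so it does not reproduce the linear space bound. It only shows the control flow: a bucket that fails in one round retries in the next with fresh random choices.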
Fast, Efficient Mutual and Self Simulations for Shared Memory and Reconfigurable Mesh
- In Proceedings of the 7th IEEE Symposium on Parallel and Distributed Processing, 1995
Cited by 2 (0 self)
This paper studies relations between the parallel random access machine (PRAM) model and the reconfigurable mesh (RMESH) model by providing mutual simulations between the models. We present an algorithm simulating one step of an (n lg lg n)-processor CRCW PRAM on an n × n RMESH with delay O(lg lg n) with high probability. We use our PRAM simulation to obtain the first efficient self-simulation algorithm of an RMESH with general switches: an algorithm running on an n × n RMESH is simulated on a p × p RMESH with delay O((n/p)^2 + lg n lg lg p) with high probability, which is optimal for all p ≤ n / sqrt(lg n lg lg n). Finally, we consider the simulation of the RMESH on the PRAM. We show that a 2 × n RMESH can be optimally simulated on a CRCW PRAM in Θ(α(n)) time, where α(·) is the slow-growing inverse Ackermann function. In contrast, a PRAM with a polynomial number of processors cannot simulate the 3 × n RMESH in less than Ω(lg n / lg lg n) e...

