Results 1 - 5 of 5
Provably efficient scheduling for languages with fine-grained parallelism
 IN PROC. SYMPOSIUM ON PARALLEL ALGORITHMS AND ARCHITECTURES
, 1995
Abstract

Cited by 84 (24 self)
Many high-level parallel programming languages allow for fine-grained parallelism. As in the popular work-time framework for parallel algorithm design, programs written in such languages can express the full parallelism in the program without specifying the mapping of program tasks to processors. A common concern in executing such programs is to schedule tasks to processors dynamically so as to minimize not only the execution time, but also the amount of space (memory) needed. Without careful scheduling, the parallel execution on p processors can use a factor of p or larger more space than a sequential implementation of the same program. This paper first identifies a class of parallel schedules that are provably efficient in both time and space. For any
Efficient Low-Contention Parallel Algorithms
 the 1994 ACM Symp. on Parallel Algorithms and Architectures
, 1994
Abstract

Cited by 30 (12 self)
The queue-read, queue-write (QRQW) parallel random access machine (PRAM) model permits concurrent reading and writing to shared memory locations, but at a cost proportional to the number of readers/writers to any one memory location in a given step. The QRQW PRAM model reflects the contention properties of most commercially available parallel machines more accurately than either the well-studied CRCW PRAM or EREW PRAM models, and can be efficiently emulated with only logarithmic slowdown on hypercube-type non-combining networks. This paper describes fast, low-contention, work-optimal, randomized QRQW PRAM algorithms for the fundamental problems of load balancing, multiple compaction, generating a random permutation, parallel hashing, and distributive sorting. These logarithmic or sublogarithmic time algorithms considerably improve upon the best known EREW PRAM algorithms for these problems, while avoiding the high-contention steps typical of CRCW PRAM algorithms. An illustrative expe...
An Effective Load Balancing Policy for Geometric Decaying Algorithms
Abstract

Cited by 3 (3 self)
Parallel algorithms are often first designed as a sequence of rounds, where each round includes any number of independent constant-time operations. This so-called work-time presentation is then followed by a processor scheduling implementation on a more concrete computational model. Many parallel algorithms are geometric-decaying in the sense that the sequence of work loads is upper bounded by a decreasing geometric series. A standard scheduling implementation of such algorithms consists of a repeated application of load balancing. We present a more effective, yet equally simple, policy for the utilization of load balancing in geometric-decaying algorithms. By making a more careful choice of when and how often load balancing should be employed, and by using a simple amortization argument, we show that the number of required applications of load balancing should be nearly constant. The policy is not restricted to any particular model of parallel computation, and, up to a constant factor, it is the best possible.
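The geometric-decaying property mentioned in this abstract can be illustrated with a short Python sketch. It only checks that a sequence of per-round work loads stays under a decreasing geometric series and computes the resulting bound on total work; it does not reproduce the paper's load balancing policy. The function names, the `ratio` parameter, and the sample loads are assumptions made for illustration.

```python
def is_geometric_decaying(loads, ratio):
    """Return True if loads[i] <= loads[0] * ratio**i for every round i.

    `loads` holds the work per round and `ratio` (0 < ratio < 1) is an
    assumed decay factor; neither name comes from the paper.
    """
    if not 0 < ratio < 1:
        raise ValueError("ratio must lie strictly between 0 and 1")
    return all(w <= loads[0] * ratio ** i for i, w in enumerate(loads))


def total_work_bound(first_load, ratio):
    """Sum of the geometric series first_load * (1 + ratio + ratio**2 + ...)."""
    return first_load / (1 - ratio)


# Hypothetical work loads for four rounds.
sample_loads = [64, 30, 16, 7]
decaying = is_geometric_decaying(sample_loads, 0.5)
bound = total_work_bound(sample_loads[0], 0.5)
```

Because the total work is within a constant factor of the first round's work, the cost of the whole computation is dominated by the early rounds, which is what makes the timing of load balancing steps matter.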
Abstract
, 1996
Abstract
This paper introduces the queue-read, queue-write (QRQW) parallel random access machine (PRAM) model, which permits concurrent reading and writing to shared memory locations, but at a cost proportional to the number of readers/writers to any one memory location in a given step. Prior to this work there were no formal complexity models that accounted for the contention to memory locations, despite its large impact on the performance of parallel programs. The QRQW PRAM model reflects the contention properties of most commercially available parallel machines more accurately than either the well-studied CRCW PRAM or EREW PRAM models: the CRCW model does not adequately penalize algorithms with high contention to shared memory locations, while the EREW model is too strict in its insistence on zero contention at each step. The QRQW PRAM is strictly more powerful than the EREW PRAM. This paper shows a separation of √(lg n) between the two models, and presents faster and more efficient QRQW algorithms for several basic problems, such as linear compaction, leader election, and processor allocation. Furthermore, we present a work-preserving emulation of the QRQW PRAM with only logarithmic slowdown on Valiant's BSP model, and hence on hypercube-type non-combining networks, even when latency, synchronization, and memory granularity overheads are taken into account. This matches the best known emulation result for the EREW PRAM, and considerably improves upon the best known efficient emulation for the CRCW PRAM on such networks. Finally, the paper presents several lower bound results for this model, including lower bounds on the time required for broadcasting and for leader election.
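The contention charge described in this abstract can be sketched in a few lines of Python. This is a simplified illustration, assuming a step is represented as a list of the memory locations accessed (one entry per request) and that the step's cost is the largest number of requests aimed at any single location. The function names are hypothetical, and the sketch ignores local computation and any other terms in the paper's formal cost definition.

```python
from collections import Counter


def qrqw_step_cost(accesses):
    """Cost of one step under a simplified queue-read, queue-write rule.

    `accesses` lists the memory location touched by each request in the
    step. The cost is the maximum number of requests to any one location,
    with a minimum of 1 for a step that touches no shared memory.
    """
    if not accesses:
        return 1
    return max(Counter(accesses).values())


def erew_allows(accesses):
    """An EREW step is legal only if no location is accessed more than once."""
    return qrqw_step_cost(accesses) <= 1


# Hypothetical steps: four requests to distinct cells, then three to cell 7.
spread_cost = qrqw_step_cost([0, 1, 2, 3])
hotspot_cost = qrqw_step_cost([7, 7, 7, 2])
```

Under this reading, a CRCW machine would charge both steps a single time unit and an EREW machine would reject the second step outright, while the queued rule charges the second step three units for the contention on cell 7.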