## Randomized Priority Queues for Fast Parallel Access (1997)

### Download Links

- [www.mpi-sb.mpg.de]
- [ftp.ira.uka.de]
- [algo2.iti.uni-karlsruhe.de]
- [algo2.iti.kit.edu]
- DBLP

### Other Repositories/Bibliography

Venue: Journal of Parallel and Distributed Computing

Citations: 11 (1 self)

### BibTeX

```bibtex
@ARTICLE{Sanders97randomizedpriority,
  author  = {Peter Sanders},
  title   = {Randomized Priority Queues for Fast Parallel Access},
  journal = {Journal of Parallel and Distributed Computing},
  year    = {1997},
  volume  = {49},
  pages   = {86--97}
}
```

### Abstract

Applications like parallel search or discrete event simulation often assign priority or importance to pieces of work. An effective way to exploit this for parallelization is to use a priority queue data structure for scheduling the work; but a bottleneck-free implementation of parallel priority queue access by many processors is required to make this approach scalable. We present simple and portable randomized algorithms for parallel priority queues on distributed memory machines with fully distributed storage. Accessing O(n) out of m elements on an n-processor network with diameter d requires amortized time O(·) with high probability for many network types. On logarithmic diameter networks, the algorithms are as fast as the best previously known EREW PRAM methods. Implementations demonstrate that the approach is already useful for medium scale parallelism.
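The scheme the abstract summarizes (one local queue per processor, randomized placement of inserted elements, batched removal of the smallest elements) can be sketched as a single-process toy model. All names here (`ToyParallelPQ`, `delete_min_star`) are illustrative assumptions; the paper's actual algorithm distributes the batched deletion itself via random routing and parallel selection rather than collecting elements centrally.

```python
import heapq
import random

class ToyParallelPQ:
    """Toy single-process model of a fully distributed priority queue:
    one local heap per simulated processor (PE), random placement of
    inserts, batched deleteMin* across all PEs. Illustrative only."""

    def __init__(self, n_pes, seed=0):
        self.queues = [[] for _ in range(n_pes)]
        self.rng = random.Random(seed)

    def insert(self, key):
        # Random placement: send each new element to a random PE.
        pe = self.rng.randrange(len(self.queues))
        heapq.heappush(self.queues[pe], key)

    def delete_min_star(self, k):
        # Batched access: remove the k globally smallest elements.
        # Done centrally here; the paper uses randomized selection
        # over the distributed data to avoid this bottleneck.
        candidates = [(key, pe) for pe, q in enumerate(self.queues)
                      for key in q]
        result = sorted(candidates)[:k]
        for key, pe in result:
            self.queues[pe].remove(key)
            heapq.heapify(self.queues[pe])
        return [key for key, _ in result]

pq = ToyParallelPQ(n_pes=4)
for x in [5, 3, 8, 1, 9, 2, 7]:
    pq.insert(x)
print(pq.delete_min_star(3))  # -> [1, 2, 3]
```

Because the batch is chosen over all local queues, the result is the exact k smallest regardless of where the random placement put each element; only the cost model changes in the distributed setting.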

### Citations

8542 | Introduction to Algorithms - Cormen, Leiserson, et al. - 1990 |

1311 |
Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes. Morgan Kaufmann
- Leighton
- 1992
Citation Context: ...hypercubes and related constant degree networks (butterfly, perfect shuffle, ...) or a combination of a multistage network for routing and a tree network. All the necessary results can be found in [17]. 2.2 Analysis of Randomized Algorithms The analysis of the randomized algorithms described here is based on the notion of behavior with high probability. Among the various variants of this notion we ... |

697 |
Parallel discrete event simulation
- Fujimoto
- 1989
Citation Context: ...r is not closely adhered to, other load units may be more difficult to process or superfluous work may be necessary. One example for priorities are time-stamps in optimistic discrete event simulation [18, 8]. A sequential simulator processing events in time stamp order never has to perform a roll back. For parallel simulation this is not possible. But the closer the simulator adheres to the time-stamp or... |

628 |
MPI: The Complete Reference
- Snir, Otto, et al.
- 1995
Citation Context: ...der to get an idea how practical the bottleneck free priority queues are on contemporary machines, we have implemented the algorithm in a portable way using the library MPI (Message Passing Interface [31]). Since a quite large message startup overhead is common, it was clear from the start that the cost of local queue access was not so important. Also, the size of the queue elements was bound to have ... |

497 | LogP: Towards a Realistic Model of Parallel Computation
- CULLER, KARP, et al.
- 1993
Citation Context: ...y multiplying the counts with the execution times T_Routing(n), T_Broadcast(n), T_Reduction(n), T_Prefix(n) and T_Sort^p(n). The results are also easy to translate into abstract models like LogP [5] or BSP [19] although this may be less accurate for some machines with tuned implementations for the above collective operations. In order to simplify the discussion, we define a common upper bound T ... |

135 | Synchronization and communication in the T3E multiprocessor
- Scott
- 1996
Citation Context: ... server processor receiving 3 Partly, because there are few commercial machines with sufficient hardware support for message passing. (With the notable exception of the shared memory machine Cray T3E [29].) 4 Unfortunately we cannot give the average execution time since at the time of the measurements the machine was operated in such a way that some huge delays due to external reasons occurred. But the... |

100 |
The Art of Computer Programming: Sorting and Searching, volume 3
- Knuth
- 1973
Citation Context: ... it is false, elements larger than min Q_1 can be excluded from consideration for this iteration. Queue maintenance costs can be reduced by using the leftist tree variant for representing Q_1 and Q_0 [15]. Emptying Q_0 into Q_1 can then be performed in time O(log m) by merging the two trees. In addition, those elements which are immediately fetched back into Q_1 when deleteMin is called after the end ... |
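The context above relies on the defining feature of leftist trees: two heaps can be merged by walking only their right spines, which have logarithmic length, so emptying Q_0 into Q_1 is a single O(log m) merge. A minimal sketch (my own class and function names, not the paper's code):

```python
class LeftistNode:
    """Node of a leftist min-heap. The s-value is the distance to the
    nearest missing child; the leftist property keeps right spines short."""
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.s = 1

def merge(a, b):
    """Merge two leftist heaps in O(log n) along their right spines."""
    if a is None:
        return b
    if b is None:
        return a
    if b.key < a.key:          # keep the smaller root on top
        a, b = b, a
    a.right = merge(a.right, b)
    # Restore the leftist property: left s-value >= right s-value.
    sl = a.left.s if a.left else 0
    sr = a.right.s if a.right else 0
    if sl < sr:
        a.left, a.right = a.right, a.left
        sl, sr = sr, sl
    a.s = sr + 1
    return a

def insert(root, key):
    # Insertion is just a merge with a one-node heap.
    return merge(root, LeftistNode(key))

def delete_min(root):
    # Removing the root leaves two heaps, which are merged again.
    return root.key, merge(root.left, root.right)
```

Both queue operations reduce to `merge`, which is exactly why the merge-based emptying of Q_0 into Q_1 costs no more than a single deleteMin.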

83 | Universal computing
- McColl
Citation Context: ...ng the counts with the execution times T_Routing(n), T_Broadcast(n), T_Reduction(n), T_Prefix(n) and T_Sort^p(n). The results are also easy to translate into abstract models like LogP [5] or BSP [19] although this may be less accurate for some machines with tuned implementations for the above collective operations. In order to simplify the discussion, we define a common upper bound T_coll such th... |

75 |
Tarjan, “Relaxed heaps: An alternative to Fibonacci heaps with applications to parallel computation
- Driscoll, Gabow, et al.
- 1988
Citation Context: ...iven in [19]. Future Work We are currently working on a refinement of the algorithm which also supports delete* and decreaseKey* efficiently. This can be done by representing Q_0 and Q_1 as relaxed heaps [6] and introducing an additional relaxed heap Q_d holding decreased keys. Elements are placed in a reproducible way using a hash function. The probabilistic analysis still applies as long as the operatio... |

62 |
Randomized parallel algorithms for backtrack search and branch-and-bound computation
- Karp, Zhang
- 1993
Citation Context: ...≤ e^(−ε²np/2) for 0 < ε < 1 (5); P[X ≥ αnp] ≤ e^((1−log α)αnp) for α > 1 (6) (Throughout this paper log denotes the natural logarithm.) 2.3 Branch-and-Bound We adopt the model of Karp and Zhang [14]. Let H denote the search tree with a set of nodes V. Node degrees are bounded by a constant. All node costs c(v) are assumed to be different and c(v) is monotonically increasing on any path from the ... |
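The lower-tail bound that the context quotes, P[X ≤ (1−ε)np] ≤ e^(−ε²np/2) for a binomially distributed X with mean np, can be sanity-checked by simulation. The parameter choices below (n, p, ε, trial count) are arbitrary assumptions for the check, not values from the paper:

```python
import math
import random

# Empirically check the lower-tail Chernoff bound
#   P[X <= (1 - eps) * n * p] <= exp(-eps^2 * n * p / 2)
# for X ~ Binomial(n, p), simulated as n independent coin flips.
random.seed(42)
n, p, eps = 1000, 0.1, 0.5
trials = 2000
threshold = (1 - eps) * n * p          # = 50, far below the mean 100
hits = sum(
    sum(random.random() < p for _ in range(n)) <= threshold
    for _ in range(trials)
)
empirical = hits / trials
bound = math.exp(-eps ** 2 * n * p / 2)  # about 3.7e-6
assert empirical <= bound                # the tail is indeed that rare
```

With these parameters the bound is already below 10⁻⁵, illustrating why sums of independent indicators concentrate sharply enough for the "with high probability" analysis.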

57 |
Probabilistic Parallel Algorithms for Sorting and Selection
- Reischuk
- 1985
Citation Context: ...he smallest elements, those which are certainly not among the smallest ones, and a (hopefully small) set of remaining candidates for the next iteration. (Very similar algorithms are also described in [25, 21].) First, a random sample of size n^(1/2) is selected. (It simplifies the analysis to assume that this is done with replacement, i.e., elements may be selected for multiple samples). We then rank the... |
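The three-way split the context describes (certainly among the smallest, certainly not, remaining candidates) can be sketched sequentially: draw a sample of about √n elements, use two sample ranks that bracket rank k as pivots, keep everything below the lower pivot, and recurse on the candidate band. This is my own simplified sketch of the sample-based selection idea, not the algorithm of [25] or [21]; the slack term is an arbitrary safety margin:

```python
import random

def sample_select(items, k, rng=random.Random(0)):
    """Return the k smallest of `items` via random-sample pivoting."""
    items = list(items)
    n = len(items)
    if n <= 32:                                # small base case
        return sorted(items)[:k]
    s = max(2, int(n ** 0.5))
    sample = sorted(rng.choices(items, k=s))   # sample with replacement
    t = k * s // n                             # sample rank near rank k
    slack = max(1, int(s ** 0.5))              # assumed safety margin
    lo = sample[max(0, t - slack)]
    hi = sample[min(s - 1, t + slack)]
    small = [x for x in items if x < lo]       # certainly among the smallest
    mid = [x for x in items if lo <= x <= hi]  # candidates for next round
    if len(small) > k or len(small) + len(mid) < k or len(mid) == n:
        return sorted(items)[:k]               # pivots missed: fall back
    return sorted(small) + sample_select(mid, k - len(small), rng)

print(sample_select(list(range(100, 0, -1)), 5))  # -> [1, 2, 3, 4, 5]
```

The two sanity checks before recursing guarantee the answer is always exact: everything below `lo` must fit into the result, and the candidate band must still contain the rank-k element; otherwise the sketch simply sorts.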

56 | Diffracting trees
- Shavit, Zemach
- 1996
Citation Context: ... in reserve which can be retrieved completely asynchronously using a distributed FIFO queue. Such a data structure can be implemented in a bottleneck-free way using parallel counting algorithms (e.g. [30]). When (or even before) this reserve is exhausted a new batch of (say n) elements is retrieved. A price we pay for this approach is that the elements in the reserve might already be outdated when the... |

51 | Concurrent Access of Priority Queues
- Rao, Kumar
- 1988
Citation Context: ...ch a way that an individual access takes constant time (e.g., [23]) but in practice this saving may be more than offset by a worsening of the communication bottleneck at the access point to the queue [24]. More scalable algorithms exploit the fact that our definition of parallel priority queue calls for a method to quickly remove a rather large number of elements at once. One approach is to use a gene... |

22 |
Randomized Parallel Selection
- Rajasekaran
- 1990
Citation Context: ...he smallest elements, those which are certainly not among the smallest ones, and a (hopefully small) set of remaining candidates for the next iteration. (Very similar algorithms are also described in [25, 21].) First, a random sample of size n^(1/2) is selected. (It simplifies the analysis to assume that this is done with replacement, i.e., elements may be selected for multiple samples). We then rank the... |

19 |
Parallel heap: An optimal parallel priority queue
- Deo, Prasad
- 1992
Citation Context: ...nd to substitute the compare and exchange operations of the usual heap algorithm by parallel sorting and merging operations. n insertions and deletions can be performed in time O(log m) on EREW PRAMs [7] and on pipelined hypercubes [6]. (This algorithm requires newly inserted elements to be sorted, so we must add another O(log n log log n) or Õ(log n) term for the sorting operation.) The parallel ... |

19 | Selection on the bulk-synchronous parallel model with applications to priority queues
- Gerbessiotis, Siniolakis
- 1996
Citation Context: ... and by modifying an efficient selection algorithm to exploit that the elements are randomly distributed. The present paper further refines this approach. The same selection algorithm is also used in [9, 1] for accessing k ≫ n priority queue elements in parallel. These algorithms can be considered a combination of k-bandwidth heaps and random placement. 4 An Efficient Algorithm and its Analysis We now ... |

15 | Prioritization in parallel symbolic computing
- Kale, Ramkumar, et al.
- 1993
Citation Context: ...in Section 2.3). Even if the sequential algorithm does not explicitly use prioritization, it often makes sense to use the sequential evaluation order itself as the priority for parallel execution. In [12, 13] this turns out to reduce (adverse) speedup anomalies in particular in presence of strong heuristics. For all these applications, an attractive approach to parallelization is to manage a global priori... |

15 |
A simpler analysis of the Karp-Zhang parallel branch-and-bound method
- RANADE
- 1990
Citation Context: ...ribute the elements over more or less independent local queues which exchange elements in order to approximate the behavior of a global priority queue. For example, in the algorithm of Karp and Zhang [14, 22] newly inserted elements are sent to randomly selected PEs while deleteMin requests simply access the locally present queue. For branch-and-bound this only increases the number of expanded nodes by ... |
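The Karp-Zhang scheme the context describes is simple enough to model directly: insertion routes an element to a uniformly random PE's local heap, and each deleteMin pops only from the caller's own heap, so the global minimum is only approximated. The class and method names below are my own toy naming, not the original formulation:

```python
import heapq
import random

class KarpZhangStylePQ:
    """Toy model of random-placement priority queues: inserts go to a
    random PE's local heap; deleteMin is purely local. No PE is ever a
    communication bottleneck, at the price of only approximating the
    global minimum."""

    def __init__(self, n_pes, seed=0):
        self.queues = [[] for _ in range(n_pes)]
        self.rng = random.Random(seed)

    def insert(self, key):
        # Send the new element to a uniformly random PE.
        pe = self.rng.randrange(len(self.queues))
        heapq.heappush(self.queues[pe], key)

    def local_delete_min(self, pe):
        # Each PE pops only its own local minimum.
        q = self.queues[pe]
        return heapq.heappop(q) if q else None
```

With random placement each local minimum is among the smallest elements overall with high probability, which is why, as the context notes, branch-and-bound expands only a constant factor more nodes than with an exact global queue.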

15 | Parallelism and locality in priority queues
- Ranade, Cheng, et al.
- 1994
Citation Context: ...s the size of the queue. In principle this can be slightly improved using algorithms which are able to pipeline up to log m requests in such a way that an individual access takes constant time (e.g., [23]) but in practice this saving may be more than offset by a worsening of the communication bottleneck at the access point to the queue [24]. More scalable algorithms exploit the fact that our definitio... |

15 | Fast priority queues for parallel branch-and-bound
- Sanders
- 1995
Citation Context: ...ed random routing and parallel selection. The basic approach has been independently developed in [28] and [23]. These simple versions are already asymptotically optimal on mesh connected machines. In [26] improvements for logarithmic diameter networks like butterflies are introduced by avoiding work imbalance due to local queue access and by modifying an efficient selection algorithm to exploit that t... |

11 | On the network complexity of selection
- Plaxton
- 1989
Citation Context: ...algorithm from Section 4.3 is of independent interest since with its execution time in Õ(T_coll) for randomly placed data it beats the worst case lower bound for deterministic algorithms given in [20]. Future Work An efficient and portable implementation of asynchronous parallel priority queues raises some interesting questions. On the implementation side, there is some hope that in the near futur... |

10 |
Optimal and Load Balanced Mapping of Parallel Priority Queues in Hypercubes
- Das, Pinotti, et al.
- 1996
Citation Context: ... exchange operations of the usual heap algorithm by parallel sorting and merging operations. n insertions and deletions can be performed in time O(log m) on EREW PRAMs [7] and on pipelined hypercubes [6]. (This algorithm requires newly inserted elements to be sorted, so we must add another O(log n log log n) or Õ(log n) term for the sorting operation.) The parallel sorting and merging routines req... |

10 |
Lastverteilungsalgorithmen für parallele Tiefensuche. Number 463
- Sanders
- 1997
Citation Context: ...o more complex results. In this paper we need the following rules which we present without proof because they are based on quite straightforward elementary probability theory. (Proofs can be found in [27].) Lemma 1. Let X_1 ∈ Õ(f_1), ..., X_k ∈ Õ(f_k) be random variables (k constant). Then ⊙_{i=1}^k X_i ∈ Õ(⊙_{i=1}^k f_i) for ⊙ ∈ {max, Σ, Π} (1). Lemma 2. Let {X_1, ..., X_m} ⊆ Õ(f) b... |

10 |
Constraint solving for combinatorial search problems: a tutorial,” in Principle and Practice of Constraint Programming CP’95
- Hentenryck
- 1995
Citation Context: ...over centralized ones. The usage pattern and even the type of operations used for a parallel priority data structure very much depends on the underlying application. For example, branch-and-cut (e.g. [32]) can be considered a variant of branch-and-bound where most nodes have degree one and it is more efficient to perform the expansion of a child node on the PE of the parent. In this context an operatio... |

10 |
Optimal parallel initialization algorithms for a class of priority queues
- Olariu, Wen
- 1991
Citation Context: ...on is initialization. This is even faster than batched insertion because we can use the linear time sequential initialization algorithm locally — we do not need specialized algorithms as described in [18] for k-bandwidth heaps. Note that if a batch of ω(m/log m) is to be inserted, it is faster to reinitialize all the local queues rather than inserting all new elements individually. 5.3 Asynchronous Op... |
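The point the context makes, that each PE can build its local queue in linear time rather than by repeated insertion, is exactly the classic heapify-versus-push distinction. A small sketch (function names are mine) contrasting the two, using Python's binary heaps as the local queue:

```python
import heapq

def init_local_queue(elements):
    """Linear-time local initialization: heapify the PE's share of the
    elements in place -- O(m) for a local share of size m."""
    q = list(elements)
    heapq.heapify(q)
    return q

def init_by_insertion(elements):
    """Batched insertion for comparison: m pushes at O(log m) each,
    i.e. O(m log m) total."""
    q = []
    for x in elements:
        heapq.heappush(q, x)
    return q

share = [9, 4, 7, 1, 8, 2]          # this PE's share of the elements
q = init_local_queue(share)
print(heapq.heappop(q))             # -> 1
```

Both routines produce a valid heap; the asymptotic gap between them is what makes reinitializing all local queues cheaper than individual insertion once a batch of ω(m/log m) elements arrives.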

9 | Realistic parallel algorithms: Priority queue operations and selection for the BSP model
- Baumker, Dittrich, et al.
- 1996
Citation Context: ... and by modifying an efficient selection algorithm to exploit that the elements are randomly distributed. The present paper further refines this approach. The same selection algorithm is also used in [9, 1] for accessing k ≫ n priority queue elements in parallel. These algorithms can be considered a combination of k-bandwidth heaps and random placement. 4 An Efficient Algorithm and its Analysis We now ... |

6 | A parallel priority data structure with applications - Brodal, Traff, et al. - 1997 |

6 |
2d-Bubblesorting in average time O(√N lg N)
- Ierardi
- 1994
Citation Context: ...ed Algorithms The analysis of the randomized algorithms described here is based on the notion of behavior with high probability. Among the various variants of this notion we have adopted the one from [11]. Definition 1. A positive real valued random variable X is in O(f(n)) with high probability -- or X ∈ Õ(f(n)) for short -- iff ∀β > 0 : ∃c > 0, n_0 > 0 : ∀n ≥ n_0 : P[X > cf(n)] ≤ n^(−β), i.e.,... |

5 | Priority queues on parallel machines
- Brodal
(Show Context)
Citation Context ...eloping algorithms along the lines of Section 5.3. An interesting topic that was not relevant for the present implementation is the question which sequential priority queues should be used as a basis =-=[16, 2]-=-. The local queue implementation might become relevant if more efficient communication interfaces are used (which are currently only available as proprietary systems on a few machines.) This might in ... |

5 |
Parallel heap
- Deo, Prasad
- 1990
Citation Context: ...change operations of the usual heap algorithm by sorting and merging operations. For a recent literature survey refer to [4]. n insertions and deletions can be performed in time O(log m) on EREW PRAMs [5] and on pipelined hypercubes [4]. This algorithm requires newly inserted elements to be sorted, so we must add another O(log n log log n) or Õ(log n) term for the sorting operation. The parallel sorting an... |

3 |
2d-Bubblesorting in average time O(√N lg N)
- Ierardi
- 1994
Citation Context: ...ed Algorithms The analysis of the randomized algorithms described here is based on the notion of behavior with high probability. Among the various variants of this notion we have adopted the one from [10]. Definition 1. A positive real valued random variable X is in O(f(n)) with high probability -- or X ∈ Õ(f(n)) for short -- iff ∀β > 0 : ∃c > 0, n_0 > 0 : ∀n ≥ n_0 : P[X > cf(n)] ≤ n^(−β), i.e., the p... |

2 |
Load balanced priority queue implementations on distributed memory parallel machines
- Gupta, Phoutiou
- 1994
Citation Context: ... these algorithms are slower on weaker models of parallel computation. On single-ported hypercubes and meshes, access times of O(log m log n) and O(√n log m), respectively, have been achieved [10, 23]. A radical approach is to relax the priority queue semantics and to distribute the elements over more or less independent local queues which exchange elements in order to approximate the behavior of ... |

2 |
Diskrete Simulation - Prinzipien und Probleme der Effizienzsteigerung durch Parallelisierung
- Mattern, Mehl
- 1989
Citation Context: ...r is not closely adhered to, other load units may be more difficult to process or superfluous work may be necessary. One example for priorities are time-stamps in optimistic discrete event simulation [18, 8]. A sequential simulator processing events in time stamp order never has to perform a roll back. For parallel simulation this is not possible. But the closer the simulator adheres to the time-stamp or... |

2 |
Flaschenhalsfreie parallele Priority queues
- Sanders
- 1994
Citation Context: ...ts for a deleteMin* operation. It turns out that instead of parallel sorting and merging we now only need random routing and parallel selection. The basic approach has been independently developed in [28] and [23]. These simple versions are already asymptotically optimal on mesh connected machines. In [26] improvements for logarithmic diameter networks like butterflies are introduced by avoiding work ... |

1 |
Efficient graph coloring with prioritization
- Kale, Richards, et al.
- 1995
Citation Context: ...in Section 2.3). Even if the sequential algorithm does not explicitly use prioritization, it often makes sense to use the sequential evaluation order itself as the priority for parallel execution. In [12, 13] this turns out to reduce (adverse) speedup anomalies in particular in presence of strong heuristics. For all these applications, an attractive approach to parallelization is to manage a global priori... |

1 |
Concurrent data structures for tree structured search algorithms
- Cun, Roucairol
- 1995
Citation Context: ...eloping algorithms along the lines of Section 5.3. An interesting topic that was not relevant for the present implementation is the question which sequential priority queues should be used as a basis [16, 2]. The local queue implementation might become relevant if more efficient communication interfaces are used (which are currently only available as proprietary systems on a few machines.) This might in ... |