## Load Sharing with Parallel Priority Queues (1991)

Venue: Center for

Citations: 3 (0 self)

### BibTeX

```bibtex
@TECHREPORT{Parberry91loadsharing,
  author      = {Ian Parberry},
  title       = {Load Sharing with Parallel Priority Queues},
  institution = {Center for},
  year        = {1991}
}
```

### Abstract

For maximum efficiency in a multiprocessor system the load should be shared evenly over all processors; that is, there should be no idle processors when tasks are available. The delay in a load sharing algorithm is the larger of the maximum time that any processor can be idle before a task is assigned to it, and the maximum time that it must wait to be relieved of an excess task. A simple parallel priority queue architecture for load sharing in a p-processor multiprocessor system is proposed. This architecture uses O(p log(n/p)) special-purpose processors (where n is the maximal size of the priority queue), an interconnection pattern of bounded degree, and achieves delay O(log p), which is optimal for any bounded degree system.

1 Introduction. One advantage that multiprocessor computers have over uniprocessors is the ability to speed up computation by having the processors compute in parallel. The archetypal model studied is the PRAM, in which it is assumed that concurrent access to a ...
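As an illustration of the load sharing goal the abstract states (no processor idles while tasks remain), the sketch below assigns tasks from a single shared priority queue. It is a minimal sequential simulation, not the paper's bounded-degree architecture; the function name and round-robin dispatch are assumptions for illustration only.

```python
import heapq

def share_load(tasks, p):
    """Assign prioritized tasks (smaller key = higher priority) to p
    processors from one shared priority queue. Returns per-processor
    task lists; no processor is skipped while tasks remain."""
    queue = list(tasks)
    heapq.heapify(queue)
    assignments = [[] for _ in range(p)]
    i = 0
    while queue:
        # Round-robin dispatch models "no idle processor while tasks
        # are available"; a real system dispatches on actual idleness.
        assignments[i % p].append(heapq.heappop(queue))
        i += 1
    return assignments
```

In the paper's setting the single shared queue is replaced by a bounded-degree structure so that every queue operation completes with O(log p) delay.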

### Citations

499 | Sorting networks and their applications
- Batcher
- 1968
Citation Context: ...ther processor is replaced by a constant degree network of p processors (such as the shuffle-exchange [28] or cube-connected cycles [23]) which implement Batcher's odd-even merging algorithm (Batcher [5]). This gives a delay of O(log p) on ... |
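Batcher's odd-even merge, cited above, combines two sorted sequences with a fixed pattern of compare-exchange operations of depth O(log n), which is what yields the O(log p) delay on a bounded-degree network. A minimal sequential sketch (equal power-of-two input lengths assumed):

```python
def odd_even_merge(a, b):
    """Merge two sorted lists of equal power-of-two length using only
    compare-exchanges, in the pattern of Batcher's odd-even merge."""
    n = len(a)
    if n == 1:
        return [min(a[0], b[0]), max(a[0], b[0])]
    # Recursively merge the even-indexed and odd-indexed subsequences.
    even = odd_even_merge(a[0::2], b[0::2])
    odd = odd_even_merge(a[1::2], b[1::2])
    # One final layer of compare-exchanges interleaves the two results.
    out = [None] * (2 * n)
    out[0], out[-1] = even[0], odd[-1]
    for i in range(1, n):
        lo, hi = min(odd[i - 1], even[i]), max(odd[i - 1], even[i])
        out[2 * i - 1], out[2 * i] = lo, hi
    return out
```

Because the compare-exchange pattern is data-independent, the same schedule runs as a comparator network of O(log n) layers in hardware or on a network of processors.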

253 | Sorting and Searching, volume 3 of The Art of Computer Programming - Knuth - 1998 |

247 | A taxonomy of scheduling in general-purpose distributed computing systems
- Casavant, Kuhl
- 1988
Citation Context: ...cause of assumption 8 above must be Ω(log p). Our algorithm falls into the global, dynamic, physically distributed, cooperative, suboptimal, one-time assignment category of Casavant and Kuhl [7]. The delay in a load sharing algorithm is the larger of the maximum time that any processor can be idle before a task is assigned to it and the maximum time that it must wait to be relieved of a newl... |

211 | A taxonomy of problems with fast parallel algorithms
- Cook
- 1985
Citation Context: ... system by a factor of Θ(log p), which can be achieved using Columnsort [15] and techniques described in Parberry [19]. However, many of the fastest parallel algorithms (for example, those in NC [8, 22]) require a massive amount of communication. It is not unusual to require interprocessor communication from distant processors for almost every local instruction executed. It is unreasonable to expect... |

207 | An O(n log n) sorting network
- Ajtai, Komlós, et al.
- 1983
Citation Context: ...s of the heap. We describe how to extend our algorithm to the parallel priority queue, giving O(log p) delay with O(p log(n/p)) processors using the AKS sorting network (Ajtai, Komlós and Szemerédi [1, 2]). The use of the AKS sorting network unfortunately results in an extremely large constant multiple (more than 6000, see Paterson [21]) in the delay and processor bounds. If a weaker form of load shar... |

207 | The cube-connected cycles: A versatile network for parallel computation
- Preparata, Vuillemin
- 1981
Citation Context: ...sing the AKS sorting network and Leighton's Columnsort [15]. Every other processor is replaced by a constant degree network of p processors (such as the shuffle-exchange [28] or cube-connected cycles [23]) which implement Batcher's odd-even merging algorithm (Batcher [5]). This gives a delay of O(log p) on ... |

195 | How to Emulate Shared Memory
- Ranade
- 1987
Citation Context: ...t of communication. It is not unusual to require interprocessor communication from distant processors for almost every local instruction executed. It is unreasonable to expect even Ranade's algorithm [26] to provide the interprocessor communication bandwidth that is required by these algorithms. The world faced by programmers of today's parallel computers is vastly different from that of the theoretic... |

182 | NP-complete scheduling problems
- Ullman
- 1975
Citation Context: ...arbitrary set of jobs within the smallest amount of elapsed time. An efficient algorithm to find the optimal schedule under these conditions is highly unlikely since the problem is NP-complete (Ullman [29]). There are many papers on the load sharing problem that assume some probability distribution on the arrival of the tasks, and measure a performance metric designed to quantify the average throughput... |

168 | Tight Bounds on the Complexity of Parallel Sorting
- Leighton
- 1984
Citation Context: ...twork of processors each with a local memory. Such an implementation requires an increase in running time for a p processor system by a factor of Θ(log p), which can be achieved using Columnsort [15] and techniques described in Parberry [19]. However, many of the fastest parallel algorithms (for example, those in NC [8, 22]) require a massive amount of communication. It is not unusual to require ... |

164 | Parallel processing with the perfect shuffle
- Stone
- 1971
Citation Context: ...of O(p) processors that sort using the AKS sorting network and Leighton's Columnsort [15]. Every other processor is replaced by a constant degree network of p processors (such as the shuffle-exchange [28] or cube-connected cycles [23]) which implement Batcher's odd-even merging algorithm (Batcher [5]). This gives a delay of O(log p) on ... |

148 | Algorithm 232: Heapsort
- Williams
- 1964
Citation Context: ...t halving networks. A preliminary version of this paper appears in Parberry [20]. 2 Ragged Heaps. We assume that the reader is familiar with the heap implementation of the ADT priority queue (Williams [31]). The root of a heap is said to be at level 1. The child of a node at level i is said to be at level i + 1. A ragged heap is a heap with the following modifications. There are at most O(log n) empty ... |
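The level convention quoted above (root at level 1, a child of a level-i node at level i + 1) has a direct arithmetic form in the standard 1-based array layout of a heap, where node k has children 2k and 2k + 1. A small sketch, assuming that layout:

```python
def heap_level(index):
    """Level of a node in a 1-based array heap: the root (index 1) is
    at level 1, and children of a level-i node are at level i + 1.
    Equivalent to floor(log2(index)) + 1."""
    if index < 1:
        raise ValueError("heap indices are 1-based")
    return index.bit_length()
```

So nodes 1, 2-3, 4-7, 8-15, ... occupy levels 1, 2, 3, 4, ..., and a heap of n nodes has ceil(log2(n + 1)) levels, which is why per-level pipelined operations cost O(log n) steps.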

124 | Load Sharing in Distributed Systems
- Wang, Morris
- 1985
Citation Context: ...em that assume some probability distribution on the arrival of the tasks, and measure a performance metric designed to quantify the average throughput of the system (for a survey, see Wang and Morris [30]). We will instead measure the maximum amount of time that a processor must remain idle at a stretch, which because of assumption 8 above must be Ω(log p). Our algorithm falls into the global... |

111 | Sorting in c log n parallel steps
- Ajtai, Komlós, et al.
- 1983
Citation Context: ...s of the heap. We describe how to extend our algorithm to the parallel priority queue, giving O(log p) delay with O(p log(n/p)) processors using the AKS sorting network (Ajtai, Komlós and Szemerédi [1, 2]). The use of the AKS sorting network unfortunately results in an extremely large constant multiple (more than 6000, see Paterson [21]) in the delay and processor bounds. If a weaker form of load shar... |

73 | On simultaneous resource bounds
- Pippenger
- 1979
Citation Context: ... system by a factor of Θ(log p), which can be achieved using Columnsort [15] and techniques described in Parberry [19]. However, many of the fastest parallel algorithms (for example, those in NC [8, 22]) require a massive amount of communication. It is not unusual to require interprocessor communication from distant processors for almost every local instruction executed. It is unreasonable to expect... |

66 | Data broadcasting in SIMD computers
- Nassimi, Sahni
- 1981
Citation Context: ...e tasks of highest priority. Steps 1 and 3 of the insert phase, and steps 1, 4, and 5 of the deletemin phase can be implemented in O(log p) time using the Concentrate procedure from Nassimi and Sahni [17] (using standard techniques to implement it on a bounded degree network described in Parberry [18, Section 7.1]). Step 2 of the insert phase can be implemented in time O(log p) using standard techniqu... |
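The Concentrate operation referenced above moves flagged records to the lowest-numbered positions, preserving their order; the core of it is a prefix sum of the flags, since each flagged record's rank among the flags is its destination. The sketch below is a sequential rendering of that idea, not Nassimi and Sahni's SIMD routine; the function name is an assumption.

```python
from itertools import accumulate

def concentrate(data, flags):
    """Move records with flag 1 to the lowest-numbered positions,
    order-preserving. The prefix sum of the flags gives each flagged
    record's 1-based destination; unflagged slots are left as None."""
    ranks = list(accumulate(flags))  # ranks[i] = # of flags in data[0..i]
    out = [None] * len(data)
    for i, (record, flag) in enumerate(zip(data, flags)):
        if flag:
            out[ranks[i] - 1] = record
    return out
```

On a parallel machine the prefix sum itself runs in O(log p) steps over p elements, which is where the O(log p) cost quoted above comes from.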

51 | Concurrent Access of Priority Queues
- Rao, Kumar
- 1988
Citation Context: ...e local processing. Many PRAM algorithms for standard priority queue operations (i.e. p = 1) can be found in the literature (for example, Biswas and Browne [6], Munro and Robertson [16], Rao and Kumar [27], and Jones [13]). Quinn and Yoo [25] describe parallel algorithms for filling and emptying a heap on a bounded degree network, but their algorithm deadlocks if insert and deletemin operations are int... |

40 | Concurrent Operations on Priority Queues
- Jones
- 1989
Citation Context: ...ng. Many PRAM algorithms for standard priority queue operations (i.e. p = 1) can be found in the literature (for example, Biswas and Browne [6], Munro and Robertson [16], Rao and Kumar [27], and Jones [13]). Quinn and Yoo [25] describe parallel algorithms for filling and emptying a heap on a bounded degree network, but their algorithm deadlocks if insert and deletemin operations are interleaved. Fan an... |

36 | Parallel Complexity Theory
- Parberry
- 1987
Citation Context: ...e stack operations from Table 2 using the elementary operations from Table 3 includes arbitrary-length shifts. The drawbacks and advantages of using shifts are discussed at greater length in Parberry [18] (see the restricted arithmetic instruction set). The stack operations can then be implemented in constant time as shown in Figure 6, where s is the stack and c is a counter (both initially zero). We ... |

34 | Improved Sorting Networks with O(log n) Depth, Algorithmica
- Paterson
- 1990
Citation Context: ...rs using the AKS sorting network (Ajtai, Komlós and Szemerédi [1, 2]). The use of the AKS sorting network unfortunately results in an extremely large constant multiple (more than 6000, see Paterson [21]) in the delay and processor bounds. If a weaker form of load sharing is desired, then it is sufficient to use halving networks, which separate the inputs smaller than the median from those larger tha... |

30 | Simultaneous update of priority structures
- Biswas, Browne
- 1987
Citation Context: ... of the tasks of highest priority, or continue local processing. Many PRAM algorithms for standard priority queue operations (i.e. p = 1) can be found in the literature (for example, Biswas and Browne [6], Munro and Robertson [16], Rao and Kumar [27], and Jones [13]). Quinn and Yoo [25] describe parallel algorithms for filling and emptying a heap on a bounded degree network, but their algorithm deadlo... |

26 | Parallel Graph Algorithms
- Quinn, Deo
- 1984
Citation Context: ... the priority queue and returns them in no particular order. Parallel priority queues have applications in multiprocessor scheduling, and in parallel graph algorithms (see, for example, Quinn and Deo [24]). A parallel heap is a data structure with the same tree structure as a heap, but with p values per node. It also has the property that all of the values in a node are smaller than the values in its ... |

23 | Heaps on heaps
- Gonnet, Munro
- 1986
Citation Context: ...e and is destroyed when it reaches a leaf. Routing of replace messages to the next unused node, and routing of fetch messages to the last used node, is performed using the technique of Gonnet and Munro [12] and Rao and Kumar [27], by having each processor keep a record of the next unused node, which it updates every time it sees a replace message when it has an empty stack. The updating can be carried o... |

13 | Halvers and Expanders
- Ajtai, Komlós, et al.
- 1992
Citation Context: ...s results in a parallel priority queue algorithm with the same asymptotic time and processor bounds as before, and constant multiples that are smaller by a factor of 18. Ajtai, Komlós and Szemerédi [3] also have an improved bound on the depth of halving networks. The main body of this paper is divided into four sections. The first section describes the ragged heap data structure and the priority qu... |

9 | Managing a parallel heap efficiently
- Das, Horng
- 1991
Citation Context: ...if insert and deletemin operations are interleaved. Fan and Cheng [11] have parallel priority queue algorithms for a bounded degree network with O(log p) delay on O(n + p^2) processors. Das and Horng [9], and Deo and Prasad [10] have PRAM algorithms with delay O(log n) on p processors. We present a new data structure for the priority queue called a ragged heap. Priority queue operations for the ragge... |

7 | Sorting algorithms with minimum memory
- Alekseyev
- 1969
Citation Context: ...ich separate the inputs smaller than the median from those larger than it, instead of sorting networks. It is obvious that the depth of an n input halving network must be Ω(log n). Alekseyev [4] has shown that the size must be Ω(n log n) (see also Knuth [14, pp. 234--235]). We demonstrate that halving networks of depth less than 327 log n exist. This results in a parallel priority... |

5 | Parallel heap
- Deo, Prasad
- 1990
Citation Context: ...operations are interleaved. Fan and Cheng [11] have parallel priority queue algorithms for a bounded degree network with O(log p) delay on O(n + p^2) processors. Das and Horng [9], and Deo and Prasad [10] have PRAM algorithms with delay O(log n) on p processors. We present a new data structure for the priority queue called a ragged heap. Priority queue operations for the ragged heap can be pipelined i... |

5 | Data structures for the efficient solution of graph theoretic problems on tightly-coupled computers
- Quinn, Yoo
- 1984
Citation Context: ...hms for standard priority queue operations (i.e. p = 1) can be found in the literature (for example, Biswas and Browne [6], Munro and Robertson [16], Rao and Kumar [27], and Jones [13]). Quinn and Yoo [25] describe parallel algorithms for filling and emptying a heap on a bounded degree network, but their algorithm deadlocks if insert and deletemin operations are interleaved. Fan and Cheng [11] have par... |

2 | A simultaneous access priority queue
- Fan, Cheng
- 1987
Citation Context: ... and Yoo [25] describe parallel algorithms for filling and emptying a heap on a bounded degree network, but their algorithm deadlocks if insert and deletemin operations are interleaved. Fan and Cheng [11] have parallel priority queue algorithms for a bounded degree network with O(log p) delay on O(n + p^2) processors. Das and Horng [9], and Deo and Prasad [10] have PRAM algorithms with delay O(log n) ... |

2 | Parallel algorithms and serial data structures
- Munro, Robertson
- 1979
Citation Context: ...priority, or continue local processing. Many PRAM algorithms for standard priority queue operations (i.e. p = 1) can be found in the literature (for example, Biswas and Browne [6], Munro and Robertson [16], Rao and Kumar [27], and Jones [13]). Quinn and Yoo [25] describe parallel algorithms for filling and emptying a heap on a bounded degree network, but their algorithm deadlocks if insert and deletemi... |

1 | Some practical simulations of impractical parallel computers
- Parberry
- 1987
Citation Context: ...ry. Such an implementation requires an increase in running time for a p processor system by a factor of Θ(log p), which can be achieved using Columnsort [15] and techniques described in Parberry [19]. However, many of the fastest parallel algorithms (for example, those in NC [8, 22]) require a massive amount of communication. It is not unusual to require interprocessor communication from distant ... |