Fast and Scalable Priority Queue Architecture for High-Speed Network Switches
, 2000
Cited by 30 (7 self)
In this paper, we present a fast and scalable pipelined priority queue architecture for use in high-performance switches with support for fine-grained quality of service (QoS) guarantees. Priority queues are used to implement highest-priority-first scheduling policies. Our hardware architecture is based on a new data structure called a Pipelined heap, or P-heap for short. This data structure enables the pipelining of the enqueue and dequeue operations, thereby allowing these operations to execute in essentially constant time. In addition to being very fast, the architecture also scales very well to a large number of priority levels and to large queue sizes. We give a detailed description of this new data structure, the associated algorithms and the corresponding hardware implementation. We have implemented this new architecture using a 0.35 micron CMOS technology.
An Efficient Algorithm for Concurrent Priority Queue Heaps
- Inf. Proc. Letters
, 1996
Cited by 20 (0 self)
We present a new algorithm for concurrent access to array-based priority queue heaps. Deletions proceed top-down as they do in a previous algorithm due to Rao and Kumar [6], but insertions proceed bottom-up, and consecutive insertions use a bit-reversal technique to scatter accesses across the fringe of the tree, to reduce contention. Because insertions do not have to traverse the entire height of the tree (as they do in previous work), as many as O(M) operations can proceed in parallel, rather than O(log M) on a heap of size M. Experimental results on a Silicon Graphics Challenge multiprocessor demonstrate good overall performance for the new algorithm on small heaps, and significant performance improvements over known alternatives on large heaps with mixed insertion/deletion workloads. This work was supported in part by NSF grants nos. CDA-8822724 and CCR-9319445, and by ONR research grant no. N00014-92-J-1801 (in conjunction with the DARPA Research in Information Science and Tech...
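The bit-reversal idea in this abstract can be illustrated with a small sketch. It only computes the order in which the slots of one tree level would be filled, assuming the usual 1-based array layout of a binary heap; the function names `bit_reverse` and `fringe_slots` are hypothetical, and the locking and bottom-up sift of the concurrent algorithm itself are not shown.

```python
def bit_reverse(value: int, width: int) -> int:
    """Reverse the lowest `width` bits of `value`."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def fringe_slots(level: int) -> list[int]:
    """Array indices (1-based heap layout) of one tree level, listed in the
    order consecutive insertions would fill them under bit reversal."""
    first = 1 << level  # index of the leftmost node on this level
    return [first + bit_reverse(k, level) for k in range(1 << level)]


if __name__ == "__main__":
    # Neighbouring insertions land in different subtrees of the root.
    print(fringe_slots(3))
```

In this ordering, two consecutive insertions on the same level share only a short prefix of their root-to-leaf paths, which is the property the abstract credits with reducing contention.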
Selection on the Bulk-Synchronous Parallel Model with Applications to Priority Queues
- In Proceedings of the 1996 International Conference on Parallel and Distributed Processing Techniques and Applications
, 1996
Cited by 19 (7 self)
In this paper we present a new randomized selection algorithm on the Bulk-Synchronous Parallel (BSP) model of computation, along with an application of this algorithm to dynamic data structures, namely Parallel Priority Queues (PPQs). We show that our methods improve upon previous results in both the communication requirements and the amount of parallel slack required to achieve optimal performance. We also establish that optimality to within small multiplicative constant factors can be achieved for a wide range of parallel machines. While these algorithms are fairly simple themselves, descriptions of their performance in terms of the BSP parameters are somewhat involved. The main reward of quantifying these complications is that it allows transportable software to be written for parallel machines that fit the model. We also present experimental results for the selection algorithm that reinforce our claims.
Lazy Queue: A new approach to implementing the Pending-event Set
Cited by 15 (0 self)
In discrete event simulation, the future event set is very often represented by a priority queue. The data structure used to implement the queue and the way operations are performed on it are often crucial to the execution time of a simulation. In this paper a new priority queue implementation strategy, the Lazy Queue, is presented. It is tailored to handle operations on the pending event set efficiently. The Lazy Queue is a kind of multi-list data structure that delays the sorting process until a point near the time where the elements are to be dequeued. In this way, the time needed to sort new elements in the queue is reduced. We have performed several experiments comparing queue access times with the access times of the implicit heap and the calendar queue. Our experimental results indicate that the Lazy Queue is superior to these priority queue implementations. Keywords: Discrete Event Simulation, Priority Queue, Event List implementation, performance measurement.
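The idea of delaying the sort until elements are about to be dequeued can be sketched as follows. This is a simplified two-tier illustration and not the Lazy Queue as published: the class name `LazyQueueSketch`, the fixed `bucket_width`, and the bucket layout are all assumptions made for the example. Far-future timestamps are appended unsorted to interval buckets, and a bucket is sorted only when it becomes the near future.

```python
import bisect
from collections import defaultdict


class LazyQueueSketch:
    """Simplified illustration: a sorted near-future list plus unsorted
    far-future buckets. Assumes non-negative timestamps."""

    def __init__(self, bucket_width: float = 10.0):
        self.bucket_width = bucket_width
        self.near = []                # sorted timestamps, ready to dequeue
        self.near_limit = 0.0         # timestamps below this go into `near`
        self.far = defaultdict(list)  # bucket index -> unsorted timestamps

    def enqueue(self, timestamp: float) -> None:
        if timestamp < self.near_limit:
            # Rare case: the event falls in the already sorted region.
            bisect.insort(self.near, timestamp)
        else:
            # Common case: constant-time append, no sorting yet.
            self.far[int(timestamp // self.bucket_width)].append(timestamp)

    def dequeue(self) -> float:
        if not self.near:
            if not self.far:
                raise IndexError("dequeue from empty queue")
            # Sort the earliest bucket only now that it is needed.
            index = min(self.far)
            self.near = sorted(self.far.pop(index))
            self.near_limit = (index + 1) * self.bucket_width
        return self.near.pop(0)


if __name__ == "__main__":
    queue = LazyQueueSketch(bucket_width=10)
    for t in [42, 7, 15, 3, 11, 99]:
        queue.enqueue(t)
    print([queue.dequeue() for _ in range(6)])
```

The sketch trades a linear scan for the earliest bucket (`min(self.far)`) for cheap enqueues; a tuned implementation would index the buckets differently.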
Parallelism and Locality in Priority Queues
- In Sixth IEEE Symposium on Parallel and Distributed Processing
, 1994
Cited by 15 (0 self)
We explore two ways of incorporating parallelism into priority queues. The first is to speed up the execution of individual priority operations so that they can be performed one operation per time step, unlike sequential implementations which require $O(\log N)$ time steps per operation for an $N$-element heap. We give an optimal parallel implementation that uses a linear array of $O(\log N)$ processors. Second, we consider parallel operations on the priority queue. We show that using a $d$-dimensional array (constant $d$) of $P$ processors we can insert or delete the smallest $P$ elements from a heap in time $O(P^{1/d} \log^{1-1/d} P)$, where the number of elements in the heap is assumed to be polynomial in $P$. We also show a matching lower bound, based on communication complexity arguments, for a range of deterministic implementations. Finally, using randomization, we show that the time can be reduced to the optimal $O(P^{1/d})$ time with high probability.
Fast Priority Queues for Parallel Branch-and-Bound
- In Workshop on Algorithms for Irregularly Structured Problems, number 980 in LNCS
, 1995
Cited by 15 (2 self)
Currently used parallel best-first branch-and-bound algorithms either suffer from contention at a centralized priority queue or can only approximate the best-first strategy. Bottleneck-free algorithms for parallel priority queues are known, but they cannot be implemented very efficiently on contemporary machines. We present quite simple randomized algorithms for parallel priority queues on distributed-memory machines. For branch-and-bound they are, with high probability, asymptotically as efficient as previously known PRAM algorithms. The simplest versions require not much more communication than the approximated branch-and-bound algorithm of Karp and Zhang. Keywords: Analysis of randomized algorithms, distributed memory, load balancing, median selection, parallel best-first branch-and-bound, parallel priority queue.
Parallel Priority Queues
, 1991
Cited by 15 (1 self)
This paper introduces the Parallel Priority Queue (PPQ) abstract data type. A PPQ stores a set of integer-valued items and provides operations such as insertion of $n$ new items or deletion of the $n$ smallest ones. Algorithms for realizing PPQ operations on an $n$-processor CREW-PRAM are based on two new data structures, the n-Bandwidth-Heap (n-H) and the n-Bandwidth-Leftist-Heap (n-L), that are obtained as extensions of the well-known sequential binary-heap and leftist-heap, respectively. Using these structures, it is shown that insertion of $n$ new items in a PPQ of $m$ elements can be performed in parallel time $O(h + \log n)$, where $h = \log(m/n)$, while deletion of the $n$ smallest items can be performed in time $O(h + \log\log n)$. Keywords: Data structures, parallel algorithms, analysis of algorithms, heaps, PRAM model. This work has been partly supported by the Ministero della Pubblica Istruzione of Italy and by the C.N.R. project "Sistemi Informatici e Calcolo Parallelo" y Istituto di Ela...
The Performance of Concurrent Data Structure Algorithms
- Transactions on Database Systems
, 1994
Cited by 13 (9 self)
This thesis develops a validated model of concurrent data structure algorithm performance, concentrating on concurrent B-trees. The thesis first develops two analytical tools, which are explained in the next two paragraphs, for the analysis. Yao showed that the space utilization of a B-tree built from random inserts is 69%. Assuming that nodes merge only when empty, we show that the utilization is 39% when the number of insert and delete operations is the same. However, if there are just 5% more inserts than deletes, then the utilization is at least 62%. In addition to the utilization, we calculate the probabilities of splitting and merging, important parameters for calculating concurrent B-tree algorithm performance. We compare merge-at-empty B-trees with merge-at-half B-trees. We conclude that merge-at-empty B-trees have a slightly lower space utilization but a much lower restructuring rate than merge-at-half B-trees, making merge-at-empty B-trees preferable for concurrent B-tree algo...
Scalable Concurrent Priority Queue Algorithms
- In Proceedings of the Eighteenth Annual ACM Symposium on Principles of Distributed Computing
, 1999
Cited by 11 (3 self)
This paper addresses the problem of designing bounded range priority queues, that is, queues that support a fixed range of priorities. Bounded range priority queues are fundamental in the design of modern multiprocessor algorithms, from the application level to the lowest levels of the operating system kernel. While most of the available priority queue literature is directed at existing small-scale machines, we chose to evaluate algorithms on a broader concurrency scale using a simulated 256-node shared-memory multiprocessor architecture similar to the MIT Alewife. Our empirical evidence suggests that the priority queue algorithms currently available in the literature do not scale. Based on these findings, we present two simple new algorithms, LinearFunnels and FunnelTree, that provide true scalability throughout the concurrency range.
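To make the term concrete, a bounded range priority queue can be sketched as an array of bins, one per priority. The version below is a sequential, single-threaded illustration only: it has no synchronization and is not the LinearFunnels or FunnelTree algorithm named in the abstract. The class name `BoundedRangePQ` and its method names are hypothetical.

```python
from collections import deque


class BoundedRangePQ:
    """Sequential sketch: priorities are integers in [0, num_priorities),
    with 0 the most urgent. Items of equal priority leave in FIFO order."""

    def __init__(self, num_priorities: int):
        self.bins = [deque() for _ in range(num_priorities)]

    def insert(self, priority: int, item) -> None:
        if not 0 <= priority < len(self.bins):
            raise ValueError("priority out of range")
        self.bins[priority].append(item)

    def delete_min(self):
        # Scan the bins in priority order and take from the first non-empty one.
        for bin_ in self.bins:
            if bin_:
                return bin_.popleft()
        raise IndexError("delete_min from empty queue")


if __name__ == "__main__":
    pq = BoundedRangePQ(num_priorities=4)
    pq.insert(2, "c")
    pq.insert(0, "a")
    pq.insert(2, "d")
    pq.insert(1, "b")
    print([pq.delete_min() for _ in range(4)])
```

Because the range is fixed, insertion is a constant-time append and deletion costs at most one scan over the bins, independent of how many items are stored.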
Concurrent Heaps on the BSP Model
, 1996
Cited by 11 (11 self)
In this paper we present a new randomized selection algorithm on the Bulk-Synchronous Parallel (BSP) model of computation along with an application of this algorithm to dynamic data structures, namely Parallel Priority Queues (PPQs). We show that our algorithms improve upon previous results in both the communication requirements and the amount of parallel slack required to achieve optimal performance. We also establish that optimality to within small multiplicative constant factors can be achieved for a wide range of parallel machines. While these algorithms are fairly simple themselves, descriptions of their performance in terms of the BSP parameters are somewhat involved. The main reward of quantifying these complications is that it allows transportable software to be written for parallel machines that fit the model. We also present experimental results for the selection algorithm that reinforce our claims.

