Results 1–10 of 53
Fast and Scalable Priority Queue Architecture for High-Speed Network Switches
, 2000
Abstract

Cited by 34 (8 self)
In this paper, we present a fast and scalable pipelined priority queue architecture for use in high-performance switches with support for fine-grained quality of service (QoS) guarantees. Priority queues are used to implement highest-priority-first scheduling policies. Our hardware architecture is based on a new data structure called a Pipelined heap, or P-heap for short. This data structure enables the pipelining of the enqueue and dequeue operations, thereby allowing these operations to execute in essentially constant time. In addition to being very fast, the architecture also scales very well to a large number of priority levels and to large queue sizes. We give a detailed description of this new data structure, the associated algorithms and the corresponding hardware implementation. We have implemented this new architecture using a 0.35-micron CMOS technology.
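For context, the enqueue and dequeue operations that the P-heap pipelines are, sequentially, the standard O(log N) binary-heap sift operations. A minimal sequential sketch in Python (illustrative only; the paper's contribution is a hardware design that overlaps the per-level sift steps across pipeline stages, which this sequential code does not attempt):

```python
class BinaryHeap:
    """Sequential min-heap: enqueue/dequeue each take O(log N) sift steps,
    one tree level per step. The P-heap overlaps these per-level steps
    across pipeline stages to reach effectively constant-time operation."""

    def __init__(self):
        self.a = []

    def enqueue(self, key):
        self.a.append(key)
        i = len(self.a) - 1
        # Sift up: one comparison/swap per tree level.
        while i > 0 and self.a[(i - 1) // 2] > self.a[i]:
            self.a[i], self.a[(i - 1) // 2] = self.a[(i - 1) // 2], self.a[i]
            i = (i - 1) // 2

    def dequeue(self):
        top, last = self.a[0], self.a.pop()
        if self.a:
            self.a[0] = last
            i = 0
            # Sift down: one level per step, moving the smaller child up.
            while True:
                c = 2 * i + 1
                if c >= len(self.a):
                    break
                if c + 1 < len(self.a) and self.a[c + 1] < self.a[c]:
                    c += 1
                if self.a[i] <= self.a[c]:
                    break
                self.a[i], self.a[c] = self.a[c], self.a[i]
                i = c
        return top
```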
An Efficient Algorithm for Concurrent Priority Queue Heaps
 Inf. Proc. Letters
, 1996
Abstract

Cited by 22 (0 self)
We present a new algorithm for concurrent access to array-based priority queue heaps. Deletions proceed top-down as they do in a previous algorithm due to Rao and Kumar [6], but insertions proceed bottom-up, and consecutive insertions use a bit-reversal technique to scatter accesses across the fringe of the tree, to reduce contention. Because insertions do not have to traverse the entire height of the tree (as they do in previous work), as many as O(M) operations can proceed in parallel, rather than O(log M), on a heap of size M. Experimental results on a Silicon Graphics Challenge multiprocessor demonstrate good overall performance for the new algorithm on small heaps, and significant performance improvements over known alternatives on large heaps with mixed insertion/deletion workloads. This work was supported in part by NSF grants nos. CDA-8822724 and CCR-9319445, and by ONR research grant no. N00014-92-J-1801 (in conjunction with the DARPA Research in Information Science and Tech...
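The bit-reversal idea in this abstract is easy to illustrate: if the heap fringe has 2^b slots, visiting them in bit-reversed counter order sends consecutive insertions into different subtrees, so concurrent inserters rarely contend on the same path. A small Python sketch (the function and slot naming are ours, not the paper's):

```python
def bit_reverse(i, bits):
    """Reverse the low `bits` bits of integer i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

def fringe_order(bits):
    """Order in which 2**bits fringe slots are used by consecutive
    insertions: neighbouring inserts land in different subtrees,
    which scatters lock contention across the tree."""
    return [bit_reverse(i, bits) for i in range(1 << bits)]
```

For a fringe of 8 slots this yields `[0, 4, 2, 6, 1, 5, 3, 7]`: slots 0 and 4 are in opposite halves of the tree, as are 2 and 6, and so on.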
Selection on the Bulk-Synchronous Parallel Model with Applications to Priority Queues
 In Proceedings of the 1996 International Conference on Parallel and Distributed Processing Techniques and Applications
, 1996
Abstract

Cited by 19 (7 self)
In this paper we present a new randomized selection algorithm on the Bulk-Synchronous Parallel (BSP) model of computation, along with an application of this algorithm to dynamic data structures, namely Parallel Priority Queues (PPQs). We show that our methods improve upon previous results in both the communication requirements and the amount of parallel slack required to achieve optimal performance. We also establish that optimality to within small multiplicative constant factors can be achieved for a wide range of parallel machines. While these algorithms are fairly simple themselves, descriptions of their performance in terms of the BSP parameters are somewhat involved. The main reward of quantifying these complications is that it allows transportable software to be written for parallel machines that fit the model. We also present experimental results for the selection algorithm that reinforce our claims. 1 Introduction and the BSP Model The main technical contribution of this work is ...
Lazy Queue: A new approach to implementing the Pending-event Set
Abstract

Cited by 16 (0 self)
In discrete event simulation, the future event set is very often represented by a priority queue. The data structure used to implement the queue and the way operations are performed on it are often crucial to the execution time of a simulation. In this paper a new priority queue implementation strategy, the Lazy Queue, is presented. It is tailored to handle operations on the pending event set efficiently. The Lazy Queue is a kind of multi-list data structure that delays the sorting process until a point near the time when the elements are to be dequeued. In this way, the time needed to sort new elements into the queue is reduced. We have performed several experiments comparing queue access times with the access times of the implicit heap and the calendar queue. Our experimental results indicate that the Lazy Queue is superior to these priority queue implementations. Key words: Discrete Event Simulation, Priority Queue, Event List implementation, performance measurement. 1 Introduction...
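The lazy principle described above can be sketched in a few lines: keep a sorted list of near-future events and an unsorted pool of later ones, and sort the pool only when the near list drains. This is a deliberately simplified toy, not the paper's tuned multi-list structure; the `near`/`far`/`split` names are ours:

```python
import bisect

class LazyQueue:
    """Toy sketch of the lazy idea: defer sorting of far-future events
    until shortly before they are due. Invariant: every key in `far`
    is >= self.split, and every key in `near` is < self.split, so
    `near` can be drained before `far` is ever sorted."""

    def __init__(self):
        self.near = []       # sorted near-future events (keys < split)
        self.far = []        # unsorted pool of later events (keys >= split)
        self.split = 0.0     # boundary between the two regions

    def enqueue(self, t):
        if t < self.split:
            bisect.insort(self.near, t)   # rare case: event due very soon
        else:
            self.far.append(t)            # common case: O(1) append, no sort

    def dequeue(self):
        if not self.near:                 # sort the pool only when forced to
            self.far.sort()
            self.near, self.far = self.far, []
            if self.near:
                self.split = self.near[-1] + 1
        return self.near.pop(0)
```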
Fast Priority Queues for Parallel Branch-and-Bound
 In Workshop on Algorithms for Irregularly Structured Problems, number 980 in LNCS
, 1995
Abstract

Cited by 15 (2 self)
Currently used parallel best-first branch-and-bound algorithms either suffer from contention at a centralized priority queue or can only approximate the best-first strategy. Bottleneck-free algorithms for parallel priority queues are known, but they cannot be implemented very efficiently on contemporary machines. We present quite simple randomized algorithms for parallel priority queues on distributed memory machines. For branch-and-bound they are asymptotically as efficient as previously known PRAM algorithms with high probability. The simplest versions require not much more communication than the approximated branch-and-bound algorithm of Karp and Zhang. Keywords: Analysis of randomized algorithms, distributed memory, load balancing, median selection, parallel best-first branch-and-bound, parallel priority queue. 1 Introduction Branch-and-bound search is an important technique for many combinatorial optimization problems. Since it can be a quite time consuming technique, paralleli...
Parallel Priority Queues
, 1991
Abstract

Cited by 15 (1 self)
This paper introduces the Parallel Priority Queue (PPQ) abstract data type. A PPQ stores a set of integer-valued items and provides operations such as insertion of n new items or deletion of the n smallest ones. Algorithms for realizing PPQ operations on an n-processor CREW-PRAM are based on two new data structures, the n-Bandwidth-Heap (n-H) and the n-Bandwidth Leftist-Heap (n-L), that are obtained as extensions of the well-known sequential binary heap and leftist heap, respectively. Using these structures, it is shown that insertion of n new items in a PPQ of m elements can be performed in parallel time O(h + log n), where h = log(m/n), while deletion of the n smallest items can be performed in time O(h + log log n). Keywords: Data structures, parallel algorithms, analysis of algorithms, heaps, PRAM model. This work has been partly supported by the Ministero della Pubblica Istruzione of Italy and by the C.N.R. project "Sistemi Informatici e Calcolo Parallelo". Istituto di Ela...
Parallelism and Locality in Priority Queues
 In Sixth IEEE Symposium on Parallel and Distributed Processing
, 1994
Abstract

Cited by 15 (0 self)
We explore two ways of incorporating parallelism into priority queues. The first is to speed up the execution of individual priority operations so that they can be performed one operation per time step, unlike sequential implementations, which require O(log N) time steps per operation for an N-element heap. We give an optimal parallel implementation that uses a linear array of O(log N) processors. Second, we consider parallel operations on the priority queue. We show that using a d-dimensional array (constant d) of P processors we can insert or delete the smallest P elements from a heap in time O(P^{1/d} log^{1-1/d} P), where the number of elements in the heap is assumed to be polynomial in P. We also show a matching lower bound, based on communication complexity arguments, for a range of deterministic implementations. Finally, using randomization, we show that the time can be reduced to the optimal O(P^{1/d}) time with high probability. 1 Introduction Much of the theoret...
Scalable Concurrent Priority Queue Algorithms
 In Proceedings of the eighteenth annual ACM symposium on Principles of distributed computing
, 1999
Abstract

Cited by 13 (3 self)
This paper addresses the problem of designing bounded range priority queues, that is, queues that support a fixed range of priorities. Bounded range priority queues are fundamental in the design of modern multiprocessor algorithms, from the application level to the lowest levels of the operating system kernel. While most of the available priority queue literature is directed at existing small-scale machines, we chose to evaluate algorithms on a broader concurrency scale using a simulated 256-node shared memory multiprocessor architecture similar to the MIT Alewife. Our empirical evidence suggests that the priority queue algorithms currently available in the literature do not scale. Based on these findings, we present two simple new algorithms, LinearFunnels and FunnelTree, that provide true scalability throughout the concurrency range. 1 Introduction Priority queues are a fundamental class of data structures used in the design of modern multiprocessor algorithms. Their uses r...
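What a bounded-range priority queue buys is worth making concrete: when priorities come from a fixed range [0, R), an array of R buckets suffices, and delete-min costs at most R bucket probes regardless of how many items are stored. The sketch below is a plain sequential illustration of that layout, not the paper's concurrent LinearFunnels or FunnelTree structures; class and method names are ours:

```python
class BoundedRangePQ:
    """Sequential priority queue over the fixed range [0, R).
    insert is O(1); delete_min probes at most R buckets,
    independent of the total number of stored items."""

    def __init__(self, num_priorities):
        self.buckets = [[] for _ in range(num_priorities)]

    def insert(self, item, priority):
        self.buckets[priority].append(item)   # O(1) append

    def delete_min(self):
        for bucket in self.buckets:           # scan priorities low to high
            if bucket:
                return bucket.pop(0)          # FIFO within a priority level
        raise IndexError("empty queue")
```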
The Performance of Concurrent Data Structure Algorithms
 Transactions on Database Systems
, 1994
Abstract

Cited by 13 (9 self)
This thesis develops a validated model of concurrent data structure algorithm performance, concentrating on concurrent B-trees. The thesis first develops two analytical tools, which are explained in the next two paragraphs, for the analysis. Yao showed that the space utilization of a B-tree built from random inserts is 69%. Assuming that nodes merge only when empty, we show that the utilization is 39% when the number of insert and delete operations is the same. However, if there are just 5% more inserts than deletes, then the utilization is at least 62%. In addition to the utilization, we calculate the probabilities of splitting and merging, important parameters for calculating concurrent B-tree algorithm performance. We compare merge-at-empty B-trees with merge-at-half B-trees. We conclude that merge-at-empty B-trees have a slightly lower space utilization but a much lower restructuring rate than merge-at-half B-trees, making merge-at-empty B-trees preferable for concurrent B-tree algo...
Randomized Priority Queues for Fast Parallel Access
 Journal of Parallel and Distributed Computing
, 1997
Abstract

Cited by 11 (1 self)
Applications like parallel search or discrete event simulation often assign priority or importance to pieces of work. An effective way to exploit this for parallelization is to use a priority queue data structure for scheduling the work; but a bottleneck-free implementation of parallel priority queue access by many processors is required to make this approach scalable. We present simple and portable randomized algorithms for parallel priority queues on distributed memory machines with fully distributed storage. Accessing O(n) out of m elements on an n-processor network with diameter d requires amortized time O(·) with high probability for many network types. On logarithmic-diameter networks, the algorithms are as fast as the best previously known EREW-PRAM methods. Implementations demonstrate that the approach is already useful for medium-scale parallelism.
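The "fully distributed storage" idea above can be illustrated with a toy sequential simulation: each of n processors keeps a local heap, and every insertion is routed to a uniformly random processor, so no single queue becomes a hot spot. This is only a sketch of the storage scheme; the paper's actual algorithms batch accesses across the network and bound contention with high probability, whereas the global-minimum scan below is a sequential stand-in. The class and method names are ours:

```python
import heapq
import random

class DistributedPQ:
    """Toy simulation of a randomized distributed priority queue:
    insertions go to a uniformly random 'processor', spreading load
    across n local heaps. delete_min here scans all local minima
    sequentially; a real implementation coordinates this in parallel."""

    def __init__(self, n, seed=0):
        self.local = [[] for _ in range(n)]   # one heap per processor
        self.rng = random.Random(seed)

    def insert(self, key):
        # Random placement keeps each local heap roughly the same size.
        heapq.heappush(self.local[self.rng.randrange(len(self.local))], key)

    def delete_min(self):
        # Global minimum is the smallest of the local heap roots.
        i = min((j for j, h in enumerate(self.local) if h),
                key=lambda j: self.local[j][0])
        return heapq.heappop(self.local[i])
```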